Hypothesis Testing

Hypothesis testing is a statistical method used to draw conclusions about a population based on sample data, primarily to determine if a treatment has an effect. The process involves stating null and alternative hypotheses, setting criteria for decision-making, collecting data, and making a conclusion based on statistical analysis. Errors in hypothesis testing can occur, such as Type I errors (false positives) and Type II errors (false negatives), which can mislead scientific understanding and literature.

Uploaded by

Kainat Munir

INTRODUCTION TO

HYPOTHESIS TESTING

Instructor: Dr. Irum Naqvi


Hypothesis testing is a statistical procedure
that allows researchers to use sample data
to draw inferences about the population of
interest.
The general goal of a hypothesis test is to
rule out chance (sampling error) as a
plausible explanation for the results from a
research study.
Hypothesis testing is a technique to help
determine whether a specific treatment has
an effect on the individuals in a population.
Figure 8.2 shows that the researcher begins
with a known population. This is the set of
individuals as they exist before treatment.
For this example, we are assuming that the
original set of scores forms a normal
distribution with μ = 80 and σ = 20.
The purpose of the research is to determine
the effect of a treatment on the individuals
in the population.
That is, the goal is to determine what
happens to the population after the
treatment is administered.
You should recall (from the chapters on the mean and
standard deviation) that adding or subtracting a
constant changes the mean but does not change
the shape of the population, nor does it change
the standard deviation.
Thus, we assume that the population after
treatment has the same shape as the original
population and the same standard deviation as
the original population. This assumption is
incorporated into the situation shown in Figure
8.2.
Note that the unknown population, after
treatment, is the focus of the research question.
Specifically, the purpose of the research is to
determine what would happen if the treatment
were administered to every individual in the
population.
The goal of the hypothesis test is to
determine whether the treatment has any
effect on the individuals in the population
(see Figure 8.2).
Usually, however, we cannot administer the
treatment to the entire population so the
actual research study is conducted using a
sample.
The following example provides a concrete
foundation for introducing the hypothesis-
testing procedure.
Example
Previous research indicates that men rate
women as being more attractive when they are
wearing red (Elliot & Niesta, 2008).
Based on these results, Gueguen and Jacob
(2012) reasoned that the same phenomenon
might influence the way that men react to
waitresses wearing red. In their study,
waitresses in five different restaurants wore the
same T-shirt in six different colors (red, blue,
green, yellow, black, and white) on different
days during a six-week period.
Except for the T-shirts, the waitresses were
instructed to act normally and to record each
A researcher decided to test this result by
repeating the basic study at a local restaurant.
Waitresses (and waiters) at the restaurant
routinely wear white shirts with black pants
and restaurant records indicate that the
waitresses’ tips from male customers average μ
= 15.8 percent with a standard deviation of σ
= 2.4 percentage points.
The distribution of tip amounts is roughly
normal. During the study, the waitresses are
asked to wear red shirts and the researcher
plans to record tips for a sample of n = 36
male customers.
If the mean tip for the sample is noticeably
different from the baseline mean, the
researcher can conclude that wearing the
color red does appear to have an effect on
tipping.
On the other hand, if the sample mean is
still around 15.8 percent (the same as the
baseline), the researcher must conclude
that the red shirt does not appear to have
any effect.
To emphasize the formal structure of a
hypothesis test, we will present hypothesis
testing as a four-step process that is used
STEP 1: STATE THE
HYPOTHESIS
As the name implies, the process of hypothesis
testing begins by stating a hypothesis about the
unknown population. Actually, we state two
opposing hypotheses.
The first and most important of the two is the
null hypothesis, which states that the treatment
has no effect. In general, the null hypothesis
states that there is no change, no effect, no
difference: nothing happened, hence the name null.
The null hypothesis is identified by the symbol
H0. (The H stands for hypothesis, and the zero
subscript indicates that this is the zero-effect
hypothesis.)
For the study in the example, the null hypothesis
states that the red shirt has no effect on tipping
behavior for the population of male customers. In
symbols, this hypothesis is
H0: μ (with red shirt) = 15.8 (Even with a red shirt,
the mean tip is still 15.8 percent.)
The alternative hypothesis (H1) states that
there is a change, a difference, or a relationship for
the general population. In the context of an
experiment, H1 predicts that the independent
variable (treatment) does have an effect on the
dependent variable.
For this example, the alternative hypothesis states
that the red shirt does have an effect on tipping for
the population and will cause a change in the
mean score. In symbols, the alternative hypothesis is
H1: μ (with red shirt) ≠ 15.8 (With a red shirt,
the mean tip is different from 15.8 percent.)
Notice that the alternative hypothesis simply
states that there will be some type of
change.
It does not specify whether the effect will be
increased or decreased tips. In some
circumstances, it is appropriate for the
alternative hypothesis to specify the
direction of the effect.
For example, the researcher might
hypothesize that a red shirt will increase tips
(μ > 15.8). For now we concentrate on
nondirectional tests, for which the hypotheses
simply state that the treatment has no effect
(H0) or has an effect (H1).
STEP 2: Set the criteria for
a decision
To formalize the decision process, we
determine exactly which sample means are
consistent with the null hypothesis and which
sample means are at odds with the null
hypothesis.
For our example, the null hypothesis states
that the red shirts have no effect and the
population mean is still μ = 15.8 percent, the
same as the population mean for waitresses
wearing white shirts. If this is true, then the
sample mean should have a value around
15.8. Therefore, a sample mean near 15.8 is
consistent with the null hypothesis.
On the other hand, a sample mean that is very
different from 15.8 is not consistent with the null
hypothesis.
To determine exactly which values are “near”
15.8 and which values are “very different from”
15.8, we will examine all of the possible sample
means that could be obtained if the null
hypothesis is true. For our example, this is the
distribution of sample means for n = 36.
According to the null hypothesis, this distribution
is centered at μ = 15.8. The distribution of sample
means is then divided into two sections.
1. Sample means that are likely to be obtained if
H0 is true; that is, sample means that are close to
the null hypothesis
2. Sample means that are very unlikely to be
obtained if H0 is true; that is, sample means that
are very different from the null hypothesis
Figure 8.4 shows the distribution of sample
means divided into these two sections.
Notice that the high-probability samples are
located in the center of the distribution and
have sample means close to the value specified
in the null hypothesis.
On the other hand, the low-probability samples
are located in the extreme tails of the
distribution. After the distribution has been
divided in this way, we can compare our sample
data with the values in the distribution.
Specifically, we can determine whether our
sample mean is consistent with the null
hypothesis (like the values in the center of the
distribution) or whether our sample mean is
very different from the null hypothesis (like the
values in the extreme tails).
The Alpha Level
Alpha level determines the probability of
obtaining sample data in the critical region. The
alpha level, or the level of significance, is a
probability value that is used to define the
concept of “very unlikely” in a hypothesis test.
The critical region is composed of the
extreme sample values that are very unlikely
(as defined by the alpha level) to be obtained if
the null hypothesis is true. The boundaries for
the critical region are determined by the alpha
level. If sample data fall in the critical region,
the null hypothesis is rejected.
To find the boundaries that separate the high-
probability samples from the low-probability
samples, we must define exactly what is
meant by “low” probability and “high”
probability.
The alpha (α) value is a small probability that
is used to identify the low-probability samples.
By convention, commonly used alpha levels
are α = .05 (5%), α = .01 (1%), and α = .001
(0.1%).
For example, with α = .05, we separate the
most unlikely 5% of the sample means (the
extreme values) from the most likely 95% of
the sample means (the central values).
The extremely unlikely values, as defined
by the alpha level, make up what is called
the critical region.
These extreme values in the tails of the
distribution define outcomes that are not
consistent with the null hypothesis; that is,
they are very unlikely to occur if the null
hypothesis is true.
Whenever the data from a research study
produce a sample mean that is located in
the critical region, we conclude that the
data are not consistent with the null
hypothesis, and we reject the null
hypothesis.
The Boundaries for the
Critical Region
To determine the exact location for the
boundaries that define the critical region, we use
the alpha-level probability and the unit normal
table. In most cases, the distribution of sample
means is normal, and the unit normal table
provides the precise z-score location for the
critical region boundaries.
With α = .05, for example, the boundaries
separate the extreme 5% from the middle 95%.
Because the extreme 5% is split between two tails
of the distribution, there is exactly 2.5% (or
0.0250) in each tail.
In the unit normal table, you can look up a
proportion of 0.0250 in the tail and find that
the z-score boundaries are z = ±1.96.
These values define the boundaries of the critical region
for a hypothesis test using α = .05.
Similarly, an alpha level of α = .01 means that
1% or .0100 is split between the two tails. In
this case, the proportion in each tail is .0050,
and the corresponding z-score boundaries are z
= ±2.58.
Likewise, α = .001 means that 0.1% or .0010 is
split between the two tails. In this case, the
proportion in each tail is .0005, and the
corresponding z-score boundaries are z = ±3.30.
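The table lookup described above can also be reproduced numerically. The following is a minimal sketch, assuming Python's standard-library NormalDist stands in for the unit normal table; the function name two_tailed_boundaries is made up for illustration and is not part of the original slides.

```python
from statistics import NormalDist

def two_tailed_boundaries(alpha):
    """Return the (lower, upper) z-score boundaries of a two-tailed critical region."""
    # Half of alpha goes in each tail, so the upper boundary cuts off alpha / 2.
    upper = NormalDist().inv_cdf(1 - alpha / 2)
    return -upper, upper

for alpha in (0.05, 0.01, 0.001):
    lower, upper = two_tailed_boundaries(alpha)
    print(f"alpha = {alpha}: z = {lower:.2f} and z = +{upper:.2f}")
```

For α = .001 the computed boundary rounds to 3.29 rather than 3.30; the small difference presumably comes from rounding in the printed table.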
STEP 3: Collect data and
compute sample statistics
At this time, we begin recording tips for
male customers while the waitresses are
wearing red. Notice that the data are
collected after the researcher has stated
the hypotheses and established the criteria
for a decision.
This sequence of events helps ensure that
a researcher makes an honest, objective
evaluation of the data and does not tamper
with the decision criteria after the
experimental outcome is known.
Next, the raw data from the sample are
summarized with the appropriate statistics:
For this example, the researcher would
compute the sample mean.
Now it is possible for the researcher to
compare the sample mean (the data) with
the null hypothesis.
The comparison is accomplished by
computing a z-score that describes exactly
where the sample mean is located relative
to the hypothesized population mean from
H0. In Step 2, we constructed the distribution
of sample means predicted by the null hypothesis.
Now we calculate a z-score that identifies where
our sample mean is located in this hypothesized
distribution. The z-score formula for a sample
mean is
z = (M – μ) / σM
In the formula, the value of the sample mean (M)
is obtained from the sample data, and the
value of μ is obtained from the null hypothesis.
Thus, the z-score formula can be expressed
in words as follows:
z = (sample mean – hypothesized population mean) /
(standard error between M and μ)
STEP 4: Make a decision
In the final step, the researcher uses the z-score
value obtained in Step 3 to make a decision
about the null hypothesis according to the
criteria established in Step 2. There are two
possible outcomes.
1st Possible Outcome
The sample data are located in the critical
region. By definition, a sample value in the
critical region is very unlikely to occur if the
null hypothesis is true. Therefore, we conclude
that the sample is not consistent with H0 and
our decision is to reject the null hypothesis.
For the example we have been considering,
suppose the sample produced a mean tip of M
= 16.7 percent. The null hypothesis states that
the population mean is μ = 15.8 percent and,
with n = 36 and σ = 2.4, the standard error for
the sample mean is
σM = σ/√n = 2.4/6 = 0.4
Thus, a sample mean of M = 16.7 produces
a z-score of
z = (M – μ)/σM = (16.7 – 15.8)/0.4 = 0.9/0.4 = 2.25
With an alpha level of α = .05, this z-score
is far beyond the boundary of 1.96.
Because the sample z-score is in the critical
region, we reject the null hypothesis and
conclude that the red shirt did have an
effect on tipping behavior.
2nd Possible Outcome
The sample data are not in the critical region.
In this case, the sample mean is reasonably
close to the population mean specified in the
null hypothesis (in the center of the
distribution). Because the data do not provide
strong evidence that the null hypothesis is
wrong, our conclusion is to fail to reject the
null hypothesis. This conclusion means that
the treatment does not appear to have an
effect.
For the research study examining the effect of
a red shirt, suppose our sample produced a
mean tip of M = 16.1 percent. As before, the
null hypothesis states that μ = 15.8 percent and
the standard error is σM = 0.4.
These values produce a z-score of
z = (M – μ)/σM = (16.1 – 15.8)/0.4 = 0.3/0.4 = 0.75
The z-score of 0.75 is not in the critical
region. Therefore, we would fail to reject
the null hypothesis and conclude that the
red shirt does not appear to have an effect
on male tipping behavior.
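The two outcomes can be checked with a short script. This is a sketch only: the function name z_test and its return format are illustrative choices, and the critical boundary is computed with the standard library rather than read from the unit normal table.

```python
from math import sqrt
from statistics import NormalDist

def z_test(sample_mean, mu0, sigma, n, alpha=0.05):
    """Two-tailed z-test; returns (z, critical_z, reject_h0)."""
    standard_error = sigma / sqrt(n)           # sigma_M = sigma / sqrt(n)
    z = (sample_mean - mu0) / standard_error   # distance from mu0 in standard errors
    critical_z = NormalDist().inv_cdf(1 - alpha / 2)
    return z, critical_z, abs(z) > critical_z

# The two sample means considered in the red-shirt example.
for m in (16.7, 16.1):
    z, critical_z, reject = z_test(m, mu0=15.8, sigma=2.4, n=36)
    decision = "reject H0" if reject else "fail to reject H0"
    print(f"M = {m}: z = {z:.2f}, boundary = ±{critical_z:.2f}, {decision}")
```

Only the sample mean changes between the two runs; the hypothesized mean, standard error, and critical boundary are fixed before the data are examined, as Step 2 requires.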
Errors in Hypothesis
Testing
Just because the sample mean (following
treatment) is different from the original
population mean does not necessarily
indicate that the treatment has caused a
change.
You should recall that there usually is some
discrepancy between a sample mean and
the population mean simply as a result of
sampling error.
Because the hypothesis test relies on
sample data, and because sample data are
not completely reliable, there is always the
risk that misleading data will cause the
hypothesis test to reach a wrong conclusion.
EXAMPLE
Remember: samples are not expected to be
identical to their populations, and some extreme
samples can be very different from the
populations they are supposed to represent. If a
researcher selects one of these extreme
samples by chance, then the data from the
sample may give the appearance of a
strong treatment effect, even though there is
no real effect.
In the previous section, for example, we
discussed a research study examining how the
tipping behavior of male customers is influenced
by a waitress wearing the color red. Suppose
the researcher selects a sample of n = 36
men who already were good tippers. Even if
the red shirt has no effect, these men would
still leave above-average tips, and the sample
would appear to show a treatment effect.
Type I Errors
A Type I error occurs when the sample data
appear to show a treatment effect when, in fact,
there is none.
In this case the researcher will reject the null
hypothesis and falsely conclude that the
treatment has an effect.
Type I errors are caused by unusual,
unrepresentative samples. Just by chance the
researcher selects an extreme sample with the
result that the sample falls in the critical region
even though the treatment has no effect.
The hypothesis test is structured so that Type I
errors are very unlikely; specifically, the
probability of a Type I error is equal to the
alpha level.
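Because the critical region holds a proportion α of the possible sample means when H0 is true, the long-run rate of Type I errors should be close to α. The simulation below is a rough sketch of that idea using the tipping example's values; the function name, the number of trials, and the seed are arbitrary illustrative choices, and sample means are drawn directly from the (assumed normal) distribution of sample means.

```python
import random
from math import sqrt
from statistics import NormalDist

def type_i_error_rate(mu0, sigma, n, alpha=0.05, trials=20_000, seed=1):
    """Estimate how often a two-tailed z-test rejects H0 when H0 is true."""
    rng = random.Random(seed)                  # fixed seed so the run is repeatable
    standard_error = sigma / sqrt(n)
    critical_z = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(trials):
        # No treatment effect: sample means vary around mu0 by sampling error alone.
        sample_mean = rng.gauss(mu0, standard_error)
        z = (sample_mean - mu0) / standard_error
        if abs(z) > critical_z:
            rejections += 1                    # every rejection here is a Type I error
    return rejections / trials

rate = type_i_error_rate(mu0=15.8, sigma=2.4, n=36)
print(f"Estimated Type I error rate: {rate:.3f}")
```

The estimate should land near 0.05, with small run-to-run variation if the seed is changed.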
Reasons
An extreme or non-representative sample
Dangers of a Type I error
A Type I error is a false report
The false result is added to the scientific literature
Theories may be built on false results
The result cannot be replicated
Research effort is wasted
Type II Error
A Type II error occurs when the sample does
not appear to have been affected by the
treatment when, in fact, the treatment does
have an effect.
In this case, the researcher will fail to reject the
null hypothesis and falsely conclude that the
treatment does not have an effect.
Type II errors are commonly the result of a very
small treatment effect. Although the treatment
does have an effect, it is not large enough to
show up in the research study.
The sample mean is not in the critical region,
even though the treatment has an effect
Directional test
When a research study predicts a specific
direction for the treatment effect (increase
or decrease), it is possible to incorporate
the directional prediction into the
hypothesis test.
The result is called a directional test or a
one-tailed test. A directional test includes
the directional prediction in the statement
of the hypotheses and in the location of the
critical region.
For example, if the original population has
a mean of μ = 80 and the treatment is
predicted to increase the scores, then the
hypotheses would state that after
treatment:
H0: μ = 80 (there is no increase)
H1: μ > 80 (there is an increase)
In this case, the entire critical region would
be located in the right-hand tail of the
distribution because large values for M
would demonstrate that there is an
increase.
Alpha Levels for Hypothesis
test through Z-Test
If we revisit the z-score for 5% and 1%, we can
identify the critical regions for the critical
rejection areas from the unit standard normal
table.
A two-tailed test at the 5% level has a critical
boundary Z score of +1.96 and -1.96
A one-tailed test at the 5% level has a critical
boundary Z score of +1.65 or -1.65 (1.645 before rounding)
A two-tailed test at the 1% level has a critical
boundary Z score of +2.58 and -2.58
A one-tailed test at the 1% level has a critical
boundary Z score of +2.33 or -2.33.
Measuring Effect Size
A hypothesis test evaluates the statistical
significance of the results from a research
study. That is, the test determines whether or
not it is likely that the obtained sample mean
occurred without any contribution from a
treatment effect.
The hypothesis test is influenced not only by
the size of the treatment effect but also by the
size of the sample.
Thus, even a very small effect can be
significant if it is observed in a very large
sample.
Because a significant effect does not
necessarily mean a large effect, it is
recommended that the hypothesis test be
accompanied by a measure of the effect
size.

We use Cohen’s d as a standardized
measure of effect size.
Much like a z-score, Cohen’s d measures
the size of the mean difference in terms of
the standard deviation.
The standard deviation is included in the
calculation to standardize the size of the mean
difference in much the same way that z-scores
standardize locations in a distribution.
For example, a 15-point mean difference can be
a relatively large treatment effect or a relatively
small effect depending on the size of the
standard deviation.
For example, consider a treatment that
produces a 15-point mean difference in SAT
scores; before treatment, the average SAT score
is μ = 500, and after treatment the average is
515. Notice that the standard deviation for SAT
scores is σ = 100, so the 15-point difference
appears to be small. For this example, Cohen’s
d is
Cohen’s d = mean difference / standard deviation
= 15/100 = 0.15

Now consider a treatment that produces a
15-point mean difference in IQ scores; before
treatment the average IQ is 100, and after
treatment the average is 115. Because IQ scores
have a standard deviation of σ = 15, the 15-point
mean difference now appears to be large. For
this example, Cohen’s d is
Cohen’s d = mean difference / standard deviation
= 15/15 = 1.00
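The SAT and IQ comparisons can be written as a one-line calculation. This is a minimal sketch; the function name cohens_d and its argument names are illustrative and not taken from the slides.

```python
def cohens_d(mean_after, mean_before, sigma):
    """Cohen's d: the mean difference measured in standard deviation units."""
    return (mean_after - mean_before) / sigma

# The same 15-point difference against two different standard deviations.
print(f"SAT: d = {cohens_d(515, 500, 100):.2f}")
print(f"IQ:  d = {cohens_d(115, 100, 15):.2f}")
```

Note that the sample size never enters the calculation, which is why effect size complements the hypothesis test rather than repeating it.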
Power of a Hypothesis
Test
The power of a hypothesis test is defined as
the probability that the test will reject the
null hypothesis when the treatment does
have an effect. The power of a test
depends on a variety of factors including:
Sample size: power depends on the size of
the sample as well as the size of the
treatment effect. A larger sample produces
greater power for the hypothesis test.
Alpha level: reducing the alpha level of
the test also reduces the power of the test.
For example, lowering α from .05 to .01
lowers the power of the hypothesis test.
With α = .05, the critical region on the
right-hand side begins at z = 1.96. If α were
changed to .01, the boundary would be moved
farther to the right, out to z = 2.58.
It should be clear that moving the critical
boundary to the right means that a smaller
portion of the treatment distribution (the
distribution on the right-hand side) will be in
the critical region. Thus, there would be a lower
probability of rejecting the null hypothesis and
a lower value for the power of the test.
One tailed vs. Two tailed test: If the
treatment effect is in the predicted direction,
then changing from a regular two-tailed test to
a one-tailed test increases the power of the
hypothesis test.
With a two-tailed test and α = .05, the critical
region on the right-hand side begins at z =
1.96. Changing to a one-tailed test would move
the critical boundary to the left to a value of z
= 1.65.
Moving the boundary to the left would cause a
larger proportion of the treatment distribution
to be in the critical region, which increases the
power of the test.
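The three factors discussed above (sample size, alpha level, and one-tailed versus two-tailed tests) can be compared numerically. The sketch below assumes a normal distribution of sample means and a hypothetical 1-point true treatment effect with σ = 2.4; these numbers and the function name power are illustrative assumptions, not values from the slides.

```python
from math import sqrt
from statistics import NormalDist

def power(effect, sigma, n, alpha=0.05, tails=2):
    """Probability of rejecting H0 when the treatment shifts the mean by `effect`.

    For tails=1 the critical region is assumed to sit in the right-hand tail,
    i.e. the predicted direction is an increase (effect > 0).
    """
    std_normal = NormalDist()
    shift = effect / (sigma / sqrt(n))         # true mean shift, in standard errors
    critical_z = std_normal.inv_cdf(1 - alpha / tails)
    # Portion of the treatment distribution beyond the right-hand boundary.
    result = 1 - std_normal.cdf(critical_z - shift)
    if tails == 2:
        # Plus the (usually tiny) portion beyond the left-hand boundary.
        result += std_normal.cdf(-critical_z - shift)
    return result

# Hypothetical scenario: a 1-point true effect with sigma = 2.4.
scenarios = [
    ("n = 36,  alpha = .05, two-tailed", dict(n=36)),
    ("n = 36,  alpha = .01, two-tailed", dict(n=36, alpha=0.01)),
    ("n = 36,  alpha = .05, one-tailed", dict(n=36, tails=1)),
    ("n = 100, alpha = .05, two-tailed", dict(n=100)),
]
for label, kwargs in scenarios:
    print(f"{label}: power = {power(effect=1.0, sigma=2.4, **kwargs):.3f}")
```

Relative to the first scenario, power should drop when α is lowered and rise when the test is one-tailed in the predicted direction or when the sample is larger.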
HYPOTHESIS TESTING
THROUGH Z-SCORE
Example: 1
We start with a normal shaped population
with a mean of μ = 80 and a standard
deviation of σ = 10. A researcher plans to
select a sample of n = 25 individuals from
this population and administer a treatment
to each individual. It is expected that the
treatment will have an 8-point effect; that
is, the treatment will add 8 points to each
individual’s score.
Step 1 : State the Hypothesis
H0 : μ = 80
H1: μ > 80
Step 2: Select Alpha Level
Step 3: Calculate the test statistics
σM = σ/√n = 10/√25 = 2
z = (M – μ)/σM = (88 – 80)/2 = 4.0
Cohen’s d = 8/10 = 0.8
Step 4 : Make a Decision
Sample means will be in the critical region and
we will reject the null hypothesis. In practical
terms, this means that the research study is
almost guaranteed to be successful. If the
researcher selects a sample of n = 25
individuals, and if the treatment really does have
an 8-point effect, the hypothesis test is very
likely to reject the null hypothesis.
Example: 2
A study examines self-esteem and
depression in teenagers. A sample of 25
teens with a low self-esteem are given the
Beck Depression Inventory. The average
score for the group is 20.9. For the general
population, the average score is 18.3 with σ
= 12. Use a two-tail test with α = 0.05 to
examine whether teenagers with low self-
esteem show significant differences in
depression.
Answer: we fail to reject the null hypothesis
(z = 2.6/2.4 = 1.08, which is inside ±1.96);
effect size: Cohen’s d = 2.6/12 = 0.22
Example: 3
You get hired as a server at a local restaurant,
and the manager tells you that servers’ tips are
$42 on average but vary about $12 (μ = 42, σ
= 12). You decide to track your tips to see if
you make a different amount, but because this
is your first job as a server, you don’t know if
you will make more or less in tips. After working
16 shifts, you find that your average nightly
amount is $44.50 from tips. Test for a difference
between this value and the population mean at
the α = 0.05 level of significance while
computing effect size as well
Answer: we fail to reject the null hypothesis
(z = 2.5/3 = 0.83, which is inside ±1.96);
effect size: Cohen’s d = 2.5/12 = 0.21
Example: 4
A researcher begins with a known
population—in this case, scores on a
standardized test that are normally
distributed with μ = 65 and σ = 15. The
researcher suspects that special training in
reading skills will produce a change in the
scores for the individuals in the population.
Because it is not feasible to administer the
treatment (the special training) to everyone
in the population, a sample of n = 25
individuals is selected, and the treatment is
given to this sample. Following treatment,
the average score for this sample is M =
70. Is there evidence that the training has
an effect on the scores?
Example: 5
The psychology department is gradually
changing its curriculum by increasing the
number of online course offerings. To evaluate
the effectiveness of this change, a random
sample of n = 36 students who registered for
Introductory Psychology is placed in the online
version of the course. At the end of the
semester, all students take the same final exam.
The average score for the sample is M = 76. For
the general population of students taking the
traditional lecture class, the final exam scores
form a normal distribution with a mean of μ =
71. If the population standard deviation is σ =
Example: 6
A random sample of n = 25 scores is selected
from a normal population with a mean of μ = 40.
After a treatment is administered to the
individuals in the sample, the sample mean is
found to be M = 44.
a. If the population standard deviation is σ = 5, is
the sample mean sufficient to conclude that the
treatment has a significant effect? Use a two-
tailed test with α = .05 and compute effect size.
(Reject the null hypothesis) (Cohen’s d = 0.80)
b. If the population standard deviation is σ = 15,
is the sample mean sufficient to conclude that the
treatment has a significant effect? Use a two-tailed
test with α = .05.
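Parts a and b of Example 6 differ only in the population standard deviation, so they can be worked side by side. This is a sketch under the assumption that part b uses the same two-tailed α = .05 test as part a; the function name z_and_d is illustrative.

```python
from math import sqrt

def z_and_d(sample_mean, mu0, sigma, n):
    """Return (z, Cohen's d) for a single-sample z-test."""
    z = (sample_mean - mu0) / (sigma / sqrt(n))   # test statistic
    d = (sample_mean - mu0) / sigma               # effect size
    return z, d

CRITICAL_Z = 1.96  # two-tailed boundary for alpha = .05, from the unit normal table

for sigma in (5, 15):
    z, d = z_and_d(sample_mean=44, mu0=40, sigma=sigma, n=25)
    decision = "reject H0" if abs(z) > CRITICAL_Z else "fail to reject H0"
    print(f"sigma = {sigma}: z = {z:.2f}, d = {d:.2f}, {decision}")
```

The same 4-point mean difference gives a much smaller z-score and effect size when σ is larger, because both are measured relative to the variability of the scores.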
