Lecture 18 - Statistics and Data Analysis I 2
Avner Halevy
Let’s summarize the steps used in the hypothesis test from the last lecture:
Step 1: State the hypotheses. The hypotheses always involve some unknown population parameter.
For us, this is the population proportion p of people who like chocolate.
H0 : p = 0.8
H1 : p < 0.8
Step 2: Set the criteria for a decision. This means dividing the set of all possible values of the
sample statistic (for us, P̂) into two kinds: values that are consistent with H0 and values that are
not. As we know, the latter set is called the rejection region. It represents outcomes that would be
extremely unlikely if H0 were true, so if the observed value falls in this region, we reject H0. Before
we can compute the boundary of the rejection region, we need to decide on a significance level,
denoted by α. Unless otherwise stated, we shall always assume α = 0.05, which means, as we shall
soon see, that the rejection region consists of the most extreme 5% of values.
Under the null hypothesis we know
\[
\hat{P} \sim N\left(p, \frac{p(1-p)}{n}\right) = N\left(0.8, \frac{0.8(0.2)}{100}\right) = N\left(0.8, 0.04^2\right)
\]
To find the critical value separating rejection values from acceptance values, we compute:
\[
-1.65 = Z_{0.05} = \frac{X_{0.05} - 0.8}{0.04}
\]
This leads to X0.05 = 0.734. Thus, any value to the left of 0.734 would be considered extreme and
lead us to reject H0 .
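The critical value above can be checked numerically. The following is a minimal sketch, assuming Python 3.8 or later for the standard-library class `statistics.NormalDist`; the variable names are illustrative and not part of the lecture. It uses the exact 5th percentile of the standard normal (about −1.645) rather than the rounded table value −1.65, so the result agrees with 0.734 only to about three decimal places.

```python
from math import sqrt
from statistics import NormalDist

p0 = 0.8       # population proportion assumed under H0
n = 100        # sample size
alpha = 0.05   # significance level

# Standard deviation of the sample proportion under H0 (about 0.04)
se = sqrt(p0 * (1 - p0) / n)

# Lower-tail quantile of the standard normal: P(Z < z_alpha) = alpha
z_alpha = NormalDist().inv_cdf(alpha)

# Boundary of the left-sided rejection region
critical_value = p0 + z_alpha * se

print(f"se = {se:.4f}, z = {z_alpha:.3f}, critical value = {critical_value:.3f}")
```

Any sample proportion below `critical_value` would fall in the rejection region.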
Step 3: Collect data and compute the value of the test statistic. We collected a random
sample of n = 100 people and determined that 77 of them like chocolate. Our statistic is the sample
proportion:
\[
\hat{P} = \frac{77}{100} = 0.77
\]
Step 4: Make a decision. Since 0.734 < 0.77 (the value of our statistic does not fall in the rejection
region), we decide not to reject H0 . At the 0.05 significance level, we have insufficient evidence to
reject H0 .
It is common to also compute the p-value:
\[
\text{p-value} = P(\hat{P} < 0.77) = P\left(Z < \frac{0.77 - 0.8}{0.04}\right) = P(Z < -0.75) = 0.2266
\]
Comparing the p-value to the significance level α, we would reject if the p-value were smaller. Since
0.05 < 0.2266, we again decide not to reject. Using the p-value instead of the rejection region to make
a decision always leads to the same decision, but provides more information about how extreme (or
not) the value observed is.
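As a rough check of the p-value computation, here is a short sketch that standardizes the observed proportion and looks up the left-tail area. It assumes the standard-library class `statistics.NormalDist` (Python 3.8+); the variable names are chosen for illustration only.

```python
from math import sqrt
from statistics import NormalDist

p0, n, alpha = 0.8, 100, 0.05
p_hat = 77 / n                   # observed sample proportion

se = sqrt(p0 * (1 - p0) / n)     # standard deviation of P-hat under H0
z = (p_hat - p0) / se            # standardized test statistic, about -0.75

# Left-sided test: the p-value is the area to the left of z
p_value = NormalDist().cdf(z)

reject = p_value < alpha
print(f"z = {z:.2f}, p-value = {p_value:.4f}, reject H0: {reject}")
```

Comparing `p_value` with `alpha` reproduces the decision reached with the rejection region: the p-value is larger than 0.05, so H0 is not rejected.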
Given a parameter p and a fixed value p0 , there are three models for hypothesis testing: the first two
are called one-sided and the third one is called two-sided (see the figure below).
Model I
This is the one we have already seen, where the rejection region is located on the left and the hypotheses
have the following form:
H0 : p = p0
H1 : p < p0
In this model, the p-value is the probability, computed under H0, of observing a value of the
statistic that is lower than the observed value.
Model II
In this model, the alternative hypothesis states that the value of the parameter is higher than
previously believed. The rejection region is therefore located on the right and the hypotheses have
the following form:
H0 : p = p0
H1 : p > p0
In this model, the p-value is the probability, computed under H0, of observing a value of the
statistic that is higher than the observed value.
Model III
In this model, the alternative hypothesis states that the value of the parameter is simply different from
what was previously believed, without suggesting a particular direction for the change. The rejection
region is thus divided into two regions, one on the left and one on the right, each having an area of
α/2, and the hypotheses have the following form:
H0 : p = p0
H1 : p ≠ p0
In this model, the p-value is the probability, computed under H0, of observing a value of the
statistic that is more extreme (in either direction) than the observed value. Thus, if the observed
value is higher than p0, the p-value is twice the area to the right of it, and if the observed value is
lower than p0, the p-value is twice the area to the left of it.
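The three models differ only in which tail area is reported as the p-value. The sketch below collects them in one helper. The function name `proportion_p_value` and the labels `"less"`, `"greater"`, and `"two-sided"` are hypothetical names chosen for illustration, and the normal approximation for P̂ is assumed throughout, as in the rest of the lecture.

```python
from math import sqrt
from statistics import NormalDist


def proportion_p_value(p_hat, p0, n, alternative):
    """P-value of a one-sample proportion test under the normal approximation.

    alternative: "less" (Model I), "greater" (Model II), "two-sided" (Model III).
    """
    se = sqrt(p0 * (1 - p0) / n)   # standard deviation of P-hat under H0
    z = (p_hat - p0) / se          # standardized observed value
    std_normal = NormalDist()

    if alternative == "less":      # area to the left of the observed value
        return std_normal.cdf(z)
    if alternative == "greater":   # area to the right of the observed value
        return 1 - std_normal.cdf(z)
    if alternative == "two-sided": # twice the area in the tail beyond the observed value
        return 2 * (1 - std_normal.cdf(abs(z)))
    raise ValueError("alternative must be 'less', 'greater' or 'two-sided'")


# The chocolate example from the start of the lecture (Model I)
print(round(proportion_p_value(0.77, 0.8, 100, "less"), 4))
```

Using `abs(z)` in the two-sided branch covers both cases described above: it doubles the right-tail area when the observed value is above p0 and the left-tail area when it is below.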
Example: we flip a coin 10,000 times and count 5167 heads. Is this a fair coin? Conduct the test
using α = 0.05.
Step 1:
H0 : p = 0.5
H1 : p ≠ 0.5
Step 2:
Under the null hypothesis we know
\[
\hat{P} \sim N\left(p, \frac{p(1-p)}{n}\right) = N\left(0.5, \frac{0.5(0.5)}{10000}\right) = N\left(0.5, 0.005^2\right)
\]
We compute the p-value:
\[
\text{p-value} = 2P(\hat{P} > 0.5167) = 2P\left(Z > \frac{0.5167 - 0.5}{0.005}\right) = 2P(Z > 3.34) = 2(0.0004) = 0.0008
\]
Since this p-value is smaller than α = 0.05, we reject H0. Furthermore, since the p-value is much
smaller than α, we see that the observed value is quite extreme and thus highly statistically
significant.
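The coin example can be reproduced with a few lines of Python. This is a sketch under the same normal approximation, using the standard-library class `statistics.NormalDist`. The two-sided rejection boundaries computed at the end are not worked out in the text above; they are included only to illustrate the Model III description (an area of α/2 in each tail).

```python
from math import sqrt
from statistics import NormalDist

p0, n, alpha = 0.5, 10_000, 0.05
p_hat = 5167 / n                     # observed proportion of heads

se = sqrt(p0 * (1 - p0) / n)         # 0.005 under H0
z = (p_hat - p0) / se                # about 3.34

std_normal = NormalDist()

# Two-sided p-value: twice the area to the right of the observed value
p_value = 2 * (1 - std_normal.cdf(z))

# Illustration only: boundaries of the two-sided rejection region
z_crit = std_normal.inv_cdf(1 - alpha / 2)   # about 1.96
lower, upper = p0 - z_crit * se, p0 + z_crit * se

print(f"z = {z:.2f}, p-value = {p_value:.4f}")
print(f"reject H0 if P-hat < {lower:.4f} or P-hat > {upper:.4f}")
```

The observed proportion 0.5167 lies above the upper boundary, which agrees with the decision based on the p-value.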
We note that the procedure we have described under Model III is, for large samples, essentially
equivalent to the following procedure, which uses the confidence interval we constructed in Lecture 16.
Given the significance level α, we construct a 1 − α confidence interval for the parameter p and reject
H0 if the value of p under H0 does not belong to the interval. (The interval estimates the standard
deviation from P̂ rather than from p0, so the two procedures can disagree in borderline cases.) In our
example, we have:
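\[
\begin{aligned}
&\left( \hat{P} - Z_{1-\frac{\alpha}{2}} \sqrt{\frac{\hat{P}(1-\hat{P})}{n}},\ \hat{P} + Z_{1-\frac{\alpha}{2}} \sqrt{\frac{\hat{P}(1-\hat{P})}{n}} \right) \\
&= \left( 0.5167 - 1.96 \sqrt{\frac{0.5167(1-0.5167)}{10000}},\ 0.5167 + 1.96 \sqrt{\frac{0.5167(1-0.5167)}{10000}} \right) \\
&= (0.5069, 0.5265)
\end{aligned}
\]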
Since p = 0.5 does not belong to the interval, we would reject H0 , as before.
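The interval can also be computed directly. The following is a minimal sketch, again assuming the standard-library class `statistics.NormalDist` and illustrative variable names. Note that the standard deviation here is estimated from the observed proportion P̂, as in the formula above, not from the hypothesized value 0.5.

```python
from math import sqrt
from statistics import NormalDist

p_hat, n = 0.5167, 10_000
alpha = 0.05
p0 = 0.5                                   # value of p under H0

z = NormalDist().inv_cdf(1 - alpha / 2)    # about 1.96
margin = z * sqrt(p_hat * (1 - p_hat) / n) # half-width of the interval

lower, upper = p_hat - margin, p_hat + margin

# Reject H0 when the hypothesized value lies outside the interval
reject = not (lower <= p0 <= upper)
print(f"({lower:.4f}, {upper:.4f}), reject H0: {reject}")
```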