8_Chapter5_Testformean_part1
노승화
EXAMPLE
Suppose that you’ve heard from your friend that the proportion of
Democratic Emory students in 1970 was 0.60. You want to find out if
the proportion of Democratic Emory students in 2023 has changed. A
random sample of 180 Emory students in 2023 showed that the
proportion of Democratic students was 0.76. Is there any evidence
that the proportion of Democratic students has changed?
HYPOTHESIS TEST
1. POPULATION AND QUESTION OF INTEREST
EXAMPLE - CONT'D
Suppose that you’ve heard from your friend that the proportion of
Democratic Emory students in 1970 was 0.60. You want to find out if
the proportion of Democratic Emory students in 2023 has changed. Is
there evidence that the proportion of Democratic students has
changed compared to 1970?
• The population of interest is Emory students.
• The parameter of interest is the proportion of Democratic
Emory students in 2023, p.
POPULATION VS SAMPLE
POPULATION VS SAMPLE - QUANTITATIVE VAR
POPULATION VS SAMPLE - BINARY VAR
Suppose that the proportion of American adults who support
the expansion of solar energy is p = 0.88 (the population
parameter), and that we have a random sample of 50 observations.
• When performing a hypothesis test, the first thing you need
to do is to define the question of interest concretely.
• This includes defining the population concretely.
• This also includes defining the parameter of interest.
• We will consider the following two parameters.
✓ µ = true population mean
✓ p = true population proportion
2. THE NULL AND THE ALTERNATIVE HYPOTHESIS
• We conduct the hypothesis test
assuming that the null hypothesis is true.
• We then evaluate the test results to determine if there is
enough evidence to reject the null in favor of the
alternative (what we hope to show).
• We will reject the null and choose the alternative
hypothesis only if we have enough evidence.
• Staying with the status quo does not mean that we are
choosing the null hypothesis. It only means that we do not
have enough evidence to reject it.
EXAMPLE CONT'D
• The null hypothesis is the one with no effect - that the true
proportion of Democrats has not changed.
H0 : p = 0.6
• The alternative hypothesis is one of an effect - that the true
proportion of Democrats has changed.
HA ∶ p ≠ 0.6
DIFFERENT TYPES OF THE ALTERNATIVE HYPOTHESIS
Null Hypothesis
I think that the true proportion of Emory students who are Democrats is 0.6.
H0 : p = 0.6
Alternative Hypothesis (two-sided)
I think that the true proportion of Emory students who are Democrats differs from 0.6.
HA : p ≠ 0.6
POLL
• 200 Emory students were asked how many colleges they
applied to: the sample had an average of 9.7 college
applications with a standard deviation of 7. The College
Board website states that counselors recommend students
apply to roughly 8 colleges.
• You want to test if the data provides convincing evidence
that the average number of colleges Emory students apply
to is higher than recommended.
3. TEST STATISTIC
DIGRESSION 1 - STATISTIC AND SAMPLING VARIABILITY
• This sample statistic, whether it is p̂ or x̄, has sampling
variation.
• This is because each sample differs from the others.
• What does this mean?
DATA DISTRIBUTION CHANGES WITH DIFFERENT SAMPLE - BINARY
[Figure: bar charts of the proportions of "support" and "not support" in different random samples (panels include sample2 and sample3); vertical axis from 0.0 to 0.8]
DATA DISTRIBUTION CHANGES WITH DIFFERENT SAMPLE - QUANTITATIVE
The height data
SAMPLING DISTRIBUTION OF A STATISTIC
• The sampling distribution refers to the probability
distribution of a statistic such as x̄ and p̂.
• Hence the statistic is a random variable that has a
probability distribution, and the sample statistic you obtain
is one realization among all possible outcomes.
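The idea of a sampling distribution can be sketched with a small simulation. The block below is only an illustration: it reuses p = 0.88 and n = 50 from the solar-energy slide, while the number of repeated samples (`reps`), the seed, and the helper name `sample_proportion` are assumptions made here.

```python
import random
import statistics

rng = random.Random(1)            # arbitrary seed, for reproducibility only
p_true, n, reps = 0.88, 50, 2000  # p and n from the slide; reps is an assumption

def sample_proportion(p, n, rng):
    """Draw n yes/no observations with P(yes) = p and return the sample proportion."""
    return sum(rng.random() < p for _ in range(n)) / n

# Each entry is the p-hat from one random sample of size n.
p_hats = [sample_proportion(p_true, n, rng) for _ in range(reps)]

print("first five p-hats:", p_hats[:5])
print("mean of p-hats   :", round(statistics.mean(p_hats), 3))
print("sd of p-hats     :", round(statistics.stdev(p_hats), 3))
```

Every sample gives a different p̂; the collection of those values approximates the sampling distribution, and the p̂ from a single real sample is one draw from it.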
• Since the test statistic is a transformation of the sample
statistic, it is also random.
• A test statistic is a transformation of the sample statistic
such that the transformed test statistic has a well-defined
distribution that does not depend on population
parameters.
• How can we define a test statistic that has a well-defined
distribution which does not depend on population
parameters?
• This is possible once we find the distribution of the sample
statistic and standardize it.
3. TEST STATISTIC
DIGRESSION 2 - SAMPLING DISTRIBUTION AND CENTRAL LIMIT THEOREM
$\bar{x} \overset{a}{\sim} N\left(\mu, \frac{\sigma}{\sqrt{n}}\right)$
$\overset{a}{\sim}$ can be interpreted as approximately having such a distribution
when n → ∞.
⋆ As n increases, would the precision of an estimator increase
or decrease?
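One way to explore this question is to evaluate the standard deviation σ/√n of x̄ for a few sample sizes. This is a minimal sketch; the value σ = 7 and the sample sizes are illustrative assumptions, not values from this slide.

```python
import math

def standard_error(sigma, n):
    """Standard deviation of the sample mean, sigma / sqrt(n)."""
    return sigma / math.sqrt(n)

sigma = 7.0  # illustrative value only
for n in (25, 100, 400):
    print(f"n = {n:3d}  ->  sigma / sqrt(n) = {standard_error(sigma, n):.3f}")
```

Quadrupling n halves σ/√n, so x̄ varies less around µ: the precision of the estimator increases with n.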
POLL
In the case of a binary categorical variable, we are interested in
estimating the proportion p.
• Why?
• Note that each xi has mean p and variance p(1 − p), because each
xi is a single trial of the binomial distribution (a Bernoulli
trial) with probability of success p.
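The mean and variance quoted above can be checked with a short derivation. It assumes that each xi is coded 1 for a success and 0 otherwise, that the xi are independent, and that p̂ is their sample mean; these assumptions are stated here for the sketch and are not spelled out on the slide.

```latex
\hat{p} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad x_i \in \{0, 1\}, \quad P(x_i = 1) = p

E[x_i] = 1 \cdot p + 0 \cdot (1 - p) = p

\mathrm{Var}(x_i) = E[x_i^2] - \left(E[x_i]\right)^2 = p - p^2 = p(1 - p)

E[\hat{p}] = \frac{1}{n} \cdot n p = p, \qquad
\mathrm{Var}(\hat{p}) = \frac{1}{n^2} \cdot n\, p(1 - p) = \frac{p(1 - p)}{n}
```

Because p̂ is a sample mean of the xi, the central limit theorem for x̄ applies to it with µ = p and σ = √(p(1 − p)).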
SAMPLING DISTRIBUTION OF p̂
SAMPLING DISTRIBUTION OF x̄
xi randomly sampled from a beta distribution with expectation
0.25.
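A simulation along these lines can be sketched as follows. The slide only states that the expectation is 0.25; the shape parameters Beta(1, 3), the sample size n = 30, and the number of repetitions are assumptions chosen here so that the population is skewed with mean 0.25.

```python
import math
import random

rng = random.Random(2)   # arbitrary seed
alpha, beta = 1.0, 3.0   # assumed shapes; mean = alpha / (alpha + beta) = 0.25
n, reps = 30, 2000       # assumed sample size and number of repeated samples

mu = alpha / (alpha + beta)
sigma = math.sqrt(alpha * beta / ((alpha + beta) ** 2 * (alpha + beta + 1)))
se = sigma / math.sqrt(n)  # standard deviation of x-bar suggested by the CLT

# Sample mean of each of the repeated samples.
x_bars = []
for _ in range(reps):
    sample = [rng.betavariate(alpha, beta) for _ in range(n)]
    x_bars.append(sum(sample) / n)

# If x-bar is roughly N(mu, se), about 95% of the x-bars fall within 1.96 se of mu.
coverage = sum(abs(xb - mu) <= 1.96 * se for xb in x_bars) / reps
print(f"mean of x-bars: {sum(x_bars) / reps:.3f} (population mean {mu})")
print(f"share within 1.96 se of mu: {coverage:.3f}")
```

Even though the individual xi are skewed, the share of sample means within 1.96 standard errors of 0.25 should come out close to 0.95, as a normal sampling distribution would predict.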
• Why is the central limit theorem important?
• Because it provides a well-known distribution (the normal
distribution) for the statistic of interest, regardless of
the distribution of the population.
• Then, what are the conditions for the central limit theorem
to hold?
• A random sample and n → ∞!
• n → ∞ is not possible in practice. The rule of thumb is
np ≥ 10 and n(1 − p) ≥ 10 for a binary variable, and
n ≥ 30 for a quantitative variable that is not highly skewed.
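The rules of thumb above can be written as two small checks. The function names are hypothetical, and the check for the mean deliberately ignores skewness, which still has to be judged from the data.

```python
def clt_ok_for_proportion(n, p):
    """Rule of thumb for a binary variable: np >= 10 and n(1 - p) >= 10."""
    return n * p >= 10 and n * (1 - p) >= 10

def clt_ok_for_mean(n):
    """Rule of thumb for a quantitative variable that is not highly skewed: n >= 30."""
    return n >= 30

# Settings taken from the examples in these slides.
print(clt_ok_for_proportion(180, 0.60))  # Emory example: 108 and 72
print(clt_ok_for_proportion(50, 0.88))   # solar example: 44 and 6
print(clt_ok_for_mean(200))              # college applications example
```

Under this rule the solar-energy setting (n = 50, p = 0.88) falls short, because n(1 − p) = 6 is below 10.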
3. TEST STATISTIC
• With the central limit theorem, the statistic x̄ or p̂ has a
well-defined distribution, but the distribution still depends
on parameters, such as p or σ, that we do not know.
• Those parameters are unknown and must be estimated. Hence
we standardize so that the distribution does not involve
parameters we do not know.
• This will let us have a unified approach when performing
a hypothesis test.
• We standardize using the z-score
$Z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$
$Z = \frac{\hat{p} - p}{\sqrt{p(1-p)/n}}$
POLL
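• Replacing the unknown σ and p in the denominators with their
sample estimates s and p̂ gives
$Z = \frac{\bar{x} - \mu}{s / \sqrt{n}}$
$Z = \frac{\hat{p} - p}{\sqrt{\hat{p}(1-\hat{p})/n}}$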
• For µ and p in the numerator, we substitute the value
under our null hypothesis H0.
• This is in line with the idea that we assume the null
hypothesis to be true and look for evidence against it.
• Let the null hypothesis values be µ0 and p0, respectively.
• Then, the test statistic becomes
$Z = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}$
$Z = \frac{\hat{p} - p_0}{\sqrt{\hat{p}(1-\hat{p})/n}}$
It is often also called the t-statistic and written as
$t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}$
$t = \frac{\hat{p} - p_0}{\sqrt{\hat{p}(1-\hat{p})/n}}$
EXAMPLE
Suppose that you’ve heard from your friend that the proportion of
Democratic Emory students in 1970 was 0.60. You want to find out if
the proportion of Democratic Emory students in 2023 has changed. A
random sample of 180 Emory students in 2023 showed that the
proportion of Democratic students was 0.76. Is there any evidence
that the proportion of Democratic students has changed?
H0 ∶ p = 0.6    HA ∶ p ≠ 0.6
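As a rough sketch, the test statistic for this example can be computed from the formula used in these slides, with p̂ in the standard error. The function name `z_stat_proportion` is hypothetical; the numbers are the ones given in the example.

```python
import math

def z_stat_proportion(p_hat, p0, n):
    """(p_hat - p0) / sqrt(p_hat * (1 - p_hat) / n), as written in the slides."""
    standard_error = math.sqrt(p_hat * (1 - p_hat) / n)
    return (p_hat - p0) / standard_error

z = z_stat_proportion(p_hat=0.76, p0=0.60, n=180)
print(f"Z = {z:.2f}")
```

The result is roughly 5.03, meaning the observed proportion lies about five estimated standard errors above the null value 0.6.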
EXAMPLE
200 Emory students were asked how many colleges they applied to: the
sample had an average of 9.7 college applications with a standard
deviation of 7. The College Board website states that counselors
recommend students apply to roughly 8 colleges. You want to test if
the data provides convincing evidence that the average number of
colleges Emory students apply to is higher than recommended.
H0 ∶ µ = 8    HA ∶ µ > 8
$Z = \frac{9.7 - 8}{7 / \sqrt{200}} = 3.4345$
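The arithmetic above can be reproduced from the summary statistics alone. This is a minimal sketch; the function name `z_stat_mean` is hypothetical.

```python
import math

def z_stat_mean(x_bar, mu0, s, n):
    """(x_bar - mu0) / (s / sqrt(n)), computed from summary statistics."""
    return (x_bar - mu0) / (s / math.sqrt(n))

z = z_stat_mean(x_bar=9.7, mu0=8, s=7, n=200)
print(f"Z = {z:.4f}")
```

The numerator is 1.7 and the estimated standard error is 7/√200 ≈ 0.495, which gives the value 3.4345 reported above.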
DISCUSSION ON THE DISTRIBUTION OF THE TEST STATISTIC
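In sum,
$Z = \frac{\bar{x} - \mu_0}{s / \sqrt{n}} \overset{a}{\sim} N(0, 1)$
$Z = \frac{\hat{p} - p_0}{\sqrt{\hat{p}(1-\hat{p})/n}} \overset{a}{\sim} N(0, 1)$
$t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}} \sim t_{n-1}$ when the population distribution is normal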
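The approximate N(0, 1) behaviour of the proportion statistic can be illustrated by simulating samples for which the null hypothesis is true. This is a sketch under assumptions: p0 = 0.6 and n = 180 are borrowed from the Emory example, while the number of repetitions and the seed are arbitrary choices.

```python
import math
import random
import statistics

rng = random.Random(3)        # arbitrary seed
p0, n, reps = 0.6, 180, 2000  # p0 and n from the Emory example; reps is an assumption

def z_stat_proportion(p_hat, p0, n):
    """Test statistic from the slides, with p-hat in the standard error."""
    return (p_hat - p0) / math.sqrt(p_hat * (1 - p_hat) / n)

z_values = []
for _ in range(reps):
    # Generate one sample with the null hypothesis true (true proportion = p0).
    p_hat = sum(rng.random() < p0 for _ in range(n)) / n
    if 0 < p_hat < 1:  # guard against a zero standard error
        z_values.append(z_stat_proportion(p_hat, p0, n))

z_mean = statistics.mean(z_values)
z_sd = statistics.stdev(z_values)
tail_share = sum(abs(z) > 1.96 for z in z_values) / len(z_values)
print(f"mean = {z_mean:.3f}, sd = {z_sd:.3f}, share with |Z| > 1.96 = {tail_share:.3f}")
```

If the approximation is good, the mean should be near 0, the standard deviation near 1, and about 5% of the simulated values should fall beyond ±1.96.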