8_Chapter5_Testformean_part1

The document outlines the process of hypothesis testing, particularly for estimating population means using sample averages. It details the steps involved, including defining the population and hypotheses, computing test statistics, and concluding based on evidence. Examples illustrate how to apply these concepts in real-world scenarios, such as assessing changes in proportions of political affiliations among students.

Uploaded by

sunvssky

ECO1005 INTRODUCTION TO ECONOMIC STATISTICS

INFERENCE FOR POPULATION MEAN


5.1, 5.2, 5.3, 6.1, 7.1

노승화

OCTOBER 14, 2024


• When we collect and analyze data, the purpose is often
to answer a specific question.
• Answering such a question from data is called a hypothesis
test, a form of statistical inference.
• In this note, we consider hypothesis tests for the case
where we estimate a population mean (expectation) with a
sample average.
• We will consider two sample averages, p̂ and x̄.
• Can you figure out why p̂ is also a sample average?
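
One way to see why p̂ is a sample average is to code each observation as 1 (success) or 0 (failure). The sketch below uses a made-up list of ten responses; the variable name `is_democrat` and the values are illustrative assumptions, not data from the course.

```python
# Hypothetical responses (made-up values): 1 = Democrat, 0 = not a Democrat.
is_democrat = [1, 0, 1, 1, 0, 1, 1, 1, 0, 1]

# Sample proportion: number of ones divided by the sample size.
p_hat = is_democrat.count(1) / len(is_democrat)

# Sample average of the same 0/1 values: sum divided by the sample size.
x_bar = sum(is_democrat) / len(is_democrat)

print(p_hat, x_bar)
```

Because the sum of 0/1 values counts the ones, the two numbers coincide for any such list.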

EXAMPLE

Suppose that you’ve heard from your friend that the proportion of
Democratic Emory students in 1970 was 0.60. You want to find out if
the proportion of Democratic Emory students in 2023 has changed. A
random sample of 180 Emory students in 2023 showed that the
proportion of Democratic students was 0.76. Is there any evidence
that the proportion of Democratic students has changed?

Two possible explanations


• The true population proportion really has changed.
• The true population proportion has stayed at 0.60, and the
observed increase is simply a result of sampling variability.

This is why a hypothesis test is necessary.

HYPOTHESIS TEST

Steps for hypothesis test


1. Define the population and the question of interest.
2. Define the null and alternative hypotheses: express the
question of interest concretely in terms of a population
parameter.
3. Compute the relevant test statistic.
4. Find the distribution of the test statistic.
5. Determine the rejection region, using a critical value,
a p-value, or a confidence interval.
6. Conclude.

1. POPULATION AND QUESTION OF INTEREST

• When performing a hypothesis test, the first thing you need
to do is to define the question of interest concretely.
• This includes defining the population of interest.
• This also includes defining the parameter of interest.

EXAMPLE - CONT’D

Suppose that you’ve heard from your friend that the proportion of
Democratic Emory students in 1970 was 0.60. You want to find out if
the proportion of Democratic Emory students in 2023 has changed. Is
there evidence that the proportion of Democratic students has
changed compared to 1970?
• The population of interest is Emory students.
• The parameter of interest is the proportion of Democratic
Emory students in 2023, p.

POPULATION VS SAMPLE

• The population distribution refers to the true distribution
of the variable in the population you would like to learn about.
• The data distribution refers to the distribution of the observed
values in a sample.
✓ We could observe 100 SAT scores. The distribution of the
sample can be displayed with a histogram and summarized
with statistics such as x̄ and s (the sample mean and the
sample standard deviation).
✓ We could observe, for each of 100 students, whether the
student is a Democrat or not. The distribution can be
displayed with a frequency table or bar plot and summarized
with a statistic such as the sample proportion, p̂.
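
As a small illustration of summarizing a data distribution, the sketch below computes x̄ and s for a quantitative variable using Python's standard `statistics` module. The eight scores are made-up values chosen only for illustration.

```python
from statistics import mean, stdev

# Hypothetical SAT scores (made-up values, not data from the slides).
sat_scores = [1210, 1340, 1100, 1450, 1290, 1380, 1170, 1260]

x_bar = mean(sat_scores)  # sample mean
s = stdev(sat_scores)     # sample standard deviation (divides by n - 1)

print(f"x_bar = {x_bar:.1f}, s = {s:.1f}")
```

A different sample of eight students would give different values of x̄ and s, while the population mean and standard deviation stay fixed.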

POPULATION VS SAMPLE - QUANTITATIVE VAR

POPULATION VS SAMPLE - BINARY VAR
Suppose that the proportion of American adults who support
the expansion of solar energy is p = 0.88 (the population
parameter) and that we have a random sample of 50 observations.

• When performing a hypothesis test, the first thing you need
to do is to define the question of interest concretely.
• This includes defining the population concretely.
• This also includes defining the parameter of interest.
• We will consider the following two parameters.
✓ µ = true population mean
✓ p = true population proportion

2. THE NULL AND THE ALTERNATIVE HYPOTHESIS

• Next, you translate your question from words into a
statement written in terms of the population parameter of
interest. The two statements are called the null and the
alternative hypotheses.
✓ The null hypothesis (H0) represents the status quo. It is
generally a statement of no effect.
✓ The alternative hypothesis (HA) represents the claim that the
researcher wants to support. It is often expressed as a range
of possible parameter values.

• We conduct the hypothesis test
assuming that the null hypothesis is true.
• We then evaluate the test results to determine whether there
is enough evidence to reject the null in favor of the
alternative (what we hope to show).
• We reject the null and choose the alternative
hypothesis only if we have enough evidence.
• Staying with the status quo usually does not mean that we
accept the null hypothesis. It means that we do not
have enough evidence to conclude otherwise.

EXAMPLE CONT’D

A random sample of 180 Emory students in 2023 showed that the
proportion of Democratic students was 0.76, whereas the proportion
of Democratic Emory students in 1970 was 0.60. Is there any
evidence that the proportion of Democratic students has changed?

• The null hypothesis is the one with no effect - that the true
proportion of Democrats has not changed.
H0 : p = 0.6
• The alternative hypothesis is one of an effect - that the true
proportion of Democrats has changed.
HA ∶ p ≠ 0.6

DIFFERENT TYPES OF THE ALTERNATIVE HYPOTHESIS

Hypothesis     Type                     Statement in words
H0: p = 0.6    null                     I think that the true proportion of Emory students who are Democrats is 0.6.
HA: p ≠ 0.6    alternative, two-sided   I think that the true proportion of Emory students who are Democrats differs from 0.6.
HA: p < 0.6    alternative, one-sided   I think that the true proportion of Emory students who are Democrats is less than 0.6.
HA: p > 0.6    alternative, one-sided   I think that the true proportion of Emory students who are Democrats is greater than 0.6.

POLL
• 200 Emory students were asked how many colleges they
applied to: the sample had an average of 9.7 college
applications with a standard deviation of 7. The College
Board website states that counselors recommend students
apply to roughly 8 colleges.
• You want to test whether the data provide convincing evidence
that the average number of colleges Emory students apply
to is higher than recommended.

What would be the relevant H0 and HA?

A. H0: µ = 9.7, HA: µ > 9.7
B. H0: µ = 8, HA: µ > 8
C. H0: x̄ = 8, HA: x̄ > 8
D. H0: µ = 8, HA: µ > 9.7
E. H0: µ > 8, HA: µ = 9.7

POLL
You are interested in the proportion of U.S. adults living with
one or more chronic conditions. Specifically, you think that this
proportion is less than 50%. In 2013, the Pew Research
Foundation reported that, in their sample, 45% live with one or
more chronic conditions.
What would be the null and alternative hypotheses?
A. H0: p = 0.45, HA: p < 0.5
B. H0: p = 0.5, HA: p < 0.5
C. H0: p̂ < 0.5, HA: p̂ = 0.5
D. H0: p̂ = 0.5, HA: p̂ < 0.5
E. H0: p = 0.45, HA: p ≠ 0.45
F. H0: p = 0.5, HA: p > 0.5

3. TEST STATISTIC
DIGRESSION 1 - STATISTIC AND SAMPLING VARIABILITY

• We have already defined the parameter of interest.
• After collecting the data, you estimate this
parameter of interest using a sample statistic.
• When the parameter of interest is the proportion p, the
relevant sample statistic is the sample proportion p̂.
• When the parameter of interest is the expectation (or
population mean) µ, the relevant sample statistic
is the sample average, x̄.

• This sample statistic, whether it is p̂ or x̄, has sampling
variation.
• This is because each sample is different from every other sample.
• What does this mean?

DATA DISTRIBUTION CHANGES WITH DIFFERENT SAMPLE - BINARY

The proportion of American adults who support the expansion
of solar energy, revisited

[Figure: four bar plots (population distribution, sample1, sample2, sample3), each showing the proportions of "not" and "support" on a vertical axis from 0.0 to 0.8]

The sample proportion would differ from one sample to
another. In other words, p̂ has sampling variation.
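
The sampling variation of p̂ can be mimicked with a small simulation. The sketch below draws three samples of n = 50 from a population with p = 0.88, as in the solar energy example; the helper `sample_proportion` and the seed are illustrative assumptions.

```python
import random

def sample_proportion(p, n, rng):
    """Draw n Bernoulli(p) observations and return the share of successes."""
    draws = [1 if rng.random() < p else 0 for _ in range(n)]
    return sum(draws) / n

rng = random.Random(0)  # arbitrary fixed seed so the run is reproducible
p_true, n = 0.88, 50    # values from the solar energy example

# Three independent samples, each giving its own p_hat.
p_hats = [sample_proportion(p_true, n, rng) for _ in range(3)]
for i, p_hat in enumerate(p_hats, start=1):
    print(f"sample{i}: p_hat = {p_hat:.2f}")
```

The population proportion is fixed at 0.88 throughout; only the samples, and therefore the values of p̂, change from draw to draw.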
DATA DISTRIBUTION CHANGES WITH DIFFERENT SAMPLE - QUANTITATIVE
The height data

The sample mean would differ from one sample to another. In
other words, x̄ has sampling variation.

SAMPLING DISTRIBUTION OF A STATISTIC

• Suppose that you were able to collect all possible values of
x̄ coming from different samples and found the probability
distribution of x̄.
• This would be the sampling distribution of the
statistic x̄.
• We can do the same thing for p̂. Suppose that you were
able to collect all possible values of p̂ coming from different
samples and found the probability distribution of p̂. This
would be the sampling distribution of p̂.

• The sampling distribution refers to the probability
distribution of a statistic such as x̄ or p̂.
• Hence the statistic is a random variable that has a
probability distribution, and the value you obtain from your
sample is one realization among all possible outcomes.

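
We cannot collect every possible sample, but a simulation can approximate the sampling distribution by drawing many samples and recording p̂ each time. The sketch below assumes p = 0.88 and n = 50 from the solar energy example; the number of simulated samples (2000), the seed, and the helper name are illustrative choices.

```python
import random
from collections import Counter

rng = random.Random(1)                  # arbitrary fixed seed
p_true, n, n_samples = 0.88, 50, 2000   # n_samples is an illustrative choice

def sample_proportion(p, n, rng):
    """Share of successes among n Bernoulli(p) draws."""
    return sum(1 for _ in range(n) if rng.random() < p) / n

# One p_hat per simulated sample; together they approximate the sampling distribution.
p_hats = [sample_proportion(p_true, n, rng) for _ in range(n_samples)]

# Text histogram: one '#' per 20 samples that produced a given value of p_hat.
counts = Counter(round(p_hat, 2) for p_hat in p_hats)
for value in sorted(counts):
    print(f"{value:.2f} {'#' * (counts[value] // 20)}")
```

Each printed row is one possible value of p̂; the p̂ computed from a single real sample corresponds to one draw from this distribution.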
• Since a test statistic is a transformation of the sample
statistic, it is also random.
• A test statistic is a transformation of the sample statistic
chosen so that the transformed statistic has a well-defined
distribution that does not depend on population
parameters.
• How can we define a test statistic that has a well-defined
distribution which does not depend on population
parameters?
• This is possible once we find the distribution of the sample
statistic and standardize it.

3. TEST STATISTIC
DIGRESSION 2 - SAMPLING DISTRIBUTION AND CENTRAL LIMIT THEOREM

Central Limit Theorem

Suppose that we have a random sample of n observations
x_1, ..., x_n, where each x_i is drawn from a distribution with
mean µ and standard deviation σ < ∞. Then, by the central
limit theorem,

$\bar{x} \overset{a}{\sim} N\left(\mu, \frac{\sigma}{\sqrt{n}}\right)$

Here the second argument of N is the standard deviation, and
$\overset{a}{\sim}$ can be read as "approximately has this
distribution as n → ∞".
⋆ As n increases, would the precision of an estimator increase
or decrease?

POLL

As n increases, would the precision of the estimator x̄
increase or decrease?
A Increase
B Decrease

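
The poll can be explored numerically. The sketch below simulates the standard deviation of x̄ for several sample sizes and compares it with σ/√n from the central limit theorem. The exponential population (mean 1, standard deviation 1), the number of repetitions, and the seed are assumptions made only for this illustration.

```python
import math
import random
from statistics import stdev

rng = random.Random(2)  # arbitrary fixed seed
sigma = 1.0             # sd of the assumed exponential population with rate 1

def sd_of_sample_means(n, reps=1000):
    """Empirical standard deviation of x_bar across `reps` samples of size n."""
    means = [sum(rng.expovariate(1.0) for _ in range(n)) / n for _ in range(reps)]
    return stdev(means)

results = {n: sd_of_sample_means(n) for n in (25, 100, 400)}
for n, sd in results.items():
    print(f"n = {n:3d}: simulated sd = {sd:.4f}, sigma / sqrt(n) = {sigma / math.sqrt(n):.4f}")
```

Quadrupling n roughly halves the spread of x̄, which is what the σ/√n term implies.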
In the case of a binary categorical variable, we are interested in
estimating the proportion p.

Central Limit Theorem for proportion

Suppose that we have n observations x_i, each equal to 0 or 1,
randomly sampled from a population with population proportion p.
Then $\hat{p} = \frac{1}{n}\sum_{i=1}^{n} x_i$ satisfies the
following by the central limit theorem:

$\hat{p} \overset{a}{\sim} N\left(p, \sqrt{\frac{p(1-p)}{n}}\right)$

As for x̄, the second argument of N is the standard deviation.

• Why?
• Note that x_i has mean p and variance p(1 − p), because each
x_i is a single trial of a binomial distribution with
probability of success p. Applying the central limit theorem
with µ = p and σ = √(p(1 − p)) gives the result.

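
The claim that a 0/1 variable has mean p and variance p(1 − p) can be checked directly from its two outcomes. The sketch below does this for p = 0.88 and then computes the standard deviation of p̂ for n = 50; the function name is an illustrative assumption.

```python
import math

def bernoulli_mean_and_variance(p):
    """Mean and variance of a 0/1 variable, computed from its two outcomes."""
    mean = 0 * (1 - p) + 1 * p
    variance = (0 - mean) ** 2 * (1 - p) + (1 - mean) ** 2 * p
    return mean, variance

p, n = 0.88, 50  # values from the solar energy example
mean, variance = bernoulli_mean_and_variance(p)
sd_p_hat = math.sqrt(variance / n)  # standard deviation of p_hat

print(f"mean = {mean}, variance = {variance:.4f}, p(1 - p) = {p * (1 - p):.4f}")
print(f"sd of p_hat = {sd_p_hat:.4f}")
```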
SAMPLING DISTRIBUTION OF p̂

x_i randomly sampled from a population with parameter p = 0.88.

SAMPLING DISTRIBUTION OF x̄

x_i randomly sampled from a population N(150, 30).

x_i randomly sampled from a beta distribution with expectation
0.25.
• Why is the central limit theorem important?
• Because it gives a well-known distribution (the normal
distribution) for the statistic of interest, regardless
of the distribution of the population.
• Then, what are the conditions for the central limit theorem
to hold?
• A random sample and n → ∞!
• n → ∞ is not possible in practice. The rule of thumb is
np ≥ 10 and n(1 − p) ≥ 10 for a binary variable, and
n ≥ 30 for a quantitative variable that is not highly skewed.
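
The rules of thumb above can be written as two small checks. In the sketch below, the function names are illustrative, and plugging in p = 0.6 (the value from the Emory example) is an assumption, since the slides do not say which value of p to use in the check.

```python
def clt_rule_of_thumb_proportion(n, p):
    """True when n*p >= 10 and n*(1 - p) >= 10."""
    return n * p >= 10 and n * (1 - p) >= 10

def clt_rule_of_thumb_mean(n):
    """True when n >= 30 (assumes the variable is not highly skewed)."""
    return n >= 30

# Emory example: n = 180, with p taken to be 0.6 (an assumption).
print(clt_rule_of_thumb_proportion(180, 0.6))
# College applications example: n = 200.
print(clt_rule_of_thumb_mean(200))
```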

3. TEST STATISTIC
• With the central limit theorem, the statistic x̄ or p̂ has a
well-defined distribution, but that distribution still depends
on parameters, such as p or σ, that we do not know.
• Those parameters are unknown and would have to be
estimated. Hence we standardize so that the
distribution no longer depends on parameters we do not know.
• This gives us a unified approach to performing
hypothesis tests.
• We standardize using the z-score:

$Z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$

$Z = \frac{\hat{p} - p}{\sqrt{p(1-p)/n}}$

POLL

$Z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$

$Z = \frac{\hat{p} - p}{\sqrt{p(1-p)/n}}$

What is the distribution of the Z-score above when n is
large?
A N(0, 1)
B Depends on the distribution of x_i

$Z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$

$Z = \frac{\hat{p} - p}{\sqrt{p(1-p)/n}}$

• However, the Z-score written above is not yet a test
statistic. A test statistic should be computable in practice
once we obtain the data, but here it is not, because p and
σ are unknown.
• How should we change the Z-score so that it becomes a test
statistic?
• Replace the standard deviation with the estimated standard
deviation, which is called the standard error (SE).

$Z = \frac{\bar{x} - \mu}{s / \sqrt{n}}$

$Z = \frac{\hat{p} - p}{\sqrt{\hat{p}(1-\hat{p})/n}}$

• The standard deviation $\sigma/\sqrt{n}$ of x̄ is estimated with
$s/\sqrt{n}$. This estimated standard deviation is called the
standard error (SE) of x̄.
• For p̂, the standard deviation is $\sqrt{p(1-p)/n}$, and it is
estimated by $\sqrt{\hat{p}(1-\hat{p})/n}$, which is the
standard error (SE) of p̂.
• However, note that we still have p and µ in the numerators,
which we do not know.

• For µ and p in the numerator, we substitute the
value under our null hypothesis H0.
• This is in line with the idea that we take the
null hypothesis to be true and look for evidence against it.
• Let the null hypothesis values be µ0 and p0, respectively.
• Then the test statistic becomes

$Z = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}$

$Z = \frac{\hat{p} - p_0}{\sqrt{\hat{p}(1-\hat{p})/n}}$

It is often also called the t-statistic and written as

$t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}$

$t = \frac{\hat{p} - p_0}{\sqrt{\hat{p}(1-\hat{p})/n}}$

EXAMPLE

Suppose that you’ve heard from your friend that the proportion of
Democratic Emory students in 1970 was 0.60. You want to find out if
the proportion of Democratic Emory students in 2023 has changed. A
random sample of 180 Emory students in 2023 showed that the
proportion of Democratic students was 0.76. Is there any evidence
that the proportion of Democratic students has changed?

H0: p = 0.6    HA: p ≠ 0.6

$Z = \frac{\hat{p} - 0.6}{\sqrt{\hat{p}(1-\hat{p})/n}} = \frac{0.76 - 0.6}{\sqrt{0.76(1-0.76)/180}} = 5.026$

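
The arithmetic in this example can be reproduced in a few lines. The sketch below follows the formula on the slide, with the standard error computed from p̂; the variable names are illustrative.

```python
import math

p_hat, p0, n = 0.76, 0.6, 180  # sample proportion, null value, sample size

se = math.sqrt(p_hat * (1 - p_hat) / n)  # standard error of p_hat
z = (p_hat - p0) / se                    # test statistic under H0: p = 0.6

print(f"SE = {se:.4f}, Z = {z:.3f}")
```

The result agrees with the value 5.026 on the slide.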
EXAMPLE

200 Emory students were asked how many colleges they applied to: the
sample had an average of 9.7 college applications with a standard
deviation of 7. The College Board website states that counselors
recommend students apply to roughly 8 colleges. You want to test if
the data provides convincing evidence that the average number of
colleges Emory students apply to is higher than recommended.

H0: µ = 8    HA: µ > 8

$Z = \frac{9.7 - 8}{7 / \sqrt{200}} = 3.4345$

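
The same calculation for the mean can be wrapped in a small helper. The function name `z_for_mean` is an illustrative assumption; the inputs are the numbers given in the example.

```python
import math

def z_for_mean(x_bar, mu0, s, n):
    """Test statistic (x_bar - mu0) / (s / sqrt(n)) for a population mean."""
    standard_error = s / math.sqrt(n)
    return (x_bar - mu0) / standard_error

# College applications example: x_bar = 9.7, s = 7, n = 200, H0: mu = 8.
z = z_for_mean(x_bar=9.7, mu0=8, s=7, n=200)
print(f"Z = {z:.4f}")
```

The result agrees with the value 3.4345 on the slide.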
DISCUSSION ON THE DISTRIBUTION OF THE TEST STATISTIC

• By the central limit theorem, even if x_i does not have a
normal distribution, Z still has approximately a standard
normal distribution if n is large enough.
• If x_i has a normal distribution to begin with, then the
statistic for the mean has a Student's t distribution, t_{n−1},
with n − 1 degrees of freedom, regardless of whether n is large
or small, because we do not have to rely on the central limit
theorem.
• As n → ∞, the Student's t distribution converges to the
standard normal distribution. Hence, when n is large
enough, it makes practically no difference whether you use
the Student's t distribution or the standard normal distribution.

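
The difference between the t and normal distributions for small n, and its disappearance for large n, can be seen in a simulation. The sketch below draws normal samples, computes the statistic with s in the denominator, and records how often |t| exceeds 1.96. The sample sizes (5 and 100), the cutoff, the number of repetitions, and the seed are illustrative assumptions.

```python
import math
import random
from statistics import NormalDist

rng = random.Random(3)  # arbitrary fixed seed

def t_statistic(n, mu=0.0, sigma=1.0):
    """(x_bar - mu) / (s / sqrt(n)) for one normal sample of size n."""
    xs = [rng.gauss(mu, sigma) for _ in range(n)]
    x_bar = sum(xs) / n
    s = math.sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    return (x_bar - mu) / (s / math.sqrt(n))

def tail_share(n, cutoff=1.96, reps=5000):
    """Share of simulated statistics with absolute value beyond the cutoff."""
    return sum(abs(t_statistic(n)) > cutoff for _ in range(reps)) / reps

normal_tail = 2 * (1 - NormalDist().cdf(1.96))  # two-sided tail under N(0, 1)
small_n, large_n = tail_share(5), tail_share(100)
print(f"N(0,1): {normal_tail:.3f}, n = 5: {small_n:.3f}, n = 100: {large_n:.3f}")
```

With n = 5 the tails are noticeably heavier than the normal tails, while with n = 100 the share is close to the normal value.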
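In sum,

$Z = \frac{\bar{x} - \mu_0}{s / \sqrt{n}} \overset{a}{\sim} N(0, 1)$

$Z = \frac{\hat{p} - p_0}{\sqrt{\hat{p}(1-\hat{p})/n}} \overset{a}{\sim} N(0, 1)$

$t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}} \sim t_{n-1}$ when the population distribution is normal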
