Unit 4 - Introduction to Statistical Inference Vs2

Unit 4 of ECON 1005 covers introductory statistical inference, focusing on sampling distributions, inference for population means and proportions, and hypothesis testing. Key concepts include the Central Limit Theorem, confidence intervals, sample size calculations, and the distinction between probability distributions and sampling distributions. The unit emphasizes the importance of statistical inference in drawing conclusions from data and understanding sampling variability.

ECON 1005: Introductory Statistics

UNIT 4: INTRODUCTION TO STATISTICAL INFERENCE

Unit Outline

4.1. Sampling Distributions

4.1.1. Definitions

4.1.2. Steps in creating sampling distributions

4.1.3. Sampling distributions for means

4.1.4. Central Limit Theorem

4.1.5. Distinguishing between a probability distribution of a random variable and the sampling distribution of the mean

4.2. Basic Introduction to Inference

4.3. Inference for Population Means

4.3.1. Confidence intervals

4.3.2. Sample size calculation

4.3.3. Hypothesis testing (using p-values)

4.4. Inference for Population Proportions

4.4.1. Sampling Distribution

4.4.2. Confidence intervals

4.4.3. Sample size calculation

4.4.4. Hypothesis testing (using p-values)

4.1 – SAMPLING DISTRIBUTIONS

4.1.1 – Definitions

Parameter: A ________________ calculated from the _________________________________.

Statistic: A ___________________ calculated from the _________________________________.



A sampling distribution is the ___________________________________________________ for a

_____________________________.

4.1.2 – Steps in Creating a Sampling Distribution

1. Create a list of all population elements.

2. Obtain all _______________ samples of size ______.

3. Determine the ________________________ for each sample.

4. Calculate the ______________________________ of each statistic occurring.

5. Create a ____________________________________________.

6. Calculate the _____________________ and ________________________ of the probability distribution.

Example: Suppose your population is made up of the following 4 elements – 1, 2, 3 and 4. Create the
sampling distribution for the sample mean assuming we want a sample size of n = 2.

Step 1: List all the population elements: 1 2 3 4

Step 2: Obtain all possible samples of size n = 2:


Step 3: Determine the statistic (in this case the sample mean) for each sample:
Step 4: Calculate the probability of each statistic (sample mean) occurring:

Step 2: Sample | Step 3: Statistic | Step 4: Prob. of Occurrence


Step 5: Create the probability distribution

Step 6: Calculate the mean and variance of the probability distribution
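
The six steps above can be sketched in Python for the population 1, 2, 3, 4 with n = 2. This is only a sketch under stated assumptions: the example does not say how the samples are drawn, so the code assumes sampling with replacement where order matters (16 equally likely samples). The variable names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Step 1: list all population elements
population = [1, 2, 3, 4]
n = 2

# Step 2: all ordered samples of size n, drawn WITH replacement (assumption)
samples = list(product(population, repeat=n))

# Step 3: the statistic (sample mean) for each sample, kept as exact fractions
means = [Fraction(sum(sample), n) for sample in samples]

# Steps 4 and 5: probability of each value of the sample mean
counts = Counter(means)
distribution = {m: Fraction(c, len(samples)) for m, c in sorted(counts.items())}

# Step 6: mean and variance of the probability distribution
mean_of_means = sum(m * p for m, p in distribution.items())
var_of_means = sum((m - mean_of_means) ** 2 * p for m, p in distribution.items())

for m, p in distribution.items():
    print(f"sample mean = {float(m):.1f}, probability = {p}")
print("mean:", float(mean_of_means), "variance:", float(var_of_means))
```

Under these assumptions the mean of the sampling distribution equals the population mean (2.5), and its variance (0.625) is the population variance (1.25) divided by n = 2.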


4.1.3 – Sampling Distributions for the Mean

If we take different samples ________________________________ we can calculate the mean (statistic)


of each of them and display the distribution of those means.

For the population: there is only 1 value of the mean = µ.

For the sample: there are many values of 𝑥̅.

Characteristics of Sampling Distributions of the Mean

Mean :- The mean of the ____________________ and the _______________________________________ of

the mean are ______________________________________________ to each other.

Spread :- The spread of the _____________________________________________________ of the mean is

______________________________________ than that for the _____________________________.

Symmetry :- The ___________________________________________ of the sample mean is fairly

______________________________.

The overall conclusion

• The mean of the sampling distribution of the sample mean is ___________________.

• The standard deviation of the sampling distribution of the mean is _____________________________.

How close is the estimate (the sample mean 𝑥̅ ) to the parameter (the population mean μ)? What would you think
would be best?

1) A narrow sampling distribution or 2) A wide sampling distribution


How can we get a narrow sampling distribution?

• What measurement determines the spread of a distribution? ________________________________

• The standard deviation of the sampling distribution of the mean is _____________________________.

• How can we make this smaller? _________________________________________________________

4.1.4 - The Central Limit Theorem

If a population is N(µ, σ), then the distribution of the sample mean of n independent observations is N(µ, σ/√n).

i.e. ________________________.

If a ______________________________________________________________________________________

is drawn from _____________________________________________ with mean μ and finite standard

deviation σ, when ______________________________________ the _________________________________

____________________________________________ 𝑥̅ is __________________________________________

with mean = ___________________ and standard deviation = ______________________________.
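
A small simulation can illustrate the idea behind the theorem. This is a sketch, not part of the original notes: the Exponential(1) population, the sample size of 40, the 5000 repetitions and the seed are all assumptions chosen for illustration. It compares the mean and standard deviation of many simulated sample means with µ and σ/√n.

```python
import random
import statistics

rng = random.Random(0)   # fixed seed so the run is repeatable

mu, sigma = 1.0, 1.0     # an Exponential(1) population has mean 1 and sd 1
n = 40                   # size of each sample (assumed "large enough")
num_samples = 5000       # how many samples we draw

# Draw many samples from a clearly non-Normal population and keep each mean
sample_means = [
    statistics.fmean(rng.expovariate(1.0) for _ in range(n))
    for _ in range(num_samples)
]

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.pstdev(sample_means)
clt_sd = sigma / n ** 0.5

print(f"mean of sample means: {mean_of_means:.3f} (population mean {mu})")
print(f"sd of sample means:   {sd_of_means:.3f} (sigma / sqrt(n) = {clt_sd:.3f})")
```

A histogram of `sample_means` would look roughly bell-shaped even though the population itself is strongly skewed.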

4.1.5 – Distinguishing between the Distribution of a Random Variable & the Sampling Distribution of the
Sample Mean

Example: The times taken by male runners to complete a marathon follow roughly a Normal
distribution. Let X = the time taken by a male runner to complete the marathon.
Therefore X ~ N(4.3 hrs, 1 hr).

1. What is the approximate probability that a single male runner randomly chosen from the entire population
will complete a marathon in more than 5.0 hours?


2. Take a simple random sample of 50 male runners that have participated in the Cosmo City marathon. What
are the mean and standard deviation of the sampling distribution of sample means of size 50?

3. What is the approximate probability that the mean time to complete the marathon of these 50 runners is 5.0
hours or higher?

It is important to know the difference between the questions which are asking for regular probability from a normal
distribution and those asking for a probability related to the sampling distribution of the mean (i.e. sample mean).
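
The contrast between the two kinds of question can be sketched with the marathon numbers. This is a rough check rather than the worked solution: it reads N(4.3 hrs, 1 hr) as mean 4.3 and standard deviation 1, and uses Python's standard-library `statistics.NormalDist` in place of statistical tables.

```python
from statistics import NormalDist

mu, sigma, n = 4.3, 1.0, 50

# Question 1: one runner, so use the distribution of the random variable X
single_runner = NormalDist(mu, sigma)
p_single = 1 - single_runner.cdf(5.0)

# Questions 2 and 3: mean of 50 runners, so use the sampling distribution
# of the sample mean, which has the same mean but sd = sigma / sqrt(n)
sample_mean = NormalDist(mu, sigma / n ** 0.5)
p_mean = 1 - sample_mean.cdf(5.0)

print(f"P(X > 5.0)      = {p_single:.4f}")
print(f"sd of x-bar     = {sample_mean.stdev:.4f}")
print(f"P(x-bar >= 5.0) = {p_mean:.2e}")
```

The same cut-off of 5.0 hours is fairly ordinary for a single runner but extremely unlikely for the average of 50 runners, because the sampling distribution is much narrower.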

Phrasing of Question | Type of Distribution | Formula to Use

What is the probability that a …? | ______ | ______

What is the probability that the sample average/mean is …? | ______ | ______


Examples: For the following questions, determine whether the situation relates to the:
• Distribution of a random variable
• Sampling distribution of the sample mean
Also write the corresponding probability statement.

1. A small resort is interested in the distribution of its occupancy rates. Data are collected daily for 3 weeks. The
resort would like to determine the probability of having over 10 rooms occupied on a given day during the
following year.

distribution of a random variable / sampling distribution of the sample mean

Probability statement: __________________________

2. A company examines the distribution of the average time spent exercising in order to determine if a health
initiative is needed. They took samples from each of their 25 locations to compute the distribution. The
company takes an additional sample and determines the average hours a week exercising. They hope the
probability the sample mean is less than 1 hour is very small.

distribution of a random variable / sampling distribution of the sample mean

Probability statement: __________________________

3. The distribution of the average amount of time spent watching TV on a weeknight is determined by looking at
the average of samples from all the elementary schools in NYC. They will run further analysis to examine the
probability that a sample of children spend, on average, between 2 and 4 hours watching TV on a weeknight.

distribution of a random variable / sampling distribution of the sample mean

Probability statement: __________________________

4.2 – BASIC INTRODUCTION TO INFERENCE

• Why do we even bother analysing data? We want to draw conclusions from the data.

• Why can’t we just accept our sample mean or sample proportion as the official mean or proportion for the
population?

Every time we estimate the statistics 𝑥̅ (the sample mean) or 𝑝̂ (the sample proportion), we get a different
answer due to sampling variability.


Inference

The two most common types of formal statistical inference are confidence intervals and significance tests.

Confidence Interval: Given a certain ___________________________________________________, it is the

_________________________________________________________________________________________

of the statistic that __________________________________________________________________________.

What we have: the __________________ of the __________________ (e.g. sample mean (𝑥̅ ) or sample proportion
(𝑝̂ )).

What we want: to _____________________the value of the _______________________________________ (e.g.


population mean (µ) or population proportion (p)).

Significance Test (aka: Hypothesis Test): It ________________ the ____________________________

provided by the ___________________________________ that the __________________________________

_________________________________________________________________________________________

(given a certain significance level).

4.3 – INFERENCE FOR POPULATION MEANS

4.3.1 – Confidence Intervals

Steps in Creating a Confidence Interval

1. Obtain the _________________________________.

2. Get the __________________________________ from statistical tables.


3. Calculate the _____________________________________________. This is just the standard deviation
of the sampling distribution for the statistic of interest.

4. Complete the calculations using the format:

NB: The margin of error is everything _____________________ the ± sign.

A confidence interval ______________________ relates to the _______________________________________.

Confidence Intervals for single population means – population standard deviation KNOWN

We report the interval as $\left(\bar{x} - z^{*}\dfrac{\sigma}{\sqrt{n}},\ \bar{x} + z^{*}\dfrac{\sigma}{\sqrt{n}}\right)$

Confidence Level    80%      90%      95%      99%      99.5%

z*                  ____     1.645    1.960    2.576    ____

THIS TABLE WILL BE PROVIDED ON THE EXAM
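
For readers who want to see where the table values come from, here is a minimal sketch using Python's standard-library `statistics.NormalDist`. The helper name `z_star` is an illustrative assumption; the idea is that z* leaves half of (1 − C) in each tail of the standard Normal curve.

```python
from statistics import NormalDist


def z_star(confidence_level):
    """Critical value z* with central area `confidence_level` under N(0, 1)."""
    upper_tail = (1 - confidence_level) / 2
    return NormalDist().inv_cdf(1 - upper_tail)


for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence: z* = {z_star(level):.3f}")
```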

Interpretation

Short form: We are ___% confident that the population mean _____________________ <insert what is being

studied> is between ______ <LCL> and ______ <UCL>.

Formal form: If we take repeated samples of size _____ <insert sample size> from the same population of

_________________ <insert what we are studying>, then we expect _____% of the confidence

intervals calculated to contain the true population mean.


Example A specialty tea shop is studying the possibility of opening up at a Liguanea location. Before taking such a
decision they want to study the market. Out of the entire population of Liguanea residents that are 18 years old and
older, they select a SRS of 100. Each individual in the SRS is asked the question “how many boxes of tea have you
bought in the last year?” The sample mean turns out to be 3.0 and it is known that the standard deviation of the
population is 2.

a) Calculate the 90% confidence interval for the mean number of tea boxes bought by the entire population
that is 18 years old and older in Liguanea, in the last year.

𝑥̅ = _______ σ = ________ n = _________ C = _________

z* = _______

Interpretation of result: We are 90% confident that the value of the mean number of boxes of tea

bought in the last year for the entire population is between _____________ and ________________.

b) Is it true that the number of tea boxes bought last year by 90% of the entire Liguanea population that is 18

years old and older is within the interval calculated in (a)? Answer: _____________________
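
As a rough check on part (a), the interval formula for a known population standard deviation can be sketched in Python. The function name is an illustrative assumption; the inputs are the values stated in the example, and z* = 1.645 is the 90% entry from the table above.

```python
def mean_confidence_interval(x_bar, sigma, n, z_star):
    """Confidence interval for a population mean when sigma is known."""
    standard_error = sigma / n ** 0.5   # sd of the sampling distribution of x-bar
    margin = z_star * standard_error    # everything after the +/- sign
    return x_bar - margin, x_bar + margin


lower, upper = mean_confidence_interval(x_bar=3.0, sigma=2.0, n=100, z_star=1.645)
print(f"90% CI for the mean number of boxes: ({lower:.3f}, {upper:.3f})")
```

With these inputs the standard error is 0.2 and the margin of error is 0.329, so the interval runs from about 2.671 to 3.329.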

4.3.2 – Sample Size Calculation

If C = 0.9 (90%), think of our value of 𝑥̅ as one out of 100 possible samples: the interval contains µ in about 90 of those 100 cases.

Without changing C, can we have a smaller margin of error? If so, then the value of µ would be within a smaller
interval (in C% of cases).

The margin of error is:

m=


We want to control the margin of error. We can do that by modifying the sample size:

n=

• For the margin of error (width) to ↓ (decrease), we need the sample size to ______________________.

Example: Using the tea shop example above, answer the following questions:

a) What is the margin of error for the 90% confidence interval?

𝑥̅ = _______ σ = ________ n = _________ C = _________

z* = _______

b) What should be the sample size if we want a margin of error of 0.1 for a 90% confidence interval?

𝑥̅ = _______ σ = ________ m = ______ C = _________

z* = _______ n = ????

We need a sample size of at least __________.

ALWAYS ROUND YOUR SAMPLE SIZE UP!
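
The two calculations in this example can be sketched as follows. This is a sketch under the assumption that the margin of error is z*·σ/√n (the part after the ± sign in the interval formula) and that the sample size comes from solving that expression for n; the function names are illustrative.

```python
import math


def margin_of_error(sigma, n, z_star):
    """Margin of error for a mean when sigma is known."""
    return z_star * sigma / math.sqrt(n)


def required_sample_size(sigma, margin, z_star):
    """Smallest whole n whose margin of error is at most `margin`."""
    return math.ceil((z_star * sigma / margin) ** 2)   # always round UP


m = margin_of_error(sigma=2.0, n=100, z_star=1.645)
n_needed = required_sample_size(sigma=2.0, margin=0.1, z_star=1.645)

print(f"(a) margin of error with n = 100: {m:.3f}")
print(f"(b) sample size for a margin of 0.1: {n_needed}")
```

Rounding up matters here: the unrounded value is about 1082.41, and a sample of 1082 would give a margin slightly larger than 0.1.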


4.3.3 – Hypothesis Testing (aka: significance test)

Example of when a significance test will be useful:


You have been told that the average grade in a certain course is 60/100. You take a simple random sample of
students taking that course and collect the grades of all of them. You calculate the statistic: sample mean and obtain
90/100. This looks like a pretty high grade!

• Assuming that μ = 60, is what you measured just a rare case?

• How rare is it? Is there some evidence that maybe the average grade for the population is larger than
60/100?

Steps in Significance Tests

1. State the null and alternative hypotheses.
2. Calculate a test statistic to measure the compatibility between the null hypothesis and the data. The test
statistic = (estimate from data – null hypothesis value) / standard deviation of the distribution of the
estimate.
3. Calculate the probability of the estimate (the statistic you measured) under the null hypothesis. This is the p-value.
4. State a conclusion regarding evidence against the null hypothesis.

Step 1: Hypotheses
Write the null and alternative hypotheses – all hypothesis statements have 2 parts.

Null Hypothesis (H0): Usually represents the ______________________. It is the characteristic of the

population that is being tested. It always has an _______________.

Alternative Hypothesis (Ha): The statement we ____________________________ is ____________.

It is the _______________________________ of the null hypothesis. It ____________ has an equality.

There are two types of alternative hypotheses:


• Two-sided hypothesis:

a) Always has ______ in the alternative hypothesis

b) Key words: __________________________________________________________________ (or any other synonym)

c) Example: A researcher wants to determine if the mean family size is no longer 3.18 members

Example:
H0: µ = 3.18
Ha: µ ≠ 3.18


The general form is:


H0:

Ha:

where _______ is the __________________________.

• One-sided hypothesis: There are two types:

a) Lower Tail:

  • Aka: ______________________________________

  • Always has a ______ in the alternative hypothesis

  • Key words: ___________________, ____________________, _________________ (or any other synonym)

  • Example: A researcher wants to determine if the mean family size is less than 3.18
members

Example: H0: µ = 3.18


Ha: µ < 3.18

The general form is:


H0:

Ha:

where _______ is the __________________________.

b) Upper Tail:

  • Aka: ______________________________________

  • Always has a ______ in the alternative hypothesis

  • Key words: ___________________, ____________________, ____________________ (or any other synonym)


  • Example: A researcher wants to determine if the mean family size is more than 3.18
members

Example:
H0: µ = 3.18
Ha: µ > 3.18

The general form is:


H0:

Ha:

where _______ is the __________________________.

Key Points to Remember when Writing a Hypothesis

1. The hypothesis always has two parts: ______________________________________________________


(these are usually written in a vertical form as illustrated in the examples above)

2. It ALWAYS uses the _________________ for the _________________________________________

3. H0 ALWAYS has an ____________________________________

4. The value in H0 and the value in Ha are ALWAYS the _______________________.

Step 2: Test Statistics


Calculate a test statistic to measure the compatibility between the null hypothesis and the data

We want to calculate a test statistic that will allow us to calculate probabilities. Given the sample mean 𝑥̅ measured from the data
(sample size n), we want to calculate z (z test for a population mean) assuming that the mean of the population is
the one under the null hypothesis (H0: µ = µ0):

$z = \dfrac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}$

• Based on the CLT, 𝑥̅ comes from a N(µ0, σ/√n) distribution. We are assuming that the mean value of the population
is µ0.

Step 3: P-Values
Calculate the probability of the estimate under the null hypothesis

A p-value is the __________________ of having a value as extreme or more extreme than the one we measured
under the null hypothesis (H0).

Your p-value is determined by your ________________________________:


• If Ha is µ > µ0: p-value = ____________________

[Figure: sampling distribution of the mean centred at µ0, above the distribution of the test statistic (Z) centred at 0, with the observed value to the right of centre. Remember to shade this area – it is the p-value.]

Note: If the p-value is very small under H0, then under Ha (with µ > µ0), since the curve will shift to the right, the probability of such extreme values will be higher.

• If Ha is µ < µ0: p-value = _________________________

[Figure: sampling distribution of the mean with 𝑥̅ to the left of µ0, above the distribution of the test statistic (Z) with z to the left of 0. Remember to shade this area – it is the p-value.]

• If Ha is µ ≠ µ0: p-value = ______________________________________

[Figure: sampling distribution of the mean centred at µ0, above the distribution of the test statistic (Z) centred at 0. Remember to shade these areas – it is the p-value.]

Note: 𝑥̅ can be on either side of µ0 for this case (z can also be on either side of 0 in this case).
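
The three cases can be sketched as a single Python helper. This is an illustration rather than the course's own method: the function name and the labels "greater", "less" and "two-sided" are assumptions, and the tail areas follow the expressions P(Z > z), P(Z < z) and 2 P(Z > |z|) that appear in the summary table in Section 4.4.

```python
from statistics import NormalDist


def p_value(z, alternative):
    """Tail area under N(0, 1) that matches the alternative hypothesis."""
    std_normal = NormalDist()
    if alternative == "greater":       # Ha: mu > mu0, shade the upper tail
        return 1 - std_normal.cdf(z)
    if alternative == "less":          # Ha: mu < mu0, shade the lower tail
        return std_normal.cdf(z)
    if alternative == "two-sided":     # Ha: mu != mu0, shade both tails
        return 2 * (1 - std_normal.cdf(abs(z)))
    raise ValueError("alternative must be 'greater', 'less' or 'two-sided'")


print(f"{p_value(1.96, 'two-sided'):.3f}")
print(f"{p_value(1.96, 'greater'):.3f}")
```

For the same z, the two-sided p-value is twice the one-sided value, which is why the direction of Ha has to be fixed before looking at the data.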


Step 4: Conclusions
State a conclusion regarding evidence against the null hypothesis

• α = significance level
• We usually use α = 0.05, unless the problem states otherwise

P-value

Your conclusion must always:

1. include the word population


2. be written in terms of the story
3. be written in terms of the null hypothesis

How to write your conclusion template:

We __________________________ (Reject/DO NOT REJECT) the null hypothesis and conclude that there

________________________________ (is/IS NOT ENOUGH) evidence that the population mean

___________________________ <insert what we are studying> is _______________________ <alternative

hypothesis in words>.

Your conclusion ___________________________ relates to the _____________________________________.


Example: A business analyst in 2015 stated that the average income of statisticians with Masters degrees is
US$50,000 per annum. A research group randomly selected 50 statisticians with Masters degrees and asked them
to report their annual income. They concluded from their study that the sample average was US$51,600.
Assuming that the standard deviation of the population is US$4,000, conduct a significance test to see if there is
evidence of an increase in true population mean income when compared to the 2015 income. Use α=0.01.

What is given: 𝑥̅ = _____________ σ = ______________ n = ______ µ0 = ____________

Keyword(s): ______________________________________________

Step 1: Write the null and alternative hypotheses

H0:

Ha:

Step 2: Calculate a test statistic to measure the compatibility between the null hypothesis and the data

$z = \dfrac{\bar{x} - \mu_0}{\sigma / \sqrt{n}} = $

Step 3: Calculate the probability of the estimate under the null hypothesis

[Graph: in terms of the standard normal distribution (i.e. in terms of the test statistic)]

Step 4: State a conclusion regarding evidence against the null hypothesis

We __________________________ (Reject/DO NOT REJECT) the null hypothesis and conclude that there

________________________________ (is/IS NOT ENOUGH) evidence that the population mean

___________________________________________________________ <insert what we are studying> is

________________________________________________________ <alternative hypothesis in words>.
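
The arithmetic for this example can be checked with a short Python sketch. It assumes an upper-tail test (H0: µ = 50,000 against Ha: µ > 50,000), which is one reading of "evidence of an increase", and uses the numbers given in the example; the variable names are illustrative.

```python
from statistics import NormalDist

x_bar, mu_0, sigma, n, alpha = 51_600, 50_000, 4_000, 50, 0.01

# Step 2: test statistic, using the sd of the sampling distribution of x-bar
standard_error = sigma / n ** 0.5
z = (x_bar - mu_0) / standard_error

# Step 3: upper-tail p-value, because Ha is assumed to be mu > mu_0
p_value = 1 - NormalDist().cdf(z)

# Step 4: compare the p-value with the significance level
reject_null = p_value < alpha

print(f"z = {z:.3f}, p-value = {p_value:.4f}, reject H0: {reject_null}")
```

Here z is about 2.83 and the p-value is roughly 0.002, which is below α = 0.01.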


As an exercise, complete the table below. For guidance, you may use the table in Section 4.4.4 (hypothesis testing for population proportions).

                   Two Tailed       One Tailed (Upper)    One Tailed (Lower)

H0:

Ha:

Test Statistic:

P-Value:

4.4 – INFERENCE FOR POPULATION PROPORTIONS

How do we define a “proportion”?

A proportion is a _______________________________________ with only _______ possible values.

Examples of responses: Yes/No Success/Failure Agree/Disagree

Example: The proportion of boys (6-10 years old) that like to play soccer. Get an SRS of size n. We ask each boy
in the sample the question “do you like to play soccer?” We count the number of YES (successes).

(Note that ‘success’ does not always mean something good. In these cases, ‘success’ relates to the outcome of
interest).

Proportion = 𝑝̂ = _________________________________ = _________

Notation:
• The ____________________ proportion is denoted by ________.

• The ____________________ proportion is denoted by ________ - called 𝑝̂ (p-hat).


4.4.1 – Sampling Distribution

• The mean of the sampling distribution of the sample proportion is ___________________.

• The standard deviation of the sampling distribution of the sample proportion is __________________.

• The shape of the sampling distribution of the sample proportion is _________________________.

• The standardization formula is:

Example: In recent years, convertible sports cars have become very popular in Japan. Toyota is currently shipping
Celicas to Los Angeles, where a customizer does a roof lift and ships them back to Japan. Suppose that 25% of all
Japanese in a given income and lifestyle category are interested in buying Celica convertibles. A random sample of
100 Japanese consumers in the category of interest is to be selected. What is the probability that at least 20% of
those in the sample will express an interest in a Celica convertible?

What have we been given: 𝑝̂ = __________________ p = _________________ n = __________
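
One way to check the answer to this example is sketched below. It assumes the usual large-sample result that the sample proportion is approximately Normal with mean p and standard deviation √(p(1 − p)/n); that result is not written out in the text above, so treat the formula and the variable names as assumptions.

```python
from statistics import NormalDist

p, n = 0.25, 100   # population proportion and sample size from the example

# Standard deviation of the sampling distribution of p-hat (assumed formula)
sd_p_hat = (p * (1 - p) / n) ** 0.5

# Normal approximation to the sampling distribution of p-hat
sampling_dist = NormalDist(p, sd_p_hat)

# P(p-hat >= 0.20): area to the right of 0.20
prob_at_least_20 = 1 - sampling_dist.cdf(0.20)

print(f"sd of p-hat = {sd_p_hat:.4f}")
print(f"P(p-hat >= 0.20) = {prob_at_least_20:.4f}")
```

Because 0.20 lies below the mean of 0.25, the probability is well above one half (roughly 0.88).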


4.4.2 – Confidence Intervals

The confidence interval for proportions is:

Remember to circle the standard error & note the formula for the margin of error

The confidence interval above may be used when 3 conditions hold:

1) The confidence level is 90%, 95% or 99%

2) 𝑛𝑝̂ ≥ 10 AND 𝑛(1 − 𝑝̂ ) ≥ 10

3) The population is at least 20 times n.

Note:
• A proportion takes on values between _______ and ______

• 𝑛𝑝̂ = __________________________________________________

• 𝑛(1 − 𝑝̂ ) = _____________________________________________

4.4.3 – Sample Size Calculation

The sample size for a desired margin of error, n is equal to:

p* is a guessed value (based on previous knowledge) OR 0.5 if a guessed value is not given.

Remember ALWAYS round your sample size UP to the nearest whole number!


4.4.4 – Hypothesis Testing

The hypotheses have to be stated based on population proportions.

                   Two Tailed       One Tailed (Upper)    One Tailed (Lower)

H0:                p = p0           p = p0                p = p0

Ha:                p ≠ p0           p > p0                p < p0

Test Statistic:    $z = \dfrac{\hat{p} - p_0}{\sqrt{p_0(1 - p_0)/n}}$ (same for all three tests)

P-Value:           2 P(Z > |z|)     P(Z > z)              P(Z < z)

This test statistic is okay when 3 conditions hold:

1) np0 ≥10

2) n(1 - p0) ≥10

3) the population size is at least 20 times n.

Example (Mango farms affected by drought in early March): Droughts in early March can seriously damage mango
production. Imagine that such was the situation a year ago in Trinidad. In order to learn how bad the damage was,
a team was sent to record data from a simple random sample of 100 mango trees belonging to 10 different farms. 80
of 100 mango trees were reported to have suffered damage due to the drought in early March.

1) Give an estimate of the proportion 𝑝̂ (p_hat) of mango trees damaged by the drought in early March.

X = ______________ n = ____________


2) Give an estimate of the standard error of 𝑝̂ .

3) Give a 95% confidence interval for the true proportion (population) of mango trees damaged by drought in
early March.

C = ______ z* = _________ p̂ = _________ n = __________

4) Are these results valid?


Is this okay?

• 𝑛𝑝̂ = ___________________________________________ __________

• 𝑛(1 − 𝑝̂ ) = _______________________________________ __________

• 20 × n = _________________________________________ __________

Valid? _____________
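
Questions 1 to 4 can be checked with a short Python sketch. It assumes the interval has the form p̂ ± z*·√(p̂(1 − p̂)/n), which agrees with the margin of 1.96 × 0.04 quoted in question 5, and uses z* = 1.960 from the table in Section 4.3.1. The variable names are illustrative.

```python
x, n, z_star = 80, 100, 1.960   # damaged trees, sample size, 95% critical value

# Question 1: estimate of the proportion
p_hat = x / n

# Question 2: standard error of p-hat
standard_error = (p_hat * (1 - p_hat) / n) ** 0.5

# Question 3: 95% confidence interval
margin = z_star * standard_error
ci = (p_hat - margin, p_hat + margin)

# Question 4: the two count conditions; the population-size condition
# (population at least 20 * n trees) cannot be checked from the data given
counts_ok = n * p_hat >= 10 and n * (1 - p_hat) >= 10

print(f"p-hat = {p_hat}, SE = {standard_error:.4f}")
print(f"95% CI: ({ci[0]:.4f}, {ci[1]:.4f}), count conditions met: {counts_ok}")
```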


5) With a sample of 100 mango trees, at the 95% confidence level, the margin of error is (1.96 * 0.04) =
0.0784. That results in a CI (as calculated in Question 3) of [0.7216, 0.8784]. Imagine that you would like
to provide a narrower confidence interval for the population proportion than the one above.

If the study would like a margin of error of 0.03 when calculating the true population proportion of mango
trees damaged, how big a sample should be collected? The guessed value for the proportion of affected
trees is 0.7.

p* = ______ C = ________ z* = _________ m = __________
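
The sample size for question 5 can be sketched as follows, assuming the usual formula n = (z*/m)²·p*(1 − p*), which is not written out in the text above. The function name is illustrative.

```python
import math


def proportion_sample_size(p_guess, margin, z_star):
    """Smallest whole n giving at most `margin` for a proportion (assumed formula)."""
    return math.ceil((z_star / margin) ** 2 * p_guess * (1 - p_guess))   # round UP


n_needed = proportion_sample_size(p_guess=0.7, margin=0.03, z_star=1.960)
n_conservative = proportion_sample_size(p_guess=0.5, margin=0.03, z_star=1.960)

print(f"with p* = 0.7: n = {n_needed}")
print(f"with p* = 0.5: n = {n_conservative}")
```

Using p* = 0.5 gives the largest possible sample size, which is why it is the fallback when no guessed value is available.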

6) It is known from data recorded over the years that the population proportion of mango trees damaged by
drought in early March is 0.7. Based on the data collected in this study, do we have evidence that the
population proportion of damaged mango trees is larger than 0.7?

Step 1: Write the null and alternative hypotheses

H0:

Ha:

Step 2: Calculate a test statistic

$z = \dfrac{\hat{p} - p_0}{\sqrt{\dfrac{p_0(1 - p_0)}{n}}} = $

Step 3: Calculate the probability of the estimate under the null hypothesis (i.e. the p-value)

[Graph: in terms of the density of the sample proportion (i.e. in terms of p̂), with p0 = 0.7 and 0.8 marked on the horizontal axis]

Step 4: State the conclusion

We ______________ (Reject/Do Not Reject) the null hypothesis and conclude that there

________________ (Is/Is Not) evidence that the population proportion of ___________________

____________________ <insert what is being measured> is ______________________________

________________________________ <insert what is being tested in the alternative hypothesis>.
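
The numbers for question 6 can be checked with a short sketch. It assumes an upper-tail test (H0: p = 0.7 against Ha: p > 0.7) and α = 0.05, the default stated in Step 4 of Section 4.3.3 since the question gives no other level. The variable names are illustrative.

```python
from statistics import NormalDist

p_hat, p_0, n, alpha = 0.8, 0.7, 100, 0.05

# Conditions for the test: n * p0 and n * (1 - p0) should both be at least 10
counts_ok = n * p_0 >= 10 and n * (1 - p_0) >= 10

# Step 2: the standard error uses p0 (the null value), not p-hat
standard_error = (p_0 * (1 - p_0) / n) ** 0.5
z = (p_hat - p_0) / standard_error

# Step 3: upper-tail p-value because Ha is p > p0
p_value = 1 - NormalDist().cdf(z)

# Step 4: compare with the significance level
reject_null = p_value < alpha

print(f"z = {z:.3f}, p-value = {p_value:.4f}, reject H0: {reject_null}")
```

Note the contrast with the confidence interval in question 3, where the standard error is computed from p̂ instead of p0.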

General Reminders:

Hypothesis Test vs Confidence Interval

We use a Confidence Interval when ________________________________________________________

______________________________________________________________________________________

______________________________________________________________________________________

We use a Hypothesis Test when ____________________________________________________________

______________________________________________________________________________________

______________________________________________________________________________________

______________________________________________________________________________________
