Unit 4 - Introduction to Statistical Inference Vs2
Unit Outline
4.1.1. Definitions
4.1.5. Distinguishing between a probability distribution of a random variable and the sampling distribution
of the mean
4.1.1 – Definitions
_____________________________.
5. Create a ____________________________________________.
Example: Suppose your population is made up of the following 4 elements – 1, 2, 3 and 4. Create the
sampling distribution for the sample mean assuming we want a sample size of n = 2.
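The enumeration asked for in this example can be sketched in Python. This is a minimal sketch under one assumption that the handout does not state: samples are drawn with replacement and order matters, so there are 16 equally likely samples. The names `samples`, `sampling_dist` and `mean_of_means` are illustrative only.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

population = [1, 2, 3, 4]
n = 2  # sample size

# Assumption (not stated in the handout): sampling WITH replacement,
# ordered draws, so there are 4**2 = 16 equally likely samples.
samples = list(product(population, repeat=n))
means = [Fraction(sum(s), n) for s in samples]

# Sampling distribution: each distinct sample mean and its probability.
counts = Counter(means)
sampling_dist = {m: Fraction(c, len(samples)) for m, c in sorted(counts.items())}

for m, p in sampling_dist.items():
    print(f"x-bar = {float(m):.1f}   P(x-bar) = {p}")

# The mean of the sampling distribution equals the population mean.
mean_of_means = sum(m * p for m, p in sampling_dist.items())
population_mean = Fraction(sum(population), len(population))
print("mean of sample means:", float(mean_of_means))
print("population mean:     ", float(population_mean))
```

If sampling without replacement is intended instead, `itertools.combinations(population, n)` could replace `product`, which would give 6 samples rather than 16.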
ECON 1005: Introductory Statistics Unit 4: Introduction to Inference
______________________________.
How close is the estimate (the sample mean 𝑥̅ ) to the parameter (the population mean μ)? What would you think
would be best?
If a population is N(µ, σ), then the distribution of the sample mean of n independent observations is N(µ, σ/√n).
i.e. ________________________.
If a ______________________________________________________________________________________
____________________________________________ 𝑥̅ is __________________________________________
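The statement that the sample mean of n independent observations from N(µ, σ) follows N(µ, σ/√n) can be checked informally by simulation. The sketch below uses hypothetical values (µ = 10, σ = 2, n = 25, 5000 repeated samples) that are not from the handout; they are chosen only for illustration.

```python
import random
from math import sqrt
from statistics import fmean, stdev

random.seed(0)  # fixed seed so the run is repeatable

mu, sigma = 10.0, 2.0   # hypothetical population N(mu, sigma)
n = 25                  # observations per sample
reps = 5000             # number of repeated samples

# Draw many samples of size n and record each sample mean.
sample_means = [
    fmean(random.gauss(mu, sigma) for _ in range(n))
    for _ in range(reps)
]

theory_sd = sigma / sqrt(n)
print(f"mean of sample means: {fmean(sample_means):.3f}  (theory: {mu})")
print(f"sd of sample means:   {stdev(sample_means):.3f}  (theory: {theory_sd:.3f})")
```

The simulated mean and standard deviation should land close to µ and σ/√n, with small differences due to simulation noise.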
4.1.4 – Distinguishing between the distribution of a random variable & the Sampling Distribution of the
Sample Mean
Example: The times taken by male runners to complete a marathon follow roughly a Normal
distribution. Let X = the time taken by a male runner to complete the marathon.
Therefore X ~ N(4.3 hrs, 1 hr).
1. What is the approximate probability that a single male runner randomly chosen from the entire population
will complete a marathon in more than 5.0 hours?
2. Take a simple random sample of 50 male runners that have participated in the Cosmo City marathon. What
are the mean and standard deviation of the sampling distribution of sample means of size 50?
3. What is the approximate probability that the mean time to complete the marathon of these 50 runners is 5.0
hours or higher?
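The three marathon questions can be worked through with Python's standard library. This is a sketch rather than a model answer: it takes X ~ N(4.3, 1) from the example and assumes the 50 runners are an independent random sample, so that the sample mean follows N(4.3, 1/√50).

```python
from math import sqrt
from statistics import NormalDist

mu, sigma = 4.3, 1.0   # X ~ N(4.3 hrs, 1 hr), from the example
n = 50                 # sample size in questions 2 and 3

# Question 1: a single runner, so use the distribution of X itself.
X = NormalDist(mu, sigma)
p_single = 1 - X.cdf(5.0)

# Question 2: the sampling distribution of the sample mean has
# mean mu and standard deviation sigma / sqrt(n).
se = sigma / sqrt(n)
x_bar = NormalDist(mu, se)

# Question 3: probability that the mean of the 50 runners is 5.0 or more.
p_mean = 1 - x_bar.cdf(5.0)

print(f"P(X > 5.0)      = {p_single:.4f}")
print(f"mean of x-bar   = {mu}, sd of x-bar = {se:.4f}")
print(f"P(x-bar >= 5.0) = {p_mean:.2e}")
```

The contrast between the first and last probabilities is the point of this section: the same cutoff of 5.0 hours is fairly ordinary for one runner but extremely unusual for the average of 50 runners.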
It is important to know the difference between the questions which are asking for regular probability from a normal
distribution and those asking for a probability related to the sampling distribution of the mean (i.e. sample mean).
Examples: For the following questions, determine whether the situation relates to the:
Distribution of a random variable
Sampling distribution of the sample mean
Also write the corresponding probability statement.
1. A small resort is interested in the distribution of its occupancy rates. Data are collected daily for 3 weeks. The
resort would like to determine the probability of having over 10 rooms occupied on a given day during the
following year.
2. A company examines the distribution of the average time spent exercising in order to determine if a health
initiative is needed. They took samples from each of their 25 locations to compute the distribution. The
company takes an additional sample and determines the average hours a week exercising. They hope the
probability the sample mean is less than 1 hour is very small.
3. The distribution of the average amount of time spent watching TV on a weeknight is determined by looking at
the average of samples from all the elementary schools in NYC. They will run further analysis to examine the
probability that a sample of children spend, on average, between 2 and 4 hours watching TV on a weeknight.
Why do we even bother analysing data? We want to draw conclusions from the data.
Why can’t we just accept our sample mean or sample proportion as the official mean or proportion for the
population?
Every time we estimate the statistics 𝑥̅ (the sample mean) or 𝑝̂ (the sample proportion), we get a different
answer due to sampling variability.
Inference
The two most common types of formal statistical inference are confidence intervals and significance tests.
_________________________________________________________________________________________
What we have: the __________________ of the __________________ (e.g. sample mean (𝑥̅ ) or sample proportion
(𝑝̂ )).
_________________________________________________________________________________________
Confidence Intervals for single population means – population standard deviation KNOWN
We report the interval as (𝑥̅ − z*(σ/√n), 𝑥̅ + z*(σ/√n))
Interpretation
Short form: We are ___% confident that the population mean _____________________ <insert what is being
studied> is between __________ and __________.
Formal form: If we take repeated samples of size _____ <insert sample size> from the same population of
_________________ <insert what we are studying>, then we expect _____% of the confidence
intervals to contain the true population mean.
Example: A specialty tea shop is studying the possibility of opening up at a Liguanea location. Before making such a
decision they want to study the market. Out of the entire population of Liguanea residents that are 18 years old and
older, they select an SRS of 100. Each individual in the SRS is asked the question “how many boxes of tea have you
bought in the last year?” The sample mean turns out to be 3.0 and it is known that the standard deviation of the
population is 2.
a) Calculate the 90% confidence interval for the mean number of tea boxes bought by the entire population
that is 18 years old and older in Liguanea, in the last year.
z* = _______
Interpretation of result: We are 90% confident that the value of the mean number of boxes of tea
bought in the last year for the entire population is between _____________ and ________________.
b) Is it true that the number of tea boxes bought last year by 90% of the entire Liguanea population that is 18
years old and older is within the interval calculated in (a)? Answer: _____________________
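The calculation in part (a) can be sketched in Python using the interval 𝑥̅ ± z*(σ/√n) with the values given in the example (𝑥̅ = 3.0, σ = 2, n = 100, C = 0.90). The variable names are illustrative, and z* is taken from the standard Normal quantile function rather than a printed table, so it may differ from a table value in the third decimal place.

```python
from math import sqrt
from statistics import NormalDist

x_bar = 3.0    # sample mean (boxes of tea), from the example
sigma = 2.0    # known population standard deviation
n = 100        # sample size
C = 0.90       # confidence level

# z* leaves (1 - C)/2 in each tail of the standard Normal curve.
z_star = NormalDist().inv_cdf(0.5 + C / 2)

margin = z_star * sigma / sqrt(n)
lower, upper = x_bar - margin, x_bar + margin

print(f"z* = {z_star:.3f}")
print(f"margin of error = {margin:.3f}")
print(f"90% CI: ({lower:.3f}, {upper:.3f})")
```

Note that the interval describes plausible values for the population mean, not the range in which individual residents' purchases fall, which is what part (b) probes.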
If C = 0.9 (90%), our value of 𝑥̅ is one out of 100 possible sample means, and intervals built this way contain µ in 90% of the cases.
Without changing C, can we have a smaller margin of error? If so, the value of µ would be captured by a narrower
interval (still in C% of cases).
m=
We want to control the margin of error. We can do that by modifying the sample size:
n=
For the margin of error (width) to ↓ (decrease), we need the sample size to ______________________.
Example: Using the example above, answer the following questions:
z* = _______
b) What should be the sample size if we want a margin of error of 0.1 for a 90% confidence interval?
z* = _______ n = ????
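The sample size question in part (b) can be sketched by solving the margin of error formula m = z*(σ/√n) for n, which gives n = (z*σ/m)². The code below uses σ = 2 from the tea shop example and rounds up, in line with the handout's rule of always rounding a sample size up.

```python
from math import ceil
from statistics import NormalDist

sigma = 2.0   # known population standard deviation, from the example
m = 0.1       # desired margin of error
C = 0.90      # confidence level

z_star = NormalDist().inv_cdf(0.5 + C / 2)

# Solve m = z* * sigma / sqrt(n) for n, then round UP.
n_exact = (z_star * sigma / m) ** 2
n_required = ceil(n_exact)

print(f"z* = {z_star:.3f}")
print(f"n (before rounding) = {n_exact:.2f}")
print(f"n (rounded up)      = {n_required}")
```

Because n grows with the square of 1/m, halving the margin of error requires roughly four times the sample size.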
How rare is it? Is there some evidence that maybe the average grade for the population is larger than
60/100?
Step 1: Hypotheses
Write the null and alternative hypotheses – all hypothesis statements have 2 parts.
Null Hypothesis (H0): Usually represents the ______________________. It is the characteristic of the
c) Example: A researcher wants to determine if the mean family size is no longer 3.18 members
Example:
H0: µ = 3.18
Ha: µ ≠ 3.18
Ha:
Aka: ______________________________________
Example: A researcher wants to determine if the mean family size is less than 3.18
members
Ha:
b) Upper Tail:
Aka: ______________________________________
Example: A researcher wants to determine if the mean family size is more than 3.18
members
Example:
H0: µ = 3.18
Ha: µ > 3.18
Ha:
We want to calculate the test statistic that will allow us to calculate probabilities. Given the sample mean 𝑥̅ measured from the data
(sample size n), we want to calculate z (z Test for a population mean) assuming that the mean of the population is
the one under the null hypothesis (H0: µ = µ0).
z =
Step 3: P-Values
Calculate the probability of the estimate under the null hypothesis
A p-value is the __________________ of having a value as extreme or more extreme than the one we measured
under the null hypothesis (H0).
[Figure: one-tailed test. Remember to shade this area; it is the p-value.]
[Figure: two-tailed test. Remember to shade these areas; it is the p-value.]
Note: 𝑥̅ can be on either side of µ0 for this case (z can also be on either side of 0 in this case).
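The shaded areas described above correspond to tail probabilities of the standard Normal curve. A minimal sketch of how each type of alternative hypothesis maps to a p-value is given below; the function name `p_value` and the labels "lower", "upper" and "two-sided" are illustrative choices, not notation from the handout.

```python
from statistics import NormalDist


def p_value(z, alternative):
    """Return the p-value for a z test statistic.

    alternative: "lower" (Ha: mu < mu0), "upper" (Ha: mu > mu0),
    or "two-sided" (Ha: mu != mu0). Labels are illustrative.
    """
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if alternative == "lower":
        return std_normal.cdf(z)               # area to the left of z
    if alternative == "upper":
        return 1 - std_normal.cdf(z)           # area to the right of z
    if alternative == "two-sided":
        return 2 * (1 - std_normal.cdf(abs(z)))  # both tails beyond |z|
    raise ValueError("alternative must be 'lower', 'upper' or 'two-sided'")


for alt in ("lower", "upper", "two-sided"):
    print(f"z = 1.50, {alt:9s}: p-value = {p_value(1.5, alt):.4f}")
```

For the two-sided case the absolute value of z is used, which is why the sign of z does not matter there.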
Step 4: Conclusions
State a conclusion regarding evidence against the null hypothesis
α = significance level
We usually use α = 0.05, unless the problem states otherwise
P-value
We __________________________ (Reject/DO NOT REJECT) the null hypothesis and conclude that there
____________________ <insert hypothesis in words>.
Example: A business analyst in 2015 stated that the average income of statisticians with Master's degrees is
US$50,000 per annum. A research group randomly selected 50 statisticians with Master's degrees and asked them
to report their annual income. They found that the sample average was US$51,600.
Assuming that the standard deviation of the population is US$4,000, conduct a significance test to see if there is
evidence of an increase in true population mean income when compared to the 2015 income. Use α=0.01.
Keyword(s): ______________________________________________
H0:
Ha:
Step 2: Calculate a test statistic to measure the compatibility between the null hypothesis and the data
z = (𝑥̅ − µ0) / (σ/√n) =
We __________________________ (Reject/DO NOT REJECT) the null hypothesis and conclude that there
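The steps of this significance test can be sketched in Python with the numbers from the example (µ0 = 50,000, 𝑥̅ = 51,600, σ = 4,000, n = 50, α = 0.01). The sketch assumes an upper-tail alternative, Ha: µ > 50,000, based on the phrase "evidence of an increase"; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu0 = 50_000     # mean income under H0
x_bar = 51_600   # sample mean
sigma = 4_000    # known population standard deviation
n = 50           # sample size
alpha = 0.01     # significance level given in the example

# Step 2: test statistic z = (x_bar - mu0) / (sigma / sqrt(n)).
z = (x_bar - mu0) / (sigma / sqrt(n))

# Step 3: upper-tail p-value (assumed alternative Ha: mu > mu0).
p_val = 1 - NormalDist().cdf(z)

# Step 4: compare the p-value with alpha.
decision = "Reject H0" if p_val < alpha else "Do not reject H0"

print(f"z = {z:.3f}")
print(f"p-value = {p_val:.4f}")
print(f"alpha = {alpha}: {decision}")
```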
As an exercise, complete the table below. For guidance, you may use the table for proportions in the following section.
                   Two Tailed     One Tailed (Upper)     One Tailed (Lower)
H0:
Ha:
Test Statistic:
P-Value:
Example: The proportion of boys (6-10 years old) that like to play soccer. Get an SRS of size n. We ask each boy
in the sample the question “do you like to play soccer?” We count the number of YES (successes).
(Note that ‘success’ does not always mean something good. In these cases, ‘success’ relates to the outcome of
interest).
Notation:
The ____________________ proportion is denoted by ________.
The standard deviation of the sampling distribution of the sample proportion is __________________.
Example: In recent years, convertible sports cars have become very popular in Japan. Toyota is currently shipping
Celicas to Los Angeles, where a customizer does a roof lift and ships them back to Japan. Suppose that 25% of all
Japanese in a given income and lifestyle category are interested in buying Celica convertibles. A random sample of
100 Japanese consumers in the category of interest is to be selected. What is the probability that at least 20% of
those in the sample will express an interest in a Celica convertible?
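The Celica question can be sketched with the Normal approximation to the sampling distribution of the sample proportion, taking p = 0.25 and n = 100 from the example. The sketch assumes the approximation is adequate here and that the standard deviation of 𝑝̂ is √(p(1 − p)/n); the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

p = 0.25   # population proportion interested, from the example
n = 100    # sample size

# Standard deviation of the sampling distribution of p-hat
# (Normal approximation assumed).
sd_p_hat = sqrt(p * (1 - p) / n)
p_hat_dist = NormalDist(p, sd_p_hat)

# P(p-hat >= 0.20): area to the right of 0.20.
prob = 1 - p_hat_dist.cdf(0.20)

print(f"sd of p-hat      = {sd_p_hat:.4f}")
print(f"P(p-hat >= 0.20) = {prob:.4f}")
```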
Remember to circle the standard error & note the formula for the margin of error
Note:
A proportion takes on values between _______ and ______
𝑛𝑝̂ = __________________________________________________
𝑛(1 − 𝑝̂ ) = _____________________________________________
p* is a guessed value (based on previous knowledge) OR 0.5 if a guessed value is not given.
Remember ALWAYS round your sample size UP to the nearest whole number!
                   Two Tailed     One Tailed (Upper)     One Tailed (Lower)
H0:                p = p0         p = p0                 p = p0
Ha:                p ≠ p0         p > p0                 p < p0
Test Statistic:
1) np0 ≥ 10
Example (Mango farms affected by drought in early March): Droughts in early March can seriously damage mango
production. Imagine that such was the situation a year ago in Trinidad. In order to learn how bad the damage was,
a team set out to record data from a simple random sample of 100 mango trees belonging to 10 different farms. 80
of the 100 mango trees were reported to have suffered damage due to the drought in early March.
1) Give an estimate of the proportion 𝑝̂ (p_hat) of mango trees damaged by the drought in early March.
X = ______________ n = ____________
3) Give a 95% confidence interval for the true proportion (population) of mango trees damaged by drought in
early March.
20 * n = _________________________________________ __________
Valid? _____________
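The 95% confidence interval for the mango example can be sketched as 𝑝̂ ± z*·SE, with SE = √(𝑝̂(1 − 𝑝̂)/n), using X = 80 and n = 100 from the example. The variable names are illustrative, and the validity conditions on the worksheet still need to be checked separately.

```python
from math import sqrt
from statistics import NormalDist

X = 80     # damaged trees in the sample, from the example
n = 100    # sample size
C = 0.95   # confidence level

p_hat = X / n
se = sqrt(p_hat * (1 - p_hat) / n)          # standard error of p-hat
z_star = NormalDist().inv_cdf(0.5 + C / 2)  # close to 1.96
margin = z_star * se
lower, upper = p_hat - margin, p_hat + margin

# Counts of successes and failures, for the validity check.
successes, failures = n * p_hat, n * (1 - p_hat)

print(f"p-hat = {p_hat:.2f}, SE = {se:.4f}, z* = {z_star:.3f}")
print(f"margin of error = {margin:.4f}")
print(f"95% CI: [{lower:.4f}, {upper:.4f}]")
print(f"n*p-hat = {successes:.0f}, n*(1 - p-hat) = {failures:.0f}")
```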
5) With a sample of 100 mango trees, at the 95% confidence level, the margin of error is (1.96 * 0.04) =
0.0784. That results in a CI (as calculated in Question 3) of [0.7216, 0.8784]. Imagine that you would like
to provide a narrower confidence interval for the population proportion than the one above.
If the study would like a margin of error of 0.03 when estimating the true population proportion of mango
trees damaged, how big a sample should be collected? The guessed value for the proportion of affected
trees is 0.7.
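The sample size in question 5 can be sketched by solving m = z*√(p*(1 − p*)/n) for n, which gives n = (z*/m)² p*(1 − p*). The code uses the guessed value p* = 0.7 and m = 0.03 from the question and rounds up to the next whole number; the variable names are illustrative.

```python
from math import ceil
from statistics import NormalDist

p_star = 0.7   # guessed value for the proportion, from the question
m = 0.03       # desired margin of error
C = 0.95       # confidence level

z_star = NormalDist().inv_cdf(0.5 + C / 2)

# n = (z*/m)^2 * p*(1 - p*), always rounded UP.
n_exact = (z_star / m) ** 2 * p_star * (1 - p_star)
n_required = ceil(n_exact)

print(f"n (before rounding) = {n_exact:.2f}")
print(f"n (rounded up)      = {n_required}")
```

If no guessed value were available, setting `p_star = 0.5` would give the most conservative (largest) sample size.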
6) It is known from data recorded over the years that the population proportion of mango trees damaged by
drought in early March is 0.7. Based on the data collected in this study, do we have evidence that the
population proportion of damaged mango trees is larger than 0.7?
H0:
Ha:
z = (𝑝̂ − p0) / √(p0(1 − p0)/n) =
Step 3: Calculate the probability of the estimate under the null hypothesis (i.e. the p-value)
[Figure: density of the sample proportion (i.e. in terms of p_hat), with 0.7 (p0) and 0.8 marked on the horizontal axis.]
We ______________ (Reject/Do Not Reject) the null hypothesis and conclude that there
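The test in question 6 can be sketched in Python with p0 = 0.7, 𝑝̂ = 0.8 and n = 100 from the example. Two things are assumptions rather than givens: the significance level α = 0.05 (the handout's stated default when none is specified) and the second condition n(1 − p0) ≥ 10, which mirrors the np0 ≥ 10 condition listed in the table. The variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

p0 = 0.7       # population proportion under H0
p_hat = 0.8    # sample proportion (80 of 100 trees)
n = 100        # sample size
alpha = 0.05   # assumed: the default level when none is stated

# Conditions: n*p0 >= 10 is from the handout; n*(1 - p0) >= 10 is assumed.
conditions_ok = n * p0 >= 10 and n * (1 - p0) >= 10

# Test statistic uses p0 (not p-hat) in the standard deviation.
z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)

# Upper-tail p-value for Ha: p > p0.
p_val = 1 - NormalDist().cdf(z)

decision = "Reject H0" if p_val < alpha else "Do not reject H0"

print(f"conditions met: {conditions_ok}")
print(f"z = {z:.3f}")
print(f"p-value = {p_val:.4f}")
print(f"alpha = {alpha}: {decision}")
```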
General Reminders:
______________________________________________________________________________________
______________________________________________________________________________________
______________________________________________________________________________________
______________________________________________________________________________________
______________________________________________________________________________________