L06 Inference
Quantitative Methods 1
Introduction to Statistical Inference
Examples:
• There is a survey about the federal election. What can we learn about the real election’s outcome?
• The scores from students in QM1 Tutorial 25 are known. Can we infer the average score of the whole QM1 class?
• A bucket of water from Lake Ontario is examined. What is the water quality of the lake?
• 100 volunteers’ results from a clinical trial for a new vaccine are observed. What is the vaccine’s effectiveness?
Simple Random Sample
Simple random sample
1. ABC Radio National / Sky News ask for people to dial in and say who they are going to vote for.
2. Telephone polling for the federal election.
3. Check the soldiers in the hospital to evaluate the effectiveness of the new helmet.
Statistical Inference and Random Variables
We use a random variable to represent a numerical measure of the population.
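$$X = \begin{cases} 1, & \text{voter prefers Democratic}, \quad P(X = 1) = p \\ 0, & \text{voter prefers Republican}, \quad P(X = 0) = 1 - p \end{cases}$$
If $p > 0.5$ the Democratic candidate wins; if $p < 0.5$ the Republican candidate wins. The value of $p$ is unobserved.
Q: We have a sample of 900 voters. Can we say something about the real value of $p$?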
Estimation of a Proportion
Suppose QM1 marks are distributed between 0 and 100. Let $X$ denote the mark of a randomly selected student and denote the probability distribution of $X$ by $p(x)$. The mean QM1 mark would be
$$\mu = E(X) = \sum_x x \, p(x).$$
$\mu$ is unknown to us.
The exact form of $p(x)$ does not matter as we only care about the mean in this example.
Can we learn about $\mu$ from a sample? The answer is yes.
We use the sample mean to estimate the population mean.
Estimation of the Mean
General framework to estimate the population mean:
• Consider a population whose distribution has mean $\mu$.
• Take a random sample of $n$ observations from this population.
Let $X_1$ denote the random variable for the first observation.
It is random because until the sample is taken its value is uncertain.
Another sample would (generally) give a different value for $X_1$.
Similarly let $X_2, \dots, X_n$ correspond to the other observations.
The sample mean $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$ is an estimator for the population mean $\mu$.
• Estimator: the sample mean $\bar{X}$ is random before the data is sampled/observed. It is a methodology/rule to transform a sample ($X_1, \dots, X_n$) into a value $\bar{x}$. Therefore, it is a random variable.
• Estimate: the value of the sample mean $\bar{x}$ (unfortunately, same name😭) that you calculated. It is a realization from the random variable.
Estimation of the Mean
Example. QM1 marks
The mean mark for semester 1 2021 is 69% (you have the data from Week 1), i.e., $\mu = 69$ (we are lucky this time; usually $\mu$ is an unknown number).
Only a simple random sample of $n = 20$ marks may be available,
e.g., 63, 68, 57, 51, 63, 73, 58, 57, 63, 71, 50, 76, 84, 71, 51, 74, 50, 86, 70, 66
The sample mean $\bar{X}$ is an estimator of the population mean $\mu$.
The number 65.1 is an estimate of the population mean $\mu$. This is NOT the true $\mu$, but we hope that it is a good approximation. Is it really good? Yes, more details are coming.
Consider $\bar{X}$ as a random variable; 65.1 is one random draw from it.
This is the same idea if you see
A different sample will, in general, give a different value of $\bar{x}$.
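The estimate in this example can be checked directly. The short sketch below recomputes the sample mean from the 20 marks listed above; the variable names are only illustrative.

```python
# The 20 sampled QM1 marks listed in the example above.
marks = [63, 68, 57, 51, 63, 73, 58, 57, 63, 71,
         50, 76, 84, 71, 51, 74, 50, 86, 70, 66]

n = len(marks)            # sample size
x_bar = sum(marks) / n    # the estimate: one realisation of the estimator

print("n =", n, "sample mean =", x_bar)
```

Replacing `marks` with a different random sample would generally change `x_bar`, which is exactly why the sample mean is treated as a random variable.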
M&M Colors
"Were the rankings of the University of Melbourne an important factor in your choice to study here?"
Out of 1021 responses, 884 said "yes", the rest will be treated as "no".
For now, think of these 1021 students as the population.
Population Proportion and Population Mean
Define a random variable
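One possible way to write such a variable for the survey example is sketched below. It assumes the same 0/1 coding used for the voter example earlier (that coding is an assumption here), and it uses the counts 884 and 1021 given above.

```latex
X = \begin{cases}
  1, & \text{the student answered ``yes''} \\
  0, & \text{the student answered ``no''}
\end{cases}
\qquad
p = P(X = 1) = \frac{884}{1021} \approx 0.866

E(X) = 1 \cdot p + 0 \cdot (1 - p) = p
```

Under this coding the population proportion $p$ and the population mean $E(X)$ are the same number.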
Sample and Sample Mean
Define a random variable
Estimation
Question: can we use the observed (data) to infer the unobserved (e.g., population mean)?
• Our interest is the population mean $\mu$ (also the population proportion $p$ in the example).
• We can use the sample mean $\bar{X}$ to estimate the population mean $\mu$ (also $p$ in the example).
• $\bar{X}$ could be unreasonably far from $\mu$. If so, what is our chance of being so unlucky?
If $\bar{X}$ is random, then it has a mean… this helps us to understand the central tendency of $\bar{X}$.
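$$
\begin{aligned}
E(\bar{X}) &= E\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right) = E\left(\frac{1}{n}X_1 + \frac{1}{n}X_2 + \dots + \frac{1}{n}X_n\right) \\
&= \frac{1}{n}E(X_1) + \frac{1}{n}E(X_2) + \dots + \frac{1}{n}E(X_n) \\
&= \frac{1}{n}\mu + \frac{1}{n}\mu + \dots + \frac{1}{n}\mu = \mu
\end{aligned}
$$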
Sampling Distribution
$E(\bar{X}) = \mu$
Read it as: the expected value of the sample mean is the population mean
Or
The mean of the sample mean is the population mean.
Interpretation: the sample mean (as a random variable) is centered around the true
value of the population mean.
Heuristic example:
1. You and MANY of your classmates each take a random sample of 20 observations.
2. Each of you calculates a sample mean from your own data.
3. In general, each of you has a different value of the sample mean.
4. The average of the sample means from you and your classmates is (approximately) the true mean, and the approximation improves as more classmates take part.
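The heuristic above can be tried out in a small simulation. This is only a sketch: it treats the 1021 survey responses from the earlier example as the population, and the number of classmates (5000) and the random seed are arbitrary assumptions.

```python
import random
from statistics import mean

random.seed(1)  # fixed seed so the sketch is reproducible

# Population from the survey example: 884 "yes" (1) and 137 "no" (0).
population = [1] * 884 + [0] * (1021 - 884)
mu = mean(population)   # true population mean (= proportion of "yes")

n = 20                  # sample size each classmate uses
classmates = 5000       # hypothetical number of classmates

# Each classmate draws a simple random sample and records its sample mean.
sample_means = [mean(random.sample(population, n)) for _ in range(classmates)]

print("population mean:        ", round(mu, 4))
print("average of sample means:", round(mean(sample_means), 4))
```

Individual entries of `sample_means` vary from classmate to classmate, but their average should sit close to `mu`.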
Unbiased Estimator
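Definition: an estimator is unbiased if its expected value equals the population quantity it estimates. The sample mean is unbiased for the population mean because $E(\bar{X}) = \mu$.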
ECON10005
Quantitative Methods 1
Consistency
Recall: In our example, $\bar{X}$ is random, so we can find its mean to understand its central tendency.
We can also check its variance to evaluate its dispersion around its mean.
If the variance is large, a single $\bar{x}$ is more likely to be far away from the true mean $\mu$.
If the variance is small, a single $\bar{x}$ is more likely to be close to the true mean $\mu$.
Variance of the Sampling Distribution
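$$
\begin{aligned}
\mathrm{var}(\bar{X}) &= \mathrm{var}\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right) = \mathrm{var}\left(\frac{1}{n}X_1 + \frac{1}{n}X_2 + \dots + \frac{1}{n}X_n\right) \\
&= \frac{1}{n^2}\mathrm{var}(X_1) + \frac{1}{n^2}\mathrm{var}(X_2) + \dots + \frac{1}{n^2}\mathrm{var}(X_n) \quad \text{(each pair has zero covariance because of independence)} \\
&= \frac{1}{n^2}\sigma^2 + \frac{1}{n^2}\sigma^2 + \dots + \frac{1}{n^2}\sigma^2 \\
&= \frac{\sigma^2}{n}
\end{aligned}
$$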
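The result $\mathrm{var}(\bar{X}) = \sigma^2 / n$ can be compared with a simulation. The sketch below uses fair dice as the population; the choices of $n = 10$, 20000 repetitions, and the seed are assumptions made only for illustration.

```python
import random
from statistics import mean, pvariance

random.seed(2)  # fixed seed so the sketch is reproducible

faces = [1, 2, 3, 4, 5, 6]
sigma2 = pvariance(faces)   # variance of one fair die, 35/12
n = 10                      # dice per sample (hypothetical choice)
reps = 20000                # number of simulated samples

# Sample mean of n independent rolls, repeated many times.
means = [mean(random.choices(faces, k=n)) for _ in range(reps)]

simulated = pvariance(means)   # spread of the simulated sample means
theory = sigma2 / n            # sigma^2 / n from the derivation above
print("simulated:", round(simulated, 4), "theory:", round(theory, 4))
```

The two printed numbers should be close; increasing `n` shrinks both.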
Summary
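For a simple random sample $X_1, \dots, X_n$, where $E(X_i) = \mu$ and $\mathrm{var}(X_i) = \sigma^2$,
the sample mean $\bar{X}$ has $E(\bar{X}) = \mu$ and $\mathrm{var}(\bar{X}) = \sigma^2 / n$.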
Consistency
$\bar{X}$ is said to be consistent because
• It is unbiased (for QM1 now and can be relaxed in more advanced subjects)
• Its variance converges towards zero as n grows
• Another random sample with $n = 20$ (different) observations may give $\bar{x} = 18/20 = 0.9$ (18 answer “yes”)
Consistency
$\bar{X}$ is said to be consistent because
• It is unbiased
• Its variance converges towards zero as n grows
[Figure: sampling distribution of $\bar{X}$ for n = 20, n = 50 and n = 100]
As the sample size increases, the sampling distribution of $\bar{X}$ is more concentrated around the true mean $\mu$.
A large sample may still end up with a value far from the truth, but this becomes less and less likely as n grows.
This is consistency.
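A rough way to see this concentration is to simulate many samples for each sample size and count how often the sample mean lands near the truth. The sketch below assumes independent 0/1 draws with $p = 884/1021$ from the survey example; the tolerance of 0.05, the 5000 repetitions, and the helper name `share_close` are illustrative assumptions.

```python
import random

random.seed(3)     # fixed seed so the sketch is reproducible
p = 884 / 1021     # population proportion from the survey example
reps = 5000        # hypothetical number of repeated samples

def share_close(n, tol=0.05):
    """Fraction of simulated sample means within tol of p."""
    hits = 0
    for _ in range(reps):
        # Sample mean of n independent 0/1 draws with P(1) = p.
        x_bar = sum(random.random() < p for _ in range(n)) / n
        hits += abs(x_bar - p) <= tol
    return hits / reps

shares = {n: share_close(n) for n in (20, 50, 100)}
print(shares)
```

The share of sample means close to `p` should rise as `n` goes from 20 to 50 to 100.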
Central Limit Theorem
[Figures: distribution of the sum of 2, 3, 4, 5 and 6 dice, followed by a summary]
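The distributions of these dice sums can be computed exactly rather than read off a plot. The sketch below builds the distribution of the sum of k fair six-sided dice by adding one die at a time; the function name `sum_distribution` is hypothetical.

```python
from fractions import Fraction

def sum_distribution(k):
    """Exact distribution of the sum of k fair six-sided dice."""
    dist = {0: Fraction(1)}            # the sum of zero dice is 0 for certain
    for _ in range(k):
        new = {}
        for total, prob in dist.items():
            for face in range(1, 7):   # add one more die
                new[total + face] = new.get(total + face, 0) + prob / 6
        dist = new
    return dist

for k in (2, 3, 4, 5, 6):
    dist = sum_distribution(k)
    mode = max(dist, key=dist.get)     # a most likely sum (ties are possible)
    print(k, "dice: a most likely sum is", mode,
          "with probability", float(dist[mode]))
```

Printing the full dictionary for each k shows the shape moving from a triangle (k = 2) toward a bell curve as more dice are added.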
Central Limit Theorem
Shift $\bar{X}$ toward 0 and squeeze (or stretch) it to have unit variance by constructing the “Z-score” $Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}$
You can verify that Z has zero mean and its variance is 1
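The verification can be written out in two lines. This sketch takes $Z = (\bar{X} - \mu) / (\sigma / \sqrt{n})$, as on the Q&A slide, and uses only the earlier results $E(\bar{X}) = \mu$ and $\mathrm{var}(\bar{X}) = \sigma^2 / n$.

```latex
E(Z) = \frac{E(\bar{X}) - \mu}{\sigma / \sqrt{n}} = \frac{\mu - \mu}{\sigma / \sqrt{n}} = 0

\mathrm{var}(Z) = \frac{\mathrm{var}(\bar{X})}{\left(\sigma / \sqrt{n}\right)^2}
                = \frac{\sigma^2 / n}{\sigma^2 / n} = 1
```

Subtracting the constant $\mu$ does not change the variance, and dividing by the constant $\sigma / \sqrt{n}$ divides the variance by its square.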
Central Limit Theorem
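If $X_1, \dots, X_n$ are independent with $E(X_i) = \mu$ and $\mathrm{var}(X_i) = \sigma^2$,
define $Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}$. As $n$ grows, the distribution of $Z$ converges to the standard normal distribution.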
Example:
1. You roll $n$ dice and record their average $\bar{x}$. We can derive that for one die $\mu = 3.5$ and $\sigma^2 = 35/12 \approx 2.92$.
2. A friend does the same as you did. She could have a different value of $\bar{x}$, hence a different $z$.
3. You have MANY friends, and each does the same experiment.
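The experiment in this example can be imitated on a computer. The sketch below is one possible version: each simulated friend rolls $n$ fair dice and computes a Z-score; $n = 30$, the 20000 friends, and the seed are assumptions chosen only for illustration.

```python
import random
from math import sqrt
from statistics import mean, pvariance

random.seed(4)              # fixed seed so the sketch is reproducible
mu, sigma2 = 3.5, 35 / 12   # mean and variance of one fair die
n = 30                      # dice per person (hypothetical choice)
friends = 20000             # hypothetical number of friends

def z_score():
    """Roll n dice, average them, and standardise the average."""
    x_bar = sum(random.randint(1, 6) for _ in range(n)) / n
    return (x_bar - mu) / sqrt(sigma2 / n)

z = [z_score() for _ in range(friends)]

# Share of Z-scores within +/-1.96, which is about 0.95 for a standard normal.
inside = sum(abs(v) <= 1.96 for v in z) / friends

print("mean:", round(mean(z), 3),
      "variance:", round(pvariance(z), 3),
      "share within 1.96:", round(inside, 3))
```

A histogram of `z` would look close to the standard normal bell curve, even though a single die is far from normal.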
n=2
The distribution of a die’s outcome is discrete, hence NOT continuous
n=2 is a very small sample size
n=3
You and MANY of your friends toss 3 dice each and compute the Z-score.
n=10
You and MANY of your friends toss 10 dice each and compute the Z-score.
From 2 to 30
Note:
Structure of poll: survey randomly chosen American adults, ask if they believe in ghosts.
Let $p$ denote the population proportion of belief in ghosts. (We have shown that it is also the population mean.)
CLT: Application
The sample mean $\bar{X}$ is not a Binomial random variable. We can approximate it by a normal distribution through CLT.
From the dice example, it won’t be surprising that the $Z$-score associated with $\bar{X}$ converges to the standard normal distribution.
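To see how good the normal approximation can be, the sketch below compares it with an exact binomial calculation. It borrows the sample size of 900 from the voter example; the true proportion $p = 0.5$ and the threshold of 477 "yes" answers are purely hypothetical values chosen for illustration.

```python
from math import comb, sqrt
from statistics import NormalDist

n = 900        # sample size from the voter example
p = 0.5        # hypothetical true proportion (an assumption)
k_min = 477    # hypothetical threshold: at least 477 of 900, i.e. x_bar >= 0.53
c = k_min / n

# Exact: n * X_bar is Binomial(n, p), so add up the binomial probabilities.
exact = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(k_min, n + 1))

# CLT: Z = (X_bar - p) / sqrt(p(1 - p) / n) is approximately standard normal.
z = (c - p) / sqrt(p * (1 - p) / n)
approx = 1 - NormalDist().cdf(z)

print("exact:", round(exact, 4), "normal approximation:", round(approx, 4))
```

The two probabilities differ only in the third decimal place here; the gap is mostly due to the discreteness of the count.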
CLT: Application
$Z$-score’s distribution for different sample sizes
CLT: Another Application
CLT and related randomness
$\bar{X}$ is a random variable
A value $\bar{x}$ is observed
Define $Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}$
CLT Q&A
$$Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}$$
Q: Why CLT?
Q: Since I can ONLY see one value $\bar{x}$ from my data, does it mean I have only one $z$-score?
A: Yes.
Q: What is the point of learning a distribution of $\bar{X}$ or $Z$ if I can only see one value?