2006 Geog090 Week06 Lecture01 CentralLimitTheorem
Inferential Statistics
Estimation (of a population parameter) - The estimation part of the process calculates an estimate of the parameter from our sample (called a statistic), as a kind of guess at what the population parameter value actually is
Hypothesis testing - This takes the notion of estimation a little further; it tests whether a sample statistic differs from a population parameter to a significant extent, which we can express in terms of the probability of getting that result
Estimation
Another term for a statistic is a point estimate, which is
simply an estimate of a population parameter
The formula you use to compute a statistic is an
estimator, e.g.
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$
The formula is the estimator; the value it produces, $\bar{x}$, is the point estimate
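A small illustrative sketch of this idea (the simulated population of family incomes below is made up for the example, not real data): draw one sample and apply the estimator to get a point estimate of the population mean.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical "population" of family incomes (in thousands of dollars)
population = rng.gamma(shape=2.0, scale=40.0, size=100_000)

# Draw one sample of n = 75 and apply the estimator: x-bar = sum(x_i) / n
sample = rng.choice(population, size=75, replace=False)
point_estimate = sample.sum() / len(sample)   # equivalent to sample.mean()

print(f"population mean mu   = {population.mean():.2f}")
print(f"point estimate x-bar = {point_estimate:.2f}")
```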
Estimation
It is quite unlikely that our statistic will be exactly the
same as the population parameter (because we know
that sampling error does occur), but ideally it should be
pretty close to right, perhaps within some specified
range of the parameter
We can define this in terms of our statistic falling within
some interval of values around the parameter value (as
determined by our sampling distribution)
But how close is close enough?
If we take repeated samples from a population and compute the mean of each sample, the sample means form their own distribution (the sampling distribution of the mean)
When n increases:
1. The sampling distribution becomes more and more normal
2. The spread of the sampling distribution decreases
Source: http://davidmlane.com/hyperstat/A14461.html
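A minimal simulation sketch of this behaviour (the skewed population and the sample sizes below are illustrative assumptions, not from the lecture): draw many samples of size n, record the mean of each, and watch the spread of those means shrink as n grows.

```python
import numpy as np

rng = np.random.default_rng(0)

# A deliberately non-normal (right-skewed) population
population = rng.exponential(scale=10.0, size=200_000)

for n in (2, 5, 25, 100):
    # 10,000 samples of size n; keep the mean of each sample
    sample_means = rng.choice(population, size=(10_000, n)).mean(axis=1)
    print(f"n={n:>3}  mean of sample means={sample_means.mean():6.2f}  "
          f"std of sample means={sample_means.std(ddof=1):5.2f}")
```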
Standard Error
The standard deviation of the sampling distribution of the mean ($\bar{x}$), known as the standard error, is formulated as:
$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \qquad \left(\text{equivalently } \sigma^2_{\bar{x}} = \frac{\sigma^2}{n}\right)$$
[Figure: the standard normal curve $f(x)$, with 68%, 95%, and 99.7% of the area within ±1, ±2, and ±3 standard deviations of the mean; $P(Z \ge 1.96) = 0.025$, $P(Z \ge 2.0) = 0.0228$]
Because 95% of sample means lie within 1.96 standard errors of the population mean:
$$\Pr\!\left(\bar{x} - 1.96\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + 1.96\frac{\sigma}{\sqrt{n}}\right) = 0.95$$
Since the population standard deviation $\sigma$ is usually unknown, we substitute the sample standard deviation $s$:
$$\Pr\!\left(\bar{x} - 1.96\frac{s}{\sqrt{n}} \le \mu \le \bar{x} + 1.96\frac{s}{\sqrt{n}}\right) = 0.95$$
where $\frac{s}{\sqrt{n}}$ is the standard error
More generally, for a confidence level of $1 - \alpha$:
$$\Pr\!\left(\bar{x} - z_{\alpha/2}\frac{s}{\sqrt{n}} \le \mu \le \bar{x} + z_{\alpha/2}\frac{s}{\sqrt{n}}\right) = 1 - \alpha$$
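A minimal sketch of this calculation in code (the function name z_confidence_interval and its arguments are illustrative, not from the lecture):

```python
import math

def z_confidence_interval(xbar, s, n, z=1.96):
    """Return the (lower, upper) bounds of a z-based confidence interval
    for the population mean: xbar +/- z * s / sqrt(n)."""
    standard_error = s / math.sqrt(n)
    margin_of_error = z * standard_error
    return xbar - margin_of_error, xbar + margin_of_error
```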
Example
Income Example: Suppose we take a sample of
75 students from UNC and record their family
incomes. Suppose the incomes (in thousands of
dollars) are:
28 29 35 42 158 167 235 (only a few of the 75 values are shown)
$$\bar{x} = 89.96, \quad s = 51.68$$
$$\Pr\!\left(\bar{x} - 1.96\frac{s}{\sqrt{n}} \le \mu \le \bar{x} + 1.96\frac{s}{\sqrt{n}}\right) = 0.95$$
Source: http://www.stat.wmich.edu/s160/book/node46.html
Example
$$\Pr\!\left(89.96 - 1.96\frac{s}{\sqrt{75}} \le \mu \le 89.96 + 1.96\frac{s}{\sqrt{75}}\right) = 0.95$$
$$\Pr\!\left(89.96 - 1.96\frac{51.68}{\sqrt{75}} \le \mu \le 89.96 + 1.96\frac{51.68}{\sqrt{75}}\right) = 0.95$$
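Carrying the arithmetic through: $1.96 \times \frac{51.68}{\sqrt{75}} \approx 11.70$, so
$$\Pr\!\left(78.26 \le \mu \le 101.66\right) \approx 0.95$$
(the z_confidence_interval sketch above returns the same bounds for z_confidence_interval(89.96, 51.68, 75))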
Constructing a Confidence Interval - Steps
1. Select our desired level of confidence
Let's suppose we want to construct an interval using the 95% confidence level
2. Calculate α and α/2
For the 95% level, α = 1 - 0.95 = 0.05 and α/2 = 0.025
Constructing a Confidence Interval - Steps
3. Look up the corresponding z-score ($Z_{\alpha/2}$)
4. Multiply the z-score by the standard error to find the margin of error
$$\text{margin of error} = Z_{\alpha/2} \times \frac{s}{\sqrt{n}} = 1.96 \times \frac{s}{\sqrt{n}}$$
5. Find the interval by adding and subtracting the margin of error from the sample mean:
$$\left(\bar{x} - 1.96\frac{s}{\sqrt{n}},\ \bar{x} + 1.96\frac{s}{\sqrt{n}}\right)$$
Confidence level   α      α/2     Z_{α/2}
90%                0.1    0.05    1.645
95%                0.05   0.025   1.96
99%                0.01   0.005   2.58
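These critical values can be reproduced numerically (a minimal sketch using scipy, which is not part of the lecture; the confidence levels are the ones from the table above):

```python
from scipy.stats import norm

for confidence in (0.90, 0.95, 0.99):
    alpha = 1 - confidence
    # Z_{alpha/2}: the z value with an upper-tail area of alpha/2
    z = norm.ppf(1 - alpha / 2)
    print(f"{confidence:.0%}: alpha = {alpha:.2f}, alpha/2 = {alpha / 2:.3f}, Z = {z:.3f}")
```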
Constructing a Confidence Interval - Example
Suppose we conduct a poll to try and get a sense of the
outcome of an upcoming election with two candidates.
We poll 1000 people, and 550 of them respond that they
will vote for candidate A
How confident can we be that a given person will cast
their vote for candidate A?
1. Select our desired levels of confidence
We're going to use the 90%, 95%, and 99% levels
Constructing a Confidence Interval - Example
2. Calculate α and α/2
Our α values are 0.1, 0.05, and 0.01 respectively
Our α/2 values are 0.05, 0.025, and 0.005
3. Look up the corresponding z-scores
Our Z_{α/2} values are 1.645, 1.96, and 2.58
4. Multiply the z-score by the standard error to find the margin of error
First we need to calculate the standard error
Constructing a Confidence Interval - Example
5. Find the interval by adding and subtracting this
product from the mean
In this case, we are working with a distribution we have not previously discussed: the binomial distribution (i.e. a voter can choose candidate A or B), which for a sample this large is approximately normal
We have a probability estimator from our sample, where
the probability of an individual in our sample voting for
candidate A was found to be 550/1000 or 0.55
We can use this information in a formula to estimate the
standard error for such a distribution:
Constructing a Confidence Interval - Example
4. Multiply the z-score by the standard error cont.
For a binomial proportion (using the normal approximation), the standard error can be estimated using:
$$\sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}} = \sqrt{\frac{(0.55)(0.45)}{1000}} \approx 0.0157$$
Constructing a Confidence Interval - Example
4. Multiply the z-score by the standard error cont.
We calculate the margin of error and add and subtract
that value from the mean (0.55 in this case) to find the
bounds of our confidence intervals at each level of
confidence:
CI     Z_{α/2}   Margin of error   Lower bound   Upper bound
90%    1.645     0.026             0.524         0.576
95%    1.96      0.031             0.519         0.581
99%    2.58      0.041             0.509         0.591
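A small sketch that reproduces these numbers (the variable names are illustrative; scipy is used only to supply the Z values):

```python
import math
from scipy.stats import norm

p_hat, n = 550 / 1000, 1000                  # sample proportion and sample size
se = math.sqrt(p_hat * (1 - p_hat) / n)      # standard error, ~0.0157

for confidence in (0.90, 0.95, 0.99):
    z = norm.ppf(1 - (1 - confidence) / 2)   # Z_{alpha/2}
    margin = z * se                          # margin of error
    print(f"{confidence:.0%}: margin = {margin:.3f}, "
          f"bounds = ({p_hat - margin:.3f}, {p_hat + margin:.3f})")
```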
t-distribution
The central limit theorem applies when the sample size is large; only then will the distribution of sample means be (approximately) normal
When the sample size is not large, the frequency
distribution of the sample means has what is known as
the t-distribution
t-distribution is symmetric, like the normal distribution,
but has a slightly different shape
The t distribution has relatively more scores in its tails
than does the normal distribution. It is therefore
leptokurtic
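A quick numerical illustration of the heavier tails (the degrees-of-freedom values below are chosen only for the comparison): the probability of falling more than 2 standard deviations above the mean is larger under the t-distribution than under the normal, and shrinks toward the normal value as the degrees of freedom grow.

```python
from scipy.stats import norm, t

# Probability of a value more than 2 standard deviations above the mean
print(f"normal:    P(Z >= 2) = {norm.sf(2):.4f}")
for df in (5, 10, 30):
    print(f"t, df={df:>2}: P(T >= 2) = {t.sf(2, df):.4f}")
```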
t-distribution
The t-distribution or Student's t-distribution is a
probability distribution that arises in the problem of
estimating the mean of a normally distributed population
when the sample size is small
It is the basis of the popular Student's t-tests for the
statistical significance of the difference between two
sample means, and for confidence intervals for the
difference between two population means
t-distribution
The derivation of the t-distribution was first published
in 1908 by William Sealy Gosset. He was not allowed to
publish under his own name, so the paper was written
under the pseudonym Student
The t-test and the associated theory became well-known
through the work of R.A. Fisher, who called the
distribution "Student's distribution"
Student's distribution arises when (as in nearly all
practical statistical work) the population standard
deviation is unknown and has to be estimated from the
data
For example, with $\bar{x} = 21.93$, $s = 14.43$, $n = 30$, and $t_{0.025,\,29} = 2.045$:
$$\Pr\!\left(21.93 - 2.045\frac{14.43}{\sqrt{30}} \le \mu \le 21.93 + 2.045\frac{14.43}{\sqrt{30}}\right) = 0.95$$
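A minimal sketch of the same interval using scipy's t-distribution (the sample statistics are the ones in the equation above):

```python
import math
from scipy.stats import t

xbar, s, n = 21.93, 14.43, 30
t_crit = t.ppf(0.975, df=n - 1)         # ~2.045 for 29 degrees of freedom
margin = t_crit * s / math.sqrt(n)      # margin of error
print(f"t = {t_crit:.3f}, 95% CI = ({xbar - margin:.2f}, {xbar + margin:.2f})")
```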