
Inferential PDF

To estimate the population mean μ, the point estimator x̄ is used. A confidence interval provides a range of values that is likely to contain the true population mean μ based on the sample data. For a large sample size (n ≥ 30), a 95% confidence interval for the population mean μ can be calculated as x̄ ± 1.96(σ/√n), where x̄ is the sample mean, σ is the population standard deviation, and 1.96 is the z-score associated with 95% confidence. This interval captures the true population mean μ 95% of the time if multiple samples are taken.

Uploaded by

Luis Molina
Copyright
© © All Rights Reserved

1 Inferential Statistics

Population versus Sample, parameter versus statistic


A population is the set of all individuals the researcher intends to learn about. A sample is
a subset of the population and consists of the individuals from whom data will actually be
collected.
Parameters describe the population, e.g. a population mean µ (the mean GPA of all students
at MacEwan) or a population standard deviation σ. To learn more about the value of a parameter, sample
data are collected and statistics are calculated. The sample mean is a statistic which, as we will
see, can give us insight into the value of the population mean (a parameter).

The objective of inferential statistics is to use sample data to obtain results about
the whole population.

In a first step the goal is to describe an underlying population and estimate the parameters
with the help of statistics. There are two different approaches for estimating: Point Estimation
and Interval Estimation.

• For Point Estimation one value is given as an estimate of a parameter, which is hopefully
close to the true unknown value. We cannot expect to find the precise value describing
the population when only using data from a sample.
• For Interval Estimation you give an interval of likely values, where the width of the
interval will depend on the confidence you require to have in this interval.

Example 1
Assume we are interested in the mean income of all Canadians. In order to calculate the
value of this parameter we would have to conduct a census and get this information from each
individual. Is this doable in practice?
How much can we learn from a sample? Assume we draw a simple random sample of 500
Canadians and obtain their incomes. The sample mean of these numbers should provide
some insight into the mean income of all Canadians. But we cannot expect the sample
mean to equal the population mean. Also, if we take another random sample, we cannot expect
the two sample means to be the same. This describes the sampling variability of a statistic.
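
The sampling variability described in Example 1 can be illustrated with a small simulation. The sketch below is only an illustration under stated assumptions: the "population" of 100,000 incomes is synthetic (drawn from a log-normal distribution with arbitrarily chosen parameters, not real Canadian data), and the names `population` and `sample_mean` are hypothetical.

```python
import random
import statistics

rng = random.Random(1)  # fixed seed so the run is reproducible

# Synthetic "population" of incomes (assumption: log-normal shape, arbitrary parameters).
population = [rng.lognormvariate(10.5, 0.6) for _ in range(100_000)]
population_mean = statistics.fmean(population)

def sample_mean(n):
    """Mean income of a simple random sample of size n, drawn without replacement."""
    return statistics.fmean(rng.sample(population, n))

first = sample_mean(500)
second = sample_mean(500)

print(f"population mean: {population_mean:,.0f}")
print(f"sample mean 1:   {first:,.0f}")
print(f"sample mean 2:   {second:,.0f}")
```

Both sample means should land near the population mean, but neither equals it and the two differ from each other, which is exactly the sampling variability of the statistic.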

In order to learn how sample data can be used to learn about parameters we need to apply
what we learned about the sampling distributions in the previous section.

2 Estimation of a population mean µ
2.1 Point Estimator for µ

Definition:

• The parameter is the population characteristic one would like to know.

• An estimator is a statistic (based on sample data) for obtaining estimates for the parameter.

• A point estimate of a population characteristic is a single number that is obtained through
an estimator and represents a plausible value of the characteristic.

Example:

• The sample mean x̄ is a point estimator for the parameter population mean µ.

• The sample standard deviation s is a point estimator for the population standard deviation
σ.

Example:

• To estimate the average height µ of students in this class, we take a sample of size 10 and
calculate the sample mean x̄ (estimator), obtaining x̄ = 172.9 cm (estimate). We estimate that the mean
height in this class is 172.9 cm.

• The sample standard deviation s (estimator) in the sample of 10 students from this class
is s = 9.3 cm (estimate). We estimate that the population standard deviation σ of the height in this
class is 9.3 cm.

A point estimator gives a single value (estimate) that is supposed to be close to the true value
in the population, but it does not tell us how close the estimate is.
One desirable property of an estimator is that the mean of its distribution equals the parameter
it is supposed to estimate.

Definition:
An estimator is said to be an unbiased estimator for a parameter if the mean of its distribution is
equal to the true value of the parameter. Otherwise it is said to be biased.

The sample mean x̄ of n observations is an unbiased estimator for the population mean µ, since
we saw that µ_x̄ = µ.

But every single observation is also an unbiased estimator of µ!


Intuition tells us that the sample mean is the better estimator. And the larger the sample the
better the estimate. Why?

Remark:

Given a choice between several unbiased statistics for a given population characteristic, the best
statistic to choose is the one with the smallest standard deviation of its distribution, because
a smaller standard deviation means that on average the estimates from all the different samples
will be closer to the true value.

Since the standard deviation of the sample mean is σ_x̄ = σ/√n and the standard deviation of a
single observation is σ, this remark leads us to choose the sample mean as the better estimator
(unbiased statistic), and to prefer larger samples, since the larger the sample the smaller σ_x̄. The
intuition was right!
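
The comparison between the sample mean and a single observation can be checked empirically. The following is a minimal simulation sketch, assuming a normal population with made-up values µ = 170, σ = 10 and samples of size n = 25 (none of these numbers come from the notes). It records, for many samples, both the sample mean and one single observation.

```python
import math
import random
import statistics

rng = random.Random(2)            # fixed seed for reproducibility
mu, sigma, n = 170.0, 10.0, 25    # assumed population mean, sd, and sample size

# Draw many samples; keep the sample mean and the first observation of each.
means, singles = [], []
for _ in range(20_000):
    sample = [rng.gauss(mu, sigma) for _ in range(n)]
    means.append(statistics.fmean(sample))
    singles.append(sample[0])

print("mean of the sample means:       ", round(statistics.fmean(means), 2))
print("mean of the single observations:", round(statistics.fmean(singles), 2))
print("sd of the sample means:         ", round(statistics.stdev(means), 2))
print("sigma / sqrt(n):                ", sigma / math.sqrt(n))
print("sd of the single observations:  ", round(statistics.stdev(singles), 2))
```

Both estimators should average out close to µ (both are unbiased), but the sample means should scatter with a standard deviation near σ/√n = 2, while the single observations scatter with a standard deviation near σ = 10.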

Remark:
The sample variance
$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}$$
is an unbiased estimator for the population variance σ². In fact, the denominator
has to be n − 1 in order for this statistic to be unbiased. (This statement doesn't imply that
s = √(s²) is an unbiased estimator for σ; in fact it usually underestimates the true value of σ.)
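
The role of the denominator n − 1 can be made visible with a simulation. This is a sketch under assumed values (normal population with σ = 2, so σ² = 4, and small samples of size n = 5); it averages three statistics over many samples: the variance with denominator n − 1, the variance with denominator n, and the sample standard deviation s.

```python
import random
import statistics

rng = random.Random(3)                    # fixed seed for reproducibility
sigma, n, repetitions = 2.0, 5, 50_000    # assumed values; true variance is 4

avg_var_n_minus_1 = 0.0   # average of s^2 (denominator n - 1)
avg_var_n = 0.0           # average of the variance with denominator n
avg_sd = 0.0              # average of s = sqrt(s^2)

for _ in range(repetitions):
    sample = [rng.gauss(0.0, sigma) for _ in range(n)]
    s2 = statistics.variance(sample)                          # divides by n - 1
    avg_var_n_minus_1 += s2 / repetitions
    avg_var_n += statistics.pvariance(sample) / repetitions   # divides by n
    avg_sd += s2 ** 0.5 / repetitions

print("average s^2 (n - 1):", round(avg_var_n_minus_1, 2))
print("average variance (n):", round(avg_var_n, 2))
print("average s:", round(avg_sd, 2))
```

The first average should be close to 4, the second noticeably smaller (around (n − 1)/n · 4 = 3.2), and the average of s should fall somewhat below σ = 2, in line with the remark that s tends to underestimate σ.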

Definition:
The distance between an estimate and the true parameter is called the error of estimation.

Definition:
The standard error of a statistic is the standard deviation of its sampling distribution.

Remark: For unbiased estimators, the error of estimation will most likely (with probability
0.95 for normal distributions) be less than 1.96 standard errors (SE); compare the Empirical Rule.
On the other hand, for large populations, where x̄ behaves like a continuous variable, we find that P(x̄ = µ) = 0, a frustrating result,
because we are essentially certain that the value we give is not exactly right. We only know that x̄ should be
close to µ, but again we do not know how close.
To deal with this dilemma we give an interval for estimating µ instead of just one value.

Summary:
Estimation of a population mean µ
To estimate the population mean µ, the point estimator x̄ is unbiased, with standard error
$$SE = \frac{\sigma}{\sqrt{n}}$$

2.2 Confidence Intervals

As an alternative to point estimation we can report not just a single value for the population
characteristic, but an entire interval of reasonable values based on sample data. A measure of
confidence will be connected to such an interval.
For example we could give
$$\bar{x} \pm 2\,\frac{\sigma}{\sqrt{n}}$$
to estimate µ.
Then the chance that we capture µ with such an interval is about 95% (it is actually 0.9544).
(This means that for about 95% of samples the interval calculated using this formula
captures the true population value.)
If we use 3 instead of 2 as the factor in the interval, this chance will increase.
If we make the factor smaller, the probability of capturing µ will decrease.
In general:

Definition: A C% confidence interval for a population characteristic is an interval of values
for the characteristic. It is constructed such that the proportion of samples with intervals
capturing the true value of the characteristic equals the confidence level C.
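
The "proportion of samples with intervals capturing the true value" can be estimated by simulation. The sketch below assumes a normal population with made-up values µ = 50 and σ = 8 and samples of size n = 30; the function name `coverage` is hypothetical. It counts how often an interval of the form x̄ ± factor·σ/√n contains µ for the factors 1, 2 and 3.

```python
import math
import random
import statistics

def coverage(factor, mu=50.0, sigma=8.0, n=30, repetitions=10_000, seed=4):
    """Fraction of simulated samples whose interval xbar +/- factor*sigma/sqrt(n) contains mu."""
    rng = random.Random(seed)     # same seed for every factor: same samples are reused
    half_width = factor * sigma / math.sqrt(n)
    hits = 0
    for _ in range(repetitions):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        if abs(statistics.fmean(sample) - mu) <= half_width:
            hits += 1
    return hits / repetitions

results = {factor: coverage(factor) for factor in (1, 2, 3)}
for factor, share in results.items():
    print(f"xbar +/- {factor} * sigma/sqrt(n) captures mu in {share:.1%} of samples")
```

With the factor 2 the share should come out near 95%; a larger factor gives a wider interval and a higher share, a smaller factor a lower one.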

Remark:
• The confidence level provides information on how much confidence we can have in the
method (formula) used to construct the interval estimate.
If we were to use the method for many different samples, C gives the proportion of the calculated intervals
that contain the true value.
• The confidence level can be given as a proportion (e.g. 0.95) or in percent (95%).
• Usual choices for the confidence level are 90%, 95%, or 99%.
• Most confidence intervals are of the form

(point estimator) ± margin of error

• Commonly used critical values

Confidence coefficient C    z*
0.90                        1.645
0.95                        1.96
0.99                        2.58
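
Critical values like those in the table can also be computed instead of looked up. A minimal sketch using `statistics.NormalDist` from the Python standard library (Python 3.8 or later); the helper name `z_critical` is hypothetical.

```python
from statistics import NormalDist

def z_critical(confidence):
    """Positive z* such that the central area under the standard normal curve equals `confidence`."""
    upper_tail = (1 - confidence) / 2
    return NormalDist().inv_cdf(1 - upper_tail)

for c in (0.90, 0.95, 0.99):
    print(f"C = {c:.2f}   z* = {z_critical(c):.3f}")
```

To three decimals this gives 1.645, 1.960 and 2.576; the table entry 2.58 is the last value rounded to two decimals.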

2.2.1 Large-Sample z-Confidence Interval for a Population Mean µ
If we use the statistic x̄ for estimating the population mean µ, we can use the following information
from the Central Limit Theorem in order to obtain a confidence interval for µ.

• µ_x̄ = µ

• σ_x̄ = σ/√n is the standard error of x̄.

• If n ≥ 30, we can assume that the sampling distribution of x̄ is approximately normal.

This leads to the following confidence interval for the population mean µ.

The One-Sample z Confidence Interval for µ


If n ≥ 30 and the standard deviation σ is known, a C = (1 − α) confidence interval for the population
mean µ is given by
$$\bar{x} \pm z^{*}\,\frac{\sigma}{\sqrt{n}}$$
where z* is the critical value of the standard normal distribution with upper-tail area (1 − C)/2 = α/2, that is, the absolute value of the (1 − C)/2 percentile found in Table A (also written z_{α/2}).
Usually σ is unknown. In that case it can be approximated by the sample
standard deviation s when the sample size is large (n ≥ 30), and the approximate confidence
interval is
$$\bar{x} \pm z^{*}\,\frac{s}{\sqrt{n}}$$

Example: Bardwell, Ensign & Mills (2005) assessed the moods of 60 male U.S. Marines
following a month-long training exercise in the Arctic. Mean mood scores were compared to
population norms; the norm for college men is 8.9. The Marine mean is 13.33 points and the sample standard deviation is
2.0 points (which we will use instead of σ, since we do not have the population value).
Do the data indicate that the mean mood score of the Marines after the exercise was higher than the norm for college men?
Find a 95% confidence interval for the mean mood score of U.S. Marines after an Arctic exercise.
$$\bar{x} \pm z_{\alpha/2}\,\frac{s}{\sqrt{n}}$$
$$13.33 \pm 1.96\left(\frac{2.0}{\sqrt{60}}\right)$$

13.33 ± 1.96(0.258) → 13.33 ± 0.506

This results in the interval [12.824, 13.836]. We can be 95% confident that the mean mood score of
U.S. Marines after Arctic exercises falls between 12.8 and 13.8. Since the entire interval falls
above 8.9, we can also be 95% confident that the mean mood score of U.S. Marines after such
an exercise is higher than that of college men.
If we had wished to calculate a 93% confidence interval, we would have needed to find the appropriate
z*:
C = 0.93, so (1 − C)/2 = 0.035; use Table A to find the percentile −1.81 and take its absolute value,
z* = 1.81.

To find z* from the table, remember to locate the value closest to 0.035 inside the table, and
read off −1.81 on the margin.
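
The calculation in the Marine example can be reproduced in a few lines. This is a sketch, not part of the original notes: the function name `z_interval` is hypothetical, and it computes z* with `statistics.NormalDist` rather than reading it from Table A, so the last digits can differ slightly from hand calculations that use rounded table values.

```python
import math
from statistics import NormalDist

def z_interval(xbar, sd, n, confidence=0.95):
    """Large-sample z interval xbar +/- z* * sd / sqrt(n); sd is sigma if known, else s (n >= 30)."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_star * sd / math.sqrt(n)
    return xbar - margin, xbar + margin

# Marine mood scores: n = 60, sample mean 13.33, sample sd 2.0
low95, high95 = z_interval(13.33, 2.0, 60, confidence=0.95)
low93, high93 = z_interval(13.33, 2.0, 60, confidence=0.93)
print(f"95% CI: [{low95:.3f}, {high95:.3f}]")
print(f"93% CI: [{low93:.3f}, {high93:.3f}]")
```

The 95% interval agrees with [12.824, 13.836] from the hand calculation; the 93% interval is narrower, since a lower confidence level uses a smaller z*.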

Example: A scientist interested in monitoring chemical contaminants in food, and thereby
the accumulation of contaminants in human diets, selected a random sample of n = 50 male
adults. It was found that the average daily intake of dairy products was x̄ = 756 grams with a
standard deviation of s = 35 grams.
An approximate 95% confidence interval for the mean daily intake of dairy products for men
is then:
$$\bar{x} \pm z^{*}\,\frac{s}{\sqrt{n}}$$
$$756 \pm 1.96\left(\frac{35}{\sqrt{50}}\right)$$

756 ± 9.70

Hence, the 95% confidence interval for µ is from 746.30 to 765.70 grams per day.
We can be 95% confident that the true mean daily intake of dairy products for men lies in the interval
from 746.30 to 765.70 grams per day.

Remember:
Being "95% confident" means the following: if you were to construct 100 95% confidence intervals from 100
different random samples, you would expect about 95 of the 100 intervals to capture the true mean and about 5
not to capture it.
In conclusion, you cannot be sure that a specific confidence interval captures the true mean µ.

Comment: The margin of error for the estimation of µ is
$$E = z^{*}\,\frac{\sigma}{\sqrt{n}}$$
It determines the precision of the estimation of µ. For a fixed confidence level, increasing the
sample size decreases the margin of error and improves the precision of estimation.
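
The effect of the sample size on the margin of error can be seen by evaluating the formula for a few values of n. A small sketch, with the illustrative (assumed) values z* = 1.96 and σ = 35 and the hypothetical helper name `margin_of_error`:

```python
import math

def margin_of_error(z_star, sigma, n):
    """Margin of error E = z* * sigma / sqrt(n)."""
    return z_star * sigma / math.sqrt(n)

for n in (25, 100, 400):
    print(f"n = {n:3d}   E = {margin_of_error(1.96, 35, n):.2f}")
```

The margins come out as 13.72, 6.86 and 3.43: because n enters through √n, the sample size has to be quadrupled to cut the margin of error in half.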

2.2.2 Choosing the Sample Size


One of the important decisions, before drawing a sample, is how many experimental units from
the population should be sampled. That is: what is the appropriate sample size?
The answer depends on the specific objective of the investigation and the precision or accuracy one
wants to ensure. A measure for the accuracy in estimation is the margin of error, E.

Argument: Suppose you want to estimate the average daily yield µ of a chemical process and
you want to ensure with a high level of confidence that the estimate is within 4 tons of
the true mean yield µ.
In this situation you would require that the margin of error of x̄ in a C·100% confidence interval
is at most 4 tons.

This will ensure that, if you were to take 100 samples, the distance between the true mean and
the sample mean would be at most 4 tons for about C·100 of them.
In general, for a given confidence level one chooses the required precision of the estimation by
determining the largest value for the margin of error that seems acceptable.

From this the necessary sample size can be determined by solving E = z* σ/√n for n. We require
that the margin of error in a C confidence interval is less than or equal to E:
$$z^{*}\,\frac{\sigma}{\sqrt{n}} \le E \quad\Leftrightarrow\quad \left(\frac{z^{*}\sigma}{E}\right)^{2} \le n$$

Go back to the example. We plan to construct a 95% confidence interval for µ, where we allow a margin
of error not greater than E = 4.
At this point we still do not know σ, the standard deviation of the daily yield of this chemical
process.
If σ is unknown, which is the realistic case, you can use the best approximation available:

• An estimate s obtained from a previous sample.

• A range estimate based on knowledge of the largest and smallest possible measurement:
σ ≈ Range/4.

In this example assume a previous sample showed a sample standard deviation of
s = 21 tons. Then
$$n \ge \left(\frac{z^{*}\sigma}{E}\right)^{2} = \left(\frac{1.96 \cdot 21}{4}\right)^{2} \approx 105.9$$
We obtain that the sample size has to be at least 106 in order to estimate µ with a 95%
confidence interval with a margin of error of at most 4.

Note that this result is only approximate, since we had to use an approximation for σ, but this
is still better than just choosing any number.
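
The sample-size rule can be wrapped in a small helper that rounds up to the next whole number. This is a sketch; the function name `required_sample_size` is hypothetical, and the result is only as good as the approximation used for σ.

```python
import math

def required_sample_size(z_star, sigma, max_margin):
    """Smallest integer n with z* * sigma / sqrt(n) <= max_margin."""
    return math.ceil((z_star * sigma / max_margin) ** 2)

# Chemical process: 95% confidence, sigma approximated by s = 21 tons, E = 4 tons
print(required_sample_size(1.96, 21, 4))   # 106
```

Rounding up (rather than to the nearest integer) matters here: any n below the computed bound would give a margin of error larger than the one allowed.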

Example:
The financial aid office wishes to estimate the mean cost of textbooks per quarter for students
at a particular college. For the estimate to be useful, it should be within $20 of the
true population mean. How large a sample should be used to be 95% confident of achieving
this level of accuracy?
The financial aid office knows that the amount spent varies between $50 and $450.
A reasonable estimate of σ is then
$$\frac{\text{range}}{4} = \frac{450 - 50}{4} = 100$$
The required sample size is
$$n \ge \left(\frac{1.96\,\sigma}{E}\right)^{2} = \left(\frac{1.96 \cdot 100}{20}\right)^{2} = 9.8^{2} = 96.04$$
So in this case a sample size of at least 97 is required.

2.2.3 t-confidence interval for a mean µ
The problem with the large-sample confidence interval for µ is that it requires us to know σ,
the population standard deviation. This assumption is strong and rarely met in practice.
For that reason we should replace the large-sample confidence interval with an alternative that
does not require σ.

So far the confidence interval was based on
$$Z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1)$$
Now we will replace σ by the sample standard deviation s, which gives us
$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$$
This score follows a t-distribution with n − 1 degrees of freedom if the population is normally
distributed, and approximately so if the sample size is large. The t-distribution is described in Table IV in the textbook.
The t-distribution is also called "Student's t-distribution". It was introduced by a mathematician
called W.S. Gosset in 1908, who used the pen name "Student".

Student’s t distribution
Consider the t-score
$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$$
The distribution of the t-score only depends on one parameter, which is called the degrees of
freedom (df). "Student" showed that the t-score is t distributed with n − 1 degrees of freedom
(df = n − 1). The appendix provides a table (Table IV) with values from this distribution for
different choices of df.
The table gives upper-tail areas.

t-confidence interval for µ


$$\bar{x} \pm t^{*}_{n-1}\,\frac{s}{\sqrt{n}}$$
where t*_{n−1} is the critical value of the t-distribution with df = n − 1 that has upper-tail area (1 − C)/2.
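
Critical values t*_{n−1} are normally read from a table such as Table IV or obtained from a statistics library. As a rough, self-contained illustration, the sketch below approximates them with the Python standard library only, by integrating the t density numerically (Simpson's rule) and bisecting on the upper-tail area. The names `t_pdf`, `t_upper_tail`, `t_critical` and `t_interval` are hypothetical helpers, and the numerical settings (1000 integration steps, search range 0 to 100) are arbitrary assumptions. The usage lines reuse the height sample from Section 2.1 (n = 10, x̄ = 172.9 cm, s = 9.3 cm) and additionally assume that heights are roughly normally distributed, which the notes do not state.

```python
import math

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const - (df + 1) / 2 * math.log1p(x * x / df))

def t_upper_tail(x, df, steps=1000):
    """Area to the right of x >= 0, using Simpson's rule on [0, x] (steps must be even)."""
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

def t_critical(confidence, df):
    """Positive t* whose upper-tail area is (1 - confidence) / 2, found by bisection."""
    target = (1 - confidence) / 2
    low, high = 0.0, 100.0              # assumed search range
    for _ in range(50):
        mid = (low + high) / 2
        if t_upper_tail(mid, df) > target:
            low = mid                   # tail area still too large: move right
        else:
            high = mid
    return (low + high) / 2

def t_interval(xbar, s, n, confidence=0.95):
    """t confidence interval xbar +/- t* * s / sqrt(n) with df = n - 1."""
    margin = t_critical(confidence, n - 1) * s / math.sqrt(n)
    return xbar - margin, xbar + margin

ci_low, ci_high = t_interval(172.9, 9.3, 10)
print(f"t* (df = 9): {t_critical(0.95, 9):.3f}")
print(f"95% CI for the mean height: [{ci_low:.1f}, {ci_high:.1f}] cm")
```

For df = 9 the critical value is about 2.262, clearly larger than the z* = 1.96 used for large samples, so the t interval is wider; as df grows, t* approaches z*.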

Example: In the 1994 General Social Survey in the U.S. respondents were asked to rate their
political views on a seven point scale, where 1 = extremely liberal, 4 = moderate, and 7 =
extremely conservative. A report gives the following results
-------------------------------------------
N       Mean     Std Dev    Std Err
2879    4.171    1.390      0.0259
-------------------------------------------
Since the sample size is so large, there is no concern about the data not coming from a normal
distribution when finding a confidence interval for the mean political view (on a 7 point scale).
Find a 99% confidence interval for the mean political view using a t confidence interval:
$$\bar{x} \pm t^{*}_{n-1}\,\frac{s}{\sqrt{n}}$$


x̄ = 4.171, s/√n = 0.0259, α = 0.01, α/2 = 0.005
Since df = n − 1 = 2878 is greater than any value in the table, we use the largest df in the table and find
t*_{n−1} ≈ 2.578, giving for the 99% CI

4.171 ± 2.578(0.0259) → 4.171 ± 0.0668 → [4.104, 4.238]

Since the interval falls entirely above 4 (moderate), we can be 99% confident that in 1994 the mean
political view of Americans was on the conservative side of moderate.

Example: A scientist is interested in monitoring the daily intake of dairy products in a population.
A sample of n = 50 people led to a sample mean of x̄ = 756 g with a standard deviation of
s = 35 g.
We will find a 95% confidence interval for µ = the mean daily intake of dairy products in this
population.
α = 0.05, so α/2 = 0.025 (upper-tail area needed for finding the critical value in Table IV),
df = n − 1 = 49. Since df = 49 is not in the table, use df = 40, the largest tabulated df that does not
exceed the true df, and find t*_40 = 2.021.
$$756 \pm 2.021\left(\frac{35}{\sqrt{50}}\right) \;\rightarrow\; 756 \pm 10.003 \;\rightarrow\; [745.997,\ 766.003]$$
We are 95% confident that the mean daily intake of dairy products in this population falls
between 746 g and 766 g.
