
Statistics: Shaheena Bashir

This document discusses interval estimation and confidence intervals for population means, proportions, and differences between two populations. It explains how to construct large-sample confidence intervals for a population mean when the population standard deviation is known and the sample size is at least 30; that formula uses the sample mean, the standard deviation, the sample size, and a z critical value. It also covers small-sample confidence intervals when the population standard deviation is unknown and the sample size is less than 30, using the t-distribution, and the factors that influence the width of a confidence interval.


Statistics

Inferential Statistics
Interval Estimation

Shaheena Bashir

FALL, 2019
Outline

Inferential Statistics

Interval Estimation
    Large Sample Confidence Interval for Population Mean µ
    Small Sample Confidence Interval for Population Mean µ
    Confidence Interval for Population Proportion

Confidence Interval for the difference between 2 Population Means
    Confidence Interval for the Difference between 2 Means: Large Samples
    Confidence Interval for the Difference between 2 Means: Small Independent Samples
    Confidence Interval for the Mean Difference: Paired Samples
    Confidence Interval for the difference in two Population Proportions
Inferential Statistics

Introduction

- Statistical inference is the act of generalizing from a sample to a population with a calculated degree of certainty.
- Use a set of sample data to draw inferences (make statements) about some aspect of the population which generated the data (the sample needs to be drawn randomly).


- Intuitively: absolute certainty about population characteristics cannot be attained based on a finite sample of observations.

Statistical Inference: Estimation

How can we use sample data to estimate values of population parameters?
- Estimation
    - Point estimate: a single statistic value that is the “best guess” for the parameter value, e.g., the average salary of accountants is Rs. 100,000.
    - Interval estimate: an interval of numbers around the point estimate that has a fixed “confidence level” of containing the parameter value, i.e., what region of parameter values is most consistent with the data? Also called a confidence interval, e.g., we are 95% confident that the average salary of accountants is between Rs. 80,000 and Rs. 120,000.


Point Estimate: Examples

The most common approach is to use the corresponding sample statistic, e.g.,

Statistic       Estimates (population parameter)
µ̂ or x̄          µ
σ̂ or s          σ
ρ̂ or r          ρ
β̂ or b          β

Interval Estimation

Interval Estimate

- A confidence interval (CI) is an interval of numbers believed to contain the parameter value.
- The probability that the method produces an interval that contains the parameter is called the confidence level. Most studies use a confidence level close to 1, such as 0.95 or 0.99.
- Most CIs have the form:
  point estimate ± margin of error
- The margin of error is a reminder that point estimates have variability.


Interval Estimate: Examples

- A very naive example: “I will arrive there at 10:00am, plus or minus 5 minutes.”
- The average diastolic BP is 80. Based on a random sample of 25 males, we ask how precisely we can estimate the mean diastolic BP of males at a given confidence level.

Large Sample Confidence Interval for Population Mean µ

Confidence Interval for Mean µ

- The sample mean x̄ is the point estimate of the population mean µ.
- Due to sampling error, the sample mean x̄ will differ from the population mean µ.
- How close is the sample mean x̄ to the population mean µ?
- To gain insight into its precision, we surround the point estimate with a margin of error.


Confidence Interval for Mean µ, σ known & n ≥ 30

- A specific interval estimate of a parameter is computed from the sample data at a chosen confidence level.
- The confidence level of an interval estimate is the probability that the interval-building method yields an interval containing the true parameter, e.g., 1 − α = 0.95.
- If repeated samples were taken and the 95% confidence interval was computed for each sample, about 95% of the intervals would contain the population mean.
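
The repeated-sampling interpretation can be checked by simulation. The sketch below is only an illustration: the population mean, standard deviation, sample size, and number of repetitions are hypothetical values chosen for the demo, and the population is assumed normal with known σ.

```python
import math
import random
from statistics import NormalDist, fmean


def coverage(mu=50.0, sigma=10.0, n=30, reps=2000, conf=0.95, seed=1):
    """Fraction of simulated z-intervals that contain the true mean mu.

    mu, sigma, n, reps, and seed are hypothetical demo values.
    """
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)   # z_{alpha/2}
    margin = z * sigma / math.sqrt(n)              # same margin for every sample
    hits = 0
    for _ in range(reps):
        xbar = fmean(rng.gauss(mu, sigma) for _ in range(n))
        if xbar - margin < mu < xbar + margin:
            hits += 1
    return hits / reps


cov = coverage()
print(f"share of intervals containing mu: {cov:.3f}")
```

The printed share should land close to 0.95; it is the procedure, not any single interval, that has the 95% success rate.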


Confidence Interval for Mean µ


$$\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} < \mu < \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$

- Margin of error: $z_{\alpha/2}\,\sigma/\sqrt{n}$ is called the margin of error or maximum error of estimate. This is the maximum likely difference between the point estimate of a parameter and the actual value of the parameter.
- For n ≥ 30, the distribution of sample means is approximately normal even if the original distribution of the variable departs from normality.
- A confidence interval for the mean can also be used to test a hypothesis about the mean. If the (1 − α) interval does not contain the hypothesized mean µ0, reject the null hypothesis H0: µ = µ0 at significance level α.
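
For readers who want to see where the interval comes from, here is a short derivation sketch. It assumes that the sample mean is (at least approximately) normal with mean µ and standard deviation σ/√n, which is what the n ≥ 30 condition is meant to justify.

```latex
\begin{align*}
P\!\left(-z_{\alpha/2} < \frac{\bar{x}-\mu}{\sigma/\sqrt{n}} < z_{\alpha/2}\right) &= 1-\alpha \\
P\!\left(-z_{\alpha/2}\frac{\sigma}{\sqrt{n}} < \bar{x}-\mu < z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right) &= 1-\alpha \\
P\!\left(\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} < \mu < \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right) &= 1-\alpha
\end{align*}
```

The last line is the confidence interval for µ; the first line is the acceptance region of the two-sided z-test, which is why the interval and the test lead to the same decision.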

Factors Influencing Confidence Interval for Mean µ

$$\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} < \mu < \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$

- Width of the confidence interval?
- Effect of σ on the confidence interval?
- Effect of n on the confidence interval?
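
One way to explore these questions is to compute the width, 2 · z_{α/2} · σ/√n, for a few settings. The sketch below uses hypothetical values of σ and n purely for illustration; `ci_width` is a helper name introduced here, not something from the slides.

```python
import math
from statistics import NormalDist


def ci_width(sigma, n, conf=0.95):
    """Width of the z-interval for a mean: 2 * z_{alpha/2} * sigma / sqrt(n)."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    return 2 * z * sigma / math.sqrt(n)


base = ci_width(sigma=10, n=25)                 # hypothetical baseline
double_sigma = ci_width(sigma=20, n=25)         # twice the spread
four_n = ci_width(sigma=10, n=100)              # four times the sample size
higher_conf = ci_width(sigma=10, n=25, conf=0.99)

print(f"baseline width:        {base:.2f}")
print(f"sigma doubled:         {double_sigma:.2f}")
print(f"n quadrupled:          {four_n:.2f}")
print(f"99% instead of 95%:    {higher_conf:.2f}")
```

Doubling σ doubles the width, quadrupling n halves it, and raising the confidence level widens the interval.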
A C.I. can be used as an indication of the precision of the estimation:
- Short C.I.: precise estimation
- Long C.I.: imprecise estimation, much uncertainty

Confidence level    $z_{\alpha/2}$
90%                 1.645
95%                 1.96
99%                 2.58
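
The critical values in this table can be reproduced from the standard normal quantile function. This is a small sketch using Python's standard library; `z_critical` is a helper name introduced here for illustration.

```python
from statistics import NormalDist


def z_critical(conf):
    """Return z_{alpha/2} for a two-sided interval at confidence level conf."""
    alpha = 1 - conf
    return NormalDist().inv_cdf(1 - alpha / 2)


for conf in (0.90, 0.95, 0.99):
    print(f"{conf:.0%}: {z_critical(conf):.3f}")
```

The 99% value comes out near 2.576, which the table rounds to 2.58.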


Confidence Interval for Mean µ: Example

Body temperatures of a random sample of 130 people gave a mean of 98.25 degrees and a standard deviation of 0.73 degrees.
1. Construct a 95% confidence interval for the average body temperature of healthy people.
2. Does the confidence interval in part 1 contain the value of 98.6 degrees, the usual average body temperature cited in the literature? If not, what conclusions can you draw?
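
A possible way to work this example numerically is sketched below. It assumes, as is usual for n ≥ 30, that the sample standard deviation can stand in for σ in the large-sample formula; the variable names are chosen here for illustration.

```python
import math
from statistics import NormalDist

n, xbar, s = 130, 98.25, 0.73        # sample size, mean, SD from the example
conf = 0.95

z = NormalDist().inv_cdf(1 - (1 - conf) / 2)   # about 1.96
margin = z * s / math.sqrt(n)                  # margin of error
lower, upper = xbar - margin, xbar + margin

print(f"95% CI: ({lower:.3f}, {upper:.3f})")
print("contains 98.6:", lower < 98.6 < upper)
```

The interval runs from roughly 98.12 to 98.38 degrees and does not contain 98.6, so at the 5% level these data are not consistent with a population mean of 98.6 degrees.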



Small Sample Confidence Interval for Population Mean µ

Confidence Interval for Mean µ, σ unknown & n < 30

$$\bar{x} - t_{\alpha/2}\frac{s}{\sqrt{n}} < \mu < \bar{x} + t_{\alpha/2}\frac{s}{\sqrt{n}}$$

- The degrees of freedom are df = n − 1.
- Used when σ is unknown and n < 30.
- $t_{\alpha/2}\,s/\sqrt{n}$ is called the margin of error or maximum error of estimate.


t-Distribution Table

Each entry $t_\alpha$ is the value with upper-tail area α, i.e., $P(T > t_\alpha) = \alpha$.

df t.100 t.050 t.025 t.010 t.005


1 3.078 6.314 12.706 31.821 63.657
2 1.886 2.920 4.303 6.965 9.925
3 1.638 2.353 3.182 4.541 5.841
4 1.533 2.132 2.776 3.747 4.604
5 1.476 2.015 2.571 3.365 4.032
6 1.440 1.943 2.447 3.143 3.707
7 1.415 1.895 2.365 2.998 3.499
8 1.397 1.860 2.306 2.896 3.355
9 1.383 1.833 2.262 2.821 3.250
10 1.372 1.812 2.228 2.764 3.169
11 1.363 1.796 2.201 2.718 3.106
12 1.356 1.782 2.179 2.681 3.055
13 1.350 1.771 2.160 2.650 3.012
14 1.345 1.761 2.145 2.624 2.977
15 1.341 1.753 2.131 2.602 2.947
16 1.337 1.746 2.120 2.583 2.921
17 1.333 1.740 2.110 2.567 2.898
18 1.330 1.734 2.101 2.552 2.878
19 1.328 1.729 2.093 2.539 2.861
20 1.325 1.725 2.086 2.528 2.845
21 1.323 1.721 2.080 2.518 2.831
22 1.321 1.717 2.074 2.508 2.819
23 1.319 1.714 2.069 2.500 2.807
24 1.318 1.711 2.064 2.492 2.797
25 1.316 1.708 2.060 2.485 2.787
26 1.315 1.706 2.056 2.479 2.779
27 1.314 1.703 2.052 2.473 2.771
28 1.313 1.701 2.048 2.467 2.763
29 1.311 1.699 2.045 2.462 2.756
30 1.310 1.697 2.042 2.457 2.750
32 1.309 1.694 2.037 2.449 2.738
34 1.307 1.691 2.032 2.441 2.728
36 1.306 1.688 2.028 2.434 2.719
38 1.304 1.686 2.024 2.429 2.712
∞ 1.282 1.645 1.960 2.326 2.576

Confidence Interval for Mean µ: Example

Organic chemists often purify organic compounds by a method known as fractional crystallization. An experimenter wanted to prepare and purify 4.85 grams of aniline. Ten 4.85 g quantities of aniline were individually prepared and purified. The following dry yields (in grams) were recorded:

3.85 3.80 3.88 3.85 3.90
3.36 3.62 4.01 3.72 3.82

Construct a 95% confidence interval for the mean grams of dry purified yield.
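
One way to carry out this calculation is sketched below. The critical value 2.262 is read from the t-table (column t.025, df = 9) rather than computed, since Python's standard library has no t-distribution; the sketch also assumes the yields come from an approximately normal population.

```python
import math
from statistics import mean, stdev

yields = [3.85, 3.80, 3.88, 3.85, 3.90,
          3.36, 3.62, 4.01, 3.72, 3.82]   # grams, from the example

n = len(yields)
xbar = mean(yields)
s = stdev(yields)        # sample SD (divides by n - 1)
t_crit = 2.262           # t_.025 with df = n - 1 = 9, from the t-table

margin = t_crit * s / math.sqrt(n)
lower, upper = xbar - margin, xbar + margin

print(f"mean = {xbar:.3f}, s = {s:.3f}")
print(f"95% CI: ({lower:.3f}, {upper:.3f})")
```

The interval comes out at roughly 3.65 to 3.91 grams around a sample mean of about 3.78 grams.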


Confidence Interval for Mean µ: Summary

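                n ≥ 30                                         n < 30
σ² known        $\bar{x} \pm z_{\alpha/2}\,\sigma/\sqrt{n}$    $\bar{x} \pm z_{\alpha/2}\,\sigma/\sqrt{n}$
σ² unknown      $\bar{x} \pm z_{\alpha/2}\,s/\sqrt{n}$         $\bar{x} \pm t_{\alpha/2,\,n-1}\,s/\sqrt{n}$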

Confidence Interval for Population Proportion

Confidence Interval for the Population Proportion p

$$\hat{p} - z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}} < p < \hat{p} + z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}}$$

where $\hat{q} = 1 - \hat{p}$.

- Margin of error: $z_{\alpha/2}\sqrt{\hat{p}\hat{q}/n}$ is called the margin of error or maximum error of estimate. This is the maximum likely difference between the point estimate of a parameter and the actual value of the parameter.
- The confidence interval for a proportion is valid only if np ≥ 5 and nq ≥ 5.


Example

A Gallup poll of n = 1018 adults found that 39% believe in evolution. Construct a 95% confidence interval for the proportion of all adults who believe in evolution.
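
A minimal sketch of this calculation follows. It checks the validity condition using the sample proportion in place of the unknown p, which is an assumption of the sketch; the variable names are illustrative.

```python
import math
from statistics import NormalDist

n, p_hat = 1018, 0.39          # sample size and sample proportion
q_hat = 1 - p_hat

# Validity check, using p_hat and q_hat in place of the unknown p and q.
valid = n * p_hat >= 5 and n * q_hat >= 5

z = NormalDist().inv_cdf(0.975)                 # z_{alpha/2} for 95%
margin = z * math.sqrt(p_hat * q_hat / n)
lower, upper = p_hat - margin, p_hat + margin

print("normal approximation reasonable:", valid)
print(f"95% CI: ({lower:.3f}, {upper:.3f})")
```

The margin of error is about 3 percentage points, giving an interval of roughly 36% to 42%.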

Confidence Interval for the difference between 2 Population Means
Confidence Interval for the Difference between 2 Means: Large Samples

Comparison of Two Means


Confidence Interval for the Difference Between 2 Means

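$$(\bar{x}_1 - \bar{x}_2) - z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} < \mu_1 - \mu_2 < (\bar{x}_1 - \bar{x}_2) + z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$

- Used when $\sigma_1^2$ and $\sigma_2^2$ are known.
- When n1 ≥ 30 and n2 ≥ 30 but $\sigma_1^2$ and $\sigma_2^2$ are unknown, replace $\sigma_1^2$ and $\sigma_2^2$ by $s_1^2$ and $s_2^2$.
- If the confidence interval includes 0, we can say that there is no significant difference between the means of the two populations, at the given level of confidence.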


Example

The dataset “Normal Body Temperature” contains 130 observations of body temperature, along with the gender of each individual. The data set separated for the two genders provides the following information:

Gender   Sample Size   Sample Mean   Sample SD
M        65            98.105        0.699
F        65            98.394        0.743

- Compute a 99% confidence interval for the difference between the mean body temperatures for men and women.
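
The calculation could be sketched as follows, taking the difference as men minus women and substituting the sample variances for the unknown population variances (both samples have n ≥ 30). The direction of the difference is a choice made here, not something the example fixes.

```python
import math
from statistics import NormalDist

n1, xbar1, s1 = 65, 98.105, 0.699    # men
n2, xbar2, s2 = 65, 98.394, 0.743    # women
conf = 0.99

z = NormalDist().inv_cdf(1 - (1 - conf) / 2)     # about 2.576
se = math.sqrt(s1**2 / n1 + s2**2 / n2)          # standard error of the difference
diff = xbar1 - xbar2                             # men minus women
lower, upper = diff - z * se, diff + z * se

print(f"difference = {diff:.3f}")
print(f"99% CI: ({lower:.3f}, {upper:.3f})")
print("interval includes 0:", lower < 0 < upper)
```

The interval is roughly −0.61 to 0.04 degrees. It includes 0, so at the 99% confidence level the difference between the two means is not significant.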

Confidence Interval for the Difference between 2 Means: Small Independent Samples

Confidence Interval for the Difference between 2 Means

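$$(\bar{x}_1 - \bar{x}_2) - t_{\alpha/2}\, s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}} < \mu_1 - \mu_2 < (\bar{x}_1 - \bar{x}_2) + t_{\alpha/2}\, s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$

- Variances are assumed equal, i.e., $\sigma_1^2 = \sigma_2^2$
- df = n1 + n2 − 2

$$s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}$$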


Example

We previously considered a subsample of n = 10 participants attending the 7th examination of the Offspring cohort in the Framingham Heart Study. The following table contains descriptive statistics on the systolic blood pressure in the subsample stratified by sex.

Gender   Sample Size   Sample Mean   Sample SD
M        6             117.5         9.7
F        4             126.8         12.0

Construct a 95% confidence interval for the difference in mean systolic blood pressures between men and women using these data.
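
A sketch of the pooled-variance calculation for this example is given below. It assumes equal population variances, takes the difference as men minus women (a choice made here), and reads the critical value 2.306 from the t-table (column t.025, df = 8).

```python
import math

n1, xbar1, s1 = 6, 117.5, 9.7      # men
n2, xbar2, s2 = 4, 126.8, 12.0     # women

df = n1 + n2 - 2                                           # 8 degrees of freedom
sp = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df) # pooled SD
t_crit = 2.306                                             # t_.025, df = 8, from the t-table

se = sp * math.sqrt(1 / n1 + 1 / n2)
diff = xbar1 - xbar2                                       # men minus women
lower, upper = diff - t_crit * se, diff + t_crit * se

print(f"pooled SD = {sp:.2f}")
print(f"95% CI: ({lower:.1f}, {upper:.1f})")
```

The pooled standard deviation is about 10.6 and the interval runs from roughly −25.1 to 6.5, so it includes 0: with only ten participants the data do not show a significant difference.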

Confidence Interval for the Mean Difference: Paired Samples

Small Dependent Samples: Paired Design

One of the three basic principles of a successful randomized controlled study is to control the effects of confounding variables by comparing treatment to control. One way to make this comparison is a matched pairs study, where individuals are matched in pairs.
- Before-after data
- Twin data
- Matched case-control


Small Dependent Samples

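$$\bar{d} - t_{\alpha/2}\frac{s_d}{\sqrt{n}} < \mu_d < \bar{d} + t_{\alpha/2}\frac{s_d}{\sqrt{n}}$$

where $\bar{d}$ and $s_d$ are the mean and standard deviation of the n paired differences.

df = n − 1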


Example

Fifteen students were randomly selected from a population of 1000 students by simple random sampling. All of the students were given a standardized English test and a standardized math test. Test results are summarized below. Find the 90% confidence interval for the mean difference between student scores on the math and English tests. Assume that the differences are approximately normally distributed.


Student English Math Difference, d


1 95 90 5
2 89 85 4
3 76 73 3
4 92 90 2
5 91 90 1
6 53 53 0
7 67 68 -1
8 88 90 -2
9 75 78 -3
10 85 89 -4
11 90 95 -5
12 85 83 2
13 87 83 4
14 85 83 2
15 85 82 3
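
The paired interval for these data could be computed along the following lines. The differences are taken as English minus math, matching the table's last column, and the critical value 1.761 is read from the t-table (column t.050, df = 14) because a 90% interval puts 5% in each tail.

```python
import math
from statistics import mean, stdev

english = [95, 89, 76, 92, 91, 53, 67, 88, 75, 85, 90, 85, 87, 85, 85]
math_scores = [90, 85, 73, 90, 90, 53, 68, 90, 78, 89, 95, 83, 83, 83, 82]

# Paired differences, English minus math, as in the table.
d = [e - m for e, m in zip(english, math_scores)]

n = len(d)
d_bar = mean(d)
s_d = stdev(d)           # sample SD of the differences
t_crit = 1.761           # t_.050 with df = n - 1 = 14, from the t-table

margin = t_crit * s_d / math.sqrt(n)
lower, upper = d_bar - margin, d_bar + margin

print(f"mean difference = {d_bar:.3f}, s_d = {s_d:.3f}")
print(f"90% CI: ({lower:.2f}, {upper:.2f})")
```

The mean difference is about 0.73 points with an interval of roughly −0.68 to 2.14, which includes 0.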
Confidence Interval for the difference in two Population Proportions

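$$(\hat{p}_1 - \hat{p}_2) - z_{\alpha/2}\sqrt{\frac{\hat{p}_1\hat{q}_1}{n_1} + \frac{\hat{p}_2\hat{q}_2}{n_2}} < p_1 - p_2 < (\hat{p}_1 - \hat{p}_2) + z_{\alpha/2}\sqrt{\frac{\hat{p}_1\hat{q}_1}{n_1} + \frac{\hat{p}_2\hat{q}_2}{n_2}}$$

- Margin of error: $z_{\alpha/2}\sqrt{\hat{p}_1\hat{q}_1/n_1 + \hat{p}_2\hat{q}_2/n_2}$ is called the margin of error or maximum error of estimate. This is the maximum likely difference between the point estimate of a parameter and the actual value of the parameter.
- The confidence interval is valid only if $n_1\hat{p}_1 \ge 5$, $n_1\hat{q}_1 \ge 5$, $n_2\hat{p}_2 \ge 5$, and $n_2\hat{q}_2 \ge 5$.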


Confidence intervals in Decision Making

- When a confidence interval for p1 − p2 does not cover 0, it is reasonable to conclude that the two population proportions differ.
- A value not in a confidence interval can be rejected as a likely value for the population parameter.


Example

A large study was conducted to test the effectiveness of an experimental blood thinner, clopidogrel, to ward off heart attacks and strokes. A total of 19,185 (heart attack or stroke) patients were randomly assigned to an aspirin group or a clopidogrel group for a period of 1-3 years. Of 9925 patients taking aspirin, 5.3% suffered heart attacks, strokes, or death from cardiovascular disease; the corresponding percentage in 9260 clopidogrel patients was 5.8%. Construct a 95% confidence interval for the difference in the proportion of patients who suffered any cardiovascular disease in the two treatment groups.
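
A sketch of the two-proportion interval for these figures is shown below, using the percentages exactly as stated in the example and taking the difference as aspirin minus clopidogrel (a choice made here for illustration).

```python
import math
from statistics import NormalDist

n1, p1 = 9925, 0.053     # aspirin group: size and event proportion
n2, p2 = 9260, 0.058     # clopidogrel group: size and event proportion
q1, q2 = 1 - p1, 1 - p2

# Validity check for the normal approximation in both groups.
valid = min(n1 * p1, n1 * q1, n2 * p2, n2 * q2) >= 5

z = NormalDist().inv_cdf(0.975)                      # z_{alpha/2} for 95%
se = math.sqrt(p1 * q1 / n1 + p2 * q2 / n2)
diff = p1 - p2                                       # aspirin minus clopidogrel
lower, upper = diff - z * se, diff + z * se

print("normal approximation reasonable:", valid)
print(f"95% CI: ({lower:.4f}, {upper:.4f})")
print("interval covers 0:", lower < 0 < upper)
```

With these inputs the interval is roughly −0.0115 to 0.0015. Because it covers 0, the stated figures do not show a significant difference between the two groups at the 95% level.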
