C-7 Sampling Distribu

1) A sampling distribution describes the distribution of all possible values of a statistic computed from random samples of the same size from a population. 2) The sampling distribution of a sample mean is the theoretical distribution of all possible means from random samples of size n. 3) As sample size increases, the sampling distribution approximates the normal distribution according to the central limit theorem. The standard error of the mean decreases with larger sample sizes.

Uploaded by

Biruk Mengstie
Sampling distribution

01/23/2023 1
Learning objectives

• Define sampling distribution
• Describe properties of sampling distributions
• Define point and interval estimates
• Compute confidence intervals for a single mean and a single
proportion, as well as for the difference between two means and
between two population proportions

01/23/2023 2
Sampling distribution

• A sampling distribution is a distribution of all


possible values of a statistic computed from samples
of the same size randomly selected from the same
population
• Serves to answer probability questions about sample
statistics

01/23/2023 3
Con…

• The sampling distribution of a sample mean is a


theoretical probability distribution; it describes the
distribution of all sample means from all possible
random samples of the same size taken from a
population
• A sampling distribution is a distribution of a statistic
over all possible samples
01/23/2023 4
• When sampling a discrete, finite population, a sampling

distribution can be constructed

• However, this construction is difficult with a large

population and impossible with an infinite population

01/23/2023 5
Con…
• We consider sample statistics as random variables

Example:

• Age of individuals is a random variable


– Similarly, mean age is a random variable

• Take a sample of size n from a population of size N and calculate a
statistic, e.g., the mean

• Take another sample (same size) and calculate the mean

• Repeat & repeat & repeat & ………..
01/23/2023 6
Con…

• Do you expect all the sample means to be the same? NO

• Sampling variability: the value of any statistic (x̄ or p̂) varies in
repeated random sampling.

• The sample means will vary, BUT with less variation than the
individual observations

• Put all these sample statistics together to get a distribution of sample
statistics (frequency distribution).

01/23/2023 7
Sampling Distribution of Sample Mean
• Distribution of values taken by the statistic in all possible samples
of size n from the same population
• Model assumption: our observations xi are sampled from a
population with mean μ and variance σ²
[Diagram: a population with unknown parameter μ yields Sample 1, Sample 2, ..., Sample 8, ..., each of size n and each with its own sample mean x̄. What is the distribution of these x̄ values?]
01/23/2023 8
Sample Vs Population

• A parameter is an unknown value for an entire


population. Or it is a numerical property of a population.
• Therefore, it has to be estimated using a smaller sample
representative of the entire population.
• A parameter does not vary, but each sample drawn from
the larger population will have its own value of any
statistic that is used to estimate the parameter

01/23/2023 9
Con…

• Parameters are assigned Greek letters and statistics
(sample) are assigned Roman letters
• Statistic: a numerical property of a sample from a
population, used to estimate the value of the
corresponding population parameter.

01/23/2023 10
Sample statistic and corresponding population parameter

Sample statistic                   Population parameter
x̄ (sample mean)                    μ (population mean)
s² (sample variance)               σ² (population variance)
s (sample standard deviation)      σ (population standard deviation)
p (sample proportion)              P or π (population proportion)

01/23/2023 11
• One may generate the sampling distribution of means as
follows:
1. Obtain a sample of n observations selected completely
at random from a large population
– Determine their mean and then replace the observations in
the population.

2. Obtain another random sample of n observations from


the population, determine their mean and again replace
the observations
01/23/2023 12
Con…

3) Repeat the sampling procedure indefinitely,


– Calculating the mean of the random sample of n each time
and subsequently replacing the observations in the population.

4) The result is a series of means of samples of size n.


– If each mean in the series is now treated as an individual
observation and arrayed in a frequency distribution, one
determines the sampling distribution of means of samples of
size n

01/23/2023 13
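The repeated-sampling procedure in steps 1 to 4 can be sketched as a small simulation. This is only an illustration under stated assumptions: the population (the integers 1 to 100), the sample size, the number of repetitions, and the function name `simulate_sample_means` are all hypothetical, and a finite number of repetitions only approximates the theoretical sampling distribution.

```python
import random
import statistics


def simulate_sample_means(population, n, repeats, seed=0):
    """Draw `repeats` random samples of size n and return the mean of each.

    Each sample is drawn without replacement, and the population list is left
    unchanged, which plays the role of "replacing the observations" before
    the next sample is taken.
    """
    rng = random.Random(seed)
    return [statistics.fmean(rng.sample(population, n)) for _ in range(repeats)]


population = list(range(1, 101))  # hypothetical population: the integers 1..100
means = simulate_sample_means(population, n=25, repeats=5000)

print(round(statistics.fmean(population), 2))   # population mean
print(round(statistics.fmean(means), 2))        # mean of the simulated sample means
print(round(statistics.pstdev(population), 2))  # spread of individual values
print(round(statistics.pstdev(means), 2))       # spread of the sample means
```

Arraying `means` in a frequency table or histogram gives an empirical picture of the sampling distribution of means for samples of size n.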
Con…

• A Sampling Distribution is a theoretical probability


distribution that shows the functional relation between
possible values of a given statistic based on a sample of
n cases and the probability (density) associated with each
value, for all possible samples of size n drawn from a
particular population

01/23/2023 14
How large is large enough?
• For most distributions, n > 30 will give a sampling
distribution that is nearly normal
• For fairly symmetric distributions, n > 15
• For normal population distributions, the sampling
distribution of the mean is always normally distributed.
• However, the general answer depends on the shape of the
distribution of the sampled population

01/23/2023 15
Con…

Standard deviation of sampling distribution

• Because the scores (x̄'s) in the sampling distribution of
means are themselves means (of individual samples),
we shall use the notation σx̄ for the standard deviation
of the distribution.
• The standard deviation of the sampling distribution of
means is called the standard error of the mean.
01/23/2023 16
E.g. Suppose you have a population having four
members with values 10,20,30 and 40 . If you take all
conceivable samples of size 2 with replacement:
a. What is the frequency distribution of the sample
means i.e. Sampling distribution?
b. Find the mean and standard deviation of the
distribution (standard error of the mean).
01/23/2023 17
Possible samples xi (sample mean)
(10, 20) or (20, 10) 15
(10, 30 ) or (30, 10) 20
(10, 40) or (40, 10) 25
(20, 30) or (30, 20) 25
(20, 40) or (40, 20) 30
(30, 40) or (40, 30) 35
(10, 10) 10
(20, 20) 20
(30, 30) 30
(40, 40) 40
01/23/2023 18
A) frequency distribution of sample means

Sample mean (xi) Frequency (fi)


10 1
15 2
20 3
25 4
30 3
35 2
40 1

01/23/2023 19
Con…
i) The mean of the sampling distribution
= Σ xi fi / Σ fi
= 400 / 16 = 25
ii) The standard deviation of the sampling distribution (standard error of the mean):
σx̄ = √[ Σ fi (xi − μ)² / Σ fi ]
= √[ {1(10 − 25)² + 2(15 − 25)² + …. + 1(40 − 25)²} / 16 ]
= √(1000 / 16) = √62.5 = 7.9

01/23/2023 20
Eg2. For the population given above (10, 20, 30 and 40)
a) Find the population mean and
• Show that the population mean (μ) = the mean of the
sampling distribution
b) Find the population standard deviation and show that the
standard error of the mean is σx̄ = σ/√n
• that is, the standard error of the mean is equal to the
population standard deviation divided by the square root
of the sample size
01/23/2023 21
Con…
Answers to example 2

a) μ = Σ xi / N = (10 + 20 + 30 + 40) / 4 = 25

b) σ² = Σ(xi − μ)² / N
= (225 + 25 + 25 + 225) / 4 = 125

Hence, σ = √125 = 11.18 and

• σx̄ (standard error) = σ / √n
= 11.18 / 1.414 = 7.9


01/23/2023 22
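The two worked examples on the population 10, 20, 30, 40 can be checked by enumerating every ordered sample of size 2 drawn with replacement. The sketch below is only an illustration; the function name `sampling_distribution` is hypothetical and not part of the slides.

```python
from collections import Counter
from itertools import product
from math import isclose, sqrt
import statistics


def sampling_distribution(population, n):
    """Means of all ordered samples of size n drawn with replacement."""
    return [statistics.fmean(sample) for sample in product(population, repeat=n)]


population = [10, 20, 30, 40]
means = sampling_distribution(population, n=2)  # 4 x 4 = 16 samples

freq = Counter(means)                   # frequency distribution of the sample means
mean_of_means = statistics.fmean(means)
std_error = statistics.pstdev(means)    # SD of the sampling distribution
sigma = statistics.pstdev(population)   # population SD (divides by N)

for value in sorted(freq):
    print(value, freq[value])
print(mean_of_means, round(std_error, 2), round(sigma / sqrt(2), 2))
# last line printed: 25.0 7.91 7.91
```

The last line shows the mean of the sampling distribution equal to the population mean, and the standard error equal to σ/√n.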
Example 3

• Suppose we have a population of size N=4,

constituting the ages of four outpatients.

• x, Age (years): 18, 20, 22, 24

01/23/2023 23
01/23/2023 24
01/23/2023 25
01/23/2023 26
01/23/2023 27
Con…

• We note that the mean of the sampling distribution of x̄
has the same value as the mean of the original
population.
• However, the variance is different from the original
population variance; it is equal to the population
variance divided by the sample size used to obtain the
sampling distribution (σx̄² = σ²/n).
01/23/2023 28
Properties of Sampling Distributions

The mean
• The mean of all possible estimates obtained from
samples of identical size is equal to the true population
mean.
• Although estimates obtained from individual samples
will vary, their overall mean (the mean of the sampling
distribution) will always be equal to the population
value.
01/23/2023 29
The standard deviation
• The standard deviation of the sampling distribution
decreases as the sample size increases.
• The standard deviation of a sampling distribution takes a
special name, standard error, often indicated by the letters
SE.
• The standard deviation represents the variability in the
individual data
• The standard error represents the variability in the
sample estimates
01/23/2023 30
• The standard deviation of the sampling distribution
measures how much the sample statistic varies from
sample to sample
• It is smaller than the standard deviation of the
population by a factor of √n
– Averages are less variable than individual observations

01/23/2023 31
The standard deviation
– The standard error depends inversely on the square root of the
sample size used.
For example, the standard error of the sample mean is:
SE(x̄) = σ/√n
where σ is the population standard deviation and n is the sample size.

The estimated standard error is s/√n, where s is the sample standard deviation

01/23/2023 32
The central limit theorem and the shape

• The shape of the sampling distribution is


approximately normal when the sample size is large
• This property is known as the Central Limit Theorem
• It is the most important of all the three properties
and it is also the most surprising
• It says that, when the sample size is large, the
distribution of the sample estimates tends to normal.
01/23/2023 33
Con…

• This happens even if the distribution of the original
data is not normal

• If the original distribution is approximately normal,
the sampling distribution is approximately normal even
at small sample sizes.
01/23/2023 34
Con….
• Suppose a random sample of n observations is obtained from a
population with mean μ and standard deviation σ.
• If n is sufficiently large, the sampling distribution of the sample
mean approximates a normal distribution with mean μ and
standard deviation (i.e., standard error) σ/√n.
• The larger the sample size, the better will be the normal
approximation to the sampling distribution of the mean.
01/23/2023 35
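The central limit theorem statements above can be illustrated with a simulation. The sketch below assumes a strongly right-skewed population (exponential with μ = σ = 1), which is a choice made for illustration and does not come from the slides. The helper names `skewness` and `simulate_means` are hypothetical, and skewness is used here as a rough measure of departure from normal shape.

```python
import random
import statistics
from math import sqrt


def skewness(values):
    """Average cubed z-score; close to 0 for a symmetric distribution."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean([((v - m) / s) ** 3 for v in values])


def simulate_means(n, repeats, rng):
    """Means of `repeats` samples of size n from an exponential population."""
    return [
        statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
        for _ in range(repeats)
    ]


rng = random.Random(1)
results = {}
for n in (2, 10, 50):
    means = simulate_means(n, repeats=4000, rng=rng)
    results[n] = (statistics.fmean(means), statistics.pstdev(means), skewness(means))
    mean, se, skew = results[n]
    print(f"n={n:2d} mean={mean:.3f} SE={se:.3f} "
          f"sigma/sqrt(n)={1 / sqrt(n):.3f} skewness={skew:.2f}")
```

As n grows, the mean of the sample means stays near μ, their standard deviation tracks σ/√n, and the skewness shrinks toward 0, which is what the normal approximation predicts.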
Estimation
• In statistics, the observations in the sample are used
to learn about the population.
• For the sample to teach us something about the
population, the sample needs to be drawn randomly
• This procedure, in which information from the
sample is used to draw conclusions about the
population, is called statistical inference
01/23/2023 36
Con…
• Inference is the process of making interpretations or
conclusions from sample data for the totality of the population
• Inferential statistics uses the sample results to make decisions
and draw conclusions about the population from which the
sample is drawn
• In statistics there are two ways through which inference can be
made
– Statistical estimation
– Statistical hypothesis testing
01/23/2023 37
Con…
Statistical Estimation:
• This is one way of making inference about the
population parameter where the investigator does not
have any prior notion about values or characteristics of
the population parameter
• Estimation is the procedure by which values are
assigned to a population parameter based on the
information collected from the sample
01/23/2023 38
Con…
The estimation procedure involves the following steps
Select a sample
Collect the required information from the member of
the sample
Calculate the value of the sample statistic
Assign value(s) to the corresponding population
parameter
01/23/2023 39
Con…

There are two types of estimation:


I. Point Estimation
II. Interval estimation

01/23/2023 40
Point estimate
• A single numerical value used to estimate the
corresponding population parameter
– is a single number based on sample data that can be
used as an estimate of the population parameter
– e.g., x̄ is a point estimate of µ, and p̂ is a point estimate of p

01/23/2023 41
Properties of best estimator

The following are some qualities of an estimator


• It should be unbiased
• It should be consistent
• It should be relatively efficient

01/23/2023 42
Con….
To explain these properties let θ̂ be an estimator of θ
• Unbiased Estimator: An estimator whose expected value
is the value of the parameter being estimated, i.e., E(θ̂) = θ

• Consistent Estimator: An estimator which gets closer to
the value of the parameter as the sample size increases.

• Relatively Efficient Estimator: Among the unbiased estimators
of a parameter, the one with the smallest variance

01/23/2023 43
Interval Estimation (large samples)

• A point estimate does not give any indication of how far
away the parameter lies.
• A more useful method of estimation is to compute an
interval which has a high probability of containing the
parameter.
• Definition: An interval estimate is a statement that a
population parameter has a value lying between two
specified limits
01/23/2023 44
Con…

• It is the procedure that results in the interval of values


as an estimate for a parameter,
– Which is interval that contains the likely values of
a parameter
• It deals with identifying the upper and lower limits of
a parameter

01/23/2023 45
Interval estimation
• Provides a range of values
• Based on observations from one sample
• Gives information about closeness to the unknown
population parameter
• Stated in terms of probability
• Level of confidence is denoted as (1-α)100%
• Never 100% sure

01/23/2023 46
01/23/2023 47
Con…
• A confidence interval is a specific interval estimate of a
parameter determined by using data obtained from a
sample and the specific confidence level of the estimate
• The confidence level is the proportion of such intervals
that, in repeated sampling, would contain the value of the
parameter

01/23/2023 48
Confidence interval estimation …..Continue

• Consider the standard normal distribution and the statement
Pr(−1.96 ≤ Z ≤ 1.96) = 0.95
• This is merely a shorthand algebraic statement that 95% of the
area under the standard normal curve lies between −1.96 and +1.96.
• If one chooses the sampling distribution of means (a normal
curve with mean μ and standard deviation σ/√n), then
• Pr(−1.96 ≤ (X̄ − μ)/(σ/√n) ≤ 1.96) = 0.95

01/23/2023 49
Con….

• A little manipulation without altering the probability
value of 95 percent gives
• Pr( X̄ − 1.96(σ/√n) ≤ μ ≤ X̄ + 1.96(σ/√n) ) = 0.95
• The range X̄ − 1.96(σ/√n) to X̄ + 1.96(σ/√n) is called
the 95% confidence interval;
• X̄ − 1.96(σ/√n) is the lower confidence limit while
• X̄ + 1.96(σ/√n) is the upper confidence limit

01/23/2023 50
Con…

01/23/2023 51
01/23/2023 52
Con….

• The general formula for an interval estimation is:


– Interval estimation = Point Estimator ± Error Margin

• Error margin is a measure of precision


• Error margin is a product of reliability coefficient
corresponding to your confidence and standard
error of the estimator

01/23/2023 53
Factors Affecting Interval Width

• Data Variation measured by σ


• Sample Size (n)

• Level of Confidence (1 ‐ α)

01/23/2023 54
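The three factors can be seen numerically with a short sketch of the width of a z-based interval, 2 · Z · σ/√n. The function name `ci_width` and the input values (σ = 20 or 40, n = 100 or 400) are hypothetical and chosen only for illustration.

```python
from math import isclose, sqrt
from statistics import NormalDist


def ci_width(sigma, n, confidence):
    """Total width of a z-based confidence interval for a mean (sigma known)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return 2 * z * sigma / sqrt(n)


base = ci_width(sigma=20, n=100, confidence=0.95)
more_variation = ci_width(sigma=40, n=100, confidence=0.95)     # wider
larger_sample = ci_width(sigma=20, n=400, confidence=0.95)      # narrower
higher_confidence = ci_width(sigma=20, n=100, confidence=0.99)  # wider

print(round(base, 2), round(more_variation, 2),
      round(larger_sample, 2), round(higher_confidence, 2))
```

Doubling σ doubles the width, quadrupling n halves it, and raising the confidence level widens it.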
Con…

• If many different random samples are taken, and if


the confidence interval for each is determined, then it
is expected that 95% of these computed intervals will
contain the population mean (μ)
• If we take 100 samples & calculate a 95% confidence
interval from each sample proportion p, we expect about 95 of
them to contain the population proportion π
01/23/2023 55
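The repeated-sampling interpretation can be checked with a small simulation. This is a sketch under assumptions not taken from the slides: a normal population with hypothetical values μ = 120 and σ = 20, samples of size 25, and a hypothetical helper named `coverage`.

```python
import random
import statistics
from math import sqrt


def coverage(mu, sigma, n, repeats, z=1.96, seed=0):
    """Fraction of z-based intervals (sigma known) that contain the true mean."""
    rng = random.Random(seed)
    margin = z * sigma / sqrt(n)
    hits = 0
    for _ in range(repeats):
        xbar = statistics.fmean([rng.gauss(mu, sigma) for _ in range(n)])
        if xbar - margin <= mu <= xbar + margin:
            hits += 1
    return hits / repeats


rate = coverage(mu=120, sigma=20, n=25, repeats=2000)
print(rate)  # expected to be close to 0.95
```

Each individual interval either contains μ or does not; the 95% refers to the long-run share of intervals that do.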
Con…

• If more than 95% certainty regarding the population
mean is desired, say a 99% C.I., the only change needed
is to use ±2.58 (the points enclosing 99% of the
standard normal curve),
which gives X̄ ± 2.58 (σ/√n)

01/23/2023 56
CI for single mean
100(1-α)% CI for μ when σ is known (sampling from normal
population or large sample)

Interpretation:
a. Probabilistic: in repeated sampling, 100(1-α)% of all
intervals constructed this way will include μ
b. Practical: we are 100(1-α)% confident that a single
computed interval contains μ
01/23/2023 57
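If the population has a normal distribution and σ is
known, then a (1-α)100% confidence interval for μ is
given by:
x̄ ± Zα/2 (σ/√n)
where Zα/2 is the standard normal value with area α/2 above it (1.96 for 95% confidence)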
01/23/2023 58
E.g. 2) A random sample of 100 drug-treated patients has a
mean survival time of 46.9 months. If the SD of the
population is 43.3 months, find a 95% confidence interval
for the population mean.
– (The population consists of survival times of cancer patients who
have been treated with a new drug)
– 46.9 ± (1.96) (43.3/√100)
= 46.9 ± 8.5 = (38.4 to 55.4 months)

• Hence, we are 95% confident that the limits (38.4, 55.4)
embrace the mean survival time in the population from which
the sample arose.
01/23/2023 59
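The survival-time calculation can be reproduced with a short function. This is a minimal sketch assuming σ is known; the function name `z_interval` is hypothetical, and the critical value is taken from Python's standard normal distribution rather than the rounded 1.96 used on the slide.

```python
from math import sqrt
from statistics import NormalDist


def z_interval(mean, sigma, n, confidence=0.95):
    """Confidence interval for a population mean when sigma is known."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    margin = z * sigma / sqrt(n)
    return mean - margin, mean + margin


low, high = z_interval(mean=46.9, sigma=43.3, n=100)
print(round(low, 1), round(high, 1))  # 38.4 55.4
```

The same function can be applied to other problems with known σ by changing the three inputs.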
Exercise

• Data on the systolic blood pressure of 199 patients
give a mean value of 125.8 mmHg. Let us assume
that the standard deviation for this patient
population is known to be 20 mmHg. Construct a 95
percent confidence interval for the population mean.

01/23/2023 60
• If the population variance σ² is not known, n is
small (n < 30) and the population is normal:

• In most practical research, the standard deviation for the
population of interest is not known
• In this case, the standard deviation σ is replaced by the
sample standard deviation S
• Since S/√n is only an estimate of the true standard error
σ/√n, the standardized sample mean (X̄ − μ)/(S/√n)
no longer follows the standard normal distribution
01/23/2023 61
Con…

• Instead, the standardized sample mean (X̄ − μ)/(S/√n)
follows the t-distribution

• The t-distribution is also described by its degrees of freedom

• For a sample of size n, the t-distribution will have n-1
degrees of freedom.

• The notation for a t-distribution with n-1 degrees of freedom
is t(n-1)
01/23/2023 62
Con…

• When the true standard deviation is not known and
researchers use the sample standard deviation in its
place, the standardized means are no longer normally distributed.

• Gosset described the distribution of standardized means when
the sample standard deviation is used; it is known as the
t distribution.


01/23/2023 63
• The t-distribution density curve is bell shaped and
symmetrical about zero.
• As the sample size n increases, the t-distribution
becomes closer to the normal distribution, since the
sample standard deviation approaches the true standard
deviation for large n

01/23/2023 64
• Different curves for different df (i.e. sample sizes),
and for very large df t will be very close to Z

01/23/2023 65
01/23/2023 66
01/23/2023 67
01/23/2023 68
Example
1. In a study of preeclampsia, Kaminski and Rechberger found
the mean systolic blood pressure of 10 healthy, non
pregnant women to be 119 with a standard deviation of 2.1.
A. What is the estimated standard error of the mean?
B. Construct the 99% confidence interval for the mean of the
population from which the 10 subjects may be presumed to
be a random sample.
C. What is the precision of the estimate?
01/23/2023 69
C. Precision = 3.250 × 0.664 ≈ 2.16
01/23/2023 70
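Parts A to C of the preeclampsia example can be worked through in a few lines. This is a sketch, not the original solution slides: the function name `t_interval` is hypothetical, and the critical value 3.250 (t with 9 degrees of freedom, 99% confidence) is passed in from the precision calculation above because the Python standard library has no t-distribution.

```python
from math import sqrt


def t_interval(mean, sd, n, t_crit):
    """Standard error, margin (precision) and t-based interval for a mean.

    t_crit must be looked up in a t table for n - 1 degrees of freedom.
    """
    se = sd / sqrt(n)
    margin = t_crit * se
    return se, margin, (mean - margin, mean + margin)


se, margin, (low, high) = t_interval(mean=119, sd=2.1, n=10, t_crit=3.250)
print(round(se, 2))                   # A. estimated standard error: 0.66
print(round(low, 2), round(high, 2))  # B. 99% CI: 116.84 121.16
print(round(margin, 2))               # C. precision: 2.16
```

The same function applies to any small-sample problem once the matching t value has been read from a table.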
01/23/2023 71
2. A sample of 20 houses was studied to estimate the mean sprayable area
per house for the control of a malaria epidemic. The result was a mean
of 22.9 m² and an SD of 6.0 m². Construct a CI for the mean sprayable
area of the population with 95% confidence.

• Solution: given mean = 22.9 m², SD = 6.0 m², n = 20

• α = 0.05, α/2 = 0.025 and degrees of freedom (n-1) = 19

• t(n-1, α/2) = 2.09

01/23/2023 72
Con…

• x̄ ± t(n-1, α/2) (s/√n) = 22.9 ± 2.09 (6/√20) = 22.9 ± 2.09 (1.34)
= 22.9 ± 2.8
= (22.9 − 2.8, 22.9 + 2.8)
= (20.1, 25.7)
• We are 95% confident that the mean sprayable area per
house is between 20.1 and 25.7 m²

01/23/2023 73
Confidence interval for the difference of means
• Consider two different populations.
• The first population (X) has mean μx and standard deviation σx
• The second (Y) has mean μy and standard deviation σy
• From the first population take a sample of size nx and compute
its mean x̄
• From the second population take independently a sample of size
ny and compute its mean ȳ
• Then determine x̄ − ȳ
01/23/2023 74
Con…

• Do this for all pairs of samples that can be chosen


independently from the two populations

• The differences, x̄ − ȳ, are a new set of scores
which form the sampling distribution of differences of
means

01/23/2023 75
Con…

• Population variances σ1² and σ2² are known; both
populations are normally distributed

• Procedure: Draw samples of sizes n1 and n2 from the
two populations, get the respective means X̄1 and X̄2
and find the value X̄1 − X̄2, which is the estimate of μ1 − μ2

01/23/2023 76
Con…

• If the populations are normally distributed and the
variances are known, then a (1-α)100% confidence
interval for μ1-μ2 is given by:
(X̄1 − X̄2) ± Zα/2 √(σ1²/n1 + σ2²/n2)

01/23/2023 77
Con…

• N.B: If the two populations have the same variance σ²,
that is, σ1² = σ2² = σ², then a (1-α)100% confidence
interval for μ1 – μ2 is given by:
(X̄1 − X̄2) ± Zα/2 · σ √(1/n1 + 1/n2)

01/23/2023 78
• The characteristics of the sampling distribution of
differences of means are:

1. The mean of the sampling distribution of differences of
means equals the difference of the population means
(Mean = μx − μy)

2. The standard deviation of the sampling distribution of
differences of means, also called the standard error of
differences of means, is denoted by σ(x̄ − ȳ), where
σ(x̄ − ȳ) = √(σx²/nx + σy²/ny)


01/23/2023 79
Con…

• The sampling distribution is normal if both populations


are normal, and is approximately normal if the samples
are large enough (even if the populations aren’t normal)
– In practice, it is assumed that the sampling
distribution of differences of means is normal if both
nx and ny are ≥30

01/23/2023 80
Eg1. If a random sample of 50 non-smokers has a mean
life of 76 years with a standard deviation of 8 years,
and a random sample of 65 smokers has a mean life of
68 years with a standard deviation of 9 years,
A) What is the point estimate for the difference of the
population means?
B) Find a 95% C.I. for the difference of mean lifetime of
non-smokers and smokers.
01/23/2023 81
Solutions
• Population x (non-smokers)
nx = 50, x̄ = 76, σx = 8,
σx²/nx = 8²/50 = 1.28

• Population y (smokers)
ny = 65, ȳ = 68, σy = 9,
σy²/ny = 9²/65 = 1.25

01/23/2023 82
A) A point estimate for the difference of population means
(μx − μy)
= x̄ − ȳ = 76 − 68 = 8 years
B) At a 95% confidence level, Z = ±1.96, and
σ(x̄ − ȳ) = √(σx²/nx + σy²/ny)
= √(1.28 + 1.25)
= √2.53
= 1.59

01/23/2023 83
Hence, 95% C.I. for μx − μy

= (x̄ − ȳ) ± 1.96 σ(x̄ − ȳ)

= 8 ± 1.96 (1.59)
= 8 ± 3.12 = (4.88 to 11.12 years)

01/23/2023 84
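The smokers example can be reproduced with a short function. This is a sketch with a hypothetical function name, `diff_means_interval`; as in the worked solution, the sample standard deviations stand in for the population values because both samples are large.

```python
from math import sqrt


def diff_means_interval(mean_x, sd_x, n_x, mean_y, sd_y, n_y, z=1.96):
    """Point estimate, standard error and z-based CI for mu_x - mu_y."""
    diff = mean_x - mean_y
    se = sqrt(sd_x ** 2 / n_x + sd_y ** 2 / n_y)
    margin = z * se
    return diff, se, (diff - margin, diff + margin)


diff, se, (low, high) = diff_means_interval(
    mean_x=76, sd_x=8, n_x=50,  # non-smokers
    mean_y=68, sd_y=9, n_y=65,  # smokers
)
print(diff, round(se, 2), round(low, 2), round(high, 2))  # 8 1.59 4.88 11.12
```

Because the interval excludes zero, it points to a difference in mean lifetime between the two groups at the 95% confidence level.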
Con…
• The confidence intervals for the difference in means
provide a range of likely values for (μ1-μ2)

• It is important to note that all values in the confidence


interval are equally likely estimates of the true value of
(μ1-μ2)

• If there is no difference between the population means,


then the difference will be zero (i.e., (μ1-μ2)= 0)
01/23/2023 85
Con…

• Zero is the null value of the parameter (in this case the
difference in means)
• If a 95% confidence interval includes the null value,
then there is no statistically meaningful or statistically
significant difference between the groups
• If the confidence interval does not include the null
value, then we conclude that there is a statistically
significant difference between the groups
01/23/2023 86
Exercise
• An anthropologist who wanted to study the heights of adult
men and women took a random sample of 128 adult men and
100 adult women and found the following summary results.
• Adult men
Mean height = 170 cm
Standard deviation = 8 cm
• Adult women
Mean height = 164 cm
Standard deviation = 6 cm

Find a 95% C.I. for the difference of mean height of
adult men and women.
01/23/2023 87
Confidence interval for a single proportion

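• The sample proportion is derived from counts or
frequency data.
• Easier and more reliable, does not depend on variance.
• Sample proportion: p = x/n, where x is the number of sample
members with the characteristic of interest and n is the sample size
• Population proportion = P or π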

01/23/2023 88
01/23/2023 89
Con…
• The probability distribution of a sample proportion
is called its sampling distribution
• It gives the different values that the sample proportion
can assume and their probabilities

01/23/2023 90
Con…
• The shape of the sampling distribution of p̂ is inferred from
the central limit theorem
• According to the central limit theorem, the sampling
distribution of p̂ is approximately normal for a sufficiently
large sample size
• In the case of a proportion, the sample size is considered to be
sufficiently large if np and nq are both at least 5.
01/23/2023 91
Assumptions
– Two Categorical Outcomes
– Population Follows Binomial Distribution
– Normal approximation can be used
– n × p ≥5 & n × (1 - p) ≥ 5

01/23/2023 92
Con…
Therefore for large sample
• The sampling distribution of the sample proportion p
is (approximately) normal
• The mean of the sampling distribution of p is equal to
the population proportion P

• The standard deviation σp of the sampling distribution
of the sample proportion p is √(Pq/n), where q = 1 − P
01/23/2023 93
σp = Standard deviation of the sampling distribution of
proportions
= Standard error of proportions

σp = √(Pq/n)

01/23/2023 94
Con…

01/23/2023 95
Con…

• The confidence interval for the population proportion
(P) is given by the formula:
p ± Zα/2 √(pq/n), where p is the sample proportion and q = 1 − p

01/23/2023 96
• Example: An epidemiologist is worried about the ever
increasing trend of malaria in a certain locality and wants to
estimate the proportion of persons infected in the peak
malaria transmission period. If he takes a random sample of
150 persons in that locality during the peak transmission
period and finds that 60 of them are positive for malaria, find
• a) 95% b) 90% c) 99% confidence intervals for the proportion of all
people in that locality who are infected during the peak malaria
transmission period
01/23/2023 97
Con…
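• Sample proportion = 60 / 150 = 0.4
• The standard error of the proportion depends on the
population proportion P.
• However, the population proportion (P) is unknown.
In such situations, the sample proportion p is used in its place:
SE(p) = √(pq/n) = √(0.4 × 0.6 / 150) = √0.0016 = 0.04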
01/23/2023 98
A. 95% C.I. for the population proportion (the proportion of all
people in that locality who are infected)
= 0.4 ± 1.96 (0.04) = 0.4 ± 0.078 = (0.322, 0.478)

B. 90% C.I. for the population proportion
= 0.4 ± 1.64 (0.04) = 0.4 ± 0.066 = (0.334, 0.466)

C. 99% C.I. for the population proportion
= 0.4 ± 2.58 (0.04) = 0.4 ± 0.103 = (0.297, 0.503)

01/23/2023 99
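The three malaria intervals can be reproduced with one function. This is a sketch with a hypothetical function name, `proportion_interval`; it takes the critical value from Python's standard normal distribution, so the Z values are 1.645 and 2.576 rather than the rounded 1.64 and 2.58, and the limits agree with the worked answers to three decimals.

```python
from math import sqrt
from statistics import NormalDist


def proportion_interval(successes, n, confidence):
    """Normal-approximation CI for a population proportion."""
    p_hat = successes / n
    q_hat = 1 - p_hat
    # Rule of thumb from the slides: n*p and n*q should both be at least 5.
    if n * p_hat < 5 or n * q_hat < 5:
        raise ValueError("sample too small for the normal approximation")
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sqrt(p_hat * q_hat / n)
    return p_hat - margin, p_hat + margin


intervals = {}
for level in (0.95, 0.90, 0.99):
    low, high = proportion_interval(successes=60, n=150, confidence=level)
    intervals[level] = (round(low, 3), round(high, 3))
    print(f"{level:.0%}: {intervals[level]}")
```

The interval widens as the confidence level rises, since the same standard error (0.04) is multiplied by a larger Z value.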
• A study on dental health practice: of 300 adults
interviewed, 123 said that they regularly had a dental
check‐up twice a year.
What is the 95% C.I. for π?
• p = 123/300 = 0.41, a point estimate of π.
• α = 0.05 ⇒ Z0.025 = 1.96

95% C.I. = 0.41 ± 1.96 √(0.41 × 0.59 / 300)
= 0.41 ± 1.96 (0.0284)
= 0.41 ± 0.056
= (0.354, 0.466).

01/23/2023 100
C.I. for the difference between two population
proportions

• We assess the probability associated with a difference


in proportions computed from samples drawn from
each of these populations
• The appropriate distribution is the distribution of the
difference between two sample proportions

01/23/2023 101
Con…
• The sampling distribution of the difference between two
sample proportions is constructed in a manner similar to
the difference between two means.

• Independent random samples of size n1 and n2 are drawn


from two populations of dichotomous variables where the
proportions of observations with the character of interest
in the two populations are p1 and p2, respectively

01/23/2023 102
Con…

• Consider two different populations N1 and N2

• The first population has proportion P1 and the second
population has proportion P2

• From the first population take a sample of size n1 and

compute its sample proportion p1

• From the second population take a sample size of n2 and

compute its sample proportion p2


01/23/2023 103
Con…

• Then determine p1 − p2

• Do this for all pairs of samples that can be chosen
independently from the two populations

• The difference between the two sample proportions, p1 − p2,
is the point estimator for the difference between the two
population proportions, P1 − P2

01/23/2023 104
Con…

01/23/2023 105
Con…
• The distribution of the difference between two sample
proportions, p1 − p2, is approximately normal

• By the same analogy, the C.I. for the difference of
proportions (Px – Py) is given by the following formula

• C.I. for Px − Py = (px − py) ± Z σ(px − py)

• Where Z is determined by the confidence coefficient and

σ(px − py) = √(pxqx/nx + pyqy/ny)
01/23/2023 106
Con…
• Example: Each of two groups consists of 100 patients
who have leukemia. A new drug is given to the first group
but not to the second (the control group)

• It is found that in the first group 75 people have remission


for 2 years; but only 60 in the second group

• Find 95% confidence limits for the difference in the


proportion of all patients with leukemia who have
remission for 2 years
01/23/2023 107
• px = 0.75, qx = 0.25, nx = 100,
– σ²px = pxqx/nx = 0.75 × 0.25 / 100 = 0.001875

• py = 0.60, qy = 0.40, ny = 100,
– σ²py = pyqy/ny = 0.60 × 0.40 / 100 = 0.0024
• Then
• σ(px − py) = √(0.001875 + 0.0024) = √0.004275 = 0.065

• At a 95% confidence level, Z = ±1.96 and

• The difference of the two sample proportions is (0.75 − 0.60) = 0.15

• Therefore, a 95% C.I. for the difference in the proportion with
2-year remission is 0.15 ± 1.96 (0.065) = 0.15 ± 0.13
= (0.02 to 0.28)
01/23/2023 108
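The leukemia example can be reproduced with a short function. This is a sketch with a hypothetical function name, `diff_proportions_interval`; it uses the rounded value Z = 1.96 and the normal approximation for the difference of two independent sample proportions.

```python
from math import sqrt


def diff_proportions_interval(x1, n1, x2, n2, z=1.96):
    """Point estimate, standard error and z-based CI for P1 - P2."""
    p1, p2 = x1 / n1, x2 / n2
    diff = p1 - p2
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    margin = z * se
    return diff, se, (diff - margin, diff + margin)


diff, se, (low, high) = diff_proportions_interval(x1=75, n1=100, x2=60, n2=100)
print(round(diff, 2), round(se, 3), round(low, 2), round(high, 2))
# 0.15 0.065 0.02 0.28
```

The interval lies entirely above zero, which is consistent with a higher 2-year remission proportion in the drug-treated group.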
Thank you
01/23/2023 109
