
Lecture 5

Sampling and Sampling Distributions

•  Sampling Distribution
•  Random Sampling
•  Sampling Distribution of the Sample Mean
•  Central Limit Theorem
•  Sampling Distribution of the Difference Between Two Sample Means
•  Sampling Distribution of the Sample Proportion
•  Sampling Distribution of the Difference Between Two Sample Proportions

Statistic and Parameter
•  In lectures 3 and 4, we discussed probability distributions of discrete and continuous random variables. This lecture extends the concept of a probability distribution to that of a sample statistic.
•  A sample statistic is a numerical summary measure calculated for sample data.
•  The mean, median, quartiles, mode, variance, standard deviation, and so on, computed for a sample, are called sample statistics.
•  On the other hand, the same numerical summary measures calculated for population data are called population parameters.
•  A population parameter is always a constant.

Sampling Distribution
•  Because sample measurements are observed values of random variables, the value of a sample statistic will vary from sample to sample in a random manner. Therefore sample statistics are random variables, and hence a sample statistic also has a probability distribution.
•  The probability distribution of a sample statistic is called a sampling distribution.
•  The standard deviation of the sampling distribution is usually called the standard error of the statistic.

Example
Population = {2, 4, 6, 8, 10}. We draw a random sample of size 2 without replacement. There are C(5,2) = 10 different samples:

Sample     X̄
2, 4       3
2, 6       4
2, 8       5
2, 10      6
4, 6       5
4, 8       6
4, 10      7
6, 8       7
6, 10      8
8, 10      9

The population mean is μ = (2 + 4 + 6 + 8 + 10)/5 = 6.

Example 1 (Cont.)
•  Since the sample is random, each outcome has probability 1/10. The sampling distribution of X̄ is therefore:

X̄       3    4    5    6    7    8    9
P(X̄)   .1   .1   .2   .2   .2   .1   .1

The expected value of X̄ (a discrete random variable) is given by:

E(X̄) = (3)(.1) + (4)(.1) + (5)(.2) + (6)(.2) + (7)(.2) + (8)(.1) + (9)(.1) = 6

E(X̄) = μ (this is in fact a general property).

Random Sampling
•  A basic reason for using random sampling is to ensure that the inferences made from the sample data are not distorted by selection bias.
•  How we collect data is at least as important as how we analyze it. In particular, a sample should be representative of the population. If a sample is not random, it may be so biased that it is worse than useless.
•  One of the simplest and most frequently used sampling procedures produces what is known as a simple random sample.

A Simple Random Sample
•  Definition: If n elements are selected from a population in such a way that every possible combination of n elements in the population has an equal probability of being selected, the n elements are said to be a simple random sample (or a random sample).
•  In other words, a simple random sample is a sample selected in such a way that each member of the population has the same chance of being included in the sample.
•  Given a complete list of the population, a random number table (or a computer's random number generator, as sketched below) can be used to draw a random sample. Other commonly used methods for selecting a sample are systematic sampling, stratified sampling, and cluster sampling.
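As a minimal sketch of that last point, the standard-library call below draws a simple random sample from a hypothetical sampling frame (the frame and the sample size are made up for illustration); every combination of k elements has the same chance of being chosen.

```python
import random

# Hypothetical sampling frame: a complete list of the population members
frame = [f"account_{i:04d}" for i in range(1, 501)]

random.seed(42)                        # fixed seed only to make the draw reproducible
sample = random.sample(frame, k=30)    # a simple random sample of 30 accounts
print(sample[:5])
```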

Sampling Distribution of the Sample Mean
•  THEOREM 1: If a random sample of n observations is taken from a population with mean μ and variance σ², then E(X̄) = μ and Var(X̄) = σ²/n.

Proof:
X̄ = (Σ Xi)/n = (X1 + X2 + ... + Xn)/n,
E(X̄) = [E(X1) + E(X2) + ... + E(Xn)]/n
     = [μ + μ + ... + μ]/n = (nμ)/n = μ.
Var(X̄) = [Var(X1) + Var(X2) + ... + Var(Xn)]/n²
       = [σ² + σ² + ... + σ²]/n²
       = (nσ²)/n² = σ²/n.
(The variance step uses the independence of the observations X1, ..., Xn.)

Sampling Distribution of the Sample Mean
•  From the variance of X̄, we can see that the larger the sample, the more accurately X̄ estimates the population mean μ.

•  Note: If the population is finite and the sampling is done without replacement, then

   Var(X̄) = [(N − n)/(N − 1)] · (σ²/n)

   where N is the size of the population. The factor (N − n)/(N − 1) is called the finite population correction factor.
Example 1 (Cont.)
The population variance is:
σ² = E(Xi − 6)² = (1/5)(16 + 4 + 0 + 4 + 16) = 8.

But the variance of X̄ is:
Var(X̄) = E(X̄ − 6)²
       = 9(0.1) + 4(0.1) + 1(0.2) + 0(0.2) + 1(0.2) + 4(0.1) + 9(0.1) = 3,

which is consistent with the above formula for N = 5 and n = 2:

Var(X̄) = [(5 − 2)/(5 − 1)] · (8/2) = 3.
Example 2
•  We took 50 samples of size n = 36 from a population of savings accounts at a particular bank and examined the account balances. The mean balance in the population of accounts is $3900, and the standard deviation of the population is $1200.
•  There will be 50 × 36 = 1800 observations, but only 50 sample means. The mean of these 50 means will be approximately $3900, the population mean, and the standard deviation (or standard error) of these 50 mean values will be approximately

   σ(X̄) = σ/√n = $1200/√36 = $200.
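The same numbers fall out of a quick simulation. The normal model for individual balances below is purely an assumption made for illustration, since the example states only the population mean and standard deviation.

```python
import random
from statistics import mean, stdev

random.seed(7)
mu, sigma, n, n_samples = 3900, 1200, 36, 50

# Draw 50 samples of 36 balances each and keep the 50 sample means
sample_means = [mean(random.gauss(mu, sigma) for _ in range(n))
                for _ in range(n_samples)]

print(mean(sample_means))    # roughly $3900, the population mean
print(stdev(sample_means))   # roughly $200 = sigma / sqrt(n)
```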
Sampling from Normal Population
•  THEOREM 2: If the population for X is normally distributed with mean μ and variance σ² (i.e. X ~ N(μ, σ²)), then the sample mean X̄ is also normally distributed with mean μ and variance σ²/n.
•  That is, X̄ ~ N(μ, σ²/n).

Example 3:
(1) If X ~ N(μ, σ²) and the sample size is n, then
    Z = (X̄ − μ) / (σ/√n) ~ N(0, 1)
(2) If X ~ N(40, 36) and the sample size is 9, then X̄ ~ N(40, 4)
(3) If X ~ N(40, 36) and the sample size is 9, then
    P(X̄ < 42) = P(Z < 1) = 0.8413
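Part (3) of Example 3 can be reproduced numerically. The phi helper below is an illustrative standard-normal CDF built from math.erf, used here in place of a printed z-table.

```python
from math import erf, sqrt

def phi(z: float) -> float:
    """Standard normal CDF, Phi(z) = P(Z < z)."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

mu, sigma2, n = 40, 36, 9
se = sqrt(sigma2 / n)        # sigma / sqrt(n) = 2, so X-bar ~ N(40, 4)

z = (42 - mu) / se           # (42 - 40) / 2 = 1
print(phi(z))                # P(X-bar < 42) = P(Z < 1) ≈ 0.8413
```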

Central Limit Theorem
•  From the above theorems, we know that if we take a random sample of size n from a normal population with mean μ and standard deviation σ, then the sampling distribution of X̄ has the following properties:
   (1) its expectation is μ;
   (2) its standard error is σ/√n; and
   (3) its shape is normal.

•  If we are sampling from a population with an unknown distribution, the sampling distribution of X̄ will still be approximately normal with mean μ and standard error σ/√n, provided that the sample size is large. This remarkable result is the following theorem, called the Central Limit Theorem.

CENTRAL LIMIT THEOREM
•  Central Limit Theorem (C.L.T.): If X̄ is the mean of a random sample of size n taken from a population with mean μ and finite variance σ², then the limiting form of the distribution of

   Z = (X̄ − μ) / (σ/√n)

   as n → ∞ is the standard normal distribution N(0, 1); that is, X̄ has approximately a normal distribution with mean μ and standard deviation σ/√n.

Central Limit Theorem - Second Form
•  If n is large, then

   Z = (Σ Xi − nμ) / (σ√n)

   has approximately a standard normal distribution; that is, the sum Σ Xi has approximately a normal distribution with mean nμ and standard deviation σ√n.

More About Central Limit Theorem
•  The normal approximation for X̄ will generally be good if n ≥ 30, regardless of the shape of the population.

•  If n < 30, the approximation is good only if the population is not too different from a normal distribution.

•  If the population is known to be normal, the sampling distribution of X̄ follows a normal distribution exactly, no matter how small the sample size.

Example 4
•  Consider the discrete uniform population
   f(x) = 1/4 for x = 0, 1, 2, 3.
   Find the probability that a random sample of size 36, selected with replacement, will yield a sample mean greater than 1.4 but less than 1.8 if the mean is measured to the nearest tenth.
•  Solution:
   μ = E(X) = 1.5 and
   σ² = [(0 − 1.5)² + (1 − 1.5)² + (2 − 1.5)² + (3 − 1.5)²] × 1/4 = 5/4.
   Then E(X̄) = 1.5, Var(X̄) = σ²/n = 5/144, and
   the standard error σ(X̄) ≈ 0.186.
   Therefore, by the C.L.T.,
   P(1.4 < X̄ < 1.8) ≈ P(−0.54 < Z < 1.61)
   = P(Z < 1.61) − P(Z < −0.54) = 0.6517.
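The solution can be reproduced with the same standard-normal helper as before; the z-values below follow the slide's calculation directly.

```python
from math import erf, sqrt

def phi(z: float) -> float:
    # Standard normal CDF
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

mu, sigma2, n = 1.5, 5 / 4, 36   # mean and variance of the uniform population on {0, 1, 2, 3}
se = sqrt(sigma2 / n)            # ≈ 0.186

z_lo = (1.4 - mu) / se           # ≈ -0.54
z_hi = (1.8 - mu) / se           # ≈ 1.61
print(phi(z_hi) - phi(z_lo))     # ≈ 0.65, matching the 0.6517 read from z-tables
```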

Sampling Distribution of The Difference
Between Two Means
•  Suppose that we now have two populations, the first with mean μ1 and variance σ1², and the second with mean μ2 and variance σ2².
•  Let the statistic X̄1 represent the mean of a random sample of size n1 selected from the first population, and the statistic X̄2 represent the mean of a random sample of size n2 selected from the second population, independent of the sample from the first population.
•  What can we say about the sampling distribution of the difference X̄1 − X̄2 for repeated samples of sizes n1 and n2?
Sampling Distribution of The Difference
Between Two Means (Cont.)
•  According to the central limit theorem, the random variables X̄1 and X̄2 are both approximately normally distributed with means μ1 and μ2 and variances σ1²/n1 and σ2²/n2 respectively, if n1 and n2 are large.
•  By choosing independent samples from the two populations, the variables X̄1 and X̄2 will be independent. Hence we can conclude that X̄1 − X̄2 is approximately normally distributed with mean
   E(X̄1 − X̄2) = E(X̄1) − E(X̄2) = μ1 − μ2
   and variance
   Var(X̄1 − X̄2) = Var(X̄1) + Var(X̄2) = σ1²/n1 + σ2²/n2.

Sampling Distribution of X̄1 − X̄2
•  From the above discussion, the standard deviation of the sampling distribution of X̄1 − X̄2 is given by:

   σ(X̄1 − X̄2) = √(σ1²/n1 + σ2²/n2)

1. If the populations are normal, then the distribution of X̄1 − X̄2 will be normal in shape.
2. If the populations are not normal, then the distribution of X̄1 − X̄2 will be approximately normal if n1 and n2 are large.

Example 5
•  The television picture tubes of Manufacturer A have a mean lifetime of 6.5 years and a standard deviation of 0.9 year, while those of Manufacturer B have a mean lifetime of 6 years and a standard deviation of 0.8 year. What is the probability that a random sample of 36 tubes from Manufacturer A will have a mean lifetime that is at least 1 year more than the mean lifetime of a sample of 49 tubes from Manufacturer B?

Example 5 - Solution
For Population I:  μ1 = 6.5, σ1 = 0.9 and n1 = 36
For Population II: μ2 = 6.0, σ2 = 0.8 and n2 = 49
Since n1 and n2 are greater than 30, the sampling distribution of X̄1 − X̄2 will be approximately normal with
   E(X̄1 − X̄2) = 6.5 − 6 = 0.5
   σ(X̄1 − X̄2) = √(σ1²/n1 + σ2²/n2) = √(0.81/36 + 0.64/49) ≈ 0.189

•  P(X̄1 − X̄2 ≥ 1.0) ≈ P(Z ≥ (1.0 − 0.5)/0.189)
   = P(Z ≥ 2.65) = 0.0040.
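The arithmetic of Example 5 in code, again with an erf-based standard-normal CDF:

```python
from math import erf, sqrt

def phi(z: float) -> float:
    # Standard normal CDF
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

mu1, sigma1, n1 = 6.5, 0.9, 36
mu2, sigma2, n2 = 6.0, 0.8, 49

se = sqrt(sigma1**2 / n1 + sigma2**2 / n2)   # ≈ 0.189
z = (1.0 - (mu1 - mu2)) / se                 # ≈ 2.65
print(1 - phi(z))                            # P(X1-bar - X2-bar >= 1) ≈ 0.004
```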

Sampling Distribution of Sample Proportion
•  We take one sample of n items from a binomial population with a proportion of successes π. In the previous lecture, we demonstrated how to use the normal distribution to approximate the binomial distribution if
   nπ ≥ 5 and n(1 − π) ≥ 5 (or n·π·(1 − π) ≥ 5).
•  This is in fact a consequence of the Central Limit Theorem applied to the binomial distribution.
•  Consider a random sample of size n: X1, X2, ..., Xn, which are independent random variables taking values 0 and 1 such that P(Xi = 1) = π.
   Let X = X1 + X2 + ... + Xn. Then X represents the total number of successes in the sample.

Sampling Distribution of p
•  We know that X follows a binomial distribution Bin(x; n, π) with mean μ = nπ and standard deviation

   σ = √(nπ(1 − π))

•  Then the standardized number of successes is

   Z = (X − nπ) / √(nπ(1 − π))

   which, according to the C.L.T. (the second form), has approximately a standard normal distribution if n is reasonably large.

Sampling Distribution of p (Cont.)
•  Let p = X/n, which is the sample proportion of successes, or the fraction of successes. We know that E(p) = π, and

   σ(p) = √(π(1 − π)/n)

•  The standardized fraction of successes is

   Z = (p − π) / √(π(1 − π)/n)

   which, again according to the C.L.T. (the first form), is approximately standard normal.
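A small simulation (the values of π, n, and the number of replications are arbitrary choices for illustration) showing that the sample proportion p has mean π and standard error √(π(1 − π)/n):

```python
import random
from math import sqrt
from statistics import mean, stdev

random.seed(3)
pi, n, reps = 0.30, 200, 50_000

# Each replication: n Bernoulli(pi) trials, record the sample proportion p = X / n
p_hats = [sum(random.random() < pi for _ in range(n)) / n for _ in range(reps)]

print(mean(p_hats))              # close to pi = 0.30
print(stdev(p_hats))             # close to sqrt(pi * (1 - pi) / n)
print(sqrt(pi * (1 - pi) / n))   # ≈ 0.0324
```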
Sampling Distribution of p1 - p2
•  Many times, we need to know the difference in the proportions of successes in two independent binomial populations. For example, we may be comparing two production processes, two auditing procedures, or two medical treatments (e.g. a treatment against the classical 'placebo'). Assume that for population 1 the probability of success is π1, and for population 2 the probability of success is π2.
•  For i = 1, 2, let Xi be the number of successes in a sample of size ni drawn from population i; then pi = Xi/ni is the proportion of successes in the sample drawn from population i.
1. The expected value of the sampling distribution of (p1 − p2) is given by E(p1 − p2) = π1 − π2.
2. The standard deviation of the sampling distribution of (p1 − p2) is given by

   σ(p1 − p2) = √(π1(1 − π1)/n1 + π2(1 − π2)/n2)

Sampling Distribution of p1 - p2 (Cont.)
•  By extending the normal approximation to the binomial distribution as discussed above, we know that when n1 and n2 are large, or practically, when
   n1π1(1 − π1) ≥ 5 and n2π2(1 − π2) ≥ 5,
   the shape of the distribution of (p1 − p2) is approximately normal. More specifically, the standardized difference between two sample proportions,

   Z = [(p1 − p2) − (π1 − π2)] / √(π1(1 − π1)/n1 + π2(1 − π2)/n2),

   is approximately standard normal.

Example 6
•  A store has two locations. At both locations, about 40% of customers use a credit card to pay for their purchases. That is, π1 = π2 = 0.4. In doing an audit, the company accountant took random samples of n1 = 100 and n2 = 100 sales slips from the two locations. 41 and 36 charge customers were found in the samples from the first and second locations, respectively.
   What is the probability of a result in which the first location's proportion of charge customers exceeds the second location's proportion by this much or more?

Example 6 - Solution
•  In this problem, n1 = n2 = 100,
   p1 = 41/100 = 0.41 and p2 = 36/100 = 0.36.
•  Since π1 = π2 = 0.4, on average we would expect p1 − p2 to be close to zero. The probability we seek can be written as P(p1 − p2 ≥ 0.41 − 0.36). Since n1π1(1 − π1) = 24 and n2π2(1 − π2) = 24, we can use the normal approximation; that is,

   P(p1 − p2 ≥ 0.41 − 0.36) = P(p1 − p2 ≥ 0.05)
   ≈ P(Z ≥ (0.05 − 0) / √[(0.4)(0.6)/100 + (0.4)(0.6)/100])
   = P(Z ≥ 0.05/0.069) = P(Z ≥ 0.72) = 0.2358.
