
Chapter One Stat II

statistics 2 chap 1

Uploaded by

Eyoseyas
Copyright
© © All Rights Reserved

CHAPTER THREE

SAMPLING AND SAMPLING DISTRIBUTIONS

3.1. Introduction
Usually the population under study is very large or infinite, which makes studying it in full very difficult
or impossible. Under such circumstances we take a sample, a subset of the population, and study it in order
to learn about the population.
The universe may be either finite or infinite.
A finite universe is one in which the number of items is determinable, such as the number of
students in Arba Minch University or in Ethiopia.
An infinite universe is one in which the number of items cannot be determined, such as the
number of stars in the sky. In some cases the universe is so large that for all practical purposes it
is regarded as infinite, such as the number of leaves on a tree.
What is sampling?
Data are collected from the target population by means of a survey. If the survey covers the whole population,
it is called a census; if it covers only part of the population, it is called a sample survey (sampling).
Why is sampling preferable?
- Cheaper than a census
- Takes less time than a census
- Economy of effort, as relatively few staff are needed
- More detailed information can be collected from a sample
- Better quality of interviewing, supervision and other related activities
What are the Essentials of Sampling?
i. Representativeness: A sample should be selected so that it truly represents the universe;
otherwise the results obtained may be misleading. To ensure representativeness, a random
method of selection should be used.
ii. Adequacy: The size of sample should be adequate; otherwise it may not represent the
characteristics of the universe.
iii. Independence: All items of the sample should be selected independently of one another and
all items of the universe should have the same chance of being selected in the sample. By
independence of selection we mean that the selection of a particular item in one draw has no
influence on the probabilities of selection in any other draw.
iv. Homogeneity: By homogeneity we mean that there is no basic difference between the
nature of the units of the universe and that of the units of the sample. If two samples are taken from the same
universe, they should give more or less the same results.
3.2. Limitations of sampling
- It does not provide information on every individual unit of the population
- Sampling gives rise to sampling errors
- It is difficult to check for omissions of certain units
Parameter and Statistic
When measures such as the mean, median, mode and standard deviation are used to describe the characteristics of
a sample, they are called statistics; when they are used to describe the population, they are called
parameters.
Population parameters | Sample statistics
Population size N | Sample size n
Population mean μ | Sample mean x̄
Population standard deviation σ | Sample standard deviation s
Population proportion P | Sample proportion p̂
One of the objectives of a sample survey is to estimate certain population parameters. Note that
the true value of a population parameter is an unknown constant. It can be
determined only by a complete study of the population. Statistical inference comes
into play whenever such a study is impossible or not practically feasible. A statistic, which is a
sample-based quantity, must then serve as our source of information about the value of the parameter. In this
context, there are three crucial points:
- As the sample is only part of the population, the numerical value of a statistic is normally
not expected to equal the value of the parameter exactly.
- Since different samples can be drawn from a particular population, the observed value of the
statistic depends on the particular sample that is chosen.
- The value of a statistic will therefore vary from one sample to another.
3.3 Importance of sampling theory
When undertaking any survey, it is essential that you obtain data from people that are as
representative as possible of the group that you are studying. Even with the perfect questionnaire
(if such a thing exists), your survey data will only be regarded as useful if it is considered that
your respondents are typical of the population as a whole. For this reason, an awareness of the
principles of sampling is essential to the implementation of most methods of research, both
quantitative and qualitative.
- Population: the group of people, items or units under investigation
- Census: obtained by collecting information about each member of a population
- Sample: obtained by collecting information only about some members of a "population"
- Sampling frame: the list of people from which the sample is taken. It should be
comprehensive, complete and up-to-date. Examples of sampling frames: Electoral Register;
Postcode Address File; telephone book
3.4 Probability (Random) and Non-Probability (non-random) Sampling
A probability sample is one in which each member of the population has a known, non-zero chance of
being selected; in the simplest designs every member has an equal chance. A random sample is usually a
representative sample. There are two methods of ensuring randomness: the lottery method and the use of
random numbers.
In the lottery method, each unit of the population is numbered and its number is written on a chit of paper
or a disc. The chits are then folded and put in a box, from which a sample of predetermined size is
drawn.
In the random number method, a table of random numbers is used. The units of the population are numbered
from 1 to N, and the n units whose numbers are read from the table are selected.
In a non-probability sample, some people have a greater, but unknown, chance of selection than others.
3.4.1 Probability Samples
There are five main types of probability sample. The choice among them depends on the nature of the
research problem, the availability of a good sampling frame, money, time, the desired level of
accuracy in the sample and the data collection methods. Each has its advantages, and each its
disadvantages. They are:
Simple random
Systematic
Random route
Stratified
Multi-stage cluster sampling
1. Simple random sample
This is perhaps an unfortunate term, because it isn't that simple and it isn't done at random, in the
sense of "haphazardly".
Characteristics:
- Each person has the same chance as any other of being selected
- Standard against which other methods are sometimes evaluated
- Suitable where the population is relatively small and the sampling frame is complete and up-to-date
Procedure:
1. Obtain a complete sampling frame
2. Give each case a unique number, starting at one
3. Decide on the required sample size
4. Select that many numbers from a table of random numbers or by using a computer
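The procedure can be sketched in Python, assuming a computer's pseudo-random number generator is used in place of a table of random numbers. The 500-name frame, the sample size of 20 and the seed are made-up values for illustration only.

```python
import random

# Step 1: a hypothetical, complete sampling frame
frame = [f"student_{i}" for i in range(1, 501)]

# Step 2: give each case a unique number, starting at one
numbered = {number: name for number, name in enumerate(frame, start=1)}

# Step 3: decide on the required sample size
n = 20

# Step 4: draw n distinct numbers at random (without replacement);
# the seed is fixed only so that the illustration is reproducible
rng = random.Random(2024)
chosen_numbers = rng.sample(range(1, len(frame) + 1), n)
sample = [numbered[k] for k in chosen_numbers]

print(sorted(chosen_numbers))
```

Sampling without replacement means no case can be chosen twice, which matches drawing distinct numbers from the table.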
The possible samples of size two from the population B, C, D, E are BC, BD, BE, CD, CE and DE.
Note that B appears in three of the six samples, so the probability of B being selected is P(B) =
3/6 = 1/2. Similarly, P(C) = P(D) = P(E) = 1/2, so (1) each element of the population has the same
chance of being chosen. Moreover, (2) each of the possible samples of size two has the same
chance of being selected [P(BC) = P(BD) = P(BE) = P(CD) = P(CE) = P(DE) = 1/6].
Consequently, both conditions for a simple random sample are satisfied.
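The enumeration can be checked mechanically. This is a minimal Python sketch that lists every sample of size two and computes each element's chance of appearing; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import combinations

population = ["B", "C", "D", "E"]
samples = ["".join(s) for s in combinations(population, 2)]
print(samples)  # ['BC', 'BD', 'BE', 'CD', 'CE', 'DE']

# Condition (2): each possible sample is equally likely
p_sample = Fraction(1, len(samples))
print(p_sample)  # 1/6

# Condition (1): an element's chance = share of samples that contain it
p_element = {u: Fraction(sum(u in s for s in samples), len(samples)) for u in population}
print(p_element)
```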
2. Systematic sampling
Similar to simple random sampling, but instead of selecting random numbers from tables, you
move through the list (sampling frame) picking every kth name, where k = N/n.
You must first work out the sampling interval k by dividing the population size by the required sample size.
E.g. for a population of 500 and a sample of 100, k = 500/100 = 5, so the sampling fraction is 1/5, i.e. you
will select one person out of every five in the population. A random number needs to be used only to decide
on the starting point. With a sampling fraction of 1/5, the starting point must be within the first 5
people in your list.
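A minimal sketch of systematic selection, assuming the frame is a Python list and that k is taken as the integer part of N/n when N is not an exact multiple of n; the function name `systematic_sample` is hypothetical.

```python
import random

def systematic_sample(frame, n, rng=random):
    """Select every k-th unit after a random start within the first k units."""
    k = len(frame) // n          # sampling interval k = N/n (integer part)
    start = rng.randrange(k)     # random start at positions 0..k-1, i.e. the first k units
    return frame[start::k][:n]   # every k-th unit; trim in case N is not a multiple of n

frame = list(range(1, 501))      # hypothetical frame numbered 1..500
sample = systematic_sample(frame, 100, random.Random(7))
print(len(sample), sample[:5])
```

Only the starting point is random; every later unit is fixed by it, which is why periodicity in the frame matters.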
Disadvantage: the effect of periodicity (bias caused by particular characteristics recurring in the
sampling frame at regular intervals). An example of this would occur if you used a sampling frame
of adult residents in an area composed predominantly of couples or young families. If this list
was arranged Husband / Wife / Husband / Wife etc. and every tenth person was to be
interviewed, everyone selected would have the same sex as the person at the starting point, so only
husbands or only wives would be interviewed.
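The periodicity problem is easy to reproduce with a hypothetical frame of 100 couples listed husband first: whatever starting point among the first ten is drawn, the sample contains only one sex.

```python
# Hypothetical frame of 100 couples: Husband / Wife / Husband / Wife ...
frame = ["Husband" if i % 2 == 0 else "Wife" for i in range(200)]

k = 10  # every tenth person is interviewed
for start in range(k):
    sample = frame[start::k]
    print(f"start at person {start + 1}: {set(sample)}")
```

Because the interval (10) is a multiple of the period of the list (2), every selected position has the same parity as the starting position.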
3. Random Route Sampling
Used in market research surveys, mainly for sampling households, shops, garages and other
premises in urban areas.
An address is selected at random from the sampling frame (usually the electoral register) as a starting point;
the interviewer is then given instructions to identify further addresses by taking alternate left- and
right-hand turns at road junctions and calling at every nth address (shop, garage etc.).
Advantages:
- May save time
- Bias may be reduced because the interviewer has to call at clearly defined addresses and is not able to
choose
Problems:
- Characteristics of particular areas (e.g. poor / rich) may mean that the sample is not
representative
- Open to abuse by the interviewer because it is difficult to check that the instructions were fully carried out
4. Stratified Sampling
Dividing a population into non-overlapping groups is called stratification. In stratified random
sampling the population is divided into non-overlapping subgroups, or strata, and a simple random
sample is then selected within each stratum. A population can therefore be stratified if its members
have readily identifiable characteristics that can be used to separate them into subgroups.
For example, we can stratify a human population by dividing it into different strata on the basis of
age, sex, occupation, education, religion, region, etc. Note that stratification does not mean the
absence of randomness. All it means is that the population is first divided into strata and then a
simple random sample is selected from each stratum. The advantages of using stratified random sampling are:
- It more accurately reflects the characteristics of the population than simple random sampling
and systematic random sampling.
- It is more cost-effective than simple random sampling.
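A sketch of stratified random sampling in Python. The text does not say how many units to take from each stratum; here we assume proportional allocation (each stratum's share of the sample equals its share of the population). The strata and the function name `stratified_sample` are hypothetical.

```python
import random

def stratified_sample(strata, n, rng=random):
    """Simple random sample within each stratum, allocated in proportion to stratum size."""
    N = sum(len(units) for units in strata.values())
    sample = {}
    for name, units in strata.items():
        n_h = round(n * len(units) / N)        # proportional allocation (an assumption)
        sample[name] = rng.sample(units, n_h)  # SRS without replacement inside the stratum
    return sample

# Hypothetical population stratified by sex: 600 women and 400 men
strata = {
    "female": [f"F{i}" for i in range(600)],
    "male": [f"M{i}" for i in range(400)],
}
result = stratified_sample(strata, 50, random.Random(1))
print({name: len(units) for name, units in result.items()})  # {'female': 30, 'male': 20}
```

With other stratum sizes, rounding can make the stratum samples add up to slightly more or less than n.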
5. Multi-stage cluster sampling
As the name implies, this involves drawing several different samples. It does so in such a way
that the cost of final interviewing is minimized.
Basic procedure: first draw a sample of areas. Large areas are selected initially, then progressively
smaller areas within the larger ones are sampled. Eventually you end up with a sample of households and
use a method of selecting individuals from these selected households.
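A two-stage version of the procedure can be sketched as follows. The frame of 20 areas with 50 households each, and the numbers drawn at each stage, are assumptions for illustration; a real multi-stage design would usually have more stages and might not give every area the same chance of selection.

```python
import random

# Hypothetical frame: 20 areas, each containing 50 households
areas = {f"area_{a}": [f"area_{a}/hh_{h}" for h in range(50)] for a in range(20)}

rng = random.Random(3)

# Stage 1: simple random sample of 4 areas
chosen_areas = rng.sample(sorted(areas), 4)

# Stage 2: simple random sample of 5 households within each chosen area
households = [hh for area in chosen_areas for hh in rng.sample(areas[area], 5)]

print(chosen_areas)
print(len(households))  # 20 households, all from the 4 chosen areas
```

Interviewers only need to visit 4 areas instead of 20, which is where the saving in interviewing cost comes from.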
3.4.2 Non-Probability Samples
It isn't always possible to undertake a probability method of sampling, such as in random
sampling. For example, there is not a complete sampling frame available for certain groups of
the population e.g. the elderly; people who are attending a football match; people who shop in a
particular part of town. Another factor to bear in mind is that many of the probability sampling
methods described above may mean that researchers would have to undertake a postal or
telephone survey or go from house to house. We will discuss some
of the problems of low response rates later in this workbook, but you might find that a
probability sample with a poor response rate does not, in the end,
give you a particularly good representation of the population being examined.
Advantages of non-probability methods:
Cheaper
Used when sampling frame is not available
Useful when population is so widely dispersed that cluster sampling would not be efficient
Often used in exploratory studies, e.g. for hypothesis generation
Some research is not interested in working out what proportion of the population gives a particular
response, but rather in obtaining an idea of the range of responses or ideas that people have.
1. Purposive Sampling
A purposive sample is one which is selected by the researcher subjectively. The researcher
attempts to obtain a sample that appears to him/her to be representative of the population and will
usually try to ensure that a range from one extreme to the other is included. It is often used in
political polling: districts are chosen because their voting pattern has in the past provided a good
idea of the outcome for the whole electorate.
2. Quota Sampling
Quota sampling involves the fixation of certain quotas, which are to be fulfilled by the
interviewers.
Quota sampling is often used in market research. Interviewers are required to find cases with
particular characteristics. They are given quota of particular types of people to interview and the
quotas are organized so that final sample should be representative of population.
Stages:
- Decide on the characteristic with respect to which the sample is to be representative, e.g. age
- Find out the distribution of this variable in the population and set quotas accordingly. E.g. if 20% of
the population is between 20 and 30, and the sample is to be 1,000, then 200 of the sample (20%) will be
in this age group
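Setting the quotas is simple arithmetic. In this sketch only the 20% share for ages 20 to 30 and the sample of 1,000 come from the text; the other age groups and their shares are hypothetical, as is the function name `quotas`.

```python
def quotas(population_shares, sample_size):
    """Number of interviews per category, proportional to its share of the population."""
    return {group: round(share * sample_size) for group, share in population_shares.items()}

# Hypothetical age distribution; only the 20% for ages 20-30 comes from the text
age_shares = {"under 20": 0.25, "20-30": 0.20, "31-50": 0.35, "over 50": 0.20}
print(quotas(age_shares, 1000))  # {'under 20': 250, '20-30': 200, '31-50': 350, 'over 50': 200}
```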
Complex quotas can be developed so that several characteristics (e.g. age, sex, marital status) are
used simultaneously. By the end of the day, the researcher may be looking for a widowed man in
his nineties who looks as though he might buy a particular brand of detergent.
Disadvantage of quota sampling: interviewers choose whom they like (within the
above criteria) and may therefore select those who are easiest to interview, so bias can result.
Also, it is impossible to estimate accuracy (because it is not a random sample).
3. Convenience sampling
A convenience sample is used when you simply stop anybody in the street who is prepared to
stop, or when you wander round a business, a shop, a restaurant, a theatre or whatever, asking
people you meet whether they will answer your questions. In other words, the sample comprises
subjects who are simply available in a convenient way to the researcher. There is no randomness
and the likelihood of bias is high. You can't draw any meaningful conclusions from the results
you obtain.
However, this method is often the only feasible one, particularly for students or others with
restricted time and resources, and can legitimately be used provided its limitations are clearly
understood and stated.
Because the approach is so haphazard, students who have stopped people in the street are often
tempted to describe their sample as "random", since they see it as having been chosen "at random".
You should avoid using the word "random" when describing anything to do with
sampling unless you are absolutely certain that you selected respondents from a sampling frame
using truly random methods.
4. Snowball sampling
With this approach, you initially contact a few potential respondents and then ask them whether
they know of anybody with the same characteristics that you are looking for in your research.
For example, if you wanted to interview a sample of vegetarians / cyclists / people with a
particular disability / people who support a particular political party etc., your initial contacts
may well know of others (e.g. through a support group).
5. Self-selection
Self-selection is perhaps self-explanatory. Respondents themselves decide that they would like to
take part in your survey.
3.5 Bias and Error in Sampling
A sample is expected to mirror the population from which it comes; however, there is no
guarantee that any sample will be precisely representative of the population from which it comes.
Chance may dictate that a disproportionate number of untypical observations will be made. When testing
fuses, for example, the sample may contain a higher or lower proportion of faulty fuses than the
population does. In practice, it is rarely known when a sample is
unrepresentative and should be discarded.

Sampling error
What can make a sample unrepresentative of its population? One of the most frequent causes is
sampling error.
Sampling error comprises the differences between the sample and the population that are due
solely to the particular units that happen to have been selected.

Sampling bias is a tendency to favour the selection of units that have particular characteristics.
Sampling bias is usually the result of a poor sampling plan. The most notable is the bias of non-
response when for some reason some units have no chance of appearing in the sample. For
example, take a hypothetical case in which a Graduate School recently conducted a survey
to find out the level of stress that graduate students were experiencing. A mail questionnaire
was sent to 100 randomly selected graduate students. Only 52 responded, and the results showed
that students were not under stress at that time, whereas in fact it was the most stressful
time of the year for all students except those who were writing their theses at their own pace.
Apparently, this was the group that had the time to respond. The researcher conducting
the study went back to the questionnaires to find out what the problem was and found that all
those who had responded were third- and fourth-year PhD students. Bias can be very costly and has to
be guarded against as much as possible. A means of selecting the units of analysis must be
designed to avoid the more obvious forms of bias. Another example would be where you would
like to know the average income of some community and you decide to use telephone
numbers to select a sample of the population in a locality where only rich and middle-class
households have telephone lines. You will end up with an inflated average income, which will
lead to the wrong policy decisions.
Non-sampling error (measurement error)
The other main cause of unrepresentative samples is non-sampling error. This type of error can
occur whether a census or a sample is being used. Like sampling error, non-sampling error may
be produced either by participants in the statistical study or as an innocent by-product of the
sampling plans and procedures.
A non-sampling error is an error that results solely from the manner in which the observations
are made.
The simplest example of non-sampling error is inaccurate measurements due to malfunctioning
instruments or poor procedures. For example, consider the observation of human weights. If
persons are asked to state their own weights themselves, no two answers will be of equal
reliability. The people will have weighed themselves on different scales in various states of poor
calibration. An individual’s weight fluctuates by several pounds, so that the time of weighing
will affect the answer. The scale reading will also vary with the person’s state of understanding.
Responses therefore will not be of comparable validity unless all persons are weighed under the
same circumstances.
Biased observations due to inaccurate measurement can be innocent but very devastating. A
story is told of a French astronomer who once proposed a new theory based on spectroscopic
measurements of light emitted by a particular star. When his colleagues discovered that the
measuring instrument had been contaminated by cigarette smoke, they rejected his findings.
In surveys of personal characteristics, unintended errors may result from:
- The manner in which the response is elicited
- The social desirability of the persons surveyed
- The purpose of the study
- The personal biases of the interviewer or survey writer
The interviewer’s effect
No two interviewers are alike and the same person may provide different answers to different
interviewers. The manner in which a question is formulated can also result in inaccurate
responses. Individuals may also give false answers to particular questions. For example, some
people want to appear younger or older for reasons known only to them. If you ask such a person
their age in years, it is easy for them simply to lie by overstating their age by one or more years.
It is harder to lie if you ask in which year they were born, since giving a false year requires a bit
of quick arithmetic, so the year of birth is likely to be more accurate.
The respondent effect
Respondents might also give incorrect answers to impress the interviewer. This type of error is
the most difficult to prevent because it results from outright deceit on the part of the respondent.
It is important to acknowledge that certain psychological factors induce incorrect responses and
great care must be taken to design a study that minimizes their effect.
Knowing the study purpose
Knowing why a study is being conducted may create incorrect responses. A classic example is
the question: What is your income? If a government agency is asking, a different figure may be
provided than the respondent would give on an application for a home mortgage. One way to
guard against such bias is to camouflage the study’s goals; another remedy is to make the
questions very specific, allowing no room for personal interpretation. For example, "Where are
you employed?" could be followed by "What is your salary?" and "Do you have any extra jobs?"
A sequence of such questions may produce more accurate information.
Selecting the Sample
The preceding section has covered the most common problems associated with statistical studies.
The desirability of a sampling procedure depends on both its vulnerability to error and its cost.
However, economy and reliability are competing ends, because reducing error often requires an
increased expenditure of resources. Of the two types of statistical errors, only sampling error can
be controlled by exercising care in determining the method for choosing the sample. The
previous section has shown that sampling error may be due to either bias or chance. The chance
component (sometimes called random error) exists no matter how carefully the selection
procedures are implemented, and the only way to minimize chance-sampling errors is to select a
sufficiently large sample (sample size is discussed towards the end of this tutorial). Sampling
bias on the other hand may be minimized by the wise choice of a sampling procedure.

3.6. SAMPLING DISTRIBUTIONS


A sampling distribution is a probability distribution for the possible values of a sample statistic,
such as a sample mean.
N.B. The normal probability distribution is used to determine probabilities for normally
distributed individual measurements, given the mean and the standard deviation.
Symbolically, the variable is the measurement X, with population mean μ and population
standard deviation σ. In contrast to such distributions of individual measurements, a sampling
distribution is a probability distribution for the possible values of a sample statistic.

Sampling Distribution of the Mean


The sampling distribution of the mean is the probability distribution of the means of all
simple random samples of a given size n that can be drawn from the population.
NB: the sampling distribution of the mean is not the sample distribution, which is the distribution
of the measured values of X in one random sample. Rather, the sampling distribution of the mean
is the probability distribution of $\bar{X}$, the sample mean.
For any given sample size n taken from a population with mean μ and standard deviation σ, the
value of the sample mean would vary from sample to sample if several random samples were
obtained from the population. This variability is the basis of the sampling distribution.
The sampling distribution of the mean is described by two parameters: the expected value
$E(\bar{X}) = \mu_{\bar{X}}$, the mean of the sampling distribution of the mean, and the standard deviation
$\sigma_{\bar{X}}$, called the standard error of the mean.
Properties of the Sampling Distribution of Means
1. The mean of the sampling distribution of the means is equal to the population
mean: $\mu_{\bar{X}} = E(\bar{X}) = \mu$.
2. The standard deviation of the sampling distribution of the means (the standard error) is equal to
the population standard deviation divided by the square root of the sample size: $\sigma_{\bar{X}} = \sigma/\sqrt{n}$.
This holds exactly for an infinite population (or sampling with replacement) and is an adequate
approximation when the population is large and n < 0.05N. If N is finite and n ≥ 0.05N,
$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}\sqrt{\frac{N-n}{N-1}}.$$
The expression $\sqrt{(N-n)/(N-1)}$ is called the finite population correction factor (finite
population multiplier). In the calculation of the standard error of the mean, if the population
standard deviation σ is unknown, the standard error of the mean $\sigma_{\bar{X}}$ can be estimated by
the sample standard error of the mean, which is calculated as follows:
$$s_{\bar{X}} = \frac{s}{\sqrt{n}}$$
3. The sampling distribution of means is approximately normal for sufficiently large sample
sizes (n≥ 30).
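Property 2, including the finite population correction, can be written as a small Python function. The name `standard_error_mean` is hypothetical, and the two calls reuse figures from the worked examples in this section (σ = Birr 2000 with n = 30, and σ = 8.3 years with n = 45 from N = 350).

```python
import math

def standard_error_mean(sigma, n, N=None):
    """sigma / sqrt(n), times the finite population correction when N is given and n >= 0.05 N."""
    se = sigma / math.sqrt(n)
    if N is not None and n >= 0.05 * N:
        se *= math.sqrt((N - n) / (N - 1))   # finite population correction factor
    return se

print(round(standard_error_mean(2000, 30), 2))        # 365.15, no correction
print(round(standard_error_mean(8.3, 45, N=350), 2))  # 1.16, n/N > 5% so corrected
```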
Example:
A population consists of the following ages: 10, 20, 30, 40, and 50. A random sample of three is
to be selected from this population and mean computed. Develop the sampling distribution of the
mean.
Solution:
The number of simple random samples of size n that can be drawn without replacement from a
population of size N is ${}_{N}C_{n}$. With N = 5 and n = 3, ${}_{5}C_{3} = 10$ samples can be drawn from the
population:
Sampled items | Sample mean ($\bar{X}$)
10, 20, 30 | 20.00
10, 20, 40 | 23.33
10, 20, 50 | 26.67
10, 30, 40 | 26.67
10, 30, 50 | 30.00
10, 40, 50 | 33.33
20, 30, 40 | 30.00
20, 30, 50 | 33.33
20, 40, 50 | 36.67
30, 40, 50 | 40.00
Total | 300.00
A systematic organization of the above figures gives the following:
Sample mean ($\bar{X}$) | Frequency | Probability (relative frequency) of $\bar{X}$
20.00 | 1 | 0.1
23.33 | 1 | 0.1
26.67 | 2 | 0.2
30.00 | 2 | 0.2
33.33 | 2 | 0.2
36.67 | 1 | 0.1
40.00 | 1 | 0.1
Total | 10 | 1.00
Columns 1 and 2 show frequency distribution of sample means.
Columns 1 and 3 show sampling distribution of the mean.

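$\mu_{\bar{X}} = \sum \bar{X}\,P(\bar{X}) = 300/10 = 30 = \mu = (10 + 20 + 30 + 40 + 50)/5$, i.e. the mean of the
sampling distribution equals the population mean, regardless of the sample size n.
Since averaging reduces variability, $\sigma_{\bar{X}} < \sigma$ except in the cases where σ = 0 or n = 1.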
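The whole example can be reproduced by brute force. This Python sketch enumerates the 10 samples, tabulates the sampling distribution with exact fractions, and checks that the mean of the sample means equals μ = 30 and that their variance equals $(\sigma^2/n)(N-n)/(N-1)$; the finite population correction is needed here because n/N = 3/5.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

ages = [10, 20, 30, 40, 50]
N, n = len(ages), 3

# Exact mean of each of the 5C3 = 10 samples drawn without replacement
means = [Fraction(sum(s), n) for s in combinations(ages, n)]
dist = Counter(means)                      # frequency of each distinct sample mean

print("mean   freq  prob")
for m in sorted(dist):
    print(f"{float(m):5.2f}  {dist[m]:4d}  {dist[m] / len(means):.1f}")

mu = Fraction(sum(ages), N)                          # population mean, 30
sigma2 = sum((x - mu) ** 2 for x in ages) / N        # population variance, 200
mean_of_means = sum(means) / len(means)              # E(x̄)
var_means = sum((m - mu) ** 2 for m in means) / len(means)

print(mean_of_means == mu)                               # True
print(var_means == sigma2 / n * Fraction(N - n, N - 1))  # True
```

Using `Fraction` avoids the rounding in 23.33 and 26.67, so the equalities can be checked exactly.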
Central Limit Theorem and the Sampling Distribution of the Mean
The Central Limit Theorem (CLT) states that:
1. If the population is normally distributed, the distribution of sample means is normal
regardless of the sample size.
2. If the population from which samples are taken is not normal, the distribution of sample
means will be approximately normal if the sample size (n) is sufficiently large (n ≥ 30).
The larger the sample size, the closer the sampling distribution is to the normal
curve.
The relationship between the shape of the population distribution and the shape of the
sampling distribution of the mean is called the Central Limit Theorem.
The significance of the Central Limit Theorem is that it permits us to use sample statistics to
make inferences about population parameters without knowing anything about the shape of the
frequency distribution of that population other than what we can get from the sample. It also
permits us to use the normal curve for analyzing distributions whose shape is
unknown. It thus makes the normal distribution applicable to many problems when
the sample is sufficiently large.
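A quick simulation illustrates the theorem. The population here is an assumption chosen for its strong skew (an exponential distribution with μ = σ = 1); the sample size of 30, the 10,000 repetitions and the seed are arbitrary. If the CLT applies, the sample means should centre on 1, have a standard deviation close to $1/\sqrt{30} \approx 0.183$, and about 95% of them should fall within 1.96 standard errors of μ. It needs Python 3.8 or later for `statistics.fmean`.

```python
import random
import statistics

rng = random.Random(42)                # fixed seed so the run is reproducible
n, reps = 30, 10_000
mu = sigma = 1.0                       # exponential with rate 1: mean 1, standard deviation 1

# Draw many samples of size n from a right-skewed population and keep each sample mean
means = [statistics.fmean(rng.expovariate(1.0) for _ in range(n)) for _ in range(reps)]

se = sigma / n ** 0.5                  # theoretical standard error, about 0.183
share = sum(abs(m - mu) <= 1.96 * se for m in means) / reps

print(round(statistics.fmean(means), 3))   # near mu
print(round(statistics.stdev(means), 3))   # near se
print(round(share, 3))                     # near 0.95 if the means are roughly normal
```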
Example:
1. The distribution of annual earnings of all bank tellers with five years of experience is skewed
negatively. This distribution has a mean of Birr 15,000 and a standard deviation of Birr 2000.
If we draw a random sample of 30 tellers, what is the probability that their earnings will
average more than Birr 15,750 annually?
Solution:
Steps:
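1. Calculate μ and $\sigma_{\bar{X}}$
μ = Birr 15,000
$\sigma_{\bar{X}} = \sigma/\sqrt{n} = 2000/\sqrt{30}$ = Birr 365.15
2. Calculate Z for $\bar{X}$
$Z = \dfrac{\bar{X} - \mu}{\sigma_{\bar{X}}} = \dfrac{15{,}750 - 15{,}000}{365.15} = +2.05$
3. Find the area covered by the interval
P($\bar{X}$ > 15,750) = P(Z > +2.05)
= 0.5 - P(0 to +2.05)
= 0.5 - 0.4798
= 0.0202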
4. Interpret the results
There is a 2.02% chance that the average annual earnings of a group of 30 tellers will be more than
Birr 15,750.
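The same calculation in Python, using `statistics.NormalDist` (Python 3.8 or later) in place of the normal table. Because z is not rounded to 2.05, the result comes out at about 0.0200 rather than the table-based 0.0202.

```python
import math
from statistics import NormalDist

mu, sigma, n = 15_000, 2_000, 30
se = sigma / math.sqrt(n)          # standard error of the mean, about 365.15
z = (15_750 - mu) / se             # about 2.05
p = 1 - NormalDist().cdf(z)        # P(Z > z), upper tail of the standard normal
print(round(se, 2), round(z, 2), round(p, 4))
```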
2. Suppose that during any hour in a large department store, the average number of shoppers is 448,
with a standard deviation of 21 shoppers. What is the probability of randomly selecting 49
different shopping hours, counting the shoppers, and having the sample mean fall between 441
and 446 shoppers, inclusive?
Solution:
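1. Calculate μ and $\sigma_{\bar{X}}$
μ = 448 shoppers
$\sigma_{\bar{X}} = \sigma/\sqrt{n} = 21/\sqrt{49}$ = 3 shoppers
2. Calculate Z for $\bar{X}$
$Z_1 = \dfrac{441 - 448}{3} = -2.33, \qquad Z_2 = \dfrac{446 - 448}{3} = -0.67$
3. Find the area covered by the interval
P(441 ≤ $\bar{X}$ ≤ 446) = P(-2.33 ≤ Z ≤ -0.67)
= P(0 to -2.33) - P(0 to -0.67)
= 0.4901 - 0.2486
= 0.2415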
4. Interpret the results
There is a 24.15% chance of randomly selecting 49 hourly periods for which the sample mean
falls between 441 and 446 shoppers.
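A Python check of this example with `statistics.NormalDist`. Using the unrounded z-values (-7/3 and -2/3) gives about 0.2427, slightly different from the table-based 0.2415.

```python
from statistics import NormalDist

mu, sigma, n = 448, 21, 49
se = sigma / n ** 0.5                          # 21 / 7 = 3
z_low, z_high = (441 - mu) / se, (446 - mu) / se   # about -2.33 and -0.67
std_normal = NormalDist()
p = std_normal.cdf(z_high) - std_normal.cdf(z_low)
print(round(z_low, 2), round(z_high, 2), round(p, 4))
```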
3. A production company’s 350 hourly employees average 37.6 years of age, with a standard
deviation of 8.3 years. If a random sample of 45 hourly employees is taken, what is the
probability that the sample will have an average age of less than 40 years?
Solution:
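1. Calculate μ and $\sigma_{\bar{X}}$
μ = 37.6 years; n/N = 45/350 = 12.9% > 5%, so the finite population correction factor is needed:
$\sigma_{\bar{X}} = \dfrac{\sigma}{\sqrt{n}}\sqrt{\dfrac{N-n}{N-1}} = \dfrac{8.3}{\sqrt{45}}\sqrt{\dfrac{350-45}{350-1}} = 1.16$ years
2. Calculate Z for $\bar{X}$
$Z = \dfrac{40 - 37.6}{1.16} = +2.07$
3. Find the area covered by the interval
P($\bar{X}$ < 40) = P(Z < +2.07)
= 0.5 + P(0 to +2.07)
= 0.5 + 0.4808
= 0.9808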
4. Interpret the results
There is a 98.08% chance that a random sample of 45 hourly employees will have a mean age of
less than 40 years.
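A sketch of the same steps in Python, applying the finite population correction factor because n/N exceeds 5%. With unrounded intermediate values the probability is about 0.9810 rather than 0.9808.

```python
import math
from statistics import NormalDist

mu, sigma, n, N = 37.6, 8.3, 45, 350
fpc = math.sqrt((N - n) / (N - 1))   # needed because n/N = 0.129 > 0.05
se = sigma / math.sqrt(n) * fpc      # about 1.16
z = (40 - mu) / se                   # about 2.07
p = NormalDist().cdf(z)              # P(Z < z)
print(round(se, 2), round(z, 2), round(p, 4))
```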
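4. Suppose that a random sample of size 36 is drawn from a population with
a mean of 278. If 86% of the time the sample mean is less than 280, what is the
population standard deviation?
Solution:
μ = 278, n = 36, P($\bar{X}$ < 280) = 0.86, σ = ?
P(0 ≤ Z ≤ z) = 0.86 - 0.5 = 0.36 gives z = +1.08, so
$1.08 = \dfrac{280 - 278}{\sigma/\sqrt{36}} \;\Rightarrow\; \sigma = \dfrac{2 \times 6}{1.08} = 11.11$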
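Working backwards from a probability to σ can be done with the inverse of the standard normal CDF. This sketch uses `NormalDist().inv_cdf`, which returns z ≈ 1.0803 for 0.86, so σ comes out at about 11.11.

```python
import math
from statistics import NormalDist

mu, n, xbar, prob = 278, 36, 280, 0.86
z = NormalDist().inv_cdf(prob)           # z with P(Z < z) = 0.86, about 1.08
sigma = (xbar - mu) * math.sqrt(n) / z   # from z = (xbar - mu) / (sigma / sqrt(n))
print(round(z, 2), round(sigma, 2))
```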
5. A teacher gives a test to a class containing several hundred students. It is known that the standard
deviation of the scores is about 12 points. A random sample of 36 scores is obtained.
a) What is the probability that the sample mean will differ from the population mean by more
than 6 points?
b) What is the probability that the sample mean will be within 6 points of the population mean?
Solution:
a) n = 36, σ = 12, so $\sigma_{\bar{X}} = 12/\sqrt{36} = 2$ and $Z = \pm 6/2 = \pm 3$
P($\bar{X}$ > μ + 6) + P($\bar{X}$ < μ - 6) = ?
P($\bar{X}$ > μ + 6) + P($\bar{X}$ < μ - 6) = P(Z > 3) + P(Z < -3)
= [0.5 - P(0 to +3)] + [0.5 - P(0 to -3)]
= (0.5 - 0.4987) + (0.5 - 0.4987)
= 0.0013 × 2 = 0.0026
b) n = 36, σ = 12
P(μ - 6 ≤ $\bar{X}$ ≤ μ + 6) = P(-3 ≤ Z ≤ 3)
= P(0 to 3) × 2
= 0.4987 × 2
= 0.9974
If the population standard deviation is 12, in a random sample of 36 scores there is a 99.74%
chance that the sample mean will lie within 6 points of the population mean.
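Both parts of the example can be checked in Python. With the exact normal CDF the two probabilities are about 0.0027 and 0.9973; the table-based 0.0026 and 0.9974 differ only because P(0 to 3) is rounded to 0.4987.

```python
from statistics import NormalDist

sigma, n, d = 12, 36, 6
se = sigma / n ** 0.5                                  # 12 / 6 = 2
z = d / se                                             # 3
p_within = NormalDist().cdf(z) - NormalDist().cdf(-z)  # part (b)
p_outside = 1 - p_within                               # part (a)
print(round(p_outside, 4), round(p_within, 4))
```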
Sampling Distribution of Proportions ($\hat{p}$)
Sometimes in statistics it is important to know the proportion of a certain characteristic in a
population. There are numerous problems in business where we want to know the
proportion of items in a population that possess a certain characteristic. For example,
- A quality control engineer might want to know what proportion of the products of an assembly
line is defective.
- A labor economist might want to know what proportion of the labor force is unemployed.
Whereas the mean is computed by averaging a set of values, the sample proportion is computed
by dividing the frequency with which a given characteristic occurs in a sample by the number of items
in the sample:
$$\hat{p} = \frac{X}{n}$$
where $\hat{p}$ = sample proportion
X = number of items in the sample that possess the characteristic
n = number of items in the sample
Like other probability distributions, the sampling distribution of the proportion is described by two
parameters: the mean of the sample proportions, $E(\hat{p})$, and the standard deviation of the sample
proportions, $\sigma_{\hat{p}}$, which is called the standard error of the proportion.
Properties of the Sampling Distribution of $\hat{p}$
1. As with the sampling distribution of the mean, the mean of the sample proportions is always equal
to the population proportion P, i.e., $E(\hat{p}) = P$.
2. The standard error of the proportion is equal to
$$\sigma_{\hat{p}} = \sqrt{\frac{Pq}{n}}$$
where P = population proportion,
q = 1 - P,
n = sample size;
or
$$\sigma_{\hat{p}} = \sqrt{\frac{Pq}{n}}\sqrt{\frac{N-n}{N-1}}$$
where $\sqrt{(N-n)/(N-1)}$ = finite population correction factor.
The finite population correction factor is not needed if n < 0.05N.


Central Limit Theorem (CLT) and the Sampling Distribution of $\hat{p}$
How does a researcher use the sample proportion in analysis?
Answer: By applying the Central Limit Theorem. The CLT states that the normal distribution
approximates the shape of the distribution of sample proportions if np and nq are greater than 5.
Consequently we solve problems involving sample proportions by using a normal distribution
whose mean and standard deviation are:
$$\mu_{\hat{p}} = P, \qquad \sigma_{\hat{p}} = \sqrt{\frac{Pq}{n}}$$
NB: The sampling distribution of $\hat{p}$ can be approximated by a normal distribution whenever the
sample size is large, i.e., np > 5 and nq > 5.
Example:
1. Suppose that 60% of the electrical contractors in a region use a particular brand of wire.
What is the probability of taking a random sample of size 120 from these electrical contractors and finding that a proportion of 0.5 or less use that brand of wire?
Solution:
n = 120, P = 0.6, q = 0.4, $P(\hat{p} \le 0.5)$ = ?
Steps:
1. Check that np and nq > 5
120 × 0.6 = 72 and 120 × 0.4 = 48. Both are greater than 5.
2. Calculate $\sigma_{\hat{p}}$
$$\sigma_{\hat{p}} = \sqrt{\frac{Pq}{n}} = \sqrt{\frac{0.6 \times 0.4}{120}} = \sqrt{0.002} = 0.0447$$
3. Calculate Z for $\hat{p}$ = 0.5
$$Z = \frac{\hat{p} - P}{\sigma_{\hat{p}}} = \frac{0.5 - 0.6}{0.0447} = -2.24$$
4. Find the area covered by the interval
$P(\hat{p} \le 0.5) = P(Z \le -2.24)$
= 0.5 - P(0 to -2.24)
= 0.5 - 0.4875
= 0.0125
5. Interpret the results
The probability that 50% or less of the contractors in a random sample of 120 use this particular brand of wire is very low (1.25%).
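The five steps above can be reproduced in a short Python sketch (assuming Python 3.8+ for `statistics.NormalDist`). Because it uses the unrounded z of about -2.236 instead of the table's -2.24, it returns roughly 0.0127 rather than 0.0125; the difference is only rounding.

```python
from math import sqrt
from statistics import NormalDist

n, P = 120, 0.6
q = 1 - P
assert n * P > 5 and n * q > 5      # step 1: normal approximation condition (72 and 48)
se = sqrt(P * q / n)                # step 2: standard error, about 0.0447
z = (0.5 - P) / se                  # step 3: about -2.236
prob = NormalDist().cdf(z)          # step 4: P(p-hat <= 0.5)
print(round(se, 4), round(z, 3), round(prob, 4))
```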
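2. If 10% of a population of parts is defective, what is the probability of randomly selecting 80 parts and finding that 12 or more are defective?
Solution:
n = 80, P = 0.1, q = 0.9, X = 12
$\hat{p}$ = X/n = 12/80 = 0.15
$P(\hat{p} \ge 0.15)$ = ?
np = 8 and nq = 72 are both greater than 5, so the normal approximation applies.
$$\sigma_{\hat{p}} = \sqrt{\frac{0.1 \times 0.9}{80}} = 0.0335, \qquad Z = \frac{0.15 - 0.10}{0.0335} = 1.49$$
$P(\hat{p} \ge 0.15) = P(Z \ge 1.49)$
= 0.5 - P(0 to +1.49)
= 0.5 - 0.4319 = 0.0681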
About 6.81% of the time, twelve or more defective parts would appear in a random sample of
eighty parts when the population proportion is 0.10.
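A minimal Python check of this example, assuming Python 3.8+ (`statistics.NormalDist`). Like the worked solution, it applies no continuity correction; the exact CDF gives about 0.0680.

```python
from math import sqrt
from statistics import NormalDist

n, P, X = 80, 0.10, 12
p_hat = X / n                       # 0.15
se = sqrt(P * (1 - P) / n)          # about 0.0335
z = (p_hat - P) / se                # about 1.49
prob = 1 - NormalDist().cdf(z)      # P(p-hat >= 0.15)
print(round(z, 2), round(prob, 4))
```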
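3. Suppose that a population proportion is 0.40 and that 80% of the time you draw a random sample from this population, you get a sample proportion of 0.35 or more. How large a sample were you taking?
Solution:
P = 0.4, q = 0.6, $P(\hat{p} \ge 0.35)$ = 0.80, n = ?
Since 80% of the area lies above 0.35, the area between 0.35 and the mean is 0.80 - 0.50 = 0.30. The Z value for an area of 0.30 is 0.84, and it is negative because 0.35 lies below P = 0.40, so Z = -0.84.
$$-0.84 = \frac{0.35 - 0.40}{\sqrt{\frac{0.4 \times 0.6}{n}}} \;\Rightarrow\; \sqrt{\frac{0.24}{n}} = \frac{0.05}{0.84} = 0.0595$$
Squaring both sides:
$$\frac{0.24}{n} = 0.00354 \;\Rightarrow\; n = \frac{0.24}{0.00354} \approx 68$$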
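The same back-solving for n can be sketched in Python (assuming Python 3.8+ for `NormalDist.inv_cdf`). The sketch uses the exact z of about -0.8416 rather than the table's -0.84, so intermediate numbers differ slightly, but it also arrives at n ≈ 68.

```python
from math import sqrt, ceil
from statistics import NormalDist

P, cutoff, prob_above = 0.40, 0.35, 0.80
# z value with 80% of the standard normal area above it (about -0.84)
z = NormalDist().inv_cdf(1 - prob_above)
# Solve (cutoff - P) / sqrt(P*q/n) = z for n
n_exact = P * (1 - P) * (z / (cutoff - P)) ** 2
print(round(z, 4), round(n_exact, 2), ceil(n_exact))
```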
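4. If a population proportion is 0.28 and the sample size is 140, then 30% of the time the sample proportion will be less than what value, if you are taking random samples?
Solution:
P = 0.28, q = 0.72, n = 140, $P(\hat{p} < x)$ = 0.30, x = ?
The area between x and the mean is 0.5 - 0.30 = 0.20, so Z = -0.52 (negative because x lies below P).
$$\sigma_{\hat{p}} = \sqrt{\frac{0.28 \times 0.72}{140}} = \sqrt{0.00144} = 0.0379$$
$$x = P + Z\sigma_{\hat{p}} = 0.28 + (-0.52)(0.0379) = 0.28 - 0.0197 = 0.2603$$
So 30% of the time the sample proportion will be less than about 0.26.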
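A minimal Python sketch of this inverse problem, assuming Python 3.8+ (`statistics.NormalDist`). With the exact z of about -0.5244 instead of the table's -0.52, the cut-off comes out near 0.2601, essentially the same answer.

```python
from math import sqrt
from statistics import NormalDist

P, n, area_below = 0.28, 140, 0.30
se = sqrt(P * (1 - P) / n)               # about 0.0379
z = NormalDist().inv_cdf(area_below)     # about -0.5244 (table: -0.52)
x = P + z * se                           # value that p-hat falls below 30% of the time
print(round(se, 4), round(z, 4), round(x, 4))
```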

Sampling Distribution of the Difference between Two Independent Sample Means

This distribution is concerned with the difference between sample means drawn from two populations; it is used to determine whether the mean of one population is equal to the mean of another.
For example, we might want to know:
- Whether the mean life expectancy of females is equal to the mean life expectancy of males
- Whether the mean productivity of women and men is equal
- Whether the mean CGPA for business students is equal to the mean CGPA for social science students
- Whether the mean number of white blood cells in a droplet of blood is equal to the mean number of red blood cells, etc.
In each case we have two different populations (P1 and P2). Population 1 has mean $\mu_1$ and variance $\sigma_1^2$, and population 2 has mean $\mu_2$ and variance $\sigma_2^2$.
To make such comparisons we take independent random samples of n1 observations from population one, P1, and n2 observations from population two, P2, and then calculate the respective sample means, denoted by $\bar{x}_1$ and $\bar{x}_2$.
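N.B: In the sampling distribution of the difference between two means, $\bar{x}_1 - \bar{x}_2$, we are actually concerned with five different probability distributions:
1st, the two population distributions, with means $\mu_1$ and $\mu_2$ and variances $\sigma_1^2$ and $\sigma_2^2$, respectively.
2nd, the two sampling distributions of $\bar{x}_1$ and $\bar{x}_2$, with $\mu_{\bar{x}_1} = \mu_1$ and $\mu_{\bar{x}_2} = \mu_2$.
3rd, the single sampling distribution of $\bar{x}_1 - \bar{x}_2$, with mean $\mu_1 - \mu_2$ and standard error $\sigma_{\bar{x}_1 - \bar{x}_2}$.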
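The sampling distribution of the difference between two sample means is described by two parameters:
1. Mean of the difference between two sample means: $E(\bar{x}_1 - \bar{x}_2) = \mu_1 - \mu_2$
2. Standard error of the difference between two sample means:
$$\sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}, \qquad \text{Variance} = \sigma^2_{\bar{x}_1 - \bar{x}_2} = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}$$
[If X and Y are independent random variables: var(X - Y) = var(X) + var(Y).]
where $\sigma_1^2$ = variance of population one
$\sigma_2^2$ = variance of population two
n1 = sample size drawn from population one
n2 = sample size drawn from population two
If more than 5% of a population is sampled without replacement, we must apply the finite population correction factor, and the formula becomes:
$$\sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{\sigma_1^2}{n_1}\left(\frac{N_1 - n_1}{N_1 - 1}\right) + \frac{\sigma_2^2}{n_2}\left(\frac{N_2 - n_2}{N_2 - 1}\right)}$$
where N1 and N2 are the sizes of populations one and two.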
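The standard-error formulas for $\bar{x}_1 - \bar{x}_2$ can be expressed as one helper. This is a sketch: the name `se_diff_means` is hypothetical, and the population sizes in the second usage line (N1 = 500, N2 = 600) are illustrative assumptions. The caller decides, by the 5% rule above, whether to pass N1 and N2.

```python
from math import sqrt

def se_diff_means(var1, n1, var2, n2, N1=None, N2=None):
    """Hypothetical helper: standard error of x-bar1 - x-bar2 (independent samples).

    var1, var2 are population variances. When a population size is given,
    that term is multiplied by the finite population correction (N - n)/(N - 1).
    """
    term1 = var1 / n1
    term2 = var2 / n2
    if N1 is not None:
        term1 *= (N1 - n1) / (N1 - 1)
    if N2 is not None:
        term2 *= (N2 - n2) / (N2 - 1)
    return sqrt(term1 + term2)

print(se_diff_means(1400, 100, 1320, 120))                   # 5.0 (credit-card example)
print(se_diff_means(1400, 100, 1320, 120, N1=500, N2=600))   # assumed finite populations
```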

The Central Limit Theorem and the sampling distribution of $\bar{x}_1 - \bar{x}_2$
The central limit theorem states that:
- If n1 and n2 are both greater than 30, the distribution of $\bar{x}_1 - \bar{x}_2$ will be approximately normal, no matter how the original populations are distributed.
- If the original populations are normally distributed, then the distribution of $\bar{x}_1 - \bar{x}_2$ is exactly normal for any values of n1 and n2, because the sum or difference of independent normal random variables is itself normally distributed.
To standardize a difference between two sample means we use the following formula:
$$Z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}$$
Example:
1. A financial loan officer claims that the mean monthly credit card payment is Br 80 with a variance of 1400 for single females and Br 80 with a variance of 1320 for single males. You take a random sample of 100 single females (population 1) and an independent random sample of 120 single males (population 2). What is the probability that the sample mean for females will be at least Br 5 higher than the sample mean for males?
Solution:
n1 = 100, n2 = 120, $\sigma_1^2$ = 1400, $\sigma_2^2$ = 1320
$E(\bar{x}_1 - \bar{x}_2) = \mu_1 - \mu_2$ = 80 - 80 = 0
$$\sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{1400}{100} + \frac{1320}{120}} = \sqrt{14 + 11} = 5$$
$$Z = \frac{5 - 0}{5} = 1.00$$
$P(\bar{x}_1 - \bar{x}_2 \ge 5) = P(Z \ge 1.00)$
= 0.5 - P(0 to +1.00)
= 0.5 - 0.3413
= 0.1587
There is a 15.87% chance that the sample mean monthly credit card payment for single females will exceed that of single males by at least Br 5.
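The credit-card calculation can be checked with a few lines of Python, assuming Python 3.8+ for `statistics.NormalDist`. Here the standard error is exactly 5 and z is exactly 1, so the result matches the table answer.

```python
from math import sqrt
from statistics import NormalDist

mu1, var1, n1 = 80, 1400, 100    # single females
mu2, var2, n2 = 80, 1320, 120    # single males
se = sqrt(var1 / n1 + var2 / n2)         # sqrt(14 + 11) = 5
z = (5 - (mu1 - mu2)) / se               # 1.0
prob = 1 - NormalDist().cdf(z)           # P(x-bar1 - x-bar2 >= 5)
print(se, z, round(prob, 4))             # 5.0 1.0 0.1587
```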
2. MOHA soft drinks factory produces two soft drinks, 7UP and Pepsi-cola, in two plants. The daily production of 7UP averages 15,000 bottles with a standard deviation of 2,000 bottles, and the daily production of Pepsi-cola averages 12,500 bottles with a standard deviation of 2,500 bottles; both are normally distributed. A sample of five randomly selected daily production figures is taken from each of the plants. What is the probability that the sample mean production for 7UP will be less than or equal to the sample mean production for Pepsi-cola?
Solution:
n1 = 5, n2 = 5
$E(\bar{x}_1 - \bar{x}_2) = \mu_1 - \mu_2$ = 15,000 - 12,500 = 2,500
$$\sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{2000^2}{5} + \frac{2500^2}{5}} = \sqrt{2{,}050{,}000} = 1{,}431.78$$
$$Z = \frac{0 - 2{,}500}{1{,}431.78} = -1.75$$
$P(\bar{x}_1 - \bar{x}_2 \le 0) = P(Z \le -1.75)$
= 0.5 - P(0 to -1.75)
= 0.5 - 0.4599
= 0.0401
Thus, there is only a 4.01% chance that the sample mean production for 7UP will be less than or equal to that for Pepsi-cola. So, if the owner of the two plants found the smaller sample mean for 7UP in independent random samples of five randomly selected days from each plant, he would suspect either that the sampling was faulty or that the difference between the plants' mean daily outputs had changed.
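A Python sketch of the soft-drink calculation, assuming Python 3.8+ (`statistics.NormalDist`) and the figures used in the worked solution: means of 15,000 and 12,500 bottles and standard deviations of 2,000 and 2,500 bottles, which reproduce the standard error 1,431.78. The exact CDF gives about 0.0404 because z is not rounded to -1.75.

```python
from math import sqrt
from statistics import NormalDist

mu1, sd1, n1 = 15_000, 2_000, 5     # 7UP
mu2, sd2, n2 = 12_500, 2_500, 5     # Pepsi-cola
se = sqrt(sd1**2 / n1 + sd2**2 / n2)      # about 1431.78
z = (0 - (mu1 - mu2)) / se                # about -1.75
prob = NormalDist().cdf(z)                # P(x-bar1 - x-bar2 <= 0)
print(round(se, 2), round(z, 2), round(prob, 4))
```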
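3. X Company claims that the mean annual repair bill for its rental cars is Br 290 and the standard deviation is Br 50. Y Company also claims that its mean annual repair bill is Br 290 with a standard deviation of Br 50. If independent random samples of 100 cars from each company are obtained, what is the probability that the difference between the sample means, $|\bar{x}_1 - \bar{x}_2|$, exceeds Br 5?
Solution:
n1 = 100, n2 = 100
$E(\bar{x}_1 - \bar{x}_2) = \mu_1 - \mu_2$ = 290 - 290 = 0
$$\sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{50^2}{100} + \frac{50^2}{100}} = \sqrt{50} = 7.071$$
$$Z = \frac{5 - 0}{7.071} = 0.71$$
$P(|\bar{x}_1 - \bar{x}_2| > 5) = P(Z > 0.71) + P(Z < -0.71)$
= 2[0.5 - P(0 to 0.71)]
= 2[0.5 - 0.2611]
= 0.4778
There is a 47.78% chance that the sample mean annual repair bills for X and Y companies differ by more than Br 5 in either direction.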
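Because the event is two-sided, the sketch below doubles one tail. It assumes Python 3.8+ for `statistics.NormalDist`; with the unrounded z of about 0.7071 it gives roughly 0.4795, close to the table answer 0.4778.

```python
from math import sqrt
from statistics import NormalDist

sd, n = 50, 100                           # same for X and Y companies
se = sqrt(sd**2 / n + sd**2 / n)          # sqrt(50), about 7.071
z = 5 / se                                # about 0.71
prob = 2 * (1 - NormalDist().cdf(z))      # P(|x-bar1 - x-bar2| > 5), both tails
print(round(se, 3), round(z, 2), round(prob, 4))
```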
4. Two populations of measurements are normally distributed with $\mu_1$ = 57 and $\mu_2$ = 25. The two population standard deviations are $\sigma_1$ and $\sigma_2$. Two independent samples of n1 = n2 = 36 are taken from the populations.
a. What is the expected value of the difference in sample means, $\bar{x}_1 - \bar{x}_2$?
b. What is the standard deviation of the distribution of $\bar{x}_1 - \bar{x}_2$?
c. What is the shape of the distribution of $\bar{x}_1 - \bar{x}_2$? How do you know?
Solution:
a. Expected value of the difference in sample means:
$E(\bar{x}_1 - \bar{x}_2) = \mu_1 - \mu_2$ = 57 - 25 = 32
b. The standard deviation of the distribution of $\bar{x}_1 - \bar{x}_2$ is the standard error of the difference between two sample means:
$$\sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{\sigma_1^2}{36} + \frac{\sigma_2^2}{36}} = 2.24$$
c. The distribution of $\bar{x}_1 - \bar{x}_2$ is normal in shape. The central limit theorem states that
- If the populations from which the samples are drawn are normal in shape, then the distribution of $\bar{x}_1 - \bar{x}_2$ will be normal in shape.
- If the populations from which the samples are drawn are not normal in shape, then the distribution of $\bar{x}_1 - \bar{x}_2$ will be approximately normal, owing to the central limit effect, if the sample sizes n1 and n2 are both large.
Here both populations are normal, so the first case applies.

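5. Two production processes are, on average, identical. Both use an average of 500 kg of raw material per day, and both have the same standard deviation of daily use, 9 kg per day. Thus the daily use of material may vary for the two processes, but on average it is the same. If samples of n1 = 81 and n2 = 36 days are taken from the two processes, find the probability that the sample means $\bar{x}_1$ and $\bar{x}_2$ differ by no more than 1.0 kg.
Solution:
n1 = 81, n2 = 36
$E(\bar{x}_1 - \bar{x}_2) = \mu_1 - \mu_2$ = 500 - 500 = 0
$$\sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{9^2}{81} + \frac{9^2}{36}} = \sqrt{1 + 2.25} = 1.8028$$
$$Z = \frac{1.0 - 0}{1.8028} = 0.55$$
$P(|\bar{x}_1 - \bar{x}_2| \le 1.0) = P(-0.55 \le Z \le 0.55)$
= 2[P(0 to 0.55)]
= 2[0.2088]
= 0.4176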
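A direct computation of this probability, sketched in Python under the assumption of Python 3.8+ (`statistics.NormalDist`). With the standard error 1.8028 the z value is about 0.5547, and the exact CDF gives a probability of about 0.42.

```python
from math import sqrt
from statistics import NormalDist

sd, n1, n2 = 9, 81, 36
se = sqrt(sd**2 / n1 + sd**2 / n2)        # sqrt(1 + 2.25), about 1.8028
z = 1.0 / se                              # about 0.55
std = NormalDist()
prob = std.cdf(z) - std.cdf(-z)           # P(|x-bar1 - x-bar2| <= 1.0)
print(round(se, 4), round(z, 2), round(prob, 4))
```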
SAMPLING DISTRIBUTION OF THE DIFFERENCE BETWEEN TWO SAMPLE PROPORTIONS
Frequently we are interested in determining whether the proportion of items in one population that possess a certain characteristic is the same as the proportion possessing the characteristic in another population. For example, a doctor who gives one type of medicine to some patients and another medicine to others may want to determine whether the percentage of people cured by the first medicine is the same as the percentage cured by the second. For this and other similar cases the sampling distribution of the difference between sample proportions is used. It is also used to compare market shares and vote shares.
Suppose we take independent samples of sizes n1 and n2 from two populations. Let P1 and P2 be the proportions of items in each population that possess a certain characteristic, let q1 = 1 - P1 and q2 = 1 - P2, and let $\hat{p}_1$ and $\hat{p}_2$ be the corresponding sample proportions.
If n1P1, n1q1, n2P2 and n2q2 are all greater than 5, then the random variable $\hat{p}_1 - \hat{p}_2$ is approximately normally distributed with
Mean: $E(\hat{p}_1 - \hat{p}_2) = P_1 - P_2$; and
Variance: $\sigma^2_{\hat{p}_1 - \hat{p}_2} = \frac{P_1 q_1}{n_1} + \frac{P_2 q_2}{n_2}$; if n > 0.05N, the finite population correction factor is used.
To standardize a difference between two sample proportions we use the following formula:
$$Z = \frac{(\hat{p}_1 - \hat{p}_2) - (P_1 - P_2)}{\sqrt{\frac{P_1 q_1}{n_1} + \frac{P_2 q_2}{n_2}}}$$
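The standardization formula, together with the "all greater than 5" condition, can be sketched as a small Python helper. The name `z_diff_proportions` is hypothetical, and this sketch ignores the finite population correction.

```python
from math import sqrt

def z_diff_proportions(diff, P1, n1, P2, n2):
    """Hypothetical helper: Z score for an observed p-hat1 - p-hat2 (no FPC)."""
    q1, q2 = 1 - P1, 1 - P2
    # Normal approximation condition: n1P1, n1q1, n2P2, n2q2 must all exceed 5
    if min(n1 * P1, n1 * q1, n2 * P2, n2 * q2) <= 5:
        raise ValueError("n*P and n*q must all exceed 5 for the normal approximation")
    se = sqrt(P1 * q1 / n1 + P2 * q2 / n2)
    return (diff - (P1 - P2)) / se

print(round(z_diff_proportions(0.1, 0.9, 100, 0.9, 100), 2))   # student-union example
print(round(z_diff_proportions(0.0, 0.6, 400, 0.5, 400), 2))   # talk-show example
```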

Example:
1. At Addis Ababa University there is a movement to re-establish the Students' Union. Approximately 90% of all students favor the reinstatement. A pro-union student takes a random sample of 100 students. An anti-union student takes an independent random sample of 100 students. Let $\hat{p}_1$ denote the proportion of students who favor the union in the sample taken by the pro-union student, and $\hat{p}_2$ the proportion of students who favor the union in the sample taken by the anti-union student. Calculate the probability that $\hat{p}_1$ exceeds $\hat{p}_2$ by 0.1 or more.
Solution:
Pro-union            Anti-union
P1 = 0.9             P2 = 0.9
q1 = 0.1             q2 = 0.1
n1 = 100             n2 = 100
$E(\hat{p}_1 - \hat{p}_2) = P_1 - P_2$ = 0.9 - 0.9 = 0
$$\sigma_{\hat{p}_1 - \hat{p}_2} = \sqrt{\frac{0.9 \times 0.1}{100} + \frac{0.9 \times 0.1}{100}} = \sqrt{0.0018} = 0.04243$$
$$Z = \frac{0.1 - 0}{0.04243} = 2.36$$
$P(\hat{p}_1 - \hat{p}_2 \ge 0.1) = P(Z \ge 2.36)$
= 0.5 - P(0 to +2.36)
= 0.5 - 0.4909
= 0.0091
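A minimal Python check of the student-union example, assuming Python 3.8+ (`statistics.NormalDist`). The exact CDF at z ≈ 2.357 gives about 0.0092, matching the table answer up to rounding.

```python
from math import sqrt
from statistics import NormalDist

P1 = P2 = 0.9
n1 = n2 = 100
se = sqrt(P1 * (1 - P1) / n1 + P2 * (1 - P2) / n2)   # sqrt(0.0018), about 0.0424
z = (0.1 - (P1 - P2)) / se                           # about 2.36
prob = 1 - NormalDist().cdf(z)                       # P(p-hat1 - p-hat2 >= 0.1)
print(round(se, 4), round(z, 2), round(prob, 4))
```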
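2. A TV channel airs two talk shows: Talk-show 1 and Talk-show 2. On a Sunday afternoon, a random sample of 400 people is taken, and the sample proportion $\hat{p}_1$ is used to estimate the proportion of the population that watched Talk-show 1 on the TV channel. On the following Sunday, an independent random sample of 400 people is taken, and $\hat{p}_2$ is used to estimate the proportion of the population that watched Talk-show 2. If P1 = 0.6 and P2 = 0.5, find the probability that $\hat{p}_1 > \hat{p}_2$ in our samples. That is, find $P(\hat{p}_1 - \hat{p}_2 > 0)$.
Solution:
Talk-show 1          Talk-show 2
P1 = 0.6             P2 = 0.5
q1 = 0.4             q2 = 0.5
n1 = 400             n2 = 400
$E(\hat{p}_1 - \hat{p}_2) = P_1 - P_2$ = 0.6 - 0.5 = 0.10
$$\sigma_{\hat{p}_1 - \hat{p}_2} = \sqrt{\frac{0.6 \times 0.4}{400} + \frac{0.5 \times 0.5}{400}} = \sqrt{0.001225} = 0.035$$
$$Z = \frac{0 - 0.10}{0.035} = -2.86$$
$P(\hat{p}_1 - \hat{p}_2 > 0) = P(Z > -2.86)$
= 0.5 + P(0 to -2.86)
= 0.5 + 0.4979
= 0.9979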
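The talk-show probability can likewise be verified in Python, assuming Python 3.8+ (`statistics.NormalDist`). The standard error is exactly 0.035 here, and the exact CDF gives about 0.9979.

```python
from math import sqrt
from statistics import NormalDist

P1, P2, n = 0.6, 0.5, 400
se = sqrt(P1 * (1 - P1) / n + P2 * (1 - P2) / n)     # sqrt(0.001225) = 0.035
z = (0 - (P1 - P2)) / se                             # about -2.86
prob = 1 - NormalDist().cdf(z)                       # P(p-hat1 - p-hat2 > 0)
print(round(se, 3), round(z, 2), round(prob, 4))
```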
