ModLec273 2
unless summarized by the tools of descriptive statistics. Descriptive statistics, therefore, allow us to present the
data in a more meaningful way, which makes the data easier to interpret.
Inferential statistics includes statistical methods which facilitate estimating the characteristics of a population or
making decisions concerning a population on the basis of sample results. In this regard, methods like estimation
and hypothesis testing are examples of inferential statistics.
For example, a biologist collected blood samples of 10 students from biology Department to study blood types.
Accordingly, the following data is obtained:
O, A, O, AB, A, A, O, O, B, A, and O
A summary measure, for example the statement that the proportion of students with blood type O in the sample is 50%, is an
example of descriptive statistics. We can also describe the data using bar or pie charts.
However, if he/she wants to get information on the proportion of students with blood type O in the entire class,
he/she may use the sample proportion (50%) as an estimate of the corresponding value of the entire class. This
is an example of inferential statistics.
1.2 Definition of some terms
A population: Consists of all elements, individuals, items or objects whose characteristics are being studied.
The population that is being studied is called the target population.
Sample: A portion of the population selected for study.
Sample survey: The technique of collecting information from a portion of the population.
Census survey: A survey that includes every member of the population. The Ethiopian population census survey is
carried out every 10 years.
Variable: A characteristic under study that assumes different values for different elements.
Quantitative variable: A variable that can be measured numerically. The data collected on a quantitative
variable are called quantitative data. Examples include weight, height, number of students in a class, number of
car accidents, etc.
Qualitative variable: A variable that cannot assume a numerical value but can be classified into two or more
non-numerical categories. The data collected on such a variable are called qualitative or categorical data.
Examples include sex, blood type, marital status, religion, etc.
Discrete variable: A variable whose values are countable. Examples include number of patients in a hospital,
number of white blood cells in a droplet of blood sample, number of rodents per plot of farmland, etc.
Continuous variable: A variable that can assume any numerical value over a certain interval or intervals.
Examples include weight of newborn babies, height of seedlings, temperature measurements, etc.
Parameter: A statistical measure obtained from population data. Examples include population mean,
proportion, variance and so on.
Statistic: A statistical measure obtained from sample data. Examples include sample mean, proportion,
variance and so on.
Unit of analysis (Experimental unit): The type of thing being measured in the data, such as persons, families,
households, states, nations, etc.
Exercise: From a sample of 200 households in a town, the average amount of garbage produced per day is found to be
2 kg. Determine the population, sample, sample size, variable, parameter and statistic.
The required data can be obtained from either a primary source or a secondary source.
Primary source: A source of data that supplies first-hand information for the immediate purpose.
Primary data: Data you collect to answer your question; that is, data measured or collected by the investigator or the
user directly from the source, or data originally collected for the immediate purpose.
Two activities are involved: planning and measuring.
a) Planning:
Identify source and elements of the data.
Decide whether to consider sample or census.
If sampling is preferred, decide on sample size, selection method, etc
Decide measurement procedure.
Set up the necessary organizational structure.
b) Measuring: there are different options. The following are some of the methods for collecting primary data:
Focus group
Telephone interview
Mail questionnaires
Door-to-door survey
Mall intercept
New product registration
Personal interview
Experiments
Secondary source: Individuals or agencies that supply data originally collected for other purposes by
them or by others. Usually these are published or unpublished materials, records, reports, etc.
Secondary data: Data collected from a secondary source, gathered by other people for other purposes; that is, data
compiled from published and unpublished sources or files.
When our source is secondary data, check that:
The type and objective of the original study are appropriate to our situation.
The purpose for which the data were collected is compatible with the present problem.
The nature and classification of the data are appropriate to our problem.
There are no biases or misreporting in the published data.
Note: Data which are primary for one user may be secondary for another.
2.2 Sampling
Depending on the source, data can be primary or secondary. Primary data are the statistical data which the
investigator originates for the purpose of the inquiry. Secondary data, on the other hand, are data which are
not originated by the investigator, but which he/she obtains from someone else's records. Secondary data can
be obtained from published or unpublished documents: reports, journals, magazines, articles, etc. Primary
methods of data collection include observation, personal interview, self-administered
questionnaire, mailed questionnaire, etc. Generally, data are collected from a sample of the population.
Sampling: The technique of selecting a representative sample from the whole.
Sampling Frame: A complete list of all the units of the population is called the sampling frame. A unit of
population is a relative term. If all the workers in a factory make a population, then a worker is a unit of the
population. If all the factories in a country are being studied for some purpose, then a factory is a unit of the
population of factories. The frame provides a base for the selection of a sample.
Cluster sampling
Cluster sampling is used when a sampling frame is difficult to construct or when other sampling techniques
(such as simple random sampling) are not feasible or are too costly. For instance, when the geographic distribution of units is
scattered it is difficult to apply simple random sampling. It involves division of the population of elementary
units into groups or clusters that serve as primary sampling units. A selection of the clusters is then made to
form the sample. The precision of estimates made based on samples taken using this method is relatively low.
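The selection step described above can be sketched in Python. This is only an illustration under stated assumptions: the village names, the household labels and the helper `cluster_sample` are hypothetical and do not come from the module.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Pick n_clusters whole clusters at random; return their names and all units in them."""
    rng = random.Random(seed)  # a seed makes the draw repeatable
    chosen = rng.sample(sorted(clusters), n_clusters)
    units = [unit for name in chosen for unit in clusters[name]]
    return chosen, units

# Hypothetical population: households (elementary units) grouped by village (clusters).
villages = {
    "village_A": ["A1", "A2", "A3"],
    "village_B": ["B1", "B2"],
    "village_C": ["C1", "C2", "C3", "C4"],
    "village_D": ["D1", "D2", "D3"],
}

chosen, households = cluster_sample(villages, n_clusters=2, seed=1)
print("Selected clusters:", chosen)
print("Sampled households:", households)
```

Only a list of clusters is needed to draw the sample, which is why the method is convenient when a full sampling frame of elementary units is hard to build.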
Non-probability sampling techniques
In non-probability sampling, the sample is not based on chance. It is rather determined by personal judgment.
This method is cost effective; however, we cannot make objective statistical inferences. Depending on the
technique used, non-probability samples are classified into quota, judgment or purposive and convenience
samples.
Sampling and non-sampling errors
Sampling error is the difference between the value of a sample statistic and the value of the corresponding
population parameter. On the other hand, non-sampling error is an error that occurs in the collection, recording
and tabulation of data. Sampling error can be minimized by using appropriate sampling methods and/or
increasing the sample size. The non-sampling error is likely to increase with increase in sample size.
UNIT THREE: METHODS OF DATA PRESENTATION
Objectives:
After completing this unit you should be able to
✓ organize data using frequency distributions.
✓ present data using suitable graphs or diagrams.
Introduction
The amount of data collected in real life situations is often too large, thus we need some methods to organize it.
One such method is grouping, that is, putting data into groups rather than treating each observation
individually. In fact, raw data provide little, if any, information to decision makers. Thus, we need a means of
converting the raw data into useful information. Hence, the purpose of this unit is to introduce tools used for
data presentation.
3.1 Classification and tabulation of data
The purposes of classifying and tabulating data are to display the points of similarity and dissimilarity; to save mental
strain by systematic condensation and suppression of irrelevant detail; to enable one to form a mental picture of
objects of perception; and to prepare the ground for comparison and inference.
Types of classification
1. Geographical- in terms of cities, districts, countries etc.
2. Chronological - on the basis of time
3. Qualitative - according to some qualitative characteristics.
4. Quantitative - in terms of magnitude.
One can also use combination of these to classify data.
Tabulation: tables may be classified according to the number of characteristics used for tabulation.
1. Simple or one way table: it uses only one characteristic or variable for classification.
Example 2.1: Students who took Introduction to Statistics in 1998 E.C., by gender.
Gender    Number
Male      2000
Female    700
Class limit   Class boundaries   Tally       Frequency
16-21         15.5-21.5          ///         3
22-27         21.5-27.5          ///// /     6
28-33         27.5-33.5          ///// ///   8
34-39         33.5-39.5          ////        4
40-45         39.5-45.5          ///         3
46-51         45.5-51.5          /           1
Total                                        25
Note: Class boundaries are mostly used to obtain cumulative frequencies. Based on whether the observations
are bounded from above or from below, we can have a cumulative less than or a cumulative more than
frequency distributions, respectively.
Example 2.6: Convert the absolute frequency distribution in example 2.4 into:
i) a cumulative less than frequency distribution.
ii) a cumulative more than frequency distribution.
Solution:
i) We use the class boundaries to form cumulative frequencies. For instance, there is no observation which
is less than 15.5, 3 observations are less than 21.5, 9 observations are less than 27.5 and so on. Thus,
the following less than cumulative frequency distribution is obtained.
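The running totals described in the solution can be reproduced with a short Python sketch. The boundaries and frequencies are those of the frequency table above; the variable names are illustrative only.

```python
from itertools import accumulate

# Class boundaries and frequencies from the frequency distribution above.
boundaries = [15.5, 21.5, 27.5, 33.5, 39.5, 45.5, 51.5]
frequencies = [3, 6, 8, 4, 3, 1]

# "Less than" cumulative frequencies: nothing lies below 15.5, then a running total.
less_than = [0] + list(accumulate(frequencies))

# "More than" cumulative frequencies: whatever is not below a boundary lies above it.
total = sum(frequencies)
more_than = [total - cf for cf in less_than]

for b, lt, mt in zip(boundaries, less_than, more_than):
    print(f"less than {b}: {lt:2d}    more than {b}: {mt:2d}")
```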
4 4 2 5 4 4 3 3 2 5
8 2 3 7 4 3 4 3 2 6
2 7 8 3 5 8 3 4 4 5
To group these data, we will use classes based on the single numerical value.
D D D D O R O R O R O R O D D R D D D R
R O R D R R O R R R R R O O R R D R D D
The classes for grouping are Democratic, Republican and Other.
Table: Number of students by political party affiliation
Class        Frequency   Relative frequency
Democratic   13          0.325
Republican   18          0.45
Other        9           0.225
Total        40          1
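The counts and relative frequencies in the table can be checked with a small Python sketch. The responses are copied from the data above (D, R and O standing for Democratic, Republican and Other); the variable names are illustrative.

```python
from collections import Counter

# Party affiliations of the 40 students, as listed above.
responses = (
    "D D D D O R O R O R O R O D D R D D D R "
    "R O R D R R O R R R R R O O R R D R D D"
).split()

counts = Counter(responses)  # absolute frequency of each class
n = len(responses)           # total number of observations

# Relative frequency = class frequency / total number of observations.
relative = {party: counts[party] / n for party in ("D", "R", "O")}

for party in ("D", "R", "O"):
    print(party, counts[party], relative[party])
```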
3.2 Diagrammatic and graphical presentation of data
3.2.1 Graphs for quantitative data
Histogram: it consists of a set of adjacent rectangles whose bases are marked off by class boundaries along the
horizontal axis and whose heights are proportional to the frequencies associated with the respective classes.
To construct a histogram from a data set:
1. Construct a frequency table.
2. Draw adjacent bars having heights determined by the frequencies in step 1.
The importance of a histogram is that it enables us to organize and present data graphically so as to draw
attention to certain important features of the data. For instance, a histogram can often indicate how symmetric
the data are; how spread out the data are; whether there are intervals having high levels of data concentration;
whether there are gaps in the data; and whether some data values are far apart from others.
Example 2.9: The following is a histogram for the frequency distribution in example 2.4.
Example 2.10: Construct a frequency polygon for the frequency distribution of the time spent by the
automobile workers that we have seen in example 2.4.
[Figure: bar chart of amounts of coffee (in 1000 tons) against production year, 1990 to 1995.]
Pie-chart: it is a circle divided by radial lines into sections or sectors so that the area of each sector is
proportional to the size of the figure represented.
Pie-chart construction:
✓ Calculate the percentage frequency of each component. It is $\frac{f_i}{n} \times 100$.
✓ Calculate the degree measure of each sector. It is given by $\frac{f_i}{n} \times 360^\circ$.
✓ Draw the circle using a protractor and compass.
Example 2.13: Draw a pie-chart to represent the following data on a certain family's expenditure.
Table: Family expenditure.
Item             Expenditure (in birr)   Percentage frequency   Angle of the sector
Food             50                      33.33                  120°
Clothing         30                      20                     72°
House rent       20                      13.33                  48°
Fuel & light     15                      10                     36°
Miscellaneous    35                      23.33                  84°
Total            150                     100                    360°
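The percentage and angle columns follow directly from the two construction formulas. A minimal Python sketch using the family expenditure figures above (the dictionary and variable names are illustrative):

```python
# Family expenditure (in birr) from the table above.
expenditure = {
    "Food": 50,
    "Clothing": 30,
    "House rent": 20,
    "Fuel & light": 15,
    "Miscellaneous": 35,
}

total = sum(expenditure.values())

# Percentage frequency = f_i / n * 100; sector angle = f_i / n * 360 degrees.
percent = {item: amount / total * 100 for item, amount in expenditure.items()}
angle = {item: amount / total * 360 for item, amount in expenditure.items()}

for item in expenditure:
    print(f"{item:14s} {percent[item]:6.2f}%  {angle[item]:6.1f} degrees")
```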
[Figure: pie chart of family expenditure with sectors for food, clothing, house rent, fuel and light, and miscellaneous.]
b) Pie chart
Find the percentage of donors for each blood type. To find the angle of the sector for each blood
type, multiply the corresponding percentage by 360° and divide by 100.
Blood type   Frequency   Percent   Angle
A            19          38.0      136.8°
B            8           16.0      57.6°
O            19          38.0      136.8°
AB           4           8.0       28.8°
Total        50          100.0     360°
[Figures: pie chart of donors by blood type (A, B, O, AB); bar chart of number of donors by blood type.]
Rules of summation
and
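Definition 3.2:
i) Let $x_1, x_2, \ldots, x_n$ be $n$ observations. Then $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$ is the arithmetic mean of the observations.
ii) If the numbers $x_1, x_2, \ldots, x_k$ occur with frequencies $f_1, f_2, \ldots, f_k$, respectively, then $\bar{x} = \frac{\sum_{i=1}^{k} f_i x_i}{\sum_{i=1}^{k} f_i}$.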
Example 3.2: The ages of a random sample of patients in a given hospital in Ethiopia are given below:
Age (x_i)                  10   12   14    16    18    20    22   Total
Number of patients (f_i)    3    6   10    14    11     5     4      53
f_i x_i                    30   72  140   224   198   100    88     852
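As a worked sketch, assuming the example is meant to be solved with the usual mean of a frequency distribution, $\bar{x} = \sum f_i x_i / \sum f_i$, the figures in the table give:

```latex
\[
\bar{x} = \frac{\sum f_i x_i}{\sum f_i}
        = \frac{30 + 72 + 140 + 224 + 198 + 100 + 88}{3 + 6 + 10 + 14 + 11 + 5 + 4}
        = \frac{852}{53} \approx 16.08
\]
```

So the mean age of the sampled patients is about 16.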
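Definition 3.3: If $x_1, x_2, \ldots, x_n$ have weights $w_1, w_2, \ldots, w_n$, then the weighted arithmetic
mean, denoted by $\bar{x}_w$, is defined as $\bar{x}_w = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i}$.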
Example 3.3: The GPA or CGPA of a student is a good example of a weighted arithmetic mean. Suppose that
Solomon obtained the following grades in the 1st semester of last year.
Course    Credit hour (wi)   Grade
Math101   4                  A=4
Bio101    3                  C=2
Chem101   3                  B=3
Phys101   4                  B=3
Flen101   3                  C=2
Compute the GPA of Solomon.
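One way to carry out the computation is sketched below in Python, treating the credit hours as weights and the grade points as values. The helper `weighted_mean` is a hypothetical name introduced for illustration.

```python
def weighted_mean(values, weights):
    """Weighted arithmetic mean: sum(w_i * x_i) / sum(w_i)."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

# Solomon's courses in table order: Math101, Bio101, Chem101, Phys101, Flen101.
credit_hours = [4, 3, 3, 4, 3]   # weights w_i
grade_points = [4, 2, 3, 3, 2]   # A=4, C=2, B=3, B=3, C=2

gpa = weighted_mean(grade_points, credit_hours)
print(f"GPA = {gpa:.2f}")  # 49 grade points over 17 credit hours
```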
Example 3.4: In a vacancy for a position of botanist in an organization, the criteria of selection were work
experience, entrance exam, and interview result. The relative importance of these criteria was regarded to be
different. The weights of these criteria and the scores obtained by 3 candidates (out of 100 in each criterion) are
given in the following table. In addition, the selection of a candidate is based on the average result on these criteria.
Criterion          Weight   Tesfaye   Gutema   Kedir
Work experience    4        70        89       85
Entrance exam      3        78        83       89
Interview result   2        90        92       90
Who is the appropriate candidate for the position based on the criteria?
Solution: We use the weighted mean since the relative importance of these criteria is different.
Criterion          Weight (wi)   Tesfaye        Gutema         Kedir
                                 xi    xiwi     xi    xiwi     xi    xiwi
Work experience    4             70    280      89    356      85    340
Entrance exam      3             78    234      83    249      89    267
Interview result   2             90    180      92    184      90    180
Total              9                   694            789            787
The weighted mean and the simple arithmetic mean for the applicants are as follows:
Applicant                Tesfaye         Gutema          Kedir
Weighted mean            694/9 = 77.11   789/9 = 87.67   787/9 = 87.44
Simple arithmetic mean   238/3 = 79.33   264/3 = 88      264/3 = 88
If we use the simple arithmetic mean of the scores, both Gutema and Kedir have got equal chances to be
recruited. However, the relative importance of the criteria is different. So we have to use the weighted mean for
discriminating among the candidates. The weighted mean of the scores obtained by Gutema is larger than the
others. So Gutema should be recruited for the job.
Properties of arithmetic mean
i. It can be computed for any set of numerical data; it always exists and is unique.
ii. It depends on all observations.
iii. The sum of deviations of the observations about the mean is zero, i.e. $\sum_{i=1}^{n} (x_i - \bar{x}) = 0$.
b)
Day               1st   2nd           3rd            4th            5th
Number of cases   12    24 = 2 × 12   48 = 2² × 12   96 = 2³ × 12   192 = 2⁴ × 12
Example 3.7: A company's year-to-year changes in fuel consumption expenditures were 5, 10, 20, 40 and 60
percent. Determine the average yearly percent change in expenditure.
Solution: The 1st, 2nd, 3rd, 4th, and 5th growth rates are 105%, 110%, 120%, 140%, and 160%, respectively.
Average growth rate = $\sqrt[5]{105 \times 110 \times 120 \times 140 \times 160} \approx 125.43\%$, so the average yearly change in expenditure is about 25.43 percent.
Decade                                  1   2   3
Rate of increase in population (in %)   5   8   12
Average growth rate = $\sqrt[3]{105 \times 108 \times 112} \approx 108.3\%$
Hence the average rate of increase in population over the last three decades is 108.3 - 100 = 8.3 percent.
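Both averages above are geometric means of growth factors, which can be checked with a short Python sketch. The function name `geometric_mean` and the variable names are illustrative; growth rates are written as factors (105% becomes 1.05).

```python
import math

def geometric_mean(factors):
    """n-th root of the product of n positive growth factors."""
    return math.prod(factors) ** (1 / len(factors))

# Growth factors: a 5% increase corresponds to a factor of 1.05, and so on.
fuel_factors = [1.05, 1.10, 1.20, 1.40, 1.60]   # Example 3.7
population_factors = [1.05, 1.08, 1.12]         # three decades of population growth

avg_fuel = geometric_mean(fuel_factors)
avg_population = geometric_mean(population_factors)

print(f"Average yearly change in fuel expenditure: {(avg_fuel - 1) * 100:.2f}%")
print(f"Average increase in population per decade: {(avg_population - 1) * 100:.2f}%")
```

The geometric mean is used here rather than the arithmetic mean because successive growth rates compound multiplicatively.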
4.2.3 The median
Definition 3.5: the median of a set of data is a value which divides the set in such a way that the number of
observations below it is the same as the number of observations above it.
Median from raw data
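i. If the number of observations, say n, is odd, then the median is equal to the $\left(\frac{n+1}{2}\right)$th observation of the
array.
ii. If the number of observations n is even, then the median is equal to half the sum of the $\left(\frac{n}{2}\right)$th
observation and the $\left(\frac{n}{2}+1\right)$th observation of the array.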
Example 3.9: Find the median for the following sets of data:
i. 10 5 7 9 6 5 4
Solution: First arrange the data in the form of an array.
4 5 5 6 7 9 10
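Here we have n=7, which is odd. Therefore, the median is the 4th observation of the array, which is 6.
ii. 10 5 7 9 6 5 4 8
Solution: Arrange the data in ascending order.
4 5 5 6 7 8 9 10
Here n=8, which is even.
Therefore, the median is the mean of the 4th and 5th observations: (6 + 7)/2 = 6.5.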
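iii. A shopkeeper (salesperson) recorded the number of video cassette recorders (VCRs) sold per month
over a two-year period. Find the median number of VCRs sold.
Number of sets sold   Frequency (months)   Cumulative frequency
1                     3                    3
2                     8                    11
3                     5                    16
4                     4                    20
5                     2                    22
6                     1                    23
7                     1                    24
The number of observations is n=24, which is even. The median is therefore the mean of the 12th and 13th observations. The cumulative frequencies show that both of these observations equal 3, so the median is 3 VCRs.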
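The median of this frequency table can be checked by expanding it back into the 24 monthly observations. A minimal Python sketch, with illustrative variable names:

```python
import statistics

# VCR sales table: number of sets sold and the number of months with that count.
sets_sold = [1, 2, 3, 4, 5, 6, 7]
months = [3, 8, 5, 4, 2, 1, 1]

# Repeat each value as many times as its frequency to recover the ordered array.
observations = [x for x, f in zip(sets_sold, months) for _ in range(f)]

n = len(observations)
median = statistics.median(observations)  # averages the two middle values when n is even

print(n, median)
```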
Properties of median
It is an average of position.
It is affected more by the number of observations than by extreme values.
The sum of the deviations about the median, signs ignored, is less than the sum of deviations taken from
any other value or specific average.
4.2.4 The mode
Definition 3.6: The mode (modal value) of an observed set of data is the value that occurs the largest number of
times.
The mode for raw data
Example 3.10: Find the modal value for the following sets of data.
i. 5 6 5 8 7 4. In this data set, 5 is the most frequent value. Therefore, the mode is 5. Since there is only
one modal value, we call the distribution unimodal.
ii. 1 2 3 4 8 2 5 4 6. In this data set, the modal values are 2 and 4, since both 2 and 4 appear most
frequently and they occur an equal number of times. This kind of distribution is called a bimodal
distribution.
iii. 1 2 4 3 5 6 8 7. In this data set, all values appear an equal number of times, so there is no modal value.
Note:
✓ If a distribution has more than two modal values then we call the distribution multimodal.
✓ If in a set of observed values, all values occur once or an equal number of times, there is no mode.
✓ The mode is also useful in finding the most typical case when the data are nominal or categorical.
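The rules in the note can be sketched as a small Python function. The helper `modes` is a hypothetical name; it returns an empty list when every value is equally frequent, following the convention stated above rather than that of any particular library.

```python
from collections import Counter

def modes(data):
    """Return the sorted modal values, or [] when all distinct values are equally frequent."""
    counts = Counter(data)
    highest = max(counts.values())
    if len(counts) > 1 and all(c == highest for c in counts.values()):
        return []  # every value occurs equally often: no mode
    return sorted(value for value, c in counts.items() if c == highest)

print(modes([5, 6, 5, 8, 7, 4]))           # unimodal data set
print(modes([1, 2, 3, 4, 8, 2, 5, 4, 6]))  # bimodal data set
print(modes([1, 2, 4, 3, 5, 6, 8, 7]))     # no mode
```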
Example 3.11: A survey showed the following distribution for the number of students enrolled in each field.
Find the mode.
Subject             Number of students
Business            850
Liberal arts        825
Computer sciences   645
Education           478
General studies     100
Solution: Since the category with the highest frequency is business, the most typical case is a business major.
Properties of modal value
It is easy to calculate and understand.
It is not affected by extreme values.
It is ill-defined, indeterminate and indefinite sometimes.
It is not based on all observations.
It is not used in further analysis of data.
The mean, median, and mode of grouped data
The mean for grouped data can be found by assuming that the values in each interval are centered at the mid-point
of the interval.
Example 3.12: Consider the frequency distribution of the time spent by the automobile workers. Find the mean
time spent by these workers from this frequency distribution.
Note: In case of grouped data if any class interval is open, arithmetic mean cannot be calculated.
The median for grouped data can be approximated by the following formula.
Solution: The class containing the (n/2) th observation or the 10th observation is the median class. This class has
class boundaries 20.5 & 25.5(4th class).
Deciles are nine points which divide an array into 10 parts in such a way that each part contains an equal number
of elements. The 1st, 2nd, …, and 9th points are known as the 1st, 2nd, …, and 9th deciles and are usually
denoted by D1, D2, …, D9, respectively.
Percentiles are 99 points which divide an array into 100 parts in such a way that each part consists of an equal
number of elements. The 1st, 2nd, …, and 99th points are known as the 1st, 2nd, …, and 99th percentiles and are
usually denoted by P1, P2, …, P99, respectively.
Note: The array should be in ascending order in order to get the quantiles.
i. Quantile points for raw data
First form an array in an ascending order and then apply the following procedure.
Example 3.15: The following data relate to sizes of shoes sold at a stock during a week. Find the quartiles, the
seventh decile and the 90th percentile.
Size of shoes     5   5.5   6    6.5   7    7.5   8    8.5   9   9.5
Number of pairs   2   5     15   30    60   40    23   11    4   1
Solution: The total number of observations is 191.
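The quantiles can be located with a short Python sketch. It assumes one common convention, not necessarily the one intended by the module: the k-th of the points dividing the array into m parts sits at position k(n+1)/m of the ordered array, with linear interpolation for fractional positions. The helper `quantile` is a hypothetical name.

```python
def quantile(sorted_data, k, m):
    """k-th point dividing the array into m parts, assuming position k(n+1)/m (1-based)."""
    n = len(sorted_data)
    pos = k * (n + 1) / m
    lower = int(pos)          # whole part of the position
    frac = pos - lower        # fractional part, used for interpolation
    if lower < 1:
        return sorted_data[0]
    if lower >= n:
        return sorted_data[-1]
    a, b = sorted_data[lower - 1], sorted_data[lower]
    return a + frac * (b - a)

sizes = [5, 5.5, 6, 6.5, 7, 7.5, 8, 8.5, 9, 9.5]
pairs = [2, 5, 15, 30, 60, 40, 23, 11, 4, 1]

# Expand the table into the ordered array of 191 observations.
array = [s for s, f in zip(sizes, pairs) for _ in range(f)]

q1, q2, q3 = (quantile(array, k, 4) for k in (1, 2, 3))
d7 = quantile(array, 7, 10)
p90 = quantile(array, 90, 100)
print(q1, q2, q3, d7, p90)
```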
The mean yield of both varieties is 42 kg. The mean yield of variety 1 is close to the values in this variety. On
the other hand, the mean yield of variety 2 is not close to the values in variety 2. The mean doesn't tell us how
close the observations are to each other. This example suggests that a measure of central tendency alone is not
sufficient to describe a frequency distribution. Therefore, we should have a measure of the spread of observations.
There are different measures of dispersion. In this chapter we shall discuss the most commonly used measures of
dispersion or variation, such as the range, quartile deviation, standard deviation and coefficient of variation, and
measures of shape such as skewness and kurtosis.
Objectives of measuring variation
To describe dispersion (variability) in the data.
To compare the spread in two or more distributions.
To determine the reliability of an average.
Note: The desirable properties of good measures of variation are almost identical with that of a good measure of
central tendency.
Absolute and relative measures
Measures of variation may be either absolute or relative. Absolute measures of variation are expressed in the
same unit of measurement in which the original data are given. These values may be used to compare the
variation in two distributions provided that the variables are in the same units and of the same average size.
In case the two sets of data are expressed in different units, however, such as quintals of sugar versus tons of
sugarcane, the absolute measures of dispersion are not comparable. In such cases measures of relative
dispersion should be used. A measure of relative dispersion is the ratio of a measure of absolute dispersion to an
appropriate measure of central tendency. It is a unitless measure.
5.2 Types of measures of variation
The range and relative range
Definition 5.1: Range is defined as the difference between the maximum and minimum observations in a set of
data.
Range is the crudest absolute measure of variation. It is widely used in the construction of quality control
charts and description of daily temperature.
Definition 5.2: Relative range (RR) is defined as
Variance, standard deviation and coefficient of variation
Definition 5.3: The variance is the average of the squares of the distance each value is from the mean. The
symbol for the population variance is $\sigma^2$ ($\sigma$ is the Greek lower case letter sigma). Let $x_1, x_2, \ldots, x_N$ be the
measurements on N population units; then the population variance is given by the formula:
$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$, where $\mu = \frac{\sum_{i=1}^{N} x_i}{N}$ is the population mean.
Definition 5.4: The standard deviation is the square root of the variance. The symbol for the population standard
deviation is $\sigma$. The corresponding formula for the standard deviation is
$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$.
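Example 5.1: The heights of members of a certain committee were measured in inches and the data are presented
below.
Height (x)   69   66   67   69   64   63   65   68   72
x - μ         2   -1    0    2   -3   -4   -2    1    5
(x - μ)²      4    1    0    4    9   16    4    1   25
Here μ = 603/9 = 67 inches and the sum of the squared deviations is 64, so σ² = 64/9 ≈ 7.11 and σ ≈ 2.67 inches.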
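Definition 5.5: The sample variance is denoted by $S^2$, and its formula is
$S^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}$.
Definition 5.6: The sample standard deviation, denoted by S, is the square root of the sample variance:
$S = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}}$.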
Example 5.2: For a newly created position, a manager interviewed the following numbers of applicants each
day over a five-day period: 16, 19, 15, 15, and 14. Find the variance and standard deviation.
Solution:
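The computation can be checked with Python's standard `statistics` module, whose `variance` and `stdev` functions use the sample (n-1) divisor. The variable names are illustrative.

```python
import statistics

# Number of applicants interviewed on each of the five days.
applicants = [16, 19, 15, 15, 14]

mean = statistics.mean(applicants)
variance = statistics.variance(applicants)  # sample variance, divisor n - 1
std_dev = statistics.stdev(applicants)      # square root of the sample variance

print(f"mean = {mean}, variance = {variance:.2f}, standard deviation = {std_dev:.2f}")
```

The deviations from the mean of 15.8 are 0.2, 3.2, -0.8, -0.8 and -1.8; their squares sum to 14.8, and dividing by n - 1 = 4 gives the sample variance.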
Note that the procedure for finding the variance and standard deviation for grouped data is similar to that for
finding the mean for grouped data, and it uses the mid-points of each class.
Properties of variance
✓ The unit of measurement of the variance is the square of the unit of measurement of the observed values.
This is one of its limitations.
✓ The variance gives more weight to extreme values as compared to those which are near to the mean value,
because the difference is squared in the variance.
✓ It is based on all observations in the data set.
Properties of standard deviation
✓ Standard deviation is considered to be the best measure of dispersion and is used widely.
✓ There is, however, one difficulty with it. If the unit of measurement of variables of two series is not the
same, then their variability cannot be compared by comparing the values of standard deviation.
Uses of the variance and standard deviation
✓ The variance and standard deviation can be used to determine the spread of data, consistency of a
variable and the proportion of data values that fall within a specified interval in a distribution.
✓ If the variance or standard deviation is large, the data are more dispersed. This information is useful in
comparing two or more data sets to determine which is more (most) variable.
✓ Finally, the variance and standard deviation are used quite often in inferential statistics.
Coefficient of variation (CV)
The standard deviation is an absolute measure of dispersion. The corresponding relative measure is known as
the coefficient of variation (CV).
Coefficient of variation is used in such problems where we want to compare the variability of two or more
different series. Coefficient of variation is the ratio of the standard deviation to the arithmetic mean, usually
expressed in percent:
$CV = \frac{S}{\bar{x}} \times 100\%$, where S is the standard deviation of the observations and $\bar{x}$ is their arithmetic mean.
A distribution having less coefficient of variation is said to be less variable or more consistent or more uniform
or more homogeneous.
Example 5.3: Last semester, the students of Biology and Chemistry Departments took Stat 273 course. At the
end of the semester, the following information was recorded.
Department           Biology   Chemistry
Mean score           79        64
Standard deviation   23        11
Compare the relative dispersions of the two departments' scores using the appropriate way.
Solution:
Biology Department: $CV = \frac{23}{79} \times 100 = 29.11\%$
Chemistry Department: $CV = \frac{11}{64} \times 100 = 17.19\%$
Since the CV of Biology Department students is greater than that of Chemistry Department students, we can say
that there is more dispersion in the distribution of Biology students' scores compared with that of Chemistry
students.
Example 5.4: The mean weight of 20 children was found to be 30 kg with a variance of 16 kg² and their mean
height was 150 cm with a variance of 25 cm². Compare the variability of weight and height of these children.
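Because weight and height are measured in different units, the comparison calls for the coefficient of variation. A minimal Python sketch; the helper `coefficient_of_variation` is a hypothetical name.

```python
import math

def coefficient_of_variation(mean, std_dev):
    """CV in percent: standard deviation divided by the mean, times 100."""
    return std_dev / mean * 100

# The example gives variances, so take square roots to get standard deviations.
cv_weight = coefficient_of_variation(mean=30, std_dev=math.sqrt(16))   # kg
cv_height = coefficient_of_variation(mean=150, std_dev=math.sqrt(25))  # cm

print(f"CV of weight = {cv_weight:.2f}%")
print(f"CV of height = {cv_height:.2f}%")
```

The larger CV belongs to weight, so the children's weights are relatively more variable than their heights.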
Example 5.5: Two sections were given an exam in a course. The average score was 72 with standard deviation
of 6 for section 1 and 85 with standard deviation of 5 for section 2. Student A from section 1 scored 84 and
student B from section 2 scored 90. Who performed better relative to his/her group?
Solution: Section 1: $\bar{x}_1 = 72$, $S_1 = 6$. Section 2: $\bar{x}_2 = 85$, $S_2 = 5$.
Z-score of student A: $Z = \frac{x_A - \bar{x}_1}{S_1} = \frac{84 - 72}{6} = 2.00$
Z-score of student B: $Z = \frac{x_B - \bar{x}_2}{S_2} = \frac{90 - 85}{5} = 1.00$
From these two standard scores, we can conclude that student A has performed better relative to his/her section
because his/her score is two standard deviations above the mean score of section 1, while the score of
student B is only one standard deviation above the mean score of section 2.
Example 5.6: A student scored 65 on a calculus test that had a mean of 50 and a standard deviation of 10; she
scored 30 on a history test with a mean of 25 and a standard deviation of 5. Compare her relative positions on
each test.
Solution: First, find the z-scores.
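The two standard scores can be computed with a short Python sketch; the helper `z_score` is a hypothetical name introduced for illustration.

```python
def z_score(x, mean, std_dev):
    """Number of standard deviations by which x lies above (or below) the mean."""
    return (x - mean) / std_dev

z_calculus = z_score(65, mean=50, std_dev=10)
z_history = z_score(30, mean=25, std_dev=5)

print(f"calculus z = {z_calculus}, history z = {z_history}")
```

The calculus score lies further above its mean in standard deviation units, so her relative position is higher on the calculus test.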
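Let $Y_i = x_i + a$, where $a$ is a constant. Then
$\bar{Y} = \frac{1}{n}\sum_{i=1}^{n}(x_i + a) = \frac{1}{n}\sum_{i=1}^{n} x_i + \frac{1}{n}(na) = \bar{x} + a$,
$S_Y^2 = \frac{\sum_{i=1}^{n}(Y_i - \bar{Y})^2}{n-1} = \frac{\sum_{i=1}^{n}\left(x_i + a - (\bar{x} + a)\right)^2}{n-1} = \frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1} = S_x^2$.
Let $Y_i = a x_i$. Then
$\bar{Y} = \frac{\sum_{i=1}^{n} a x_i}{n} = \frac{a\sum_{i=1}^{n} x_i}{n} = a\bar{x}$,
$S_Y^2 = \frac{\sum_{i=1}^{n}(a x_i - a\bar{x})^2}{n-1} = a^2\,\frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1} = a^2 S_x^2$, and $S_Y = \sqrt{a^2 S_x^2} = |a|\,S_x$.
Short cut formula for the variance:
$S^2 = \frac{\sum (x_i - \bar{x})^2}{n-1} = \frac{\sum \left(x_i^2 - 2\bar{x}x_i + \bar{x}^2\right)}{n-1} = \frac{\sum x_i^2 - 2\bar{x}\sum x_i + n\bar{x}^2}{n-1} = \frac{\sum x_i^2 - 2n\bar{x}^2 + n\bar{x}^2}{n-1} = \frac{\sum x_i^2 - n\bar{x}^2}{n-1} = \frac{\sum x_i^2 - \frac{(\sum x_i)^2}{n}}{n-1}$.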
Example: Use the short cut formula to compute the variance of the following data (x).
x     1   3   4    5    7    Σx_i = 20
x²    1   9   16   25   49   Σx_i² = 100
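The short cut formula, $S^2 = \left(\sum x_i^2 - (\sum x_i)^2/n\right)/(n-1)$, can be applied to these data and compared with the definitional formula. A minimal Python sketch with illustrative variable names:

```python
import statistics

x = [1, 3, 4, 5, 7]
n = len(x)

sum_x = sum(x)                    # sum of the observations
sum_x2 = sum(v * v for v in x)    # sum of the squared observations

# Short cut formula for the sample variance.
shortcut_variance = (sum_x2 - sum_x ** 2 / n) / (n - 1)

# Definitional formula (sum of squared deviations over n - 1), for comparison.
definition_variance = statistics.variance(x)

print(sum_x, sum_x2, shortcut_variance, definition_variance)
```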
Definition 6.2:
Sample point (outcome): The individual result of a random experiment.
Sample space: The set containing all possible sample points (outcomes) of the random experiment. The
sample space is often called the universe and denoted by S.
Event: The collection of outcomes or simply a subset of the sample space. We denote events with capital
letters, A, B, C, etc.
Example 6.1: If an experiment consists of flipping of a coin once, then
S = {H, T} where H means that the outcome of the toss is a head and T that it is a tail. A= {H} represents the
event of head occurring.
Example 6.2: If an experiment consists of rolling a die once and observing the number on top, then the sample
space is S = {1, 2, 3, 4, 5, 6} where the outcome i means that i appeared on the die, i = 1, 2, 3, 4, 5, 6. {1},
{2}, {3}, {4}, {5} and {6} are elementary events, i.e. events consisting of a single outcome. Let A represent the
event that an odd number occurs; then A is simply the set containing 1, 3 and 5, i.e. A = {1, 3, 5}.
Review of set theory
Concepts of set theory are important in understanding probability. Given A, B and C are events associated with
a sample space S and ω represents an elementary event (outcome) in S, then the following are some useful
definitions and results in set theory.
Definitions 6.3:
1. Union: The union of A and B, A ∪ B, is the event containing all sample points in either
A or B or both. Sometimes we use A or B for union.
2. Intersection: The intersection of A and B, A ∩ B, is the event containing all sample points that are in both
A and B. Sometimes we use AB or A and B for intersection.
3. Subset: If ω ∈ B for every ω ∈ A, then A ⊂ B.
4. Empty set: If a set A contains no points, it will be called the null set, or empty set, and denoted by ∅.
5. Complement: The complement of a set A, denoted by A^c, is the set of all ω ∈ S such that ω ∈ A^c but ω ∉ A.
6. Mutually Exclusive Events: Two events are said to be mutually exclusive (or disjoint) if their
intersection is empty (i.e. A ∩ B = ∅). Subsets A1, A2, … are defined to be mutually exclusive if Ai ∩ Aj = ∅
for every i ≠ j.
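These definitions map directly onto Python's built-in `set` type. The sketch below uses the die sample space and the odd-number event A from Example 6.2; the event B (a number of at least 4) is a hypothetical event added only for illustration.

```python
S = {1, 2, 3, 4, 5, 6}   # sample space for one roll of a die
A = {1, 3, 5}            # event: an odd number occurs
B = {4, 5, 6}            # hypothetical event: the number is at least 4

union = A | B                  # points in A or B or both
intersection = A & B           # points in both A and B
complement_A = S - A           # points of S that are not in A
is_subset = A <= S             # every point of A is in S

# A and its complement share no points, so they are mutually exclusive.
mutually_exclusive = A.isdisjoint(complement_A)

print(union, intersection, complement_A, is_subset, mutually_exclusive)
```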
Theorem 6.1: Important elementary set theory results
i) A ∪ B = B ∪ A and A ∩ B = B ∩ A
ii) A ∪ (B ∪ C) = (A ∪ B) ∪ C and A ∩ (B ∩ C) = (A ∩ B) ∩ C
iii) A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C) and A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C)
iv) (A^c)^c = A
v) A ∩ S = A; A ∪ S = S; A ∩ ∅ = ∅; and A ∪ A = A
vi)
In short, to assign probabilities for an event, we might need to enumerate the possible outcomes of a random
experiment and need to know the number of possible outcomes favoring the event. The following principles
will help us in determining the number of possible outcomes favoring a given event.
Theorem 6.2: Addition principle
If a task can be accomplished by k distinct procedures, where the ith procedure has ni alternatives, then the total
number of ways of accomplishing the task equals n1 + n2 + … + nk.
Example 6.3: Suppose one wants to purchase a certain commodity and that this commodity is on sale in 5
government owned shops, 6 public shops and 10 private shops. How many alternatives are there for the person
to purchase this commodity?
Solution: Total number of ways =5+6+10=21 ways
Theorem 6.3: Multiplication principle
If a choice consists of k steps of which the first can be made in n1 ways, for each of these the second can be
made in n2 ways, …, and for each of these the kth can be made in nk ways, then the whole choice can be made
in n1 × n2 × … × nk ways.
Example 6.4: If we can go from Addis Ababa to Rome in 2 ways and from Rome to Washington D.C. in 3
ways then the number of ways in which we can go from Addis Ababa to Rome to Washington D.C. is 2x3
ways or 6 ways. We may illustrate the situation by using a tree diagram below:
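[Tree diagram: A (Addis Ababa) branches to two R (Rome) nodes, and each R branches to three W (Washington D.C.) nodes, giving 6 paths.]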
Example 6.5: If a test consists of 10 multiple choice questions, with each permitting 4 possible answers, how
many ways are there in which a student gives his/her answers?
Solution: There are 10 steps required to complete the test.
First step: To give an answer to question number one, he/she has 4 alternatives.
Second step: To give an answer to question number two, he/she has 4 alternatives.
...
Last step: To give an answer to the last question, he/she has 4 alternatives.
Therefore, he/she has $4 \times 4 \times \cdots \times 4 = 4^{10}$ ways, or 1,048,576 ways, of completing the exam. Note that there is only
one way in which he/she can give correct answers to all questions and that there are $3^{10}$ ways in which all the
answers will be incorrect.
Example 6.6: A manufactured item must pass through three control stations. At each station the item is
inspected for a particular characteristic and marked accordingly. At the first station, three ratings are possible
while at the last two stations four ratings are possible. Hence there are 48 ways in which the item may be
marked.
Example 6.7: Suppose that car plate has three letters followed by three digits. How many possible car plates are
there, if each plate begins with a H or an F?
Solution: There are $2 \times 26 \times 26 \times 10 \times 10 \times 10 = 1{,}352{,}000$ different plates.
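The multiplication principle used in Examples 6.4 to 6.7 can be checked with a short Python sketch. The helper name `count_by_multiplication` and the route labels are illustrative assumptions, not part of the lecture note.

```python
from itertools import product
from math import prod

def count_by_multiplication(choices_per_step):
    """Number of ways to make a k-step choice: n1 * n2 * ... * nk."""
    return prod(choices_per_step)

# Example 6.4: brute-force listing of the routes (labels are hypothetical).
routes = list(product(["R1", "R2"], ["W1", "W2", "W3"]))

test_answers = count_by_multiplication([4] * 10)               # Example 6.5
all_wrong = count_by_multiplication([3] * 10)                  # 3 wrong options per question
station_markings = count_by_multiplication([3, 4, 4])          # Example 6.6
car_plates = count_by_multiplication([2, 26, 26, 10, 10, 10])  # Example 6.7
```

Listing the routes explicitly, as done for Example 6.4, is only practical for small problems; for larger ones the product of the step counts is the usable form.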
Definition 6.4: If n is a positive integer, we define $n! = n(n-1)(n-2)\cdots 1$ and call it n-factorial; we also define $0! = 1$.
Permutations
Suppose that we have n different objects. In how many ways, say nPn, may these objects be arranged
(permuted)? For example, if we have objects a, b and c we can consider the following arrangements: abc, acb,
bac, bca, cab, and cba. Thus the answer is 6. The following theorem gives general result on the number of such
arrangements.
Theorem 6.4: Permutation
i) The number of permutations of n different objects is given by ${}_nP_n = n!$
ii) The number of permutations of n objects, arranged in groups of size r, without repetition, and with order being important,
is:
$${}_nP_r = \frac{n!}{(n-r)!}$$
Example 6.8: Suppose that we have four letters a, b, c, d.
i) What is the number of possible arrangements of these letters taken all at a time?
ii) What is the number of possible arrangements of these letters if we use only three of the letters at a time?
Solution:
i) Using (i) of Theorem 6.4, we have 4! ways of arranging the 4 letters, i.e. we have 24 possible
arrangements.
ii) Using (ii) of Theorem 6.4, we have ${}_4P_3 = 4!/(4-3)!$ ways of arranging 3 letters taken from the four letters, i.e. we have
24 possible arrangements.
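As a quick check of Example 6.8, the arrangements can be listed by brute force and compared with the formulas of Theorem 6.4. This is only a sketch using the Python standard library; the variable names are assumptions.

```python
from itertools import permutations
from math import factorial, perm

letters = ["a", "b", "c", "d"]

# All arrangements of the four letters, and all ordered selections of three.
all_four = list(permutations(letters))
three_of_four = list(permutations(letters, 3))

n_all = factorial(4)  # nPn = n!
n_three = perm(4, 3)  # nPr = n! / (n - r)!
```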
Example 6.9: In a class with 8 boys and 8 girls
i) In how many ways can the children line up if they alternate girl-boy-girl-boy-... ?
ii) In how many ways can the children line up so that no two of the same sex are next to each other?
Solution:
i) The 8 girls can line-up in 8! ways, and likewise the 8 boys can line-up in 8! ways. For any single
arrangement of the girls, all possible arrangements of the boys are possible, thus by multiplication
principle we have 8!x 8! ways to arrange the children in girl-boy lines.
ii) Now we must include the case of boy-girl. So we have 2x8!x 8! ways of arranging.
Example 6.10: If I have 5 different books on my shelf, in how many ways can I arrange these books? Solution:
We can arrange the books in 5! different ways or 5x4x3x2x1 ways or 120 ways.
Remarks
i) The number of permutations of n distinct objects arranged in a circle is (n-1)!.
This is because we consider two permutations the same if one is a rotation of the other. For n objects arranged
around a circle, there are n rotations that give the same permutation. Dividing n! by n gives (n - 1)!. The two
circular permutations below are considered the same; their order is a, b, c, d, e.
ii) Given n objects of which $n_1$ are of one kind, $n_2$ are of another kind, ..., $n_k$ of another kind, the total number of
distinct permutations that can be made from these objects is
$$\frac{n!}{n_1!\,n_2!\cdots n_k!}.$$
Example 6.11
i)
How many "words" (text strings or distinct arrangements) can be made from the letters b,k,o,o?
ii)
How many permutations are there for the letters in the word banana?
Solution:
i)
If we label the two o's as o1 and o2, and think of them as distinct, then the number of permutations is
4!. For each permutation there will be a matching permutation that switches the o's; that is, for
o1o2bk there is the matching permutation o2o1bk. We can see then that if we divide the number of
permutations by two, we have a count of the number of permutations of the 4 letters where
we do not distinguish between the two o's. Therefore, there are 4!/2 = 12 distinct text
strings.
ii)
If we think of all 6 letters as distinct, then we would have 6! permutations. As in the preceding
example, for the two n's we would need to divide 6! by 2. For the 3 a's, we would have 6 counts for
a single permutation. For instance, each of the following would be a single word if the a's were not
distinct: a1a2a3bnn, a1a3a2bnn, a2a1a3bnn, a2a3a1bnn, a3a1a2bnn, and a3a2a1bnn. Hence the number of
distinct permutations of the word banana is
$$\frac{6!}{2!\,3!} = 60.$$
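The counts in Example 6.11 can be confirmed both with the formula $n!/(n_1!\,n_2!\cdots n_k!)$ and by brute force. The function names below are illustrative assumptions.

```python
from collections import Counter
from itertools import permutations
from math import factorial

def distinct_arrangements(word):
    """n! / (n1! n2! ... nk!) for the multiset of letters in `word`."""
    total = factorial(len(word))
    for count in Counter(word).values():
        total //= factorial(count)
    return total

def brute_force(word):
    """Count distinct orderings by listing all n! permutations and removing duplicates."""
    return len(set(permutations(word)))
```

The brute-force count grows like n!, so it is useful only as a check on short words.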
Combinations
Consider n different objects. This time we are concerned with counting the number of ways we may choose r
out of these n objects without regard to order. For example, we have the objects a, b, c and d, and r=2; we wish
to count ab, ac, ad, bc, bd, and cd. In other words, we do not count ab and ba since the same objects are
involved and only the order differs.
There are many problems in which we are interested in determining the number of ways in which r objects can
be selected from n distinct objects without regard to the order in which they are selected. Such selections are
called combinations or r-sets. It may help to think of combinations as committees. The key here is without
regard for order.
To obtain the general result we recall the formula derived above: the number of ways of choosing r objects out
of n and permuting the chosen r equals n!/(n-r)!. Let C be the number of ways of choosing r out of n,
disregarding order. C is the number required. Note that once the r items have been chosen, there are r! ways of
permuting them. Hence applying the multiplication principle again, together with the above result, we obtain
$C \cdot r! = \frac{n!}{(n-r)!}$. Therefore, $C = \frac{n!}{r!\,(n-r)!}$. This number arises in many contexts in mathematics and hence a
special symbol is used for it. We shall write
$$\binom{n}{r} = {}_nC_r = \frac{n!}{r!\,(n-r)!}.$$
Theorem 6.5: Combination
The number of ways of choosing r out of n different objects, disregarding order, is given by
$$\binom{n}{r} = \frac{n!}{r!\,(n-r)!}.$$
Example 6.12: How many different committees of 3 can be formed from Hawa, Segenet, Nigisty and Lensa?
Solution: The question can be restated in terms of subsets: from a set of 4 objects, how many subsets of 3 elements
are there? In terms of combinations the question becomes: what is the number of combinations of 4 distinct
objects taken 3 at a time? The list of committees is {H,S,N}, {H,S,L}, {H,N,L}, {S,N,L}. Therefore, we have ${}_4C_3 = 4$
possible committees.
Example 6.13:
(i) A committee of 3 is to be formed from a group of 20 people. How many different committees are possible?
(ii) From a group of 5 men and 7 women, how many different committees consisting of 2 men and 3 women can
be formed?
Solution: (i) There are $\binom{20}{3} = \frac{20!}{3!\,17!} = 1140$ possible committees.
(ii) There are $\binom{5}{2}\binom{7}{3} = \frac{5!}{2!\,3!} \cdot \frac{7!}{3!\,4!} = 350$ possible committees.
Remarks:
i) $\binom{n}{r} = \binom{n}{n-r}$
It is rather surprising that with only these three axioms, we can construct the "entire" theory of probability! The
next theorems and definitions help in assigning probabilities of events.
Theorem 6.6: If A is an event in a discrete sample space S, then P(A) equals the sum of the probabilities of
the individual outcomes comprising A.
Theorem 6.7: Suppose that we have a random experiment with sample space S and probability function P
and A and B are events. Then we have the following results:
i) $P(\emptyset) = 0$
ii) $P(A^c) = 1 - P(A)$
iii) $P(B \cap A^c) = P(B) - P(A \cap B)$
iv) If $A \subset B$ then $P(A) \le P(B)$.
Solution:
$P(A) = \frac{2}{4} = 0.5$
$P(B) = \frac{2}{4} = 0.5$
$P(A^c) = 1 - P(A) = 1 - 0.5 = 0.5$
$P(B^c) = 1 - P(B) = 1 - 0.5 = 0.5$
$P(S^c) = 1 - P(S) = 1 - 1 = 0 = P(\emptyset)$
Example 6.16: From a group of 5 men and 7 women, it is required to form a committee of 5 persons. If the
selection is made randomly, then
i) what is the probability that 2 men and 3 women will be in the committee?
ii) what is the probability that all members of the committee will be men?
iii) what is the probability that at least three members will be women?
Solution: The total number of possible committees is $\binom{12}{5} = \frac{12!}{5!\,7!} = 792$, i.e. the number of possible outcomes.
ii) Let B be the event that all members of the committee will be men. Hence
$$P(B) = \frac{\binom{5}{5}\binom{7}{0}}{\binom{12}{5}} = \frac{1}{792}$$
iii)
Let C be the event that at least three of the committee members will be women.
Basically, three different compositions of committee members can be formed in terms of sex: 3
women and 2 men, 4 women and 1 man, and all are women. Hence the number of possible outcomes
favoring event C using the principle of combination together with the addition principle
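is $\binom{5}{2}\binom{7}{3} + \binom{5}{1}\binom{7}{4} + \binom{5}{0}\binom{7}{5} = 350 + 175 + 21 = 546$.
Therefore,
$$P(C) = \frac{\binom{5}{2}\binom{7}{3} + \binom{5}{1}\binom{7}{4} + \binom{5}{0}\binom{7}{5}}{\binom{12}{5}} = \frac{546}{792} = 0.69$$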
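The committee probabilities of Example 6.16 can be reproduced with exact fractions. The sketch below assumes only the counts stated in the example (5 men, 7 women, committee of 5); the helper `p_women` is a hypothetical name.

```python
from fractions import Fraction
from math import comb

MEN, WOMEN, SIZE = 5, 7, 5
total = comb(MEN + WOMEN, SIZE)  # number of equally likely committees

def p_women(k):
    """P(exactly k women) on a randomly chosen committee of SIZE people."""
    return Fraction(comb(WOMEN, k) * comb(MEN, SIZE - k), total)

p_two_men_three_women = p_women(3)
p_all_men = p_women(0)
# Addition principle: 3, 4 or 5 women are mutually exclusive cases.
p_at_least_three_women = sum(p_women(k) for k in range(3, SIZE + 1))
```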
Definition 6.7: Relative Frequency Definition of probability
If an experiment is repeated a large number, n, of times and the event A is observed $n_A$ times, the probability
of A is $P(A) \approx n_A/n$.
The above definition of probability is based on empirical data accumulated through time or based on
observations made from repeated experiments for a large number of times.
Example 6.18: Sixty percent of the families in a certain community own their own car, thirty percent own their
own home, and twenty percent own both their own car and their own home. If a family is randomly chosen,
a) what is the probability that this family does not own a car?
b) what is the probability that this family owns a car or a house?
c) what is the probability that this family owns a car or a house but not both?
d) what is the probability that this family owns only a house?
e) what is the probability that this family neither owns a car nor a house?
Solution: Let A represent the event that the family owns a car and B the event that the family owns a house.
Given information: P(A) = 0.6, P(B) = 0.3, and $P(A \cap B) = 0.2$.
a) Required: $P(A^c) = ?$
$P(A^c) = 1 - P(A) = 1 - 0.6 = 0.4$
b) Required: $P(A \cup B) = ?$
In more precise terms, given an experiment, a corresponding sample space, and a probability law, suppose that
we know that the outcome is within some given event B. We wish to quantify the likelihood that the outcome
also belongs to some other given event A. We thus seek to construct a new probability law, which takes into
account this knowledge and which, for any event A, gives us the conditional probability of A given B, denoted
by P(A|B).
Definition 6.8: If P(B) > 0, the conditional probability of A given B, denoted by P(A|B), is
$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}.$$
Example 6.19: Suppose cards numbered one through ten are placed in a hat, mixed up, and then one of the
cards is drawn at random. If we are told that the number on the drawn card is at least five, then what is the
conditional probability that it is ten?
Solution: Let A denote the event that the number on the drawn card is ten, and B be the event that it is at least
five. The desired probability is P(A|B).
$$P(A \mid B) = \frac{P(A \cap B)}{P(B)} = \frac{P(\{10\} \cap \{5,6,7,8,9,10\})}{P(\{5,6,7,8,9,10\})} = \frac{P(\{10\})}{P(\{5,6,7,8,9,10\})} = \frac{1/10}{6/10} = \frac{1}{6}$$
Example 6.20: A family has two children. What is the conditional probability that both are boys given that at
least one of them is a boy? Assume that the sample space S is given by S = {(b, b), (b, g), (g, b), (g, g)}, and all
outcomes are equally likely. (b, g) means, for instance, that the older child is a boy and the younger child is a
girl.
Solution: Letting A denote the event that both children are boys, and B the event that at least one of them is a
boy, then the desired probability is given by
$$P(A \mid B) = \frac{P(A \cap B)}{P(B)} = \frac{1/4}{3/4} = \frac{1}{3}$$
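For equally likely outcomes, Definition 6.8 reduces to counting, so Examples 6.19 and 6.20 can be verified by listing the sample space. The functions `prob` and `conditional` below are illustrative helpers, not notation from the note.

```python
from fractions import Fraction

def prob(event, space):
    """P(event) when all outcomes in `space` are equally likely."""
    return Fraction(len(event & space), len(space))

def conditional(a, b, space):
    """P(A | B) = P(A and B) / P(B); requires P(B) > 0."""
    return prob(a & b, space) / prob(b, space)

# Example 6.19: cards numbered 1 to 10.
cards = set(range(1, 11))
at_least_five = {c for c in cards if c >= 5}
p_ten_given_at_least_five = conditional({10}, at_least_five, cards)

# Example 6.20: two children, (older, younger).
children = {("b", "b"), ("b", "g"), ("g", "b"), ("g", "g")}
both_boys = {("b", "b")}
at_least_one_boy = {c for c in children if "b" in c}
p_both_boys_given_one_boy = conditional(both_boys, at_least_one_boy, children)
```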
Law of Multiplication
The defining equation for conditional probability may also be written as:
P(AnB) = P(B) P(A|B)
This formula is useful when the information given to us in a problem is P(B) and P(A|B) and we are asked to
find P(AnB). An example illustrates the use of this formula. Suppose that 5 good fuses and two defective ones
have been mixed up. To find the defective fuses, we test them one-by-one, at random and without replacement.
What is the probability that we are lucky and find both of the defective fuses in the first two tests?
Example 6.21: Suppose an urn contains seven black balls and five white balls. We draw two balls from the urn
without replacement. Assuming that each ball in the urn is equally likely to be drawn, what is the probability
that both drawn balls are black?
Solution: Let A and B denote, respectively, the events that the first and second balls drawn are black. Now,
given that the first ball selected is black, there are six remaining black balls and five white balls, and so P(B|A)
= 6/11. As P(A) is clearly 7/12 , our desired probability is
$$P(A \cap B) = P(A)\,P(B \mid A) = \frac{7}{12} \cdot \frac{6}{11} = \frac{7}{22}$$
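The law of multiplication for draws without replacement can be sketched as a running product of conditional probabilities. The function name is an assumption, and the value obtained for the fuse question (which the text poses but does not answer) is computed here rather than quoted from the note.

```python
from fractions import Fraction

def draw_all_of_kind(kind, other, draws):
    """P(the first `draws` picks are all of one kind), sampling without replacement.

    Each factor is the conditional probability of the next pick given the earlier ones.
    """
    p = Fraction(1)
    for i in range(draws):
        p *= Fraction(kind - i, kind + other - i)
    return p

p_two_black = draw_all_of_kind(7, 5, 2)      # Example 6.21: 7 black, 5 white balls
p_two_defective = draw_all_of_kind(2, 5, 2)  # fuse question: 2 defective, 5 good
```

Under these assumptions the fuse question works out to (2/7)(1/6) = 1/21.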
Bayes Theorem
Introduction
Mutually exclusive events: If only one of several events can occur at one time, the events are said to be mutually
exclusive.
Exhaustive events: If an experiment has a set of events that include every possible outcome, then the set of
events is called collectively exhaustive.
The Law of total probability
Let A1, . . ., An be mutually exclusive and exhaustive events. Then for any other event B,
$$P(B) = P(B \mid A_1)P(A_1) + P(B \mid A_2)P(A_2) + \cdots + P(B \mid A_n)P(A_n) = \sum_{i=1}^{n} P(B \mid A_i)P(A_i)$$
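Before the proof, see how B (the circular region) can be observed from the following Venn diagram.
[Venn diagram: the sample space is partitioned into A1, A2, A3 and A4, and the circular region B overlaps each part.]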
Proof
Because the $A_i$'s are mutually exclusive and exhaustive, if B occurs it must be in combination with exactly one
of the $A_i$'s.
B = ($A_1$ and B) or ($A_2$ and B) or . . . or ($A_n$ and B)
$= (A_1 \cap B) \cup (A_2 \cap B) \cup \cdots \cup (A_n \cap B)$
where the events $(A_i \cap B)$ are mutually exclusive.
$P(B) = P(A_1 \cap B) + P(A_2 \cap B) + \cdots + P(A_n \cap B)$
$P(B) = P(B \mid A_1)P(A_1) + P(B \mid A_2)P(A_2) + \cdots + P(B \mid A_n)P(A_n)$
$P(B) = \sum_{i=1}^{n} P(B \mid A_i)P(A_i)$
$$P(A_k \mid B) = \frac{P(B \mid A_k)P(A_k)}{\sum_{i=1}^{n} P(B \mid A_i)P(A_i)}, \quad k = 1, 2, \ldots, n.$$
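The law of total probability and Bayes' formula can be sketched together in a few lines. The numbers below are hypothetical and chosen only for illustration (three machines producing 50%, 30% and 20% of output with defect rates 1%, 2% and 3%); they do not come from the lecture note.

```python
from fractions import Fraction

def total_probability(priors, likelihoods):
    """P(B) = sum of P(B | A_i) P(A_i) over a mutually exclusive, exhaustive set A_i."""
    return sum(p * l for p, l in zip(priors, likelihoods))

def bayes_posteriors(priors, likelihoods):
    """P(A_k | B) = P(B | A_k) P(A_k) / P(B) for every k."""
    p_b = total_probability(priors, likelihoods)
    return [p * l / p_b for p, l in zip(priors, likelihoods)]

# Hypothetical inputs: P(A_i) and P(B | A_i), with B = "item is defective".
priors = [Fraction(5, 10), Fraction(3, 10), Fraction(2, 10)]
likelihoods = [Fraction(1, 100), Fraction(2, 100), Fraction(3, 100)]

p_defective = total_probability(priors, likelihoods)
posteriors = bayes_posteriors(priors, likelihoods)
```

The posteriors always sum to 1, which is a convenient sanity check on the inputs.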
b) selecting a second ball after replacing first selected ball in the basket
Let A and B represent the events that a black ball is selected in the first and second selection, respectively. In which of the
two ways are A and B independent?
Solution:
First way (a):
S = {B1B2, B2B1, B1W1, B1W2 , B2W1, B2W2, W1W2 , W1B1, W2B1 , W1B2, W2B2,W2W1}
A = {B1B2, B2B1, B1W1, B1W2 , B2W1, B2W2 }
B = {B1B2, B2B1, W1B1, W2B1 , W1B2, W2B2 }
A n B = {B1B2, B2B1}
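$$P(A) = \frac{6}{12} = \frac{1}{2}, \quad P(B) = \frac{6}{12} = \frac{1}{2} \quad \text{and} \quad P(A \cap B) = \frac{2}{12} = \frac{1}{6}$$
Since $P(A \cap B) = \frac{1}{6} \ne P(A)P(B) = \frac{1}{4}$, A and B are not independent when the first ball is not replaced.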
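The independence check can be repeated by enumeration for both sampling schemes. The sketch assumes the basket holds two black and two white balls, as the listed sample space suggests; the result for sampling with replacement is computed here and is not quoted from the note.

```python
from fractions import Fraction
from itertools import permutations, product

balls = ["B1", "B2", "W1", "W2"]

def check_independence(space):
    """Return P(A), P(B), P(A and B) and whether P(A and B) == P(A) P(B).

    A: first ball is black; B: second ball is black.
    """
    space = list(space)
    n = len(space)
    a = {s for s in space if s[0].startswith("B")}
    b = {s for s in space if s[1].startswith("B")}
    p_a, p_b, p_ab = Fraction(len(a), n), Fraction(len(b), n), Fraction(len(a & b), n)
    return p_a, p_b, p_ab, p_ab == p_a * p_b

without_replacement = check_independence(permutations(balls, 2))  # 12 outcomes
with_replacement = check_independence(product(balls, repeat=2))   # 16 outcomes
```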
Suppose we are interested in calculating the probability that $X \ge 1$. The values of X which are greater than or equal
to 1 are 1 and 2. Thus, the probability that X is greater than or equal to 1, denoted $P(X \ge 1)$, is found as
$P(X \ge 1) = P(X = 1) + P(X = 2) = 3/4$.
Definition 7.2: Continuous random variable
A random variable X is called continuous if there exists a function fX(x) called the probability density
function of X which satisfies
a. $f_X(x) \ge 0$ for all x.
b. $\int_{-\infty}^{\infty} f_X(x)\,dx = 1$
We can use the probability density function to calculate probabilities of events expressed in terms of the random
variable X. For instance, if we are interested in the probability that X lies between two points, say a and b, we
can find it using integration of fX(x) on the interval [a,b],i.e.
$$P(a \le X \le b) = \int_a^b f_X(x)\,dx$$
iii) The probability that a continuous random variable X will assume a value in a closed interval is the same
as the probability that it will assume a value in the corresponding open or half-open intervals, i.e., $P(a \le X \le b) = P(a < X < b) =
P(a \le X < b) = P(a < X \le b)$, $P(X \le c) = P(X < c)$, $P(X \ge c) = P(X > c)$, where a, b, and c are constants.
Example 7.1B: The error involved in making a certain measurement is a continuous rv X with pdf
$$f(x) = \begin{cases} k(4 - x^2), & -2 \le x \le 2 \\ 0, & \text{otherwise} \end{cases}$$
Determine the value of k and compute a) $P(X < 0)$, b) $P(-1 < X < 1)$, c) $P(X < -0.5 \text{ or } X > 0.5)$
Solution
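$$\int_{-\infty}^{\infty} f(x)\,dx = 1 \Rightarrow \int_{-2}^{2} k(4 - x^2)\,dx = 1 \Rightarrow k\left(4x - \frac{x^3}{3}\right)\Big|_{-2}^{2} = \frac{32k}{3} = 1 \Rightarrow k = \frac{3}{32} = 0.09375$$
Therefore,
$$f(x) = \begin{cases} 0.09375(4 - x^2), & -2 \le x \le 2 \\ 0, & \text{otherwise} \end{cases}$$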
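a) $P(X > 0) = \int_0^2 0.09375(4 - x^2)\,dx = 0.09375\left(4x - \frac{x^3}{3}\right)\Big|_0^2 = 0.5$ (by symmetry of f about 0, $P(X < 0) = 0.5$ as well)
b) $P(-1 < X < 1) = \int_{-1}^{1} 0.09375(4 - x^2)\,dx = 0.09375\left(4x - \frac{x^3}{3}\right)\Big|_{-1}^{1} = 0.6875$
c) $P(X < -0.5 \text{ or } X > 0.5) = P(X < -0.5) + P(X > 0.5) - P(X < -0.5 \text{ and } X > 0.5)$
$= P(X < -0.5) + P(X > 0.5)$, since there is no intersection
$= \int_{-2}^{-0.5} 0.09375(4 - x^2)\,dx + \int_{0.5}^{2} 0.09375(4 - x^2)\,dx$
$= 0.09375\left(4x - \frac{x^3}{3}\right)\Big|_{-2}^{-0.5} + 0.09375\left(4x - \frac{x^3}{3}\right)\Big|_{0.5}^{2} = 0.6328$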
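The integrals in Example 7.1B can be checked numerically. The sketch below uses a hand-written composite Simpson's rule (an assumption made to avoid external libraries), which is exact for this quadratic density up to rounding error.

```python
def simpson(f, a, b, n=1000):
    """Composite Simpson's rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

K = 3 / 32  # the normalising constant k found in the solution

def pdf(x):
    """Density of the measurement error X from Example 7.1B."""
    return K * (4 - x * x) if -2 <= x <= 2 else 0.0

area = simpson(pdf, -2, 2)                                 # should be 1
p_a = simpson(pdf, 0, 2)                                   # P(X > 0)
p_b = simpson(pdf, -1, 1)                                  # P(-1 < X < 1)
p_c = simpson(pdf, -2, -0.5) + simpson(pdf, 0.5, 2)        # P(X < -0.5 or X > 0.5)
```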
7.2 Expectation of Random variable: mean and variance
We can associate with each random variable certain averages of interest, such as mean and variance which
give useful summary of a probability distribution.
Mean
Definition 7.3: The mean (expected value) of a random variable X, denoted by E(X) or $\mu$, is given by
i) $E(X) = \sum_x x\,P_X(x)$ if X is a discrete r.v.
It is useful to view the mean of X as a representative value of X, which lies somewhere in the middle of its
range. We can make this statement more precise, by viewing the mean as the center of gravity of the
distribution.
Variance
Definition 7.4: The variance of a random variable X, denoted V(X) or $\sigma^2$, is defined as $V(X) = E[(X - \mu)^2] =
E(X^2) - \mu^2$.
i) if X is discrete, $V(X) = \sum_x [x^2 P_X(x)] - \mu^2$
The variance provides a measure of dispersion of X around its mean. Another measure of dispersion is the
standard deviation of X, which is defined as the square root of the variance and is denoted by $\sigma$.
Example 7.2A: Calculate the mean and variance of the random variable X in Example 7.1A.
$$E(X) = \sum x P_X(x) = 0 \cdot \frac{1}{4} + 1 \cdot \frac{1}{2} + 2 \cdot \frac{1}{4} = 1$$
$$E(X^2) = \sum x^2 P_X(x) = 0^2 \cdot \frac{1}{4} + 1^2 \cdot \frac{1}{2} + 2^2 \cdot \frac{1}{4} = 1.5$$
$$V(X) = E(X^2) - \mu^2 = 1.5 - 1^2 = 0.5$$
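Example 7.2B: Calculate the mean and variance of the r.v. X in Example 7.1B.
$$E(X) = \mu_x = \int_{-2}^{2} x f(x)\,dx = 0.09375 \int_{-2}^{2} x(4 - x^2)\,dx = 0.09375\left(2x^2 - \frac{x^4}{4}\right)\Big|_{-2}^{2} = 0$$
$$E(X^2) = \int_{-2}^{2} x^2 f(x)\,dx = 0.09375 \int_{-2}^{2} x^2(4 - x^2)\,dx = 0.09375\left(\frac{4x^3}{3} - \frac{x^5}{5}\right)\Big|_{-2}^{2} = 0.8$$
$$V(X) = \sigma_x^2 = E(X^2) - \mu_x^2 = 0.8 - 0^2 = 0.8$$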
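For a discrete random variable, Definitions 7.3 and 7.4 translate directly into sums over the probability mass function. The sketch below assumes the pmf of Example 7.1A is P(0) = 1/4, P(1) = 1/2, P(2) = 1/4, as used in the worked computation above; the function names are illustrative.

```python
from fractions import Fraction

def mean(pmf):
    """E(X) = sum of x * P(X = x)."""
    return sum(x * p for x, p in pmf.items())

def variance(pmf):
    """V(X) = E(X^2) - [E(X)]^2."""
    mu = mean(pmf)
    return sum(x * x * p for x, p in pmf.items()) - mu ** 2

# pmf assumed from the worked solution: values 0, 1, 2.
pmf_x = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}

mu_x = mean(pmf_x)
var_x = variance(pmf_x)
```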
The mean and variance of the binomial distribution are np and np(1-p), respectively. Note that the binomial
distributions are used to model situations where there are just two possible outcomes, success and failure. The
following conditions also have to be satisfied.
i) There must be a fixed number of trials called n
ii) The probability of success (called p) must be the same for each trial.
iii) The trials must be independent
Example 7.3: A fair coin is flipped 4 times. Let X be the number of heads appearing out of the four trials.
Calculate the following probabilities:
i) 2 heads will appear
ii) No head will appear
iii) At least two heads will appear
iv) Less than two heads will appear
v) At most 2 heads will appear
Solution: We can consider the outcomes of the trials to be independent of each other. In addition, the
probability that a head will appear is the same in each trial. Thus, X has a binomial distribution with number of
trials 4 and probability of success (the occurrence of a head in a trial) 1/2. The probability mass function of X is
given by
$$P_X(x) = \binom{n}{x} 0.5^x (1 - 0.5)^{n-x} = \binom{n}{x} 0.5^n, \quad x = 0, 1, 2, 3, 4.$$
Note that n = 4 and p = 1/2.
i) $P(X = 2) = \binom{4}{2} 0.5^2 (1 - 0.5)^{4-2} = 0.3750$
ii) $P(X = 0) = \binom{4}{0} 0.5^0 (1 - 0.5)^{4-0} = 0.0625$
(a) $P(X = 5) = \binom{10}{5} 0.4^5\, 0.6^{10-5} = 0.200658$
(b) $P(X \le 9) = 1 - P(X = 10) = 1 - \binom{10}{10} 0.4^{10}\, 0.6^{10-10} = 1 - 0.000105 = 0.9999$
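The binomial probabilities above can be reproduced with a small pmf helper. The function names are assumptions; the values for parts iii) to v) of Example 7.3 are computed here and are not quoted from the note.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) = C(n, x) p^x (1 - p)^(n - x)."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

def binomial_cdf(x, n, p):
    """P(X <= x), summing the pmf from 0 to x."""
    return sum(binomial_pmf(k, n, p) for k in range(x + 1))

# Example 7.3: n = 4 flips of a fair coin.
p_two = binomial_pmf(2, 4, 0.5)
p_none = binomial_pmf(0, 4, 0.5)
p_at_least_two = 1 - binomial_cdf(1, 4, 0.5)
p_less_than_two = binomial_cdf(1, 4, 0.5)
p_at_most_two = binomial_cdf(2, 4, 0.5)

# The n = 10, p = 0.4 computation shown in parts (a) and (b).
p_five = binomial_pmf(5, 10, 0.4)
p_at_most_nine = 1 - binomial_pmf(10, 10, 0.4)
```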
Hypergeometric Distribution
The hypergeometric distribution is a probability model for sampling without replacement from a finite dichotomous
(S-F) population. Applications for the hypergeometric distribution are found in many areas, with heavy uses in
acceptance sampling, electronic testing, and quality assurance. Obviously, for many of these fields testing is
done at the expense of the item being tested. That is, the item is destroyed and hence cannot be replaced in the
sample.
The assumptions leading to the hypergeometric distribution are as follows:
1. The population or set to be sampled consists of N individual objects, or elements (a finite population).
2. Each individual can be characterized as a success (S) or a failure (F), and there are M successes in the
population.
3. A sample of n individuals is drawn in such a way that each subset of size n is equally likely to be
chosen.
Example 7.6. An undergraduate library has 20 copies of a certain introductory forestry text, of which 8 are first
printings and 12 are second printings. The course instructor has requested that 5 copies be put on 2 hours
reserve. If the copies are selected in a completely random fashion, what is the probability that X (X = 0, 1, 2, 3,
4, 5) of those selected are second printing?
$$P(X = 2) = \frac{\binom{12}{2}\binom{8}{3}}{\binom{20}{5}} = 0.238$$
Proposition:
If X is the number of S's in a completely random sample of size n drawn from a population consisting of M S's
and (N-M) F's, then the probability distribution of X, called the hypergeometric distribution, is given by
$$P(X = x) = h(x; n, M, N) = \frac{\binom{M}{x}\binom{N-M}{n-x}}{\binom{N}{n}}$$
N = 10, n = 5, M = 7, N-M = 3
Proposition:
The mean and variance of the hypergeometric rv X having pmf h(x; n, M, N) are
$$E(X) = n \cdot \frac{M}{N}, \qquad V(X) = \left(\frac{N-n}{N-1}\right) \cdot n \cdot \frac{M}{N}\left(1 - \frac{M}{N}\right)$$
Example 7.7. Lots of 40 components each are called unacceptable if they contain as many as 3 defectives or
more. The procedure for sampling the lot is to select 5 components at random and to reject the lot if a defective
is found. What is the probability that exactly 1 defective is found in the sample if there are 3 defectives in the
entire lot?
Solution: Using the hypergeometric distribution with n = 5, N = 40, M = 3, and x = 1, we find the probability of
obtaining one defective to be
$$P(X = 1) = h(1; 5, 3, 40) = \frac{\binom{3}{1}\binom{37}{4}}{\binom{40}{5}} = 0.3011.$$
This sampling plan detects a bad lot (3 defectives) only about 30% of the time.
Example 7.8. Five individuals from an animal population thought to be near extinction in a certain region have
been caught, tagged, and released to mix into the population. After they have had an opportunity to mix in, a
random sample of 10 of these animals is selected. Let X = the number of tagged animals in the second sample.
If there are actually 25 animals of this type in the region, what is the probability that (a) X = 2? (b) $X \le 2$?
The parameter values are n = 10, M = 5 (tagged animals in the population) and N = 25, so
$$h(x; 10, 5, 25) = \frac{\binom{5}{x}\binom{20}{10-x}}{\binom{25}{10}}, \quad x = 0, 1, 2, 3, 4, 5.$$
a) $P(X = 2) = h(2; 10, 5, 25) = \frac{\binom{5}{2}\binom{20}{8}}{\binom{25}{10}} = 0.385$
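The hypergeometric probabilities in Examples 7.6 to 7.8 can be recomputed from the pmf h(x; n, M, N). The helper name is an assumption, and the value for part (b) of Example 7.8 is computed here rather than taken from the note.

```python
from math import comb

def hypergeom_pmf(x, n, M, N):
    """h(x; n, M, N) = C(M, x) C(N - M, n - x) / C(N, n)."""
    return comb(M, x) * comb(N - M, n - x) / comb(N, n)

p_library = hypergeom_pmf(2, 5, 12, 20)      # Example 7.6: two second printings
p_lot = hypergeom_pmf(1, 5, 3, 40)           # Example 7.7: exactly one defective
p_tagged_two = hypergeom_pmf(2, 10, 5, 25)   # Example 7.8 (a)
p_tagged_at_most_two = sum(hypergeom_pmf(x, 10, 5, 25) for x in range(3))  # (b)
```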
Poisson distribution
Experiments yielding numerical values of a random variable X, the number of outcomes occurring during a
given time interval or in a specified region, are called Poisson experiments. The given time interval may be of
any length, such as a minute, a day, a week, a month or even a year. Examples include number of telephone
calls per hour received by an office, the number of postponed games due to rain during a baseball season, or the
number of days school is closed due to snow during the winter. The specified region could be a line segment, an
area, a volume, or a piece of material. In such instances X might represent the number of field mice per acre, the
number of bacteria in a given culture, or the number of typing errors per page.
Properties of Poisson process
The number of outcomes occurring in one time interval or specified region is independent of the number that
occurs in any other disjoint time interval or region of space.
The probability that a single outcome will occur during a very short time interval or in a small region is
proportional to the length of the time interval or the size of the region and does not depend on the number of
outcomes occurring outside this time interval or region.
The probability that more than one outcome will occur in such a short time interval or fall in such a small region
is negligible.
The number X of outcomes occurring during a Poisson experiment is called a Poisson random variable, and its
probability distribution is called the Poisson distribution. The mean number of outcomes is computed from $\mu = \lambda t$,
where t is the length of the time interval or the size of the region of interest, and the distribution is given by
$$p(x; \lambda t) = \frac{e^{-\lambda t}(\lambda t)^x}{x!}, \quad x = 0, 1, 2, \ldots,$$
where $\lambda$ is the average number of outcomes per unit time, distance, area,
or volume. Both the mean and variance for the Poisson distribution are $\lambda t$. The Poisson table provides Poisson
probability sums $P(r; \lambda t) = \sum_{x=0}^{r} p(x; \lambda t)$.
Example 7.8: During laboratory experiment the average number of radioactive particles passing a counter in 1
millisecond is 4. What is the probability that 6 particles enter the counter in a given millisecond?
Solution: Using the Poisson distribution with x = 6 and $\lambda t = 4$, from the Poisson table,
$$p(6; 4) = \frac{e^{-4}(4)^6}{6!} = 0.1042$$
Example 7.9: Suppose that the number of typographical errors on a single page of this lecture note has a
Poisson distribution with parameter $\lambda t = 1$. If we randomly select a page in this lecture note, calculate the
probability that
a) no error will occur.
b) exactly three errors will occur.
c) less than 2 errors will occur.
d) there is at least one error.
Solution: Let X = number of errors per page. Then
$$P(X = x) = \frac{e^{-\lambda t}(\lambda t)^x}{x!}, \quad \lambda t = 1,\ x = 0, 1, 2, \ldots$$
a) Required: P(X = 0). $P(X = 0) = \frac{e^{-1} 1^0}{0!} = \frac{1}{e} = 0.367879$
b) $P(X = 3) = \frac{e^{-1} 1^3}{3!} = 0.061313$
c) $P(X < 2) = P(X = 0) + P(X = 1) = 0.73576$
d) $P(X \ge 1) = 1 - P(X = 0) = 1 - 0.367879 = 0.632121$
Example 7.10: If the number of accidents occurring on a highway each day is a Poisson random variable with
parameter $\lambda t = 3$, what is the probability that no accidents will occur on a randomly selected day in the future?
Solution: Let X = number of accidents per day. Then
$$P(X = x) = \frac{e^{-3} 3^x}{x!}, \quad x = 0, 1, 2, \ldots$$
Required: P(X = 0). $P(X = 0) = \frac{e^{-3} 3^0}{0!} = e^{-3} \approx 0.05$
Note: The Poisson random variable has a wide range of applications in a diverse number of areas. An important
property of the Poisson random variable is that it may be used to approximate a binomial random variable when
the binomial parameter n is large and p is small. The probability that X will be x can be approximated by
substituting $\lambda t$ by np in the Poisson distribution, i.e. $P(X = x) = \frac{e^{-\lambda t}(\lambda t)^x}{x!}$, $\lambda t = np$.
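The Poisson examples, and the approximation to the binomial described in the note, can be checked with a few lines of Python. The function names are assumptions, and the binomial parameters n = 1000, p = 0.003 are hypothetical values chosen only to illustrate the approximation.

```python
from math import comb, exp, factorial

def poisson_pmf(x, mean):
    """p(x; mean) = e^(-mean) mean^x / x!, where mean plays the role of lambda*t."""
    return exp(-mean) * mean ** x / factorial(x)

def poisson_cdf(r, mean):
    """P(r; mean) = sum of p(x; mean) for x = 0..r."""
    return sum(poisson_pmf(x, mean) for x in range(r + 1))

p_six_particles = poisson_pmf(6, 4)      # radioactive particle example
p_no_error = poisson_pmf(0, 1)           # Example 7.9 a)
p_three_errors = poisson_pmf(3, 1)       # Example 7.9 b)
p_less_than_two = poisson_cdf(1, 1)      # Example 7.9 c)
p_at_least_one = 1 - poisson_pmf(0, 1)   # Example 7.9 d)
p_no_accident = poisson_pmf(0, 3)        # Example 7.10

# Hypothetical check of the binomial approximation with lambda*t = n*p = 3.
n, p = 1000, 0.003
binomial_zero = comb(n, 0) * p ** 0 * (1 - p) ** n
poisson_zero = poisson_pmf(0, n * p)
```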
A continuous random variable X is said to follow a normal distribution if and only if its probability density
function (p.d.f.) is
$$f_X(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}$$
where $x \in (-\infty, \infty)$, $\mu \in (-\infty, \infty)$ and $\sigma \in (0, \infty)$. There are
infinitely many normal distributions since different values of $\mu$ and $\sigma$ define different normal distributions. For
instance, when $\mu = 0$ and $\sigma = 1$, the above density will have the following form:
$$f_Z(z) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{z^2}{2}}.$$
This particular distribution is called the standard normal distribution and is sometimes known as the Z-distribution. The
random variable corresponding to this distribution is usually denoted by Z. If X has a normal distribution with
mean $\mu$ and variance $\sigma^2$, we denote it as $X \sim N(\mu, \sigma^2)$.
iv) The total area under the curve and bounded from below by the horizontal axis is 1, i.e.
$\int_{-\infty}^{\infty} f_X(x)\,dx = 1$
However, evaluating $P(a \le X \le b) = \int_a^b f_X(x)\,dx$ is very complicated. To simplify this computation, we use the
standard normal table, which gives areas bounded by two points. Areas under the standard normal
distribution curve are tabulated in various ways. The most common tables give areas bounded between Z=0 and
a positive value of Z. In addition to the standard normal table, the properties of normal distribution and the
following theorem are useful to make probability calculations very easy for any normal distribution.
Theorem 7.1: Standardization of a normal random variable
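If X has a normal distribution with mean $\mu$ and standard deviation $\sigma$, then
i) $Z = \frac{X - \mu}{\sigma}$ will have a standard normal distribution.
ii) $P(a < X < b) = P\left(\frac{a - \mu}{\sigma} < \frac{X - \mu}{\sigma} < \frac{b - \mu}{\sigma}\right) = P\left(\frac{a - \mu}{\sigma} < Z < \frac{b - \mu}{\sigma}\right)$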
Example 7.11: Let Z be the standard normal random variable. Calculate the following probabilities using the
standard normal distribution table: a) $P(0 < Z < 1.2)$ b) $P(0 < Z < 1.43)$ c) $P(Z \ge 0)$ d) $P(-1.2 < Z < 0)$ e) $P(Z \le -1.43)$
f) $P(-1.43 \le Z < 1.2)$ g) $P(Z \ge 1.52)$ h) $P(Z \le -1.52)$
Solution:
a) The probability that Z lies between 0 and 1.2 can be directly found from the standard normal table as
follows: look for the value 1.2 from z column ( first column) and then move horizontally until you find
the value of 0.00 in the first row. The point of intersection made by the horizontal and vertical movements
will give the desired area (probability). Hence P(0<Z<1.2) = 0.3849. Refer to the table below as a guide to
find this probability.
Figure: The area to the left and the right of 0 for z-distribution
d) P(-1.2<Z<0)=P(0<Z<1.2)= 0.3849 due to symmetry
e) $P(Z < -1.43) = 1 - P(Z \ge -1.43)$, using the probability of the complement event
$= 1 - [P(-1.43 < Z < 0) + P(Z \ge 0)]$, since a region can be broken down into non-overlapping regions
$= 1 - [P(0 < Z < 1.43) + P(Z \ge 0)]$
$= 1 - [0.4236 + 0.5]$
$= 1 - 0.9236 = 0.0764$
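a) $P(X > 240) = P\left(Z > \frac{240 - 200}{20}\right) = P(Z > 2) = 0.5 - P(0 < Z < 2) = 0.5 - 0.4772 = 0.0228$
b) $P(180 < X < 220) = P\left(\frac{180 - 200}{20} < Z < \frac{220 - 200}{20}\right) = P(-1 < Z < 1) = 2P(0 < Z < 1) = 2(0.3413) = 0.6826$
c) $P(X < 200) = P\left(Z < \frac{200 - 200}{20}\right) = P(Z < 0) = 0.5$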
Example 7.14: Assume that the test scores for a large class are normally distributed with a mean of 74 and a
standard deviation of 10.
(a) Suppose that you receive a score of 88. What percent of the class received scores higher than yours?
(b) Suppose that the teacher wants to limit the number of A grades in the class to no more than 20%. What
would be the lowest score for an A?
Solution: Let X be the score of a randomly picked student; then $X \sim N(74, 100)$.
a) $P(X > 88) = P\left(\frac{X - 74}{10} > \frac{88 - 74}{10}\right) = P(Z > 1.4) = 0.5 - P(0 < Z \le 1.4) = 0.5 - 0.4192 = 0.0808$
Hence 8.08 percent of the students scored higher than you did.
b) Let $x_A$ be the lowest mark to get letter grade A. We are given that
$$P(X \ge x_A) = 0.2 = P\left(\frac{X - 74}{10} \ge \frac{x_A - 74}{10}\right) = P(Z > z_A)$$
$$P(0 < Z \le z_A) = 0.5 - 0.2 = 0.3 \Rightarrow z_A = 0.85 \Rightarrow \frac{x_A - 74}{10} = 0.85$$
Hence, the lowest mark to get letter grade A is 82.5.
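The table look-ups above can be cross-checked with `statistics.NormalDist` from the Python standard library (Python 3.8 or later). This is only a sketch; the variable names are assumptions. Note that the exact cut-off for an A comes out slightly below the table-based 82.5 because the table value $z_A = 0.85$ is rounded.

```python
from statistics import NormalDist

# Example 7.14: scores ~ N(74, 10^2).
scores = NormalDist(mu=74, sigma=10)
share_above_88 = 1 - scores.cdf(88)  # P(X > 88)
lowest_a = scores.inv_cdf(0.80)      # score with 20% of the class above it

# Standard normal checks for Example 7.11.
z = NormalDist()                     # mean 0, standard deviation 1
p_0_to_1_2 = z.cdf(1.2) - z.cdf(0)   # P(0 < Z < 1.2)
p_below_minus_1_43 = z.cdf(-1.43)    # P(Z < -1.43)
```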
8. Sampling distribution
8.1 Sampling distribution of the sample mean
The value of the sample mean for any sample will depend on the elements included in that sample.
Consequently, the sample mean is a random variable. Therefore, like any other random variable, the sample mean
possesses a probability distribution, which is more commonly called the sampling distribution of the sample mean. In
general, the probability distribution of a sample statistic is called its sampling distribution. Sampling
distribution is important in statistical inference. The important characteristics of the sampling distribution of the
sample mean are its mean, variance and the form of the distribution.
Example 8.1: Suppose we have a hypothetical population of size 3, consisting of three children, namely: A is 3
years old, B is 6 years old and C is 9 years old. Construct sampling distribution of the sample mean of size 2
using sampling without replacement and with replacement.
Solution: The mean and variance of the population are 6 and 6, respectively.
1. If sampling is without replacement we will have ${}_3C_2 = 3$ possible samples: (A, B), (A, C) and (B, C), and
their corresponding sample means are (3+6)/2 = 4.5, 6 and 7.5, respectively. Hence the probability
distribution (sampling distribution) of the sample mean is:

$\bar{x}$                  4.5    6      7.5
$P(\bar{X} = \bar{x})$     1/3    1/3    1/3
2. If sampling is with replacement we will have $N^n = 3^2 = 9$ possible samples: (A, A), (A, B), (A, C), (B,
A), (B, B), (B, C), (C, A), (C, B) and (C, C). Hence the probability distribution (sampling distribution)
of the sample mean is:

$\bar{x}$                  3      4.5    6      7.5    9
$P(\bar{X} = \bar{x})$     1/9    2/9    3/9    2/9    1/9
E(X )=
Note:
- The mean of the sampling distribution of the sample mean is the same as the population mean
irrespective of the sampling procedure.
- The variance of the sampling distribution of the sample mean is
$\frac{\sigma^2}{n}$ if sampling is with replacement, and
$\frac{\sigma^2}{n} \cdot \frac{N - n}{N - 1}$ if sampling is without replacement.
- The problem with using the sample mean to make inferences about the population mean is that the sample
mean will probably differ from the population mean. This error is measured by the standard deviation of the
sampling distribution of the sample mean, which is known as the standard error. The standard error is the
average amount of sampling error found because of taking a sample rather than the whole population.
As sample size increases, the standard error decreases.
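The two sampling distributions for the population of ages {3, 6, 9} can be built by enumeration, which also confirms the mean and variance formulas stated in the note. The function names below are illustrative assumptions.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations, product

ages = [3, 6, 9]  # population: children A, B, C
n = 2             # sample size

def sampling_distribution(samples):
    """Map each possible sample mean to its probability (samples equally likely)."""
    samples = list(samples)
    counts = Counter(Fraction(sum(s), len(s)) for s in samples)
    return {m: Fraction(c, len(samples)) for m, c in sorted(counts.items())}

def mean_var(dist):
    """Mean and variance of a discrete distribution given as {value: probability}."""
    mu = sum(x * p for x, p in dist.items())
    return mu, sum((x - mu) ** 2 * p for x, p in dist.items())

without_repl = sampling_distribution(combinations(ages, n))   # 3 samples
with_repl = sampling_distribution(product(ages, repeat=n))    # 9 samples
```

With population variance 6, N = 3 and n = 2, the formulas give 6/2 = 3 with replacement and (6/2)(1/2) = 1.5 without replacement, matching the enumerated variances.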
$$\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1).$$
Note: The central limit theorem is useful for approximating the distribution of the sample mean based on a large
sample size and when the population distribution is non normal; however, if the population is normal, then the
sampling distribution of the sample mean will be normal regardless of the sample size.
Example 8.2: Suppose the uric acid values in normal adult males are normally distributed with mean 5.7 mg and
standard deviation 1 mg. Find the probability that
a) a sample of size 4 will yield a mean less than 5
b) a sample of size 9 will yield a mean greater than 6
Solution: Let X be the uric acid value of a normal adult male, with mean 5.7 and variance 1.
a) If a sample of size 4 is taken, then $\bar{X} \sim N(5.7, 0.25)$ since the population is normally distributed.
   $P(\bar{X} < 5) = P\left(Z < \frac{5 - 5.7}{0.5}\right) = P(Z < -1.4) = 0.5 - P(0 < Z < 1.4) = 0.0808$
b) If a sample of size 9 is taken, then $\bar{X} \sim N(5.7, 1/9)$ since the population is normally distributed.
   $P(\bar{X} > 6) = P\left(Z > \frac{6 - 5.7}{1/3}\right) = P(Z > 0.9) = 0.5 - P(0 < Z < 0.9) = 0.1841$
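The two probabilities in the uric acid example can be cross-checked numerically instead of with the Z table. This is a small sketch using Python's standard library; the function name `sample_mean_dist` is an illustrative assumption.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma = 5.7, 1.0   # population mean and standard deviation from the example

def sample_mean_dist(n):
    """Sampling distribution of the mean of n observations from N(mu, sigma^2)."""
    return NormalDist(mu=mu, sigma=sigma / sqrt(n))

p_a = sample_mean_dist(4).cdf(5)       # P(sample mean < 5) with n = 4
p_b = 1 - sample_mean_dist(9).cdf(6)   # P(sample mean > 6) with n = 9

print(round(p_a, 4), round(p_b, 4))
```

The standard error `sigma / sqrt(n)` is passed as the standard deviation of the sample mean, which mirrors the standardization step in the hand calculation.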
to provide an estimate of the true value of the corresponding population parameters (such as $\mu$, $\sigma$ or $p$). Such a
single statistic is termed a point estimator, and the specific value of the statistic is termed a point estimate. For
example, the sample mean $\bar{X}$ is an estimator of the population mean $\mu$ and $\bar{X} = 10$ is an estimate, which is one of
the possible values of $\bar{X}$.
Interval estimate: In most practical problems, a point estimate does not provide information about how close
the estimate is to the population parameter unless it is accompanied by a statement of the possible sampling
errors involved, based on the sampling distribution of the statistic. Hence, an interval estimate of a population
parameter is a confidence interval with a statement of confidence that the interval contains the parameter value.
An interval estimate of the population parameter consists of two bounds within which the parameter will be
contained:
$L \le \theta \le U$
where $\theta$ denotes the population parameter, L is the lower bound and U is the upper bound.
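Case 1: When the population is normal.
• If the variance $\sigma^2$ is known, the sampling distribution of the sample mean $\bar{X}$ is normal with mean $\mu$ and
variance $\sigma^2/n$, i.e., $\bar{X} \sim N(\mu, \sigma^2/n)$, and $Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0,1)$.
• If the variance $\sigma^2$ is unknown, it is replaced by the sample variance $S^2$, and $t = \frac{\bar{X} - \mu}{S/\sqrt{n}}$ will have a
t-distribution with n - 1 degrees of freedom. Moreover, as the sample size increases, t is approximately the
same as the standard normal.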
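Consider the case where $\sigma^2$ is known. We can derive a $(1-\alpha)100\%$ confidence interval for the population mean $\mu$.
To obtain the limits of the interval estimate, we use the standardized form of $\bar{X}$, i.e., letting
$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$ in the probability statement $P(-Z_{\alpha/2} < Z < Z_{\alpha/2}) = 1 - \alpha$:
$P\left(-Z_{\alpha/2} < \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} < Z_{\alpha/2}\right) = 1 - \alpha$
$P\left(-Z_{\alpha/2}\frac{\sigma}{\sqrt{n}} < \bar{X} - \mu < Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right) = 1 - \alpha$
$P\left(\bar{X} - Z_{\alpha/2}\frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right) = 1 - \alpha$
Hence a $(1-\alpha)100\%$ confidence interval for $\mu$ is
$\left(\bar{X} - Z_{\alpha/2}\frac{\sigma}{\sqrt{n}},\ \bar{X} + Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right)$,
whose lower and upper limits are $\bar{X} - Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$ and $\bar{X} + Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$.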
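In a similar way, a $(1-\alpha)100\%$ confidence interval for the population mean $\mu$ with unknown variance $\sigma^2$ is
given by
$\left(\bar{X} - t_{\alpha/2}(n-1)\frac{S}{\sqrt{n}},\ \bar{X} + t_{\alpha/2}(n-1)\frac{S}{\sqrt{n}}\right)$, where $S^2 = \frac{\sum (X_i - \bar{X})^2}{n-1}$.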
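Case 2: When the population is non-normal.
We use the central limit theorem to approximate the distribution of the sample mean based on a large sample
($n \ge 30$). A large sample size is a necessary condition for using the normal distribution here. Hence,
$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$ is approximately N(0,1). If $\sigma$ is unknown we can replace it by its sample estimate S. The resulting
$(1-\alpha)100\%$ confidence interval of $\mu$ becomes
$\left(\bar{X} - Z_{\alpha/2}\frac{\sigma}{\sqrt{n}},\ \bar{X} + Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right)$, when $\sigma$ is known
$\left(\bar{X} - Z_{\alpha/2}\frac{S}{\sqrt{n}},\ \bar{X} + Z_{\alpha/2}\frac{S}{\sqrt{n}}\right)$, when $\sigma$ is unknown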
Example 8.1: A drug company is testing a new drug which is supposed to reduce blood pressure. From the six
people who are used as subjects, it is found that the average drop in blood pressure is 2.28 millimeters of
mercury (mmHg) with a standard deviation of 0.95 mmHg. What is the 95% confidence interval for the mean
change in blood pressure? (Assume that the population is normal.)
Solution: Given: $\bar{X} = 2.28$, $S = 0.95$, $n = 6$
$(1-\alpha)100\% = 95\% \Rightarrow 1 - \alpha = 0.95 \Rightarrow \alpha = 0.05 \Rightarrow \alpha/2 = 0.025$
• $\bar{X} = 2.28$ is a point estimate for the population mean drop in blood pressure $\mu$.
A 95% confidence interval of the population mean $\mu$ for unknown $\sigma^2$ and small sample size is:
$\left(\bar{X} - t_{\alpha/2}(n-1)\frac{S}{\sqrt{n}},\ \bar{X} + t_{\alpha/2}(n-1)\frac{S}{\sqrt{n}}\right)$
$= \left(2.28 - (2.571)\frac{0.95}{\sqrt{6}},\ 2.28 + (2.571)\frac{0.95}{\sqrt{6}}\right)$
$= (2.28 - 0.997,\ 2.28 + 0.997)$
$= (1.28,\ 3.28)$
We are 95% confident that the mean drop in blood pressure lies between 1.28 mmHg and 3.28 mmHg for the
sampled population.
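The arithmetic of the t-based interval in Example 8.1 can be reproduced with a few lines of Python. This is only a sketch: the function name `mean_confidence_interval` is hypothetical, and the critical value 2.571 is taken from the t table as quoted in the example rather than computed, since the standard library has no t quantile function.

```python
from math import sqrt

def mean_confidence_interval(xbar, spread, n, critical_value):
    """Return (lower, upper) = xbar -/+ critical_value * spread / sqrt(n)."""
    half_width = critical_value * spread / sqrt(n)
    return xbar - half_width, xbar + half_width

T_CRIT = 2.571  # t_{0.025}(5), the table value used in Example 8.1

lower, upper = mean_confidence_interval(xbar=2.28, spread=0.95, n=6,
                                        critical_value=T_CRIT)
print(f"({lower:.2f}, {upper:.2f})")
```

The same helper works for the large-sample case by passing a Z table value as `critical_value` and either $\sigma$ or S as `spread`.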
Example 8.2: Punctuality of patients in keeping appointment is of interest to a research team. In a study of
patients flow through the office of general practitioners, it was found that a sample of 35 patients were 17.2
minutes late for appointments, on the average. Previous research had shown the standard deviation to be about 8
minutes. The population distribution was felt to be not normal. What is the 90 percent confidence interval for
the true mean amount of time late for appointment?
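Solution: Given: $\bar{X} = 17.2$, $\sigma = 8$, $n = 35$
$(1-\alpha)100\% = 90\% \Rightarrow 1 - \alpha = 0.90 \Rightarrow \alpha = 0.10 \Rightarrow \alpha/2 = 0.05$
Since the sample size is fairly large (n > 30), and since the population standard deviation is known, according to
the central limit theorem, the sampling distribution of the sample mean is approximately normal. Thus, a
confidence interval of the population mean is given by:
$\left(\bar{X} - Z_{\alpha/2}\frac{\sigma}{\sqrt{n}},\ \bar{X} + Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right)$
$= \left(17.2 - (1.65)\frac{8}{\sqrt{35}},\ 17.2 + (1.65)\frac{8}{\sqrt{35}}\right)$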
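The substitution above can be evaluated numerically. The sketch below uses the table value 1.65 that appears in the example and, for comparison, the more precise quantile from Python's `statistics.NormalDist`; the helper name `z_interval` is an illustrative assumption.

```python
from math import sqrt
from statistics import NormalDist

xbar, sigma, n = 17.2, 8.0, 35   # values given in the punctuality example
confidence = 0.90

# More precise critical value Z_{alpha/2}; the table value used in the text is 1.65.
z_exact = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
z_table = 1.65

def z_interval(z):
    """Large-sample confidence interval xbar -/+ z * sigma / sqrt(n)."""
    half_width = z * sigma / sqrt(n)
    return xbar - half_width, xbar + half_width

lower, upper = z_interval(z_table)
print(f"z = {z_table}: ({lower:.2f}, {upper:.2f})")
print(f"z = {z_exact:.3f}: ({z_interval(z_exact)[0]:.2f}, {z_interval(z_exact)[1]:.2f})")
```

The two critical values differ only in the third decimal place, so the resulting limits agree to within about 0.01 minutes.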
people can form different opinions by looking at data, but a hypothesis test provides a standardized decision-making process that will be consistent for all people.
Statistical hypothesis: is a claim (belief or assumption) about an unknown population parameter value.
Examples of hypotheses:
• There is an association between lung cancer and the number of cigarettes an individual smokes.
• The proportion of female students in Hawassa University is 0.35.
• In sub-Saharan Africa 40% of individuals are living below the poverty line.
Hypothesis testing: is the procedure that enables decision-makers to draw inferences about population
characteristics by analyzing the difference between the value of sample statistic and the corresponding
hypothesized parameter value.
General procedure for hypothesis testing
To test the validity of the claim or assumption about the population parameter, a sample is drawn from the
population and analyzed. The results of the analysis are used to decide whether the claim is valid or not.
Step 1: State the null hypothesis ($H_0$) and alternative hypothesis ($H_1$)
Null hypothesis ($H_0$): refers to a hypothesized numerical value of the population parameter which is initially
assumed to be true. The null hypothesis is always expressed in the form of an equation making a claim
regarding the specific value of the population parameter. That is, for example,
$H_0: \mu = \mu_0$
where $\mu_0$ is the hypothesized value of the population mean.
Alternative hypothesis ($H_1$): is the logical opposite of the null hypothesis. The alternative hypothesis states
that the specific population parameter value is not equal to the value stated in the null hypothesis. For example,
$H_1: \mu \ne \mu_0$ (Two-sided test)
$H_1: \mu < \mu_0$ or $H_1: \mu > \mu_0$ (One-sided tests)
specified by the statistician or the researcher before the sample is drawn. The most commonly used values of
$\alpha$ are 0.10, 0.05 or 0.01.
Step 3: Calculate the appropriate test statistic
A test statistic is a value computed from a sample that is used to determine whether the null hypothesis has to be
rejected or not. The choice of a suitable test statistic depends on the sampling distribution of the sample statistic.
Accordingly, we have the following cases:
Case 1: When the population is normal.
• If the variance $\sigma^2$ is known, the sampling distribution of the sample mean $\bar{X}$ is normal with mean $\mu$ and
variance $\sigma^2/n$, i.e., $\bar{X} \sim N(\mu, \sigma^2/n)$, and the test statistic is $Z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} \sim N(0,1)$.
• If the variance $\sigma^2$ is unknown, the test statistic is $t = \frac{\bar{X} - \mu_0}{S/\sqrt{n}} \sim t(n-1)$.
Case 2: When the population is non-normal.
We use the central limit theorem to approximate the distribution of the sample mean based on a large sample
($n \ge 30$). A large sample size is a necessary condition for using the normal distribution here. Hence the test statistic
is $Z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}$, which is approximately N(0,1). If $\sigma$ is unknown we can replace it by its sample estimate S.
Step 4: Establish a decision rule (critical or rejection region)
The cut-off point to reject or not reject $H_0$ depends on the level of significance $\alpha$, the type of test statistic
chosen and the form of the alternative hypothesis. If the value of the test statistic falls in the rejection region, the
null hypothesis is rejected; otherwise we do not reject $H_0$ (see fig 1 below). The value of the sample statistic
that separates the regions of acceptance and rejection is called the critical value. For a specified $\alpha$, we read the
critical values from the Z or t tables, depending on the test statistic chosen.
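      Alternative hypothesis     Reject $H_0: \mu = \mu_0$ if (Z test)     Reject $H_0: \mu = \mu_0$ if (t test)
i.    $H_1: \mu \ne \mu_0$       $|Z| > Z_{\alpha/2}$                      $|t| > t_{\alpha/2}(n-1)$
ii.   $H_1: \mu > \mu_0$         $Z > Z_{\alpha}$                          $t > t_{\alpha}(n-1)$
iii.  $H_1: \mu < \mu_0$         $Z < -Z_{\alpha}$                         $t < -t_{\alpha}(n-1)$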
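The decision rule of Step 4 can be written as a small function. This is a sketch under stated assumptions: the function name `reject_h0` and the labels "two-sided", "greater" and "less" are illustrative, and the critical value is expected to come from the Z or t table ($Z_{\alpha/2}$ or $t_{\alpha/2}(n-1)$ for a two-sided test, $Z_{\alpha}$ or $t_{\alpha}(n-1)$ for a one-sided test).

```python
def reject_h0(statistic, critical_value, alternative):
    """Return True when the test statistic falls in the rejection region.

    critical_value is the positive table value; alternative names the form of H1.
    """
    if alternative == "two-sided":   # H1: mu != mu0
        return abs(statistic) > critical_value
    if alternative == "greater":     # H1: mu > mu0
        return statistic > critical_value
    if alternative == "less":        # H1: mu < mu0
        return statistic < -critical_value
    raise ValueError("alternative must be 'two-sided', 'greater' or 'less'")

# Statistics and table values quoted in the worked examples of this section.
print(reject_h0(3.01, 2.228, "two-sided"))  # t test with 10 degrees of freedom
print(reject_h0(4.23, 1.65, "greater"))     # right-tailed Z test
```

Keeping the critical value as an input makes the same rule usable for both Z and t statistics.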
nonrejection of H 0 when it is true. However, the correct decision is not always possible. Since the decision to
reject or do not reject a hypothesis is based on sample data, there is a possibility of committing an incorrect
decision or error. Hence, a decision-maker may commit one of the two types of errors while testing a null
hypothesis. These errors are summarized as follows:
                    Null Hypothesis ($H_0$)
Decision            True                        False
Reject $H_0$        Type I error ($\alpha$)     Correct decision
Accept $H_0$        Correct decision            Type II error ($\beta$)
Type I error is committed if we reject the null hypothesis when it is true. The probability of committing a type I
error, denoted by $\alpha$, is called the level of significance. The probability level of this error is decided by the
decision-maker before the hypothesis test is performed. Type II error is committed if we do not reject the null
hypothesis when it is false. The probability of committing a type II error is denoted by $\beta$ (Greek letter beta). For a
fixed sample size, as the probability of a type I error increases, the probability of a type II error decreases (they
are inversely related). Hence we cannot reduce both errors simultaneously without changing the sample size. As
the sample size increases, both errors can be made smaller.
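The trade-off between the two error probabilities can be illustrated for a two-sided Z test with known $\sigma$. The sketch below is an illustration only: the function name `type_ii_error` and the default values $\mu_0 = 50$, true mean $\mu_1 = 51$ and $\sigma = 2$ are assumptions chosen for the example, not results from the module.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()

def type_ii_error(alpha, n, mu0=50.0, mu1=51.0, sigma=2.0):
    """P(do not reject H0: mu = mu0) when the true mean is mu1 (two-sided Z test)."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    # Under the true mean mu1, Z = (xbar - mu0) / (sigma / sqrt(n)) ~ N(shift, 1).
    shift = (mu1 - mu0) / (sigma / sqrt(n))
    return STD_NORMAL.cdf(z_crit - shift) - STD_NORMAL.cdf(-z_crit - shift)

for alpha in (0.01, 0.05, 0.10):
    print(f"n = 10, alpha = {alpha:.2f}: beta = {type_ii_error(alpha, 10):.3f}")
for n in (10, 20, 40):
    print(f"alpha = 0.05, n = {n}: beta = {type_ii_error(0.05, n):.3f}")
```

With n held fixed, a larger $\alpha$ gives a smaller $\beta$; with $\alpha$ held fixed, a larger n gives a smaller $\beta$.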
Example 8.3: The life expectancy of people in the year 1999 in a country is expected to be 50 years. A survey
was conducted in eleven regions of the country and the data obtained, in years, are given below:
Life expectancy (years): 54.2, 50.4, 44.2, 49.7, 55.4, 57.0, 58.2, 56.6, 61.9, 57.5, and 53.4.
Do the data confirm the expected view? (Assuming normal population) Use 5% level of significance.
Solution: Let $\mu$ be the mean life expectancy of people in the year 1999 in a country.
1. $H_0: \mu = 50$ (The life expectancy of people in the year 1999 in a country is 50 years)
$H_1: \mu \ne 50$ (The life expectancy of people in the year 1999 in a country is different from 50 years)
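2. Level of significance: $\alpha = 0.05$
3. The sample mean and sample variance are:
$\bar{X} = \frac{1}{n}\sum_{i=1}^{11} x_i = \frac{598.5}{11} = 54.41$
$S^2 = \frac{1}{n-1}\left[\sum_{i=1}^{11} x_i^2 - \frac{\left(\sum_{i=1}^{11} x_i\right)^2}{n}\right] = \frac{1}{10}\left[32799.91 - \frac{(598.5)^2}{11}\right] = \frac{1}{10}(236.07) = 23.607$
$S = \sqrt{23.607} = 4.859$
Then, the t-test statistic is calculated as:
$t = \frac{\bar{X} - \mu_0}{S/\sqrt{n}} = \frac{54.41 - 50}{4.859/\sqrt{11}} = \frac{4.41}{1.465} = 3.01$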
4. For $\alpha = 0.05$ and a two-tailed test, the critical (table) value is:
$t_{\alpha/2}(n-1) = t_{0.05/2}(11-1) = t_{0.025}(10) = 2.228$
Since $|t| = 3.01 > t_{\alpha/2}(n-1) = 2.228$, reject the null hypothesis $H_0$. That is, the calculated t value lies in
the rejection region.
5. Conclusion: The data do not confirm the expected view. That is, the life expectancy is different from 50
years at 5% level of significance.
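The t statistic of Example 8.3 can be recomputed from the raw data. This sketch assumes that the sixth observation is 57.0, which is the value consistent with the sums $\sum x_i = 598.5$ and $\sum x_i^2 = 32799.91$ used in the solution; the critical value 2.228 is the table value quoted above.

```python
from math import sqrt
from statistics import mean, stdev

# Life expectancy data (years); the sixth value is assumed to be 57.0 so that
# the total matches the 598.5 used in the worked solution.
life_expectancy = [54.2, 50.4, 44.2, 49.7, 55.4, 57.0,
                   58.2, 56.6, 61.9, 57.5, 53.4]
mu0 = 50.0      # hypothesized mean under H0
T_CRIT = 2.228  # t_{0.025}(10) from the t table

n = len(life_expectancy)
xbar = mean(life_expectancy)
s = stdev(life_expectancy)            # sample standard deviation (divisor n - 1)
t_stat = (xbar - mu0) / (s / sqrt(n))
reject = abs(t_stat) > T_CRIT         # two-sided decision rule

print(f"xbar = {xbar:.2f}, S = {s:.3f}, t = {t_stat:.2f}, reject H0: {reject}")
```

`statistics.stdev` uses the n - 1 divisor, so it matches the definition of S used in the module.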
Example 8.4: Suppose that we want to test the hypothesis with a significance level of .05 that the climate has
changed since industrialization. Suppose that the mean temperature throughout history is 50 degrees. During
the last 40 years, the mean temperature has been 51 degrees and the population standard deviation is 2 degrees.
What can we conclude?
Solution:
Let $\mu$ be the mean temperature.
1. $H_0: \mu = 50$ (There is no change in temperature since industrialization)
$H_1: \mu \ne 50$ (There is a change in temperature since industrialization)
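2. Level of significance: $\alpha = 0.05$
3. The test statistic is:
$Z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} = \frac{51 - 50}{2/\sqrt{40}} = \frac{1}{0.316} = 3.16$
4. For $\alpha = 0.05$ and a two-tailed test, the critical (table) value is:
$Z_{\alpha/2} = Z_{0.05/2} = Z_{0.025} = 1.96$
Since $|Z| = 3.16 > Z_{\alpha/2} = Z_{0.025} = 1.96$, reject the null hypothesis $H_0$. That is, the calculated Z value
lies in the rejection region.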
$Z = \frac{\bar{X} - \mu_0}{S/\sqrt{n}} = \frac{33.3 - 30}{12.14/\sqrt{242}} = \frac{3.3}{0.7804} = 4.23$
4. For $\alpha = 0.05$ and a right-tailed test, the critical (table) value is:
$Z_{\alpha} = Z_{0.05} = 1.65$
Since $Z = 4.23 > Z_{\alpha} = 1.65$, reject the null hypothesis $H_0$. That is, the calculated Z value lies in the
rejection region (the shaded region).
5. Conclusion: The mean Vo2max score for the sampled population of healthy midwife women is greater
than 30 at 5% level of significance.
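The two Z statistics computed in Example 8.4 and in the VO2max example above can be verified with a short script. This is a minimal sketch; the function name `z_statistic` is an illustrative assumption, and the critical values 1.96 and 1.65 are the table values quoted in the text.

```python
from math import sqrt

def z_statistic(xbar, mu0, spread, n):
    """Z = (xbar - mu0) / (spread / sqrt(n)); spread is sigma, or S for large n."""
    return (xbar - mu0) / (spread / sqrt(n))

# Example 8.4: two-tailed test, known sigma = 2, n = 40.
z_climate = z_statistic(xbar=51, mu0=50, spread=2, n=40)
reject_climate = abs(z_climate) > 1.96   # Z_{0.025}

# VO2max example: right-tailed test, S = 12.14, n = 242.
z_vo2 = z_statistic(xbar=33.3, mu0=30, spread=12.14, n=242)
reject_vo2 = z_vo2 > 1.65                # Z_{0.05}

print(f"climate: Z = {z_climate:.2f}, reject H0: {reject_climate}")
print(f"VO2max:  Z = {z_vo2:.2f}, reject H0: {reject_vo2}")
```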
4. The following bar graph displays high school average score of different groups of students enrolled at
certain College (Ethiopia) in 1992/93 academic year ( source Laekemariam (1994), EJE). Comment
on the classification of the data and purpose of the author.
E = Entrants (enrolled students )
W = Withdrawals
D = Dismissed students
S = Freshman Survivors
[Bar chart of high school mean score by group (E, W, D, S); the chart is not reproduced here.]
Fig 1. High School mean scores of the entrants and the groups.
5. From Fig 2, comment on the classification of the data and purpose of the author ( source Laekemariam (1994), EJE) .
[Fig 2: bar chart by group; the chart is not reproduced here.]
6. In Fig 3 (line graph), the relation of first year first semester withdrawals and students subject to dismissal of the 1988-1993 entries of WGCF are considered. Can we generalize the conclusion we draw from Fig 3 to all Universities in the
country? Why?
[Line graph of percent (vertical axis) against entry year, 1988 to 1993 (horizontal axis), with one line for withdrawals (W) and one for students subject to dismissal (SD); the graph is not reproduced here.]
Fig 3. An increase of withdrawers (W) creates a decrease of students subject to academic dismissal (SD) or vice versa.
7. A family whose total monthly income is Birr 4000 plans its expenditure for a month as shown in the table below. Use a pie chart
to represent the data.
Items                  Amount (Birr)
Housing                776
Food                   1168
Children Education     724
Clothing               260
Savings                888
Miscellaneous          184
Total                  4000
8. The concentration of suspended solids in river water is an important environmental characteristic. A research paper
reported on concentration (in parts per million, or ppm) for several different rivers. Suppose the following 50
observations had been obtained for a particular river:
55.8  60.9  37.0  91.3  65.8  42.3  33.8  60.6  76.0  69.0
45.9  39.1  35.5  56.0  44.6  71.7  61.2  61.5  47.2  74.5
83.2  40.0  31.7  36.7  62.3  47.3  94.6  56.3  30.3  68.2
75.3  71.4  65.2  52.6  58.2  48.0  61.8  78.8  39.8  65.0
60.7  77.1  59.1  49.5  69.3  69.8  64.9  27.1  87.1  66.3
Frequency
.15-<.25
.25-<.35
14
70
.35-<.45
28
.45-<.50
24
.50-<.55
39
.55-<.60
51
.60-<.65
106
.65-<.70
84
.70-<.75
11
211
183
211
180
194
200
15. A sample of eight resistors of a certain type resulted in the sample resistances (ohms) $x_1 = 40$, $x_2 = 43$, $x_3 = 39$, $x_4 = 35$,
$x_5 = 37$, $x_6 = 43$, $x_7 = 46$, $x_8 = 37$.
a. Compute $s^2$ and s directly from the definitions.
b. Compute $s^2$ and s using the shortcut formula.
c. Subtract 35 from each $x_i$ and then compute $s^2$.
d. If the resistances were 400, 430, 390, 350, 370, 430, 460, and 370, how would you use the results of parts
(a), (b), or (c) to compute $s^2$ and s?
16. The accompanying data appeared in an article in Technometrics that discussed the analysis of information from
weather-modification experiments. Construct side-by-side box plots and then comment on similarities and differences.
Rainfall from Control Clouds      Rainfall from Seeded Clouds
1202.6     41.1                   2745.6     200.7
 830.1     36.6                   1697.8     198.6
 372.4     29.0                   1656.0     129.6
 345.5     28.6                    978.0     119.0
 321.2     26.3                    703.4     118.3
 244.3     26.1                    489.1     115.3
 163.0     24.4                    430.0      92.4
 147.8     21.7                    334.1      40.6
  95.0     17.3                    302.8      32.7
  87.0     11.5                    274.7      31.4
  81.2      4.9                    274.7      17.5
  68.5      4.9                    255.0       7.7
  47.3      1.0                    242.5       4.1
17.
A. For what value of c is the quantity $\sum (x_i - c)^2$ minimized? (HINT: Take the derivative with respect to c, set equal to
0, and solve.)
B. Using the result of part (a), which of the two quantities $\sum (x_i - \bar{x})^2$ and
$\sum (x_i - \mu)^2$ will be smaller than the other (assuming that $\bar{x} \ne \mu$)?
18.
A. Let a and b be constants and let $y_i = a x_i + b$ for i = 1, 2, . . . , n. What are the
relationships between $S_x^2$ and $S_y^2$?
21. The accompanying data resulted from a study carried out to examine the relationship between a measure of the
corrosion of iron (y) and the concentration of NaPO4 (x, in ppm)
x    2.50   5.03   7.60   11.60   13.00   19.60   26.20   33.00   40.00   50.00   55.00
y    7.60   6.95   6.30    5.75    5.01    1.43     .93     .72     .68     .65     .56
a. Construct a scatter plot of the data. Does the simple linear regression model appear to be plausible?
b. Calculate the equation of the estimated regression line, use it to predict the value of the corrosion rate that would be
observed for a concentration of 33 ppm, and calculate the corresponding residual.
c. Calculate correlation coefficient and coefficient of determination.
d. What percentage of sample variation in corrosion can be attributed to the model relationship?
22. The accompanying data was read from a graph that appeared in the paper Reactions on Painted Steel Under the
Influence of Sodium Chloride, and Combinations Thereof (Ind. Engr. Chem. Prod. Res. Dev., 1985, pp. 375-378). The
independent variable is SO2 deposition rate (mg/m2/day) and the dependent variable is steel weight loss (g/m2).
x     14     18     40     43     45     112
y    280    350    470    500    560    1200
a. Construct a scatter plot. Does the simple linear regression model appear to be reasonable in this situation?
b. Calculate the equation of the estimated regression line.
c. What percentage of observed variation in steel weight loss can be attributed to the model relationship in combination
with variation in deposition rate?
d. Because the largest x value in the sample greatly exceeds the others, this observation may have been very influential
in determining the equation of the estimated line. Delete this observation and recalculate the equation. Does the new
equation appear to differ substantially from the original one (you might consider predicted values)?
23. A family that owns two cars is selected, and for both the older car and the newer car we note whether the car was
manufactured in America, Europe, or Asia.
a. What are the possible outcomes of this experiment?
b. Which outcomes are contained in the event that one car is American and the other not American?
c. Which outcomes are contained in the event that at least one of the two cars is not American? What is the
complement of this event? Is either of these two events a simple event?
24. A college library has five copies of a certain text on reserve. Two copies (1 and 2) are first printings, and the other three
(3, 4, and 5) are second printings. A student examines these books in random order, stopping only when a second
printing has been selected. One possible outcome is 5, and another is 213.
a. List the outcomes in S
b. Let A denote the event that exactly one book must be examined. What outcomes are in A?
c. Let B be the event that book 5 is the one selected. What outcomes are in B?
d. Let C be the event that book 1 is not examined. What outcomes are in C?
25. Use Venn diagrams to verify the following two relationships for any events A and B (these are called
De Morgan's laws):
c. $(A \cup B)' = A' \cap B'$
d. $(A \cap B)' = A' \cup B'$
26. A family that owns two automobiles is selected at random. Let $A_1$ = {the older car is American} and $A_2$ = {the newer
car is American}. If $P(A_1) = .7$, $P(A_2) = .5$, and $P(A_1 \cap A_2) = .4$, compute the following:
e. $P(A_1 \cup A_2)$ (the probability that at least one car is American).
f. The probability that neither car is American.
g. The probability that exactly one of the two cars is American.
27. In a school machine shop, 60% of all machine breakdowns occur on lathes and 15% on drills. Let
A = {the next machine breakdown is a lathe}, and
B = { the next machine breakdown is a drill} (so that A and B are mutually exclusive). With P(A) = .60 and P(B)
= .15, calculate the following:
A. $P(A')$
B. $P(A \cup B)$
C. $P(A' \cap B')$
28. A video store sells two different brands of VCRs, each of which comes with either two heads or four heads. The
accompanying table gives the percentages of recent purchasers buying each type of VCR:
Brand
Number of Heads
2
25%
16%
32%
27%
Suppose a recent purchaser is randomly selected and both the brand and the number of heads are determined.
A. What are the four simple events?
B. What is the probability that the selected purchaser bought brand Q, with two heads?
C. What is the probability that the selected purchaser bought brand M?
29. A library has five copies of a certain text, of which copies 1 and 2 are first
printings and copies 3, 4, and 5 are second printings. Two copies are to be
randomly selected to be placed on 2-hour reserve (implying 10 equally likely
outcomes).
A. What is the probability that both selected copies are first printings?
B. What is the probability that both selected copies are second printings?
C. What is the probability that at least one selected copy is a first printing?
D. What is the probability that the selected copies are different printings?
30 The student Engineers Council at a certain college has one student representative
from each of the five engineering majors (civil, electrical, industrial, materials, and
mechanical). In how many ways can
k. How many selections result in all 6 workers coming from the day shift? What is the probability that all 6
selected workers will be from the day shift?
l. What is the probability that all 6 selected workers will be from the same shift?
m. What is the probability that at least two different shifts will be represented among the selected workers?
n. What is the probability that at least one of the shifts will be unrepresented in the sample of workers?
33. An experimenter is studying the effects of temperature, pressure, and type of
catalyst on yield from a certain chemical reaction. Three different temperatures,
four different pressures, and five different catalysts are under consideration.
A. If any particular experimental run involves the use of a single temperature, pressure, and catalyst,
how many experimental runs are possible?
B. How many experimental runs are there that involve use of the lowest temperature and two lowest
pressures?
34. An engineering professor wishes to schedule an appointment with each of her eight teaching assistants, four men
and four women, to discuss her calculus course. Suppose all possible orderings of appointments are equally likely to
be selected.
A. What is the probability that at least one female assistant is among the first three the professor meets with?
B. What is the probability that after the first five appointments she has met with all female assistants?
C. Suppose the professor has the same eight assistants the following semester and again schedules
appointments without regard to the ordering during the first semester. What is the probability that the
orderings of appointments are different?
35. The head size and grip size are determined for a randomly selected tennis racket purchaser at a certain sporting
goods store. Relevant probabilities appear in the accompanying table:
                         Grip Size
Head Size     4 3/8 in.    4 1/2 in.    4 5/8 in.
Midsize       .10          .20          .15
Oversize      .20          .15          .20
Let A denote the event that a midsize racket was purchased and B denote the event that a racket with a 4 1/2 in.
grip was purchased.
A. Determine P(A), P(B), and $P(A \cap B)$.
B. Calculate both P(A/B) and P(B/A) and explain in words what you have calculated in each case.
C. If C denotes the event that grip size is at least 4 1/2 in., calculate and interpret P(A/C).
36. A mathematics professor is teaching both a morning and an afternoon section
of introductory calculus. Let
A = {the professor gives a bad morning lecture}
and B = {the professor gives a bad afternoon lecture}. If P(A) = .3, P(B) = .2,
and $P(A \cap B) = .1$, calculate the following probabilities (a Venn diagram might help):
A. P(B/A)
B. P(B'/A)
C. P(B/A')
D. P(B'/A')
E. If at the conclusion of the afternoon class, the professor is heard to mutter what a rotten lecture, what
is the probability that the morning lecture was also bad?
37. At a certain gas station, 40% of the customers use regular unleaded gas ($A_1$), 35% use extra unleaded gas ($A_2$),
and 25% use premium unleaded gas ($A_3$). Of those customers using regular gas, only 30% fill their tanks (event
B). Of those customers using extra gas, 60% fill their tanks, while of those using premium, 50% fill their tanks.
A. What is the probability that the next customer will request extra unleaded gas and fill the tank
($A_2 \cap B$)?
B. What is the probability that the next customer fills the tank?
C. If the next customer fills the tank, what is the probability that regular gas is requested? Extra gas?
Premium gas?
38. If A and B are independent events, show that A' & B and A' & B' are also independent.
[HINT: $P(A' \cap B) = P(B) - P(A \cap B)$ and $P(A' \cap B') = P((A \cup B)')$]
39. An executive has both a morning and an afternoon meeting on a particular day. Let A = {late to the morning
meeting} and B = {late to the afternoon meeting}.
a. If P(A) = .4, P(B) = .5, and $P(A \cap B) = .25$, are A and B independent events?
b. If A and B are independent events with P(A) = .4 and P(B) = .5, what is the probability that the
executive is on time to both meetings? To exactly one meeting?
40.The probability that a grader will make a marking error on any particular question of a multiple choice exam
is .1. If there are ten questions and questions are marked independently, what is the probability that no errors
are made? That at least one error is made? If there are n questions and the probability of a marking error is p
rather than .1, give expressions for these two probabilities.
41. Three automobiles are selected at random, and each is categorized as having a diesel (S) or nondiesel (F)
engine (so outcomes are SSS, SSF, etc.). If X = the number of cars among the three with diesel engines, list each
outcome in S and its associated X value.
42 The number of pumps in use at both a six-pump station and a four-pump station will be determined. Give the
possible values for each of the following random variables.
43.An automobile service facility specializing in engine tune-ups knows that 45% of all tune-ups are done on
four-cylinder automobiles, 40% on six-cylinder automobiles, and 15% on eight-cylinder automobiles. Let X = the
number of cylinders on the next car to be tuned.
A. What is the pmf of X?
B. Draw both a line graph and a probability histogram for the pmf of part (a).
44. Let X = the number of tires on a randomly selected automobile that are underinflated.
A. Which of the following three p(x) functions is a legitimate pmf for X, and why are the other two not
allowed?
x       0     1     2     3     4
p(x)    .3    .2    .1    .05   .05
p(x)    .4    .1    .1    .1    .3
p(x)    .4    .1    .2    .1    .3
B. For the legitimate pmf of part (a), compute $P(2 \le X \le 4)$, $P(X \le 2)$, and $P(X \ne 0)$.
C. If p(x) = c(5 - x) for x = 0, 1, . . . , 4, what is the value of c? [HINT: $\sum_{x=0}^{4} p(x) = 1$.]
45. Two fair six-sided dice are tossed independently. Let M = the maximum of the two tosses (so M(1,5) = 5,
M(3,3) = 3, etc.
A. What is the pmf of M? [HINT: First determine p(1), then p(2), and so on.]
B. Determine the cdf of M and graph it
46. An insurance company offers its policyholders a number of different premium payment options. For a
randomly selected policyholder, let X = the number of months between successive payments. The cdf of X is as
follows:
F(x) = 0,      x < 1
       .03,    1 ≤ x < 3
       .40,    3 ≤ x < 4
       .45,    4 ≤ x < 6
       .60,    6 ≤ x < 12
       1,      12 ≤ x
p(x)
.08
.15
.45
.27
.05
E(X)
V(X) directly from the definition
The standard deviation of X
V(X) using the shortcut formula
48. An instructor in a technical writing class has asked that a certain report be turned in the following week,
adding the restriction that any report exceeding four pages will not be accepted. Let Y = the number of pages in a
randomly chosen student's report and suppose that Y has pmf
p(y)
.01
.19
.35
.45
A. Compute E(Y)
B. Suppose the instructor spends Y minutes grading a paper consisting of Y pages. What is the expected
amount of time [E(Y)] spent grading a randomly selected paper?
49. An appliance dealer sells three different models of upright freezers having 13.5, 15.9, and 19.1 cubic feet of
storage space, respectively. Let X = the amount of storage space purchased by the next customer to buy a
freezer. Suppose that X has pmf
x       13.5    15.9    19.1
p(x)    .2      .5      .3
51. When circuit boards used in the manufacture of compact disc players are tested, the long-run percentage of
defectives is 5%. Let X = the number of defective boards in a random sample of size n = 25, so X ~ Bin(25, .05).
A. Determine $P(X \le 2)$.
B. Determine $P(X \ge 5)$.
C. Determine $P(1 \le X \le 4)$.
D. What is the probability that none of the 25 boards are defective?
E. Calculate the expected value and standard deviation of X.
52. Suppose that only 20% of all drivers come to a complete stop at an intersection
having flashing red lights in all directions when no other cars are visible. What is
the probability that, of 20 randomly chosen drivers coming to an intersection
under these conditions,
53. Customers at a gas station select either regular (A), premium (B), or diesel fuel (C). Assume that successive
customers make independent choices, with P(A) = .3, P(B) = .2, and P(C) = .5,
A. Among the next 100 customers, what are the mean and variance of the number who select
regular fuel? Explain your reasoning.
B. Answer part (a) for the number among the 100 who select a nondiesel fuel.
54. Let X denote the amount of time for which a book on 2-hour reserve at a college library is checked out by a
randomly selected student and suppose that X has density function
f(x) = .5x for 0 ≤ x ≤ 2, and f(x) = 0 otherwise.
Otherwise
56. A college professor never finishes his lecture before the bell rings to end the
period and always finishes his lecture within 1 min after the bell rings. Let X =
the time that elapses between the bell and the end of the lecture and suppose the
pdf of X is
f(x) = $kx^2$ for 0 ≤ x ≤ 1, and f(x) = 0 otherwise.
A. Find the value of k. [HINT: Total area under the graph of f(x) is 1.]
B. What is the probability that the lecture ends within 1/2 min of the
bell ringing?
C. What is the probability that the lecture continues beyond the bell for between 15 and 30 sec?
D. What is the probability that the lecture continues for at least 40 sec beyond the bell?
F(x) = 0,          x < 0
       $x^2/4$,    0 ≤ x < 2
       1,          2 ≤ x
b. $P(.5 \le X \le 1)$
f(x) = 2(1 - x) for 0 ≤ x ≤ 1, and f(x) = 0 otherwise.
A.
= .0055
B. = .09
C. = .663
60. The air pressure in a randomly selected tire put on a certain model new car is
normally distributed with mean value 31 psi and standard deviation .2 psi.
A. What is probability that the pressure for a randomly selected tire exceeds 30.5 psi?
B. What is the probability that the pressure for a randomly selected tire is between 30.5 and 31.5 psi?
Between 30 and 32 psi?
C. Suppose a tire is classed as under inflated if its pressure is less than 30.4 psi. What is the probability
that at least one of the four tires on a car is under inflated? (HINT: If A = {at least 1 tire is under
inflated}, what is the complement of A?)
61. Suppose only 40% of all drivers in a certain state regularly wear a seatbelt. A
random sample of 500 drivers is selected. What is the probability that
A. Between 180 and 230 (inclusive) of the drivers in the sample regularly wear a seatbelt?
B. Fewer than 175 of those in the sample regularly wear a seatbelt? Fewer than 150?
62. Let X have a binomial distribution with parameters n = 25 and p. Calculate each of the following probabilities
using the normal approximation (with the continuity correction) for the cases p = .5, .6, and .8 and compare to
the exact probabilities calculated from the Binomial Table.
A. $P(15 \le X \le 20)$
B. $P(X \le 15)$
C. $P(20 \le X)$
63. On average a certain intersection results in 3 traffic accidents per month. What is the probability that for any
given month at this intersection
A. exactly 5 accidents will occur?
B. Less than 3 accidents will occur?
C. At least 2 accidents will occur?
64. The average number of field mice per acre in a 5-acre wheat field is estimated to be 12. Find the probability
that fewer than 7 field mice are found
A. on a given acre
B. on 2 of the acres
65.A secretary makes 2 errors per page, on average . What is the probability that on the next page he or she will
make
A. 4 or more errors? B. no errors?