QM1 Notes
Statistics I: QM 1
Lecture Notes
by
Stefan Waner
TABLE OF CONTENTS
0. Introduction
1. Describing Data Graphically
2. Measures of Central Tendency and Variability
3. Chebyshev's Rule & The Empirical Rule
4. Introduction to Probability
5. Unions, Intersections, and Complements
6. Conditional Probability & Independent Events
7. Discrete Random Variables
8. Binomial Random Variable
9. The Poisson and Hypergeometric Random Variables
10. Continuous Random Variables: Uniform and Normal
11. Sampling Distributions and Central Limit Theorem
12. Confidence Interval for a Population Mean
13. Introduction to Hypothesis Testing
14. Observed Significance & Small Samples
15. Confidence Intervals and Hypothesis Testing for the Proportion
Note: Throughout these notes, all references to the “book” refer to the class text:
“Statistics for Business and Economics” 8th Ed.
by Anderson, Sweeney, Williams (South-Western/Thomson Learning, 2002)
Topic 0 Introduction
Q: What is statistics?
A: Basically, statistics is the “science of data.” There are three main tasks in statistics: (A)
collection and organization, (B) analysis, and (C) interpretation of data.
(A) Collection and organization of data: We will see several methods of organizing
data: graphically (through the use of charts and graphs) and numerically (through the use of
tables of data). The type of organization we do depends on the type of analysis we wish to
perform.
Quick Example Let us collect the status (freshman, sophomore, junior, senior) of a group
of 20 students in this class. We could then organize the data in any of the above ways.
(B) Analysis of data: Once the data is organized, we can go ahead and compute various
quantities (called statistics or parameters) associated with the data.
Quick Example Assign 0 to freshmen, 1 to sophomores etc. and compute the mean.
(C) Interpretation of data: Once we have performed the analysis, we can use the
information to make assertions about the real world (e.g. the average student in this class
has completed x years of college).
In QM1 we begin with descriptive statistics, and then use our knowledge to introduce
inferential statistics.
Topic 1
Describing Data Graphically
Example Choose a highly paid CEO and record the highest degree the CEO has received.
Here is a set of fictitious data:
The four categories are called classes, and the relative frequencies are the fraction in each
class:
$\text{Relative Frequency of a class} = \dfrac{\text{frequency}}{\text{total}}$.
Graphical Representation
1. Bar graph
To get the graph, just select all the data and go to the Chart Wizard.
[Figure: bar graph of relative frequency (vertical axis, 0 to 0.5) for the classes None, Bachelors, Masters, Doctorate]
2. Pie chart
[Figure: pie chart with slices Bachelors 44%, Masters 28%, Doctorate 20%, None 8%]
3. Cumulative Distributions
To get these, we sort the categories by frequency (largest to smallest) and then graph relative
frequency as well as cumulative frequency:
This shows, for instance, that more than 90% of all CEOs have some degree, and that
72% have either a Bachelors or Masters degree.
Example 1 Choose a lawyer in a population sample of 1,000 lawyers (the experiment) and
record his or her income. Since there are so many lawyers, it is usually convenient to divide
the outcome into measurement classes (or "brackets").
Suppose that the following table gives the number of lawyers in each of several income
brackets.
Income Bracket        Frequency
$20,000 - $29,999        20
$30,000 - $39,999        80
$40,000 - $49,999       230
$50,000 - $59,999       400
$60,000 - $69,999       170
$70,000 - $79,999        70
$80,000 - $89,999        30
Let X be the number that is the midpoint of an income bracket. Find the frequency
distribution of X.
Solution Since the first bracket contains incomes that are at least $20,000, but less than
$30,000, its midpoint is $25,000. Similarly the second bracket has midpoint $35,000, and
so on. We can rewrite the table with the midpoints, as follows.
In Figure 2 we see the histogram of the frequency distribution and the histogram of the
probability distribution. The only difference between the two graphs is in the scale of the
vertical axis (why?).
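As a rough check on the midpoint idea, here is a minimal Python sketch (the notes themselves use Excel; the variable names are chosen only for illustration). It takes the brackets and frequencies from the table above, computes each midpoint X, and divides each frequency by the total to get the relative frequency.

```python
# Lower bound and frequency of each income bracket, copied from the table above.
lower_bounds = [20_000, 30_000, 40_000, 50_000, 60_000, 70_000, 80_000]
frequencies = [20, 80, 230, 400, 170, 70, 30]
bracket_width = 10_000  # each bracket spans $10,000

# X is the midpoint of each bracket, as in the Solution.
midpoints = [low + bracket_width // 2 for low in lower_bounds]

# Relative frequency = frequency / total.
total = sum(frequencies)
relative_frequencies = [f / total for f in frequencies]

for x, f, rf in zip(midpoints, frequencies, relative_frequencies):
    print(f"X = {x:>6}  frequency = {f:>3}  relative frequency = {rf:.2f}")
```

Every relative frequency is the corresponding frequency divided by the same total, which is why the two histograms have the same shape.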
[Figure: frequency histogram. Horizontal axis: X (bracket midpoints, 25000 to 85000). Vertical axis: frequency, 0 to 400.]
[Figure: Relative Frequency Histogram. Horizontal axis: X (bracket midpoints, 25000 to 85000). Vertical axis: relative frequency, 0 to 0.4.]
Note We shall often be given a distribution involving categories with ranges of values (such
as salary brackets), rather than individual values. When this happens, we shall always take X
to be the midpoint of a category, as we did above. This is a reasonable thing to do,
particularly when we have no information about how the scores were distributed within each
range.
Example 2
We use the Data Analysis ToolPak to make a histogram for some random whole
numbers between 0 and 100:
Then we use “Bins” to sort the data into measurement classes. Each bin entry denotes the
upper boundary of a measurement class; for instance, to get the ranges 0-99, 100–199, etc,
use bin values of 99, 199, 299, etc. Here is what we can get for the current experiment:
Homework
p. 28 #5, 6, 10
p. 36 #16 (Table 2.9 appears on the next page.)
Topic 2
Measures of Central Tendency and Variability
The central tendency of a set of measurements is its tendency to cluster around one or
more values. Its variability is its tendency to spread out.
The sample mean of a variable X is the sum of the X-scores for a sample of the
population divided by the sample size:
$\bar{x} = \dfrac{\sum x_i}{n} = \dfrac{\text{sum of x-values}}{\text{sample size}}$.
The population mean is the mean of the scores for the entire population (rather than just a
sample) and we denote it by µ rather than x̄.
Note In statistics, we use the sample mean to make an inference about the population mean.
Example 1 Calculate the mean of the sample scores {5, 3, 8, 5, 6} (in class)
Example 2 You are the manager of a corporate department with a staff of 50 employees
whose salaries are given in the following frequency table.
Solution To find the average salary we first need to find the sum of the salaries earned by
your employees.
Thus, the average annual salary is µ = 1,390,000/50 = $27,800.
The sample median is the middle number when the scores are arranged in ascending order.
To find the median, arrange the scores in ascending order. If n is odd, m is the middle
number, otherwise, it is the average of the two middle numbers. Alternatively, we can use the
following formula:
$m = \dfrac{n+1}{2}$-th score.
(If the answer is not a whole number, take the average of the scores on either side.)
The mode is the score (or scores) that occur most frequently in the sample. The modal
class is the measurement class containing the mode.
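The three measures of central tendency can be sketched in Python for the sample from Example 1. This is only an illustration: the variable names are invented here, and `statistics.multimode` is used because a sample may have more than one mode.

```python
import statistics

scores = [5, 3, 8, 5, 6]  # sample from Example 1

# Sample mean: sum of x-values divided by the sample size.
mean = sum(scores) / len(scores)

# Sample median: middle score once the scores are in ascending order.
ordered = sorted(scores)
n = len(ordered)
if n % 2 == 1:
    median = ordered[n // 2]  # the (n+1)/2-th score
else:
    median = (ordered[n // 2 - 1] + ordered[n // 2]) / 2  # average of the two middle scores

# Mode(s): every score tied for most frequent.
modes = statistics.multimode(scores)

print(mean, median, modes)
```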
Measures of Variability
Percentiles
When we say “the 30th percentile for the first quiz is 43” we mean that at least 30% of the
students got a score ≤ 43 and at least 70% got a score ≥ 43. (We can't always find a score
such that exactly 30% got less and exactly 70% got more, as happens in the first example
below.)
In general, the pth percentile is a number such that at least p% of the scores are ≤ that
number and at least (100−p)% of the scores are ≥ that number. To compute it, arrange the
scores in order, calculate
$i = \left(\dfrac{p}{100}\right) n$
If i is a whole number, take the average of the ith score and the next one above it (the
(i+1)st score). If i is not a whole number, round i up to the next whole number and take the
score in that position.
Example 6 Find the 30th percentile for the scores {10, 10, 10, 10, 10, 80, 80, 80, 80, 80}.
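The position rule above can be sketched in Python as follows. The function name `percentile` is hypothetical, and the sketch assumes p is a whole number strictly between 0 and 100. Note that library routines such as Excel's may interpolate differently and give slightly different values.

```python
def percentile(scores, p):
    """pth percentile using the position rule i = (p/100) * n from these notes."""
    if not 0 < p < 100:
        raise ValueError("this sketch assumes 0 < p < 100")
    ordered = sorted(scores)
    # whole_part and remainder describe i = p*n/100 without rounding error
    whole_part, remainder = divmod(p * len(ordered), 100)
    if remainder == 0:
        # i is a whole number: average the i-th and (i+1)-st scores (1-based)
        return (ordered[whole_part - 1] + ordered[whole_part]) / 2
    # i is not a whole number: round up, i.e. take the (whole_part + 1)-st score
    return ordered[whole_part]


scores = [10, 10, 10, 10, 10, 80, 80, 80, 80, 80]  # data from Example 6
print(percentile(scores, 30))
```

For Example 6, i = 3 is a whole number, so the rule averages the 3rd and 4th scores.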
Quartiles
Quartiles are just certain percentiles. The first quartile Q1 is the 25th percentile, the second
quartile Q2 is the 50th percentile (which is also the median), and the third quartile Q3 is
the 75th percentile.
To get the quartile in Excel, use
=QUARTILE(Cell Range,q)
Range
This is just Xmax - Xmin, and measures the total spread of the data.
$s = \sqrt{s^2}$.
Shortcut Formula:
$s^2 = \dfrac{\sum (x_i^2) - (\sum x_i)^2 / n}{n - 1}$.
Excel:
Variance: =VAR(Range)
St. Deviation: =STDEV(Range)
Example 8 Calculate the sample variance and sample standard deviation for the data set
{3.7, 3.3, 3.3, 3.0, 3.0, 3.0, 3.0, 2.7, 2.7, 2.3}.
[Figure: the data plotted on a horizontal axis running from 1 to 4]
xi        xi − x̄    (xi − x̄)²
3.7        0.7       0.49
3.3        0.3       0.09
3.3        0.3       0.09
3.0        0         0
3.0        0         0
3.0        0         0
3.0        0         0
2.7       −0.3       0.09
2.7       −0.3       0.09
2.3       −0.7       0.49
Totals
30.0       0         1.34
The second column, xi − x̄, is obtained by subtracting the average, x̄ = 3.0, from each of the
scores in the first column. The entries in the last column are the squares of the entries in
the second column.
The sample variance, s², is the sum of the entries in the right-hand column, divided by n−1
= 9:
$s^2 = \dfrac{1.34}{9} = 0.14888\ldots$
$s = \sqrt{0.14888\ldots} \approx 0.38586$.
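As a cross-check on Example 8, the following Python sketch computes the sample variance both from the definition and from the shortcut formula. The variable names are illustrative only, and `statistics.variance` is the standard-library counterpart of Excel's `=VAR`.

```python
import math
import statistics

data = [3.7, 3.3, 3.3, 3.0, 3.0, 3.0, 3.0, 2.7, 2.7, 2.3]  # data set from Example 8
n = len(data)
mean = sum(data) / n

# Definition: sum of squared deviations from the mean, divided by n - 1.
s2_definition = sum((x - mean) ** 2 for x in data) / (n - 1)

# Shortcut formula: (sum of x_i^2 - (sum of x_i)^2 / n) / (n - 1).
s2_shortcut = (sum(x * x for x in data) - sum(data) ** 2 / n) / (n - 1)

# Sample standard deviation is the square root of the sample variance.
s = math.sqrt(s2_definition)

print(round(s2_definition, 5), round(s2_shortcut, 5), round(s, 5))
print(round(statistics.variance(data), 5))  # library value for comparison
```

The two formulas agree up to floating-point rounding, which is why the results are rounded before printing.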
Note For the population variance, we take the actual average of the (xi − x̄)². That is, we
divide by n instead of n−1, and we call this σ² instead of s².
Excel:
Pop Variance: =VARP(Range)
Pop. St. Deviation: =STDEVP(Range)
Homework
p. 79, #8, 12
p. 88 #18 (The coefficient of variation means the size of the standard deviation as a
percentage of the size of the mean, given by (s/x̄)×100, and can be used to compare the
variability of samples with totally different means, like the variability of the lengths of rivers
as compared with the variability of the number of stocks in a portfolio.), #20 (The
interquartile range is the difference between Q3 and Q1 and is yet another measure of
variability.)
Topic 3
Interpreting the Standard Deviation: Chebyshev's Rule & The Empirical Rule
(Section 2.6 in book)
Question Suppose we have a set of data with mean x̄ = 10 and standard deviation s = 2.
How do we interpret this information?
Answer This is given by the following rules
Chebyshev's Rule
Applies to all distributions, regardless of shape.
1. At least 3/4 of the scores fall within 2 standard deviations of the mean; that is, in the
interval (x̄−2s, x̄+2s) for samples, or (µ−2σ, µ+2σ) for populations.
2. At least 8/9 of the scores fall within 3 standard deviations of the mean; that is, in the
interval (x̄−3s, x̄+3s) for samples, or (µ−3σ, µ+3σ) for populations.
3. In general, for k > 1, at least 1 − 1/k² of the scores fall within k standard deviations of the
mean; that is, in the interval (x̄−ks, x̄+ks) for samples, or (µ−kσ, µ+kσ) for populations.
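Chebyshev's Rule can be sketched as two small Python functions. The function names are hypothetical, and the sample values x̄ = 10 and s = 2 are the ones from the Question at the start of this topic.

```python
def chebyshev_bound(k):
    """Minimum fraction of scores within k standard deviations of the mean (k > 1)."""
    if k <= 1:
        raise ValueError("Chebyshev's Rule is stated for k > 1")
    return 1 - 1 / k**2


def chebyshev_interval(mean, sd, k):
    """Interval (mean - k*sd, mean + k*sd) to which the bound applies."""
    return (mean - k * sd, mean + k * sd)


for k in (2, 3, 4):
    print(k, chebyshev_bound(k))

# With mean 10 and standard deviation 2, at least 3/4 of the scores lie in this interval.
print(chebyshev_interval(10, 2, 2))
```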
Empirical Rule
Applies to mound-shaped, symmetric distributions
1. Approximately 68% of the scores fall within 1 standard deviation of the mean; that is, in
the interval (x̄−s, x̄+s) for samples, or (µ−σ, µ+σ) for populations.
2. Approximately 95% of the scores fall within 2 standard deviations of the mean; that is, in
the interval (x̄−2s, x̄+2s) for samples, or (µ−2σ, µ+2σ) for populations.
3. Approximately 99.7% of the scores fall within 3 standard deviations of the mean; that is,
in the interval (x̄−3s, x̄+3s) for samples, or (µ−3σ, µ+3σ) for populations.
Answer
(a) 15/16 = 1 − 1/16 = 1 − 1/4², so we take k = 4. By Chebyshev, the interval is
(x̄−4s, x̄+4s) = (8.49−4(1.98), 8.49+4(1.98)) = (0.57, 16.41)
(b) We want 1 − 1/k² to be at least 0.97.
Try various values of k:
k = 2: 1 − 1/2² = .75 too small
k = 3: 1 − 1/3² ≈ .889 too small
k = 4: 1 − 1/4² = .9375 too small
k = 5: 1 − 1/5² = .96 too small
k = 6: 1 − 1/6² ≈ .972 big enough
Thus, we can take k = 6, and obtain
(x̄−6s, x̄+6s) = (8.49−6(1.98), 8.49+6(1.98)) = (−3.39, 20.37)
So, we can use (0, 20.37), since no scores can be negative in this experiment.
Note Almost all of the scores (at least 8/9, or about 99.7% for nice distributions) will fall
within 3 standard deviations of the mean, so the entire range of scores should not exceed
approximately 6 standard deviations (3 on either side of the mean). This gives us a
"guesstimate" of whether our calculation of the standard deviation is reasonable.
Answer 40 months is two standard deviations from the mean. By the empirical rule, the
chance of a battery falling within (µ−2σ, µ+2σ) is 95%. Thus only approximately 5% fall
outside that range. Half of those fall to the left, the rest to the right, so only about 2.5% of
batteries should fail before 40 months. Thus, you have reason to doubt the claim, or else
you were extremely unlucky to be in the bad 2.5%.
If you bought, say, 10 batteries and discovered that their mean lifespan was less than 40
months, you would be pretty confident that the manufacturer was wrong. How confident?
We'll see towards the end of the course.
$z = \dfrac{x - \bar{x}}{s}$ for samples
or
$z = \dfrac{x - \mu}{\sigma}$ for populations
The z-score measures the number of standard deviations a specific value x is away from the
mean. So, if a data value has z = −1.5, it means that it is 1.5 standard deviations below the
mean. An outlier is a data value that has |z| > 3. We need to carefully review outliers to
check whether they belong there, or are due to measurement errors.
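The z-score and the outlier criterion can be sketched in Python as below. The function names and the data values 7, 16 and 17 are made up for illustration; the mean 10 and standard deviation 2 are the ones used in the Question at the start of this topic.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd


def is_outlier(x, mean, sd):
    """Outlier criterion from these notes: |z| > 3."""
    return abs(z_score(x, mean, sd)) > 3


print(z_score(7, 10, 2))       # 1.5 standard deviations below the mean
print(is_outlier(17, 10, 2))   # z = 3.5, so flagged as an outlier
print(is_outlier(16, 10, 2))   # z = 3.0 exactly, so not flagged
```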
Note We can rewrite Chebyshev's rule and the empirical rule in terms of z-scores.
Homework
www.FiniteMath.com → Student Web Site → Chapter Review Exercises → Statistics
# 2, 3, 4, 9
p. 93 #32, 34, 36
Topic 4
Introduction to Probability
(Based on 4.1, 4.2 in book)
Sample Spaces
Let's start with a familiar situation: If you toss a coin and observe which side lands up, there
are two possible results: heads (H) and tails (T). These are the only possible results,
ignoring the (remote) possibility that the coin lands on its edge. The act of tossing a coin is
an example of an experiment. The two possible results H and T are the possible outcomes
of the experiment, and the set S = {H, T} of all possible outcomes is the sample space for
the experiment.
Quick Examples
1. Experiment: Flip a coin and observe the side facing up.
Outcomes: H, T
Sample Space: S = {H, T}
2. Experiment: Select a student in your class.
Outcomes: The students in your class
Sample Space: The set of students in your class.
3. Experiment: Select a student in your class and observe the color of his or her hair
Outcomes: red, black, brown, blond, green, ...
Sample Space: { red, black, brown, blond, green, ...}
4. Experiment: Cast a die and observe the number facing up.
Outcomes: 1, 2, 3, 4, 5, 6
Sample Space: S = {1, 2, 3, 4, 5, 6}
5. Experiment: Cast two distinguishable dice and observe the numbers facing up.
Outcomes: (1,1), (1,2), ... , (6,6) (36 outcomes)
Sample Space: S = {
(1,1) (1,2) (1,3) (1,4) (1,5) (1,6)
(2,1) (2,2) (2,3) (2,4) (2,5) (2,6)
(3,1) (3,2) (3,3) (3,4) (3,5) (3,6)
(4,1) (4,2) (4,3) (4,4) (4,5) (4,6)
(5,1) (5,2) (5,3) (5,4) (5,5) (5,6)
(6,1) (6,2) (6,3) (6,4) (6,5) (6,6) }
n(S) = 36
6. Experiment: Cast two indistinguishable dice and observe the numbers facing up.
Outcomes: (1,1), (1,2), ... , (6,6) (21 outcomes)
Sample Space: S = {
(1,1) (1,2) (1,3) (1,4) (1,5) (1,6)
      (2,2) (2,3) (2,4) (2,5) (2,6)
            (3,3) (3,4) (3,5) (3,6)
                  (4,4) (4,5) (4,6)
                        (5,5) (5,6)
                              (6,6) }
n(S) = 21
7. Experiment: Cast two dice and observe the sum of the numbers facing up.
Outcomes: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12
Sample Space: S = {2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12}
8. Experiment: Choose 2 cars (without regard to order) at random from a fleet of 10.
Outcomes: Collections of 2 cars chosen from 10.
Sample Space: The set of all collections of 2 cars chosen from 10;
n(S) = C(10, 2) = 45
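Several of the sample spaces in the Quick Examples above can be generated and counted with Python's `itertools`. This is only a sketch: the variable names are invented, and the cars are labeled 0 to 9 purely for illustration.

```python
from itertools import combinations, combinations_with_replacement, product

faces = range(1, 7)

# Two distinguishable dice: ordered pairs, so (1,2) and (2,1) are different outcomes.
distinguishable = list(product(faces, repeat=2))

# Two indistinguishable dice: unordered pairs, so (1,2) and (2,1) are the same outcome.
indistinguishable = list(combinations_with_replacement(faces, 2))

# Sum of the numbers facing up on two dice.
sums = sorted({a + b for a, b in distinguishable})

# 2 cars chosen, without regard to order, from a fleet of 10 (hypothetical labels 0..9).
car_pairs = list(combinations(range(10), 2))

print(len(distinguishable), len(indistinguishable), sums, len(car_pairs))
```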
Events
Looking at the last example, suppose that we are interested in the outcomes in which the
factory worker was covered by some form of medical insurance. In mathematical language,
we are interested in the subset consisting of all outcomes in which the worker was covered.
Events
Given a sample space S, an event E is a subset of S. The outcomes in E are called the
favorable outcomes. We say that E occurs in a particular experiment if the outcome of
that experiment is one of the elements of E; that is, if the outcome of the experiment is
favorable.
Quick Examples
1. Experiment: Roll a die and observe the number facing up.
S = {1, 2, 3, 4, 5, 6}
Event: E: The number observed is odd.
E = {1, 3, 5}
2. Experiment: Roll two distinguishable dice and observe the numbers facing up.
S = {(1,1), (1,2) ... , (6,6)}
Event: F: The dice show the same number.
F = {(1,1), (2,2), (3,3), (4,4), (5,5), (6,6)}
3. Experiment: Roll two distinguishable dice and observe the numbers facing up.
S = {(1,1), (1,2), ... , (6,6)}
Event: G: The sum of the numbers is 1.
G=Ø There are no favorable outcomes
4. Experiment: Select a city beginning with “J.”
Event: E: The city is Johannesburg.
E = {Johannesburg} An event can consist of a single outcome
5. Experiment: Roll a die and observe the number facing up
Event: E: The number observed is either even or odd
E = S = {1, 2, 3, 4, 5, 6} An event can consist of all possible outcomes
6. Experiment: Select a student in your class.
Event: E: The student has red hair.
E = {red-haired students in your class}
7. Experiment: Draw a hand of two cards from a deck of 52.
Event: H: Both cards are diamonds.
H is the set of all hands of 2 cards chosen from 52 such that both cards are
diamonds.
Example 2 You roll a red die and a green die and observe the numbers facing up. Describe
the following events as subsets of the sample space.
(a) E: Both dice show the same number.
(b) F: The sum of the numbers showing is 6.
(c) G: The sum of the numbers showing is 2.
Probability Distribution
(1) A probability distribution is an assignment of a number P(si) to each outcome si in
a sample space {s1, s2, . . . , sn}, so that
(a) 0 ≤ P(si) ≤ 1 and
(b) P(s1) + P(s2) + . . . + P(sn) = 1.
In words, the probability of each outcome must be a number between 0 and 1, and the
probabilities of all the outcomes must add up to 1.
(2) Given a probability distribution, we can obtain the probability of an event E by adding
up the probabilities of the outcomes in E.
Example 3 Weighted Dice In order to impress your friends with your dice-throwing skills,
you have surreptitiously weighted your die in such a way that 6 is three times as likely to
come up as any one of the other numbers. Find the probability distribution, and use it to
calculate the probability of an even number coming up.
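One way to set up Example 3 is to give each face a relative weight and divide by the total weight so that the probabilities add up to 1. The Python sketch below does this with exact fractions; the names `weights` and `distribution` are illustrative only.

```python
from fractions import Fraction

# Relative weights: 6 is three times as likely as any one of the other numbers.
weights = {1: 1, 2: 1, 3: 1, 4: 1, 5: 1, 6: 3}
total_weight = sum(weights.values())

# Dividing by the total weight makes the probabilities add up to 1.
distribution = {face: Fraction(w, total_weight) for face, w in weights.items()}

# P(even) is the sum of the probabilities of the outcomes in the event {2, 4, 6}.
p_even = sum(p for face, p in distribution.items() if face % 2 == 0)

print(distribution)
print(p_even)
```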
Example 4
A fair die is tossed, and the up face is observed. If it is even, you win $1. Otherwise, you
lose $1. What is the probability that you win? (First obtain the event, then the probability.)
Note
Since the probability of an outcome can be zero, we are also allowing the possibility that
P(E) = 0 for an event E. If P(E) = 0, we call E an impossible event. The event Ø is
always impossible, since something must happen.
Example 5
Your broker recommends four companies. Unbeknownst to you, two of the four happen to
be duds. You invest in two of them. Find the probability that:
(a) you have chosen the two losers
(b) you have chosen the two winners
(c) you have chosen one of each
$P(E) = \dfrac{\text{number of favorable outcomes}}{\text{total number of outcomes}} = \dfrac{n(E)}{n(S)}$.
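Example 5 can be checked with this formula by listing the sample space, assuming every pair of companies is equally likely to be chosen. In the Python sketch below, the labels W1, W2 (winners) and L1, L2 (losers) and the helper names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical labels: two winners (W1, W2) and two losers (L1, L2).
companies = ["W1", "W2", "L1", "L2"]

# Sample space: all ways of investing in two of the four companies.
sample_space = list(combinations(companies, 2))


def count_losers(outcome):
    """Number of losing companies in a chosen pair."""
    return sum(name.startswith("L") for name in outcome)


def probability(event):
    """P(E) = n(E) / n(S), valid when all outcomes are equally likely."""
    favorable = [outcome for outcome in sample_space if event(outcome)]
    return Fraction(len(favorable), len(sample_space))


p_two_losers = probability(lambda o: count_losers(o) == 2)
p_two_winners = probability(lambda o: count_losers(o) == 0)
p_one_of_each = probability(lambda o: count_losers(o) == 1)

print(p_two_losers, p_two_winners, p_one_of_each)
```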
You walk into an ice cream place and find that you can choose between ice cream, of
which there are 15 flavors, and frozen yogurt, of which there are 5 flavors. How many
different selections can you make? Clearly, you have 15 + 5 = 20 different desserts to
choose from. Mathematically, this is an example of the formula for the cardinality of a
disjoint union: If we let A be the set of ice creams you can choose from, and B the set of
frozen yogurts, then A ∩ B = Ø and we want n(A ∪ B). But, the formula for the
cardinality of a disjoint union is n(A ∪ B) = n(A) + n(B), which gives 15 + 5 = 20 in
this case.
This example illustrates a very useful general principle.
Addition Principle
When choosing among r disjoint alternatives, if
alternative 1 has n1 possible outcomes,
alternative 2 has n2 possible outcomes,
…
alternative r has nr possible outcomes,
then you have a total of n1 + n2 + … + nr possible outcomes.
Quick Example
At a restaurant you can choose among 8 chicken dishes, 10 beef dishes, 4 seafood
dishes, and 12 vegetarian dishes. This gives a total of 8 + 10 + 4 + 12 = 34 different
dishes to choose from.
Here is another simple example. In that ice cream place, not only can you choose
from 15 flavors of ice cream, but you can also choose from 3 different sizes of cone.
How many different ice cream cones can you select from? If we let A again be the set of
ice cream flavors and now let C be the set of cone sizes, we want to pick a flavor and a
size. That is, we want to pick an element of A × C, the Cartesian product. To find the
number of choices we have, we use the formula for the cardinality of a Cartesian
product: n(A × C) = n(A)n(C). In this case, we get 15×3 = 45 different ice cream
cones we can select.
This example illustrates another general principle.
Multiplication Principle
When making a sequence of choices with r steps, if
step 1 has n1 possible outcomes
step 2 has n2 possible outcomes
…
step r has nr possible outcomes
then you have a total of n1 × n2 × … × nr possible outcomes.
Quick Example
At a restaurant you can choose among 5 appetizers, 34 main dishes, and 10 desserts.
This gives a total of 5 × 34 × 10 = 1700 different meals (each including one appetizer,
one main dish, and one dessert) you can choose from.
Things get more interesting when we have to use the addition and multiplication
principles in tandem.
Example 6 Desserts
You walk into an ice cream place and find that you can choose between ice cream, of which
there are 15 flavors, and frozen yogurt, of which there are 5 flavors. In addition, you can
choose among 3 different sizes of cones for your ice cream or 2 different sizes of cups for
your yogurt. How many different desserts can you choose from?
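One way to organize Example 6 is to apply the multiplication principle within each alternative and then the addition principle across the two disjoint alternatives. A minimal Python sketch, with illustrative variable names:

```python
# Alternative 1: ice cream. Two steps: pick a flavor, then pick a cone size.
ice_cream_flavors, cone_sizes = 15, 3
ice_cream_desserts = ice_cream_flavors * cone_sizes  # multiplication principle

# Alternative 2: frozen yogurt. Two steps: pick a flavor, then pick a cup size.
yogurt_flavors, cup_sizes = 5, 2
yogurt_desserts = yogurt_flavors * cup_sizes  # multiplication principle

# The two alternatives are disjoint, so the counts add.
total_desserts = ice_cream_desserts + yogurt_desserts  # addition principle

print(ice_cream_desserts, yogurt_desserts, total_desserts)
```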
Combinations
Question How many groups of 4 marbles can be selected from a bag containing 12?
Answer $\dbinom{12}{4} = \dfrac{12 \cdot 11 \cdot 10 \cdot 9}{4 \cdot 3 \cdot 2 \cdot 1} = 495$
Question How many groups of r marbles can be selected from a bag containing n?
Answer $\dbinom{n}{r} = \dfrac{n \cdot (n-1) \cdots (n-r+1)}{r \cdot (r-1) \cdots 1}$
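The combination formula translates directly into code. In the Python sketch below, the function name `choose` is hypothetical; it builds the numerator and denominator exactly as in the formula and compares the result with the standard-library function `math.comb`.

```python
import math


def choose(n, r):
    """Number of groups of r items selected from n: n(n-1)...(n-r+1) / (r(r-1)...1)."""
    numerator = 1
    denominator = 1
    for j in range(r):
        numerator *= n - j    # n, n-1, ..., n-r+1
        denominator *= r - j  # r, r-1, ..., 1
    return numerator // denominator  # always divides exactly


print(choose(12, 4), math.comb(12, 4))
```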
Solution
(a) Since the order of the cards doesn’t matter, we simply need to know the number of
ways of choosing a set of 5 cards out of 52, which is C(52, 5) = 2,598,960.
(b) Here is a decision algorithm for choosing a full house with three 10s and two
Queens.
Step 1: Choose three 10s. Since there are four 10s to choose from we have
C(4, 3) = 4 choices.
Step 2: Choose 2 Queens; C(4, 2) = 6 choices.
Thus, there are 4 × 6 = 24 possible full houses with three 10s and two Queens.
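The decision algorithm in part (b) is the multiplication principle applied to two steps, and can be sketched in Python with `math.comb` (the variable names are illustrative):

```python
import math

# Step 1: choose three of the four 10s.
ways_three_tens = math.comb(4, 3)

# Step 2: choose two of the four Queens.
ways_two_queens = math.comb(4, 2)

# Multiplication principle: the number of choices at each step multiply.
full_houses = ways_three_tens * ways_two_queens

print(ways_three_tens, ways_two_queens, full_houses)
```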
Homework
In Exercises 1–3, describe the sample space S of the experiment and list the elements of
the given event. (Assume that the coins are distinguishable and that what is observed are
the faces or numbers that face up.)
1. Two coins are tossed; the result is at most one tail.
2. Two indistinguishable dice are rolled; the numbers add to 5.
3. You are deciding whether to enroll for Psychology 1, Psychology 2, Economics 1,
General Economics, or Math for Poets; you decide to avoid economics.
4. A packet of gummy candy contains 4 strawberry gums, 4 lime gums, 2 black currant
gums, and 2 orange gums. April May sticks her hand in and selects 4 at random.
Complete the following sentences:
(a) The sample space is the set of ...
(b) April is particularly fond of combinations of 2 strawberry and 2 black currant gums.
The event that April will get the combination she desires is the set of ...
6. True or False? Every set S is the sample space for some experiment. Explain.
8. The probability of an event E is the number of outcomes in E divided by the total
number of outcomes, right?
9. Motor Vehicle Safety The following table shows crashworthiness ratings for 10
small SUVs.1 (3=Good, 2=Acceptable, 1=Marginal, 0=Poor)
(a) Find the estimated probability distribution for the experiment of choosing a small
SUV at random and determining its frontal crash rating.
(b) What is the estimated probability that a randomly selected small SUV will have a
crash test rating of “Acceptable” or better?
10. It is said that lightning never strikes twice in the same spot. Assuming this to be the
case, what is the estimated probability that lightning will strike your favorite dining spot
during a thunderstorm? Explain.
11. Zip™ Disks Zip™ disks come in two sizes (100MB and 250MB), packaged
singly, in boxes of five, or in boxes of ten. When purchasing singly, you can choose
from five colors; when purchasing in boxes of five or ten you have two choices, black or
an assortment of colors. If you are purchasing Zip disks, how many possibilities do you
have to choose from?
12. Tests A test requires that you answer either Part A or Part B. Part A consists of 8
true-false questions, and Part B consists of 5 multiple-choice questions with 1 correct
answer out of 5. How many different completed answer sheets are possible?
13. Tournaments How many ways are there of filling in the blanks for the following
(fictitious) soccer tournament?
North Carolina
Central Connecticut
Virginia
Syracuse
1 Ratings by the Insurance Institute for Highway Safety. Sources: Oak Ridge National Laboratory: “An
Analysis of the Impact of Sport Utility Vehicles in the United States” Stacy C. Davis, Lorena F. Truett,
(August 2000)/Insurance Institute for Highway Safety
http://www-cta.ornl.gov/Publications/Final SUV report.pdf http://www.highwaysafety.org/vehicle_ratings/
14. HTML Colors in HTML (the language in which many web pages are written) can
be represented by 6-digit hexadecimal codes: sequences of six integers ranging from 0
to 15 (represented as 0, ..., 9, A, B, ..., F).
(a) How many different colors can be represented?
(b) Some monitors can only display colors encoded with pairs of repeating digits (such
as 44DD88). How many colors can these monitors display?
(c) Grayscale shades are represented by sequences xyxyxy consisting of a repeated pair
of digits. How many grayscale shades are possible?
(d) The pure colors are pure red: xy0000; pure green: 00xy00; and pure blue: 0000xy.
(xy = FF gives the brightest pure color, while xy = 00 gives the darkest: black). How
many pure colors are possible?
Poker Hands A poker hand consists of five cards from a standard deck of 52. (See the
chart preceding Example 7.) In Exercises 15–18, find the number of different poker
hands of the specified type.
15. Two pairs (two of one denomination, two of another denomination, and one of a
third)
16. Three of a kind (three of one denomination, one of another denomination, and
one of a third)
17. Two of a kind (two of one denomination and three of different denominations)
18. Four of a kind (all four of one denomination and one of another)
Answers
1. S = {HH, HT, TH, TT}; E = {HH, HT, TH}
2. S = {
(1,1) (1,2) (1,3) (1,4) (1,5) (1,6)
(2,1) (2,2) (2,3) (2,4) (2,5) (2,6)
(3,1) (3,2) (3,3) (3,4) (3,5) (3,6)
(4,1) (4,2) (4,3) (4,4) (4,5) (4,6)
(5,1) (5,2) (5,3) (5,4) (5,5) (5,6)
(6,1) (6,2) (6,3) (6,4) (6,5) (6,6) }
E = {(1, 4), (2, 3), (3, 2), (4, 1)}
3. S = {Psychology 1, Psychology 2, Economics 1, General Economics, Math for Poets};
E = {Psychology 1, Psychology 2, Math for Poets} 4. (a) all sets of 4 gummy bears
chosen from the packet of 12. (b) all sets of 4 gummy bears in which two are strawberry
and two are blackcurrant.
5. Subset of the sample space 6. True; Consider the following experiment: Select an element
of the set S at random. 7. False; for instance, consider the following experiment: Flip a coin
until you get heads, and observe the number of times you flipped the coin.
8. Only when all the outcomes are equally likely.
9. (a)
Test Rating    3     2     1     0
Probability    0.1   0.4   0.4   0.1
(b) 0.5
10. Zero; according to the assumption, no matter how many thunderstorms occur, lightning
cannot strike your favorite spot more than once, and so, after n trials the estimated
probability will never exceed 1/n, and so will approach zero as the number of trials gets
large. 11. (2×5) + (2×2×2) = 18 12. 2⁸ + 5⁵ = 3,381 13. 4
14. (a) 16⁶ = 16,777,216 (b) 16³ = 4096 (c) 16² = 256 (d) 3×16² − 2 = 766
15. C(13,2)C(4,2)C(4,2)×44 = 123,552 16. 13×4×C(12,2)×4×4 = 54,912
17. 13×C(4,2)C(12,3)×4×4×4 = 1,098,240 18. 13×48 = 624
Topic 5
Unions, Intersections, and Complements
(Based on 4.3 in book)
Events may often be described in terms of other events, using set operations. An example is
the negation of an event E, the event that E does not occur. If in a particular experiment E
does not occur, then the outcome of that experiment is not in E, so is in its complement (in
S). It is called Ec and its probability is given by
P(Ec ) = 1 - P(E).
Example 1 You roll a red die and a green die and observe the two numbers facing up.
Describe the event that the sum of the numbers is not 6. What is its probability?
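Example 1 can be approached by first counting the outcomes whose sum is 6 and then applying the complement rule, assuming both dice are fair so that all 36 outcomes are equally likely. A Python sketch with illustrative names:

```python
from fractions import Fraction
from itertools import product

# Sample space: (red, green) pairs for two distinguishable dice.
sample_space = list(product(range(1, 7), repeat=2))

# E: the sum of the numbers is 6.
sum_is_6 = [outcome for outcome in sample_space if sum(outcome) == 6]
p_sum_is_6 = Fraction(len(sum_is_6), len(sample_space))

# Complement rule: P(sum is not 6) = 1 - P(sum is 6).
p_sum_not_6 = 1 - p_sum_is_6

print(sum_is_6)
print(p_sum_is_6, p_sum_not_6)
```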
Question If E and F are events, how can we describe the event E∪F?
Answer Consider a simple example: the experiment of throwing a die. Let E be the event
that the outcome is a 5, and let F be the event that the outcome is an even number. Thus,
E = {5}, F = {2, 4, 6}.
So, E∪F = {5, 2, 4, 6}.
In other words, E∪F is the event that the outcome is either a 5 or an even number. In
general we can say the following.
Question If E and F are events, how can we describe the event E∩F?
Answer in class
Example 2 The following table shows sales of recreational boats in the U.S. during the
period 1999–2001.2
Consider the experiment in which a recreational boat is selected at random from those in the
table. Let E be the event that the boat was a motor boat, let F be the event that the boat was
purchased in 2001, and let G be the event that the boat was a sailboat. Find the probabilities
of the following events:
2 Figures are approximate, and represent new recreational boats sold. ("Jet skis" includes similar vehicles,
such as "wave runners".) Source: National Marine Manufacturers Association/New York Times, January 10,
2002, p. C1.
(a) E (b) F (c) E∩F (d) G' (e) E∪F'.
If A and B are events, then A and B are said to be disjoint or mutually exclusive if A∩B is
empty.
Example 3 A coin is tossed three times and the sequence of heads and tails is recorded.
Decide whether the following pairs of events are mutually exclusive.
(a) A: the first toss shows a head, B: the second toss shows a tail.
(b) A: all three tosses land the same way up, B: one toss shows heads and the other two
show tails.
Complement of an Event
The complement Ec of an event E is the event that E does not occur.
P(Ec ) = 1 - P(E).
Union of Events
If E and F are events, then E∪F is the event that either E occurs or F occurs (or both).
P(E∪F) = P(E) + P(F) − P(E∩F) (if not mutually exclusive)
P(E∪F) = P(E) + P(F) (if mutually exclusive)
Intersection of Events
If E and F are events, then E∩F is the event that both E and F occur.
P(E∩F) = P(E)P(F) (if independent)
Example 4 Astrology The astrology software package Turbo Kismet works by first
generating random number sequences, and then interpreting them numerologically. When I
ran it yesterday, it informed me that there was a 1/3 probability that I would meet a tall dark
stranger this month, a 2/3 probability that I would travel within the next month, and a 1/6
probability that I would meet a tall dark stranger on my travels this month. What is the
probability that I will either meet a tall dark stranger or that I will travel this month?
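Example 4 is an application of the union formula P(E∪F) = P(E) + P(F) − P(E∩F). The Python sketch below assumes that "meet a tall dark stranger on my travels" is read as the intersection of the two events; the variable names are illustrative.

```python
from fractions import Fraction

p_stranger = Fraction(1, 3)  # P(E): meet a tall dark stranger this month
p_travel = Fraction(2, 3)    # P(F): travel this month
p_both = Fraction(1, 6)      # P(E∩F): meet the stranger while traveling (assumed reading)

# Union formula for events that are not mutually exclusive.
p_either = p_stranger + p_travel - p_both

print(p_either)
```

Subtracting P(E∩F) corrects for the outcomes counted twice when P(E) and P(F) are added.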
Example 5 Salaries Your company's statistics show that 30% of your employees earn
between $20,000 and $39,999, while 20% earn between $30,000 and $59,999. Given that
40% of the employees earn between $20,000 and $59,999,
(a) what percentage earn between $30,000 and $39,999?
(b) what percentage earn between $20,000 and $29,999?
Homework
Suppose two dice (one red, one green) are rolled. Consider the following events: A: the
red die shows 1; B: the numbers add to 4; C: at least one of the numbers is 1; and D:
the numbers do not add to 11. In Exercises 1–4, express the given event in symbols and
say how many elements it contains.
1. The red die shows 1 and the numbers add to 4.
2. The numbers do not add to 4 but they do add to 11.
3. Either the numbers add to 11 or the red die shows a 1.
4. At least one of the numbers is 1 or the numbers add to 4.
Let W be the event that you will use the web site tonight, let I be the event that your math
grade will improve, and let E be the event that you will use the web site every night. In
Exercises 5–8, express the given event in symbols.
5. You will use the web site tonight and your math grade will improve.
6. Either you will use the web site every night, or your math grade will not improve.
7. Your math grade will not improve even though you use the web site every night.
8. You will either use the web site tonight with no grade improvement, or every night
with grade improvement.
9. Complete the following. Two events E and F are mutually exclusive if their
intersection is _____.
10. If E and F are events, then (E∩F)' is the event that ____ .
Publishing Exercises 11–15 are based on the following table, which shows the results
of a survey of 100 authors by a publishing company.
16. Steroids Testing A pharmaceutical company is running trials on a new test for anabolic
steroids. The company uses the test on 400 athletes known to be using steroids and 200
athletes known not to be using steroids. Of those using steroids, the new test is positive for
390 and negative for 10. Of those not using steroids, the test is positive for 10 and negative
for 190. What is the estimated probability of a false negative result (the probability that an
athlete using steroids will test negative)? What is the estimated probability of a false
positive result (the probability that an athlete not using steroids will test positive)?
17. Tony has had a “losing streak” at the casino—the chances of winning the game he is
playing are 40%, but he has lost 5 times in a row. Tony argues that, since he should have
won 2 times, the game must obviously be “rigged.” Comment on his reasoning.
18. Computer Sales In 1999 (one year after the iMac was first launched by Apple), a retail
or mail-order purchase of a personal computer was approximately 7 times as likely to be a
non-Apple PC as an Apple PC.3 What is the probability that a randomly chosen personal
computer purchase was an Apple?
In Exercises 19–26, use the given information to find the indicated probability.
19. P(A) = 0.1, P(B) = 0.6, P(A∩B) = 0.05. Find P(A∪B).
20. A∩B = Ø, P(A) = 0.3, P(A∪B) = 0.4. Find P(B).
21. A∩B = Ø, P(A) = 0.3, P(B) = 0.4. Find P(A∪B).
22. P(A∪B) = 0.9, P(B) = 0.6, P(A∩B) = 0.1. Find P(A).
23. P(A) = 0.22. Find P(A').
24. A, B and C are mutually exclusive. P(A) = 0.2, P(B) = 0.6, P(C) = 0.1. Find
P(A∪B∪C).
25. A and B are mutually exclusive. P(A) = 0.4, P(B) = 0.4. Find P((A∪B)').
26. P(A∪B) = 0.3 and P(A∩B) = 0.1. Find P(A) + P(B).
31. Holiday Shopping In 1999, the probability that a consumer would shop for
holiday gifts at a discount department store was .80, and the probability that a consumer
would shop for holiday gifts from catalogs was .42.4 Assuming that 90% of consumers
shopped from one or the other, what percentage of them did both?
32. Online Households In 2001, 6.1% of all U.S. households were connected to the
Internet via cable, while 2.7% of them were connected to the internet through DSL.
What percentage of U.S. households did not have high-speed (cable or DSL)
connection to the Internet? (Assume that the percentage of households with both cable
and DSL access is negligible.)
33. Fast-Food Stores In 2000 the top 100 chain restaurants in the U.S. owned a total
of approximately 130,000 outlets. Of these, the three largest (in numbers of outlets)
were McDonalds, Subway, and Burger King, owning between them 26% of all of the
outlets.5 The two hamburger companies, McDonalds and Burger King, together owned
approximately 16% of all outlets, while the two largest, McDonalds and Subway,
together owned 19% of the outlets. What was the probability that a randomly chosen
restaurant was a McDonalds?
3 Figure is approximate. Source: PC Data/The New York Times, April 26, 1999, p. C1.
4 Sources: Commerce Department, Deloitte & Touche Survey/The New York Times, November 24, 1999,
p. C1.
5 Source: Technomic 2001 Top 100 Report, Technomic, Inc. Information obtained from their web site,
www.technomic.com.
34. Auto Sales In 1999, automobile sales in Europe equaled combined sales in NAFTA
(North American Free Trade Agreement) countries and Asia. Further, sales in Europe
were 70% more than sales in NAFTA countries.6
(a) Write down the associated probability distribution.
(b) A total of 34 million automobiles were sold in these three regions. How many were
sold in Europe?
Answers:
1. A∩B; n(A∩B) = 1 2. B'∩D'; n(B'∩D') = 2 3. D'∪A; n(D'∪A) = 8
4. C∪B; n(C∪B) = 12 5. W∩I 6. E∪I' 7. I'∩E 8. (W∩I') ∪ (E∩I) 9. Empty
10. E and F do not both occur. 11. 0.25 12. 0.2 13. 0.7 14. 11/14 15. 0.75
16. P(false negative) = 10/400 = 0.025, P(false positive) = 10/200 = 0.05 17. He
is wrong. It is possible to have a run of losses of any length. Tony may have
grounds to suspect that the game is rigged, but no proof. 18. 0.125 19. 0.65
20. 0.1 21. 0.7 22. 0.4 23. 0.78 24. 0.9 25. 0.2 26. 0.4 27. Yes 28. No;
P(A∪B) should be ≤ P(A)+P(B). 29. No; P(A∩B) should be ≤ P(A) 30. Yes
31. 32% 32. 91.2% 33. 0.09
34. (a) Outcome      NAFTA   Asia   Europe
        Probability  5/17    7/34   1/2
(b) 17 million.
Topic 6
Conditional Probability & Independent Events
(Section 4.4 in the book)
Q Who cares about conditional probability? What is its relevance in the business world?
A Let's consider the following scenario: Cyber Video Games, Inc., has been running a
television ad for its latest game, “Ultimate Hockey.” As Cyber Video's director of
marketing, you would like to assess the ad’s effectiveness, so you ask your market research
team to make a survey of video game players. The results of their survey of 50,000 video
game players are summarized in the following chart.
The market research team concludes in their report that the ad campaign is highly effective.
Question But wait! How could the campaign possibly have been effective? Only 1,200
people who saw the ad purchased the game, while 2,000 people purchased the game without
seeing the ad! It looks as though potential customers are being put off by the ad.
Answer Let us analyze these figures a little more carefully. First, we can look at the event E
that a randomly chosen video game player purchased Ultimate Hockey. In the “Purchased
Game” row we see that a total of 3,200 people purchased the game. Thus, the experimental
probability of E is
P(E) = fr(E)/N = 3,200/50,000 = 0.064.
To test the effectiveness of the television ad, let's compare this figure with the experimental
probability that a video game player who saw the ad purchased Ultimate Hockey. This
means that we restrict attention to the “Saw Ad” column. This is the fraction
(Number of people who saw the ad and purchased the game)/(Total number of people who saw the ad)
= 1,200/5,000 = 0.24.
In other words, 24% of those surveyed who saw the ad bought Ultimate Hockey, while
overall, only 6.4% of those surveyed bought it. Thus, it appears that the ad campaign was
highly successful.
Let us first introduce some terminology. In this example there were two related events of
importance,
E, the event that a video game player purchased Ultimate Hockey, and
F, the event that a video game player saw the ad.
The two probabilities we compared were the experimental probability P(E) and the
experimental probability that a video game player purchased Ultimate Hockey given that he
or she saw the ad. We call the latter probability the (experimental) probability of E, given
F, and we write it as P(E|F). We call P(E|F) a conditional probability—it is the
probability of E under the condition that F occurred.
P(E|F) = (Number of people who saw the ad and bought the game)/(Total number of people who saw the ad)
= (Number of favorable outcomes in F)/(Total number of outcomes in F).
The numerator is the frequency of E∩F, while the denominator is the frequency of F. Thus,
we can say the following.
Conditional Probability
If E and F are events, then
P(E|F) = fr(E∩F)/fr(F)
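As a quick check of the arithmetic, here is a minimal Python sketch (added for illustration; the variable names are our own) that recomputes P(E) and P(E|F) from the survey counts quoted in the text: 50,000 players surveyed, 3,200 purchasers, 5,000 who saw the ad, and 1,200 who both saw the ad and purchased.

```python
from fractions import Fraction

# Survey counts quoted in the text.
N = 50_000           # video game players surveyed
fr_E = 3_200         # purchased the game
fr_F = 5_000         # saw the ad
fr_E_and_F = 1_200   # saw the ad and purchased the game

p_E = Fraction(fr_E, N)                   # experimental probability of E
p_E_given_F = Fraction(fr_E_and_F, fr_F)  # fr(E and F) / fr(F)

print(float(p_E), float(p_E_given_F))
```

Dividing by fr(F) rather than by N is what "restricting attention to the Saw Ad column" means.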
Example (Based on p. 146, Example 3.15 of Statistics for Business and Economics 8th
Ed by McClave, Benson, and Sicich, Prentice Hall, 2001) A manufacturer of an electric
kitchen utensil conducted a survey of consumer complaints. The results are summarized in
the following table:
(a) Calculate the probability that a customer complains about appearance (dents, scratches,
etc.) given that the complaint occurred during the guarantee time.
(b) Calculate the probability that a customer complains about appearance.
Independence
We saw that the formula
P(E|F) = P(E∩F)/P(F)
could be used to calculate P(E∩F) if we rewrite the formula in the following form, known as
the multiplication principle.
Multiplication Principle
If E and F are events, then
P(E∩F) = P(F)P(E|F).
Example 4 An experiment consists of tossing two coins. The first coin is fair, while the
second coin is twice as likely to land with heads facing up as it is with tails facing up. Draw
a tree diagram to illustrate all the possible outcomes, and use the multiplication principle to
compute the probabilities of all the outcomes.
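The tree for this example can be sketched in a few lines of Python. This is an illustration added to the notes, under two assumptions: "twice as likely to land heads" is read as P(H) = 2/3 for the second coin, and the second toss is taken not to depend on the first, so each branch probability on the second level is the same whichever first-level branch we follow.

```python
from fractions import Fraction

# First-level branches: the fair coin.
p_first = {"H": Fraction(1, 2), "T": Fraction(1, 2)}
# Second-level branches: heads twice as likely as tails.
p_second = {"H": Fraction(2, 3), "T": Fraction(1, 3)}

# Multiplication principle: multiply along each path of the tree.
tree = {a + b: p_first[a] * p_second[b] for a in p_first for b in p_second}

for outcome, p in tree.items():
    print(outcome, p)
```

The four path probabilities must add up to 1, which is a useful check on any tree diagram.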
Let us go back to Cyber Video Games, Inc., and their ad campaign. We would like to assess
the ad's effectiveness. As before, we consider
E, the event that a video game player purchased Ultimate Hockey, and
F, the event that a video game player saw the ad,
and we compare
P(E), the probability that a video game player purchased Ultimate Hockey, and
P(E|F), the probability that a video game player who saw the ad purchased
Ultimate Hockey.
When these probabilities are compared, one of three things can happen.
Case 1 P(E|F) > P(E): This is what the survey data actually showed: a video game player
was more likely to purchase Ultimate Hockey if he or she saw the ad. This indicates that the
ad is effective—seeing the ad had a positive effect on a player’s decision to purchase the
game.
Case 2 P(E|F) < P(E): If this happens, then a video game owner is less likely to purchase
Ultimate Hockey if he or she saw the ad. This would indicate that the ad has “backfired:” it
has, for some reason, put potential customers off. In this case, just as in the first case, the
event F has an effect—a negative one—on the event E.
Case 3 P(E|F) = P(E): In this case seeing the ad had absolutely no effect on a potential
customer's buying Ultimate Hockey. Put another way, the event F had no effect at all on the
event E. We would say that the events E and F are independent.
In general, we say that two events E and F are independent if P(E|F) = P(E). When this
happens, we have
P(E) = P(E|F) = P(E∩F)/P(F),
so P(E∩F) = P(E)P(F).
Conversely, if P(E∩F) = P(E)P(F), then, assuming P(F) ≠ 0,† P(E) = P(E∩F)/P(F) =
P(E|F).
Independent Events
The events E and F are independent if
P(E|F) = P(E)
or, equivalently,
P(E∩F) = P(E)P(F)
If two events E and F are not independent, then they are dependent.
Notes
(1) The formula P(E∩F) = P(E)P(F) also says that P(F|E) = P(F). Thus, if F has no effect
on E, then likewise E has no effect on F.
(2) Sometimes it is obviously the case that two events, by their nature, are independent. For
example, the event that a die you roll comes up 1 is clearly independent of whether or not a
coin you toss comes up heads. In some cases, though, we need to check for independence
by comparing P(E∩F) to P(E)P(F). If they are equal then E and F are independent, but if
they are unequal then E and F are dependent.
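The test for independence amounts to a single comparison. The Python sketch below (an added illustration; the function name independent is hypothetical) applies it to two cases from these notes: the Cyber Video survey counts, and the die-and-coin experiment of Note (2), where we assume a fair die and a fair coin.

```python
from fractions import Fraction
from itertools import product

def independent(p_e, p_f, p_e_and_f):
    """True when the probability of 'E and F' equals P(E)P(F)."""
    return p_e_and_f == p_e * p_f

# Cyber Video survey: E = purchased the game, F = saw the ad.
survey_independent = independent(
    Fraction(3_200, 50_000), Fraction(5_000, 50_000), Fraction(1_200, 50_000)
)

# Fair die and fair coin: 12 equally likely outcomes.
outcomes = list(product(range(1, 7), "HT"))
die_is_1 = {o for o in outcomes if o[0] == 1}
heads = {o for o in outcomes if o[1] == "H"}

def prob(event):
    return Fraction(len(event), len(outcomes))

die_coin_independent = independent(prob(die_is_1), prob(heads), prob(die_is_1 & heads))

print(survey_independent, die_coin_independent)
```

Exact fractions are used so that the equality test is not upset by rounding; with real survey data, which is subject to sampling error, an exact match is rarely expected.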
Homework
p. 158 #30, 32, 34, 38
Also:
Publishing Exercises 1–6 are based on the following table, which shows the results of a
survey of 100 authors by a publishing company.
† We shall only discuss the independence of two events in cases where their probabilities are both non-zero.
1. That an author is established, given that she is successful
2. That an author is successful, given that he is established
3. That an author is unsuccessful, given that she is established
4. That an author is established, given that he is unsuccessful
5. That an unsuccessful author is established
6. That an established author is successful
[Answers: 1. 5/6 2. 5/16 3. 11/16 4. 11/14 5. 11/14 6. 5/16 ]
Topic 7
Discrete Random Variables & Their Probability Distributions
In many experiments, the outcomes can be assigned numerical values. For instance, if you
roll a die, then each outcome is one of the numerical values 1 through 6. If you select a lawyer
and ascertain her annual income, then the outcome is again a number. We call a rule that
assigns a numerical value to each outcome of an experiment a random variable.
A random variable X is a rule that assigns a numerical value to each outcome in the
sample space of an experiment.
A random variable may have only finitely many values, such as the outcome of a roll of a
die. Or, its possible values may be infinite but discrete, such as the number of times it takes
you to roll a 6 if you keep rolling until you get one. Or, the variable may be continuous, as
we shall see in the last section of this chapter.
Examples 1
(A) (discrete finite) Let X be the number of heads that comes up when a coin is tossed three
times. List the value of X for each possible outcome. What are the possible values of X?
(B) (discrete, infinite) (Book, p. 163) The EPA inspects a factory's pesticide discharge into
a lake once a month by measuring the amount of pesticide in a sample of lake water. If it
exceeds the legal maximum, the company is held in violation and fined. Let X be the number
of months since the last violation. Also, let Y be the amount of pesticide found in a sample
of lake water.
(C) (discrete finite) You have purchased $10,000 worth of stock in a biotech company
whose newest arthritis drug is awaiting approval by the FDA. If the drug is approved this
month, the value of the stock will double by the end of the month. If the drug is rejected this
month, the stock’s value will decline by 80%, and if no decision is reached this month, its
value will decline by 10%. Let X be the value of your investment at the end of this month.
List the value of X for each possible outcome.
(D) (discrete finite) Survey a group of 50 high school graduates for their SAT scores and
let X be the score obtained. When we are given a collection of values of a random variable X
we refer to the values as X-scores. We also call such data raw data, as these are the
original values on which we often perform statistical analysis. One important purpose of
statistics is to interpret the raw data from the sample to get information about the entire
population.
(E) Sampling (continuous) Survey a group of 50 high school graduates for their SAT
scores. Let X̄ be the mean score of the sample of 50; let Y be the median. We call X̄ and Y
statistics of the raw scores.
Example 2 Taking X to be the number of heads that come up when a coin is tossed three
times, we obtain
the event that X = 0 is {TTT}              P(X = 0) = 1/8 = 0.125
the event that X = 1 is {HTT, THT, TTH}    P(X = 1) = 3/8 = 0.375
the event that X = 2 is {HHT, HTH, THH}    P(X = 2) = 3/8 = 0.375
the event that X = 3 is {HHH}              P(X = 3) = 1/8 = 0.125
the event that X = 4 is Ø                  P(X = 4) = 0
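The same distribution can be obtained by listing all eight outcomes and counting heads in each. The following Python sketch is an added illustration that assumes a fair coin, so that the eight sequences are equally likely.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All 2^3 = 8 equally likely sequences of three tosses, e.g. "HTT".
outcomes = ["".join(t) for t in product("HT", repeat=3)]

# X = number of heads; count how many outcomes give each value of X.
counts = Counter(o.count("H") for o in outcomes)

# Include x = 4 to show an impossible value: a Counter returns 0 for it.
dist = {x: Fraction(counts[x], len(outcomes)) for x in range(5)}

for x, p in dist.items():
    print(x, p, float(p))
```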
Representing the Probability Distribution The most common way to represent the
distribution is via a histogram, such as the following for the above example.
[Figure: histogram of the probability distribution of X; horizontal axis X = 0, 1, 2, 3; vertical axis probability, from 0 to 0.4]
[Figure: Probability Distribution Histogram; horizontal axis X, from 25000 to 85000; vertical axis probability, from 0 to 0.4]
We know what the mean of a bunch of x-scores means (and also the median, standard
deviation, etc.). If we think of the x-scores as the values of a random variable X, we can also
obtain the mean value of X. There are two approaches to measuring this mean:
Method 1 (As before: using the raw x-scores) Measure X a large number of times and take
the mean of your set of measurements. For example, look at the lawyer salary example:
Method 2 (Using the probability distribution) Since we have multiplied each x-value by its
frequency and then divided by the total number, we might as well have just multiplied each
value of x by its probability, and then added. This would result in the same answer:
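Expected Value, etc. of a Random Variable
If X is a finite random variable taking on values x1, x2, . . ., xn, the expected value of X,
written µ or E(X), is
µ = E(X) = x1·P(X = x1) + x2·P(X = x2) + . . . + xn·P(X = xn).
The variance of X is
σ^2 = (x1 - µ)^2·P(X = x1) + (x2 - µ)^2·P(X = x2) + . . . + (xn - µ)^2·P(X = xn),
and the standard deviation of X is σ = √(σ^2).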
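To see Method 2 in action, here is a short Python sketch (added for illustration) that computes the expected value as the probability-weighted sum of the values, and the variance as the probability-weighted sum of squared deviations from the mean. Since the lawyer salary table is not reproduced here, the sketch uses the three-coin distribution of Example 2 instead.

```python
from fractions import Fraction
from math import sqrt

# Distribution of X = number of heads in three tosses of a fair coin.
dist = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}

mu = sum(x * p for x, p in dist.items())                    # expected value
variance = sum((x - mu) ** 2 * p for x, p in dist.items())  # weighted squared deviations
sigma = sqrt(variance)                                      # standard deviation

print(mu, variance, round(sigma, 4))
```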
Note We can use Chebyshev's Rule and the Empirical Rule to make inferences about the
values of X.
Example In class, we expand the above table to compute σ for the lawyers, and answer the
following question: Using Chebyshev, complete the statement: at most 12.5% of lawyers
earn less than _____.
Homework
p. 179 #2, 3
p. 182 # 7, 8
p. 186 # 16, 18, 24
Also:
www.FiniteMath.com → Student Web Site → Chapter Review Exercises → Statistics
# 5, 6, 7
Topic 8
Binomial Random Variable
A Bernoulli7 trial is an experiment with two possible outcomes, called success and
failure. If the probability of success is p then the probability of failure is q = 1 - p.
Tossing a coin three times is an example of a sequence of independent Bernoulli
trials: a sequence of Bernoulli trials in which the outcomes in any one trial are
independent (in the sense of the preceding chapter) of those in any other trial.
A binomial random variable is one that counts the number of successes in a
sequence of independent Bernoulli trials.
Suppose that we have a possibly unfair coin, whose probability of heads is p and whose
probability of tails is q = 1-p.
(a) Let X be the number of heads you get in a sequence of 5 tosses. Find P(X = 2).
(b) Let X be the number of heads you get in a sequence of n tosses. Find P(X = x).
Solution
(a) We are looking for the probability of getting exactly 2 heads in a sequence of 5
tosses. Let’s start with a simpler question.
Question What is the probability that we will get the sequence HHTTT?
Answer The probability that the first toss will come up heads is p.
The probability that the second toss will come up heads is also p.
The probability that the third toss will come up tails is q.
The probability that the fourth toss will come up tails is q.
The probability that the fifth toss will come up tails is q.
The probability that the first toss will be heads and the second will be heads and the
third will be tails and the fourth will be tails and the fifth will be tails equals the
probability of the intersection of these five events. Since these are independent events,
the probability of the intersection is the product of the probabilities, which is
p × p × q × q × q = p^2 q^3.
Now HHTTT is only one of several outcomes with two heads and three tails. Two
others are HTHTT and TTTHH. In all there are C(5,2) = 10 such outcomes, one for each
choice of which 2 of the 5 tosses come up heads.
Each of these 10 outcomes has the same probability: p^2 q^3 (why?). Thus, the probability
of getting one of these 10 outcomes is the probability of the union of all these (mutually
exclusive) events, and we saw in the preceding chapter that this is just the sum of the
probabilities. In other words, the probability we are after is
P(X = 2) = C(5,2) p^2 q^3 = 10 p^2 q^3.
Notice that we can replace C(5,2) (where 2 is the number of heads), by C(5,3) (where 3
is the number of tails), since C(5,2) = C(5,3).
(b) There is nothing special about 2 in part (a). To get P(X = x) rather than P(X = 2),
replace 2 with x:
P(X = x) = C(5,x) p^x q^(5-x).
Again, there is nothing special about 5. The general formula for n tosses is
Probability Distribution of Binomial Random Variable
If X is the number of successes in a sequence of n independent Bernoulli trials, then
P(X = x) = C(n,x) p^x q^(n-x),
where p is the probability of success in each trial and q = 1 - p.
Quick Example
If you roll a fair die 5 times, the probability of throwing exactly 2 sixes is
P(X = 2) = C(5,2)(1/6)^2 (5/6)^3 = 10 × (1/36) × (125/216) ≈ 0.1608.
Here, we used n = 5 and p = 1/6, the probability of rolling a six on one roll of the die.
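The binomial formula C(n,x) p^x q^(n-x) translates directly into code. The following is a minimal Python sketch added for illustration (the function name binom_pmf is our own); it reproduces the Quick Example with n = 5 and p = 1/6.

```python
from math import comb

def binom_pmf(x, n, p):
    """P(X = x) for the number of successes in n independent Bernoulli trials."""
    q = 1 - p
    return comb(n, x) * p**x * q ** (n - x)

two_sixes = binom_pmf(2, 5, 1 / 6)   # exactly 2 sixes in 5 rolls of a fair die
print(round(two_sixes, 4))
```

Calling binom_pmf(x, n, p) for x = 0, 1, . . ., n gives the whole distribution, and the values should add up to 1.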
Examples 2
(Example 4.7 (b)) 100 customers must select a preference among three sodas: your
company's new Hyper Cola and the two competitors (you know what they are...).
Success, of course, means selecting Hyper Cola. Is this binomial?
(Example 4.7 (a)) You select 3 bonds from 10 recommended ones. Unbeknownst to
you, 8 of them will go up, and two are stones. x is the number of winners you select. Is
this binomial?
(An Extra One) You select 3 bonds from a large number of recommended ones.
Unbeknownst to you, 80% of them will go up, and 20% are stones. x is the number of
winners you select. Is this binomial?
Solution
(a) The experiment is a sequence of Bernoulli trials; in each trial we select a person and
ascertain his age. If we take “success” to mean selection of a person 65 or older, the
probability distribution is
P(X = x) = C(n,x) p^x q^(n-x),
where, in the calculations that follow, n = 6, p = 0.2, and q = 0.8.
10 Source: Carnegie Center, Moscow/The New York Times, March 15, 1998, p. 10.
(b) We have already computed P(X = 4). Here are all the calculations.
P(X = 0) = C(6,0)(0.2)^0 (0.8)^6
= 1 × 1 × 0.262144 = 0.262144
P(X = 1) = C(6,1)(0.2)^1 (0.8)^5
= 6 × 0.2 × 0.32768 = 0.393216
P(X = 2) = C(6,2)(0.2)^2 (0.8)^4
= 15 × 0.04 × 0.4096 = 0.24576
P(X = 3) = C(6,3)(0.2)^3 (0.8)^3
= 20 × 0.008 × 0.512 = 0.08192
P(X = 4) = C(6,4)(0.2)^4 (0.8)^2
= 15 × 0.0016 × 0.64 = 0.01536
P(X = 5) = C(6,5)(0.2)^5 (0.8)^1
= 6 × 0.00032 × 0.8 = 0.001536
P(X = 6) = C(6,6)(0.2)^6 (0.8)^0
= 1 × 0.000064 × 1 = 0.000064
x 0 1 2 3 4 5 6
P(X=x) 0.262144 0.393216 0.24576 0.08192 0.01536 0.001536 0.000064
Figure 1
(c) P(X ≤ 2), the probability that the number of people selected who are at least 65
years old is either 0, 1, or 2, is the union of these events, and is thus the sum of the three
probabilities,
but it is far easier to compute the probability of the complement of the event,
     A                        B   C   D   E   F   G
1    0                        1   2   3   4   5   6
2    =BINOMDIST(A1,6,0.2,0)   →   →   →   →   →   →
The values of X are shown in Row 1, and the probabilities are computed in Row 2. The
arguments of the BINOMDIST function are as follows:
BINOMDIST(x, n, p, cumulative)
Setting the last argument to 0 (as shown) gives P(X = x). Setting it to 1 gives P(X ≤ x).
Web Site
Follow the path
Web site → Everything for Finite Math → Chapter 8
→ Binomial Distribution Utility,
where you can obtain the distribution and also graph the histogram.
Question OK. Now, what are the mean and standard deviation of the binomial distribution?
Answer in the box
Mean = µ = np
Variance = σ^2 = npq
St. Deviation = σ = √(npq)
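The formulas for the binomial mean np and variance npq can be checked against the general definitions (probability-weighted sum of values, and of squared deviations). The Python sketch below is an added illustration that uses n = 6 and p = 0.2, the values from the worked example above, with exact fractions so that the comparison is free of rounding.

```python
from fractions import Fraction
from math import comb, sqrt

n, p = 6, Fraction(1, 5)    # n = 6 trials, p = 0.2
q = 1 - p

# Binomial distribution: P(X = x) = C(n,x) p^x q^(n-x).
pmf = {x: comb(n, x) * p**x * q ** (n - x) for x in range(n + 1)}

mean = sum(x * px for x, px in pmf.items())
variance = sum((x - mean) ** 2 * px for x, px in pmf.items())

print(mean, n * p)               # both give np
print(variance, n * p * q)       # both give npq
print(round(sqrt(variance), 4))  # standard deviation, the square root of npq
```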
Homework
p. 197 #25, 30, 32, 34
Also
www.FiniteMath.com → Student Web Site → Chapter Review Exercises → Statistics
#1
Topic 9
The Poisson and Hypergeometric Random Variables
(Sections 5.5 & 5.6 in book)
(1) The expected number of successes per unit time does not depend on the time interval.
(2) The event of success in any one interval of time is independent of that in any other
interval.
For example, X could be the number of people arriving at a store in a fixed period of time
over the lunch-hour, or the number of leaks in 200 miles of pipeline, or the number of cars
arriving at a carwash in a given hour. If X is Poisson, we compute P(X = x) as follows:
P(X = x) = e^(-λ) λ^x / x!
where λ is the expected number of successes for the time interval we are interested in.
Example In a bank, 3 people arrive per minute on average. Find the probability that, in a
given minute, exactly 2 people will arrive. Also generate the entire probability distribution
for X.
Solution
λ = 3 (given). Thus,
P(X = 2) = e^(-3) 3^2 / 2! ≈ 0.2240
To get the entire table we use Excel and obtain, using the formula
=EXP(-3)*3^x/FACT(x)
x         0      1      2      3      4      5      6      7      8      9      10     11
P(X = x)  0.0498 0.1494 0.224  0.224  0.168  0.1008 0.0504 0.0216 0.0081 0.0027 0.0008 0.0002
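The same table can be produced outside Excel. Here is a minimal Python sketch, added for illustration, that evaluates the Poisson probability e^(-λ) λ^x / x! with an expected count of 3 per minute (named lam in the code, since the name of the function poisson_pmf and of its arguments are our own choices).

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """P(X = x) for a Poisson random variable with expected count lam."""
    return exp(-lam) * lam**x / factorial(x)

# Probabilities for x = 0, 1, ..., 11, rounded to four decimal places.
table = {x: round(poisson_pmf(x, 3), 4) for x in range(12)}

for x, p in table.items():
    print(x, p)
```

A Poisson variable has no largest possible value; the table stops at x = 11 only because the remaining probabilities are negligible.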
For example, we know that 30 of the 100 workers at the Petit Mall visit your diner for
lunch. You choose 10 workers at random; X is the number of workers who visit your diner.
(Note that the problem becomes the same as the binomial distribution for a large
population, where we can ignore the issue of replacement.) If X is hypergeometric, we
compute P(X = x) as follows: if N is the size of the population, r is the number of
successes in the population, and n is the number of trials (the sample size), then
P(X = x) = C(r,x) C(N-r, n-x) / C(N,n).
Example The Gods of Chaos have promised you that you will win on exactly 40 of the
next 100 bets at the Happy Hour Casino. However, your luck has not been too good up to
this point: you have bet 50 times and have lost 46 times. What are your chances of winning
both of the next two bets?
Solution Here N = number of bets left = 100–50 = 50, n = number of trials = 2 and r =
number of successes possible = 40–4 = 36 (you have used up 4 of your guaranteed 40
wins). So we can now compute P(X = 2) using the formula.
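The computation can be carried out as follows. This Python sketch is an added illustration (the function name hypergeom_pmf is our own); it uses the standard hypergeometric formula C(r,x) C(N-r, n-x) / C(N,n) with the values from the Solution: N = 50, r = 36, n = 2, x = 2.

```python
from fractions import Fraction
from math import comb

def hypergeom_pmf(x, N, r, n):
    """P(X = x): x successes in n draws without replacement from N items, r of them successes."""
    return Fraction(comb(r, x) * comb(N - r, n - x), comb(N, n))

p_two_wins = hypergeom_pmf(2, N=50, r=36, n=2)
print(p_two_wins, round(float(p_two_wins), 4))
```

The result is the same as multiplying along a tree: 36/50 for the first win, then 35/49 for the second.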
Homework
p. 201 #40, 42
p. 204, #48, 50
Topic 10
Continuous Random Variables: Uniform and Normal
(Based on Sections 6.1-6.2 in the book)
When a random variable is continuous, we use the following to describe the associated
probabilities. Note that, in this case, P(X = x) = 0. So instead, we will look at probabilities
in a range: P(a < X < b).
An Example:
The Uniform Distribution The uniform density function on the interval [c,d] is
given by
f(x) = 1/(d - c).
Its graph is a horizontal line (see the figure).
Example 1 Spinning a Dial Suppose that you spin the dial shown below so that it comes to
rest at a random position. Model this with a suitable distribution, and use it to find the
probability that the dial will land somewhere between 5° and 300°.
[Figure: a dial marked 0 at the top, 90 at the right, 180 at the bottom, and 270 at the left]
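One way to model the dial is with a uniform distribution on the interval [0, 360], so that a probability is the area of a rectangle of height 1/(360 - 0). The Python sketch below is an added illustration under that modeling assumption; the function name uniform_prob is our own.

```python
from fractions import Fraction

def uniform_prob(a, b, c, d):
    """P(a <= X <= b) for X uniform on [c, d]: width of the overlap times 1/(d - c)."""
    lo, hi = max(a, c), min(b, d)   # clip [a, b] to the interval [c, d]
    if hi <= lo:
        return Fraction(0)
    return Fraction(hi - lo, d - c)

p_dial = uniform_prob(5, 300, 0, 360)
print(p_dial, round(float(p_dial), 4))
```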
µ = mean
σ = standard deviation
The standard normal distribution has µ = 0 and σ = 1. We use Z rather than X to refer
to the associated random variable.
Negative z
z    0.00    0.01    0.02    0.03    0.04    0.05    0.06    0.07    0.08    0.09
-0.0 0.50000 0.49601 0.49202 0.48803 0.48405 0.48006 0.47608 0.47210 0.46812 0.46414
-0.1 0.46017 0.45620 0.45224 0.44828 0.44433 0.44038 0.43644 0.43251 0.42858 0.42465
-0.2 0.42074 0.41683 0.41294 0.40905 0.40517 0.40129 0.39743 0.39358 0.38974 0.38591
-0.3 0.38209 0.37828 0.37448 0.37070 0.36693 0.36317 0.35942 0.35569 0.35197 0.34827
-0.4 0.34458 0.34090 0.33724 0.33360 0.32997 0.32636 0.32276 0.31918 0.31561 0.31207
-0.5 0.30854 0.30503 0.30153 0.29806 0.29460 0.29116 0.28774 0.28434 0.28096 0.27760
-0.6 0.27425 0.27093 0.26763 0.26435 0.26109 0.25785 0.25463 0.25143 0.24825 0.24510
-0.7 0.24196 0.23885 0.23576 0.23270 0.22965 0.22663 0.22363 0.22065 0.21770 0.21476
-0.8 0.21186 0.20897 0.20611 0.20327 0.20045 0.19766 0.19489 0.19215 0.18943 0.18673
-0.9 0.18406 0.18141 0.17879 0.17619 0.17361 0.17106 0.16853 0.16602 0.16354 0.16109
-1.0 0.15866 0.15625 0.15386 0.15151 0.14917 0.14686 0.14457 0.14231 0.14007 0.13786
-1.1 0.13567 0.13350 0.13136 0.12924 0.12714 0.12507 0.12302 0.12100 0.11900 0.11702
-1.2 0.11507 0.11314 0.11123 0.10935 0.10749 0.10565 0.10383 0.10204 0.10027 0.09853
-1.3 0.09680 0.09510 0.09342 0.09176 0.09012 0.08851 0.08692 0.08534 0.08379 0.08226
-1.4 0.08076 0.07927 0.07780 0.07636 0.07493 0.07353 0.07215 0.07078 0.06944 0.06811
-1.5 0.06681 0.06552 0.06426 0.06301 0.06178 0.06057 0.05938 0.05821 0.05705 0.05592
-1.6 0.05480 0.05370 0.05262 0.05155 0.05050 0.04947 0.04846 0.04746 0.04648 0.04551
-1.7 0.04457 0.04363 0.04272 0.04182 0.04093 0.04006 0.03920 0.03836 0.03754 0.03673
-1.8 0.03593 0.03515 0.03438 0.03362 0.03288 0.03216 0.03144 0.03074 0.03005 0.02938
-1.9 0.02872 0.02807 0.02743 0.02680 0.02619 0.02559 0.02500 0.02442 0.02385 0.02330
-2.0 0.02275 0.02222 0.02169 0.02118 0.02068 0.02018 0.01970 0.01923 0.01876 0.01831
-2.1 0.01786 0.01743 0.01700 0.01659 0.01618 0.01578 0.01539 0.01500 0.01463 0.01426
-2.2 0.01390 0.01355 0.01321 0.01287 0.01255 0.01222 0.01191 0.01160 0.01130 0.01101
-2.3 0.01072 0.01044 0.01017 0.00990 0.00964 0.00939 0.00914 0.00889 0.00866 0.00842
-2.4 0.00820 0.00798 0.00776 0.00755 0.00734 0.00714 0.00695 0.00676 0.00657 0.00639
-2.5 0.00621 0.00604 0.00587 0.00570 0.00554 0.00539 0.00523 0.00508 0.00494 0.00480
-2.6 0.00466 0.00453 0.00440 0.00427 0.00415 0.00402 0.00391 0.00379 0.00368 0.00357
-2.7 0.00347 0.00336 0.00326 0.00317 0.00307 0.00298 0.00289 0.00280 0.00272 0.00264
-2.8 0.00256 0.00248 0.00240 0.00233 0.00226 0.00219 0.00212 0.00205 0.00199 0.00193
-2.9 0.00187 0.00181 0.00175 0.00169 0.00164 0.00159 0.00154 0.00149 0.00144 0.00139
-3.0 0.00135 0.00131 0.00126 0.00122 0.00118 0.00114 0.00111 0.00107 0.00104 0.00100
-3.1 0.00097 0.00094 0.00090 0.00087 0.00084 0.00082 0.00079 0.00076 0.00074 0.00071
-3.2 0.00069 0.00066 0.00064 0.00062 0.00060 0.00058 0.00056 0.00054 0.00052 0.00050
-3.3 0.00048 0.00047 0.00045 0.00043 0.00042 0.00040 0.00039 0.00038 0.00036 0.00035
-3.4 0.00034 0.00032 0.00031 0.00030 0.00029 0.00028 0.00027 0.00026 0.00025 0.00024
-3.5 0.00023 0.00022 0.00022 0.00021 0.00020 0.00019 0.00019 0.00018 0.00017 0.00017
-3.6 0.00016 0.00015 0.00015 0.00014 0.00014 0.00013 0.00013 0.00012 0.00012 0.00011
-3.7 0.00011 0.00010 0.00010 0.00010 0.00009 0.00009 0.00008 0.00008 0.00008 0.00008
-3.8 0.00007 0.00007 0.00007 0.00006 0.00006 0.00006 0.00006 0.00005 0.00005 0.00005
-3.9 0.00005 0.00005 0.00004 0.00004 0.00004 0.00004 0.00004 0.00004 0.00003 0.00003
Positive z
z   0.00    0.01    0.02    0.03    0.04    0.05    0.06    0.07    0.08    0.09
0.0 0.50000 0.50399 0.50798 0.51197 0.51595 0.51994 0.52392 0.52790 0.53188 0.53586
0.1 0.53983 0.54380 0.54776 0.55172 0.55567 0.55962 0.56356 0.56749 0.57142 0.57535
0.2 0.57926 0.58317 0.58706 0.59095 0.59483 0.59871 0.60257 0.60642 0.61026 0.61409
0.3 0.61791 0.62172 0.62552 0.62930 0.63307 0.63683 0.64058 0.64431 0.64803 0.65173
0.4 0.65542 0.65910 0.66276 0.66640 0.67003 0.67364 0.67724 0.68082 0.68439 0.68793
0.5 0.69146 0.69497 0.69847 0.70194 0.70540 0.70884 0.71226 0.71566 0.71904 0.72240
0.6 0.72575 0.72907 0.73237 0.73565 0.73891 0.74215 0.74537 0.74857 0.75175 0.75490
0.7 0.75804 0.76115 0.76424 0.76730 0.77035 0.77337 0.77637 0.77935 0.78230 0.78524
0.8 0.78814 0.79103 0.79389 0.79673 0.79955 0.80234 0.80511 0.80785 0.81057 0.81327
0.9 0.81594 0.81859 0.82121 0.82381 0.82639 0.82894 0.83147 0.83398 0.83646 0.83891
1.0 0.84134 0.84375 0.84614 0.84849 0.85083 0.85314 0.85543 0.85769 0.85993 0.86214
1.1 0.86433 0.86650 0.86864 0.87076 0.87286 0.87493 0.87698 0.87900 0.88100 0.88298
1.2 0.88493 0.88686 0.88877 0.89065 0.89251 0.89435 0.89617 0.89796 0.89973 0.90147
1.3 0.90320 0.90490 0.90658 0.90824 0.90988 0.91149 0.91308 0.91466 0.91621 0.91774
1.4 0.91924 0.92073 0.92220 0.92364 0.92507 0.92647 0.92785 0.92922 0.93056 0.93189
1.5 0.93319 0.93448 0.93574 0.93699 0.93822 0.93943 0.94062 0.94179 0.94295 0.94408
1.6 0.94520 0.94630 0.94738 0.94845 0.94950 0.95053 0.95154 0.95254 0.95352 0.95449
1.7 0.95543 0.95637 0.95728 0.95818 0.95907 0.95994 0.96080 0.96164 0.96246 0.96327
1.8 0.96407 0.96485 0.96562 0.96638 0.96712 0.96784 0.96856 0.96926 0.96995 0.97062
1.9 0.97128 0.97193 0.97257 0.97320 0.97381 0.97441 0.97500 0.97558 0.97615 0.97670
2.0 0.97725 0.97778 0.97831 0.97882 0.97932 0.97982 0.98030 0.98077 0.98124 0.98169
2.1 0.98214 0.98257 0.98300 0.98341 0.98382 0.98422 0.98461 0.98500 0.98537 0.98574
2.2 0.98610 0.98645 0.98679 0.98713 0.98745 0.98778 0.98809 0.98840 0.98870 0.98899
2.3 0.98928 0.98956 0.98983 0.99010 0.99036 0.99061 0.99086 0.99111 0.99134 0.99158
2.4 0.99180 0.99202 0.99224 0.99245 0.99266 0.99286 0.99305 0.99324 0.99343 0.99361
2.5 0.99379 0.99396 0.99413 0.99430 0.99446 0.99461 0.99477 0.99492 0.99506 0.99520
2.6 0.99534 0.99547 0.99560 0.99573 0.99585 0.99598 0.99609 0.99621 0.99632 0.99643
2.7 0.99653 0.99664 0.99674 0.99683 0.99693 0.99702 0.99711 0.99720 0.99728 0.99736
2.8 0.99744 0.99752 0.99760 0.99767 0.99774 0.99781 0.99788 0.99795 0.99801 0.99807
2.9 0.99813 0.99819 0.99825 0.99831 0.99836 0.99841 0.99846 0.99851 0.99856 0.99861
3.0 0.99865 0.99869 0.99874 0.99878 0.99882 0.99886 0.99889 0.99893 0.99896 0.99900
3.1 0.99903 0.99906 0.99910 0.99913 0.99916 0.99918 0.99921 0.99924 0.99926 0.99929
3.2 0.99931 0.99934 0.99936 0.99938 0.99940 0.99942 0.99944 0.99946 0.99948 0.99950
3.3 0.99952 0.99953 0.99955 0.99957 0.99958 0.99960 0.99961 0.99962 0.99964 0.99965
3.4 0.99966 0.99968 0.99969 0.99970 0.99971 0.99972 0.99973 0.99974 0.99975 0.99976
3.5 0.99977 0.99978 0.99978 0.99979 0.99980 0.99981 0.99981 0.99982 0.99983 0.99983
3.6 0.99984 0.99985 0.99985 0.99986 0.99986 0.99987 0.99987 0.99988 0.99988 0.99989
3.7 0.99989 0.99990 0.99990 0.99990 0.99991 0.99991 0.99992 0.99992 0.99992 0.99992
3.8 0.99993 0.99993 0.99993 0.99994 0.99994 0.99994 0.99994 0.99995 0.99995 0.99995
3.9 0.99995 0.99995 0.99996 0.99996 0.99996 0.99996 0.99996 0.99996 0.99997 0.99997
Example 2 Calculate the following probabilities using the table.
(a) P(0 < Z < 1.34)
(b) P(-1.34 < Z < 1.34)
(c) P(-1.23 < Z < 0.44)
(d) P(Z > 0.22)
(e) P(Z < 0.32)
(f) P(|Z| > 1.96)
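The table lookups in this example can be cross-checked numerically. The following is a minimal sketch using Python's standard library; the helper name prob_between is ours, not part of the notes.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1

def prob_between(a, b):
    """P(a < Z < b), computed as Phi(b) - Phi(a)."""
    return Z.cdf(b) - Z.cdf(a)

answers = {
    "a": prob_between(0, 1.34),
    "b": prob_between(-1.34, 1.34),
    "c": prob_between(-1.23, 0.44),
    "d": 1 - Z.cdf(0.22),          # upper tail
    "e": Z.cdf(0.32),              # lower tail
    "f": 2 * (1 - Z.cdf(1.96)),    # both tails, by symmetry
}
for part, p in answers.items():
    print(f"({part}) {p:.5f}")
```

Each line should agree with the corresponding table-based answer to about four decimal places.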
z = (x - µ)/σ
Note The following probabilities hold for any normal random variable and are very useful to
remember:
P(µ - σ ≤ X ≤ µ + σ) ≈ 0.68268
P(µ - 2σ ≤ X ≤ µ + 2σ) ≈ 0.95450
P(µ - 3σ ≤ X ≤ µ + 3σ) ≈ 0.99730
Now you see where the empirical rule comes from!
(a) The claim is correct, and you were in the unlucky 1% group that gets the duds
(b) The claim is wrong—perhaps the standard deviation should be bigger...
Example 5 (based on Example 5.10 in the book) Daily paint production at a manufacturing
plant has a mean of 100,000 gals. with a standard deviation of 10,000 gals. Management
wants to reward production crews that exceed the 90th percentile. How many gallons of
paint does this represent?
Solution We want a value of x0 such that 90% of production is below that level; that is,
P(X ≤ x0) = 0.90.
First obtain the appropriate z-score:
P(Z ≤ z0) = 0.90
From the table, we find z0 ≈ 1.28. Converting back to gallons,
x0 = µ + z0σ ≈ 100,000 + 1.28(10,000) = 112,800 gallons.
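The percentile can also be found without a table by inverting the normal distribution function. This is a sketch only; it assumes, as the example does, that daily production is normally distributed.

```python
from statistics import NormalDist

production = NormalDist(mu=100_000, sigma=10_000)  # daily output in gallons

z0 = NormalDist().inv_cdf(0.90)  # z-score with 90% of the area to its left
x0 = production.inv_cdf(0.90)    # the same percentile, in gallons

print(round(z0, 4), round(x0))
```

The result differs slightly from the table-based figure because the table value of z0 is rounded to two decimal places.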
Technology Notes
Graphing Calculator
Many calculators permit you to calculate the area under the standard normal curve
without using a table. On the TI-83, press [2nd] VARS to obtain the selection of
distribution functions. The first function, normalpdf, gives the values of the normal
density function (whose graph is the normal curve). The second, normalcdf, gives
P(a ≤ Z ≤ b). For example, to compute P(0 ≤ Z ≤ 2.43), enter
normalcdf(0, 2.43).
To compute P(-1.37 ≤ Z ≤ 2.43), enter
normalcdf(-1.37, 2.43).
Spreadsheet
Spreadsheet programs also come equipped with built-in statistical software that allows
you to compute P(a ≤ Z ≤ b). For example, to compute P(0 ≤ Z ≤ 2.43) in Excel, enter
=NORMSDIST(2.43)-NORMSDIST(0)
To compute P(-1.37 ≤ Z ≤ 2.43), enter
=NORMSDIST(2.43)-NORMSDIST(-1.37).
Web site
Follow the path
Web site → Everything for Finite Math → Chapter 8 → Normal Distribution Utility
where you will find an on-line utility that computes areas under the normal curve to a
high accuracy.
Notes
1. The condition that 0 ≤ µ - 3σ < µ + 3σ ≤ n is satisfied if n is sufficiently large and p is
not too close to 0 or 1, and ensures that almost all of the normal curve lies in the range 0
to n.
2. In the formula P(a ≤ X ≤ b) ≈ P(a - 0.5 ≤ Y ≤ b + 0.5) we assume that a and b are
integers. The use of a - 0.5 and b + 0.5 is called the continuity correction. To see that it
is necessary, consider what would happen if you wanted to approximate, say, P(X = 2)
= P(2 ≤ X ≤ 2).
Here are some binomial distributions with their normal approximations superimposed.
[Figure: binomial distributions for n = 15, p = 0.5 and for n = 30, p = 0.3, each with its normal approximation superimposed]
Solution
(a) We are asking for
P(X < 45 or X > 55) = 1 - P(45 ≤ X ≤ 55).
We could compute this by calculating
C(100,45)(0.5)^45(0.5)^55 + C(100,46)(0.5)^46(0.5)^54 + . . .,
but we can much more easily approximate it by looking at a normal distribution with mean
µ = 50 and standard deviation σ = √(100·0.5·0.5) = 5. (Notice that three standard
deviations above and below the mean is the range 35 to 65, which is well within the range of
possible values for X, which is 0 to 100, so the approximation should be a good one.) Let Y
have this normal distribution. Then
P(45 ≤ X ≤ 55) ≈ P(44.5 ≤ Y ≤ 55.5)
= P(-1.1 ≤ Z ≤ 1.1)
= .86433 - .13567 = .72866.
Therefore,
P(X < 45 or X > 55) ≈ 1 - .72866 = 0.27134.
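To see how good the approximation in part (a) is, one can compute the exact binomial sum and compare. The sketch below does both; the variable names are ours.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 100, 0.5

# Exact binomial probability P(45 <= X <= 55)
exact = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(45, 56))

# Normal approximation with the continuity correction
mu, sigma = n * p, sqrt(n * p * (1 - p))
Y = NormalDist(mu, sigma)
approx = Y.cdf(55.5) - Y.cdf(44.5)

print(f"exact  = {exact:.5f}")
print(f"approx = {approx:.5f}")
print(f"P(X < 45 or X > 55) is about {1 - approx:.5f}")
```

Dropping the continuity correction (using 45 and 55 instead of 44.5 and 55.5) gives a noticeably worse match, which is a quick way to see why the correction is used.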
(b) This is a deep question, touching on the question of statistical significance: what
evidence is strong enough to overturn a reasonable assumption (the assumption that the coin
is fair)? Statisticians have developed various sophisticated ways of answering this question,
but we can look at one simple test now. Suppose that we tossed a coin 100 times and threw
66 heads. If the coin were fair, then P(X > 65) ≈ P(Y > 65.5) = P(Z > 3.1) ≈ 0.001. This
is small enough to raise a reasonable doubt that the coin is fair. However, we should not be
too surprised if we threw 56 heads, since we can calculate that P(X > 55) ≈ 0.1357, which
is not such a small probability. As we said, the actual tests of statistical significance are
more sophisticated than this, but we shall not go into them.
Homework
p. 217, #6
p. 229, #10, 12, 14, 18, 24
Also:
1. If you roll a die 100 times, what is the probability that you will roll between 15 and 20
ones? (Round your answer to 2 decimal places.)
[ 0.57 ]
3. Polls In a certain political poll, each person polled has a 90% probability of telling his or
her real preference. Suppose that 55% of the population really prefer candidate Goode, and
45% prefer candidate Slick. Find first the probability that a person polled will say that he or
she prefers Goode, then find the probability that, if 1,000 people are polled, candidate
Goode will get more than 52%.
[ Probability that a person will say Goode = 0.54. Probability that Goode polls more than
52% ≈ 0.892. ]
11 Source for this exercise and the following three: National Transportation Safety Board.
Topic 11
Sampling Distributions and Central Limit Theorem
(Based on Section 5.5 in the book)
Q. OK, so why not use the sample statistic as an estimate of the corresponding population
parameter: for instance, why not use the sample mean as an estimate of the population
mean?
A. This is exactly what we do to estimate population means and medians (with a slight
modification in the case of the standard deviation). However, a sample statistic (such as the
sample mean) may be “all over the place,” so a further question is: how confident can we
be in the sample statistic?
Q. Give me an example.
A. If we cast a fair die and take X to be the uppermost number, we know that the population
mean is µ = 3.5, and that the population median is also m = 3.5. But if we take a sample
of, say, four throws, the mean may be far from 3.5. Here are the results of 5 such samples
of 4 throws (we used a random number generator to obtain these samples):
            X1   X2   X3   X4   X̄
Sample 1     6    2    5    6   4.75
Sample 2     2    3    1    6   3
Sample 3     1    1    4    6   3
Sample 4     6    2    2    1   2.75
Sample 5     1    5    1    3   2.5
Notice that none of the five samples gave us the correct mean, and that the mean of the first
sample is far from the actual mean.
Q. The table above is interesting: look at the values of the mean X̄. Their median is 3 and
their mean is 3.2. Thus, although the mean of a particular sample may not be a good
predictor of the population mean, we get better results if we take the mean of a whole bunch
of sample means.
A. You have put your finger on one of the most important concepts in inferential statistics: the
values of X̄ are values of a random variable (take a sample of 4 throws, and measure the mean), and
its probability distribution is called the sampling distribution of the sample mean. The
above table suggests that the expected value of the sampling distribution of the mean is the
same as the population mean, and this turns out to be true.
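For the die example this claim can be checked exactly rather than by sampling, since there are only 6^4 = 1296 equally likely samples of four throws. The following sketch enumerates them all and averages the sample means.

```python
from fractions import Fraction
from itertools import product

faces = range(1, 7)

# All 6**4 equally likely samples of four throws
samples = list(product(faces, repeat=4))

# Sample mean of each sample, kept as an exact fraction
means = [Fraction(sum(s), 4) for s in samples]

# Expected value of the sampling distribution of the mean
expected_mean = sum(means) / len(means)
print(expected_mean)  # prints 7/2, the population mean 3.5
```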
Sampling Distribution
The sampling distribution of a statistic S for samples of size n is defined as follows. The
experiment consists of choosing a sample of size n from the population and measuring the
statistic S. The sampling distribution is the resulting probability distribution.
The values of X̄ are 0, 1/3, 2/3, and 1. The desired sampling distribution is its probability
distribution, shown below.
x̄             0      1/3     2/3     1
P(X̄ = x̄)     1/64   9/64    27/64   27/64
Note For this small sample size, the distribution of the sample mean is a binomial
distribution. The Central Limit Theorem will tell us that, for large sample sizes, it must look
more and more like a normal distribution.
Example 2
Look at Example 6.1 on p. 242-243, which has a larger list of possible samples.
X1 X2 X3 X4 X5 X6 X̄
Sample 1 0.136 0.397 0.278 0.029 0.810 0.496 0.358
Sample 2 0.918 0.455 0.482 0.148 0.494 0.440 0.49
Sample 3 0.076 0.868 0.626 0.104 0.902 0.425 0.5
Sample 4 0.374 0.772 0.748 0.415 0.043 0.612 0.494
Sample 5 0.855 0.005 0.203 0.950 0.526 0.246 0.464
Sample 6 0.147 0.579 0.790 0.906 0.766 0.998 0.698
Sample 7 0.303 0.159 0.990 0.055 0.031 0.715 0.376
Sample 8 0.143 0.362 0.093 0.047 0.767 0.769 0.364
Sample 9 0.523 0.232 0.296 0.096 0.983 0.423 0.426
Sample 10 0.566 0.598 0.253 0.943 0.757 0.588 0.618
Sample 11 0.096 0.375 0.062 0.230 0.437 0.434 0.272
Sample 12 0.887 0.952 0.019 0.242 0.637 0.358 0.516
Sample 13 0.208 0.099 0.802 0.157 0.956 0.818 0.507
Sample 14 0.096 0.375 0.062 0.230 0.725 0.434 0.32
Sample 15 0.887 0.952 0.019 0.242 0.393 0.358 0.475
Solution If we use the measurement classes above, we obtain the following frequency table
(omitted classes have frequency 0) and histogram (using the center values of each class).
[Figure: relative-frequency histogram of the sample means x̄; horizontal axis x̄ from 0.1 to 0.9, vertical axis from 0 to 0.4]
Note The histogram gives a "sample" of the actual sampling distribution; we can't produce
the whole sampling distribution in the above manner, since there are, in principle, infinitely
many possible samples.
First of all, there are going to be lots of different medians corresponding to the
different samples of 100. If we knew the sampling distribution of the sample median with n
= 100, we could compute the expected value (mean) of this sampling distribution. That is,
we can compute the expected value of the sample median. If it equals the population mean,
we would say that the sample median is an unbiased estimator of the population mean.
Otherwise, we say that it is a biased estimator with bias equal to the difference between the
expected value of the estimator and the value of the population parameter.
Further, in order to obtain a more accurate estimate of the population parameter, we
should use a sample statistic whose standard deviation (the standard deviation of its
sampling distribution) is as small as possible. In this way, the statistic of a single sample is
more likely to be close to the expected value.
Example 4 Refer to Example 1: X is the number of heads when we toss an unfair coin
(with a 75% chance of heads coming up). That is, X = 1 if it's a head and X = 0 if it's a tail.
Determine whether the sample mean is an unbiased estimator of the population mean.
Solution
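We need to compare the population mean with the expected value of the sampling
distribution of the sample means.
Step 1 Compute the population mean.
Since X = 1 with probability 0.75 and X = 0 with probability 0.25, µ = 1(0.75) + 0(0.25) = 0.75.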
Step 2 Compute the expected value of the sampling distribution of the sample mean.
To do this, we need the sampling distribution of the sample mean, and we already calculated
that: the sampling distribution of X̄ was found to be
x̄               0      1/3     2/3      1
P(X̄ = x̄)       1/64   9/64    27/64    27/64
x̄·P(X̄ = x̄)     0      9/192   54/192   27/64
Adding up the numbers in the bottom row gives the expected value of the sampling
distribution:
E(X̄) = 144/192 = 3/4.
Since this is the same as the population mean, the estimator is unbiased.
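The arithmetic in Step 2 is easy to verify with exact fractions. A minimal sketch, with the distribution typed in directly from the table:

```python
from fractions import Fraction as F

# Sampling distribution of the sample mean: value -> probability
dist = {F(0): F(1, 64), F(1, 3): F(9, 64), F(2, 3): F(27, 64), F(1): F(27, 64)}

total_probability = sum(dist.values())          # should be 1
expected = sum(x * p for x, p in dist.items())  # E(X-bar)

print(total_probability, expected)  # prints 1 3/4
```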
Note The following results can be proved (but are apparently not mentioned in the text!)
1. The sample mean is always an unbiased estimator of the population mean, regardless of
the distribution or the sample size!
2. The sample variance s² (recall that it uses a different formula from the population
variance) is always an unbiased estimator of the population variance σ²,
again regardless of the distribution or the sample size! That is why we used n-1 instead of
n in the formula for sample standard deviation; if we used the same formula as for the
population standard deviation, s² would have been a biased estimator of σ². (Strictly speaking, s itself
is still a slightly biased estimator of σ, but the bias is small when n is large.)
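The effect of dividing by n-1 rather than n can be seen exactly in a small case. The sketch below is our own illustration, not an example from the book: it enumerates every sample of three throws of a fair die and averages the two versions of the variance.

```python
from fractions import Fraction
from itertools import product

faces = range(1, 7)
mu = Fraction(sum(faces), 6)
pop_var = sum((x - mu) ** 2 for x in faces) / 6  # population variance, 35/12

def variance(sample, divisor):
    """Sum of squared deviations from the sample mean, divided by `divisor`."""
    m = Fraction(sum(sample), len(sample))
    return sum((x - m) ** 2 for x in sample) / divisor

samples = list(product(faces, repeat=3))  # all 216 equally likely samples, n = 3

avg_n_minus_1 = sum(variance(s, 2) for s in samples) / len(samples)  # divide by n-1
avg_n = sum(variance(s, 3) for s in samples) / len(samples)          # divide by n

print(pop_var, avg_n_minus_1, avg_n)
```

The n-1 version averages out to exactly the population variance, while the n version comes out too small by the factor (n-1)/n.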
Example 5 (Based on Example 6.8 of Statistics for Business and Economics 8th Ed by
McClave, Benson, and Sicich, Prentice Hall, 2001)
A battery manufacturer claims that the lifespan of the batteries produced has a mean of 54
months and a standard deviation of 6 months. Your consumer advocacy group tests 50 of them.
What is the probability that it finds a mean lifetime of less than 52 months?
Answer In symbols, we are seeking P(X̄ ≤ 52). Now, X̄ is approximately normally
distributed by the CLT, and has a mean of µ = 54 and a standard deviation of σ_X̄ = 6/√50
≈ 0.85 months. To find the required probability, we need to convert to z-scores:
z = (x̄ - µ_X̄)/σ_X̄ = (52 - 54)/0.85 ≈ -2.35.
Therefore,
P(X̄ ≤ 52) ≈ P(Z ≤ -2.35) = 0.00939.
Thus, the probability of this happening is 0.00939, or approximately 0.94%. Thus, we can
be 99.06% certain that this won't happen (if they are right!).
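The same calculation can be done without rounding the standard error. This sketch assumes, as the answer does, that the sample mean is approximately normal.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 54, 6, 50            # claimed mean, claimed st. deviation, sample size
std_error = sigma / sqrt(n)         # standard deviation of the sample mean

sample_mean = NormalDist(mu, std_error)
p = sample_mean.cdf(52)             # P(X-bar <= 52)

print(round(std_error, 4), round(p, 5))
```

The probability comes out close to, but slightly below, the table-based 0.00939, because the table calculation rounds the standard error to 0.85 and z to two decimal places.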
Exercises
p. 263 # 19, 26, and also the on-line exercises at
Web site → Everything for Finite Math → Chapter 8 → Sampling Distributions
Topic 12
Confidence Interval for a Population Mean
(Based on Sections 8.1, 8.2, 8.3 in the book)
Large Samples
Suppose we have calculated the mean x̄ of a large sample (n ≥ 30) of a random variable X,
and we get 120. We would like to say something like:
“We can be 95.44% certain that the population mean is 120 ±___.”
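To make things easier, let us assume we know the population standard deviation σ. Then
the sampling distribution of x̄ is approximately normal with mean µ and standard deviation
σ/√n. Looking at the standard normal tables, we find that
P(-2 ≤ Z ≤ 2) ≈ 0.9544.
In other words:
P(µ - 2σ/√n ≤ x̄ ≤ µ + 2σ/√n) ≈ 0.9544.
In other words:
We can be 95.44% certain that the population mean µ is within 2σ/√n of the sample mean x̄.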
Answer: It is the value of Z such that P(0 ≤ Z ≤ z_0.05) = 0.45. In other words, z_0.05 is the
number such that
P(Z ≥ z_0.05) = 0.05.
That is, it is the number (measured in standard deviations) such that the area of the upper
tail of the Z-distribution is 0.05.
Here is the usual convention. We let α be such that (1-α) is the desired confidence. For
instance, here
1-α = 0.9,
so α = 0.10
for 90% confidence. Then, the z-value we want is z_0.05 = z_{α/2}.
Confidence Level
(1-α)    α       α/2      z_{α/2}
0.90     0.10    0.05     1.645
0.95     0.05    0.025    1.96
0.99     0.01    0.005    2.575
How to find the (1-α) Confidence Interval for the Population Mean µ
(1) If we know the population standard deviation σ, then we can be 100(1-α)% certain that
µ is in the interval
x̄ ± z_{α/2} σ/√n.
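As a sketch of how the interval is computed in practice, the following code finds the critical value from the confidence level and then builds the interval. The function names and the numbers in the usage line (a sample mean of 120, a population standard deviation of 15, a sample of 36) are hypothetical, chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist

def z_critical(confidence):
    """Critical value with area alpha/2 in the upper tail, alpha = 1 - confidence."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)

def z_interval(xbar, sigma, n, confidence):
    """Confidence interval for the mean when the population st. deviation is known."""
    margin = z_critical(confidence) * sigma / sqrt(n)
    return xbar - margin, xbar + margin

# Hypothetical data: xbar = 120, sigma = 15, n = 36
low, high = z_interval(120, 15, 36, 0.95)
print(round(low, 2), round(high, 2))
```

Note that z_critical(0.99) returns about 2.576, while the table above uses the conventional rounded value 2.575.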
†Californian residents are fleeing to New York in droves now that Arnold Schwarzenegger has become
governor.
Example 2 (Estimating the Sample Size)
Referring to the above example, if I wanted to estimate the average number of seats to within
±0.5 with a confidence of 99%, how large a sample would I need?
Solution This time, we know that the half-width of the confidence interval is
± z_{α/2} σ/√n = ±0.5.
For 99%, α/2 = 0.005, and z_{α/2} = 2.575. Thus we have
2.575 × 4.1/√n = 0.5,
giving √n = 2.575 × 4.1/0.5 = 21.115,
so n ≈ 445.8.
Thus, we would require a sample of size at least 446 to ensure this interval with a 99%
confidence.
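The sample-size calculation amounts to solving the margin formula for n and rounding up. A minimal sketch (the function name is ours; the z-value is the table value used in the notes):

```python
from math import ceil

def required_sample_size(z, sigma, margin):
    """Smallest n for which z * sigma / sqrt(n) is at most `margin`."""
    return ceil((z * sigma / margin) ** 2)

n = required_sample_size(z=2.575, sigma=4.1, margin=0.5)
print(n)  # prints 446
```

We always round up, since rounding down would give a margin slightly larger than the one requested.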
Small Samples
We address (1) by making the assumption that the population is approximately normal, so
that we no longer need the Central Limit Theorem. For (2), there is still a problem since
when we were calculating z_{α/2}, we computed its value for large samples using the normal
distribution of the statistic
z = (x̄ - µ)/σ_x̄ = (x̄ - µ)/(σ/√n)
(recall that x̄ is normally distributed here). If we use the sample standard deviation instead,
we get
t = (x̄ - µ)/(s/√n),
which is no longer normally distributed (even if the original population is; note that we
are taking the quotient of two random variables here, and we cannot expect the result to be a
normal variable). The sampling distribution of this statistic, called the t-statistic for (n - 1)
degrees of freedom, is also bell-shaped, but a little broader than the normal distribution,
and depends upon the value of n; the smaller n, the broader the distribution. Its values are
given in the book (front inside cover).
Summary: Dealing with Small Samples
1. If we know the population standard deviation, we can use the z-statistic as usual (making
the assumption that the original distribution is approximately normal), and the
confidence interval is
x̄ ± z_{α/2} σ/√n
as usual.
2. If all we know is the sample standard deviation, we must use the t-statistic for n-1
degrees of freedom (ν = n-1), and the confidence interval is
x̄ ± t_{α/2} s/√n.
Assumption in both cases: The population distribution is approximately normal.
Example 3
The mean lifetime of an inkjet printer head (in millions of characters printed until failure) for a
sample of 15 different inkjet heads is 1.24, with a sample standard deviation of 0.19.
(a) Form the 99% confidence interval.
(b) If the population standard deviation is also 0.19, is the resulting confidence interval
wider or narrower?
Solution
(a) Number of degrees of freedom is ν = n-1 = 14
t_0.005 = 2.977.
Thus, the interval is
x̄ ± t_{α/2} s/√n = 1.24 ± 2.977 × 0.19/√15 ≈ 1.24 ± 0.146,
or [1.094, 1.386] million characters.
(b) If we used z instead, we would obtain
x̄ ± z_{α/2} σ/√n = 1.24 ± 2.575 × 0.19/√15 ≈ 1.24 ± 0.126,
a narrower interval.
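The two margins in this example can be compared side by side. Python's standard library has no t-distribution, so in this sketch the critical values are simply typed in from the tables (2.977 for t with 14 degrees of freedom, 2.575 for z).

```python
from math import sqrt

xbar, s, n = 1.24, 0.19, 15

t_crit = 2.977  # t_0.005 with 14 degrees of freedom, from the t-table
z_crit = 2.575  # z_0.005, from the normal table

t_margin = t_crit * s / sqrt(n)
z_margin = z_crit * s / sqrt(n)

print(f"t interval: {xbar - t_margin:.3f} to {xbar + t_margin:.3f}")
print(f"z interval: {xbar - z_margin:.3f} to {xbar + z_margin:.3f}")
```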
Homework
p. 293 # 2, 8
p. 300, #18, 22
p. 304 # 30
Topic 13
Introduction to Hypothesis Testing
(Based on Sections 9.1-9.3 in the book)
We have seen that the sample mean can be used to estimate the population mean, if
the latter is unknown. More precisely, when we used confidence intervals, we were making
an inference about the value of the population mean. In this section, we will test a hypothesis
about the value of the population mean.
For example, you might want to test whether the vitamin tablets made by your
company have more than 120 mg vitamin C. In such a scenario, you know that the
population mean is supposed to be > 120, and the question you ask is this: Can I be “95%
confident” (whatever that means) that the average vitamin C content in my pills is > 120
mg?
Q How do we determine whether to reject the null hypothesis H0? That is, how can I be
confident that µ is above 120?
A To simplify things, let us talk in terms of the standard normal distribution and the number
of standard deviations from the mean. We know that 95% of the sample means will be no more than
1.645 standard deviations above the population mean. (See the figure.)
[Figure: standard normal curve with 95% of the area to the left of z = 1.645]
Put another way, if the population mean is 0, then 95% of the readings will be less than
1.645. Thus, if the population mean is 0 (or less), the probability of getting a sample mean
greater than 1.645 is at most 5%. In terms of conditional probability,
P(z̄ > 1.645 | µ ≤ 0) ≤ 0.05.
Then, the above formula translates to:
Rejecting H0 (using the rule) when in fact it is true is called a Type I error. (Accepting the
null hypothesis when it is false is called a Type II error.) Thus,
Note: This does not mean that, if we reject the null hypothesis, the probability that
it is true is < 0.05. (In other words, we cannot be 95% certain that the null hypothesis is
false; ie, that the vitamin C content is > 120 mg.) The probability that the null hypothesis is
true is
How confident can I be that H0 is false if Rule R tells me to reject it? That's hard to say, as
we would need to compute P(H0 is false |Rule R tells me to reject H0). What we can be
95% confident about is that we have not made a Type I error: that is we can be 95% certain
that if H0 was true, we would not reject it.
Here is an example: Suppose H0 is "Football player Hugo Huge has not been taking
steroids" and my steroids test has only a 5% false positive rate. That is, if Hugo is not using
steroids, then there is only a 5% chance that the test will be positive. Now, suppose Hugo
Huge's test comes up positive. If I am the coach, and my policy is to reject everyone who
comes up positive (regardless of whether or not they are actually using steroids) then Hugo
Huge will be rejected. In this context, the probability that he actually uses steroids need not
be 95%. For instance, if only 1 in a million athletes actually used steroids, then the vast
majority of those who, like Hugo, test positive (5%, or 50,000 in each million) are not using
steroids!
Thus, I cannot be 95% confident that H0 is false (i.e., that Hugo is using steroids) at all. All
I can be sure of, is that, if Hugo was not using steroids, then there would only be a 5%
chance that the test came up positive. In this context, a Type I error would be rejecting Hugo
if he is not using steroids, and I can be 95% certain that I am not making a Type I error in
rejecting Hugo (even though I can not be 95% certain that Hugo is using steroids.) Put
another way, I can be 95% confident that my policy (Rule R) is reliable in the sense that I
don't get a false positive, but I cannot be 95% certain that it is reliable if it comes up positive.
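The steroid example is an application of Bayes' rule, and the numbers can be worked out directly. The sketch below makes one assumption that the example does not state: that the test catches every actual user (a sensitivity of 1). With any lower sensitivity the result would be smaller still.

```python
def prob_user_given_positive(prevalence, false_positive_rate, sensitivity=1.0):
    """P(uses steroids | test positive), by Bayes' rule.

    sensitivity = P(positive | user); 1.0 is an assumption made for this sketch.
    """
    true_positives = prevalence * sensitivity
    false_positives = (1 - prevalence) * false_positive_rate
    return true_positives / (true_positives + false_positives)

# 1 athlete in a million uses steroids; 5% false positive rate
p = prob_user_given_positive(prevalence=1e-6, false_positive_rate=0.05)
print(f"{p:.7f}")
```

So in this scenario a positive test leaves the chance that Hugo is a user at roughly 1 in 50,000, nowhere near 95%.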
So, if z̄ > 1.645, I would therefore reject the null hypothesis, and I can be 95% confident
that I am not making a Type I error. The possible values of z̄ that would cause us to reject
H0 form the region to the right of the vertical line in the diagram above. We call this the rejection
region.
Now we can go back to the vitamin C pills. To convert everything to the standard normal
variable, we use the "test statistic"
z = (x̄ - 120)/σ_x̄ = (x̄ - 120)/(σ/√n),
where n is the sample size. Then, if z > 1.645, we reject H0. Simple as that.
Example 1
(a) Your measurements on a sample of 35 vitamin C pills give an average of 120.4 mg with
a sample standard deviation of 1.2. Can you be 95% certain (that is, with α = 0.05; see footnote 16) that the
average dose in all your pills is > 120 mg?
Q Does that mean I should accept the null hypothesis (that is, reject the alternative
hypothesis)?
A Suppose we invented a new rule:
Rule T: Accept the null hypothesis if z̄ ≤ 1.645.
Accepting the null hypothesis (using Rule T) when it is false would be called a Type II
error. The probability of a Type II error is (going back to the standard normal distribution)
β = P(Rule T tells me to reject Ha | Ha is true) = P(z̄ ≤ 1.645 | µ > 0).
In general, this probability is difficult to estimate, and it depends on exactly how big µ
actually is. (You need to supply a value of µ in order to say anything—see Section 8.6 in
the book.)
Summary:
• To decide what H0 and Ha should be, use the following guideline: Ha is the
hypothesis you are deciding whether to accept. (You will never accept H0.) Thus, Ha is
the hypothesis you are testing, and H0 is the "status quo": the hypothesis that is
assumed true until you have found evidence to the contrary.
• To test a hypothesis with level of significance α, take the test statistic and compute the
value of z_α for the rejection region.
• If your value of z is in the rejection region, you must, by Rule R, reject the null
hypothesis.
• If your value of z is not in the rejection region, you cannot reject the null hypothesis (but
that does not mean you must accept it!)
16 α is the probability of making a Type I error. The probability of making a Type II error is called β.
Note Although the corresponding null hypothesis is H0: µ ≤ 1200, some textbooks take H0
to be µ = 1200, since if we reject the hypothesis that µ ≤ 1200, then we can reject the
hypothesis that µ = 1200 as well.
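In the case we looked at, the rejection region was to the right of z_α in the normal
distribution. This is one of three possibilities:
Right-tailed   H0: µ ≤ µ0   Ha: µ > µ0   Rejection region: z > z_α
Left-tailed    H0: µ ≥ µ0   Ha: µ < µ0   Rejection region: z < -z_α
Two-tailed     H0: µ = µ0   Ha: µ ≠ µ0   Rejection region: z < -z_{α/2} or z > z_{α/2}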
Example 1
You want to test whether the cereal boxes made by your plant conform to the requirement
that they contain 12 oz cereal. You wish to test at the 0.01 significance level (99% confidence), and you sample
100 boxes, finding x̄ = 11.85, s = 0.5. Do your cereal boxes meet the standard?
Solution
Take H0 to be µ = 12 (two-tailed).
The test statistic is
z = (x̄ - 12)/(s/√n) = (11.85 - 12)/(0.5/10) = -0.15/0.05 = -3.
The value of z_{α/2} for the test is
z_0.005 = 2.575.
Since z = -3 < -2.575, we see that z is in the rejection region, so we reject H0. In other
words, your cereal does not conform to the requirement; the boxes are being under-filled.
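The steps of this two-tailed test can be sketched as a short function. The function name and return convention are ours; the critical value is computed rather than read from the table, so it comes out as about 2.576 instead of 2.575.

```python
from math import sqrt
from statistics import NormalDist

def two_tailed_z_test(xbar, mu0, s, n, alpha):
    """Return (z, critical value, reject H0?) for H0: mu = mu0 against Ha: mu != mu0."""
    z = (xbar - mu0) / (s / sqrt(n))
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return z, z_crit, abs(z) > z_crit

z, z_crit, reject = two_tailed_z_test(xbar=11.85, mu0=12, s=0.5, n=100, alpha=0.01)
print(round(z, 2), round(z_crit, 3), reject)
```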
Example 2 Your muffler factory claims to manufacture mufflers with a lifespan of more
than 10,000 miles of usage. A consumer group tests this claim at the 0.05 significance level (95% confidence),
and finds that a sample of 64 mufflers have a mean lifespan of 10,002 miles, with a standard
deviation of 10 miles. Test the following alternate hypotheses using this data, and interpret
the results:
(a) Manufacturer's hypothesis: Ha : µ > 10,000
(b) Consumer group's hypothesis: Ha : µ < 10,000
(c) If the manufacturer wanted to state that the survey proved their claim to be true,
what should x̄ have been?
(d) If the consumer group wanted to state that the survey proved the manufacturer's
claim to be false, what should x̄ have been?
Solution
The test statistic is
z = (x̄ - 10,000)/(s/√n) = (10,002 - 10,000)/(10/8) = 2/1.25 = 1.6,
and the critical value is
z_α = z_0.05 = 1.645.
(a) The rejection region is the area to the right of 1.645. Since z is below this, we cannot
reject H0, so we cannot reject the hypothesis that µ ≤ 10,000. Thus, the manufacturer cannot
claim that the lifespan of the mufflers is above 10,000 miles.
(b) The rejection region is the area to the left of -1.645. Since z is positive, it is not in the
rejection region. Thus, we cannot reject the hypothesis that µ ≥ 10,000. In other words, the
consumer group cannot state that the manufacturer's claim is wrong.
(c) To validate the manufacturer's claim, z should have been in the rejection region. That is,
z = (x̄ - 10,000)/1.25 > 1.645.
This gives
x̄ - 10,000 > 2.05625,
so x̄ > 10,002.06.
(d) To validate the consumer group's claim, z would have to have been in their rejection
region: to the left of -1.645. Thus,
z = (x̄ - 10,000)/1.25 < -1.645.
This gives
x̄ - 10,000 < -2.05625,
so x̄ < 9,997.94.
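Parts (c) and (d) come down to converting the critical z-value back into miles. A minimal sketch, using the table value 1.645 as in the solution:

```python
from math import sqrt

mu0, s, n = 10_000, 10, 64
z_alpha = 1.645                    # table value for alpha = 0.05

std_error = s / sqrt(n)            # 10/8 = 1.25
z = (10_002 - mu0) / std_error     # observed test statistic

upper_cutoff = mu0 + z_alpha * std_error  # manufacturer needs xbar above this
lower_cutoff = mu0 - z_alpha * std_error  # consumer group needs xbar below this

print(z, round(upper_cutoff, 2), round(lower_cutoff, 2))
```

Any sample mean between the two cutoffs, including the observed 10,002, supports neither side's claim at this level.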
Homework:
p. 327 #2, 4
p. 329 #5, 8
p. 337 # 10, 18
p. 345 # 25, 30
Topic 14
Observed Significance & Small Samples
Q Instead of selecting an α first and then testing a hypothesis, can we first test the
hypothesis and then get a value for the appropriate α? For instance, suppose you test H0
with a right-tailed test (Ha : µ > µ0) and you get a test statistic of z = 2.12. The question is,
at what significance level can you reject H0?
A The smallest such level is the area of the tail beyond the test statistic, called the observed
significance level, or p-value. Here, p = P(Z ≥ 2.12) = 1 - 0.98300 = 0.0170.
In other words, we can reject H0 with a significance level of p = 0.0170 (or 98.3% confidence). Since
this value is small, we say that the test result is "statistically very significant."
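For a right-tailed test the observed significance level is just an upper-tail area, which can be computed directly. A short sketch (the function name is ours):

```python
from statistics import NormalDist

def right_tailed_p_value(z):
    """Area under the standard normal curve to the right of z."""
    return 1 - NormalDist().cdf(z)

p = right_tailed_p_value(2.12)
print(round(p, 4))  # prints 0.017
```

For a left-tailed test one would use the area to the left of z instead, and for a two-tailed test twice the area beyond |z|.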
Example 1
(Based on Examples 8.1 and 8.2 in the book)
You want to test whether the cereal boxes made by your plant conform to the requirement
that they contain 12 oz cereal. You sample 100 boxes, finding x̄ = 11.85, s = 0.5. At what
level of significance can you reject the hypothesis that your cereal boxes meet the standard?
Solution
Take H0 to be µ = 12 (two-tailed).
The test statistic is
z = (x̄ - 12)/(s/√n) = (11.85 - 12)/(0.5/10) = -0.15/0.05 = -3.
Since this is two-tailed, we calculate twice the area beyond z = -3. This is found to be
2(.00135) = 0.00270.
Thus, p = 0.0027 (corresponding to 99.73% confidence).
Thus, the evidence for rejecting H0 is statistically very significant.
Q Suppose I am given a significance level α to test beforehand. Then should I bother with
the p-value at all?
A Yes. Calculate p anyway. If p is less than, or approximately equal to α, then you can
safely reject H0. If not, you cannot do so.
Example 2 (Cereal boxes again)
You want to test whether the cereal boxes made by your plant conform to the requirement
that they contain 12 oz cereal. You sample another 100 boxes, this time finding x̄ = 11.88, s
= 0.5. Do the boxes meet the 12 oz standard at the 99% level of confidence?
Since these values are "approximately equal" you can still reject Ho at the 99% level, so the
cereal boxes are still not up to par...
This is essentially the same as the testing for large samples, except for the following
adjustments:
1. If the sample size is small and the population distribution is approximately normal, we
can still use the sample standard deviation in our calculations, provided we use t_α instead of
z_α when forming the rejection region. For consistency, we refer to the test statistic as t rather
than z.
2. When calculating p, we need to use the t-table "backwards" and we can only get an
approximate answer without statistical software packages.
Example 3
The emission (in parts of carbon per million) of 10 engines is found to be:
15.6 16.2 22.5 20.5 16.4 19.4 16.6 17.9 12.7 13.9
The mean emission must, according to regulations, be µ < 20 parts per million. Test this at
a significance level of α = 0.01.
Answer
We have H0: µ ≥ 20, and Ha : µ < 20.
Computations reveal that x̄ = 17.17, s = 2.98. Thus, the t-statistic is
t = (x̄ - 20)/(s/√n) = (17.17 - 20)/(2.98/√10) = -3.00.
For the t-table, the number of degrees of freedom is ν = n-1 = 9, so for the one-tailed test,
we use
t_0.01 = 2.821. (In Excel: =TINV(0.02,9).)
Since t = -3.00 < -2.821, t falls in the rejection region, and we can reject H0 at this level, so the auto manufacturer
can claim that the engines meet the standard of less than 20 parts per million at the 0.01
significance level (99% confidence).
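The "computations" in this answer can be reproduced from the raw data. In this sketch the critical value 2.821 is typed in from the t-table, since the Python standard library does not include the t-distribution.

```python
from math import sqrt
from statistics import mean, stdev

emissions = [15.6, 16.2, 22.5, 20.5, 16.4, 19.4, 16.6, 17.9, 12.7, 13.9]

n = len(emissions)
xbar = mean(emissions)
s = stdev(emissions)              # sample standard deviation (divides by n - 1)
t = (xbar - 20) / (s / sqrt(n))   # test statistic for H0: mu >= 20

t_crit = 2.821                    # t_0.01 with 9 degrees of freedom, from the t-table
reject = t < -t_crit              # left-tailed rejection region

print(round(xbar, 2), round(s, 2), round(t, 2), reject)
```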
Q What about p for this test?
A Since t = -3.0 we look at the ν = 9 row of the t-table to find the value closest to 3.0, and
we find p ≈ 0.0075. In other words, we can also reject H0 at the 0.0075 significance level
(99.25% confidence) if we wanted to.
Homework
Finish up Exercises on Previous Section, and
p. 350 # 34, 38
Topic 15
Confidence Intervals and Hypothesis Testing for the Proportion
(Sections 8.4 and 9.6 in the text)
Suppose you are interested in the percentage of the population that uses Wishy-Washy
detergent. Your market research people conduct a telephone survey of 200 domestic
workers and find that 32 of them, or 16% of them use Wishy-Washy.
Q1 What is a 95% confidence interval for the proportion of the whole population that uses
Wishy-Washy?
To answer the question, let us assume that the proportion p of the population actually uses
the product. (We express p as a decimal; 0 ≤ p ≤ 1). This is the population parameter. We
can phrase the scenario in terms of a binomial random variable:
x̄ = (Number of people using Wishy-Washy)/n = 32/200 = 0.16 in our example.
This is an estimate of the population parameter p, and we call it p̂. Thus, p̂ = 0.16.
Similarly, the population mean µ of X is just p, the actual proportion of the population that
uses Wishy-Washy. In this way, finding a confidence interval for p amounts to nothing
more than finding a confidence interval for a population mean. All we need are:
σ = √(p(1-p)).
Thus, by the Central Limit Theorem, the standard deviation of x̄ = p̂ for large samples is
approximately
σ_p̂ = √(p(1-p)/n),
where n is the sample size (200 in our example). However, we don’t know what p actually is,
so we use the approximation
σ_p̂ ≈ √(p̂(1-p̂)/n).
The normal approximation is good if the interval p̂ ± 3σ_p̂ does not include 0 or 1.
The (1-α) confidence interval for p is then
p̂ ± z_{α/2} √(p̂(1-p̂)/n),
where p̂ = x/n.
Acid Test: The formula is valid if the interval p̂ ± 3σ_p̂ does not include 0 or 1, where
σ_p̂ ≈ √(p̂(1-p̂)/n).
Example 1 Let us find a 95% CI for the actual percentage of people who use Wishy-
Washy (done in class)
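For reference, here is a sketch of the calculation for the Wishy-Washy survey (32 users out of 200), including the Acid Test. It uses the table value 1.96 for 95% confidence; the variable names are ours.

```python
from math import sqrt

x, n, z = 32, 200, 1.96          # users in sample, sample size, z for 95% confidence

p_hat = x / n
se = sqrt(p_hat * (1 - p_hat) / n)   # estimated standard deviation of p-hat

# Acid Test: p-hat plus or minus 3 standard deviations must stay strictly inside (0, 1)
acid_ok = 0 < p_hat - 3 * se and p_hat + 3 * se < 1

low, high = p_hat - z * se, p_hat + z * se
print(acid_ok, round(low, 3), round(high, 3))
```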
Q OK Fine, but even when n is large, the Acid Test may fail if p is very close to 0 or 1 (e.g.
as in the chance of being killed in an auto accident). When that happens, we use the
“Wilson” estimator of p instead of p̂. This is given by
p̃ = (x + 2)/(n + 4),
and the confidence interval becomes
p̃ ± z_{α/2} √(p̃(1-p̃)/(n+4)).
Example 2
In a sample of 200 Americans, 3 were victims of violent crime. Estimate the true proportion
of Americans who were victims of violent crime using a 95% CI.
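A sketch of how this example might be worked follows. It first shows that the Acid Test fails for the ordinary estimate, then uses the adjusted estimate. It assumes the usual form of the adjusted ("Wilson") estimate, p̃ = (x + 2)/(n + 4), together with n + 4 in the standard deviation; treat that form as an assumption of this sketch and check it against the book.

```python
from math import sqrt

x, n, z = 3, 200, 1.96

# Ordinary estimate and Acid Test
p_hat = x / n
se_hat = sqrt(p_hat * (1 - p_hat) / n)
acid_ok = 0 < p_hat - 3 * se_hat and p_hat + 3 * se_hat < 1  # fails: lower end is below 0

# Adjusted estimate (assumed form: add 2 successes and 4 trials)
p_tilde = (x + 2) / (n + 4)
se_tilde = sqrt(p_tilde * (1 - p_tilde) / (n + 4))
low, high = p_tilde - z * se_tilde, p_tilde + z * se_tilde

print(acid_ok, round(low, 4), round(high, 4))
```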
Q2 OK Now I know how to find CIs for population proportions. What about doing some
hypothesis testing?
A Since we already have everything we need, we can give the following procedure:
H0 : either p = p0, p ≥ p0, or p ≤ p0
Ha: either: p ≠ p0, p < p0, or p > p0 as usual
Test statistic: z = (p̂ - p0)/σ_p0,
where σ_p0 ≈ √(p0(1-p0)/n).
Example 3
That battery manufacturer must show that fewer than 5% of its batteries are defective. It
tests 300 and finds 10 defective ones. Can the manufacturer rest assured that the proportion of
defectives is less than 5%? (Test at the 0.05 significance level, that is, 95% confidence.)
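One way to set up this test is sketched below. It assumes the test statistic z = (p̂ - p0)/σ_p0 with σ_p0 = √(p0(1-p0)/n), takes H0: p ≥ 0.05 against Ha: p < 0.05 (a left-tailed test), and uses the table value 1.645.

```python
from math import sqrt

x, n, p0 = 10, 300, 0.05     # defectives found, sample size, hypothesized proportion
z_alpha = 1.645              # table value for alpha = 0.05

p_hat = x / n
sigma_p0 = sqrt(p0 * (1 - p0) / n)   # standard deviation of p-hat if p = p0
z = (p_hat - p0) / sigma_p0

reject = z < -z_alpha        # left-tailed rejection region
print(round(p_hat, 4), round(z, 2), reject)
```

Under these assumptions z is not in the rejection region, so although the sample proportion is below 5%, this sample alone would not let the manufacturer claim at this level that fewer than 5% are defective.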
Exercises 13
p. 309 #31, 38
p. 357 #44, 46
Table 1: Normal Distribution Probabilities: P(Z ≤ z)
Negative z
z 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09
-0.0 0.50000 0.49601 0.49202 0.48803 0.48405 0.48006 0.47608 0.47210 0.46812 0.46414
-0.1 0.46017 0.45620 0.45224 0.44828 0.44433 0.44038 0.43644 0.43251 0.42858 0.42465
-0.2 0.42074 0.41683 0.41294 0.40905 0.40517 0.40129 0.39743 0.39358 0.38974 0.38591
-0.3 0.38209 0.37828 0.37448 0.37070 0.36693 0.36317 0.35942 0.35569 0.35197 0.34827
-0.4 0.34458 0.34090 0.33724 0.33360 0.32997 0.32636 0.32276 0.31918 0.31561 0.31207
-0.5 0.30854 0.30503 0.30153 0.29806 0.29460 0.29116 0.28774 0.28434 0.28096 0.27760
-0.6 0.27425 0.27093 0.26763 0.26435 0.26109 0.25785 0.25463 0.25143 0.24825 0.24510
-0.7 0.24196 0.23885 0.23576 0.23270 0.22965 0.22663 0.22363 0.22065 0.21770 0.21476
-0.8 0.21186 0.20897 0.20611 0.20327 0.20045 0.19766 0.19489 0.19215 0.18943 0.18673
-0.9 0.18406 0.18141 0.17879 0.17619 0.17361 0.17106 0.16853 0.16602 0.16354 0.16109
-1.0 0.15866 0.15625 0.15386 0.15151 0.14917 0.14686 0.14457 0.14231 0.14007 0.13786
-1.1 0.13567 0.13350 0.13136 0.12924 0.12714 0.12507 0.12302 0.12100 0.11900 0.11702
-1.2 0.11507 0.11314 0.11123 0.10935 0.10749 0.10565 0.10383 0.10204 0.10027 0.09853
-1.3 0.09680 0.09510 0.09342 0.09176 0.09012 0.08851 0.08692 0.08534 0.08379 0.08226
-1.4 0.08076 0.07927 0.07780 0.07636 0.07493 0.07353 0.07215 0.07078 0.06944 0.06811
-1.5 0.06681 0.06552 0.06426 0.06301 0.06178 0.06057 0.05938 0.05821 0.05705 0.05592
-1.6 0.05480 0.05370 0.05262 0.05155 0.05050 0.04947 0.04846 0.04746 0.04648 0.04551
-1.7 0.04457 0.04363 0.04272 0.04182 0.04093 0.04006 0.03920 0.03836 0.03754 0.03673
-1.8 0.03593 0.03515 0.03438 0.03362 0.03288 0.03216 0.03144 0.03074 0.03005 0.02938
-1.9 0.02872 0.02807 0.02743 0.02680 0.02619 0.02559 0.02500 0.02442 0.02385 0.02330
-2.0 0.02275 0.02222 0.02169 0.02118 0.02068 0.02018 0.01970 0.01923 0.01876 0.01831
-2.1 0.01786 0.01743 0.01700 0.01659 0.01618 0.01578 0.01539 0.01500 0.01463 0.01426
-2.2 0.01390 0.01355 0.01321 0.01287 0.01255 0.01222 0.01191 0.01160 0.01130 0.01101
-2.3 0.01072 0.01044 0.01017 0.00990 0.00964 0.00939 0.00914 0.00889 0.00866 0.00842
-2.4 0.00820 0.00798 0.00776 0.00755 0.00734 0.00714 0.00695 0.00676 0.00657 0.00639
-2.5 0.00621 0.00604 0.00587 0.00570 0.00554 0.00539 0.00523 0.00508 0.00494 0.00480
-2.6 0.00466 0.00453 0.00440 0.00427 0.00415 0.00402 0.00391 0.00379 0.00368 0.00357
-2.7 0.00347 0.00336 0.00326 0.00317 0.00307 0.00298 0.00289 0.00280 0.00272 0.00264
-2.8 0.00256 0.00248 0.00240 0.00233 0.00226 0.00219 0.00212 0.00205 0.00199 0.00193
-2.9 0.00187 0.00181 0.00175 0.00169 0.00164 0.00159 0.00154 0.00149 0.00144 0.00139
-3.0 0.00135 0.00131 0.00126 0.00122 0.00118 0.00114 0.00111 0.00107 0.00104 0.00100
-3.1 0.00097 0.00094 0.00090 0.00087 0.00084 0.00082 0.00079 0.00076 0.00074 0.00071
-3.2 0.00069 0.00066 0.00064 0.00062 0.00060 0.00058 0.00056 0.00054 0.00052 0.00050
-3.3 0.00048 0.00047 0.00045 0.00043 0.00042 0.00040 0.00039 0.00038 0.00036 0.00035
-3.4 0.00034 0.00032 0.00031 0.00030 0.00029 0.00028 0.00027 0.00026 0.00025 0.00024
-3.5 0.00023 0.00022 0.00022 0.00021 0.00020 0.00019 0.00019 0.00018 0.00017 0.00017
-3.6 0.00016 0.00015 0.00015 0.00014 0.00014 0.00013 0.00013 0.00012 0.00012 0.00011
-3.7 0.00011 0.00010 0.00010 0.00010 0.00009 0.00009 0.00008 0.00008 0.00008 0.00008
-3.8 0.00007 0.00007 0.00007 0.00006 0.00006 0.00006 0.00006 0.00005 0.00005 0.00005
-3.9 0.00005 0.00005 0.00004 0.00004 0.00004 0.00004 0.00004 0.00004 0.00003 0.00003
Positive z
z 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09
0.0 0.50000 0.50399 0.50798 0.51197 0.51595 0.51994 0.52392 0.52790 0.53188 0.53586
0.1 0.53983 0.54380 0.54776 0.55172 0.55567 0.55962 0.56356 0.56749 0.57142 0.57535
0.2 0.57926 0.58317 0.58706 0.59095 0.59483 0.59871 0.60257 0.60642 0.61026 0.61409
0.3 0.61791 0.62172 0.62552 0.62930 0.63307 0.63683 0.64058 0.64431 0.64803 0.65173
0.4 0.65542 0.65910 0.66276 0.66640 0.67003 0.67364 0.67724 0.68082 0.68439 0.68793
0.5 0.69146 0.69497 0.69847 0.70194 0.70540 0.70884 0.71226 0.71566 0.71904 0.72240
0.6 0.72575 0.72907 0.73237 0.73565 0.73891 0.74215 0.74537 0.74857 0.75175 0.75490
0.7 0.75804 0.76115 0.76424 0.76730 0.77035 0.77337 0.77637 0.77935 0.78230 0.78524
0.8 0.78814 0.79103 0.79389 0.79673 0.79955 0.80234 0.80511 0.80785 0.81057 0.81327
0.9 0.81594 0.81859 0.82121 0.82381 0.82639 0.82894 0.83147 0.83398 0.83646 0.83891
1.0 0.84134 0.84375 0.84614 0.84849 0.85083 0.85314 0.85543 0.85769 0.85993 0.86214
1.1 0.86433 0.86650 0.86864 0.87076 0.87286 0.87493 0.87698 0.87900 0.88100 0.88298
1.2 0.88493 0.88686 0.88877 0.89065 0.89251 0.89435 0.89617 0.89796 0.89973 0.90147
1.3 0.90320 0.90490 0.90658 0.90824 0.90988 0.91149 0.91308 0.91466 0.91621 0.91774
1.4 0.91924 0.92073 0.92220 0.92364 0.92507 0.92647 0.92785 0.92922 0.93056 0.93189
1.5 0.93319 0.93448 0.93574 0.93699 0.93822 0.93943 0.94062 0.94179 0.94295 0.94408
1.6 0.94520 0.94630 0.94738 0.94845 0.94950 0.95053 0.95154 0.95254 0.95352 0.95449
1.7 0.95543 0.95637 0.95728 0.95818 0.95907 0.95994 0.96080 0.96164 0.96246 0.96327
1.8 0.96407 0.96485 0.96562 0.96638 0.96712 0.96784 0.96856 0.96926 0.96995 0.97062
1.9 0.97128 0.97193 0.97257 0.97320 0.97381 0.97441 0.97500 0.97558 0.97615 0.97670
2.0 0.97725 0.97778 0.97831 0.97882 0.97932 0.97982 0.98030 0.98077 0.98124 0.98169
2.1 0.98214 0.98257 0.98300 0.98341 0.98382 0.98422 0.98461 0.98500 0.98537 0.98574
2.2 0.98610 0.98645 0.98679 0.98713 0.98745 0.98778 0.98809 0.98840 0.98870 0.98899
2.3 0.98928 0.98956 0.98983 0.99010 0.99036 0.99061 0.99086 0.99111 0.99134 0.99158
2.4 0.99180 0.99202 0.99224 0.99245 0.99266 0.99286 0.99305 0.99324 0.99343 0.99361
2.5 0.99379 0.99396 0.99413 0.99430 0.99446 0.99461 0.99477 0.99492 0.99506 0.99520
2.6 0.99534 0.99547 0.99560 0.99573 0.99585 0.99598 0.99609 0.99621 0.99632 0.99643
2.7 0.99653 0.99664 0.99674 0.99683 0.99693 0.99702 0.99711 0.99720 0.99728 0.99736
2.8 0.99744 0.99752 0.99760 0.99767 0.99774 0.99781 0.99788 0.99795 0.99801 0.99807
2.9 0.99813 0.99819 0.99825 0.99831 0.99836 0.99841 0.99846 0.99851 0.99856 0.99861
3.0 0.99865 0.99869 0.99874 0.99878 0.99882 0.99886 0.99889 0.99893 0.99896 0.99900
3.1 0.99903 0.99906 0.99910 0.99913 0.99916 0.99918 0.99921 0.99924 0.99926 0.99929
3.2 0.99931 0.99934 0.99936 0.99938 0.99940 0.99942 0.99944 0.99946 0.99948 0.99950
3.3 0.99952 0.99953 0.99955 0.99957 0.99958 0.99960 0.99961 0.99962 0.99964 0.99965
3.4 0.99966 0.99968 0.99969 0.99970 0.99971 0.99972 0.99973 0.99974 0.99975 0.99976
3.5 0.99977 0.99978 0.99978 0.99979 0.99980 0.99981 0.99981 0.99982 0.99983 0.99983
3.6 0.99984 0.99985 0.99985 0.99986 0.99986 0.99987 0.99987 0.99988 0.99988 0.99989
3.7 0.99989 0.99990 0.99990 0.99990 0.99991 0.99991 0.99992 0.99992 0.99992 0.99992
3.8 0.99993 0.99993 0.99993 0.99994 0.99994 0.99994 0.99994 0.99995 0.99995 0.99995
3.9 0.99995 0.99995 0.99996 0.99996 0.99996 0.99996 0.99996 0.99996 0.99997 0.99997
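For values of z that fall between table entries, the probabilities P(Z ≤ z) in Table 1 can also be computed directly. The following is a minimal sketch using the error function from Python's standard library. The function name `normal_cdf` is made up for this illustration.

```python
from math import erf, sqrt


def normal_cdf(z):
    """P(Z <= z) for a standard normal Z, via the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


# Spot checks against rows of Table 1 (z = -1.96, 0.00, and 1.00).
for z in (-1.96, 0.0, 1.0):
    print(f"P(Z <= {z:5.2f}) = {normal_cdf(z):.5f}")
```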
t-Statistic: critical values t_a for upper-tail area a. Excel: =TINV(2*a,df)
df a=0.1 a=0.05 a=0.01 a=0.025 a=0.005
1 3.078 6.314 31.821 12.706 63.656
2 1.886 2.920 6.965 4.303 9.925
3 1.638 2.353 4.541 3.182 5.841
4 1.533 2.132 3.747 2.776 4.604
5 1.476 2.015 3.365 2.571 4.032
6 1.440 1.943 3.143 2.447 3.707
7 1.415 1.895 2.998 2.365 3.499
8 1.397 1.860 2.896 2.306 3.355
9 1.383 1.833 2.821 2.262 3.250
10 1.372 1.812 2.764 2.228 3.169
11 1.363 1.796 2.718 2.201 3.106
12 1.356 1.782 2.681 2.179 3.055
13 1.350 1.771 2.650 2.160 3.012
14 1.345 1.761 2.624 2.145 2.977
15 1.341 1.753 2.602 2.131 2.947
16 1.337 1.746 2.583 2.120 2.921
17 1.333 1.740 2.567 2.110 2.898
18 1.330 1.734 2.552 2.101 2.878
19 1.328 1.729 2.539 2.093 2.861
20 1.325 1.725 2.528 2.086 2.845
21 1.323 1.721 2.518 2.080 2.831
22 1.321 1.717 2.508 2.074 2.819
23 1.319 1.714 2.500 2.069 2.807
24 1.318 1.711 2.492 2.064 2.797
25 1.316 1.708 2.485 2.060 2.787
26 1.315 1.706 2.479 2.056 2.779
27 1.314 1.703 2.473 2.052 2.771
28 1.313 1.701 2.467 2.048 2.763
29 1.311 1.699 2.462 2.045 2.756
30 1.310 1.697 2.457 2.042 2.750
31 1.309 1.696 2.453 2.040 2.744
32 1.309 1.694 2.449 2.037 2.738
33 1.308 1.692 2.445 2.035 2.733
34 1.307 1.691 2.441 2.032 2.728
35 1.306 1.690 2.438 2.030 2.724
40 1.303 1.684 2.423 2.021 2.704
45 1.301 1.679 2.412 2.014 2.690
50 1.299 1.676 2.403 2.009 2.678
75 1.293 1.665 2.377 1.992 2.643
100 1.290 1.660 2.364 1.984 2.626
200 1.286 1.653 2.345 1.972 2.601
1000 1.282 1.646 2.330 1.962 2.581