Chapter 2
PROBABILITY
1. Basic Definitions
By the term experiment, we mean any well-defined procedure that produces an observable outcome. In
this course, we are interested in experiments whose outcomes cannot be perfectly predicted in
advance. We refer to these as random experiments.
Random Experiment:
An experiment that can result in different outcomes even when repeated in exactly the same way is
called a random experiment. Each single performance of a random experiment is called a trial. So, for
example, flipping a coin or generating a random number would be a trial.
Sample Space: The set of all possible outcomes of a random experiment is called the sample space
associated with that experiment. We denote the sample space of an experiment by 𝑆.
EXAMPLES
𝑆 = {𝐻, 𝑇}
𝑆 = {(1, 1), (1, 2), (1, 3), (1, 4), (1, 5), (1, 6),
(2, 1), (2, 2), (2, 3), (2, 4), (2, 5), (2, 6),
(3, 1), (3, 2), (3, 3), (3, 4), (3, 5), (3, 6),
… … … … … … … … … … … … … … … … ….
(6, 1), (6, 2), (6, 3), (6, 4), (6, 5), (6, 6)}
Example 6. A bus is known to arrive at a bus stop any time between 10:00 am and 10:30 am. A person
arrives at this bus stop at 10:00. The experiment is to note the time of arrival of the bus past 10:00 am.
𝑆 = {𝑡|0 ≤ 𝑡 ≤ 30}
The first four sample spaces are finite; they contain two, four, eight, and thirty-six elements,
respectively. The fifth and sixth sample spaces are infinite.
We say a set is countably infinite if it can be put into a one-to-one correspondence with the set of
integers. Thus, the sample space in Example 5 is countably infinite. We say a set is uncountably
infinite if it is infinite but no such correspondence exists; every interval on the real line is of this
kind. Thus, the sample space of Example 6 is uncountably infinite.
A sample space that is finite or countably infinite is called discrete and a sample space that is
uncountably infinite is called continuous. Thus, the sample spaces we have in examples 1, 2, 3, 4, and 5
are discrete. The sample space of example 6 is continuous.
Event: Any subset of a sample space is called an event. Events are denoted by capital letters, $E, F, \ldots$ or
$E_1, E_2, \ldots$
So, if we think of a sample space as a universal set, we can think of an event as a set. With this analogy,
we can use Venn diagrams to depict events.
The Union of Two Events $E_1, E_2$, denoted $E_1 \cup E_2$, is the event “either $E_1$ occurs or $E_2$ occurs (or both).”
The Intersection of Two Events $E_1, E_2$, denoted $E_1 \cap E_2$, is the event “both $E_1$ and $E_2$ occur.”
If there is no intersection, i.e., if $E_1 \cap E_2 = \emptyset$, we say the events $E_1, E_2$ are mutually exclusive (or
disjoint).
An experiment consists of selecting three components, each of which can be defective ($d$) or
non-defective ($n$). In this case, the sample space is
$S = \{ddd, ddn, dnd, dnn, ndd, ndn, nnd, nnn\}$
Then,
$E' \cap F = \{nnn\}$
Below are some important properties of the set operations, unions, intersections, and complements:
I. COMMUTATIVITY
𝐸∪𝐹 =𝐹∪𝐸
𝐸∩𝐹 =𝐹∩𝐸
II. ASSOCIATIVITY
IV. COMPLEMENTATION
$(E')' = E$
$\emptyset' = S$
$S' = \emptyset$
V. De Morgan’s Laws
$(E \cup F)' = E' \cap F'$
$(E \cap F)' = E' \cup F'$
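The complementation rules and De Morgan's laws can be checked mechanically on a small sample space. The following is a minimal Python sketch, assuming the three-component sample space from the example above; the events `E` and `F` are hypothetical choices made only for illustration, not the events defined in these notes.

```python
from itertools import product

# Sample space: three components, each defective (d) or non-defective (n).
S = {"".join(p) for p in product("dn", repeat=3)}

# Hypothetical events, chosen only for illustration.
E = {s for s in S if s[0] == "d"}          # first component is defective
F = {s for s in S if s.count("d") <= 1}    # at most one defective component

def complement(event):
    """Complement of an event relative to the sample space S."""
    return S - event

# De Morgan's laws
assert complement(E | F) == complement(E) & complement(F)
assert complement(E & F) == complement(E) | complement(F)

# Complementation
assert complement(complement(E)) == E
assert complement(set()) == S
assert complement(S) == set()

print(len(S))  # number of outcomes in the sample space
```

Python's set operators `|`, `&`, and `-` play the roles of union, intersection, and complement relative to `S`.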
2. Counting Techniques
As we will see later, in case of finite sample spaces, computing probabilities reduces to counting the
number of elements in an event. To this end, we will learn certain counting techniques.
(ii) Factorial
(v) Combinations
Suppose a procedure requires $k$ steps, and that step 1 can be completed in $n_1$ ways, step 2 can be
completed in $n_2$ ways, …, and step $k$ can be completed in $n_k$ ways. Then, the entire procedure can be
completed in
$n_1 \times n_2 \times \cdots \times n_k$
ways.
EXAMPLES
Example 1. New designs for a treatment tank have proposed three possible shapes, four possible sizes,
three locations for input valves, and five locations for output valves. How many different product
designs are possible?
3 × 4 × 3 × 5 = 180
possible designs.
Example 2. If a state uses three letters and four digits in its license plates, how many different license
plates can be made?
(26)(26)(26)(10)(10)(10)(10) = 175,760,000
Example 3. Using the digits 1, 2, 3, and 5, how many 4-digit numbers can be formed if
a) The first digit must be 1 and repetition of the digits is allowed?
(1)(4)(4)(4) = 64
b) The first digit must be 1 and repetition of the digits is not allowed?
(1)(3)(2)(1) = 6
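c) The number must be divisible by 2 and repetition of the digits is allowed?
If the number is divisible by 2, then the last digit must be an even number. The only even digit available is 2.
(4)(4)(4)(1) = 64
d) The number must be divisible by 2 and repetition of the digits is not allowed?
(3)(2)(1)(1) = 6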
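The multiplication principle can be cross-checked by listing the outcomes directly. The sketch below is illustrative only: it multiplies the number of choices per step for Examples 1 and 2, and enumerates the 4-digit numbers of Example 3 a) and b) with `itertools`. The variable names are assumptions introduced here.

```python
from itertools import permutations, product
from math import prod

# Example 1: one factor per design choice (shape, size, input valve, output valve).
designs = prod([3, 4, 3, 5])

# Example 2: three letters followed by four digits.
plates = prod([26] * 3 + [10] * 4)

digits = "1235"

# Example 3 a): first digit is 1, repetition allowed.
with_repetition = [n for n in product(digits, repeat=4) if n[0] == "1"]

# Example 3 b): first digit is 1, repetition not allowed.
without_repetition = [n for n in permutations(digits, 4) if n[0] == "1"]

print(designs, plates, len(with_repetition), len(without_repetition))
# prints: 180 175760000 64 6
```

Enumeration is only practical for small cases; its value here is that it confirms the product formula without relying on it.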
(ii) Factorial
𝑛! = 1 × 2 × ⋯ × 𝑛
We define 0! = 1.
To compute 𝑛! on your calculator, enter 𝑛, then go to MATH → PRB, choose option 4 (which is !), and hit
ENTER twice.
Example 1. Suppose nine students are to be seated in a row. How many different ways can this be
done?
9! = 362,880
In problems involving the selection of a number of items from a larger group of items, the question of
the order of selection becomes important. For example, selecting two items out of three, we have three
choices if order is not important, but six if order is important.
10!
b) Assume that books numbered 5 and 6 should stay together. Book 6 should be placed after Book
5. How many arrangements are there?
For this part, let’s consider books 5 and 6 as one unit, glued together in that order. Now we have
9 units to arrange, so there are
9!
arrangements.
c) Assume that books numbered 5 and 6 should stay together. How many different arrangements
are there?
For this part, again consider books 5 and 6 together as one unit, which gives 9! arrangements as in
the previous part. But within that unit, books 5 and 6 can themselves be arranged in 2! ways. By the
multiplication principle, the total is
9! × 2!
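The "glue the books together" argument can be checked by brute force on a smaller shelf. This is a sketch under stated assumptions: a hypothetical shelf of 5 books, with books 2 and 3 playing the roles of books 5 and 6, so the expected counts are 4! and 4! × 2!.

```python
from itertools import permutations
from math import factorial

books = [1, 2, 3, 4, 5]  # hypothetical smaller shelf, for illustration only

def adjacent_in_order(arrangement, first, second):
    """True if `second` sits immediately after `first` in the arrangement."""
    i = arrangement.index(first)
    return i + 1 < len(arrangement) and arrangement[i + 1] == second

all_arrangements = list(permutations(books))

# Analogue of part b): book 3 immediately after book 2.
ordered = [a for a in all_arrangements if adjacent_in_order(a, 2, 3)]

# Analogue of part c): books 2 and 3 next to each other, in either order.
together = [a for a in all_arrangements
            if adjacent_in_order(a, 2, 3) or adjacent_in_order(a, 3, 2)]

assert len(all_arrangements) == factorial(5)
assert len(ordered) == factorial(4)
assert len(together) == factorial(4) * factorial(2)
```

The same reasoning scales to the 10-book problem, where full enumeration would mean listing 10! = 3,628,800 arrangements.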
When selecting $r$ items from a group of $n$ distinct items ($0 \le r \le n$), if order is important, the number of
selections is called a permutation and is denoted as $P_r^n$.
$P_r^n = \frac{n!}{(n - r)!}$
$P_0^n = 1$
$P_n^n = n!$
EXAMPLES
Example 1. On a page with nine locations, in how many ways can four distinct pictures be placed?
Clearly, order matters – pictures placed differently will give us different pages. Thus, the required
number is
$P_4^9 = \frac{9!}{5!} = 3024$
To compute $P_r^n$ on your calculator, enter $n$, then go to MATH → PRB, choose option 2 (which is nPr), then
enter $r$ and hit ENTER.
Example 2. Ten athletes compete in a race. How many different first three place finishes are there?
Obviously, order is important: ABC and BAC are two different finishes. Hence, the required number is
$P_3^{10} = 720$
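For readers without the calculator, the two permutation examples can be reproduced in Python. A minimal sketch: `n_permutations` is a helper name introduced here, while `math.perm` is the standard-library function (Python 3.8 and later) that computes the same quantity.

```python
from itertools import permutations
from math import factorial, perm

def n_permutations(n, r):
    """Number of ordered selections of r items from n distinct items: n! / (n - r)!."""
    return factorial(n) // factorial(n - r)

assert n_permutations(9, 4) == perm(9, 4) == 3024    # four pictures in nine locations
assert n_permutations(10, 3) == perm(10, 3) == 720   # first three places among ten athletes

# Brute-force check of the race example: ordered triples of distinct athletes.
finishes = list(permutations(range(10), 3))
assert len(finishes) == 720

# Special cases
assert n_permutations(7, 0) == 1
assert n_permutations(7, 7) == factorial(7)
```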
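Recall that if all $n$ items are distinct, the number of different arrangements is $n!$. However, if we have a
group of $n$ objects where $n_1$ are of one kind, $n_2$ are of a second kind, …, and $n_k$ are of a $k$-th kind, then what
happens to the number of different arrangements?
For example, if we have three distinct objects a, b, and c, there are 3! = 6 distinct arrangements. If two
of the objects are identical, say a, a, and c, swapping the two a's does not produce a new arrangement, so
only 3!/2! = 3 distinct arrangements remain. In general, with $n = n_1 + n_2 + \cdots + n_k$, the number of
distinct arrangements is
$\frac{n!}{n_1! \times n_2! \times \cdots \times n_k!}$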
EXAMPLES
Example 1. How many different barcodes can be formed using three thick, four medium, and six thin
lines?
Here $n_1 = 3$, $n_2 = 4$, $n_3 = 6$, and $n = 13$. Thus, the required number is
$\frac{13!}{3! \times 4! \times 6!} = 60{,}060$
Example 2. A hospital needs to schedule five knee surgeries, two hip surgeries, and four shoulder
surgeries in one day. How many schedules are possible?
Here $n_1 = 5$, $n_2 = 2$, $n_3 = 4$, and $n = 11$. Thus, the required number is
$\frac{11!}{5! \times 2! \times 4!} = 6930$
Example 3. How many different arrangements of the letters of the word MISSISSIPPI are there?
Let $n_1$ be the number of M’s, $n_2$ the number of I’s, $n_3$ the number of S’s, and $n_4$ the number of P’s.
Then, $n_1 = 1$, $n_2 = 4$, $n_3 = 4$, $n_4 = 2$, and $n = 11$. Thus, the number of different arrangements is
$\frac{11!}{1! \times 4! \times 4! \times 2!} = 34{,}650$
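The three examples above all divide $n!$ by the factorials of the group sizes. A short Python sketch of that computation follows; the function name `arrangements` is an assumption introduced here, not notation from these notes.

```python
from collections import Counter
from math import factorial

def arrangements(counts):
    """Distinct arrangements of n objects split into groups of identical objects.

    `counts` holds the group sizes n_1, ..., n_k; the result is n! / (n_1! ... n_k!).
    """
    counts = list(counts)
    total = factorial(sum(counts))
    for c in counts:
        total //= factorial(c)
    return total

barcodes = arrangements([3, 4, 6])        # thick, medium, thin lines
schedules = arrangements([5, 2, 4])       # knee, hip, shoulder surgeries
mississippi = arrangements(Counter("MISSISSIPPI").values())

print(barcodes, schedules, mississippi)
# prints: 60060 6930 34650
```

`Counter` tallies the letter frequencies of the word, so the MISSISSIPPI case needs no manual counting.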
(v) Combinations
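When selecting $r$ items from a group of $n$ distinct items ($0 \le r \le n$), if order is not important, the
number of selections is called a combination and is denoted as $\binom{n}{r}$.
This number cannot exceed the number of permutations. By what factor do the two differ? Each unordered
selection of $r$ items can be ordered in $r!$ ways, so the number of permutations is $r!$ times the number of
combinations. Hence,
$\binom{n}{r} = \frac{n!}{r!\,(n - r)!}$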
Some Special Cases
$\binom{n}{0} = 1$
$\binom{n}{n} = 1$
Again, this is why it makes sense to define 0! = 1.
EXAMPLES
Example 1. From a group of 12 people, how many different committees of three can be chosen?
Here ABC and BCA are the same group of three people, so order does not matter. Thus, the number of
choices is
$\binom{12}{3} = \frac{12!}{3! \times 9!} = 220$
To compute $\binom{n}{r}$ on your calculator, enter $n$, then go to MATH → PRB, choose option 3 (which is nCr), then
enter $r$ and hit ENTER.
Example 2. From a group of 20 items, five of which are defective, in how many ways can we choose five
items such that exactly two are defective?
In this case, we want to choose two from the group of five defectives and three from the group of 15
non-defectives. Clearly, order does not matter. Thus, the required number is
$\binom{5}{2} \times \binom{15}{3} = 4550$
Note that we are also applying the multiplication principle in this problem.
Example 3. (Hoosier Lottery) When you buy a Powerball ticket, you select 5 different white numbers
from among the numbers 1 through 59 (order of selection does not matter), and one red number from
among the numbers 1 through 35. How many different Powerball tickets can you buy?
Combinations of white numbers: $\binom{59}{5}$
Combinations of red numbers: $\binom{35}{1}$
Total number of different tickets: $\binom{59}{5} \binom{35}{1}$
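The combination examples can be evaluated with the standard-library function `math.comb` (Python 3.8 and later). The sketch below is illustrative; the variable names are introduced here, and the last check confirms that combinations and permutations differ by a factor of $r!$.

```python
from math import comb, factorial, perm

committees = comb(12, 3)                   # Example 1: committees of three from 12 people
two_defective = comb(5, 2) * comb(15, 3)   # Example 2: two defectives and three non-defectives
tickets = comb(59, 5) * comb(35, 1)        # Example 3: white numbers times red numbers

print(committees, two_defective, tickets)
# prints: 220 4550 175223510

# Combinations are permutations with the r! orderings divided out.
assert comb(12, 3) == perm(12, 3) // factorial(3)
```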
Introduction to Probability
1. AXIOMS OF PROBABILITY
An axiom or postulate is a statement that is taken to be true, to serve as a premise or starting point for
further reasoning and arguments. The word comes from the Greek axíōma meaning “something that is
evident”. An axiomatic system is any set of axioms from which we derive all other theorems.
An axiomatic system is said to be consistent if it lacks contradiction. That is, it is impossible to derive both
a statement and its denial from the system's axioms.
In an axiomatic system, an axiom is called independent if it is not a theorem that can be derived from
other axioms in the system. A system is called independent if each of its underlying axioms is independent.
An axiomatic system is called complete if for every statement, either itself or its negation is derivable from
the system's axioms.
Axioms of Probability
Let 𝐸 be any event. We will call a number 𝑃(𝐸) the probability of 𝑬, if it satisfies the following:
Axiom 1. 𝑃(𝑆) = 1
Axiom 3. If $E_1, E_2$ are mutually exclusive, then $P(E_1 \cup E_2) = P(E_1) + P(E_2)$
Definition. The outcomes of a sample space are called equally likely if all of them have the same chance
of occurring.
Theorem 1. Let $S$ be a finite sample space of $n$ equally likely outcomes. Then the probability of each
outcome is $\frac{1}{n}$.
Theorem 3. Let $S$ be a finite sample space of $n$ equally likely outcomes. Let $E$ be an event with $m$
elements. Then $P(E) = \frac{m}{n}$.
Theorem 4. 𝑃(∅) = 0
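This result can be extended to any number of sets. For example, if we have three sets $E, F, H$, then
$P(E \cup F \cup H) = P(E) + P(F) + P(H) - P(E \cap F) - P(E \cap H) - P(F \cap H) + P(E \cap F \cap H)$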
Find
(a) $P(A')$
$P(A') = 1 - P(A) = 1 - 0.3 = 0.7$
(b) 𝑃(𝐴 ∪ 𝐵)
𝑃(𝐴 ∪ 𝐵) = 𝑃(𝐴) + 𝑃(𝐵) − 𝑃(𝐴 ∩ 𝐵) = 0.3 + 0.2 − 0.1 = 0.4
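(c) $P(A' \cap B)$
Note that $B$ can be written as a union of two disjoint sets
$B = (A \cap B) \cup (A' \cap B)$
Thus, $P(B) = P(A \cap B) + P(A' \cap B)$, and so
$P(A' \cap B) = P(B) - P(A \cap B) = 0.2 - 0.1 = 0.1$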
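The complement rule, the addition rule, and the split of $B$ into two disjoint pieces can all be verified by counting on a sample space of equally likely outcomes. The sketch below assumes the two-dice sample space of 36 outcomes; the events `A` and `B` are hypothetical and are not the events of the example above, whose underlying sample space is not given.

```python
from fractions import Fraction
from itertools import product

S = set(product(range(1, 7), repeat=2))   # two dice, 36 equally likely outcomes

def prob(event):
    """P(E) = m / n for an event with m of the n equally likely outcomes."""
    return Fraction(len(event), len(S))

# Hypothetical events, chosen only for illustration.
A = {s for s in S if sum(s) == 7}   # the two dice sum to 7
B = {s for s in S if s[0] == 1}     # the first die shows 1

assert prob(S) == 1 and prob(set()) == 0
assert prob(S - A) == 1 - prob(A)                       # complement rule
assert prob(A | B) == prob(A) + prob(B) - prob(A & B)   # addition rule
assert prob(B) == prob(A & B) + prob((S - A) & B)       # B as a union of disjoint sets

print(prob(A), prob(A | B))
# prints: 1/6 11/36
```

Using `Fraction` keeps every probability exact, so the identities hold with `==` rather than up to rounding.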
Example 2:
A wafer is randomly selected from a batch of 940 wafers in a semiconductor manufacturing process.
Let $H$ denote the event that the wafer contains high levels of contamination, and let $C$ denote the
event that the wafer is from the center of the sputtering tool.
$P(H \cap C)$ is the probability that the wafer is from the center of the sputtering tool and contains
high levels of contamination:
$P(H \cap C) = 112/940$
The event $H \cup C$ is the event that a wafer is from the center of the sputtering tool or contains high
levels of contamination (or both).
2. CONDITIONAL PROBABILITY
Sometimes probabilities need to be reevaluated and revised as additional information becomes available.
The probability of an event 𝐵 given that an event 𝐴 has occurred is called the conditional probability of 𝐵
given 𝐴 and is denoted as 𝑃(𝐵|𝐴).
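One can think in terms of a simple Venn diagram. By saying “an event $A$ has occurred,” we are “reducing”
the sample space to $A$. Thus, the part that corresponds to $B$ in this new sample space is $A \cap B$.
[Figure: Venn diagram of events $A$ and $B$; within the reduced sample space $A$, the new $B$ is $A \cap B$.]
Hence,
$P(B|A) = \frac{P(A \cap B)}{P(A)}$
provided that $P(A) \neq 0$. Similarly,
$P(A|B) = \frac{P(A \cap B)}{P(B)}$
provided that $P(B) \neq 0$.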
Example 1. The analysis of results from a leaf transmutation experiment is as follows:
Let 𝐴 be the event “There is textural change,” and 𝐵 the event “There is color change”. Find
(a) $P(A)$
$P(A) = \frac{356}{500} = 0.712$
(b) $P(B)$
$P(B) = \frac{369}{500} = 0.738$
(c) $P(A \cap B)$
$P(A \cap B) = \frac{293}{500} = 0.586$
(d) $P(A|B)$
$P(A|B) = \frac{P(A \cap B)}{P(B)} = \frac{293}{500} \div \frac{369}{500} = \frac{293}{369} = 0.794$
Note that the probability of finding a leaf with textural change was 𝑃(𝐴) = 0.712. With the extra given
information (that the leaf had color change) this probability changed to 0.794.
(e) $P(B|A)$
$P(B|A) = \frac{P(A \cap B)}{P(A)} = \frac{293}{500} \div \frac{356}{500} = \frac{293}{356} = 0.823$
Note that the probability of finding a leaf with color change was 𝑃(𝐵) = 0.738. With the extra given
information (that the leaf had texture change) this probability changed to 0.823.
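The leaf example can be recomputed directly from its counts. This is a small sketch, using only the numbers that appear in parts (a) through (e); the variable names are introduced here for illustration.

```python
from fractions import Fraction

total = 500
n_A, n_B, n_AB = 356, 369, 293   # counts for A, B, and A ∩ B from the worked example

P_A = Fraction(n_A, total)
P_B = Fraction(n_B, total)
P_AB = Fraction(n_AB, total)

P_A_given_B = P_AB / P_B   # conditioning on B reduces the sample space to B
P_B_given_A = P_AB / P_A   # conditioning on A reduces the sample space to A

# The common denominator 500 cancels, leaving a ratio of counts.
assert P_A_given_B == Fraction(n_AB, n_B)
assert P_B_given_A == Fraction(n_AB, n_A)

print(round(float(P_A_given_B), 3), round(float(P_B_given_A), 3))
# prints: 0.794 0.823
```

The two assertions make the "reduced sample space" idea concrete: $P(A|B)$ is simply the fraction of the 369 leaves in $B$ that are also in $A$.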
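Example 2. Surface Flaws and Defectives
The table provides an example of 400 parts classified by surface flaws and as (functionally) defective. Of the
parts with surface flaws (40 parts), the number of defective ones is 10.
Therefore, the conditional probability that a part is defective, given that it has a surface flaw, is 10/40 = 0.25.
And of the parts without surface flaws (360 parts), the number of defective ones is 18. Therefore, the
conditional probability that a part is defective, given that it has no surface flaw, is 18/360 = 0.05.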
3. MULTIPLICATION RULE
Rewriting the definitions of conditional probability as
𝑃(𝐴 ∩ 𝐵) = 𝑃(𝐴|𝐵)𝑃(𝐵)
or
𝑃(𝐴 ∩ 𝐵) = 𝑃(𝐵|𝐴)𝑃(𝐴)
we get the multiplication rule.
Example 1. Suppose a machine operation requires two steps. The probability that the first step is
successful is 0.86. Given that the first step is successful, the probability that the second step is successful
is 0.94. What is the probability that this operation will be successful?
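Let $A$ be the event that the first step is successful, and $B$ the event that the second step is successful. We want
$P(A \cap B) = P(B|A)P(A) = (0.94)(0.86) = 0.8084$
Example 2. Two cards are drawn in succession, without replacement, from a standard deck of 52 cards.
What is the probability that the first card is the ace of spades and the second card is a heart?
Let $A$ be the event that the first card is the ace of spades, and let $B$ be the event that the second card is a
heart.
We want
$P(A \cap B) = P(B|A)P(A) = \frac{13}{51} \times \frac{1}{52} = 0.0049$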
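Both multiplication-rule examples can be checked numerically. The following sketch assumes the card example draws two cards without replacement from a standard 52-card deck (the setup implied by the factors 13/51 and 1/52) and enumerates every ordered pair of cards; all names are introduced here for illustration.

```python
from fractions import Fraction
from itertools import permutations, product

# Example 1: two-step machine operation.
p_first = 0.86                 # P(A): first step succeeds
p_second_given_first = 0.94    # P(B|A): second step succeeds given the first did
p_operation = p_second_given_first * p_first

# Card example: enumerate every ordered pair of distinct cards.
ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["spades", "hearts", "diamonds", "clubs"]
deck = list(product(ranks, suits))
draws = list(permutations(deck, 2))   # 52 * 51 equally likely ordered draws

favorable = [(first, second) for first, second in draws
             if first == ("A", "spades") and second[1] == "hearts"]
p_cards = Fraction(len(favorable), len(draws))

# Counting agrees with the multiplication rule P(B|A) P(A).
assert p_cards == Fraction(13, 51) * Fraction(1, 52)

print(round(p_operation, 4), round(float(p_cards), 4))
# prints: 0.8084 0.0049
```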
The total probability rule breaks up probability calculations into distinct parts. It is used to find the
probability of an event, 𝐵, when we do not know enough about 𝐵 to calculate this probability directly.
Instead, we take a related event, 𝐴, and use that to calculate the probability of 𝐵.
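Here is how we can do this. Any set $B$ can be written as a union of two disjoint sets. In fact,
$B = (B \cap A) \cup (B \cap A')$
[Figure: Venn diagram showing $B$ split into $B \cap A$ (inside $A$) and $B \cap A'$ (inside $A'$).]
Since the two pieces are disjoint, Axiom 3 and the multiplication rule give
$P(B) = P(B \cap A) + P(B \cap A') = P(B|A)P(A) + P(B|A')P(A')$
$F$ = component fails, $H$ = it has high level of contamination, $H'$ = it has low level of contamination.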
EXAMPLE. A component may have high, medium, or low level of contamination, which affects its failure
rate.
Let 𝐹 = component fails, 𝐻 = it has high level of contamination, 𝑀 = it has medium level of
contamination, and 𝐿 = it has low level of contamination.
6. INDEPENDENCE
We say two events $A, B$ are independent if knowing that one of these events has occurred (or has not
occurred) does not affect the probability of the other. Consequently, if $A, B$ are independent,
$P(B|A) = P(B)$
Since
$P(B|A) = \frac{P(A \cap B)}{P(A)}$
it follows that
$P(A \cap B) = P(A)P(B)$
EXAMPLES
Example 1. In the following table let 𝑈 be the event a person lives in the city, 𝑅 the event the person lives
in the suburbs, 𝑇 the event the person takes a bus to work, 𝐶 the person drives his/her own car to work,
and 𝐵 the person rides a bike to work:
         𝑈     𝑅     Total
𝑇        42    18    60
𝐶        23    61    84
𝐵        12    4     16
Total    77    83    160
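Are the events $U$ and $T$ independent? From the table, $P(U) = 77/160 = 0.481$ and $P(T) = 60/160 = 0.375$. Thus,
$P(U)P(T) = (0.481)(0.375) = 0.180$
Note that
$P(U \cap T) = \frac{42}{160} = 0.2625$
Since
$P(U \cap T) \neq P(U)P(T)$
the events $U$ and $T$ are not independent.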
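The independence check for the commuting table can be automated. A minimal sketch follows, using the counts from the table; the dictionary layout and the helper names `marginal`, `joint`, and `independent` are assumptions made for this illustration.

```python
from fractions import Fraction

# Counts from the table, keyed by (mode of transport, place of residence).
counts = {
    ("T", "U"): 42, ("T", "R"): 18,
    ("C", "U"): 23, ("C", "R"): 61,
    ("B", "U"): 12, ("B", "R"): 4,
}
total = sum(counts.values())

def marginal(label):
    """Probability of a single row or column label, e.g. P(U) or P(T)."""
    return Fraction(sum(v for key, v in counts.items() if label in key), total)

def joint(mode, place):
    """Probability of one cell, e.g. P(U ∩ T)."""
    return Fraction(counts[(mode, place)], total)

def independent(mode, place):
    """True when P(mode ∩ place) equals P(mode) P(place) exactly."""
    return joint(mode, place) == marginal(mode) * marginal(place)

print(float(marginal("U")), float(marginal("T")), float(joint("T", "U")))
print(independent("T", "U"))
```

Exact fractions avoid the rounding in 0.481 × 0.375; the comparison for $U$ and $T$ evaluates to `False`, in agreement with the conclusion above.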
Example 2. Suppose we have a series circuit with two switches, where the switches fail independently.
Let 𝐴 be the event the first switch works and 𝐵 the event the second switch works. If we are given, 𝑃(𝐴) =
0.96 and 𝑃(𝐵) = 0.94, what is the probability that the circuit will work?
[Figure: switches A and B connected in series.]
In this case, both the first and the second switch must work, that is, we want $P(A \cap B)$. But since we are
given they are independent, this probability is
$P(A \cap B) = P(A)P(B) = (0.96)(0.94) = 0.9024$
Example 3. Suppose we have a parallel circuit with two switches, where the switches fail independently.
Let 𝐴 be the event the first switch works and 𝐵 the event the second switch works. If we are given, 𝑃(𝐴) =
0.96 and 𝑃(𝐵) = 0.94, what is the probability that the circuit will work?
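In this case, we want either the first or the second switch to work, that is, we want $P(A \cup B)$. Since the
switches are independent,
$P(A \cup B) = P(A) + P(B) - P(A)P(B) = 0.96 + 0.94 - 0.9024 = 0.9976$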
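The series and parallel circuits differ only in whether we need an intersection or a union of independent events. The sketch below computes both; it also evaluates the parallel case through the complement (the circuit fails only if both switches fail), which is an equivalent route added here for comparison.

```python
p_a = 0.96   # P(A): first switch works
p_b = 0.94   # P(B): second switch works

# Series: both switches must work; independence gives P(A ∩ B) = P(A) P(B).
series = p_a * p_b

# Parallel: at least one switch must work, via the addition rule.
parallel = p_a + p_b - p_a * p_b

# Same result through the complement: 1 - P(both switches fail).
parallel_alt = 1 - (1 - p_a) * (1 - p_b)

assert abs(parallel - parallel_alt) < 1e-12
print(round(series, 4), round(parallel, 4))
# prints: 0.9024 0.9976
```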
Bayes' theorem, named after 18th-century British mathematician Thomas Bayes, is a formula for
determining conditional probabilities. The theorem provides a way to revise existing predictions or
theories, that is, update probabilities, given new or additional evidence. The simplest interpretation is
that Bayes’ Theorem is about calculating conditional probabilities – we use it to find the conditional
probability of an event 𝑃(𝐴|𝐵) when the "reverse" conditional probability 𝑃(𝐵|𝐴) is known.
Example 1
A company gets 40% of its products from Factory I and 60% from Factory II. It is known that Factory I has
a defect rate of 3%, and Factory II a defect rate of 1%. A product is selected at random and is found to be
defective. What is the probability it came from Factory I?
Let $A$ be the event “the product is from Factory I”. Of course, then $A'$ is the event “the product is from
Factory II”. Let $B$ be the event the item is defective. Clearly, we want to compute $P(A|B)$.
$P(A|B) = \frac{P(B|A)P(A)}{P(B|A)P(A) + P(B|A')P(A')} = \frac{(0.03)(0.40)}{(0.03)(0.40) + (0.01)(0.60)} = 0.67$
Example 2 (False Positives, False Negatives)
Suppose we know that 1% of a population has a type of viral infection (and therefore 99% do not). Doctors
have designed a test to detect this virus. In this design, 93% of the time, the test detects the virus when
it is there (and therefore 7% of the time misses it). Also, 8% of the time, the test detects the virus when
it is not there (and therefore 92% of the time it correctly returns a negative result).
A person takes the test, and the test is positive. What is the probability that the person has the virus?
Let $V$ be the event of having the virus and $T$ the event of testing positive. We want to compute $P(V|T)$. By Bayes’ theorem,
$P(V|T) = \frac{P(T|V)P(V)}{P(T|V)P(V) + P(T|V')P(V')}$
$P(T|V)$ is the probability that the test will be positive given that the person has the virus. Thus, $P(T|V) = 0.93$.
Clearly, $P(V) = 0.01$, and consequently, $P(V') = 0.99$.
Note that $P(T|V')$ is the probability that the test will be positive even though the person does not have
the virus. Thus, $P(T|V') = 0.08$.
Hence,
$P(V|T) = \frac{(0.93)(0.01)}{(0.93)(0.01) + (0.08)(0.99)} = 0.105$
Example 3:
At a certain university, 4% of men are over 6 feet tall and 1% of women are over 6 feet tall. The total
student population is divided in the ratio 3:2 in favor of women. If a student is selected at random from
among all those over six feet tall, what is the probability that the student is a woman?
Let $M$ be the event that the student is a man, $W$ the event that the student is a woman, and $T$ the event
that the student is over 6 feet tall.
$P(M) = \frac{2}{5} = 0.4$
$P(W) = \frac{3}{5} = 0.6$
$P(T|M) = 0.04$
$P(T|W) = 0.01$
$P(W|T) = \frac{P(T|W)P(W)}{P(T|W)P(W) + P(T|M)P(M)} = \frac{0.01 \times 0.6}{0.01 \times 0.6 + 0.04 \times 0.4} = 0.2727$
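The three examples above apply the same two-event formula with different inputs. A minimal Python sketch of that formula follows; the function name `bayes` and its parameter names are assumptions introduced here, not notation from these notes.

```python
def bayes(prior, likelihood, likelihood_complement):
    """Return P(A|B) given P(A), P(B|A), and P(B|A').

    The denominator is P(B), obtained from the total probability rule.
    """
    numerator = likelihood * prior
    evidence = numerator + likelihood_complement * (1 - prior)
    return numerator / evidence

factory = bayes(prior=0.40, likelihood=0.03, likelihood_complement=0.01)
virus = bayes(prior=0.01, likelihood=0.93, likelihood_complement=0.08)
woman = bayes(prior=0.6, likelihood=0.01, likelihood_complement=0.04)

print(round(factory, 2), round(virus, 3), round(woman, 4))
# prints: 0.67 0.105 0.2727
```

The virus case shows why the prior matters: even with a 93% detection rate, a positive result corresponds to only about a 10.5% chance of infection, because 99% of the population is uninfected and the 8% false-positive rate applies to that much larger group.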
$n$ event version
Let $B$ be any event and let $A_1, A_2, \ldots, A_n$ be a partition of $S$. Then, for any $1 \le j \le n$,
$P(A_j|B) = \frac{P(B|A_j)P(A_j)}{P(B|A_1)P(A_1) + P(B|A_2)P(A_2) + \cdots + P(B|A_n)P(A_n)}$
In statistical applications, $A_1, A_2, \ldots, A_n$ are called hypotheses, $P(A_j)$ is called the prior probability of
$A_j$, and $P(A_j|B)$ is called the posterior probability of $A_j$ given that $B$ has occurred.
Example 1. Students are going to evaluate a professor. We know that 45% of the students in that class
have very high grades, 37% moderately high grades, and 18% failing grades. It has been determined that
90% of the students with very high grades, 63% of the students with moderately high grades, and 12% of the
students with failing grades have given good evaluations.
An evaluation is picked at random and is found to be good. What is the probability that it was from a
student with very high grades?
Let
$A_1$: student has very high grades
$A_2$: student has moderately high grades
$A_3$: student has failing grades
$B$: the evaluation is good
$P(A_1|B) = \frac{P(B|A_1)P(A_1)}{P(B|A_1)P(A_1) + P(B|A_2)P(A_2) + P(B|A_3)P(A_3)} = \frac{(0.90)(0.45)}{(0.90)(0.45) + (0.63)(0.37) + (0.12)(0.18)} = 0.614$
Example 2. Identifying the Source of a Defective Item. Three different machines M1, M2, and M3 were used for
producing a large batch of similar manufactured items. Suppose that 20 percent of the items were
produced by machine M1, 30 percent by machine M2, and 50 percent by machine M3. Suppose further
that 1 percent of the items produced by machine M1 are defective, that 2 percent of the items produced
by machine M2 are defective, and that 3 percent of the items produced by machine M3 are defective.
Finally, suppose that one item is selected at random from the entire batch and it is found to be defective.
We shall determine the probability that this item was produced by machine M2.
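The $n$ event version can be sketched as a function that takes the prior probabilities of a partition and the corresponding conditional probabilities of the observed event. The function name `posterior` is an assumption introduced here. The sketch is applied to the evaluation example and to the three-machine setup, using only the percentages stated in the text.

```python
def posterior(priors, likelihoods):
    """Posterior probabilities P(A_j|B) for a partition A_1, ..., A_n.

    priors[j] is P(A_j) and likelihoods[j] is P(B|A_j).
    """
    joint = [p * l for p, l in zip(priors, likelihoods)]   # P(B|A_j) P(A_j)
    evidence = sum(joint)                                  # P(B), total probability
    return [j / evidence for j in joint]

# Evaluation example: very high, moderately high, failing grades.
grades = posterior([0.45, 0.37, 0.18], [0.90, 0.63, 0.12])

# Machine example: M1, M2, M3 with their shares and defect rates.
machines = posterior([0.20, 0.30, 0.50], [0.01, 0.02, 0.03])

print(round(grades[0], 3))     # P(very high grades | good evaluation)
print(round(machines[1], 3))   # P(item came from M2 | item is defective)
```

Under these inputs the first value agrees with the 0.614 computed by hand, and the second works out to 0.006/0.023, about 0.261. The posteriors in each list sum to 1, which is a quick sanity check on the inputs.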