CH 1


Instructor: Shengyu Zhang

About the course

 Website:
http://www.cse.cuhk.edu.hk/~syzhang/course/Prob17/
 You can find the lecture slides, tutorial slides,
info for time, venue, TA, textbook, grading
method, etc.
 No tutorial in the first week.
 Announcements will be posted on the web.
 The important ones will be sent to your CUHK email as well.
Content

 Sets.
 Probabilistic models.
 Conditional probability.
 Total Probability Theorem and Bayes’ Rule.
 Independence.
 Counting.
Sets

 Probability makes extensive use of set operations.
 A set is a collection of objects, which are the
elements of the set.
 𝑥 ∈ 𝑆: 𝑆 is a set and 𝑥 is an element of 𝑆.
 𝑥 ∉ 𝑆: 𝑥 is not an element of 𝑆.
 ∅: the set that has no elements, called the empty set.
Sets

 Subset: 𝑆 ⊆ 𝑇
 Equal sets: 𝑆 = 𝑇
 Countable vs. uncountable
 Universal set Ω: The set which contains all
objects that could conceivably be of interest
in a particular context.
 Complement: 𝑆ᶜ = Ω − 𝑆 (also written S̄).
Sets

 Union of sets: 𝑆 ∪ 𝑇, ⋃_{𝑖=1}^∞ 𝑆ᵢ, ⋃_{𝑖∈𝐼} 𝑆ᵢ.
 Intersection of sets: 𝑆 ∩ 𝑇, ⋂_{𝑖=1}^∞ 𝑆ᵢ, ⋂_{𝑖∈𝐼} 𝑆ᵢ.
 Disjoint sets: empty pairwise intersection.
 Partition of set 𝑆: a collection of disjoint sets whose union is 𝑆.
 De Morgan’s laws:
(⋃ᵢ 𝑆ᵢ)ᶜ = ⋂ᵢ 𝑆ᵢᶜ,  (⋂ᵢ 𝑆ᵢ)ᶜ = ⋃ᵢ 𝑆ᵢᶜ
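Not part of the original slides: a minimal Python sketch that checks De Morgan’s laws by brute force on a small universal set Ω = {0, …, 9}.

```python
# Minimal sketch: verify De Morgan's laws on finite sets.
# Omega plays the role of the universal set.
Omega = set(range(10))

def complement(s):
    """Complement relative to the universal set Omega."""
    return Omega - s

sets = [{1, 2, 3}, {2, 4, 6, 8}, {0, 3, 9}]

# (U_i S_i)^c == intersection_i S_i^c
union = set().union(*sets)
assert complement(union) == set.intersection(*[complement(s) for s in sets])

# (intersection_i S_i)^c == U_i S_i^c
inter = set.intersection(*sets)
assert complement(inter) == set().union(*[complement(s) for s in sets])
print("De Morgan's laws hold on this example.")
```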
Content

 Sets.
 Probabilistic models.
 Conditional probability.
 Total Probability Theorem and Bayes’ Rule.
 Independence.
 Counting.
Experiment and outcomes

 A probabilistic model is a mathematical description of an uncertain situation.
 Every probabilistic model involves an underlying process, called the experiment.
 Example. Flip two coins.
 The experiment produces exactly one out of several possible outcomes.
 Example. Four outcomes: 𝐻𝐻, 𝐻𝑇, 𝑇𝐻, 𝑇𝑇.
Sample space and events

 The set of all possible outcomes is the sample space, usually denoted by Ω.
 Example. Ω = {𝐻𝐻, 𝐻𝑇, 𝑇𝐻, 𝑇𝑇}.
 Event: a subset of the sample space.
 𝐴 ⊆ Ω is a set of possible outcomes.
 Example. 𝐴 = {𝐻𝐻, 𝑇𝑇}, the event that the two coins give the same side.
Infinite sample space
 The sample space of an experiment may consist
of a finite or an infinite number of possible
outcomes.
 Finite sample spaces are conceptually and
mathematically simpler.
 Sample spaces with an infinite number of elements
are quite common.
 As an example, consider throwing
a dart on a board and viewing
the point of impact as the
outcome.
 The region “Bullseye” is an event:
it’s a subset of the sample space.
Be careful with the sample space

 One should choose an appropriate sample space.
 Different elements of the sample space should
be distinct and mutually exclusive, so that when
the experiment is carried out there is a unique
outcome.
 The sample space must also be collectively
exhaustive, in the sense that no matter what
happens in the experiment, we always obtain an
outcome that has been included in the sample
space.
Sequential models

 Many experiments have an inherently sequential character.
 Examples:
 tossing a coin three times,
 observing the value of a stock on 5 successive days,
 receiving eight successive digits at a communication
receiver.
 It is then often useful to describe the experiment
and the associated sample space by means of a
tree-based sequential description.
Sequential models

 Example: roll a 4-sided die twice.
Probabilistic laws

 After we have settled on the sample space Ω associated with an experiment, we need to introduce a probability law.
 The probability law assigns to a set 𝐴 of possible outcomes a nonnegative number 𝑃(𝐴).
 The value 𝑃(𝐴) encodes our knowledge or belief about the collective “likelihood” of the elements of 𝐴.
Probabilistic laws

 Consider the example of tossing two coins.
 What’s 𝑃({𝐻𝐻})? 𝑃({𝐻𝑇})? 𝑃({𝑇𝐻})? 𝑃({𝑇𝑇})?
 Many possibilities. For example, the uniform distribution says the following:
𝑃({𝐻𝐻}) = 𝑃({𝐻𝑇}) = 𝑃({𝑇𝐻}) = 𝑃({𝑇𝑇}) = 1/4.
 For 𝐴 = {𝐻𝐻, 𝑇𝑇}, what’s 𝑃(𝐴)?
 Under the uniform distribution, 𝑃(𝐴) = 1/2.
Probability Axioms

1. (Non-negativity) 𝑃(𝐴) ≥ 0, for every event 𝐴.
2. (Additivity) For any two disjoint events 𝐴 and 𝐵,
𝑃(𝐴 ∪ 𝐵) = 𝑃(𝐴) + 𝑃(𝐵).
In general, if 𝐴₁, 𝐴₂, … are disjoint events, then
𝑃(𝐴₁ ∪ 𝐴₂ ∪ ⋯) = 𝑃(𝐴₁) + 𝑃(𝐴₂) + ⋯
3. (Normalization) 𝑃(Ω) = 1.
Probabilistic model: summary
 An experiment produces exactly one out of several
possible outcomes.
 The sample space is the set of all possible outcomes.
 An event is a subset of the sample space.
 The probability law assigns to any event 𝐴 a number 𝑃(𝐴) ≥ 0.
Discrete Model

 In many cases, the sample space is discrete, and actually finite.
 Then the probability law is specified by the probabilities of the events that consist of a single element.
 It holds that for any event 𝐴 = {𝑎₁, … , 𝑎ₙ},
𝑃(𝐴) = 𝑃({𝑎₁}) + ⋯ + 𝑃({𝑎ₙ}).
 When the probability law is uniform,
𝑃(𝐴) = |𝐴|/|Ω|.
Discrete model

 Example: toss a coin three times.
 The sample space is
Ω = {𝐻𝐻𝐻, 𝐻𝐻𝑇, 𝐻𝑇𝐻, 𝐻𝑇𝑇, 𝑇𝐻𝐻, 𝑇𝐻𝑇, 𝑇𝑇𝐻, 𝑇𝑇𝑇}.
 Assume that each possible outcome has the same probability of 1/8.
 Consider the event
𝐴 = {exactly 2 heads occur} = {𝐻𝐻𝑇, 𝐻𝑇𝐻, 𝑇𝐻𝐻}.
 𝑃(𝐴) = 𝑃({𝐻𝐻𝑇}) + 𝑃({𝐻𝑇𝐻}) + 𝑃({𝑇𝐻𝐻}) = 3/8.
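The 3/8 above can be checked by enumerating the 8 equally likely outcomes; a small Python sketch (not from the slides):

```python
from itertools import product
from fractions import Fraction

# Enumerate the sample space of three fair coin tosses.
omega = list(product("HT", repeat=3))          # 8 equally likely outcomes
A = [w for w in omega if w.count("H") == 2]    # event: exactly 2 heads

print(Fraction(len(A), len(omega)))            # 3/8
```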
Continuous Model

 The sample space can also be infinite, and continuous.
 Caution: For continuous sample spaces, the probabilities of the single-element events may not be sufficient to characterize the probability law.
Continuous Model

 Consider Ω = 0,1 .
 Any number in the interval is a possible outcome.
 Assume the uniform distribution: all outcomes are equally likely.
 Then what’s the probability of “½” as an
outcome?
 What if you replace ½ with any of your
favorite numbers?
Continuous Model

 Suppose the probability of a single element is 𝜀 > 0.
 No matter how small 𝜀 is, there is an integer 𝑛 > 0 such that 1/𝑛 < 𝜀.
 Consider the disjoint events 𝐴ₖ = {𝑘/𝑛} for 𝑘 = 1, 2, … , 𝑛.
 By the additivity axiom,
𝑃(Ω) ≥ 𝑃(𝐴₁) + 𝑃(𝐴₂) + ⋯ + 𝑃(𝐴ₙ) = 𝑛𝜀 > 1,
violating the rule that 𝑃(Ω) = 1.
 Conclusion: 𝑃({𝑎}) = 0 for any outcome 𝑎 ∈ [0,1].
 So … what to do?
Continuous Model

 A natural candidate: define the probability of any subinterval [𝑎, 𝑏] ⊆ [0,1] to be
𝑃([𝑎, 𝑏]) = 𝑏 − 𝑎
 Probability = “the length of the interval.”
 And for a disjoint union of intervals,
𝐴 = [𝑎₁, 𝑏₁] ∪ [𝑎₂, 𝑏₂] ∪ ⋯ ∪ [𝑎ₖ, 𝑏ₖ] ∪ ⋯ ,
define its probability by 𝑃(𝐴) = Σ_{𝑖=1,2,…} (𝑏ᵢ − 𝑎ᵢ).
 Verify that all three axioms are satisfied.
Example: Meeting

 Romeo and Juliet have a date.
 Each will arrive at the meeting place with a delay between 0 and 1 hour, with all pairs of delays being equally likely.
 The first to arrive will wait for 15 minutes and will
leave if the other has not yet arrived.
 Question: What is the probability that they will
meet?
Example: Meeting

 Sample space: the unit square [0,1] × [0,1].
 Its elements are the possible pairs of delays.
 “Equally likely” pairs of delays: let 𝑃(𝐴) for an event 𝐴 ⊆ Ω be equal to 𝐴’s “area”.
 This satisfies the axioms.
Example: Meeting

 The event that Romeo and Juliet will meet is the shaded region
𝑀 = {(𝑥, 𝑦) : |𝑥 − 𝑦| ≤ 1/4, 0 ≤ 𝑥 ≤ 1, 0 ≤ 𝑦 ≤ 1}.
 Its probability is calculated to be 7/16:
𝑃(𝑀) = 1 − (the area of the two unshaded triangles)
= 1 − 2 ⋅ (3/4 ⋅ 3/4)/2
= 7/16.
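A Monte Carlo sketch of the meeting model, assuming the uniform probability-as-area law above; the empirical frequency should approach 7/16 = 0.4375:

```python
import random

# Monte Carlo sketch: estimate P(|x - y| <= 1/4) for x, y uniform on [0, 1].
random.seed(0)
n = 10**6
hits = sum(abs(random.random() - random.random()) <= 0.25 for _ in range(n))
print(hits / n)   # should be close to 7/16 = 0.4375
```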
Properties of Probability Laws

 Consider a probability law, and let 𝐴, 𝐵, and 𝐶 be events.
1. If 𝐴 ⊆ 𝐵, then 𝑃(𝐴) ≤ 𝑃(𝐵).
2. 𝑃(𝐴 ∪ 𝐵) = 𝑃(𝐴) + 𝑃(𝐵) − 𝑃(𝐴 ∩ 𝐵).
3. 𝑃(𝐴 ∪ 𝐵) ≤ 𝑃(𝐴) + 𝑃(𝐵).
4. 𝑃(𝐴 ∪ 𝐵 ∪ 𝐶) = 𝑃(𝐴) + 𝑃(𝐴ᶜ ∩ 𝐵) + 𝑃(𝐴ᶜ ∩ 𝐵ᶜ ∩ 𝐶).
 𝐴ᶜ denotes the complement of 𝐴.
Content

 Sets.
 Probabilistic models.
 Conditional probability.
 Total Probability Theorem and Bayes’ Rule.
 Independence.
 Counting.
Partial information

 Conditional probability provides us with a way to reason about the outcome of an experiment, based on partial information.
 Example: In an experiment involving two successive
rolls of a die, you are told that the sum of the two
rolls is 9. How likely is it that the first roll was a 6?
 Example: How likely is it that a person has a certain
disease given that a medical test was negative?
 Example: A spot shows up on a radar screen. How
likely is it to correspond to an aircraft?
Conditional Probability

 In the previous examples, we know that the outcome is within some given event 𝐵.
 We wish to quantify the likelihood that the
outcome also belongs to some other event 𝐴.
 We seek to construct a new probability law
that takes into account the available
knowledge:
 a probability law that specifies the conditional
probability of 𝐴 given 𝐵.
Conditional Probability

 Definition. The conditional probability of 𝐴 given 𝐵 is
𝑃(𝐴|𝐵) = 𝑃(𝐴 ∩ 𝐵) / 𝑃(𝐵),
where we assume that 𝑃(𝐵) > 0.
 If 𝑃(𝐵) = 0, then 𝑃(𝐴|𝐵) is undefined.
 Fact. 𝑃(·|𝐵) forms a legitimate probability law satisfying the three axioms.
Verification

1. Nonnegativity:
𝑃(𝐴|𝐵) = 𝑃(𝐴 ∩ 𝐵) / 𝑃(𝐵) ≥ 0.
2. Normalization:
𝑃(Ω|𝐵) = 𝑃(Ω ∩ 𝐵) / 𝑃(𝐵) = 𝑃(𝐵)/𝑃(𝐵) = 1.
3. Additivity: For two disjoint events 𝐴₁ and 𝐴₂, see the next slide.
 The argument for a countable collection of
disjoint sets is similar.
𝑃(𝐴₁ ∪ 𝐴₂ | 𝐵) = 𝑃((𝐴₁ ∪ 𝐴₂) ∩ 𝐵) / 𝑃(𝐵)
= 𝑃((𝐴₁ ∩ 𝐵) ∪ (𝐴₂ ∩ 𝐵)) / 𝑃(𝐵)
= (𝑃(𝐴₁ ∩ 𝐵) + 𝑃(𝐴₂ ∩ 𝐵)) / 𝑃(𝐵)
= 𝑃(𝐴₁ ∩ 𝐵)/𝑃(𝐵) + 𝑃(𝐴₂ ∩ 𝐵)/𝑃(𝐵)
= 𝑃(𝐴₁|𝐵) + 𝑃(𝐴₂|𝐵)
Conditional probability: uniform case

 If the possible outcomes are finitely many and equally likely, then
𝑃(𝐴|𝐵) = |𝐴 ∩ 𝐵| / |𝐵|.
 Example 1. Toss a fair coin three times.
 Question: What is the conditional
probability 𝑃(𝐴|𝐵) when 𝐴 and 𝐵 are:
 𝐴 = {more heads than tails come up}
 𝐵 = {1𝑠𝑡 toss is a head}
Conditional Probability: Example 1

 Sample space:
Ω = {𝐻𝐻𝐻, 𝐻𝐻𝑇, 𝐻𝑇𝐻, 𝐻𝑇𝑇, 𝑇𝐻𝐻, 𝑇𝐻𝑇, 𝑇𝑇𝐻, 𝑇𝑇𝑇}.
 Event 𝑩 = {1st toss is a head}:
𝐵 = {𝐻𝐻𝐻, 𝐻𝐻𝑇, 𝐻𝑇𝐻, 𝐻𝑇𝑇}.
 The probability of 𝑩:
𝑃(𝐵) = 4/8 = 1/2.
Conditional Probability: Example 1

 Event 𝑨 ∩ 𝑩:
𝐴 ∩ 𝐵 = {𝐻𝐻𝐻, 𝐻𝐻𝑇, 𝐻𝑇𝐻}.
 The probability of 𝑨 ∩ 𝑩:
𝑃(𝐴 ∩ 𝐵) = 3/8.
 The conditional probability:
𝑃(𝐴|𝐵) = 𝑃(𝐴 ∩ 𝐵)/𝑃(𝐵) = (3/8)/(4/8) = 3/4.
Conditional Probability: Example 2

 Roll a fair 4-sided die twice.
 𝑋 = outcome of the 1st roll
 𝑌 = outcome of the 2nd roll
 The events 𝐴, 𝐵:
 𝐴 = {max(𝑋, 𝑌) = 𝑚}, 𝑚 = 1, 2, 3, 4
 𝐵 = {min(𝑋, 𝑌) = 2}
 Question: What is the conditional probability 𝑃(𝐴|𝐵)?
Conditional Probability: Example 2

 Count the number of elements of 𝐴 ∩ 𝐵 and of 𝐵:
 𝐴 = {max(𝑋, 𝑌) = 𝑚}
 𝐵 = {min(𝑋, 𝑌) = 2}
 𝑃(𝐴|𝐵) = 2/5, if 𝑚 = 3 or 𝑚 = 4;
1/5, if 𝑚 = 2;
0, if 𝑚 = 1.
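The same piecewise answer can be recovered by counting, as a short sketch:

```python
from itertools import product
from fractions import Fraction

# Sketch: P(max = m | min = 2) for two rolls of a fair 4-sided die.
omega = list(product(range(1, 5), repeat=2))
B = [w for w in omega if min(w) == 2]

for m in range(1, 5):
    AB = [w for w in B if max(w) == m]
    print(m, Fraction(len(AB), len(B)))  # 0, 1/5, 2/5, 2/5
```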
Conditional Probability: Example 3

 Two teams 𝑁 and 𝐶 design a product within a month.
 𝑃(𝐶 is successful) = 2/3
 𝑃(𝑁 is successful) = 1/2
 𝑃(at least one team is successful) = 3/4
 Question: Assuming that exactly one successful
design is produced, what is the probability that it
was designed by team 𝑁?
Conditional Probability: Example 3

 4 possible outcomes:
SS: both succeed FF: both fail
SF: 𝐶 succeeds, 𝑁 fails FS: 𝐶 fails, 𝑁 succeeds

 We know that
𝑃(𝑆𝑆) + 𝑃(𝑆𝐹) = 2/3
𝑃(𝑆𝑆) + 𝑃(𝐹𝑆) = 1/2
𝑃(𝑆𝑆) + 𝑃(𝑆𝐹) + 𝑃(𝐹𝑆) = 3/4
And the normalization equation
𝑃(𝑆𝑆) + 𝑃(𝑆𝐹) + 𝑃(𝐹𝑆) + 𝑃(𝐹𝐹) = 1
Conditional Probability: Example 3

 Solving the system of equations, we obtain the probabilities of the individual outcomes:
𝑃(𝑆𝑆) = 5/12, 𝑃(𝑆𝐹) = 1/4,
𝑃(𝐹𝑆) = 1/12, 𝑃(𝐹𝐹) = 1/4.
 The desired conditional probability is
𝑃(𝐹𝑆 | {𝑆𝐹, 𝐹𝑆}) = (1/12) / (1/4 + 1/12) = 1/4.
Multiplication Rule
 Fact. Assuming that all of the conditioning events 𝐴₁, 𝐴₁ ∩ 𝐴₂, … have positive probability, we have
𝑃(⋂_{𝑖=1}^{𝑛} 𝐴ᵢ) = 𝑃(𝐴₁) ⋅ 𝑃(𝐴₂|𝐴₁) ⋅ 𝑃(𝐴₃|𝐴₁ ∩ 𝐴₂) ⋯ 𝑃(𝐴ₙ | ⋂_{𝑖=1}^{𝑛−1} 𝐴ᵢ)
Multiplication Rule: Example 1

 3 cards are drawn from a 52-card deck without replacement.
 Drawn cards are not placed back in the deck.
 Question: What’s the probability that none of the three cards is a heart?
 One approach: count the number of card triplets that do not include a heart, and divide it by the number of all possible card triplets.
 Cumbersome.
Multiplication Rule: Example 1

 Another approach uses the multiplication rule.
 𝐴ᵢ = {the 𝑖th card is not a heart}, 𝑖 = 1, 2, 3.
 Multiplication rule:
𝑃(𝐴₁ ∩ 𝐴₂ ∩ 𝐴₃) = 𝑃(𝐴₁)𝑃(𝐴₂|𝐴₁)𝑃(𝐴₃|𝐴₁ ∩ 𝐴₂)
 Since there are 39 cards that are not hearts,
𝑃(𝐴₁) = 39/52.
Multiplication Rule: Example 1

 Given that the first card is not a heart, we are left with 51 cards, 38 of which are not hearts:
𝑃(𝐴₂|𝐴₁) = 38/51.
 Finally, given that the first two cards drawn are not hearts, there are 37 cards which are not hearts among the remaining 50 cards:
𝑃(𝐴₃|𝐴₁ ∩ 𝐴₂) = 37/50.
 Thus 𝑃(𝐴₁ ∩ 𝐴₂ ∩ 𝐴₃) = 39/52 ∙ 38/51 ∙ 37/50 ≈ 0.41.
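Exact arithmetic with Python’s fractions module confirms the product; a quick sketch:

```python
from fractions import Fraction

# Sketch: multiplication rule for "no hearts in 3 draws without replacement".
p = Fraction(39, 52) * Fraction(38, 51) * Fraction(37, 50)
print(p, float(p))  # 703/1700 ≈ 0.4135
```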
Multiplication Rule: Example 2

 4 graduate and 12 undergraduate students are randomly divided into 4 groups of 4.
 “Randomly”: given the assignment of some students to certain slots, any of the remaining students is equally likely to be assigned to any of the remaining slots.
 Question: What is the probability that each
group includes a graduate student?
Multiplication Rule: Example 2

 Denote the four graduate students by 1, 2, 3, 4.
 Define the events
 𝐴₁ = {students 1 and 2 are in different groups},
 𝐴₂ = {students 1, 2 and 3 are in different groups},
 𝐴₃ = {students 1, 2, 3 and 4 are in different groups}.
 We will use the multiplication rule:
𝑃(𝐴₃) = 𝑃(𝐴₁ ∩ 𝐴₂ ∩ 𝐴₃) = 𝑃(𝐴₁)𝑃(𝐴₂|𝐴₁)𝑃(𝐴₃|𝐴₁ ∩ 𝐴₂)
 𝑃(𝐴₁) = 12/15, 𝑃(𝐴₂|𝐴₁) = 8/14, 𝑃(𝐴₃|𝐴₁ ∩ 𝐴₂) = 4/13.
 So 𝑃(𝐴₃) = 12/15 ∙ 8/14 ∙ 4/13 ≈ 0.14.
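A simulation sketch of the random grouping (labels 0–3 stand for the graduate students, an assumption of the sketch); the empirical frequency should be close to the exact product:

```python
import random
from fractions import Fraction

# Sketch: simulate random division of 16 students into 4 groups of 4 and
# estimate P(each group gets exactly one of the 4 graduate students).
random.seed(1)
students = list(range(16))          # 0..3 are the graduate students
trials, hits = 200_000, 0
for _ in range(trials):
    random.shuffle(students)
    groups = [students[i:i + 4] for i in range(0, 16, 4)]
    if all(sum(s < 4 for s in g) == 1 for g in groups):
        hits += 1

exact = Fraction(12, 15) * Fraction(8, 14) * Fraction(4, 13)
print(hits / trials, float(exact))   # both ≈ 0.1407
```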
The Monty Hall Problem

 A prize is randomly
put behind one of the
three closed doors.
 You point to one door.
 A friend opens one of the remaining two
doors, after making sure that the prize is not
behind it.
 Question: Should you stick to your initial
choice, or switch to the other unopened
door?
The Monty Hall Problem

 If sticking to the initial choice: the initial choice determines whether you win or not.
 Thus the winning probability is 1/3.
 If switching to the other unopened door:
 Case 1: prize is behind the initial door, which
happens with probability 1/3. You don’t win.
 Case 2: prize is not behind the initial door, which
happens with probability 2/3. You win for sure.
 So you should switch.
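A short simulation sketch comparing the two strategies:

```python
import random

# Sketch: simulate the Monty Hall problem to compare stick vs. switch.
random.seed(2)
trials = 100_000
stick_wins = switch_wins = 0
for _ in range(trials):
    prize = random.randrange(3)
    choice = random.randrange(3)
    # The friend opens a door that is neither your choice nor the prize.
    opened = next(d for d in range(3) if d != choice and d != prize)
    switched = next(d for d in range(3) if d != choice and d != opened)
    stick_wins += (choice == prize)
    switch_wins += (switched == prize)

print(stick_wins / trials, switch_wins / trials)  # ≈ 1/3 vs. ≈ 2/3
```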
Content

 Sets.
 Probabilistic models.
 Conditional probability.
 Total Probability Theorem and Bayes’ Rule.
 Independence.
 Counting.
Total Probability Theorem

 Let 𝐴₁, 𝐴₂, … , 𝐴ₙ be disjoint events that form a partition of the sample space. Assume 𝑃(𝐴ᵢ) > 0 for all 𝑖. Then, for any event 𝐵, we have
𝑃(𝐵) = 𝑃(𝐴₁ ∩ 𝐵) + ⋯ + 𝑃(𝐴ₙ ∩ 𝐵)
= 𝑃(𝐴₁)𝑃(𝐵|𝐴₁) + ⋯ + 𝑃(𝐴ₙ)𝑃(𝐵|𝐴ₙ)
 Indeed, 𝐵 is the disjoint union of 𝐴₁ ∩ 𝐵, … , 𝐴ₙ ∩ 𝐵.
 The second equality is given by 𝑃(𝐴ᵢ ∩ 𝐵) = 𝑃(𝐴ᵢ)𝑃(𝐵|𝐴ᵢ).
Example: chess tournament

 Three types of players:
 Type 1: 50%
 Type 2: 25%
 Type 3: 25%
 Your winning probability against each type:
 Against type 1: 0.3.
 Against type 2: 0.4.
 Against type 3: 0.5.
 Now you play a game with a randomly chosen
player.
 Question: What’s your winning probability?
Example: chess tournament

 𝐴ᵢ: playing with an opponent of type 𝑖.
 𝑃(𝐴₁) = 0.5, 𝑃(𝐴₂) = 0.25, 𝑃(𝐴₃) = 0.25.
 𝐵: winning.
 𝑃(𝐵|𝐴₁) = 0.3, 𝑃(𝐵|𝐴₂) = 0.4, 𝑃(𝐵|𝐴₃) = 0.5.
 The probability of 𝐵:
𝑃(𝐵) = 𝑃(𝐴₁)𝑃(𝐵|𝐴₁) + 𝑃(𝐴₂)𝑃(𝐵|𝐴₂) + 𝑃(𝐴₃)𝑃(𝐵|𝐴₃)
= 0.50 × 0.3 + 0.25 × 0.4 + 0.25 × 0.5
= 0.375
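The same total-probability computation as a few lines of Python:

```python
# Sketch: total probability theorem for the chess example.
P_A = [0.50, 0.25, 0.25]          # opponent type probabilities
P_B_given_A = [0.3, 0.4, 0.5]     # winning probability against each type

P_B = sum(pa * pb for pa, pb in zip(P_A, P_B_given_A))
print(P_B)  # 0.375
```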
Example: Four-Sided Die

 Roll a fair 4-sided die.
 Rule: roll once more if the result is 1 or 2; otherwise stop.
 Question: What is the probability that the sum total of your rolls is at least 4?
Example: Four-Sided Die

 𝐴ᵢ: the result of the first roll is 𝑖;
𝑃(𝐴ᵢ) = 1/4, ∀𝑖 = 1, 2, 3, 4.
 𝐵: the sum total is at least 4.
 𝑃(𝐵) = Σ_{𝑖=1}^{4} 𝑃(𝐴ᵢ)𝑃(𝐵|𝐴ᵢ). Let’s calculate each 𝑃(𝐵|𝐴ᵢ).
 Given 𝐴₁: the sum total will be ≥ 4 if the second roll results in 3 or 4, which happens with probability 1/2.
 Thus 𝑃(𝐵|𝐴₁) = 1/2.
 Given 𝐴₂: the sum total will be ≥ 4 if the second roll results in 2, 3, or 4, which happens with probability 3/4.
 Thus 𝑃(𝐵|𝐴₂) = 3/4.
 Given 𝐴₃: you stop and the sum total remains below 4.
 Thus 𝑃(𝐵|𝐴₃) = 0.
 Given 𝐴₄: you stop, but the sum total is already 4.
 Thus 𝑃(𝐵|𝐴₄) = 1.
 By the total probability theorem,
𝑃(𝐵) = 1/4 ∙ 1/2 + 1/4 ∙ 3/4 + 1/4 ∙ 0 + 1/4 ∙ 1 = 9/16
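Alternatively, a sketch that enumerates the sequential experiment directly:

```python
from fractions import Fraction

# Sketch: enumerate the sequential experiment directly.
# First roll i in {1,2,3,4}; roll again only if i is 1 or 2.
p = Fraction(0)
for first in range(1, 5):
    if first <= 2:
        for second in range(1, 5):
            if first + second >= 4:
                p += Fraction(1, 16)   # each (first, second) pair has prob 1/16
    elif first >= 4:
        p += Fraction(1, 4)            # stop; the total is already >= 4
print(p)  # 9/16
```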
Example: Up-to-date or Behind

 Alice is taking a probability class. At the end of each week,
 she can be either up-to-date,
 or she may have fallen behind.
 If she is up-to-date in week 𝑖, the probability that she
will be up-to-date (or behind) in week 𝑖 + 1 is 0.8 (or
0.2, respectively).
 If she is behind in a given week, the probability that
she will be up-to-date (or behind) in the next week is
0.4 (or 0.6, respectively).
 Alice is up-to-date when she starts the class.
Example: Up-to-date or Behind

 Question: What is the probability that she is up-to-date after three weeks?
 𝑈ᵢ: Alice is up-to-date after 𝑖 weeks.
 𝐵ᵢ: Alice is behind after 𝑖 weeks.
 From the previous slide:
𝑃(𝑈ᵢ₊₁|𝑈ᵢ) = 0.8, 𝑃(𝑈ᵢ₊₁|𝐵ᵢ) = 0.4, 𝑃(𝑈₀) = 1.
 Question (rephrased): What is the probability of 𝑈₃?
Example: Up-to-date or Behind

 By the total probability theorem,
𝑃(𝑈₃) = 𝑃(𝑈₂)𝑃(𝑈₃|𝑈₂) + 𝑃(𝐵₂)𝑃(𝑈₃|𝐵₂)
= 𝑃(𝑈₂) ⋅ 0.8 + 𝑃(𝐵₂) ⋅ 0.4
 Similarly,
𝑃(𝑈₂) = 𝑃(𝑈₁) ⋅ 0.8 + 𝑃(𝐵₁) ⋅ 0.4 = 0.72
𝑃(𝐵₂) = 𝑃(𝑈₁) ⋅ 0.2 + 𝑃(𝐵₁) ⋅ 0.6 = 0.28
 Since Alice starts her class up-to-date, we have
𝑃(𝑈₁) = 0.8, 𝑃(𝐵₁) = 0.2.
 The probability of 𝑈₃:
𝑃(𝑈₃) = 0.72 ⋅ 0.8 + 0.28 ⋅ 0.4 = 0.688
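The same recurrence as code; p_up is the sketch’s variable for 𝑃(𝑈ᵢ):

```python
# Sketch: iterate the two-state recurrence for Alice's status.
p_up = 1.0                      # P(U_0) = 1: up-to-date at the start
for week in range(3):
    p_up = p_up * 0.8 + (1 - p_up) * 0.4
print(p_up)  # 0.688
```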
Bayes’ Rule

 Let 𝐴₁, 𝐴₂, … , 𝐴ₙ be disjoint events that form a partition of the sample space, and assume that 𝑃(𝐴ᵢ) > 0, for all 𝑖.
 Then, for any event 𝐵 with 𝑃(𝐵) > 0, we have
𝑃(𝐴ᵢ|𝐵) = 𝑃(𝐴ᵢ ∩ 𝐵) / 𝑃(𝐵)
= 𝑃(𝐴ᵢ)𝑃(𝐵|𝐴ᵢ) / 𝑃(𝐵)
= 𝑃(𝐴ᵢ)𝑃(𝐵|𝐴ᵢ) / (𝑃(𝐴₁)𝑃(𝐵|𝐴₁) + ⋯ + 𝑃(𝐴ₙ)𝑃(𝐵|𝐴ₙ))
Inference using Bayes’ rule

 Bayes’ rule is often used for inference.
 There are a number of causes that may result in a certain effect.
 We observe the effect and we wish to infer the cause.
 Causes: 𝐴₁, … , 𝐴ₙ.
 Effect: event 𝐵.
 𝑃(𝐵|𝐴ᵢ): assumed known.
 𝑃(𝐴ᵢ): prior probability.
 𝑃(𝐴ᵢ|𝐵): posterior probability.
Example: Chess Revisited

 Three types of players:
 Type 1: 50%
 Type 2: 25%
 Type 3: 25%
 Your winning probability against each type:
 Against type 1: 0.3.
 Against type 2: 0.4.
 Against type 3: 0.5.
 Question: Suppose that you win. What is the
probability that you had an opponent of type 1?
Example: Chess Revisited

 𝐴ᵢ: getting an opponent of type 𝑖.
 𝑃(𝐴₁) = 0.5, 𝑃(𝐴₂) = 0.25, 𝑃(𝐴₃) = 0.25.
 𝐵: the event of winning.
 𝑃(𝐵|𝐴₁) = 0.3, 𝑃(𝐵|𝐴₂) = 0.4, 𝑃(𝐵|𝐴₃) = 0.5.
 By Bayes’ rule:
𝑃(𝐴₁|𝐵) = 𝑃(𝐴₁)𝑃(𝐵|𝐴₁) / (𝑃(𝐴₁)𝑃(𝐵|𝐴₁) + 𝑃(𝐴₂)𝑃(𝐵|𝐴₂) + 𝑃(𝐴₃)𝑃(𝐵|𝐴₃))
= (0.5 ∙ 0.3) / (0.5 ∙ 0.3 + 0.25 ∙ 0.4 + 0.25 ∙ 0.5)
= 0.4
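A small sketch of Bayes’ rule for this example; it also yields the posterior probabilities of the other two types:

```python
# Sketch: Bayes' rule for the chess example.
prior = [0.50, 0.25, 0.25]
likelihood = [0.3, 0.4, 0.5]          # P(win | type i)

evidence = sum(p * l for p, l in zip(prior, likelihood))
posterior = [p * l / evidence for p, l in zip(prior, likelihood)]
print(posterior)  # [0.4, 0.266..., 0.333...]
```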
Example: Diagnosis

 A random person drawn from a certain population has probability 0.001 of having a certain disease.
 The test satisfies
 Pr[test positive | disease] = 0.95
 Pr[test negative | no disease] = 0.95
 Question: Given that the person just tested
positive, what is the probability of having the
disease?
Example: Diagnosis

 𝐴: the person has the disease.
 𝐵: the test result is positive.
𝑃(𝐴|𝐵) = 𝑃(𝐴)𝑃(𝐵|𝐴) / (𝑃(𝐴)𝑃(𝐵|𝐴) + 𝑃(𝐴ᶜ)𝑃(𝐵|𝐴ᶜ))
= (0.001 ∙ 0.95) / (0.001 ∙ 0.95 + 0.999 ∙ 0.05)
≈ 0.0187
 Much smaller than 95%!
 The Economist (February 20th, 1999): 80% of those questioned at a
leading American hospital substantially missed the correct answer to
a question of this type; most of them thought that the probability that
the person has the disease is 0.95!
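The same computation as code; the variable names are the sketch’s own:

```python
# Sketch: posterior probability of disease given a positive test.
p_disease = 0.001
p_pos_given_disease = 0.95
p_pos_given_healthy = 0.05   # 1 - specificity

p_pos = p_disease * p_pos_given_disease + (1 - p_disease) * p_pos_given_healthy
print(p_disease * p_pos_given_disease / p_pos)  # ≈ 0.0187
```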
Content

 Sets.
 Probabilistic models.
 Conditional probability.
 Total Probability Theorem and Bayes’ Rule.
 Independence.
 Counting.
Independence

 Consider two events 𝐴 and 𝐵.
 𝐴 and 𝐵 are independent: 𝐵 provides no information about 𝐴.
𝑃(𝐴|𝐵) = 𝑃(𝐴)
 Equivalently:
𝑃(𝐴 ∩ 𝐵) = 𝑃(𝐴)𝑃(𝐵)
 Why equivalent? Because 𝑃(𝐴|𝐵) = 𝑃(𝐴 ∩ 𝐵)/𝑃(𝐵).
Example: Dice rolling

 Consider the experiment of rolling a fair 4-sided die twice.
 Question: Are the following events independent?
 𝐴ᵢ = {1st roll results in 𝑖}
 𝐵ⱼ = {2nd roll results in 𝑗}
Example: Dice rolling

 The probability of 𝐴ᵢ ∩ 𝐵ⱼ:
𝑃(𝐴ᵢ ∩ 𝐵ⱼ) = 𝑃(the outcome of the two rolls is (𝑖, 𝑗)) = 1/16
 The probability of 𝐴ᵢ:
𝑃(𝐴ᵢ) = (number of elements of 𝐴ᵢ) / (total number of possible outcomes) = 4/16
 The probability of 𝐵ⱼ:
𝑃(𝐵ⱼ) = (number of elements of 𝐵ⱼ) / (total number of possible outcomes) = 4/16
 Check the independence condition:
𝑃(𝐴ᵢ ∩ 𝐵ⱼ) = 𝑃(𝐴ᵢ)𝑃(𝐵ⱼ)
 It holds, so the two events are independent.
Example: Dice rolling

 Question: Are the following events independent?
 𝐴 = {1st roll is 1}
 𝐵 = {sum of the two rolls is 5}
 The probability of 𝐴 ∩ 𝐵:
𝑃(𝐴 ∩ 𝐵) = 𝑃(1st roll is 1, 2nd roll is 4) = 1/16
Example: Dice rolling

 The probability of 𝐴:
𝑃(𝐴) = (number of elements of 𝐴) / (total number of possible outcomes) = 4/16
 The probability of 𝐵:
𝑃(𝐵) = (number of elements of 𝐵) / (total number of possible outcomes) = 4/16
 Check the independence condition:
𝑃(𝐴 ∩ 𝐵) = 𝑃(𝐴)𝑃(𝐵)
 It holds, so the two events are independent.
Example: Dice rolling

 Question: Are the following events independent?
 𝐴 = {maximum of the two rolls is 2}
 𝐵 = {minimum of the two rolls is 2}
 The probability of 𝐴 ∩ 𝐵:
𝑃(𝐴 ∩ 𝐵) = 𝑃(the result of the two rolls is (2,2)) = 1/16
Example: Dice rolling
 The probability of 𝐴:
𝑃(𝐴) = |{(1,2), (2,1), (2,2)}| / 16 = 3/16
 The probability of 𝐵:
𝑃(𝐵) = |{(2,2), (2,3), (2,4), (3,2), (4,2)}| / 16 = 5/16
 Check the independence condition and find
𝑃(𝐴 ∩ 𝐵) ≠ 𝑃(𝐴)𝑃(𝐵)
 Thus the two events are not independent. They are dependent.
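All three independence checks in this example can be done by exhaustive enumeration over the 16 outcomes; a sketch:

```python
from itertools import product
from fractions import Fraction

# Sketch: test independence by exhaustive enumeration over the 16 outcomes.
omega = list(product(range(1, 5), repeat=2))

def prob(event):
    return Fraction(sum(event(w) for w in omega), len(omega))

def independent(A, B):
    return prob(lambda w: A(w) and B(w)) == prob(A) * prob(B)

print(independent(lambda w: w[0] == 1, lambda w: w[1] == 4))        # True
print(independent(lambda w: w[0] == 1, lambda w: sum(w) == 5))      # True
print(independent(lambda w: max(w) == 2, lambda w: min(w) == 2))    # False
```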
Conditional independence

 Given an event 𝐶, the events 𝐴 and 𝐵 are conditionally independent if
𝑃(𝐴 ∩ 𝐵 | 𝐶) = 𝑃(𝐴|𝐶) ⋅ 𝑃(𝐵|𝐶)
 An equivalent formula is
𝑃(𝐴 | 𝐵 ∩ 𝐶) = 𝑃(𝐴|𝐶)
 The equivalence is because
𝑃(𝐴 ∩ 𝐵 | 𝐶) = 𝑃(𝐴 ∩ 𝐵 ∩ 𝐶)/𝑃(𝐶) = 𝑃(𝐶)𝑃(𝐵|𝐶)𝑃(𝐴|𝐵 ∩ 𝐶)/𝑃(𝐶)
= 𝑃(𝐵|𝐶)𝑃(𝐴|𝐵 ∩ 𝐶)
Conditional independence: Example 1

 Consider two independent fair coin tosses.
 𝐴 = {1st toss is a head}
 𝐵 = {2nd toss is a head}
 𝐷 = {the two tosses have different results}
 Events 𝐴 and 𝐵 are independent, but
𝑃(𝐴|𝐷) = 1/2, 𝑃(𝐵|𝐷) = 1/2, 𝑃(𝐴 ∩ 𝐵 | 𝐷) = 0.
 So events 𝐴 and 𝐵 are not conditionally independent given 𝐷.
Conditional independence: Example 2

 Two biased coins, a blue one and a red one. Choose one of them at random, each with probability 1/2, and toss the chosen coin twice.
 Blue coin: 𝑃(𝐻) = 0.99
 Red coin: 𝑃(𝐻) = 0.01
 Consider the events
 𝐴 = {1st toss results in head}
 𝐵 = {2nd toss results in head}
 𝐷 = {the blue coin is selected}
Conditional independence: Example 2

 No matter which coin is chosen, the two tosses are independent.
 Namely, conditioned on 𝐷, 𝐴 and 𝐵 are independent.
 The probability of 𝐴 ∩ 𝐵 conditioned on 𝐷:
𝑃(𝐴 ∩ 𝐵 | 𝐷) = 𝑃(𝐴|𝐷)𝑃(𝐵|𝐷) = 0.99 × 0.99
Conditional independence: Example 2

 The probability of 𝐴:
𝑃(𝐴) = 𝑃(𝐷)𝑃(𝐴|𝐷) + 𝑃(𝐷ᶜ)𝑃(𝐴|𝐷ᶜ) = 1/2
 Similarly, we have 𝑃(𝐵) = 1/2.
 Check the independence condition:
𝑃(𝐴 ∩ 𝐵) = 𝑃(𝐷)𝑃(𝐴 ∩ 𝐵 | 𝐷) + 𝑃(𝐷ᶜ)𝑃(𝐴 ∩ 𝐵 | 𝐷ᶜ)
= 1/2 × 0.99 × 0.99 + 1/2 × 0.01 × 0.01 ≈ 1/2
≠ 𝑃(𝐴)𝑃(𝐵) = 1/4
 Thus, without the conditioning, 𝐴 and 𝐵 are dependent.
Independence of many events

 We say that the events 𝐴₁, 𝐴₂, … , 𝐴ₙ are independent if for every subset 𝑆 of {1, 2, … , 𝑛},
𝑃(⋂_{𝑖∈𝑆} 𝐴ᵢ) = ∏_{𝑖∈𝑆} 𝑃(𝐴ᵢ)
 Note: pairwise independence does not imply independence.
 For instance, the events 𝐴, 𝐵, 𝐷 of the two-toss example above are pairwise independent, but not independent: 𝑃(𝐴 ∩ 𝐵 ∩ 𝐷) = 0 ≠ 1/8.
Content

 Sets.
 Probabilistic models.
 Conditional probability.
 Total Probability Theorem and Bayes’ Rule.
 Independence.
 Counting.
Counting

 The calculation of probabilities often involves counting the number of outcomes in various events.
 Uniform distribution over a finite sample space:
𝑃(𝐴) = |𝐴| / |Ω|
 For an event 𝐴 with a finite number of equally likely outcomes, each of which has probability 𝑝:
𝑃(𝐴) = 𝑝 ⋅ |𝐴|.
Combinatorics

 The art of counting constitutes a large portion of the field of combinatorics.

 Next:
 present the basic principle of counting
 apply it to a number of situations that are often
encountered in probabilistic models.
2 stages

 Consider an experiment that consists of two consecutive stages.
 The possible results at the first stage are 𝑎₁, 𝑎₂, … , 𝑎ₘ.
 The possible results at the second stage are 𝑏₁, 𝑏₂, … , 𝑏ₙ.
 Then the possible results of the two-stage experiment are all possible ordered pairs (𝑎ᵢ, 𝑏ⱼ), 𝑖 = 1, … , 𝑚, 𝑗 = 1, … , 𝑛.
 The number of such ordered pairs: 𝑚𝑛.
Multiple stages

 And this easily extends to multiple stages.
 Suppose there are 𝑟 stages.
 There are 𝑛₁ possible results at the first stage.
 For every possible result at the first stage, there are 𝑛₂ possible results at the second stage.
 More generally, for any sequence of possible results at the first 𝑖 − 1 stages, there are 𝑛ᵢ possible results at the 𝑖th stage.
 Then the total number of possible results of the 𝑟-stage process is 𝑛₁𝑛₂ ⋯ 𝑛ᵣ.
Example: Tel numbers

 A local telephone number is an 8-digit sequence, but the first digit has to be different from 0 and 1.
 Question: How many distinct telephone numbers are there?
 We have a total of 8 stages.
 At the first stage we only have 8 choices.
 For each of the remaining stages we have 10 choices.
 Therefore, the answer is
8 × 10⁷
Example: number of subsets

 Consider an 𝑛-element set {𝑠₁, … , 𝑠ₙ}.
 Question: How many subsets does it have?
 Including itself and the empty set.
 We can visualize the choice of a subset as a sequential process:
 examine one element at a time and decide whether to include it in the set or not.
 A total of 𝑛 stages, and a binary choice at each stage.
 Therefore the number of subsets is 2ⁿ.
𝑘-permutations

 We start with 𝑛 distinct objects, and let 𝑘 be some positive integer, with 𝑘 ≤ 𝑛.
 We wish to count the number of different ways that we can pick 𝑘 out of these 𝑛 objects and arrange them in a sequence,
 i.e., the number of distinct 𝑘-object sequences.
 We can choose any of the 𝑛 objects to be the
first one.
 Having chosen the first, there are only 𝑛 − 1
possible choices for the second.
 Given the choice of the first two, there only
remain 𝑛 − 2 available objects for the third
stage, etc.
 When we are ready to select the last (the 𝑘th) object, we have already chosen 𝑘 − 1 objects, which leaves us with 𝑛 − (𝑘 − 1) choices for the last one.
 The number of possible sequences, called 𝑘-permutations, is
𝑛(𝑛 − 1) ⋯ (𝑛 − 𝑘 + 1) = 𝑛! / (𝑛 − 𝑘)!
 In the case of 𝑘 = 𝑛, the number of possible sequences, called permutations, is
𝑛(𝑛 − 1) ⋯ 1 = 𝑛!
 Convention: 0! = 1.
 Question: What’s the number of words that consist of four distinct letters?
 This is the problem of counting the number of 4-permutations of the 26 letters in the alphabet.
 The number is
26!/22! = 26 × 25 × 24 × 23 = 358,800
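Python’s math module exposes these counts directly (math.perm and math.factorial; math.perm needs Python 3.8+); a quick check:

```python
import math

# Sketch: k-permutations via the math module.
print(math.perm(26, 4))                          # 26!/22! = 358800
print(math.factorial(26) // math.factorial(22))  # same count
```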
Combination

 There are 𝑛 people and we are interested in forming a committee of 𝑘. How many different committees are possible?
 More abstractly, this is the same as the problem of counting the number of 𝑘-element subsets of a given 𝑛-element set.
 Forming a combination is different from forming a 𝑘-permutation, because in a combination there is no ordering of the selected elements.
 For example, whereas the 2-permutations of the letters A, B, C, and D are
AB, BA, AC, CA, AD, DA, BC, CB, BD, DB, CD, DC,
 the combinations of two out of these four letters are
AB, AC, AD, BC, BD, CD.
 In the preceding example, the combinations are obtained from the permutations by grouping together “duplicates”.
 For example, AB and BA are not viewed as distinct, and are both associated with the combination AB.
 In general, each combination is associated with 𝑘! “duplicate” 𝑘-permutations, so the number 𝑛!/(𝑛 − 𝑘)! of 𝑘-permutations equals the number of combinations times 𝑘!.
 Hence, the number of possible combinations is equal to
𝑛! / (𝑘! (𝑛 − 𝑘)!)
 This is the same as the binomial coefficient, written C(𝑛, 𝑘) (“𝑛 choose 𝑘”).
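A quick numerical check of the duplicate-grouping argument, using math.comb and math.perm:

```python
import math

# Sketch: the number of 2-element subsets of {A, B, C, D}.
print(math.comb(4, 2))                        # 6
print(math.perm(4, 2) // math.factorial(2))   # 12 / 2! = 6, same count
```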
Example: an algebraic identity

 We have a group of 𝑛 persons.
 Consider clubs that consist of a special person from the group (the club leader) and a number (possibly zero) of additional club members.
 Let us count the number of possible clubs of
this type in two different ways, thereby
obtaining an algebraic identity.
Method 1

 There are 𝑛 choices for the club leader.
 Once the leader is chosen, we are left with a set of 𝑛 − 1 available persons, and we are free to choose any of its 2ⁿ⁻¹ subsets.
 Thus the number of possible clubs is 𝑛2ⁿ⁻¹.
Method 2

 For fixed 𝑘, we can form a 𝑘-person club by first selecting 𝑘 out of the 𝑛 available persons.
 There are C(𝑛, 𝑘) choices.
 We can then select one of the 𝑘 members to be the leader (there are 𝑘 choices).
 By adding over all possible club sizes 𝑘, we obtain the number of possible clubs as
Σ_{𝑘=1}^{𝑛} 𝑘 C(𝑛, 𝑘).
 We thus showed the identity Σ_{𝑘=1}^{𝑛} 𝑘 C(𝑛, 𝑘) = 𝑛2ⁿ⁻¹.
Partitions

 We are given an 𝑛-element set 𝑆 and integers 𝑛₁, … , 𝑛ᵣ such that
 𝑛ᵢ ≥ 0, ∀𝑖 ∈ {1, … , 𝑟},
 𝑛₁ + ⋯ + 𝑛ᵣ = 𝑛.
 Task: Partition the set 𝑆 into 𝑟 disjoint
subsets,
 with the 𝑖-th subset containing exactly 𝑛𝑖
elements.
 Question: How many ways can this be done?
 We form the subsets one at a time.
 We have C(𝑛, 𝑛₁) ways of forming the first subset.
 Having formed the first subset, to form the second subset,
 we are left with 𝑛 − 𝑛₁ elements,
 and need to choose 𝑛₂ of them.
 We have C(𝑛 − 𝑛₁, 𝑛₂) choices.
 Similar treatment for the rest…
 Counting Principle: the total number of choices is
C(𝑛, 𝑛₁) C(𝑛 − 𝑛₁, 𝑛₂) C(𝑛 − 𝑛₁ − 𝑛₂, 𝑛₃) ⋯ C(𝑛 − 𝑛₁ − ⋯ − 𝑛ᵣ₋₁, 𝑛ᵣ)
Simplification

C(𝑛, 𝑛₁) C(𝑛 − 𝑛₁, 𝑛₂) C(𝑛 − 𝑛₁ − 𝑛₂, 𝑛₃) ⋯ C(𝑛 − 𝑛₁ − ⋯ − 𝑛ᵣ₋₁, 𝑛ᵣ)
= 𝑛!/(𝑛₁! (𝑛 − 𝑛₁)!) ⋅ (𝑛 − 𝑛₁)!/(𝑛₂! (𝑛 − 𝑛₁ − 𝑛₂)!) ⋯ (𝑛 − 𝑛₁ − ⋯ − 𝑛ᵣ₋₁)!/(𝑛ᵣ! (𝑛 − 𝑛₁ − ⋯ − 𝑛ᵣ)!)
= 𝑛! / (𝑛₁! 𝑛₂! ⋯ 𝑛ᵣ!)
(the intermediate factorials telescope, and (𝑛 − 𝑛₁ − ⋯ − 𝑛ᵣ)! = 0! = 1).
 This is the same as the multinomial coefficient C(𝑛; 𝑛₁, 𝑛₂, … , 𝑛ᵣ).
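A sketch of the telescoping product as code; multinomial is a hypothetical helper defined here, not a standard library function:

```python
import math

# Sketch: multinomial coefficient as a product of binomial coefficients,
# and directly as n!/(n1! n2! ... nr!).
def multinomial(*parts):
    coef, remaining = 1, sum(parts)
    for p in parts:
        coef *= math.comb(remaining, p)   # telescoping product of C(.,.)
        remaining -= p
    return coef

print(multinomial(3, 1, 2))  # 6!/(3! 1! 2!) = 60
assert multinomial(3, 1, 2) == math.factorial(6) // (
    math.factorial(3) * math.factorial(1) * math.factorial(2))
```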
Example: Anagrams

 Question: How many different words (letter sequences) can be obtained by rearranging the letters in the word TATTOO?
 There are 6 positions to be filled by the available
letters.
 Each rearrangement corresponds to a partition
of the set of the 6 positions into
 a group of size 3: the positions that get the letter T
 a group of size 1: the position that gets the letter A
 a group of size 2: the positions that get the letter O
 Thus, the desired number is
6!/(1! 2! 3!) = 60.
Example: Students grouping (again)

 A class consisting of 4 graduate and 12 undergraduate students is randomly divided into four groups of 4.
 “Randomly”: all partitions (into 4 groups of size 4) occur equally likely.
 Question: What is the probability that each
group includes a graduate student?
 We’ve seen this before, but we’ll now obtain
the answer using a different argument.
 Sample space Ω: All partitions of the 16
students into 4 groups of size 4.
 The size of the sample space:
|Ω| = C(16; 4, 4, 4, 4) = 16!/(4! 4! 4! 4!)
 Consider the event of each group containing
a graduate student.
 Two steps: first allocate the graduate
students, and then the undergraduate ones.
Allocation of grads

 There are
 four choices for the group of the first graduate
student,
 three choices for the second,
 two for the third,
 one for the fourth.
 Thus, there is a total of 4! choices for this
step.
Allocation of undergrads

 Take the remaining 12 undergraduate students and distribute them to the four groups,
 3 students in each.
 This can be done in
C(12; 3, 3, 3, 3) = 12!/(3! 3! 3! 3!)
different ways.
 By the Counting Principle, the event of interest can occur in
4! ⋅ 12!/(3! 3! 3! 3!)
different ways.
 The probability of this event is thus
(4! ⋅ 12!/(3! 3! 3! 3!)) / (16!/(4! 4! 4! 4!)) = (12 ⋅ 8 ⋅ 4)/(15 ⋅ 14 ⋅ 13),
the same as previously calculated.
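Exact arithmetic confirms that both counting arguments agree; a sketch:

```python
from fractions import Fraction
from math import factorial

# Sketch: verify the counting argument for "each group gets a graduate student".
f = factorial
favorable = f(4) * f(12) // (f(3) ** 4)   # allocate grads, then undergrads
total = f(16) // (f(4) ** 4)              # all partitions into 4 groups of 4
print(Fraction(favorable, total))         # 64/455 = (12*8*4)/(15*14*13) ≈ 0.1407
```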
