
MS3227 Probability and Applications in

Business
Week 2

Eman Leung, PhD., PMP, LSSBB, CPBI, IHI Improvement Advisor


Recall last week
Probability

Repeat an experiment many times under the same conditions,


count how many times a particular event occurs. The proportion
roughly gives you the probability of that event.

Objective and subjective probability (personal belief).

Sample space Ω, event, operations on sets (intersection, union,
complement).

General rule: P(A) = 1 − P(A^C)

2
Conditional probability

Terminology includes: conditional on, given, or taking the condition with
respect to some event.

P(A | B) = P(A and B) / P(B): calculate the probability of A assuming
that B always happens.

Essentially, we shrink the set of all possible outcomes; the denominator
of the probability changes (and the numerator as well).

3
Conditional probability and multiplicative rule

Definition of conditional probability:

P(A | B) = P(A and B) / P(B)

Multiplicative rule

P(A and B) = P(A | B) × P(B)

4
Example: Age and Rank of Faculty

Given that a faculty member is under 40 years old, what's the probability
that he/she is a full professor?

Age Professor Associate Assistant Lecturer Total


< 40 54 173 220 23 470
40 - 49 156 125 61 6 348
50 - 59 145 68 36 4 253
> 60 75 15 3 0 93
Total 430 381 320 33 1164

P(professor | < 40) = 54/470,  P(professor) = 430/1164

When calculating a conditional probability from a contingency table, note
that the denominator is a row/column sum instead of the grand total.

5
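The table lookup above can be sketched in Python; the dictionary layout is just an illustrative choice, with the counts taken from the slide's table:

```python
# Faculty counts by age group and rank, from the table above.
table = {
    "<40":   {"professor": 54,  "associate": 173, "assistant": 220, "lecturer": 23},
    "40-49": {"professor": 156, "associate": 125, "assistant": 61,  "lecturer": 6},
    "50-59": {"professor": 145, "associate": 68,  "assistant": 36,  "lecturer": 4},
    ">60":   {"professor": 75,  "associate": 15,  "assistant": 3,   "lecturer": 0},
}

row_total = sum(table["<40"].values())                           # 470
grand_total = sum(sum(r.values()) for r in table.values())       # 1164

p_prof_given_under40 = table["<40"]["professor"] / row_total         # 54/470
p_prof = sum(r["professor"] for r in table.values()) / grand_total   # 430/1164

print(round(p_prof_given_under40, 3))  # 0.115
print(round(p_prof, 3))                # 0.369
```

Note how the conditional probability divides by the row total (470), while the marginal probability divides by the grand total (1164).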
Monty Hall
Example: The Monty Hall problem

You are on a game show. Three doors are in front of you.


Behind one door is a car, but behind the others, goats.

You pick a door (not open yet), say the first one.

The host, who knows what is behind the doors, opens another door, and it
is a goat. The host asks if you want to switch your choice.

What would you do? Switch or not?


6
Example: The Monty Hall problem

You might think: I know there is one car and one goat behind the two
doors. So the probability of getting a car is 1/2 whether I switch or not.
It does not matter.

7
Example: The Monty Hall problem

Actually the numbers on the doors (the order of the doors) do not matter.

Think deeper: where does the randomness come from?

List all possible cases (remember, the sample space!)

If you switch, you will win in two out of the three cases: probability 2/3!

8
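A quick simulation makes the 2/3 answer concrete. This is a minimal sketch; the host's goat-door choice is made deterministically, which does not affect the win rate:

```python
import random

def monty_hall(switch, trials=100_000):
    """Simulate the Monty Hall game; return the fraction of games won."""
    wins = 0
    for _ in range(trials):
        car = random.randrange(3)        # door hiding the car
        pick = random.randrange(3)       # contestant's first pick
        # Host opens a door that is neither the pick nor the car (a goat door).
        opened = next(d for d in range(3) if d != pick and d != car)
        if switch:
            # Move to the remaining unopened door.
            pick = next(d for d in range(3) if d != pick and d != opened)
        wins += (pick == car)
    return wins / trials

print(monty_hall(switch=True))   # close to 2/3
print(monty_hall(switch=False))  # close to 1/3
```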
Example: The Monty Hall problem

Another explanation

9
Independence
Independence

We know P(A | B) = P(A and B) / P(B). In general it is not the same as
P(A): you get some information about A given B.

What if P(A | B) = P(A)?

Any evidence about B does not affect our belief about A.

Example: two fair coins are tossed together; let H1 and H2 denote the
events that the first / second outcome is a head, respectively.
What is P(H2 | H1)?
P(H2 | H1) = P(H1 and H2) / P(H1) = (1/4) / (1/2) = 1/2 = P(H2)
Knowing H1 does not give you additional information about H2 !

10
Independence

Two random processes (random variables, events) are independent if


knowing the outcome of one provides no useful information about the
outcome of the other.

Knowing that the coin landed on a head on the first toss does not
provide any useful information for determining what the coin will
land on in the second toss.
Outcomes of two tosses of a coin are independent.

Knowing that the first card drawn from a deck is an ace does
provide useful information for determining the probability of drawing
an ace in the second draw.
Outcomes of two draws from a deck of cards (w/o replacement) are
dependent.

11
Independence and Conditional Probabilities

In mathematical notation, if P(A | B) = P(A) then the events A and B


are said to be independent.

Conceptually: given B, we learn nothing new about A.

Equivalently, one can also check the independence of the events A and B
by checking whether P(B | A) = P(B).

12
Practice – Checking for Independence

A survey in the US asks "Do you agree that widespread gun ownership
can protect citizens?"
58% of all respondents say yes.
Looking at different races: 67% of white, 28% of black and 64% of
hispanic respondents say yes, respectively.

Are opinions on gun ownership and race/ethnicity independent?

P(protects citizens) = 0.58


P(protects citizens | White) = 0.67
P(protects citizens | Black) = 0.28
P(protects citizens | Hispanic) = 0.64
The probability of answering "protects citizens" varies across races;
therefore opinion on gun ownership and race are dependent.
13
Independence of events vs. independence of variables in a contingency table

Recall that for a two-way contingency table, the two variables are
independent if the row proportions do not change from row to row.
This is consistent with the definition of independence of events since

row proportions = P(column variable | row variable) (1)

and if the row proportions do not change from row to row, they will be
equal to the marginal probability of the column variable.
In the faculty example, P(rank | age) ≠ P(rank), so the age and rank of
faculty members are dependent.

14
Multiplication rule for independent events

When A and B are independent


P(A and B) = P(A) × P(B)

This is simply the general multiplication rule


P(A and B) = P(A) × P(B | A)
where P(B | A) = P(B) if A and B are independent.

More generally,
P(A1 and A2 and · · · and Ak) = P(A1) × · · · × P(Ak)
if A1, · · · , Ak are independent.

Exercise: If you roll a die twice, what's the probability of getting two 1s
in a row?

P(1 on the first roll) × P(1 on the second roll) = 1/6 × 1/6 = 1/36

15
Practice

Sometimes you need to infer independence from the statement of
the question.

Example: A recent Gallup poll suggests that 25.5% of Texans do not


have health insurance as of June 2012. Assuming that the uninsured
rate stayed constant, what is the probability that two randomly
selected Texans are both uninsured?

0.255^2 ≈ 0.065

16
The gambler

A gambler is rolling 4 fair dice. What’s the probability that there is at


least one 6 in 4 rolls?

Each roll is independent. Let Xi denote the event that there is no 6


in the i th roll.

P(at least 1 six in 4 rolls) = 1 − P(no 6 in 4 rolls)

= 1 − P(X1 ∩ X2 ∩ X3 ∩ X4)
= 1 − P(X1)P(X2)P(X3)P(X4)
= 1 − (5/6)^4 ≈ 0.518

17
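The complement-rule calculation above can be checked by simulation; a minimal sketch:

```python
import random

def at_least_one_six(trials=100_000):
    """Estimate P(at least one 6 in 4 rolls of a fair die)."""
    hits = sum(
        any(random.randint(1, 6) == 6 for _ in range(4))
        for _ in range(trials)
    )
    return hits / trials

print(1 - (5 / 6) ** 4)    # exact: 0.5177...
print(at_least_one_six())  # simulated: close to 0.518
```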
Put everything together

If we were to randomly select 5 Texans, what is the probability that at


least one is uninsured?
By the complement rule

P(at least 1 uninsured) = 1 − P(none uninsured)

= 1 − P(everyone is insured)
= 1 − (1 − 0.255)^5
= 1 − 0.745^5 ≈ 0.77

Keep in mind that

P(at least one) = 1 − P(none)

18
Abuse of the multiplication rule

Think twice before assuming events are independent.

Example: As estimated in 2012, of the U.S. population


13.4% were 65 or older
52% were male

True or false? 0.134 × 0.52 ≈ 0.07 of the U.S. population were
males aged 65 or older.

False, age and gender are dependent. In particular, women on


average live longer than men. There are more old women than old
men.

Among those aged 65 or older, only 44% are male (this is the conditional
probability!), not 52%. So only 0.134 × 0.44 ≈ 0.059 were males aged
65 or older.
19
Law of total probability
Law of total probability

Next we are going to look at ways of obtaining the probability of a


subset, using conditional probabilities.
Let A1, · · · , An be a partition of Ω, such that P(Ai) > 0 for all Ai.
A partition means that the union of all Ai is Ω and they are mutually
exclusive of each other.
Let B be an event.

21
Law of total probability

Note that Ai also partitions B.


Therefore

P(B) = P(A1 ∩ B) + P(A2 ∩ B) + · · · + P(An ∩ B)

https://www.youtube.com/watch?v=PXDG5XKqutw
22
Law of total probability

From last page


P(B) = P(A1 ∩ B) + P(A2 ∩ B) + · · · + P(An ∩ B)
Use the multiplication rule: P(Ai ∩ B) = P(Ai)P(B | Ai).
Therefore we have
P(B) = P(A1)P(B | A1) + · · · + P(An)P(B | An)
This is the total probability theorem, or law of total probability.

23
Law of total probability

When do you use total probability theorem?

Divide and conquer! Or discuss case by case.

When it's hard to calculate P(B) directly, you enumerate all possible cases
Ai and calculate P(B | Ai) instead.

Ai must be a partition, otherwise the equation does not hold.

24
Example of Law of total probability

Take two cards from a well-shuffled deck. What's the probability that the
second card is an ace?

A1 and A2 denote the first / second card is an ace respectively.

Hard to calculate P(A2 ) directly. You need to discuss cases of the


first card.

If the first card is an ace / if it is not.

P(A2) = P(A2 | A1)P(A1) + P(A2 | A1^C)P(A1^C)

Much easier to calculate!


P(A2) = 3/51 × 4/52 + 4/51 × 48/52 = 4/52

25
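The law-of-total-probability answer 4/52 can also be checked by simulation; a minimal sketch:

```python
import random

def second_card_ace(trials=100_000):
    """Estimate P(second card is an ace) when drawing two cards without replacement."""
    deck = ["ace"] * 4 + ["other"] * 48
    hits = sum(random.sample(deck, 2)[1] == "ace" for _ in range(trials))
    return hits / trials

exact = (3 / 51) * (4 / 52) + (4 / 51) * (48 / 52)
print(exact)              # 1/13 = 0.0769...
print(second_card_ace())  # simulated: close to 1/13
```

The second card is equally likely to be any of the 52 cards, which is why the answer collapses to 4/52.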
Example of Law of total probability

Insurance company calculates the probability of lung cancer among the


entire population.

Divide people into two groups: smoking (20%) and non-smoking
(80%).

Lifetime risk of lung cancer: 12% for smokers and 1.3% for
non-smokers.

P(C) = P(C | S)P(S) + P(C | S^C)P(S^C)

= 0.12 × 0.2 + 0.013 × 0.8 = 0.0344

26
Example of Law of total probability

There are three roads from your home to your office. Every morning you
pick a road at random, with probability P(L1 ) = 0.5, P(L2 ) = 0.3,
P(L3 ) = 0.2.
The probability of heavy traffic on each road is P(T1 ) = 0.8,
P(T2) = 0.6, P(T3) = 0.3. What's the probability that you avoid heavy
traffic on a given morning?

27
Example of Law of total probability

Let T be the event you see heavy traffic.

P(T ) = P(T1 | L1 )P(L1 ) + P(T2 | L2 )P(L2 ) + P(T3 | L3 )P(L3 )

Notice that whether there is traffic on a road is independent of which
road you choose, so P(T1 | L1) = P(T1); therefore

P(T) = P(T1)P(L1) + P(T2)P(L2) + P(T3)P(L3)

= 0.5 × 0.8 + 0.3 × 0.6 + 0.2 × 0.3
= 0.64
P(T^C) = 1 − P(T) = 0.36

28
Tree diagrams and Bayes’
Theorem
Example – Nervous Job Applicant

Suppose an applicant for a job has been invited for an interview.


The probability that

he is nervous is P(N) = 0.7

the interview is successful given that he is nervous is P(S | N) = 0.2

the interview is successful given that he is not nervous is


P(S | N C ) = 0.9

What’s the probability that the interview is successful?

P(S) = P(S and N) + P(S and N^C)

= P(N)P(S | N) + P(N^C)P(S | N^C)
= 0.7 × 0.2 + 0.3 × 0.9 = 0.41

29
Nervous job applicant

Conversely, given the interview is successful, what’s the probability that


the job applicant is nervous during the interview?

P(N | S) = P(N and S) / P(S) = P(N and S) / 0.41

= P(N)P(S | N) / 0.41 = (0.7 × 0.2) / 0.41 ≈ 0.34
in which P(S) = 0.41 was found in the previous slide.

30
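The two steps above (law of total probability for P(S), then Bayes' rule) fit in a small helper; the function name is an illustrative choice:

```python
def bayes_posterior(p_a, p_b_given_a, p_b_given_not_a):
    """Return P(A | B) given P(A), P(B | A) and P(B | A^C)."""
    p_b = p_a * p_b_given_a + (1 - p_a) * p_b_given_not_a  # law of total probability
    return p_a * p_b_given_a / p_b

# Nervous applicant: P(N) = 0.7, P(S | N) = 0.2, P(S | N^C) = 0.9
print(round(bayes_posterior(0.7, 0.2, 0.9), 2))  # 0.34
```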
Tree diagram for the nervous job applicant

Much easier to solve with the tree diagram.

Nervous (0.7) → Succeed (0.2): P(N and S) = 0.7 × 0.2
Nervous (0.7) → Fail (0.8): P(N and S^C) = 0.7 × 0.8
Not nervous (0.3) → Succeed (0.9): P(N^C and S) = 0.3 × 0.9
Not nervous (0.3) → Fail (0.1): P(N^C and S^C) = 0.3 × 0.1

Since we are interested in P(N | S), two branches contain S, and one of
them also contains N.

P(N | S) = (0.7 × 0.2) / (0.7 × 0.2 + 0.3 × 0.9)

Compare with the previous two slides: you get identical results!

31
Bayes’ Theorem

The problem in the previous slide uses Bayes' Theorem. Knowing
P(B | A), P(B | A^C), and P(A), how do you find P(A | B)? Notice
that you switch the positions of the event B and the condition A.

P(A | B) = P(A and B) / P(B) = P(A)P(B | A) / P(B)

= P(A)P(B | A) / (P(B and A) + P(B and A^C))
= P(A)P(B | A) / (P(A)P(B | A) + P(A^C)P(B | A^C))

This is very useful for inferring the hidden causes underlying our
observations. Law of total probability is used in the denominator.

32
Medical Testing

A common application of Bayes’ Theorem is in diagnostic testing

Let D denote the event that an individual has the disease.

Let T+ denote the event that the test is positive.

Let T− denote the event that the test is negative.

P(T+ | D) is called the sensitivity of the test.

P(T− | D^C) is called the specificity of the test.

Ideally, both P(T+ | D) and P(T− | D^C) would equal 1.


However, the test may give false positives or false negatives.

33
Enzyme Immunoassay test for HIV

P(T+ | D) = 0.98: positive for infected.

P(T− | D^C) = 0.995: negative for non-infected.
P(D) = 1/300

What's the probability that the tested person is infected if the test was
positive?

P(D | T+) = P(D)P(T+ | D) / (P(D)P(T+ | D) + P(D^C)P(T+ | D^C))
= (1/300 × 0.98) / (1/300 × 0.98 + 299/300 × 0.005)
≈ 0.396

The test is not confirmatory. Need to confirm by a second test.

34
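The same Bayes computation in code, with an extra line suggesting why a second positive test is nearly confirmatory; the function name is an illustrative choice:

```python
def posterior_positive(prevalence, sensitivity, specificity):
    """P(disease | positive test) via Bayes' theorem."""
    p_pos = prevalence * sensitivity + (1 - prevalence) * (1 - specificity)
    return prevalence * sensitivity / p_pos

p1 = posterior_positive(1 / 300, 0.98, 0.995)
print(round(p1, 3))  # 0.396

# Feed the posterior back in as the new prior: after a second positive test,
# the probability of infection is far higher.
p2 = posterior_positive(p1, 0.98, 0.995)
print(round(p2, 3))  # 0.992
```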
Tree diagram for HIV test

Infected (1/300) → Positive (0.98): P(D and T+) = 1/300 × 0.98
Infected (1/300) → Negative (0.02): P(D and T−) = 1/300 × 0.02
Not infected (299/300) → Positive (0.005): P(D^C and T+) = 299/300 × 0.005
Not infected (299/300) → Negative (0.995): P(D^C and T−) = 299/300 × 0.995

Only two branches give a positive test result; of those, one is D (has the
disease).

P(D | T+) = (1/300 × 0.98) / (1/300 × 0.98 + 299/300 × 0.005)

35
One more Bayes example..

In a sample of 100,000 emails, you found 550 are spam. Your next email
has the word "bigly". From historical experience, you know that half of
all spam emails contain "bigly", and only 2% of non-spam emails contain
that word. What's the probability that the new email is spam?
What information can you get from the statement?

P(spam) = 550/100000 = 0.0055
P(bigly | spam) = 0.5
P(bigly | non-spam) = 0.02

What else can you infer from those numbers?

P(non-spam) = 1 − P(spam) = 0.9945

We have all the ingredients for Bayes' theorem, right?

37
One more Bayes example..

Spam (0.0055) → "bigly" (0.5): P(spam and bigly) = 0.0055 × 0.5
Spam (0.0055) → no "bigly" (0.5): P(spam and no bigly) = 0.0055 × 0.5
Not spam (0.9945) → "bigly" (0.02): P(not spam and bigly) = 0.9945 × 0.02
Not spam (0.9945) → no "bigly" (0.98): P(not spam and no bigly) = 0.9945 × 0.98

Only two branches contain "bigly"; of those, one is spam.

P(spam | bigly) = (0.0055 × 0.5) / (0.0055 × 0.5 + 0.9945 × 0.02)

38
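The spam posterior can be computed directly from the three given numbers; a minimal sketch:

```python
p_spam = 550 / 100_000   # prior: P(spam) = 0.0055
p_bigly_spam = 0.5       # P("bigly" | spam)
p_bigly_ham = 0.02       # P("bigly" | not spam)

# Law of total probability in the denominator, Bayes' rule on top.
p_bigly = p_spam * p_bigly_spam + (1 - p_spam) * p_bigly_ham
p_spam_bigly = p_spam * p_bigly_spam / p_bigly
print(round(p_spam_bigly, 3))  # 0.121
```

Even though half of all spam contains "bigly", the posterior stays low because spam itself is rare: the prior dominates.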
When do you use Bayes’ theorem

You have a homework question asking for P(A | B).

Check what is given to you.

If you have P(A | B) or P(A^C | B), then you are all set.

If not, then you have to use Bayes' theorem.

You need to know P(B | A), P(B | AC ) and P(A), draw a tree
diagram. If you don’t have these, either the question is incomplete,
or you should read it again carefully.

39
The most general Bayes’ theorem

Let A1 , A2 , · · · An be a partition of the sample space, and B be any


event. Then for each i = 1, · · · , n,

P(Ai | B) = P(B | Ai)P(Ai) / (P(B | A1)P(A1) + · · · + P(B | An)P(An))

With n = 2, we recover the cases in the previous slides.

40
Counting
Counting

Many basic probability problems are just counting.

Area becomes counts if the outcomes are discrete.


P(E) = area of E / area of Ω

Example: there are 1 boy and 2 girls in the room. If you pick one
randomly, what’s the probability that this is a girl?
P = #success / #total possible cases = #girls / (#girls + #boys) = 2/3

We will see more complex examples later. But essentially, we need


to count the numerator and denominator.

41
Basic principle of counting

Suppose you do two experiments. The first experiment has n1
possible outcomes, and for each outcome of experiment 1, there are
n2 possible outcomes for experiment 2. Then there are n1 × n2
possible outcomes in total.

Example: A football tournament has 14 teams, and each team has 11
players. If you select one player as the player of the year, how many
possible choices are there?

Selecting a team is the first experiment; then choose one player from
the team. So there are 14 × 11 = 154 possible choices in total.

42
Generalized principle of counting

If r experiments that are to be performed are such that the 1st one
may result in any of n1 possible outcomes; and if, for each of these
n1 possible outcomes, there are n2 possible outcomes of the 2nd
experiment; and if, for each of the n1 × n2 possible outcomes of the
first two experiments, there are n3 possible outcomes of the 3rd
experiment; and if..., then there is a total of n1 × n2 × · · · × nr
possible outcomes of the r experiments.

43
Generalized principle of counting

How many possible combinations are there for a Hong Kong ID
number? The ID number begins with 1 letter, followed by 7 digits.

HKID number has the form Z 123456(7). So there are

24 × 10 × 10 × 10 × 10 × 10 × 10 × 10

possible cases.

44
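The multiplication principle above translates directly; the 24-letter count is taken from the slide:

```python
# Multiplication principle: 1 letter then 7 independent digit positions.
letters = 24          # letter choices, per the slide
digit_positions = 7
total = letters * 10 ** digit_positions
print(total)  # 240000000
```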
Generalized principle of counting

If you flip a coin 20 times and keep record of the outcome. H for
head and T for tail. How many possible outcomes in total?

You will have a sequence (length 20) such as HHHTTHTH...

2^20

45
Permutations

Consider the acronym ABC, 3 letters. How many different ordered
arrangements of the letters A, B, C are possible?

We have (A, B, C), (A, C, B), (B, A, C), (B, C, A), (C, A, B) and
(C, B, A). 6 possible cases in total.

General rules. Suppose you have n objects. The number of


permutations of these n objects is

n × (n − 1) × (n − 2) × · · · × 3 × 2 × 1 := n!

The notation n! is for simplicity; it reads "n factorial".

Remember the convention 0! = 1.

46
Permutations

A class "Probability" consists of 20 boys and 30 girls. An
examination is given and students are ranked according to their
performance. Assume no two or more students obtain the same
grade.
How many rankings are possible?

We have 20 + 30 = 50 students, thus 50! possible rankings.

If boys are ranked among themselves, and girls are ranked among
themselves. How many possible rankings?

20! possible rankings for boys, and 30! possible rankings for girls.

47
Permutations: More Examples

Example: You have 8 textbooks that you want to order on your


bookshelf: 3 mathematics books, 3 physics books, 2 chemistry
books. You want to arrange them so that all the books dealing with
the same subject are together on the shelf. How many different
arrangements are possible?

First, you have 3! orderings of the subjects: MPC, MCP, PMC, PCM,
CMP, CPM.

Second, for each ordering of subjects, say MPC, you have 3!
different orderings for math books, 3! orderings for physics and 2!
orderings for chemistry.

The total number of arrangements is


3! × (3! × 3! × 2!)
48
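The arrangement count above can be evaluated with factorials; a minimal check:

```python
from math import factorial

# 3! orderings of the subjects, then orderings within each subject block.
arrangements = factorial(3) * (factorial(3) * factorial(3) * factorial(2))
print(arrangements)  # 432
```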
Combinations

Determine the number of different groups of r objects that could
be formed from a total of n objects (how many ways to choose r
from n).

Example: How many different groups of 3 could be selected from
A, B, C, D and E?

There are 5 ways to select the 1st letter, 4 to select the 2nd and 3
to select the 3rd, so 5 × 4 × 3 = 60 ways to select when the order in
which the items are selected is relevant. When it is not relevant,
say the group BCE is the same as BEC, CEB, CBE, EBC, ECB;
there are 3! = 6 permutations. So when the order is irrelevant, we
have 60/6 = 10 different groups.

49
Combinations

Select r objects from n objects.

When the order of selection is relevant, there are

n(n − 1) · · · (n − r + 1) = n! / (n − r)!

possible groups.

When the order of selection is irrelevant, there are

n! / ((n − r)! r!) := (n choose r)

We call this the binomial coefficient, read "choose r from n".

50
Combinations: Examples

Question: Assume we have a horse race with 12 horses. What is
the possible number of combinations of 3 horses when the order
matters and when it does not?

When it matters: 12! / (12 − 3)!; when it does not: 12! / ((12 − 3)! 3!).

Question: From a group of 5 women and 7 men, how many
different committees consisting of 2 women and 3 men can be formed?

We have (5 choose 2) = 10 possible women groups, and (7 choose 3) = 35
men groups. So 10 × 35 = 350 groups in total.

51
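Python's math module has `comb` and `perm` for exactly these counts; a minimal check of the two examples above:

```python
from math import comb, perm

# Committee: choose 2 of 5 women and 3 of 7 men, then multiply the counts.
committees = comb(5, 2) * comb(7, 3)
print(committees)   # 350

# Horse race: picking 3 of 12 horses, order relevant vs. irrelevant.
print(perm(12, 3))  # 1320
print(comb(12, 3))  # 220
```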
Combinations: Examples

Example: Assume we have a set of n antennas of which m are
defective. All the defectives and all the functionals are
indistinguishable. How many linear orderings are there in which no
two defectives are consecutive?

If two defective antennas cannot be consecutive, then each space
among the n − m functional antennas can contain at most one
defective antenna. There are n − m + 1 possible positions, and we
select m of them for the defective antennas, so that is

(n − m + 1 choose m)

configurations, which is obviously equal to zero if m > n − m + 1.

52
Random variable
Random variable

So far, we talked about events and sample spaces.

However, for many experiments it is easier to use a summary


variable.

For example, we are taking an opinion poll among 50 students in


this class about how understandable the lectures are.

1 for understandable, 0 for not. The sample space has 2^50 elements!

However, the thing that matters most is the number of students


who think the class is understandable (or equivalently, not
understandable).

If we define a variable X to be that number, then it is a random


variable, takes range in {0, 1, · · · , 50}. Much easier to handle.

53
Random variable

A random variable is a function from the sample space Ω into the
real numbers.

Examples:
You toss a coin, is that head or tail?
You roll a die, what number do you get?
Number of heads in ten coin tosses.
Stock return of tomorrow.
Next month sales revenue for a grocery store.

54
Discrete and continuous random variable

A random variable is discrete if its range (all possible values) is


finite or infinite but countable.
X = sum of two rolls of a die, X ∈ {2, 3, · · · , 12}
X = number of heads in 100 coin tosses, X ∈ {0, 1, 2, · · · , 100}
X = number of coin tosses to get the first head, X ∈ {1, 2, · · · }

A random variable is continuous if it has uncountably infinitely many
possible values (or just think of its range as the real numbers, or a
subset of them).
X = temperature of tomorrow in Hong Kong, X ∈ [−10, 50]
X = stock return of tomorrow, X ∈ [−30%, 30%]
X = GDP of Hong Kong in 2021

55
Probability mass function

Earlier, we talked about probability laws, which assign probabilities to
events in a sample space. We can play the same game with random
variables.

X = number of heads in two fair coin tosses.

X can take values in {0, 1, 2}.


P(X = 0) = P({TT}) = 1/4

P(X = 1) = P({TH}) + P({HT}) = 1/2

P(X = 2) = P({HH}) = 1/4

56
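The PMF above can be obtained by enumerating the sample space; a minimal sketch:

```python
from collections import Counter
from itertools import product

# Enumerate the sample space of two fair coin tosses and count heads.
outcomes = list(product("HT", repeat=2))   # HH, HT, TH, TT
counts = Counter(seq.count("H") for seq in outcomes)
pmf = {x: counts[x] / len(outcomes) for x in sorted(counts)}
print(pmf)  # {0: 0.25, 1: 0.5, 2: 0.25}
```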
Probability mass function

The function P(X = x) is the probability mass function of a
discrete random variable X.

The probability mass function assigns probability to different potential
outcomes x of a discrete random variable X.

In general, the probability that a random variable X takes a value
x is written as p_X(x), P_X(X = x), or P_X(x).

A random variable is always written in upper case, and the numerical
values at which we evaluate the probability are written in lower case.

57
Summary

Monty Hall problem

Concept of independence

Law of total probability

Bayes Theorem and tree diagrams

Discrete random variable

58
