Mod02 Intro Probability
Probability
1
Events
2
Venn Diagrams
[Figures: a sequence of Venn diagrams for two events A and B, each shading a different region. Most set-operator symbols in the slide labels did not survive extraction; the recoverable label is $A - B = A \cap B^{c}$, the part of A that lies outside B.]
3-13
Relationships among events
14
Probability in discrete space
Probability Axioms:
$P(A) \ge 0$
$P(\Omega) = 1$, where $\Omega$ is the sample space
For Mutually Exclusive/Disjoint Events: $P(A \cup B) = P(A) + P(B)$
15
Probability in discrete space
Lemma:
$P(A) = 1 - P(A^{c})$
$P(A \cup B) = P(A) + P(B) - P(A \cap B)$
16
Probability in discrete space
Example 1:
42%
17
Events – class assignments
An insurance company offers four deductible levels, none (N), low (L), medium (M), and high (H), for its
homeowner’s policyholders, and three levels (L, M, and H) for its automobile policyholders. The table below
gives a random sample of policyholders. For an individual selected at random from this sample:
• What is the probability that the individual has a medium auto deductible and a high homeowner’s
deductible?
• What is the probability that the individual has a medium auto deductible?
• What is the probability that the individual has a high homeowner’s deductible?
• What is the probability that the individual is in the same category for both auto and homeowner’s
deductibles?
• What is the probability that the individual is in two different categories?
• What is the probability that the individual has a medium auto deductible given that he/she has a high
homeowner’s deductible?
• What is the probability that the individual has a high homeowner’s deductible given that he/she has a medium
auto deductible?
              Home
Auto     N     L     M     H
L       40    60    50    30
M       70   100   200   100
H       20    30   150   150
See the Excel file.
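The questions above can be worked directly from the counts in the table. The following is a minimal sketch, not the course's Excel solution; the names `counts`, `joint`, `marginal_auto`, and `marginal_home` are illustrative choices, and exact fractions are used so that no rounding is involved.

```python
from fractions import Fraction

# Counts copied from the table: rows are auto deductibles,
# columns are homeowner's deductibles in the order N, L, M, H.
HOME_LEVELS = ["N", "L", "M", "H"]
counts = {
    "L": [40, 60, 50, 30],
    "M": [70, 100, 200, 100],
    "H": [20, 30, 150, 150],
}
total = sum(sum(row) for row in counts.values())  # 1000 policyholders


def joint(auto, home):
    """P(Auto = auto and Home = home)."""
    return Fraction(counts[auto][HOME_LEVELS.index(home)], total)


def marginal_auto(auto):
    """P(Auto = auto), summing across the row."""
    return Fraction(sum(counts[auto]), total)


def marginal_home(home):
    """P(Home = home), summing down the column."""
    return sum(joint(auto, home) for auto in counts)


p_m_and_h = joint("M", "H")                           # medium auto and high home
p_auto_m = marginal_auto("M")                         # medium auto
p_home_h = marginal_home("H")                         # high home
p_same = sum(joint(level, level) for level in counts) # same category (L, M, or H)
p_diff = 1 - p_same                                   # two different categories
p_m_given_h = p_m_and_h / p_home_h                    # P(Auto = M | Home = H)
p_h_given_m = p_m_and_h / p_auto_m                    # P(Home = H | Auto = M)

for name, value in [
    ("P(M auto and H home)", p_m_and_h),
    ("P(M auto)", p_auto_m),
    ("P(H home)", p_home_h),
    ("P(same category)", p_same),
    ("P(different categories)", p_diff),
    ("P(M auto | H home)", p_m_given_h),
    ("P(H home | M auto)", p_h_given_m),
]:
    print(f"{name} = {value} = {float(value):.4f}")
```

The last two lines of output differ even though both use the same joint count, because each conditional probability divides by a different marginal.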
18
Conditional Probability
$P(A \mid B) = \dfrac{P(A \cap B)}{P(B)}$
19
Events – class assignments
Another insurance company offers four deductible levels, none (N), low (L), medium (M), and high (H), for
its homeowner’s policyholders, and three levels (L, M, and H) for its automobile policyholders. The table below
gives a random sample of policyholders.
              Home
Auto     N     L     M     H
L       20    40    80    60
M       50   100   200   150
H       30    60   120    90
See the Excel file.
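One natural question for this second table is whether the auto and homeowner's deductibles are independent, that is, whether every joint probability equals the product of its two marginals. The sketch below checks that cell by cell; the function name `is_independent` and the dictionary layout are assumptions made for illustration, not part of the course material.

```python
from fractions import Fraction


def is_independent(counts):
    """Return True if every cell satisfies P(row and col) = P(row) * P(col).

    `counts` maps a row label to a list of cell counts (one per column).
    Exact fractions avoid false negatives from floating-point rounding.
    """
    total = sum(sum(row) for row in counts.values())
    n_cols = len(next(iter(counts.values())))
    row_p = {label: Fraction(sum(row), total) for label, row in counts.items()}
    col_p = [
        Fraction(sum(row[j] for row in counts.values()), total)
        for j in range(n_cols)
    ]
    return all(
        Fraction(row[j], total) == row_p[label] * col_p[j]
        for label, row in counts.items()
        for j in range(n_cols)
    )


# Counts copied from the table above (columns: Home N, L, M, H).
second_company = {
    "L": [20, 40, 80, 60],
    "M": [50, 100, 200, 150],
    "H": [30, 60, 120, 90],
}
print(is_independent(second_company))
```

For these counts the check succeeds: each row is a constant multiple of the column totals, so knowing the auto deductible does not change the distribution of the homeowner's deductible.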
20
Independent Events
21
Conditionally Independent Events
See the Excel file.
22
Example
               Home
Auto       N     L     M     H   Total
L         4%    6%    5%    3%     18%
M         7%   10%   20%   10%     47%
H         2%    3%   15%   15%     35%
Total    13%   19%   40%   28%    100%
• What is P(Auto = H | Home = H)?
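As a worked step, the conditional probability can be read off the table by dividing the joint entry by the column total for Home = H. This is a sketch of the arithmetic using only the percentages shown above:

```latex
P(\text{Auto}=H \mid \text{Home}=H)
  = \frac{P(\text{Auto}=H \cap \text{Home}=H)}{P(\text{Home}=H)}
  = \frac{0.15}{0.28}
  = \frac{15}{28}
  \approx 0.536
```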
23
Example
24
Example
                    Die 2
            1    2    3    4    5    6
        1   2    3    4    5    6    7
        2   3    4    5    6    7    8
Die 1   3   4    5    6    7    8    9
        4   5    6    7    8    9   10
        5   6    7    8    9   10   11
        6   7    8    9   10   11   12
See the Excel file.
25
Law of Total Probability
$P(B_1 \cup B_2 \cup \dots \cup B_n) = P(B_1) + P(B_2) + \dots + P(B_n)$ for mutually exclusive $B_1, \dots, B_n$
26
Law of Total Probability:
$P(B_1 \cup B_2 \cup B_3 \cup \dots) = P(B_1) + P(B_2) + P(B_3) + \cdots$
[Figure: the sample space partitioned into cells B1 through B20, with event A drawn inside the partition.]
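The additivity rule for a partition leads to the usual statement of the law of total probability. The block below is the standard textbook form, written as a sketch in the notation of these slides; it assumes the $B_i$ are mutually exclusive, cover the whole sample space, and each have $P(B_i) > 0$.

```latex
P(A) = \sum_{i} P(A \cap B_i) = \sum_{i} P(A \mid B_i)\, P(B_i)
```

The first equality holds because the pieces $A \cap B_i$ are disjoint and together make up $A$; the second applies the definition of conditional probability to each piece.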
27
Game Show
You are the finalist in a game show, and you have the option of choosing one of the
three briefcases. One of the briefcases contains a $1 million prize, and the other
two are empty. To make the show more exciting, after you choose a briefcase, the
host opens one of the remaining two briefcases, shows you that it is empty, and
gives you the option of either keeping your original briefcase, or switching to the
remaining briefcase. Should you switch?
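The question can be settled by enumerating every case exactly. The sketch below assumes the usual reading of the problem, which the slide does not spell out: the prize is equally likely to be in any briefcase, the host knows where it is, and the host always opens an empty briefcase other than yours (choosing at random when two are available). The function name `game_show_probabilities` is illustrative.

```python
from fractions import Fraction
from itertools import product


def game_show_probabilities():
    """Return (P(win if you stay), P(win if you switch)) by exact enumeration."""
    cases = range(3)
    p_stay = Fraction(0)
    p_switch = Fraction(0)
    for prize, choice in product(cases, repeat=2):
        p_outcome = Fraction(1, 9)  # prize and first choice are each uniform
        # Assumption: the host only opens an empty briefcase you did not pick.
        openable = [c for c in cases if c not in (prize, choice)]
        for opened in openable:
            p_branch = p_outcome / len(openable)
            # The single briefcase that is neither yours nor the opened one.
            remaining = next(c for c in cases if c not in (choice, opened))
            if choice == prize:
                p_stay += p_branch
            if remaining == prize:
                p_switch += p_branch
    return p_stay, p_switch


p_stay, p_switch = game_show_probabilities()
print(f"stay: {p_stay}, switch: {p_switch}")
```

Under these assumptions switching wins twice as often as staying, because the first pick is wrong two times out of three and the host's reveal then leaves the prize in the other closed briefcase.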
28
Bayes’ Formula
$P(A \cap B_1) = P(A \mid B_1)\, P(B_1)$
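Dividing this product rule by $P(A)$, and expanding $P(A)$ over a partition $B_1, B_2, \dots$ of the sample space, gives Bayes' formula in its standard form. The block is a sketch of that textbook derivation rather than a reproduction of the slide, and it assumes $P(A) > 0$.

```latex
P(B_1 \mid A)
  = \frac{P(A \cap B_1)}{P(A)}
  = \frac{P(A \mid B_1)\, P(B_1)}{\sum_{i} P(A \mid B_i)\, P(B_i)}
```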
29
Bayes’ Formula
30
Medical Testing
A new and very accurate test has been developed for the detection of a disease (e.g., cancer). The test is 99.9
percent accurate, with an error rate of 0.1% for both types of error. In other words:
• Out of 1,000 sick patients, the test misses only 1 patient, and
• It produces only 1 false positive for every 1,000 healthy individuals.
The prevalence of the disease in the general population is about 1,000 per million. Given a positive test
result, what is the probability that the individual is in fact sick?
A second test of the same patient is also positive. Now what is the probability that the individual is in fact sick?
31
Medical Testing
                  Sick    Healthy      Total
Positive           999        999      1,998
Not Positive         1    998,001    998,002
Total            1,000    999,000  1,000,000
P(sick | positive) = 999/1,998 = 50%
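The same 50% follows from Bayes' formula applied to the rates directly, and the posterior after the first test can serve as the prior for the second. The sketch below makes one assumption the slide does not state: the two test results are conditionally independent given the patient's true status. The function name `posterior_sick` is illustrative.

```python
from fractions import Fraction


def posterior_sick(prior, sensitivity, false_positive_rate):
    """P(sick | positive test) by Bayes' formula."""
    true_positive = sensitivity * prior
    false_positive = false_positive_rate * (1 - prior)
    return true_positive / (true_positive + false_positive)


prevalence = Fraction(1_000, 1_000_000)   # 1,000 per million
sensitivity = Fraction(999, 1_000)        # misses 1 of 1,000 sick patients
false_positive_rate = Fraction(1, 1_000)  # 1 false positive per 1,000 healthy

after_first = posterior_sick(prevalence, sensitivity, false_positive_rate)
# Assumption: the second test is independent of the first given true status,
# so the first posterior becomes the prior for the second update.
after_second = posterior_sick(after_first, sensitivity, false_positive_rate)

print(f"after one positive test:  {after_first} = {float(after_first):.1%}")
print(f"after two positive tests: {after_second} = {float(after_second):.1%}")
```

The first result is only 50% because the disease is as rare as the test's false positives; the second positive result moves the probability to 99.9% under the independence assumption.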
32
Medical Testing
33
Document classification
A developer claims that her app can distinguish AI-generated documents from
human-generated ones. To assess its performance, we submitted 1,000
AI-generated and 1,000 human-generated documents to the app.
• The app misclassified 60 human-generated documents as AI-generated
• and 50 AI-generated documents as human-generated.
Create a probability table based on this information and answer the following
questions:
1. What is the probability that a randomly selected document is classified
correctly (accuracy)?
2. Given that a document is predicted to be AI-generated, what is the probability
that it is truly AI-generated (precision)?
3. Given that a document is truly AI-generated, what is the probability that
the app classifies it correctly (recall)?
34
Results of document classification
AI-Generated (Positive)    Human-Generated (Negative)    Total
35
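Confusion Matrix, Accuracy, Precision, Recall, and F1
$\text{Accuracy} = \dfrac{TP + TN}{TP + FP + FN + TN}$
$\text{Precision} = \dfrac{TP}{TP + FP}$
$\text{Recall} = \dfrac{TP}{TP + FN}$
$F_1 = \dfrac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$
                     Actual Positive   Actual Negative
Predicted Positive         TP                FP
Predicted Negative         FN                TN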
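These four formulas can be applied to the document classification exercise, treating AI-generated as the positive class. The sketch below derives the four cells from the stated counts (1,000 documents of each kind, 60 human documents flagged as AI, 50 AI documents flagged as human); the function name `classification_metrics` is illustrative.

```python
from fractions import Fraction


def classification_metrics(tp, fp, fn, tn):
    """Return accuracy, precision, recall, and F1 as exact fractions."""
    accuracy = Fraction(tp + tn, tp + fp + fn + tn)
    precision = Fraction(tp, tp + fp)
    recall = Fraction(tp, tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1


# Cells derived from the exercise, with AI-generated as the positive class.
fn = 50          # AI-generated documents classified as human-generated
fp = 60          # human-generated documents classified as AI-generated
tp = 1000 - fn   # AI-generated documents classified correctly
tn = 1000 - fp   # human-generated documents classified correctly

accuracy, precision, recall, f1 = classification_metrics(tp, fp, fn, tn)
for name, value in [("accuracy", accuracy), ("precision", precision),
                    ("recall", recall), ("F1", f1)]:
    print(f"{name}: {value} = {float(value):.4f}")
```

Precision conditions on the prediction (a row of the confusion matrix) while recall conditions on the truth (a column), which is why the two differ even though both have TP in the numerator.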
36
Expectation of a function of a random variable
$E[f(X)] = \sum_{x} f(x)\, p(x)$
37
Expectation of a random variable
$\mu = E[X] = \sum_{x} x\, p(x)$
$E[a] = a$; $E[aX] = a\,E[X]$
38
Variance of a random variable
$\sigma^{2} = \sum_{x} (x - \mu)^{2}\, p(x)$
$\mathrm{Var}(a) = 0$; $\mathrm{Var}(aX) = a^{2}\,\mathrm{Var}(X)$
39
Example
40
Example
Std = sqrt(2.92) ≈ 1.71
41
Example
42
Example
Mean 7
Var 5.83
Std 2.42
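These values match the sum of two fair dice, which is an assumption here since the slide's own working did not survive extraction. The sketch below applies the expectation and variance definitions to an explicit probability mass function; the helper name `mean_and_variance` is illustrative.

```python
from fractions import Fraction
from itertools import product
from math import sqrt


def mean_and_variance(pmf):
    """Mean and variance of a discrete random variable given as {value: probability}."""
    mean = sum(x * p for x, p in pmf.items())
    variance = sum((x - mean) ** 2 * p for x, p in pmf.items())
    return mean, variance


# One fair die: each face has probability 1/6.
one_die = {face: Fraction(1, 6) for face in range(1, 7)}

# Sum of two fair dice: each of the 36 ordered pairs has probability 1/36.
two_dice = {}
for d1, d2 in product(range(1, 7), repeat=2):
    two_dice[d1 + d2] = two_dice.get(d1 + d2, Fraction(0)) + Fraction(1, 36)

mean1, var1 = mean_and_variance(one_die)
mean2, var2 = mean_and_variance(two_dice)

print(f"one die:  mean {float(mean1):.2f}, var {float(var1):.2f}, std {sqrt(var1):.2f}")
print(f"two dice: mean {float(mean2):.2f}, var {float(var2):.2f}, std {sqrt(var2):.2f}")
```

The variance of the sum is exactly twice the variance of one die because the two dice are independent, while the standard deviation grows only by a factor of sqrt(2).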
43
Normal Distribution
44
Standard Normal Distribution
45
Z-Score
46
Example
C=66
47