Machine Learning
CSN-382 (Lecture 8)
Dr. R. Balasubramanian
Professor
Department of Computer Science and Engineering
Mehta Family School of Data Science and Artificial Intelligence
Indian Institute of Technology Roorkee
Roorkee 247 667
[email protected]
https://faculty.iitr.ac.in/cs/bala/
EVENTS OF PROBABILITY
► A probability event can be defined as a set of outcomes of an experiment.
► In other words, an event in probability is a subset of the respective sample space.
► The set of all possible outcomes of a random experiment is called the sample space of that experiment.
► The likelihood of occurrence of an event is known as probability. The probability of occurrence of any event lies between 0 and 1.
TYPES OF EVENTS
Impossible and Sure Events
► If the probability of occurrence of an event is 0, such an
event is called an impossible event and if the probability
of occurrence of an event is 1, it is called a sure event. In
other words, the empty set ϕ is an impossible event and
the sample space S is a sure event.
Simple Events
► Any event consisting of a single point of the sample
space is known as a simple event in probability. For
example, if S = {56, 78, 96, 54, 89} and E = {78}, then E
is a simple event.
Compound Events
► Contrary to the simple event, if any event consists of more than
one single point of the sample space then such an event is called
a compound event.
► Considering the same example again, if S = {56, 78, 96, 54, 89}, E1 = {56, 54}, E2 = {78, 56, 89}, then E1 and E2 represent two compound events.
Independent Events and Dependent Events
► If the occurrence of an event is completely unaffected by the occurrence of any other event, such events are known as independent events in probability; events which are affected by other events are known as dependent events.
Mutually Exclusive Events
► If the occurrence of one event excludes the occurrence of another, the events are mutually exclusive, i.e., the two events have no common outcome.
► For example, if S = {1, 2, 3, 4, 5, 6} and E1, E2 are two events such that E1 consists of the numbers less than 3 and E2 consists of the numbers greater than 4, then E1 = {1, 2} and E2 = {5, 6}, so E1 and E2 are mutually exclusive.
Exhaustive Events
► A set of events is called exhaustive if the events together cover the entire sample space.
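These definitions can be checked directly with Python sets; a minimal sketch using the die example above:

```python
# Die sample space and the two events from the example above.
S = {1, 2, 3, 4, 5, 6}
E1 = {x for x in S if x < 3}  # {1, 2}
E2 = {x for x in S if x > 4}  # {5, 6}

# Mutually exclusive: the intersection is empty.
mutually_exclusive = (E1 & E2 == set())

# E1 and E2 alone are not exhaustive; adding E3 = {3, 4} covers all of S.
E3 = {3, 4}
exhaustive = (E1 | E2 | E3 == S)
```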
Complementary Events
► For any event E1 there exists another event E1′, the complement, which consists of the remaining elements of the sample space S:
E1′ = S − E1
► If a die is rolled, the sample space S is given as S = {1, 2, 3, 4, 5, 6}. If event E1 represents all the outcomes which are greater than 4, then E1 = {5, 6} and E1′ = {1, 2, 3, 4}. Thus E1′ is the complement of the event E1.
► Similarly, the complements of E1, E2, E3, …, En are represented as E1′, E2′, E3′, …, En′.
Events Associated with “AND”
► If two events E1 and E2 are associated with AND, the result is the intersection: the set of outcomes common to both events. The intersection symbol (∩) is used to represent AND in probability.
► Thus, the event E1 ∩ E2 denotes E1 and E2.
Event E1 but not E2
► It represents the difference of the two events: event E1 but not E2 consists of all the outcomes that are present in E1 but not in E2. Thus, the event E1 but not E2 is represented as
► E1 but not E2 = E1 − E2
AXIOMS OF PROBABILITY
► Axiom 1: For any event A, P(A)≥0.
► Axiom 2: Probability of the sample space S is P(S)=1.
► Axiom 3: If A1, A2, A3, ⋯ are disjoint events, then
P(A1 ∪ A2 ∪ A3 ∪ ⋯) = P(A1) + P(A2) + P(A3) + ⋯
► For any event A, P(Ac)=1−P(A).
► The probability of the empty set is zero, i.e., P(∅)=0.
► For any event A, P(A)≤1.
► P(A−B)=P(A)−P(A∩B).
► P(A∪B) = P(A) + P(B) − P(A∩B) (inclusion-exclusion principle for n = 2).
► If A ⊂ B, then P(A) ≤ P(B).
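As a sanity check, the properties above can be verified in Python for the classical (equally likely) model on a die; the counting-based `P` here is an illustrative helper, not a library function:

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}  # fair-die sample space

def P(event):
    """Classical probability: favourable outcomes over total outcomes."""
    return Fraction(len(event & S), len(S))

A = {1, 2, 3}
B = {3, 4}

# Properties that follow from the three axioms:
assert P(set()) == 0                       # P(empty set) = 0
assert P(S) == 1                           # P(S) = 1
assert P(S - A) == 1 - P(A)                # complement rule
assert P(A - B) == P(A) - P(A & B)         # difference rule
assert P(A | B) == P(A) + P(B) - P(A & B)  # inclusion-exclusion (A | B is set union)
assert P(A) <= P(A | B)                    # monotonicity, since A is a subset of A∪B
```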
Problem 1:
In a presidential election, there are four candidates. Call
them A, B, C, and D. Based on our polling analysis, we
estimate that A has a 20 percent chance of winning the
election, while B has a 40 percent chance of winning.
What is the probability that A or B wins the election?
Solution:
Notice that the events {A wins}, {B wins}, {C wins}, and {D wins} are disjoint, since no two of them can occur at the same time.
For example, if A wins, then B cannot win. From the third
axiom of probability, the probability of the union of two
disjoint events is the summation of individual probabilities.
Therefore,
P(A wins or B wins)=P({A wins}∪{B wins})
=P({A wins})+P({B wins})
=0.2+0.4
=0.6
Problem 2:
Suppose we have the following information:
► There is a 60 percent chance that it will rain today.
► There is a 50 percent chance that it will rain tomorrow.
► There is a 30 percent chance that it will not rain on either day.
Find the following probabilities:
1. The probability that it will rain today or tomorrow.
2. The probability that it will rain today and tomorrow.
3. The probability that it will rain today but not tomorrow.
4. The probability that it either will rain today or tomorrow, but
not both.
Solution:
Let us translate the statements into probability language: define A as the event that it will rain today and B as the event that it will rain tomorrow. Then
P(A) = 0.6,
P(B) = 0.5,
P(Ac ∩ Bc) = 0.3.
1. The probability that it will rain today or tomorrow: this is P(A∪B).
P(A∪B) = 1 − P((A∪B)c)
= 1 − P(Ac ∩ Bc)
= 1 − 0.3
= 0.7
2. The probability that it will rain today and tomorrow: this is P(A∩B).
P(A∩B) = P(A)+P(B)−P(A∪B)
= 0.6+0.5−0.7
= 0.4
3. The probability that it will rain today but not tomorrow: this is P(A ∩ Bc).
P(A∩Bc ) = P(A−B)
= P(A)−P(A∩B)
= 0.6−0.4
= 0.2
4. The probability that it will rain either today or tomorrow, but not both: this is P(A−B) + P(B−A).
P(B−A) = P(B)−P(B∩A)
= 0.5−0.4
= 0.1
Thus, P(A−B)+P(B−A) = 0.2+0.1 = 0.3
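The four answers can be reproduced numerically; a small sketch plugging the given numbers into the identities above (variable names are illustrative):

```python
# Given information, as probabilities.
P_A = 0.6        # rain today
P_B = 0.5        # rain tomorrow
P_neither = 0.3  # no rain on either day, P(Ac ∩ Bc)

P_A_or_B = 1 - P_neither                       # 1) P(A ∪ B)
P_A_and_B = P_A + P_B - P_A_or_B               # 2) P(A ∩ B)
P_A_not_B = P_A - P_A_and_B                    # 3) P(A ∩ Bc)
P_exactly_one = P_A_not_B + (P_B - P_A_and_B)  # 4) P(A−B) + P(B−A)
```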
Inclusion-exclusion principle:
► P(A∪B)=P(A)+P(B)−P(A∩B).
► P(A∪B∪C) = P(A) + P(B) + P(C) − P(A∩B) − P(A∩C) − P(B∩C) + P(A∩B∩C).
SET THEORY FOR PROBABILITY
► Understanding the basics of set theory is a prerequisite
for studying probability.
► This lecture presents a concise introduction to
1. Set membership
2. Inclusion
3. Unions
4. Intersections
5. Complements
► These are all concepts that are frequently used in the
calculus of probabilities.
Conditional Probability
Example: I roll a fair die. Let A be the event that the outcome is an
odd number, i.e., A={1,3,5}. Also let B be the event that the outcome
is less than or equal to 3, i.e., B={1,2,3}. What is the probability of
getting A, P(A)? What is the probability of getting A given B, P(A|B)?
Solution: This is a finite sample space, so
P(A) = |A| / |S| = |{1,3,5}| / 6 = 1/2.
Now, let's find the conditional probability of A given that B occurred.
If we know B has occurred, the outcome must be among {1,2,3}. For
A to also happen the outcome must be in A∩B={1,3}. Since all die
rolls are equally likely, we argue that P(A|B) must be equal to
P(A|B) = |A∩B| / |B| = 2/3.
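Since all outcomes are equally likely, both probabilities reduce to counting; a minimal sketch:

```python
from fractions import Fraction

S = set(range(1, 7))  # die outcomes
A = {1, 3, 5}         # odd outcome
B = {1, 2, 3}         # outcome <= 3

P_A = Fraction(len(A), len(S))              # |A| / |S| = 1/2
P_A_given_B = Fraction(len(A & B), len(B))  # |A∩B| / |B| = 2/3
```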
Definition:
P(A|B) = P(A∩B) / P(B), when P(B) > 0.
Note that the conditional probability P(A|B) is undefined when P(B) = 0. That is fine: if P(B) = 0, the event B never occurs, so it does not make sense to talk about the probability of A given B.
Properties:
For three events, A, B, and C, with P(C) > 0, we have
► P(Ac |C)=1−P(A|C);
► P(∅|C)=0;
► P(A|C)≤1;
► P(A−B|C)=P(A|C)−P(A∩B|C);
► P(A∪B|C)=P(A|C)+P(B|C)−P(A∩B|C);
► if A⊂B then P(A|C)≤P(B|C).
Special Cases of Conditional Probability
Case 1:
When A and B are disjoint: in this case A∩B = ∅, so
P(A|B) = P(A∩B) / P(B) = P(∅) / P(B) = 0.
Case 2:
When B is a subset of A: if B ⊂ A, then whenever B happens, A also happens. Thus, given that B occurred, we expect the probability of A to be one. In this case A∩B = B, so
P(A|B) = P(A∩B) / P(B) = P(B) / P(B) = 1.
Case 3:
When A is a subset of B: in this case A∩B = A, so
P(A|B) = P(A∩B) / P(B) = P(A) / P(B).
Problem: I roll a fair die twice and obtain two numbers, X1 = result of the first roll and X2 = result of the second roll. Given that X1 + X2 = 7, what is the probability that X1 = 4 or X2 = 4?
Solution: Let A be the event that X1 = 4 or X2 = 4, and let B be the event that X1 + X2 = 7. We are interested in P(A|B), so we can use
P(A|B) = P(A∩B) / P(B)
We note that
A = {(4,1), (4,2), (4,3), (4,4), (4,5), (4,6), (1,4), (2,4), (3,4), (5,4), (6,4)},
B = {(6,1), (5,2), (4,3), (3,4), (2,5), (1,6)},
A∩B = {(4,3), (3,4)}.
We conclude
P(A|B) = P(A∩B) / P(B) = (2/36) / (6/36) = 1/3
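The same answer can be obtained by enumerating the 36 equally likely ordered pairs; a small sketch:

```python
from fractions import Fraction
from itertools import product

S = list(product(range(1, 7), repeat=2))      # 36 equally likely ordered pairs
B = [p for p in S if sum(p) == 7]             # X1 + X2 = 7
A_and_B = [p for p in B if 4 in p]            # a 4 appears: (4,3) and (3,4)

P_A_given_B = Fraction(len(A_and_B), len(B))  # 1/3
```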
EXPECTATION AND VARIANCE
EXPECTATION
Definition: The expectation of a discrete random variable (RV) X is the real number
E(X) = Σ x · P(X = x), summed over all values x of X.
Example: The expectation of a random variable expressing the result of throwing a fair die is
E(X) = (1 + 2 + 3 + 4 + 5 + 6) · (1/6) = 7/2 = 3.5,
the long-run average of the results over many independent throws.
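A quick check of the die expectation in Python, using exact arithmetic via `fractions`:

```python
from fractions import Fraction

# E(X) = sum of x * P(X = x) over all outcomes of a fair die.
E_X = sum(x * Fraction(1, 6) for x in range(1, 7))  # 7/2
```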
VARIANCE
► Define the squared deviation from the mean, Y = (X − E(X))². The variance of X is its expectation:
Var(X) = E((X − E(X))²)
► Var(b) = E((b − E(b))²) = E(0) = 0, where b is a constant.
► The standard deviation of an RV X is std(X) = √Var(X).
Proposition:
Var(X) = E(X²) − (E(X))²
Proof:
Var(X) = E((X − E(X))²)
= E(X² − 2X·E(X) + (E(X))²)
= E(X²) − 2E(X)·E(X) + (E(X))²
= E(X²) − (E(X))²
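The proposition can be verified numerically for a fair die; a minimal sketch comparing the definition with the shortcut formula:

```python
from fractions import Fraction

outcomes = range(1, 7)
p = Fraction(1, 6)  # fair die: each outcome has probability 1/6

E_X = sum(x * p for x in outcomes)                      # 7/2
E_X2 = sum(x * x * p for x in outcomes)                 # 91/6
var_direct = sum((x - E_X) ** 2 * p for x in outcomes)  # definition
var_formula = E_X2 - E_X ** 2                           # shortcut, 35/12
```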
Assignment 1:
For any constants a,b∈R, prove that
(i) Var(aX + b) = a²Var(X)
(ii) Var(X + b) = Var(X)
Assignment 2: Consider the classical probability model on (a,b) and its
associated random variable X.
Ans: E(X) = (a + b)/2, Var(X) = (b − a)²/12
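A numerical cross-check of these formulas, assuming the classical model means the uniform distribution on (a, b); the midpoint Riemann sum below is just one way to approximate the integrals:

```python
# Uniform density on (a, b); approximate E(X) and E(X^2) by a midpoint sum.
a, b = 2.0, 5.0
n = 100_000
dx = (b - a) / n
f = 1.0 / (b - a)  # uniform density

xs = [a + (i + 0.5) * dx for i in range(n)]  # interval midpoints
E_X = sum(x * f * dx for x in xs)            # should approach (a + b)/2 = 3.5
E_X2 = sum(x * x * f * dx for x in xs)
Var_X = E_X2 - E_X ** 2                      # should approach (b - a)^2/12 = 0.75
```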
Probability basics using Python
https://colab.research.google.com/drive/1cCUfPUJlko7vH8rC1-
vTtrcMuNcJ35ea?usp=sharing
Bayes Theorem, Conditional Probability, and Naive Bayes
Bayes and Connection with Conditional Probability
• Bayes' theorem describes the probability of an event based on knowledge of related preceding events; it can be viewed as an extension of conditional probability.
• Using conditional probability, Bayes' theorem provides a way to calculate the posterior probability P(A|B) from P(A), P(B) and P(B|A):
P(A|B) = P(B|A) · P(A) / P(B), known as Bayes' theorem.
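A minimal sketch of the formula as a Python function; the die-based check is an illustrative example, not from the slides:

```python
# Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B).
def bayes(p_b_given_a, p_a, p_b):
    return p_b_given_a * p_a / p_b

# Illustrative die check: A = even outcome, B = outcome <= 4.
# P(B|A) = 2/3, P(A) = 1/2, P(B) = 4/6, so P(A|B) should be 2/4 = 1/2.
posterior = bayes(2/3, 1/2, 4/6)
```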
Bayes Theorem Example
Draw one card from a standard 52-card deck.
• Prior probabilities:
○ P(A): probability that the card is Red (event A) => 26/52
○ P(B): probability that the card is a King (event B) => 4/52
• P(B|A): conditional probability that the card is a King (B) given it is Red (A)
○ => 2/26
• P(A|B): posterior probability that the card is Red (A) given it is a King (B)
○ => P(B|A) · P(A) / P(B)
○ => (2/26) · (26/52) / (4/52)
○ = 2/4 = 1/2
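The card computation can be checked with exact arithmetic:

```python
from fractions import Fraction

P_A = Fraction(26, 52)         # red card
P_B = Fraction(4, 52)          # king
P_B_given_A = Fraction(2, 26)  # king given red (2 red kings among 26 red cards)

P_A_given_B = P_B_given_A * P_A / P_B  # Bayes' theorem, gives 1/2
```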
Bayes Theorem for multiple features
● The Bayes rule computes the probability of class A for given B feature.
● In real life problems, the target class A depends on multiple B variables. So,
the formula of bayes rule can be extended for multiple input features like,
P(A | B1, B2, B3, …, Bn) = P(B1, B2, B3, …, Bn | A) · P(A) / P(B1, B2, B3, …, Bn)
● Under the naïve assumption that the input features are independent,
P(A | B1, B2, B3, …, Bn) = P(B1|A) · P(B2|A) ⋯ P(Bn|A) · P(A) / (P(B1) · P(B2) · P(B3) ⋯ P(Bn))
● Note: the Bayesian classifier works with the maximum a posteriori (MAP) decision rule.
Naive Bayes
● The Bayes rule computes the probability of class Y given the X features. In real-life problems the target class Y depends on multiple X variables, so the formula of the Bayes rule is extended to multiple input features.
Steps of a Naive Bayes Classifier
Business Problem - label the email as spam or ham
Step 2: Obtain the likelihoods
The probability that the word Good appears in a Ham email, i.e. P(Good | Ham), is 10/40 = 0.25. Similarly we can fill in the entire table:

Word      | Spam count | Ham count | P(word | Spam) | P(word | Ham)
Good      | 2          | 10        | 2/50 = 0.04    | 10/40 = 0.25
Lonely    | 2          | 1         | 2/50 = 0.04    | 1/40 = 0.025
Horoscope | 20         | 5         | 20/50 = 0.4    | 5/40 = 0.125
Work      | 5          | 12        | 5/50 = 0.1     | 12/40 = 0.30
Snacks    | 0          | 5         | 0/50 = 0       | 5/40 = 0.125
Money     | 21         | 7         | 21/50 = 0.42   | 7/40 = 0.175
Total     | 50         | 40        |                |
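The likelihood columns can be generated from the raw counts; a small sketch with the table's numbers:

```python
from fractions import Fraction

# (spam count, ham count) per word, as in the table above.
counts = {
    "Good": (2, 10), "Lonely": (2, 1), "Horoscope": (20, 5),
    "Work": (5, 12), "Snacks": (0, 5), "Money": (21, 7),
}
total_spam, total_ham = 50, 40

# P(word | Spam) and P(word | Ham) for every word.
likelihoods = {
    w: (Fraction(s, total_spam), Fraction(h, total_ham))
    for w, (s, h) in counts.items()
}
```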
● Compute the posterior probabilities for each of the class labels, Spam or Ham.
For Spam:
P(Spam | Good, Work) ∝ P(Spam) · P(Good | Spam) · P(Work | Spam) = (0.55)(0.04)(0.1) = 0.0022
For Ham:
P(Ham | Good, Work) ∝ P(Ham) · P(Good | Ham) · P(Work | Ham) = (0.45)(0.25)(0.30) = 0.03375
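The comparison above can be sketched as a tiny Python classifier; the priors (0.55, 0.45) and likelihoods are taken from the slides:

```python
# Priors and likelihoods as given on the slides.
priors = {"Spam": 0.55, "Ham": 0.45}
likelihood = {
    "Spam": {"Good": 0.04, "Work": 0.10},
    "Ham":  {"Good": 0.25, "Work": 0.30},
}

def score(cls, words):
    """Unnormalized posterior: P(cls) times the product of P(word | cls)."""
    s = priors[cls]
    for w in words:
        s *= likelihood[cls][w]
    return s

# MAP decision: pick the class with the largest unnormalized posterior.
scores = {c: score(c, ["Good", "Work"]) for c in priors}
label = max(scores, key=scores.get)
```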
Since 0.03375 > 0.0022, we assign the class label Ham to the instance "Good Work".
Types of Naive Bayes Classifier
Thank You!