Lect08 CSN382

Uploaded by

Himanshu Saxena
Copyright © All Rights Reserved

INDIAN INSTITUTE OF TECHNOLOGY ROORKEE

Machine Learning
CSN-382 (Lecture 8)
Dr. R. Balasubramanian
Professor
Department of Computer Science and Engineering
Mehta Family School of Data Science and Artificial Intelligence
Indian Institute of Technology Roorkee
Roorkee 247 667
[email protected]
https://faculty.iitr.ac.in/cs/bala/
EVENTS IN PROBABILITY
►A probability event can be defined as a set of outcomes
of an experiment.
► In other words, an event in probability is the subset of
the respective sample space.
► The set of all possible outcomes of a random
experiment is called the sample space of that
experiment.
► The likelihood of occurrence of an event is known
as probability. The probability of occurrence of any
event lies between 0 and 1.

2
EVENTS IN PROBABILITY

► The sample space for tossing three coins simultaneously is given by:
S = {(T,T,T), (T,T,H), (T,H,T), (T,H,H), (H,T,T), (H,T,H), (H,H,T), (H,H,H)}

► Suppose we want only the outcomes which have at least two heads; the set of
all such possibilities is:
E = {(H,T,H), (H,H,T), (H,H,H), (T,H,H)}
Thus, an event is a subset of the sample space, i.e., E is a subset of S.

► For an event E to occur, the outcome of the experiment must be
an element of the set E.
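As a quick illustration (not from the slides), the three-coin sample space and the at-least-two-heads event can be enumerated directly in Python:

```python
from itertools import product

# Sample space for tossing three coins: all ordered triples of H/T.
S = set(product("HT", repeat=3))

# Event E: outcomes with at least two heads.
E = {outcome for outcome in S if outcome.count("H") >= 2}

print(len(S))   # 8 outcomes in the sample space
print(len(E))   # 4 outcomes with at least two heads
print(E <= S)   # an event is a subset of the sample space: True
```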

3
TYPES OF EVENTS

 Some of the important types of events are:
► Impossible and Sure Events
► Simple Events
► Compound Events
► Independent and Dependent Events
► Mutually Exclusive Events
► Exhaustive Events
► Complementary Events
► Events Associated with “OR”
► Events Associated with “AND”
► Event E1 but not E2

4
TYPES OF EVENTS
Impossible and Sure Events
► If the probability of occurrence of an event is 0, such an
event is called an impossible event and if the probability
of occurrence of an event is 1, it is called a sure event. In
other words, the empty set ∅ is an impossible event and
the sample space S is a sure event.
Simple Events
► Any event consisting of a single point of the sample
space is known as a simple event in probability. For
example, if S = {56 , 78 , 96 , 54 , 89} and E = {78} then E
is a simple event.

5
TYPES OF EVENTS

Compound Events
► Contrary to the simple event, if any event consists of more than
one single point of the sample space then such an event is called
a compound event.
► Considering the same example again, if S = {56 ,78 ,96 ,54 ,89},
E1 = {56 ,54 }, E2 = {78 ,56 ,89 } then, E1 and E2 represent two
compound events.
Independent Events and Dependent Events
► If the occurrence of any event is completely unaffected by the
occurrence of any other event, such events are known as
independent events; events which are affected by other
events are known as dependent events.

6
TYPES OF EVENTS
Mutually Exclusive Events
► If the occurrence of one event excludes the occurrence
of another event, such events are mutually exclusive
events, i.e., the two events have no common outcome.
► For example, if S = {1 , 2 , 3 , 4 , 5 , 6} and E1, E2 are two
events such that E1 consists of numbers less than 3 and
E2 consists of numbers greater than 4. So, E1 = {1,2} and
E2 = {5,6} . Then, E1 and E2 are mutually exclusive.
Exhaustive Events
► A set of events is called exhaustive if all the events
together cover the entire sample space.
7
TYPES OF EVENTS
Complementary Events
► For any event E1 there exists another event E1′, the
complement of E1, which consists of the remaining
elements of the sample space S:
E1′ = S − E1
► If a die is rolled, the sample space is
S = {1, 2, 3, 4, 5, 6}. If event E1 represents all the
outcomes greater than 4, then E1 = {5, 6} and
E1′ = {1, 2, 3, 4}. Thus E1′ is the complement of the event E1.
► Similarly, the complements of E1, E2, E3, …, En are
represented as E1′, E2′, E3′, …, En′.
8
TYPES OF EVENTS

Events Associated with “OR”


► If two events E1 and E2 are associated with OR, the combined
event means that E1 occurs, or E2 occurs, or both. The union
symbol (∪) is used to represent OR in probability. Thus,
the event E1 ∪ E2 denotes E1 OR E2.
► If E1, E2, E3, …, En are exhaustive events associated with
sample space S, then
E1 ∪ E2 ∪ E3 ∪ … ∪ En = S

9
TYPES OF EVENTS
Events Associated with “AND”
► If two events E1 and E2 are associated with AND, the
combined event consists of the outcomes common to
both events. The intersection symbol (∩) is used to
represent AND in probability.
► Thus, the event E1 ∩ E2 denotes E1 AND E2.
Event E1 but not E2
► It represents the difference between the two events:
event E1 but not E2 consists of all the outcomes which
are present in E1 but not in E2. Thus, the event E1 but not
E2 is represented as
► E1 − E2 = E1 ∩ E2′
10
AXIOMS OF PROBABILITY
► Axiom 1: For any event A, P(A)≥0.
► Axiom 2: Probability of the sample space S is P(S)=1.
► Axiom 3: If A1,A2,A3,⋯ are disjoint events, then
P(A1∪A2∪A3⋯)=P(A1)+P(A2)+P(A3)+⋯
► For any event A, P(Ac)=1−P(A).
► The probability of the empty set is zero, i.e., P(∅)=0.
► For any event A, P(A)≤1.
► P(A−B)=P(A)−P(A∩B).
► P(A∪B)=P(A)+P(B)−P(A∩B), (inclusion-exclusion principle for
n=2).
► If A⊂B then P(A)≤P(B)
11
AXIOMS OF PROBABILITY

Problem 1:
In a presidential election, there are four candidates. Call
them A, B, C, and D. Based on our polling analysis, we
estimate that A has a 20 percent chance of winning the
election, while B has a 40 percent chance of winning.
What is the probability that A or B wins the election?

12
AXIOMS OF PROBABILITY
Solution:
Notice that the events {A wins}, {B wins}, {C wins}, and
{D wins} are disjoint, since no two of them can occur at
the same time.
For example, if A wins, then B cannot win. From the third
axiom of probability, the probability of the union of
disjoint events is the sum of the individual probabilities.
Therefore,
P(A wins or B wins)=P({A wins}∪{B wins})
=P({A wins})+P({B wins})
=0.2+0.4
=0.6
13
AXIOMS OF PROBABILITY

Problem 2:
Suppose we have the following information:
► There is a 60 percent chance that it will rain today.
► There is a 50 percent chance that it will rain tomorrow.
► There is a 30 percent chance that it does not rain on either day.
Find the following probabilities:
1. The probability that it will rain today or tomorrow.
2. The probability that it will rain today and tomorrow.
3. The probability that it will rain today but not tomorrow.
4. The probability that it either will rain today or tomorrow, but
not both.

14
AXIOMS OF PROBABILITY

Solution:
Convert the statements to probability language:
let A be the event that it will rain today, and
B the event that it will rain tomorrow.

P(A) = 0.6,
P(B) = 0.5,
P(Ac ∩ Bc) = 0.3
1). The probability that it will rain today or tomorrow:
this is P(A∪B).
15
AXIOMS OF PROBABILITY

P(A∪B) = 1− P((A∪B)c )
= 1− P(Ac ∩Bc )
= 1−0.3
= 0.7
2). The probability that it will rain today and tomorrow:
this is P(A∩B).
P(A∩B) = P(A)+P(B)−P(A∪B)
= 0.6+0.5−0.7
= 0.4

16
AXIOMS OF PROBABILITY

3). The probability that it will rain today but not tomorrow:
this is P(A∩Bc ).

P(A∩Bc ) = P(A−B)
= P(A)−P(A∩B)
= 0.6−0.4
= 0.2

17
AXIOMS OF PROBABILITY

4). The probability that it will rain either today or tomorrow,
but not both:
this is P(A−B) + P(B−A).
We have already found P(A−B)= 0.2
Similarly, we can find P(B−A):

P(B−A) = P(B)−P(B∩A)
= 0.5−0.4
= 0.1
Thus, P(A−B)+P(B−A) = 0.2+0.1 = 0.3
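The four answers above can be checked with a few lines of Python (a sketch; the numbers are exactly the ones given in the problem statement):

```python
# Given: P(A) = P(rain today), P(B) = P(rain tomorrow), P(Ac ∩ Bc).
p_A, p_B, p_neither = 0.6, 0.5, 0.3

p_A_or_B = 1 - p_neither                       # 1) P(A ∪ B) by the complement rule
p_A_and_B = p_A + p_B - p_A_or_B               # 2) inclusion-exclusion
p_A_not_B = p_A - p_A_and_B                    # 3) P(A − B) = P(A) − P(A ∩ B)
p_exactly_one = p_A_not_B + (p_B - p_A_and_B)  # 4) today or tomorrow, but not both

print(round(p_A_or_B, 2), round(p_A_and_B, 2),
      round(p_A_not_B, 2), round(p_exactly_one, 2))  # 0.7 0.4 0.2 0.3
```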

18
AXIOMS OF PROBABILITY

Inclusion-exclusion principle:

► P(A∪B)=P(A)+P(B)−P(A∩B).

► P(A∪B∪C)=P(A)+P(B)+P(C)−P(A∩B)−P(A∩C)−P(B∩C)+P(A
∩B∩C).
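The n=3 identity can be sanity-checked by counting on concrete sets (an illustrative sketch; the sets below are arbitrary choices, not from the slides):

```python
# Classical model on S = {1,...,20}; A, B, C are the multiples of 2, 3, 5.
S = set(range(1, 21))
A = {x for x in S if x % 2 == 0}
B = {x for x in S if x % 3 == 0}
C = {x for x in S if x % 5 == 0}

# Compare |A ∪ B ∪ C| with the inclusion-exclusion expansion (exact integer counts).
lhs = len(A | B | C)
rhs = (len(A) + len(B) + len(C)
       - len(A & B) - len(A & C) - len(B & C)
       + len(A & B & C))
print(lhs, rhs, lhs / len(S))  # 14 14 0.7
```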

19
SET THEORY FOR PROBABILITY
► Understanding the basics of set theory is a prerequisite
for studying probability.
► This lecture presents a concise introduction to
1. Set membership
2. Inclusion
3. Unions
4. Intersections
5. Complements
► These are all concepts that are frequently used in the
calculus of probabilities.

20
Conditional Probability

Example: I roll a fair die. Let A be the event that the outcome is an
odd number, i.e., A = {1,3,5}. Also let B be the event that the outcome
is less than or equal to 3, i.e., B = {1,2,3}. What is the probability of
getting A, P(A)? What is the probability of getting A given B, P(A|B)?
Solution: This is a finite sample space, so
P(A) = |A| / |S| = |{1,3,5}| / 6 = 1/2
Now, let's find the conditional probability of A given that B occurred.
If we know B has occurred, the outcome must be among {1,2,3}. For
A to also happen the outcome must be in A∩B = {1,3}. Since all die
rolls are equally likely, we argue that P(A|B) must be equal to
P(A|B) = |A∩B| / |B| = 2/3.

21
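Counting with Python's `fractions` module reproduces both answers exactly (a small sketch of the same calculation):

```python
from fractions import Fraction

S = set(range(1, 7))   # outcomes of one fair die roll
A = {1, 3, 5}          # odd outcome
B = {1, 2, 3}          # outcome <= 3

P_A = Fraction(len(A), len(S))               # |A| / |S|
P_A_given_B = Fraction(len(A & B), len(B))   # |A ∩ B| / |B|
print(P_A, P_A_given_B)  # 1/2 2/3
```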
Conditional Probability

► Now let's see how we can generalize the above example.
We can rewrite the calculation by dividing the
numerator and denominator by |S| in the following way:

P(A|B) = |A∩B| / |B|
       = (|A∩B| / |S|) / (|B| / |S|)
       = P(A∩B) / P(B)

22
Conditional Probability

Definition:

If A and B are two events in a sample space S, then the
conditional probability of A given B is defined as

P(A|B) = P(A∩B) / P(B), when P(B) > 0.

23
Conditional Probability

► When we know that B has occurred, every outcome that
is outside B should be discarded. Thus, our sample space
is reduced to the set B.
► Now the only way that A can happen is when the
outcome belongs to the set A∩B. We divide P(A∩B) by
P(B), so that the conditional probability of the new
sample space becomes 1, i.e.,

P(B|B) = P(B∩B) / P(B) = 1.

24
Conditional Probability
Note that the conditional probability P(A|B) is undefined
when P(B) = 0. That is fine: if P(B) = 0, the event B never
occurs, so it does not make sense to talk about the
probability of A given B.

Venn diagram for conditional probability, P(A|B)


25
Conditional Probability

Properties:
For three events A, B, and C, with P(C) > 0, we have

► P(Ac |C)=1−P(A|C);
► P(∅|C)=0;
► P(A|C)≤1;
► P(A−B|C)=P(A|C)−P(A∩B|C);
► P(A∪B|C)=P(A|C)+P(B|C)−P(A∩B|C);
► if A⊂B then P(A|C)≤P(B|C).

26
Special cases of conditional
probability
Case 1:
When A and B are disjoint: in this case A∩B = ∅, so

► P(A|B) = P(A∩B) / P(B) = P(∅) / P(B) = 0.

► This makes sense: since A and B are disjoint, they
cannot both occur at the same time. Thus, given
that B has occurred, the probability of A must be zero.

27
Special cases of conditional
probability
Case 2:
When B is a subset of A: if B ⊂ A, then whenever B
happens, A also happens. Thus, given that B occurred, we
expect the probability of A to be one. In this case A∩B = B, so

P(A|B) = P(A∩B) / P(B) = P(B) / P(B) = 1.

28
Special cases of conditional
probability
Case 3:
When A is a subset of B: in this case A∩B = A, so

P(A|B) = P(A∩B) / P(B) = P(A) / P(B).

29
CONDITIONAL PROBABILITY

Problem:
I roll a fair die twice and obtain two numbers: X1 = result of
the first roll and X2 = result of the second roll. Given that I
know X1 + X2 = 7, what is the probability that X1 = 4 or X2 = 4?

Sol:
Let A be the event that X1 = 4 or X2 = 4, and B the event
that X1 + X2 = 7. We are interested in P(A|B), so we can use
P(A|B) = P(A∩B) / P(B)
30
CONDITIONAL PROBABILITY

We note that
A = {(4,1),(4,2),(4,3),(4,4),(4,5),(4,6),(1,4),(2,4),(3,4),(5,4),(6,4)},
B = {(6,1),(5,2),(4,3),(3,4),(2,5),(1,6)},
A∩B = {(4,3),(3,4)}.

We conclude
P(A|B) = P(A∩B) / P(B) = (2/36) / (6/36) = 1/3

31
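Enumerating the 36 equally likely outcomes in Python confirms the count-based answer (a sketch of the same argument):

```python
from fractions import Fraction
from itertools import product

S = list(product(range(1, 7), repeat=2))   # 36 equally likely ordered pairs
A = [p for p in S if 4 in p]               # X1 = 4 or X2 = 4
B = [p for p in S if sum(p) == 7]          # X1 + X2 = 7
A_and_B = [p for p in B if p in A]

P_A_given_B = Fraction(len(A_and_B), len(B))
print(len(A), len(B), len(A_and_B), P_A_given_B)  # 11 6 2 1/3
```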
EXPECTATION AND VARIANCE

 We frequently calculate two significant summary statistics of
a random variable: the expectation and the variance.

 The expectation describes the average value.

 The variance describes the spread (amount of variability)
around the expectation.

32
EXPECTATION
Definition: The expectation of a random variable
(RV) X is the real number

E(X) = Σx x·P(x), if X is a discrete RV
E(X) = ∫ x f(x) dx (over −∞ to ∞), if X is a continuous RV

provided the sum or integral exists.

33
EXPECTATION
Example:
The expectation of a random variable expressing the
result of throwing a fair die is

E(X) = 1·P(1) + 2·P(2) + ⋯ + 6·P(6)
     = (1/6)·(1 + ⋯ + 6)
     = 21/6
     = 3.5
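The same value can be computed exactly and checked by simulation in Python (the seed and sample size below are illustrative choices):

```python
import random
from fractions import Fraction

# Exact expectation of a fair die: sum of x * P(x).
E_X = sum(x * Fraction(1, 6) for x in range(1, 7))
print(E_X)  # 7/2

# The long-run average of many simulated rolls approaches E(X).
random.seed(0)
rolls = [random.randint(1, 6) for _ in range(100_000)]
print(sum(rolls) / len(rolls))  # close to 3.5
```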

34
EXPECTATION

► Naturally, the expectation is the long-run average of
numerous die rolls:

Average = [1·(# of times we get 1) + ⋯ + 6·(# of times we get 6)] / k
        = 1·(frequency of getting 1) + ⋯ + 6·(frequency of getting 6)
        ≈ E(X)

where k is the total number of rolls.

35
EXPECTATION

For any constants a, b ∈ R, the expectation of Y = aX + b is

E(aX + b) = a·E(X) + b.

Proof. Using the previous proposition, we have
E(aX + b) = ∑ (ax + b) p(x)
          = a ∑ x p(x) + b ∑ p(x)
          = a·E(X) + b·1
for a discrete RV. In the case of a continuous RV, we
replace the summation with an integral.
► In particular, the proposition states that E(X + b) =
E(X) + b, and E(aX) = a·E(X).

36
VARIANCE

► The variance measures the amount of variability of the
RV X around E(X).
► The variance of an RV X is the expectation of the RV
Y = (X − E(X))²:

Var(X) = E((X − E(X))²)
Var(b) = E((b − E(b))²) = E(0) = 0, where b is a constant.
► The standard deviation of an RV X is std(X) = √Var(X)

37
VARIANCE

Proposition:
Var(X) = E(X²) − (E(X))²

Proof. Using the linearity of expectation,

Var(X) = E((X − E(X))²)
       = E(X² − 2X·E(X) + (E(X))²)
       = E(X²) − 2E(X)·E(X) + (E(X))²
       = E(X²) − (E(X))²
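Both sides of the identity can be evaluated exactly for a fair die (a small check, not part of the slides):

```python
from fractions import Fraction

p = Fraction(1, 6)
xs = range(1, 7)

E_X = sum(x * p for x in xs)                       # 7/2
E_X2 = sum(x * x * p for x in xs)                  # 91/6
var_direct = sum((x - E_X) ** 2 * p for x in xs)   # E((X − E(X))²)
var_shortcut = E_X2 - E_X ** 2                     # E(X²) − (E(X))²
print(var_direct, var_shortcut)  # 35/12 35/12
```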

38
VARIANCE

Assignment 1:
For any constants a,b∈R, prove that

(i) Var(aX+b)=a2Var(X)

(ii) Var(X+b)=Var(X)

39
VARIANCE
Assignment 2: Consider the classical (uniform) probability model on (a, b)
and its associated random variable X.

Compute the expectation E(X) and the variance Var(X).

Ans: E(X) = (a + b)/2,  Var(X) = (b − a)²/12

40
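A Monte Carlo check of the stated answers (the endpoints a = 2, b = 10, the seed, and the sample size are arbitrary choices for illustration):

```python
import random

random.seed(0)
a, b, n = 2.0, 10.0, 200_000
xs = [random.uniform(a, b) for _ in range(n)]

mean = sum(xs) / n
var = sum((x - mean) ** 2 for x in xs) / n

# Should be close to (a+b)/2 = 6.0 and (b-a)^2/12 ≈ 5.33.
print(round(mean, 1), round(var, 1))
```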
Probability basics using python

Basics of probability have been implemented in the notebook
linked below (for practice):

https://colab.research.google.com/drive/1cCUfPUJlko7vH8rC1-
vTtrcMuNcJ35ea?usp=sharing

41
Bayes Theorem, Conditional probability, and
Naive Bayes

42
Bayes and connection with
Conditional Probability
• Bayes' theorem describes the probability of an event based on prior
knowledge of related events; it can be seen as an extension of conditional
probability.
• Using conditional probability, Bayes' theorem provides a way to calculate
the posterior probability P(A|B) from P(A), P(B) and P(B|A).

• The probability of event A given B is: P(A|B) = P(A∩B) / P(B)
• The probability of event B given A is: P(B|A) = P(A∩B) / P(A)
• Combining the two equations,

P(A|B) = P(B|A)·P(A) / P(B), known as Bayes' theorem.

43
Bayes Theorem Example
• Independent probabilities:
○ P(A): probability that the card is Red (event A) => 26/52
○ P(B): probability that the card is King (event B) => 4/52
• P(B|A): conditional probability that the card is King (B)
given it is Red (A) => 2/26
• P(A|B): posterior probability that the card is Red (A)
given it is King (B), using P(A|B) = P(B|A)·P(A) / P(B)
○ => P(B|A) · P(A) / P(B)
○ => (2/26) · (26/52) / (4/52)
○ = 2/4 = 1/2
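The card example as exact arithmetic with `fractions` (a direct transcription of the numbers above):

```python
from fractions import Fraction

P_red = Fraction(26, 52)            # P(A): card is Red
P_king = Fraction(4, 52)            # P(B): card is King
P_king_given_red = Fraction(2, 26)  # P(B|A): 2 red kings among 26 red cards

# Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)
P_red_given_king = P_king_given_red * P_red / P_king
print(P_red_given_king)  # 1/2
```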

44
Bayes Theorem for multiple features

● The Bayes rule computes the probability of class A for a given feature B.
● In real-life problems, the target class A depends on multiple B variables, so
the Bayes rule can be extended to multiple input features:
P(A | B1, B2, B3, ... Bn) = P(B1, B2, B3, ... Bn | A) · P(A) / P(B1, B2, B3, ... Bn)
● Under the naïve assumption that the input features are independent,
P(A | B1, B2, B3, ... Bn) = P(B1|A) · P(B2|A) · ... · P(Bn|A) · P(A) / (P(B1) · P(B2) · ... · P(Bn))
● Note: the Bayesian classifier works on the maximum a posteriori (MAP)
decision rule.

45
Naive Bayes

● It estimates the conditional probability that event A
will happen, given that event B has already occurred.
● Hence, it is used to classify the target column
based on the given features.
● It assumes that the features are independent.
● It is easy to implement, fast, accurate, and can be
used to make real-time predictions.

46
Naive Bayes

● The Bayes rule computes the probability of class Y for given X features. In real-life
problems, the target class Y depends on multiple X variables, so the Bayes rule
can be extended to multiple input features:

P(Y | X1, X2, X3, ... Xn) = P(X1, X2, X3, ... Xn | Y) · P(Y) / P(X1, X2, X3, ... Xn)

According to the naive Bayes assumption that the input features are independent,

P(Y | X1, X2, X3, ... Xn) = P(X1|Y) · P(X2|Y) · ... · P(Xn|Y) · P(Y) / (P(X1) · P(X2) · ... · P(Xn))

47
Steps of a Naive Bayes Classifier

1. Compute the prior probabilities for the given class labels.

2. Compute the likelihood of the evidence for each attribute and each class.

3. Calculate the posterior probabilities using Bayes' rule.

4. Select the class with the higher posterior probability for the given inputs.

48
Business Problem - label the
email as spam or ham

● We shall consider the problem of labelling received
emails as spam or ham.
● Choose a few words you find in emails.
● Consider the frequency of these words used in spam
and ham emails, as shown below:

49
Business Problem - label the
email as spam or ham

Step 1: Obtain the prior probabilities.
The prior probabilities are:
P(Spam) = 50/90 ≈ 0.55
P(Ham) = 40/90 ≈ 0.45

50
Business Problem - label the
email as spam or ham
Step 2: Obtain the likelihoods.
The probability that the word Good appears in a Ham email, i.e., P(Good | Ham),
is 10/40 = 0.25. Similarly, we can fill the entire table.

Counts:                          Likelihoods:
Word       Spam  Ham             Word       Spam           Ham
Good         2    10             Good       2/50 = 0.04    10/40 = 0.25
Lonely       2     1             Lonely     2/50 = 0.04    1/40 = 0.025
Horoscope   20     5             Horoscope  20/50 = 0.4    5/40 = 0.125
Work         5    12             Work       5/50 = 0.1     12/40 = 0.30
Snacks       0     5             Snacks     0/50 = 0       5/40 = 0.125
Money       21     7             Money      21/50 = 0.42   7/40 = 0.175
Total       50    40

51
Business Problem - label the
email as spam or ham

Step 3: Compute the posterior probabilities.

- A given email has "Good Work" mentioned in it; is it a spam or ham message?
- Our instance is "Good Work".

For our instance, compute the posterior probabilities for each of the class
labels, spam or ham.

52
Business Problem - label the
email as spam or ham

● Compute the posterior probabilities for each of the class labels, Spam or Ham.
(The common denominator P(Good, Work) is omitted below, since it does not
affect the comparison.)

For Spam,
P(Spam | Good, Work) ∝ P(Spam) · P(Good | Spam) · P(Work | Spam)
= (0.55)(0.04)(0.1) = 0.0022

For Ham,
P(Ham | Good, Work) ∝ P(Ham) · P(Good | Ham) · P(Work | Ham)
= (0.45)(0.25)(0.30) = 0.03375

53
Business Problem - label the
email as spam or ham

Step 4: Assign the most probable class label.

For Spam, P(Spam | Good, Work) ∝ 0.0022; for Ham, P(Ham | Good, Work) ∝ 0.03375.

Since 0.03375 > 0.0022, we assign the class label Ham to the instance "Good Work".
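The whole spam/ham calculation can be sketched in a few lines of Python. The counts come from the table on the earlier slide; exact fractions are used instead of the slide's rounded decimals, so the scores differ slightly in the last digits but the decision is the same:

```python
from fractions import Fraction

# Word counts per class, from the table on the earlier slide.
counts = {"Good": {"Spam": 2, "Ham": 10},
          "Work": {"Spam": 5, "Ham": 12}}
totals = {"Spam": 50, "Ham": 40}
priors = {"Spam": Fraction(50, 90), "Ham": Fraction(40, 90)}

def score(label, words):
    # Unnormalized posterior: prior times the product of per-word likelihoods.
    s = priors[label]
    for w in words:
        s *= Fraction(counts[w][label], totals[label])
    return s

scores = {c: score(c, ["Good", "Work"]) for c in ("Spam", "Ham")}
print(max(scores, key=scores.get))  # Ham
```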

54
Types of Naive Bayes Classifier

55
Thank You!

56
