
Bayesian Inference for AI: A Comprehensive Guide to Bayesian Networks, Naïve Bayes and Hidden Markov Models

Bayesian Network
A Bayesian Network is a directed acyclic graph (DAG) that represents the probabilistic relationships
among a set of random variables. Each node in the graph corresponds to a random variable, and the
edges represent the conditional dependencies between these variables.
Formally, a Bayesian Network for a set of random variables X1 , X2 , . . . , Xn consists of:

• A set of nodes representing the variables X1 , X2 , . . . , Xn ,


• A set of directed edges that encode conditional dependencies between the variables,
• A set of conditional probability distributions P (Xi | Parents(Xi )), where Parents(Xi ) represents
the set of parents of Xi in the network.

1 Joint Probability Distribution


The joint probability distribution of all the variables in a Bayesian Network can be factored as the product
of the conditional probabilities of each variable given its parents in the network. Mathematically, the
joint probability distribution can be written as:
$$P(X_1, X_2, \ldots, X_n) = \prod_{i=1}^{n} P(X_i \mid \mathrm{Parents}(X_i))$$

This factorization is based on the assumption that each variable is conditionally independent of its
non-descendants given its parents.

Example
Consider a Bayesian Network with three variables: X1 , X2 , and X3 , where:

• X1 has no parents,
• X2 has X1 as its parent,
• X3 has X1 and X2 as its parents.
The joint probability distribution for this network is:

P (X1 , X2 , X3 ) = P (X1 ) · P (X2 | X1 ) · P (X3 | X1 , X2 )
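As a minimal illustration of this factorization, the sketch below evaluates the joint probability as a product of CPT lookups. The probability values and the dictionary-based CPT layout are hypothetical, chosen only for the example.

\begin{verbatim}
# Hypothetical CPTs for the three-variable network X1 -> X2, (X1, X2) -> X3.
p_x1 = {True: 0.3, False: 0.7}                      # P(X1)
p_x2_given_x1 = {True: {True: 0.8, False: 0.2},     # P(X2 | X1)
                 False: {True: 0.1, False: 0.9}}
p_x3_given_x1_x2 = {                                # P(X3 | X1, X2)
    (True, True): {True: 0.9, False: 0.1},
    (True, False): {True: 0.5, False: 0.5},
    (False, True): {True: 0.4, False: 0.6},
    (False, False): {True: 0.05, False: 0.95},
}

def joint(x1, x2, x3):
    """P(X1, X2, X3) = P(X1) * P(X2 | X1) * P(X3 | X1, X2)."""
    return p_x1[x1] * p_x2_given_x1[x1][x2] * p_x3_given_x1_x2[(x1, x2)][x3]

print(joint(True, False, True))  # 0.3 * 0.2 * 0.5 = 0.03
\end{verbatim}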


Let's work through an example similar to the classic burglary and earthquake alarm scenario, where we
will calculate everything from scratch: the dataset, the conditional probabilities, the Bayesian Belief
Network, and how to perform inference using Conditional Probability Tables (CPTs). We use a small
historical database that contains simulated data for burglary, earthquake, alarm activation, and whether
John and Mary called. We will use this data to calculate the prior probabilities and conditional
probabilities needed for the Bayesian Belief Network.


2 Events
• Burglary (B): There is a burglary at your house.
• Earthquake (E): An earthquake occurs.
• Alarm (A): The alarm goes off.
• JohnCalls (J): John calls if he hears the alarm.
• MaryCalls (M): Mary calls if she hears the alarm.

3 Bayesian Network Graph


The following is the graphical representation of the Bayesian Network for this scenario:

Figure 1: Bayesian Network. Burglary (B) and Earthquake (E) are parents of Alarm (A); Alarm (A) is the parent of John Calls (J) and Mary Calls (M).

4 Problem Statement
We are tasked with finding the probability of the event where the alarm has sounded (A = Yes), a
burglary has occurred (B = Yes), an earthquake has not occurred (E = No), and both John and Mary
have called (J = Yes, M = Yes). We will calculate the joint probability of this event using a Bayesian
network model based on the dataset provided.

5 Historical Database
Table 1 shows the historical data of occurrences of burglary, earthquake, alarm activations, and calls
from John and Mary.

6 Prior and Conditional Probabilities


From the dataset, we can calculate the following prior and conditional probabilities:

6.1 Prior Probabilities


• P(B = Yes) = 4/10 = 0.4
• P(B = No) = 6/10 = 0.6
• P(E = Yes) = 5/10 = 0.5
• P(E = No) = 5/10 = 0.5


Table 1: Historical dataset showing Burglary, Earthquake, Alarm, John Calls, and Mary Calls
Event ID Burglary (B) Earthquake (E) Alarm (A) John Calls (J) Mary Calls (M)
1 Yes No Yes Yes Yes
2 No Yes Yes Yes No
3 Yes Yes Yes Yes Yes
4 No No No No No
5 Yes No Yes No Yes
6 No Yes No No No
7 No No Yes Yes No
8 Yes Yes Yes Yes Yes
9 No No No No No
10 No Yes Yes Yes Yes

6.2 Conditional Probability Tables (CPT)


Next, we calculate the conditional probabilities for the alarm going off given different combinations of
burglary and earthquake, and the likelihood of John and Mary calling given that the alarm went off or
didn’t go off.

6.2.1 Conditional Probability of Alarm P (A|B, E):


• Case 1: P(A = Yes | B = Yes, E = Yes):
– Number of cases where B = Yes and E = Yes: 2.
– Out of these, the alarm went off in both cases.
Thus, we calculate:
P(A = Yes | B = Yes, E = Yes) = 2/2 = 1.0
• Case 2: P(A = Yes | B = Yes, E = No):
– Number of cases where B = Yes and E = No: 2.
– Out of these, the alarm went off in both cases.
Thus, we calculate:
P(A = Yes | B = Yes, E = No) = 2/2 = 1.0
• Case 3: P(A = Yes | B = No, E = Yes):
– Number of cases where B = No and E = Yes: 3.
– Out of these, the alarm went off in 2 cases.
Thus, we calculate:
P(A = Yes | B = No, E = Yes) = 2/3 ≈ 0.67
• Case 4: P(A = Yes | B = No, E = No):
– Number of cases where B = No and E = No: 3.
– Out of these, the alarm went off in 1 case.
Thus, we calculate:
P(A = Yes | B = No, E = No) = 1/3 ≈ 0.33


Table 2: Conditional Probability Table for Alarm (A) given Burglary (B) and Earthquake (E)
Burglary (B) Earthquake (E) Alarm (A = Yes) Alarm (A = No)
Yes Yes 1.0 0.0
Yes No 1.0 0.0
No Yes 0.67 0.33
No No 0.33 0.67
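The CPT entries in Table 2 are just conditional relative frequencies counted from Table 1. A minimal Python sketch of that counting is shown below; the list-of-dicts encoding of the dataset is an assumption made for illustration.

\begin{verbatim}
# Estimate P(A = Yes | B, E) by counting rows of the historical dataset (Table 1).
# Each row is a dict; True stands for "Yes", False for "No".
data = [
    dict(B=True,  E=False, A=True,  J=True,  M=True),   # 1
    dict(B=False, E=True,  A=True,  J=True,  M=False),  # 2
    dict(B=True,  E=True,  A=True,  J=True,  M=True),   # 3
    dict(B=False, E=False, A=False, J=False, M=False),  # 4
    dict(B=True,  E=False, A=True,  J=False, M=True),   # 5
    dict(B=False, E=True,  A=False, J=False, M=False),  # 6
    dict(B=False, E=False, A=True,  J=True,  M=False),  # 7
    dict(B=True,  E=True,  A=True,  J=True,  M=True),   # 8
    dict(B=False, E=False, A=False, J=False, M=False),  # 9
    dict(B=False, E=True,  A=True,  J=True,  M=True),   # 10
]

def p_alarm_given(b, e):
    """Relative frequency of A = Yes among rows with the given B and E values."""
    rows = [r for r in data if r["B"] == b and r["E"] == e]
    return sum(r["A"] for r in rows) / len(rows)

for b in (True, False):
    for e in (True, False):
        print(f"P(A=Yes | B={b}, E={e}) = {p_alarm_given(b, e):.2f}")
\end{verbatim}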

6.2.2 CPT for John Calls given Alarm P (J|A)


• Case 1: P(J = Yes | A = Yes)
– Number of cases where the alarm went off (A = Yes): 7.
– Out of these, John called in 6 cases.
Thus, we calculate:
P(J = Yes | A = Yes) = 6/7 ≈ 0.86
• Case 2: P(J = Yes | A = No)
– Number of cases where the alarm did not go off (A = No): 3.
– Out of these, John called in 0 cases.
Thus, we calculate:
P(J = Yes | A = No) = 0/3 = 0.0

Table 3: CPT for John Calls (J) given Alarm (A)


Alarm (A) John Calls (J = Yes) John Calls (J = No)
Yes 0.86 0.14
No 0.0 1.0

6.2.3 CPT for Mary Calls given Alarm P (M |A)


• Case 1: P(M = Yes | A = Yes)
– Number of cases where the alarm went off (A = Yes): 7.
– Out of these, Mary called in 5 cases.
Thus, we calculate:
P(M = Yes | A = Yes) = 5/7 ≈ 0.71
• Case 2: P(M = Yes | A = No)
– Number of cases where the alarm did not go off (A = No): 3.
– Out of these, Mary called in 0 cases.
Thus, we calculate:
P(M = Yes | A = No) = 0/3 = 0.0


Table 4: CPT for Mary Calls (M) given Alarm (A)


Alarm (A) Mary Calls (M = Yes) Mary Calls (M = No)
Yes 0.71 0.29
No 0.0 1.0

7 Solution
We need to calculate the joint probability:

P (B = Yes, E = No, A = Yes, J = Yes, M = Yes)


Using the chain rule for conditional probabilities:

P (B = Yes, E = No, A = Yes, J = Yes, M = Yes) = P (B = Yes) · P (E = No) · P (A = Yes|B = Yes, E = No)
· P (J = Yes|A = Yes) · P (M = Yes|A = Yes)

Substituting the values, we get:

P (B = Yes, E = No, A = Yes, J = Yes, M = Yes) = 0.4 × 0.5 × 1.0 × 0.86 × 0.71
= 0.1221

The probability of the event that a burglary has occurred, no earthquake has occurred, the alarm has
sounded, and both John and Mary called is approximately 0.1221. This means there is a 12.21% chance
of this specific combination of events happening, based on the given probabilities. ■
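As a quick arithmetic check, the same joint probability can be computed directly from the CPT values derived above (a minimal sketch; the variable names are chosen for this example):

\begin{verbatim}
# Joint probability P(B=Yes, E=No, A=Yes, J=Yes, M=Yes) from the CPTs above.
p_b = 0.4                     # P(B = Yes)
p_not_e = 0.5                 # P(E = No)
p_a_given_b_yes_e_no = 1.0    # P(A = Yes | B = Yes, E = No)
p_j_given_a = 6 / 7           # P(J = Yes | A = Yes), about 0.86
p_m_given_a = 5 / 7           # P(M = Yes | A = Yes), about 0.71

joint = p_b * p_not_e * p_a_given_b_yes_e_no * p_j_given_a * p_m_given_a
print(round(joint, 4))        # ~0.1224 with exact fractions, ~0.1221 with the rounded CPT values
\end{verbatim}

The small difference from 0.1221 comes only from rounding the CPT entries to two decimals in the hand calculation.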


Bayesian Network (Practice Example)


Your objective is to predict whether a student will pass an exam based on the following features:
• Study Hours: How many hours the student studies per day (Low, Medium, High).
• Attendance: Whether the student attends classes regularly (Regular, Irregular).

• Sleep Hours: How many hours the student sleeps per day (Low, Medium, High).
• Pass Exam: Whether the student passes the exam (Yes, No).

1 Historical Database
The following table shows the raw data for 12 students:

Table 1: Training Data for Exam Passing Prediction


Student ID Study Hours Attendance Sleep Hours Pass Exam (Class)
1 Low Irregular Low No
2 Medium Regular Medium Yes
3 High Regular High Yes
4 Medium Regular High Yes
5 Low Irregular Low No
6 Low Irregular Medium No
7 High Regular High Yes
8 Medium Irregular Medium No
9 Medium Regular Medium Yes
10 High Regular High Yes
11 Low Regular Medium No
12 Medium Irregular Medium No

2 Bayesian Network Structure


The structure of the Bayesian Network is as follows:
• Study Hours and Attendance influence Sleep Hours.

• Study Hours, Attendance, and Sleep Hours together influence the probability of Pass Exam.

3 Bayesian Network Diagram


The diagram below represents the Bayesian Network for predicting whether a student will pass the exam
based on their study habits, attendance, and sleep hours.


Figure 1: Bayesian Network for Exam Passing Prediction. Study Hours and Attendance are parents of Sleep Hours; Study Hours, Attendance, and Sleep Hours are parents of Pass Exam.

4 Questions
1. Construct the Bayesian Network: Define the Conditional Probability Tables (CPTs) based on
the raw database.
2. Inference: Calculate the probability that a student will pass the exam given:

• Study Hours: Medium


• Attendance: Regular
• Sleep Hours: Medium
3. Explain the Impact: Explain how changing the values of Study Hours, Attendance, and Sleep
Hours affects the probability of passing the exam.


Naïve Bayes Classification with Laplacian Correction

Let's consider a small example of a Naïve Bayesian Classifier where we predict whether a person will
buy a computer based on their income and student status. The dataset is small, and in this case,
we need to use Laplacian correction (also known as Laplace smoothing) to handle cases where we
have zero counts in the conditional probabilities.

1 Dataset
We consider the following dataset where we predict whether a person will buy a computer based on their
income and student status:

Table 1: Dataset for Buy Computer Class Prediction


ID Income Student Buys Computer (Class)
1 High No No
2 High No No
3 Medium No Yes
4 Low Yes Yes
5 Low Yes Yes
6 Low No No

We aim to predict whether a person with High Income and Student Status = Yes will buy a computer.

2 Steps to Perform Naïve Bayes Classification


2.1 Prior Probabilities
The prior probabilities for each class are calculated as follows:
P(Yes) = (Number of people who bought a computer) / (Total number of people) = 3/6 = 0.5
P(No) = (Number of people who did not buy a computer) / (Total number of people) = 3/6 = 0.5

2.2 Likelihoods Without Laplacian Correction


Next, we calculate the likelihoods for each feature given the class.

2.2.1 Income
For the class Yes (Buys Computer):
P(Income = High | Yes) = 0/3 = 0
For the class No (Does Not Buy Computer):
P(Income = High | No) = 2/3 ≈ 0.67

2.2.2 Student Status

For the class Yes:
P(Student = Yes | Yes) = 2/3 ≈ 0.67
For the class No:
P(Student = Yes | No) = 0/3 = 0


2.3 Laplacian Correction


To avoid zero probabilities, we apply Laplacian correction by adding 1 to each count in the numerator and adding k, the number of possible values of the feature, to the denominator.

2.3.1 Corrected Likelihoods for Income (with k = 3)


For the class Yes:
P(Income = High | Yes) = (0 + 1)/(3 + 3) = 1/6 ≈ 0.167
For the class No:
P(Income = High | No) = (2 + 1)/(3 + 3) = 3/6 = 0.5

2.3.2 Corrected Likelihoods for Student Status (with k = 2)

For the class Yes:
P(Student = Yes | Yes) = (2 + 1)/(3 + 2) = 3/5 = 0.6
For the class No:
P(Student = Yes | No) = (0 + 1)/(3 + 2) = 1/5 = 0.2

2.4 Posterior Probabilities


Using the corrected likelihoods, we compute a value proportional to the posterior probability of each
class with the Naïve Bayes formula; the common evidence term in the denominator is omitted because
it does not change which class scores higher.

2.4.1 For Class = Yes


P (Yes|High Income, Student = Yes) = P (Yes) · P (Income = High|Yes) · P (Student = Yes|Yes)
= 0.5 × 0.167 × 0.6 = 0.0501

2.4.2 For Class = No


P (No|High Income, Student = Yes) = P (No) · P (Income = High|No) · P (Student = Yes|No)
= 0.5 × 0.5 × 0.2 = 0.05

2.5 Final Prediction


Since P (Yes|High Income, Student = Yes) = 0.0501 is greater than P (No|High Income, Student = Yes) =
0.05, the model predicts that the person will buy a computer. ■
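A minimal Python sketch of the same computation, counting from Table 1 and applying Laplace smoothing with the per-feature k. The tuple encoding of the rows and the rounding of likelihoods to three decimals mirror the worked example and are assumptions of the sketch.

\begin{verbatim}
# Laplace-smoothed Naive Bayes scores for the buy-computer example (Table 1).
rows = [
    ("High", "No", "No"), ("High", "No", "No"), ("Medium", "No", "Yes"),
    ("Low", "Yes", "Yes"), ("Low", "Yes", "Yes"), ("Low", "No", "No"),
]  # (Income, Student, Buys Computer)

def prior(cls):
    return sum(r[2] == cls for r in rows) / len(rows)

def likelihood(idx, value, cls, k):
    """Laplace-smoothed P(feature = value | class); k = number of values the feature can take."""
    in_class = [r for r in rows if r[2] == cls]
    count = sum(r[idx] == value for r in in_class)
    return round((count + 1) / (len(in_class) + k), 3)   # rounded as in the worked example

for cls in ("Yes", "No"):
    score = prior(cls) * likelihood(0, "High", cls, k=3) * likelihood(1, "Yes", cls, k=2)
    print(cls, round(score, 4))   # Yes -> 0.0501, No -> 0.05
\end{verbatim}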


Effect of Using k in Laplacian Correction for Naïve Bayesian Classification

In Naïve Bayesian classification, Laplacian correction (also known as Laplace smoothing) is used to
handle zero probabilities when a feature value does not appear in the training data for a particular class.
The correction prevents the entire probability from becoming zero when multiplying the probabilities
of feature values. A key aspect of Laplacian correction is the use of k, the number of possible values a
feature can take. This document demonstrates the effect of not using k in Laplacian correction and how
it can result in incorrect probabilities.

1 Problem Statement
Consider a feature called Weather with three possible values:
• Sunny
• Rainy
• Cloudy
We are predicting whether people will go Outdoors (Yes/No) based on the weather. The training
dataset is shown below:

Table 1: Training Dataset


ID Weather Outdoors (Class)
1 Sunny Yes
2 Sunny Yes
3 Rainy Yes
4 Sunny No
5 Rainy No
6 Sunny No

1.1 Without Laplacian Correction


The probabilities for the weather conditions given “Outdoors = Yes” before any smoothing are:
P(Sunny | Yes) = 2/3, P(Rainy | Yes) = 1/3, P(Cloudy | Yes) = 0
The sum of the probabilities is:
P(Sunny | Yes) + P(Rainy | Yes) + P(Cloudy | Yes) = 2/3 + 1/3 + 0 = 1
However, the zero probability for Cloudy will cause problems if this value is encountered during classification.

1.2 With Laplacian Correction (Without k)


Using Laplacian correction without k (i.e., adding 1 to the numerator and denominator), we calculate
the probabilities as:
P(Sunny | Yes) = (2 + 1)/(3 + 1) = 3/4, P(Rainy | Yes) = (1 + 1)/(3 + 1) = 2/4 = 0.5
P(Cloudy | Yes) = (0 + 1)/(3 + 1) = 1/4
The sum of the probabilities becomes:
P(Sunny | Yes) + P(Rainy | Yes) + P(Cloudy | Yes) = 3/4 + 0.5 + 1/4 = 6/4 = 1.5
This is greater than 1, which violates the rules of probability.


1.3 With Laplacian Correction (With k = 3)


Now, using Laplacian correction with k = 3 (since there are three possible values for Weather : Sunny,
Rainy, Cloudy), we recalculate the probabilities:
P(Sunny | Yes) = (2 + 1)/(3 + 3) = 3/6 = 0.5
P(Rainy | Yes) = (1 + 1)/(3 + 3) = 2/6 = 1/3
P(Cloudy | Yes) = (0 + 1)/(3 + 3) = 1/6
The sum of the probabilities becomes:
P(Sunny | Yes) + P(Rainy | Yes) + P(Cloudy | Yes) = 0.5 + 1/3 + 1/6 = 6/6 = 1

2 Conclusion
The use of k in Laplacian correction is essential for maintaining the sum of probabilities as 1. Without
k, the smoothing effect is not properly distributed across all possible values of the feature, leading to
overestimated probabilities. By including k, we ensure that the probabilities are correctly normalized
and that unseen values do not disproportionately influence the classification. ■
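A quick numeric check of the two smoothing variants, using the counts from the training dataset above (a minimal sketch):

\begin{verbatim}
# Compare Laplace smoothing without and with k for P(Weather | Outdoors = Yes).
counts = {"Sunny": 2, "Rainy": 1, "Cloudy": 0}   # counts among the 3 "Yes" rows
n_yes = sum(counts.values())                     # 3
k = len(counts)                                  # 3 possible Weather values

without_k = {w: (c + 1) / (n_yes + 1) for w, c in counts.items()}
with_k    = {w: (c + 1) / (n_yes + k) for w, c in counts.items()}

print(sum(without_k.values()))  # 1.5 -- not a valid probability distribution
print(sum(with_k.values()))     # 1.0 -- properly normalized
\end{verbatim}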


Naïve Bayes Classification with Laplacian Correction (Practice Example)
1 Problem Description
You are tasked with building a Naive Bayesian classifier to predict whether a student will pass or fail a
course based on the following features:
• Attendance: (Low, Medium, High)

• Study Hours: (Low, Medium, High)


• Previous Grades: (Pass, Fail)
The dataset is as follows:

Table 1: Dataset for Pass/Fail Prediction


ID Attendance Study Hours Previous Grades Pass/Fail (Class)
1 Low Low Fail Fail
2 High High Pass Pass
3 Medium Low Pass Fail
4 Medium Medium Fail Fail
5 High Medium Pass Pass
6 Low High Fail Fail
7 Medium Medium Pass Pass
8 Low Low Fail Fail
9 High High Pass Pass
10 Medium Low Fail Fail

2 Questions
1. Calculate the Prior Probabilities for both classes (Pass/Fail).

2. Calculate the Conditional Probabilities for each feature given the class, using Laplacian
Correction to handle any zero probabilities.
3. Classify the following student using the Naive Bayesian algorithm:
• Attendance: Medium
• Study Hours: Medium
• Previous Grades: Pass
4. Compare the result with the original dataset and discuss how Laplacian correction helps prevent
issues with zero probabilities.

3 Expected Learning Outcomes


Students will learn how to handle small datasets with zero probabilities by applying Laplacian correction.


Naïve Bayes Classification for Continuous Values with Laplace Correction

In a Naïve Bayesian Classification scenario with continuous values like income, you need to treat the
continuous attributes differently from discrete ones. The Gaussian Naïve Bayes model is commonly
used for handling continuous features, where each continuous feature is assumed to follow a normal
(Gaussian) distribution.

1 Problem Statement
We will predict whether a person will buy a computer based on their:
• Age (categorical)
• Income (continuous)

• Student status (categorical)

2 Dataset

Table 1: Dataset for Buy Computer Class Prediction


ID Age Income Student Buys Computer (Class)
1 Youth 300 No No
2 Youth 400 No No
3 Middle 100 No Yes
4 Senior 300 Yes Yes
5 Senior 200 Yes Yes
6 Youth 300 Yes No

We want to predict whether a new person with the following attributes will buy a computer:

• Age = Senior
• Income = 400
• Student Status = Yes

3 Steps to Perform Naïve Bayes Classification


3.1 Calculate Prior Probabilities
We first calculate the prior probabilities for the target class, i.e., the probability of buying a computer
or not.
P(Yes) = (Number of people who bought a computer) / (Total number of people) = 3/6 = 0.5
P(No) = 3/6 = 0.5

3.2 Likelihood for Categorical Attributes (Age and Student Status)


We calculate the likelihoods for the categorical attributes (Age and Student status) based on the class.


3.2.1 Age:
Since we have three categories for the Age attribute (Youth, Middle, and Senior), we set k = 3.
- For Class = Yes:
There are 2 people with Age = Senior who bought a computer. Applying Laplacian correction:
P(Age = Senior | Yes) = (Count of Senior and Yes + 1) / (Total Yes instances + k) = (2 + 1)/(3 + 3) = 3/6 = 0.5
- For Class = No:
There are zero people with Age = Senior who did not buy a computer. Applying Laplacian correction:
P(Age = Senior | No) = (Count of Senior and No + 1) / (Total No instances + k) = (0 + 1)/(3 + 3) = 1/6 ≈ 0.167

3.2.2 Student Status:


- For Student = Yes given Yes:
P(Student = Yes | Yes) = 2/3 ≈ 0.67
- For Student = Yes given No:
P(Student = Yes | No) = 1/3 ≈ 0.33

3.3 Likelihood for Continuous Attribute (Income)


For continuous attributes like Income, we assume that they follow a Gaussian (normal) distribution.
The probability density function for a Gaussian distribution is:

$$P(x \mid \mu, \sigma) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)$$
Where:
• x is the income value for the new instance (in this case, 400).
• µ is the mean income for the class.
• σ is the standard deviation of income for the class.
Let’s calculate the likelihood of Income = 400 given the class.

3.3.1 For Class = Yes:

- Mean ($\mu_{\text{Yes}}$) of income for class Yes:
$$\mu_{\text{Yes}} = \frac{300 + 200 + 100}{3} = 200$$
- Standard deviation ($\sigma_{\text{Yes}}$) of income for class Yes (here the sample standard deviation, with n − 1 in the denominator, is used):
$$\sigma_{\text{Yes}} = \sqrt{\frac{(300 - 200)^2 + (200 - 200)^2 + (100 - 200)^2}{2}} = \sqrt{\frac{10000 + 0 + 10000}{2}} = 100$$
Now, we calculate the likelihood of Income = 400 given Yes:
$$P(\text{Income} = 400 \mid \text{Yes}) = \frac{1}{\sqrt{2\pi(100)^2}} \exp\left(-\frac{(400 - 200)^2}{2(100)^2}\right) = \frac{1}{\sqrt{62831.85}} \exp(-2) \approx 0.00054$$


3.3.2 For Class = No:

- Mean ($\mu_{\text{No}}$) of income for class No:
$$\mu_{\text{No}} = \frac{300 + 400 + 300}{3} = 333.33$$
- Standard deviation ($\sigma_{\text{No}}$) of income for class No:
$$\sigma_{\text{No}} = \sqrt{\frac{(300 - 333.33)^2 + (400 - 333.33)^2 + (300 - 333.33)^2}{3}} = \sqrt{\frac{1111.11 + 4444.44 + 1111.11}{3}} = 47.14$$
Now, we calculate the likelihood of Income = 400 given No:
$$P(\text{Income} = 400 \mid \text{No}) = \frac{1}{\sqrt{2\pi(47.14)^2}} \exp\left(-\frac{(400 - 333.33)^2}{2(47.14)^2}\right) = \frac{1}{\sqrt{13962.34}} \exp(-1) \approx 0.0031$$

3.4 Calculate Posterior Probabilities


Now we calculate a value proportional to the posterior probability of each class using the Naïve Bayes formula:
- For Class = Yes:

P(Yes | Age = Senior, Income = 400, Student = Yes) ∝ P(Yes) · P(Age = Senior | Yes) · P(Income = 400 | Yes) · P(Student = Yes | Yes)
= 0.5 × 0.5 × 0.00054 × 0.67
≈ 0.000090

- For Class = No:

P(No | Age = Senior, Income = 400, Student = Yes) ∝ P(No) · P(Age = Senior | No) · P(Income = 400 | No) · P(Student = Yes | No)
= 0.5 × 0.167 × 0.0031 × 0.33
≈ 0.000086

3.5 Final Prediction


Comparing the two values:
- For Yes: approximately 0.000090
- For No: approximately 0.000086
Since the value for Yes is greater than the value for No, the model predicts that the person will buy a computer.

4 Conclusion
For continuous attributes like Income, the Gaussian Naïve Bayes classifier is used, and for categorical
attributes, standard probability calculations are performed. Laplacian correction is applied where
needed to avoid zero probabilities. ■
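A minimal Python sketch of the Gaussian-likelihood step, reusing the class priors, categorical likelihoods, means, and standard deviations from the worked example (the helper names are chosen for illustration):

\begin{verbatim}
import math

def gaussian_pdf(x, mu, sigma):
    """Gaussian probability density used for continuous features."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / math.sqrt(2 * math.pi * sigma ** 2)

# Class-conditional parameters for Income, as derived in the worked example.
income_params = {"Yes": (200.0, 100.0), "No": (333.33, 47.14)}

# Priors and categorical likelihoods from the worked example.
prior = {"Yes": 0.5, "No": 0.5}
p_age_senior = {"Yes": 0.5, "No": 0.167}        # Laplace-corrected
p_student_yes = {"Yes": 0.67, "No": 0.33}

for cls in ("Yes", "No"):
    mu, sigma = income_params[cls]
    score = prior[cls] * p_age_senior[cls] * gaussian_pdf(400, mu, sigma) * p_student_yes[cls]
    print(cls, f"{score:.6f}")   # Yes ~ 0.000090, No ~ 0.000086
\end{verbatim}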


Naïve Bayes Classification with Continuous Values (Practice Example)
1 Problem Description
You are given a dataset that contains the following attributes for employees of a company:
• Age (continuous)

• Salary (continuous)
• Education Level (High School, Bachelor’s, Master’s)
• Buys Car (Yes/No)
The dataset is as follows:

Table 1: Dataset for Car Purchase Prediction


ID Age Salary Education Level Buys Car (Class)
1 25 30000 Bachelor’s No
2 35 50000 Master’s Yes
3 45 70000 Bachelor’s Yes
4 28 32000 High School No
5 55 60000 Master’s Yes
6 33 40000 High School No
7 38 55000 Bachelor’s Yes
8 30 35000 Bachelor’s No
9 50 65000 Master’s Yes
10 40 52000 Bachelor’s Yes

2 Questions
1. Calculate the Prior Probabilities for the class "Buys Car" (Yes/No).

2. For the continuous attributes Age and Salary, assume they follow a Gaussian distribution.
For each class:
• Calculate the mean and standard deviation for Age and Salary.
3. Classify the following employee using the Gaussian Naive Bayesian algorithm:

• Age: 37
• Salary: 48000
• Education Level: Bachelor’s
4. Show your calculations for the probability density function of the Gaussian distribution for
continuous values of Age and Salary.
5. Compare the final result with the original dataset and discuss how continuous features are handled
differently from categorical features in Naive Bayes.

3 Expected Learning Outcomes


Students will gain experience in handling continuous attributes using the Gaussian Naïve Bayes approach
and understand how to classify data with continuous features.


Hidden Markov Model Evaluation Problem using the Forward Algorithm
In this document, we will solve the evaluation problem using the forward algorithm for a given
Hidden Markov Model (HMM). The problem we are solving is to find the probability of the observation
sequence: {Walk, Shop, Clean}

1 Historical Data
The table below shows the historical data of 10 days, with hidden weather states (Sunny or Rainy) and
observable activities (Walk, Shop, Clean).

Table 1: Historical Data of Weather (Hidden States) and Activities (Observations)


Day Weather (Hidden State) Activity (Observation)
1 Sunny Walk
2 Sunny Shop
3 Rainy Clean
4 Sunny Walk
5 Rainy Shop
6 Rainy Clean
7 Sunny Clean
8 Rainy Walk
9 Sunny Shop
10 Rainy Clean

We are given a Hidden Markov Model (HMM) for weather prediction with the following parameters:

• Hidden States: S = {Rainy, Sunny}


• Observations: O = {Walk, Shop, Clean}
• Transition Probabilities:
The transition matrix A for a Hidden Markov Model is defined as:
$$A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1N} \\ a_{21} & a_{22} & \cdots & a_{2N} \\ \vdots & \vdots & \ddots & \vdots \\ a_{N1} & a_{N2} & \cdots & a_{NN} \end{pmatrix}$$

Where:
– aij = P (St = j | St−1 = i), represents the probability of transitioning from state i to state j,
– St is the hidden state at time step t,
– N is the number of hidden states.
The sum of each row in the transition matrix must equal 1:

$$\sum_{j=1}^{N} a_{ij} = 1, \quad \forall i$$

Thus, the transition matrix A shows the probability of moving from one hidden weather state
(Sunny or Rainy) to another.

$$A = \begin{pmatrix} P(\text{Sunny} \mid \text{Sunny}) = \frac{1}{5} & P(\text{Rainy} \mid \text{Sunny}) = \frac{4}{5} \\ P(\text{Sunny} \mid \text{Rainy}) = \frac{3}{4} & P(\text{Rainy} \mid \text{Rainy}) = \frac{1}{4} \end{pmatrix}$$
This results in the following transition matrix:
$$A = \begin{pmatrix} 0.20 & 0.80 \\ 0.75 & 0.25 \end{pmatrix}$$

• Emission Probabilities:
The emission matrix B for a Hidden Markov Model is defined as:
$$B = \begin{pmatrix} b_{11} & b_{12} & \cdots & b_{1M} \\ b_{21} & b_{22} & \cdots & b_{2M} \\ \vdots & \vdots & \ddots & \vdots \\ b_{N1} & b_{N2} & \cdots & b_{NM} \end{pmatrix}$$

Where:
– bij = P (oj | Si ), represents the probability of observing oj (the j-th observation) given the
hidden state Si (the i-th hidden state),
– Si is the hidden state at time t,
– oj is an observation,
– N is the number of hidden states,
– M is the number of possible observations.
The sum of each row in the emission matrix must equal 1:

$$\sum_{j=1}^{M} b_{ij} = 1, \quad \forall i$$

Thus, the emission matrix B represents the probabilities of observing activities (Walk, Shop, Clean)
given a weather state (Sunny or Rainy).

$$B = \begin{pmatrix} P(\text{Walk} \mid \text{Sunny}) = \frac{2}{5} & P(\text{Shop} \mid \text{Sunny}) = \frac{2}{5} & P(\text{Clean} \mid \text{Sunny}) = \frac{1}{5} \\ P(\text{Walk} \mid \text{Rainy}) = \frac{1}{5} & P(\text{Shop} \mid \text{Rainy}) = \frac{1}{5} & P(\text{Clean} \mid \text{Rainy}) = \frac{3}{5} \end{pmatrix}$$
This results in the following emission matrix:
$$B = \begin{pmatrix} 0.4 & 0.4 & 0.2 \\ 0.2 & 0.2 & 0.6 \end{pmatrix}$$

• Initial Probabilities:
π = {P (Sunny) = 0.5, P (Rainy) = 0.5}

We want to compute the probability of the observation sequence O = {Walk, Shop, Clean} using the
forward algorithm.

2 Solution
2.1 Initialization
The initialization step involves calculating the forward probabilities for each hidden state at time t = 1,
which corresponds to the first observation (Walk ). The formula for initialization is:

α1 (i) = πi · bi (o1 )
Where:
• α1 (i) is the forward probability for state i at time t = 1,
• πi is the initial probability of state i,
• bi (o1 ) is the emission probability of observing o1 in state i.


Using the initial probabilities and emission probabilities for the first observation (Walk ), we calculate:

α1 (Sunny) = π(Sunny) · P (Walk|Sunny) = 0.5 × 0.4 = 0.2

α1 (Rainy) = π(Rainy) · P (Walk|Rainy) = 0.5 × 0.2 = 0.1

2.2 Recursion
For each subsequent time step t, the forward probabilities are recursively calculated using the formula:
$$\alpha_t(j) = \sum_{i=1}^{N} \alpha_{t-1}(i) \cdot a_{ij} \cdot b_j(o_t)$$
Where:
• αt (j) is the forward probability for state j at time t,
• αt−1 (i) is the forward probability for state i at time t − 1,
• aij is the transition probability from state i to state j,
• bj (ot ) is the emission probability of observing ot in state j.
At t = 2 (second observation: Shop), we calculate:

α2(Sunny) = [α1(Sunny) · P(Sunny | Sunny) + α1(Rainy) · P(Sunny | Rainy)] · P(Shop | Sunny)
= [0.2 · 0.20 + 0.1 · 0.75] · 0.4
= [0.04 + 0.075] · 0.4
= 0.115 · 0.4 = 0.046

α2(Rainy) = [α1(Sunny) · P(Rainy | Sunny) + α1(Rainy) · P(Rainy | Rainy)] · P(Shop | Rainy)
= [0.2 · 0.80 + 0.1 · 0.25] · 0.2
= [0.16 + 0.025] · 0.2
= 0.185 · 0.2 = 0.037

At t = 3 (third observation: Clean), we calculate:

α3(Sunny) = [α2(Sunny) · P(Sunny | Sunny) + α2(Rainy) · P(Sunny | Rainy)] · P(Clean | Sunny)
= [0.046 · 0.20 + 0.037 · 0.75] · 0.2
= [0.0092 + 0.02775] · 0.2
= 0.03695 · 0.2 = 0.00739

α3(Rainy) = [α2(Sunny) · P(Rainy | Sunny) + α2(Rainy) · P(Rainy | Rainy)] · P(Clean | Rainy)
= [0.046 · 0.80 + 0.037 · 0.25] · 0.6
= [0.0368 + 0.00925] · 0.6
= 0.04605 · 0.6 = 0.02763

2.3 Termination
The total probability of observing the sequence is calculated by summing the forward probabilities at
the final time step T :
$$P(O \mid \lambda) = \sum_{i=1}^{N} \alpha_T(i)$$
Where:


• P (O | λ) is the total probability of the observation sequence O,


• αT (i) is the forward probability for state i at the final time step T .
Thus, the final step involves summing the forward probabilities at time t = 3 to obtain the total
probability of the observation sequence O = {Walk, Shop, Clean}:

P(O | λ) = α3(Sunny) + α3(Rainy) = 0.00739 + 0.02763 = 0.03502

Thus, the probability of observing the sequence Walk, Shop, Clean given the HMM is approximately P(O | λ) ≈ 0.035.

3 Conclusion
In this example, we used the forward algorithm to calculate the probability of the observation sequence
Walk, Shop, Clean using the Hidden Markov Model’s transition and emission matrices. The result shows
how likely this sequence is under the given HMM parameters. ■
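A minimal Python sketch of the forward algorithm for this model (the state and observation index conventions are assumptions of the sketch):

\begin{verbatim}
# Forward algorithm for the weather HMM above.
# States: 0 = Sunny, 1 = Rainy.  Observations: 0 = Walk, 1 = Shop, 2 = Clean.
A = [[0.20, 0.80],     # transitions from Sunny
     [0.75, 0.25]]     # transitions from Rainy
B = [[0.4, 0.4, 0.2],  # emissions from Sunny
     [0.2, 0.2, 0.6]]  # emissions from Rainy
pi = [0.5, 0.5]

def forward(obs):
    """Return P(obs | model) by summing forward probabilities at the final step."""
    n_states = len(pi)
    # Initialization: alpha_1(i) = pi_i * b_i(o_1)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n_states)]
    # Recursion: alpha_t(j) = sum_i alpha_{t-1}(i) * a_ij * b_j(o_t)
    for o in obs[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in range(n_states)) * B[j][o]
                 for j in range(n_states)]
    # Termination: sum over the final forward probabilities
    return sum(alpha)

print(round(forward([0, 1, 2]), 5))  # P(Walk, Shop, Clean | model) ~ 0.03502
\end{verbatim}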


Practice Example: Hidden Markov Model (HMM) Sequence Prediction (Human Mood and Activities)
1 Problem Description
Your objective is to predict the probability of an observation sequence based on a person’s mood (hidden
states) and the activities they perform. The possible moods are Happy and Sad, and the activities
people perform are Exercise, Read, and Watch TV.
Your goal is to:
1. Calculate the transition matrix for the hidden states (Happy and Sad).
2. Calculate the emission matrix for the observed activities (Exercise, Read, Watch TV).
3. Use the forward algorithm to calculate the probability of a given observation sequence.

2 Historical Data
The following table represents 12 days of human mood conditions (hidden states) and observed activities.

Table 1: Raw Data of Moods and Activities


Day Mood (Hidden State) Activity (Observation)
1 Happy Exercise
2 Sad Watch TV
3 Happy Read
4 Happy Exercise
5 Sad Watch TV
6 Sad Read
7 Happy Exercise
8 Sad Watch TV
9 Happy Exercise
10 Happy Read
11 Sad Watch TV
12 Happy Exercise

3 Questions
3.1 Calculate the Transition Matrix
The transition matrix represents the probability of moving from one mood to another between consecutive
days (e.g., from Happy to Sad).

3.2 Calculate the Emission Matrix


The emission matrix represents the probability of observing each activity (Exercise, Read, Watch TV)
given a particular mood (Happy or Sad).

3.3 Predict the Probability of an Observation Sequence


Given the observation sequence: O = (Exercise, Read, Watch TV), use the forward algorithm to calculate
the probability that this sequence was generated by the HMM with the transition and emission matrices
you calculated.


Instructions
• Transition Matrix Calculation: Count the transitions from Happy to Happy, Happy to Sad,
Sad to Happy, and Sad to Sad from the raw data.
• Emission Matrix Calculation: Count how many times each activity (Exercise, Read, Watch
TV) occurred given the mood of the person.

• Forward Algorithm: Use the forward algorithm step-by-step to calculate the final probability of
the observation sequence O = (Exercise, Read, Watch TV).

Hints
• For the transition matrix, the probabilities should sum to 1 for each row (from a given mood).
• For the emission matrix, the probabilities should sum to 1 for each mood.
• Use the forward algorithm formulas to compute the final probability.

Professor Nihad Karim Chowdhury, Department of CSE, University of Chittagong
