
Bayesian Inference for AI: A Comprehensive Guide to Bayesian Networks, Naïve Bayes and Hidden Markov Models

Bayesian Network
A Bayesian Network is a directed acyclic graph (DAG) that represents the probabilistic relationships
among a set of random variables. Each node in the graph corresponds to a random variable, and the
edges represent the conditional dependencies between these variables.
Formally, a Bayesian Network for a set of random variables X1 , X2 , . . . , Xn consists of:

• A set of nodes representing the variables X1 , X2 , . . . , Xn ,


• A set of directed edges that encode conditional dependencies between the variables,
• A set of conditional probability distributions P (Xi | Parents(Xi )), where Parents(Xi ) represents
the set of parents of Xi in the network.

1 Joint Probability Distribution


The joint probability distribution of all the variables in a Bayesian Network can be factored as the product
of the conditional probabilities of each variable given its parents in the network. Mathematically, the
joint probability distribution can be written as:
$$P(X_1, X_2, \ldots, X_n) = \prod_{i=1}^{n} P(X_i \mid \mathrm{Parents}(X_i))$$

This factorization is based on the assumption that each variable is conditionally independent of its
non-descendants given its parents.

Example
Consider a Bayesian Network with three variables: X1 , X2 , and X3 , where:

• X1 has no parents,
• X2 has X1 as its parent,
• X3 has X1 and X2 as its parents.
The joint probability distribution for this network is:

P (X1 , X2 , X3 ) = P (X1 ) · P (X2 | X1 ) · P (X3 | X1 , X2 )
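As a minimal illustration of this factorization, the sketch below evaluates the joint probability as a product of CPT lookups. The probability values and the dictionary-based CPT layout are hypothetical, chosen only for the example.

\begin{verbatim}
# Hypothetical CPTs for the three-variable network X1 -> X2, (X1, X2) -> X3.
p_x1 = {True: 0.3, False: 0.7}                      # P(X1)
p_x2_given_x1 = {True: {True: 0.8, False: 0.2},     # P(X2 | X1)
                 False: {True: 0.1, False: 0.9}}
p_x3_given_x1_x2 = {                                # P(X3 | X1, X2)
    (True, True): {True: 0.9, False: 0.1},
    (True, False): {True: 0.5, False: 0.5},
    (False, True): {True: 0.4, False: 0.6},
    (False, False): {True: 0.05, False: 0.95},
}

def joint(x1, x2, x3):
    """P(X1, X2, X3) = P(X1) * P(X2 | X1) * P(X3 | X1, X2)."""
    return p_x1[x1] * p_x2_given_x1[x1][x2] * p_x3_given_x1_x2[(x1, x2)][x3]

print(joint(True, False, True))  # 0.3 * 0.2 * 0.5 = 0.03
\end{verbatim}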


Let's work through an example similar to the classic burglary and earthquake alarm scenario, where we
will calculate everything from scratch: the dataset, the conditional probabilities, the Bayesian Belief
Network, and how to perform inference using Conditional Probability Tables (CPTs). We use a small
historical database that contains simulated data for burglary, earthquake, alarm activation, and whether
John and Mary called. We will use this data to calculate the prior probabilities and conditional
probabilities needed for the Bayesian Belief Network.


2 Events
• Burglary (B): There is a burglary at your house.
• Earthquake (E): An earthquake occurs.
• Alarm (A): The alarm goes off.
• JohnCalls (J): John calls if he hears the alarm.
• MaryCalls (M): Mary calls if she hears the alarm.

3 Bayesian Network Graph


The following is the graphical representation of the Bayesian Network for this scenario:

Figure 1: Bayesian Network. Burglary (B) and Earthquake (E) are parents of Alarm (A); Alarm (A) is the parent of John Calls (J) and Mary Calls (M).

4 Problem Statement
We are tasked with finding the probability of the event where the alarm has sounded (A = Yes), a
burglary has occurred (B = Yes), an earthquake has not occurred (E = No), and both John and Mary
have called (J = Yes, M = Yes). We will calculate the joint probability of this event using a Bayesian
network model based on the dataset provided.

5 Historical Database
Table 1 shows the historical data of occurrences of burglary, earthquake, alarm activations, and calls
from John and Mary.

6 Prior and Conditional Probabilities


From the dataset, we can calculate the following prior and conditional probabilities:

6.1 Prior Probabilities


• P(B = Yes) = 4/10 = 0.4
• P(B = No) = 6/10 = 0.6
• P(E = Yes) = 5/10 = 0.5
• P(E = No) = 5/10 = 0.5


Table 1: Historical dataset showing Burglary, Earthquake, Alarm, John Calls, and Mary Calls
Event ID Burglary (B) Earthquake (E) Alarm (A) John Calls (J) Mary Calls (M)
1 Yes No Yes Yes Yes
2 No Yes Yes Yes No
3 Yes Yes Yes Yes Yes
4 No No No No No
5 Yes No Yes No Yes
6 No Yes No No No
7 No No Yes Yes No
8 Yes Yes Yes Yes Yes
9 No No No No No
10 No Yes Yes Yes Yes

6.2 Conditional Probability Tables (CPT)


Next, we calculate the conditional probabilities for the alarm going off given different combinations of
burglary and earthquake, and the likelihood of John and Mary calling given that the alarm went off or
didn’t go off.

6.2.1 Conditional Probability of Alarm P (A|B, E):


• Case 1: P(A = Yes | B = Yes, E = Yes):
– Number of cases where B = Yes and E = Yes: 2.
– Out of these, the alarm went off in both cases.
Thus, we calculate:
P(A = Yes | B = Yes, E = Yes) = 2/2 = 1.0
• Case 2: P(A = Yes | B = Yes, E = No):
– Number of cases where B = Yes and E = No: 2.
– Out of these, the alarm went off in both cases.
Thus, we calculate:
P(A = Yes | B = Yes, E = No) = 2/2 = 1.0
• Case 3: P(A = Yes | B = No, E = Yes):
– Number of cases where B = No and E = Yes: 3.
– Out of these, the alarm went off in 2 cases.
Thus, we calculate:
P(A = Yes | B = No, E = Yes) = 2/3 ≈ 0.67
• Case 4: P(A = Yes | B = No, E = No):
– Number of cases where B = No and E = No: 3.
– Out of these, the alarm went off in 1 case.
Thus, we calculate:
P(A = Yes | B = No, E = No) = 1/3 ≈ 0.33


Table 2: Conditional Probability Table for Alarm (A) given Burglary (B) and Earthquake (E)
Burglary (B) Earthquake (E) Alarm (A = Yes) Alarm (A = No)
Yes Yes 1.0 0.0
Yes No 1.0 0.0
No Yes 0.67 0.33
No No 0.33 0.67
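The CPT entries in Table 2 are just conditional relative frequencies counted from Table 1. A minimal Python sketch of that counting is shown below; the list-of-dicts encoding of the dataset is an assumption made for illustration.

\begin{verbatim}
# Estimate P(A = Yes | B, E) by counting rows of the historical dataset (Table 1).
# Each row is a dict; True stands for "Yes", False for "No".
data = [
    dict(B=True,  E=False, A=True,  J=True,  M=True),   # 1
    dict(B=False, E=True,  A=True,  J=True,  M=False),  # 2
    dict(B=True,  E=True,  A=True,  J=True,  M=True),   # 3
    dict(B=False, E=False, A=False, J=False, M=False),  # 4
    dict(B=True,  E=False, A=True,  J=False, M=True),   # 5
    dict(B=False, E=True,  A=False, J=False, M=False),  # 6
    dict(B=False, E=False, A=True,  J=True,  M=False),  # 7
    dict(B=True,  E=True,  A=True,  J=True,  M=True),   # 8
    dict(B=False, E=False, A=False, J=False, M=False),  # 9
    dict(B=False, E=True,  A=True,  J=True,  M=True),   # 10
]

def p_alarm_given(b, e):
    """Relative frequency of A = Yes among rows with the given B and E values."""
    rows = [r for r in data if r["B"] == b and r["E"] == e]
    return sum(r["A"] for r in rows) / len(rows)

for b in (True, False):
    for e in (True, False):
        print(f"P(A=Yes | B={b}, E={e}) = {p_alarm_given(b, e):.2f}")
\end{verbatim}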

6.2.2 CPT for John Calls given Alarm P (J|A)


• Case 1: P(J = Yes | A = Yes)
– Number of cases where the alarm went off (A = Yes): 7.
– Out of these, John called in 6 cases.
Thus, we calculate:
P(J = Yes | A = Yes) = 6/7 ≈ 0.86
• Case 2: P(J = Yes | A = No)
– Number of cases where the alarm did not go off (A = No): 3.
– Out of these, John called in 0 cases.
Thus, we calculate:
P(J = Yes | A = No) = 0/3 = 0.0

Table 3: CPT for John Calls (J) given Alarm (A)


Alarm (A) John Calls (J = Yes) John Calls (J = No)
Yes 0.86 0.14
No 0.0 1.0

6.2.3 CPT for Mary Calls given Alarm P (M |A)


• Case 1: P(M = Yes | A = Yes)
– Number of cases where the alarm went off (A = Yes): 7.
– Out of these, Mary called in 5 cases.
Thus, we calculate:
P(M = Yes | A = Yes) = 5/7 ≈ 0.71
• Case 2: P(M = Yes | A = No)
– Number of cases where the alarm did not go off (A = No): 3.
– Out of these, Mary called in 0 cases.
Thus, we calculate:
P(M = Yes | A = No) = 0/3 = 0.0


Table 4: CPT for Mary Calls (M) given Alarm (A)


Alarm (A) Mary Calls (M = Yes) Mary Calls (M = No)
Yes 0.71 0.29
No 0.0 1.0

7 Solution
We need to calculate the joint probability:

P (B = Yes, E = No, A = Yes, J = Yes, M = Yes)


Using the chain rule for conditional probabilities:

P (B = Yes, E = No, A = Yes, J = Yes, M = Yes) = P (B = Yes) · P (E = No) · P (A = Yes|B = Yes, E = No)
· P (J = Yes|A = Yes) · P (M = Yes|A = Yes)

Substituting the values, we get:

P (B = Yes, E = No, A = Yes, J = Yes, M = Yes) = 0.4 × 0.5 × 1.0 × 0.86 × 0.71
= 0.1221

The probability of the event that a burglary has occurred, no earthquake has occurred, the alarm has
sounded, and both John and Mary called is approximately 0.1221. This means there is a 12.21% chance
of this specific combination of events happening, based on the given probabilities. ■
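As a quick arithmetic check, the same joint probability can be computed directly from the CPT values derived above (a minimal sketch; the variable names are chosen for this example):

\begin{verbatim}
# Joint probability P(B=Yes, E=No, A=Yes, J=Yes, M=Yes) from the CPTs above.
p_b = 0.4                     # P(B = Yes)
p_not_e = 0.5                 # P(E = No)
p_a_given_b_yes_e_no = 1.0    # P(A = Yes | B = Yes, E = No)
p_j_given_a = 6 / 7           # P(J = Yes | A = Yes), about 0.86
p_m_given_a = 5 / 7           # P(M = Yes | A = Yes), about 0.71

joint = p_b * p_not_e * p_a_given_b_yes_e_no * p_j_given_a * p_m_given_a
print(round(joint, 4))        # ~0.1224 with exact fractions, ~0.1221 with the rounded CPT values
\end{verbatim}

The small difference from 0.1221 comes only from rounding the CPT entries to two decimals in the hand calculation.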


Bayesian Network (Practice Example)


Your objective is to predict whether a student will pass an exam based on the following features:
• Study Hours: How many hours the student studies per day (Low, Medium, High).
• Attendance: Whether the student attends classes regularly (Regular, Irregular).

• Sleep Hours: How many hours the student sleeps per day (Low, Medium, High).
• Pass Exam: Whether the student passes the exam (Yes, No).

1 Historical Database
The following table shows the raw data for 12 students:

Table 1: Training Data for Exam Passing Prediction


Student ID Study Hours Attendance Sleep Hours Pass Exam (Class)
1 Low Irregular Low No
2 Medium Regular Medium Yes
3 High Regular High Yes
4 Medium Regular High Yes
5 Low Irregular Low No
6 Low Irregular Medium No
7 High Regular High Yes
8 Medium Irregular Medium No
9 Medium Regular Medium Yes
10 High Regular High Yes
11 Low Regular Medium No
12 Medium Irregular Medium No

2 Bayesian Network Structure


The structure of the Bayesian Network is as follows:
• Study Hours and Attendance influence Sleep Hours.

• Study Hours, Attendance, and Sleep Hours together influence the probability of Pass Exam.

3 Bayesian Network Diagram


The diagram below represents the Bayesian Network for predicting whether a student will pass the exam
based on their study habits, attendance, and sleep hours.


Figure 1: Bayesian Network for Exam Passing Prediction. Study Hours and Attendance are parents of Sleep Hours; Study Hours, Attendance, and Sleep Hours are parents of Pass Exam.

4 Questions
1. Construct the Bayesian Network: Define the Conditional Probability Tables (CPTs) based on
the raw database.
2. Inference: Calculate the probability that a student will pass the exam given:

• Study Hours: Medium


• Attendance: Regular
• Sleep Hours: Medium
3. Explain the Impact: Explain how changing the values of Study Hours, Attendance, and Sleep
Hours affects the probability of passing the exam.


Naïve Bayes Classification with Laplacian Correction

Let's consider a small example of a Naïve Bayesian Classifier where we predict whether a person will
buy a computer based on their income and student status. The dataset is small, and in this case,
we need to use Laplacian correction (also known as Laplace smoothing) to handle cases where we
have zero counts in the conditional probabilities.

1 Dataset
We consider the following dataset where we predict whether a person will buy a computer based on their
income and student status:

Table 1: Dataset for Buy Computer Class Prediction


ID Income Student Buys Computer (Class)
1 High No No
2 High No No
3 Medium No Yes
4 Low Yes Yes
5 Low Yes Yes
6 Low No No

We aim to predict whether a person with High Income and Student Status = Yes will buy a computer.

2 Steps to Perform Naïve Bayes Classification


2.1 Prior Probabilities
The prior probabilities for each class are calculated as follows:
P(Yes) = (Number of people who bought a computer) / (Total number of people) = 3/6 = 0.5
P(No) = (Number of people who did not buy a computer) / (Total number of people) = 3/6 = 0.5

2.2 Likelihoods Without Laplacian Correction


Next, we calculate the likelihoods for each feature given the class.

2.2.1 Income
For the class Yes (Buys Computer):
P(Income = High | Yes) = 0/3 = 0
For the class No (Does Not Buy Computer):
P(Income = High | No) = 2/3 ≈ 0.67

2.2.2 Student Status

For the class Yes:
P(Student = Yes | Yes) = 2/3 ≈ 0.67
For the class No:
P(Student = Yes | No) = 0/3 = 0


2.3 Laplacian Correction


To avoid zero probabilities, we apply Laplacian correction by adding 1 to each count in the numerator and adding k, the number of possible values of the feature, to the denominator.

2.3.1 Corrected Likelihoods for Income (with k = 3)


For the class Yes:
P(Income = High | Yes) = (0 + 1)/(3 + 3) = 1/6 ≈ 0.167
For the class No:
P(Income = High | No) = (2 + 1)/(3 + 3) = 3/6 = 0.5

2.3.2 Corrected Likelihoods for Student Status (with k = 2)

For the class Yes:
P(Student = Yes | Yes) = (2 + 1)/(3 + 2) = 3/5 = 0.6
For the class No:
P(Student = Yes | No) = (0 + 1)/(3 + 2) = 1/5 = 0.2

2.4 Posterior Probabilities


Using the corrected likelihoods, we compute a value proportional to the posterior probability of each
class with the Naïve Bayes formula; the common evidence term in the denominator is omitted because
it does not change which class scores higher.

2.4.1 For Class = Yes


P (Yes|High Income, Student = Yes) = P (Yes) · P (Income = High|Yes) · P (Student = Yes|Yes)
= 0.5 × 0.167 × 0.6 = 0.0501

2.4.2 For Class = No


P (No|High Income, Student = Yes) = P (No) · P (Income = High|No) · P (Student = Yes|No)
= 0.5 × 0.5 × 0.2 = 0.05

2.5 Final Prediction


Since P (Yes|High Income, Student = Yes) = 0.0501 is greater than P (No|High Income, Student = Yes) =
0.05, the model predicts that the person will buy a computer. ■
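A minimal Python sketch of the same computation, counting from Table 1 and applying Laplace smoothing with the per-feature k. The tuple encoding of the rows and the rounding of likelihoods to three decimals mirror the worked example and are assumptions of the sketch.

\begin{verbatim}
# Laplace-smoothed Naive Bayes scores for the buy-computer example (Table 1).
rows = [
    ("High", "No", "No"), ("High", "No", "No"), ("Medium", "No", "Yes"),
    ("Low", "Yes", "Yes"), ("Low", "Yes", "Yes"), ("Low", "No", "No"),
]  # (Income, Student, Buys Computer)

def prior(cls):
    return sum(r[2] == cls for r in rows) / len(rows)

def likelihood(idx, value, cls, k):
    """Laplace-smoothed P(feature = value | class); k = number of values the feature can take."""
    in_class = [r for r in rows if r[2] == cls]
    count = sum(r[idx] == value for r in in_class)
    return round((count + 1) / (len(in_class) + k), 3)   # rounded as in the worked example

for cls in ("Yes", "No"):
    score = prior(cls) * likelihood(0, "High", cls, k=3) * likelihood(1, "Yes", cls, k=2)
    print(cls, round(score, 4))   # Yes -> 0.0501, No -> 0.05
\end{verbatim}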


Effect of Using k in Laplacian Correction for Naïve Bayesian Classification

In Naïve Bayesian classification, Laplacian correction (also known as Laplace smoothing) is used to
handle zero probabilities when a feature value does not appear in the training data for a particular class.
The correction prevents the entire probability from becoming zero when multiplying the probabilities
of feature values. A key aspect of Laplacian correction is the use of k, the number of possible values a
feature can take. This document demonstrates the effect of not using k in Laplacian correction and how
it can result in incorrect probabilities.

1 Problem Statement
Consider a feature called Weather with three possible values:
• Sunny
• Rainy
• Cloudy
We are predicting whether people will go Outdoors (Yes/No) based on the weather. The training
dataset is shown below:

Table 1: Training Dataset


ID Weather Outdoors (Class)
1 Sunny Yes
2 Sunny Yes
3 Rainy Yes
4 Sunny No
5 Rainy No
6 Sunny No

1.1 Without Laplacian Correction


The probabilities for the weather conditions given “Outdoors = Yes” before any smoothing are:
P(Sunny | Yes) = 2/3, P(Rainy | Yes) = 1/3, P(Cloudy | Yes) = 0
The sum of the probabilities is:
P(Sunny | Yes) + P(Rainy | Yes) + P(Cloudy | Yes) = 2/3 + 1/3 + 0 = 1
However, the zero probability for Cloudy will cause problems if this value is encountered during classification.

1.2 With Laplacian Correction (Without k)


Using Laplacian correction without k (i.e., adding 1 to the numerator and denominator), we calculate
the probabilities as:
P(Sunny | Yes) = (2 + 1)/(3 + 1) = 3/4, P(Rainy | Yes) = (1 + 1)/(3 + 1) = 2/4 = 0.5
P(Cloudy | Yes) = (0 + 1)/(3 + 1) = 1/4
The sum of the probabilities becomes:
P(Sunny | Yes) + P(Rainy | Yes) + P(Cloudy | Yes) = 3/4 + 0.5 + 1/4 = 6/4 = 1.5
This is greater than 1, which violates the rules of probability.


1.3 With Laplacian Correction (With k = 3)


Now, using Laplacian correction with k = 3 (since there are three possible values for Weather : Sunny,
Rainy, Cloudy), we recalculate the probabilities:
P(Sunny | Yes) = (2 + 1)/(3 + 3) = 3/6 = 0.5
P(Rainy | Yes) = (1 + 1)/(3 + 3) = 2/6 = 1/3
P(Cloudy | Yes) = (0 + 1)/(3 + 3) = 1/6
The sum of the probabilities becomes:
P(Sunny | Yes) + P(Rainy | Yes) + P(Cloudy | Yes) = 0.5 + 1/3 + 1/6 = 6/6 = 1

2 Conclusion
The use of k in Laplacian correction is essential for maintaining the sum of probabilities as 1. Without
k, the smoothing effect is not properly distributed across all possible values of the feature, leading to
overestimated probabilities. By including k, we ensure that the probabilities are correctly normalized
and that unseen values do not disproportionately influence the classification. ■
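A quick numeric check of the two smoothing variants, using the counts from the training dataset above (a minimal sketch):

\begin{verbatim}
# Compare Laplace smoothing without and with k for P(Weather | Outdoors = Yes).
counts = {"Sunny": 2, "Rainy": 1, "Cloudy": 0}   # counts among the 3 "Yes" rows
n_yes = sum(counts.values())                     # 3
k = len(counts)                                  # 3 possible Weather values

without_k = {w: (c + 1) / (n_yes + 1) for w, c in counts.items()}
with_k    = {w: (c + 1) / (n_yes + k) for w, c in counts.items()}

print(sum(without_k.values()))  # 1.5 -- not a valid probability distribution
print(sum(with_k.values()))     # 1.0 -- properly normalized
\end{verbatim}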


Naïve Bayes Classification with Laplacian Correction (Practice Example)
1 Problem Description
You are tasked with building a Naive Bayesian classifier to predict whether a student will pass or fail a
course based on the following features:
• Attendance: (Low, Medium, High)

• Study Hours: (Low, Medium, High)


• Previous Grades: (Pass, Fail)
The dataset is as follows:

Table 1: Dataset for Pass/Fail Prediction


ID Attendance Study Hours Previous Grades Pass/Fail (Class)
1 Low Low Fail Fail
2 High High Pass Pass
3 Medium Low Pass Fail
4 Medium Medium Fail Fail
5 High Medium Pass Pass
6 Low High Fail Fail
7 Medium Medium Pass Pass
8 Low Low Fail Fail
9 High High Pass Pass
10 Medium Low Fail Fail

2 Questions
1. Calculate the Prior Probabilities for both classes (Pass/Fail).

2. Calculate the Conditional Probabilities for each feature given the class, using Laplacian
Correction to handle any zero probabilities.
3. Classify the following student using the Naive Bayesian algorithm:
• Attendance: Medium
• Study Hours: Medium
• Previous Grades: Pass
4. Compare the result with the original dataset and discuss how Laplacian correction helps prevent
issues with zero probabilities.

3 Expected Learning Outcomes


Students will learn how to handle small datasets with zero probabilities by applying Laplacian correction.


Naïve Bayes Classification for Continuous Values with Laplace Correction

In a Naïve Bayesian Classification scenario with continuous values like income, you need to treat the
continuous attributes differently from discrete ones. The Gaussian Naïve Bayes model is commonly
used for handling continuous features, where each continuous feature is assumed to follow a normal
(Gaussian) distribution.

1 Problem Statement
We will predict whether a person will buy a computer based on their:
• Age (categorical)
• Income (continuous)

• Student status (categorical)

2 Dataset

Table 1: Dataset for Buy Computer Class Prediction


ID Age Income Student Buys Computer (Class)
1 Youth 300 No No
2 Youth 400 No No
3 Middle 100 No Yes
4 Senior 300 Yes Yes
5 Senior 200 Yes Yes
6 Youth 300 Yes No

We want to predict whether a new person with the following attributes will buy a computer:

• Age = Senior
• Income = 400
• Student Status = Yes

3 Steps to Perform Naïve Bayes Classification


3.1 Calculate Prior Probabilities
We first calculate the prior probabilities for the target class, i.e., the probability of buying a computer
or not.
P(Yes) = (Number of people who bought a computer) / (Total number of people) = 3/6 = 0.5
P(No) = 3/6 = 0.5

3.2 Likelihood for Categorical Attributes (Age and Student Status)


We calculate the likelihoods for the categorical attributes (Age and Student status) based on the class.


3.2.1 Age:
Since we have three categories for the Age attribute (Youth, Middle, and Senior), we set k = 3.
- For Class = Yes:
There are 2 people with Age = Senior who bought a computer. Applying Laplacian correction:
P(Age = Senior | Yes) = (Count of Senior and Yes + 1) / (Total Yes instances + k) = (2 + 1)/(3 + 3) = 3/6 = 0.5
- For Class = No:
There are zero people with Age = Senior who did not buy a computer. Applying Laplacian correction:
P(Age = Senior | No) = (Count of Senior and No + 1) / (Total No instances + k) = (0 + 1)/(3 + 3) = 1/6 ≈ 0.167

3.2.2 Student Status:


- For Student = Yes given Yes:
P(Student = Yes | Yes) = 2/3 ≈ 0.67
- For Student = Yes given No:
P(Student = Yes | No) = 1/3 ≈ 0.33

3.3 Likelihood for Continuous Attribute (Income)


For continuous attributes like Income, we assume that they follow a Gaussian (normal) distribution.
The probability density function for a Gaussian distribution is:

$$P(x \mid \mu, \sigma) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)$$
Where:
• x is the income value for the new instance (in this case, 400).
• µ is the mean income for the class.
• σ is the standard deviation of income for the class.
Let’s calculate the likelihood of Income = 400 given the class.

3.3.1 For Class = Yes:

- Mean ($\mu_{\text{Yes}}$) of income for class Yes:
$$\mu_{\text{Yes}} = \frac{300 + 200 + 100}{3} = 200$$
- Standard deviation ($\sigma_{\text{Yes}}$) of income for class Yes (here the sample standard deviation, with n − 1 in the denominator, is used):
$$\sigma_{\text{Yes}} = \sqrt{\frac{(300 - 200)^2 + (200 - 200)^2 + (100 - 200)^2}{2}} = \sqrt{\frac{10000 + 0 + 10000}{2}} = 100$$
Now, we calculate the likelihood of Income = 400 given Yes:
$$P(\text{Income} = 400 \mid \text{Yes}) = \frac{1}{\sqrt{2\pi(100)^2}} \exp\left(-\frac{(400 - 200)^2}{2(100)^2}\right) = \frac{1}{\sqrt{62831.85}} \exp(-2) \approx 0.00054$$


3.3.2 For Class = No:

- Mean ($\mu_{\text{No}}$) of income for class No:
$$\mu_{\text{No}} = \frac{300 + 400 + 300}{3} = 333.33$$
- Standard deviation ($\sigma_{\text{No}}$) of income for class No:
$$\sigma_{\text{No}} = \sqrt{\frac{(300 - 333.33)^2 + (400 - 333.33)^2 + (300 - 333.33)^2}{3}} = \sqrt{\frac{1111.11 + 4444.44 + 1111.11}{3}} = 47.14$$
Now, we calculate the likelihood of Income = 400 given No:
$$P(\text{Income} = 400 \mid \text{No}) = \frac{1}{\sqrt{2\pi(47.14)^2}} \exp\left(-\frac{(400 - 333.33)^2}{2(47.14)^2}\right) = \frac{1}{\sqrt{13962.34}} \exp(-1) \approx 0.0031$$

3.4 Calculate Posterior Probabilities


Now we calculate a value proportional to the posterior probability of each class using the Naïve Bayes formula:
- For Class = Yes:

P(Yes | Age = Senior, Income = 400, Student = Yes) ∝ P(Yes) · P(Age = Senior | Yes) · P(Income = 400 | Yes) · P(Student = Yes | Yes)
= 0.5 × 0.5 × 0.00054 × 0.67
≈ 0.000090

- For Class = No:

P(No | Age = Senior, Income = 400, Student = Yes) ∝ P(No) · P(Age = Senior | No) · P(Income = 400 | No) · P(Student = Yes | No)
= 0.5 × 0.167 × 0.0031 × 0.33
≈ 0.000086

3.5 Final Prediction


Comparing the two values:
- For Yes: approximately 0.000090
- For No: approximately 0.000086
Since the value for Yes is greater than the value for No, the model predicts that the person will buy a computer.

4 Conclusion
For continuous attributes like Income, the Gaussian Naïve Bayes classifier is used, and for categorical
attributes, standard probability calculations are performed. Laplacian correction is applied where
needed to avoid zero probabilities. ■
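A minimal Python sketch of the Gaussian-likelihood step, reusing the class priors, categorical likelihoods, means, and standard deviations from the worked example (the helper names are chosen for illustration):

\begin{verbatim}
import math

def gaussian_pdf(x, mu, sigma):
    """Gaussian probability density used for continuous features."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / math.sqrt(2 * math.pi * sigma ** 2)

# Class-conditional parameters for Income, as derived in the worked example.
income_params = {"Yes": (200.0, 100.0), "No": (333.33, 47.14)}

# Priors and categorical likelihoods from the worked example.
prior = {"Yes": 0.5, "No": 0.5}
p_age_senior = {"Yes": 0.5, "No": 0.167}        # Laplace-corrected
p_student_yes = {"Yes": 0.67, "No": 0.33}

for cls in ("Yes", "No"):
    mu, sigma = income_params[cls]
    score = prior[cls] * p_age_senior[cls] * gaussian_pdf(400, mu, sigma) * p_student_yes[cls]
    print(cls, f"{score:.6f}")   # Yes ~ 0.000090, No ~ 0.000086
\end{verbatim}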


Naïve Bayes Classification with Continuous Values (Practice Example)
1 Problem Description
You are given a dataset that contains the following attributes for employees of a company:
• Age (continuous)

• Salary (continuous)
• Education Level (High School, Bachelor’s, Master’s)
• Buys Car (Yes/No)
The dataset is as follows:

Table 1: Dataset for Car Purchase Prediction


ID Age Salary Education Level Buys Car (Class)
1 25 30000 Bachelor’s No
2 35 50000 Master’s Yes
3 45 70000 Bachelor’s Yes
4 28 32000 High School No
5 55 60000 Master’s Yes
6 33 40000 High School No
7 38 55000 Bachelor’s Yes
8 30 35000 Bachelor’s No
9 50 65000 Master’s Yes
10 40 52000 Bachelor’s Yes

2 Questions
1. Calculate the Prior Probabilities for the class "Buys Car" (Yes/No).

2. For the continuous attributes Age and Salary, assume they follow a Gaussian distribution.
For each class:
• Calculate the mean and standard deviation for Age and Salary.
3. Classify the following employee using the Gaussian Naive Bayesian algorithm:

• Age: 37
• Salary: 48000
• Education Level: Bachelor’s
4. Show your calculations for the probability density function of the Gaussian distribution for
continuous values of Age and Salary.
5. Compare the final result with the original dataset and discuss how continuous features are handled
differently from categorical features in Naive Bayes.

3 Expected Learning Outcomes


Students will gain experience in handling continuous attributes using the Gaussian Naïve Bayes approach
and understand how to classify data with continuous features.


Hidden Markov Model Evaluation Problem using the Forward Algorithm
In this document, we will solve the evaluation problem using the forward algorithm for a given
Hidden Markov Model (HMM). The problem we are solving is to find the probability of the observation
sequence: {Walk, Shop, Clean}

1 Historical Data
The table below shows the historical data of 10 days, with hidden weather states (Sunny or Rainy) and
observable activities (Walk, Shop, Clean).

Table 1: Historical Data of Weather (Hidden States) and Activities (Observations)


Day Weather (Hidden State) Activity (Observation)
1 Sunny Walk
2 Sunny Shop
3 Rainy Clean
4 Sunny Walk
5 Rainy Shop
6 Rainy Clean
7 Sunny Clean
8 Rainy Walk
9 Sunny Shop
10 Rainy Clean

We are given a Hidden Markov Model (HMM) for weather prediction with the following parameters:

• Hidden States: S = {Rainy, Sunny}


• Observations: O = {Walk, Shop, Clean}
• Transition Probabilities:
The transition matrix A for a Hidden Markov Model is defined as:
$$A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1N} \\ a_{21} & a_{22} & \cdots & a_{2N} \\ \vdots & \vdots & \ddots & \vdots \\ a_{N1} & a_{N2} & \cdots & a_{NN} \end{pmatrix}$$

Where:
– aij = P (St = j | St−1 = i), represents the probability of transitioning from state i to state j,
– St is the hidden state at time step t,
– N is the number of hidden states.
The sum of each row in the transition matrix must equal 1:

$$\sum_{j=1}^{N} a_{ij} = 1, \quad \forall i$$

Thus, the transition matrix A shows the probability of moving from one hidden weather state
(Sunny or Rainy) to another.

$$A = \begin{pmatrix} P(\text{Sunny} \mid \text{Sunny}) = \frac{1}{5} & P(\text{Rainy} \mid \text{Sunny}) = \frac{4}{5} \\ P(\text{Sunny} \mid \text{Rainy}) = \frac{3}{4} & P(\text{Rainy} \mid \text{Rainy}) = \frac{1}{4} \end{pmatrix}$$
This results in the following transition matrix:
$$A = \begin{pmatrix} 0.20 & 0.80 \\ 0.75 & 0.25 \end{pmatrix}$$

• Emission Probabilities:
The emission matrix B for a Hidden Markov Model is defined as:
$$B = \begin{pmatrix} b_{11} & b_{12} & \cdots & b_{1M} \\ b_{21} & b_{22} & \cdots & b_{2M} \\ \vdots & \vdots & \ddots & \vdots \\ b_{N1} & b_{N2} & \cdots & b_{NM} \end{pmatrix}$$

Where:
– bij = P (oj | Si ), represents the probability of observing oj (the j-th observation) given the
hidden state Si (the i-th hidden state),
– Si is the hidden state at time t,
– oj is an observation,
– N is the number of hidden states,
– M is the number of possible observations.
The sum of each row in the emission matrix must equal 1:

$$\sum_{j=1}^{M} b_{ij} = 1, \quad \forall i$$

Thus, the emission matrix B represents the probabilities of observing activities (Walk, Shop, Clean)
given a weather state (Sunny or Rainy).

$$B = \begin{pmatrix} P(\text{Walk} \mid \text{Sunny}) = \frac{2}{5} & P(\text{Shop} \mid \text{Sunny}) = \frac{2}{5} & P(\text{Clean} \mid \text{Sunny}) = \frac{1}{5} \\ P(\text{Walk} \mid \text{Rainy}) = \frac{1}{5} & P(\text{Shop} \mid \text{Rainy}) = \frac{1}{5} & P(\text{Clean} \mid \text{Rainy}) = \frac{3}{5} \end{pmatrix}$$
This results in the following emission matrix:
$$B = \begin{pmatrix} 0.4 & 0.4 & 0.2 \\ 0.2 & 0.2 & 0.6 \end{pmatrix}$$

• Initial Probabilities:
π = {P (Sunny) = 0.5, P (Rainy) = 0.5}

We want to compute the probability of the observation sequence O = {Walk, Shop, Clean} using the
forward algorithm.

2 Solution
2.1 Initialization
The initialization step involves calculating the forward probabilities for each hidden state at time t = 1,
which corresponds to the first observation (Walk ). The formula for initialization is:

α1 (i) = πi · bi (o1 )
Where:
• α1 (i) is the forward probability for state i at time t = 1,
• πi is the initial probability of state i,
• bi (o1 ) is the emission probability of observing o1 in state i.


Using the initial probabilities and emission probabilities for the first observation (Walk ), we calculate:

α1 (Sunny) = π(Sunny) · P (Walk|Sunny) = 0.5 × 0.4 = 0.2

α1 (Rainy) = π(Rainy) · P (Walk|Rainy) = 0.5 × 0.2 = 0.1

2.2 Recursion
For each subsequent time step t, the forward probabilities are recursively calculated using the formula:
$$\alpha_t(j) = \sum_{i=1}^{N} \alpha_{t-1}(i) \cdot a_{ij} \cdot b_j(o_t)$$
Where:
• αt (j) is the forward probability for state j at time t,
• αt−1 (i) is the forward probability for state i at time t − 1,
• aij is the transition probability from state i to state j,
• bj (ot ) is the emission probability of observing ot in state j.
At t = 2 (second observation: Shop), we calculate:

α2(Sunny) = [α1(Sunny) · P(Sunny | Sunny) + α1(Rainy) · P(Sunny | Rainy)] · P(Shop | Sunny)
= [0.2 · 0.20 + 0.1 · 0.75] · 0.4
= [0.04 + 0.075] · 0.4
= 0.115 · 0.4 = 0.046

α2(Rainy) = [α1(Sunny) · P(Rainy | Sunny) + α1(Rainy) · P(Rainy | Rainy)] · P(Shop | Rainy)
= [0.2 · 0.80 + 0.1 · 0.25] · 0.2
= [0.16 + 0.025] · 0.2
= 0.185 · 0.2 = 0.037

At t = 3 (third observation: Clean), we calculate:

α3(Sunny) = [α2(Sunny) · P(Sunny | Sunny) + α2(Rainy) · P(Sunny | Rainy)] · P(Clean | Sunny)
= [0.046 · 0.20 + 0.037 · 0.75] · 0.2
= [0.0092 + 0.02775] · 0.2
= 0.03695 · 0.2 = 0.00739

α3(Rainy) = [α2(Sunny) · P(Rainy | Sunny) + α2(Rainy) · P(Rainy | Rainy)] · P(Clean | Rainy)
= [0.046 · 0.80 + 0.037 · 0.25] · 0.6
= [0.0368 + 0.00925] · 0.6
= 0.04605 · 0.6 = 0.02763

2.3 Termination
The total probability of observing the sequence is calculated by summing the forward probabilities at
the final time step T :
$$P(O \mid \lambda) = \sum_{i=1}^{N} \alpha_T(i)$$
Where:


• P (O | λ) is the total probability of the observation sequence O,


• αT (i) is the forward probability for state i at the final time step T .
Thus, the final step involves summing the forward probabilities at time t = 3 to obtain the total
probability of the observation sequence O = {Walk, Shop, Clean}:

P(O | λ) = α3(Sunny) + α3(Rainy) = 0.00739 + 0.02763 = 0.03502

Thus, the probability of observing the sequence Walk, Shop, Clean given the HMM is approximately P(O | λ) ≈ 0.035.

3 Conclusion
In this example, we used the forward algorithm to calculate the probability of the observation sequence
Walk, Shop, Clean using the Hidden Markov Model’s transition and emission matrices. The result shows
how likely this sequence is under the given HMM parameters. ■
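A minimal Python sketch of the forward algorithm for this model (the state and observation index conventions are assumptions of the sketch):

\begin{verbatim}
# Forward algorithm for the weather HMM above.
# States: 0 = Sunny, 1 = Rainy.  Observations: 0 = Walk, 1 = Shop, 2 = Clean.
A = [[0.20, 0.80],     # transitions from Sunny
     [0.75, 0.25]]     # transitions from Rainy
B = [[0.4, 0.4, 0.2],  # emissions from Sunny
     [0.2, 0.2, 0.6]]  # emissions from Rainy
pi = [0.5, 0.5]

def forward(obs):
    """Return P(obs | model) by summing forward probabilities at the final step."""
    n_states = len(pi)
    # Initialization: alpha_1(i) = pi_i * b_i(o_1)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n_states)]
    # Recursion: alpha_t(j) = sum_i alpha_{t-1}(i) * a_ij * b_j(o_t)
    for o in obs[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in range(n_states)) * B[j][o]
                 for j in range(n_states)]
    # Termination: sum over the final forward probabilities
    return sum(alpha)

print(round(forward([0, 1, 2]), 5))  # P(Walk, Shop, Clean | model) ~ 0.03502
\end{verbatim}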


Practice Example: Hidden Markov Model (HMM) Sequence Prediction (Human Mood and Activities)
1 Problem Description
Your objective is to predict the probability of an observation sequence based on a person’s mood (hidden
states) and the activities they perform. The possible moods are Happy and Sad, and the activities
people perform are Exercise, Read, and Watch TV.
Your goal is to:
1. Calculate the transition matrix for the hidden states (Happy and Sad).
2. Calculate the emission matrix for the observed activities (Exercise, Read, Watch TV).
3. Use the forward algorithm to calculate the probability of a given observation sequence.

2 Historical Data
The following table represents 12 days of human mood conditions (hidden states) and observed activities.

Table 1: Raw Data of Moods and Activities


Day Mood (Hidden State) Activity (Observation)
1 Happy Exercise
2 Sad Watch TV
3 Happy Read
4 Happy Exercise
5 Sad Watch TV
6 Sad Read
7 Happy Exercise
8 Sad Watch TV
9 Happy Exercise
10 Happy Read
11 Sad Watch TV
12 Happy Exercise

3 Questions
3.1 Calculate the Transition Matrix
The transition matrix represents the probability of moving from one mood to another between consecutive
days (e.g., from Happy to Sad).

3.2 Calculate the Emission Matrix


The emission matrix represents the probability of observing each activity (Exercise, Read, Watch TV)
given a particular mood (Happy or Sad).

3.3 Predict the Probability of an Observation Sequence


Given the observation sequence: O = (Exercise, Read, Watch TV), use the forward algorithm to calculate
the probability that this sequence was generated by the HMM with the transition and emission matrices
you calculated.


Instructions
• Transition Matrix Calculation: Count the transitions from Happy to Happy, Happy to Sad,
Sad to Happy, and Sad to Sad from the raw data.
• Emission Matrix Calculation: Count how many times each activity (Exercise, Read, Watch
TV) occurred given the mood of the person.

• Forward Algorithm: Use the forward algorithm step-by-step to calculate the final probability of
the observation sequence O = (Exercise, Read, Watch TV).

Hints
• For the transition matrix, the probabilities should sum to 1 for each row (from a given mood).
• For the emission matrix, the probabilities should sum to 1 for each mood.
• Use the forward algorithm formulas to compute the final probability.

Professor Nihad Karim Chowdhury, Department of CSE, University of Chittagong
