Bayesian Inference for AI
Bayesian Network
A Bayesian Network is a directed acyclic graph (DAG) that represents the probabilistic relationships
among a set of random variables. Each node in the graph corresponds to a random variable, and the
edges represent the conditional dependencies between these variables.
Formally, a Bayesian Network for a set of random variables X1, X2, ..., Xn consists of:
• a directed acyclic graph whose nodes are the variables X1, ..., Xn, and
• a conditional probability distribution P(Xi | Parents(Xi)) for each variable Xi.
The joint probability distribution then factorizes as
P(X1, X2, ..., Xn) = ∏_{i=1}^{n} P(Xi | Parents(Xi))
This factorization is based on the assumption that each variable is conditionally independent of its
non-descendants given its parents.
Example
Consider a Bayesian Network with three variables: X1 , X2 , and X3 , where:
• X1 has no parents,
• X2 has X1 as its parent,
• X3 has X1 and X2 as its parents.
The joint probability distribution for this network is:
P(X1, X2, X3) = P(X1) · P(X2 | X1) · P(X3 | X1, X2)
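As a quick illustration of this factorization, here is a minimal Python sketch; the CPT values for the three binary variables are assumed for the example and are not taken from the text:

```python
# Minimal sketch of the chain-rule factorization for the three-variable network.
# The CPT numbers below are illustrative assumptions, not values from the text.

p_x1 = {True: 0.6, False: 0.4}                      # P(X1)
p_x2_given_x1 = {True: {True: 0.7, False: 0.3},     # P(X2 | X1)
                 False: {True: 0.2, False: 0.8}}
p_x3_given_x1_x2 = {                                 # P(X3 | X1, X2)
    (True, True): {True: 0.9, False: 0.1},
    (True, False): {True: 0.5, False: 0.5},
    (False, True): {True: 0.4, False: 0.6},
    (False, False): {True: 0.1, False: 0.9},
}

def joint(x1, x2, x3):
    """P(X1=x1, X2=x2, X3=x3) = P(X1) * P(X2 | X1) * P(X3 | X1, X2)."""
    return p_x1[x1] * p_x2_given_x1[x1][x2] * p_x3_given_x1_x2[(x1, x2)][x3]

# The joint distribution sums to 1 over all eight assignments.
total = sum(joint(a, b, c) for a in (True, False)
                           for b in (True, False)
                           for c in (True, False))
print(joint(True, True, True), total)  # total should be 1.0
```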
2 Events
• Burglary (B): There is a burglary at your house.
• Earthquake (E): An earthquake occurs.
• Alarm (A): The alarm goes off.
• JohnCalls (J): John calls if he hears the alarm.
• MaryCalls (M): Mary calls if she hears the alarm.
3 Network Structure
[Figure: Bayesian network in which Burglary (B) and Earthquake (E) are the parents of Alarm (A), and Alarm (A) is the parent of JohnCalls (J) and MaryCalls (M).]
4 Problem Statement
We are tasked with finding the probability of the event where the alarm has sounded (A = Yes), a
burglary has occurred (B = Yes), an earthquake has not occurred (E = No), and both John and Mary
have called (J = Yes, M = Yes). We will calculate the joint probability of this event using a Bayesian
network model based on the dataset provided.
5 Historical Database
Below is the historical data showing occurrences of burglary, earthquake, alarm activations, and calls
from John and Mary:
Table 1: Historical dataset showing Burglary, Earthquake, Alarm, John Calls, and Mary Calls
Event ID Burglary (B) Earthquake (E) Alarm (A) John Calls (J) Mary Calls (M)
1 Yes No Yes Yes Yes
2 No Yes Yes Yes No
3 Yes Yes Yes Yes Yes
4 No No No No No
5 Yes No Yes No Yes
6 No Yes No No No
7 No No Yes Yes No
8 Yes Yes Yes Yes Yes
9 No No No No No
10 No Yes Yes Yes Yes
6 Conditional Probability Tables
From Table 1, B = No and E = No occurs in 3 events (4, 7, and 9), and the alarm sounded in exactly one of them (event 7). Thus, we calculate:
P(A = Yes | B = No, E = No) = 1/3 ≈ 0.33
Table 2: Conditional Probability Table for Alarm (A) given Burglary (B) and Earthquake (E)
Burglary (B) Earthquake (E) Alarm (A = Yes) Alarm (A = No)
Yes Yes 1.0 0.0
Yes No 1.0 0.0
No Yes 0.67 0.33
No No 0.33 0.67
• Case 1: P(J = Yes | A = Yes)
From Table 1, the alarm sounded in 7 events, and John called in 6 of them. Thus, we calculate:
P(J = Yes | A = Yes) = 6/7 ≈ 0.86
• Case 2: P (J = Yes|A = No)
From Table 1, the alarm did not sound in 3 events, and John called in none of them. Thus, we calculate:
P(J = Yes | A = No) = 0/3 = 0.0
• Case 1: P(M = Yes | A = Yes)
From Table 1, Mary called in 5 of the 7 events in which the alarm sounded. Thus, we calculate:
P(M = Yes | A = Yes) = 5/7 ≈ 0.71
• Case 2: P (M = Yes|A = No)
From Table 1, Mary called in none of the 3 events in which the alarm did not sound. Thus, we calculate:
P(M = Yes | A = No) = 0/3 = 0.0
7 Solution
We need to calculate the joint probability:
P(B = Yes, E = No, A = Yes, J = Yes, M = Yes) = P(B = Yes) · P(E = No) · P(A = Yes | B = Yes, E = No)
· P(J = Yes | A = Yes) · P(M = Yes | A = Yes)
From Table 1, the priors are P(B = Yes) = 4/10 = 0.4 and P(E = No) = 5/10 = 0.5. Substituting the values:
P(B = Yes, E = No, A = Yes, J = Yes, M = Yes) = 0.4 × 0.5 × 1.0 × 0.86 × 0.71
≈ 0.1221
The probability of the event that a burglary has occurred, no earthquake has occurred, the alarm has
sounded, and both John and Mary called is approximately 0.1221. This means there is a 12.21% chance
of this specific combination of events happening, based on the given probabilities. ■
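To make the counting above reproducible, here is a short Python sketch (an illustrative reconstruction, not code from the original text) that estimates the required probabilities directly from Table 1 and multiplies them together:

```python
# Each tuple is one row of Table 1: (Burglary, Earthquake, Alarm, JohnCalls, MaryCalls).
data = [
    ("Yes", "No",  "Yes", "Yes", "Yes"),
    ("No",  "Yes", "Yes", "Yes", "No"),
    ("Yes", "Yes", "Yes", "Yes", "Yes"),
    ("No",  "No",  "No",  "No",  "No"),
    ("Yes", "No",  "Yes", "No",  "Yes"),
    ("No",  "Yes", "No",  "No",  "No"),
    ("No",  "No",  "Yes", "Yes", "No"),
    ("Yes", "Yes", "Yes", "Yes", "Yes"),
    ("No",  "No",  "No",  "No",  "No"),
    ("No",  "Yes", "Yes", "Yes", "Yes"),
]

def prob(pred, cond=lambda r: True):
    """Relative frequency of rows satisfying `pred` among rows satisfying `cond`."""
    rows = [r for r in data if cond(r)]
    return sum(pred(r) for r in rows) / len(rows)

p_b     = prob(lambda r: r[0] == "Yes")                                   # P(B = Yes) = 0.4
p_not_e = prob(lambda r: r[1] == "No")                                    # P(E = No)  = 0.5
p_a     = prob(lambda r: r[2] == "Yes",
               lambda r: r[0] == "Yes" and r[1] == "No")                  # P(A = Yes | B = Yes, E = No) = 1.0
p_j     = prob(lambda r: r[3] == "Yes", lambda r: r[2] == "Yes")          # P(J = Yes | A = Yes) ≈ 0.857
p_m     = prob(lambda r: r[4] == "Yes", lambda r: r[2] == "Yes")          # P(M = Yes | A = Yes) ≈ 0.714

joint = p_b * p_not_e * p_a * p_j * p_m
print(round(joint, 4))  # ≈ 0.1224 with unrounded CPT values (0.1221 with the rounded values above)
```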
• Sleep Hours: How many hours the student sleeps per day (Low, Medium, High).
• Pass Exam: Whether the student passes the exam (Yes, No).
1 Historical Database
The following table shows the raw data for 12 students:
• Study Hours, Attendance, and Sleep Hours together influence the probability of Pass Exam.
[Figure: Bayesian network with Study Hours, Attendance, and Sleep Hours as parents of Pass Exam.]
4 Questions
1. Construct the Bayesian Network: Define the Conditional Probability Tables (CPTs) based on
the raw database.
2. Inference: Calculate the probability that a student will pass the exam given:
1 Dataset
We consider the following dataset where we predict whether a person will buy a computer based on their
income and student status:
We aim to predict whether a person with High Income and Student Status = Yes will buy a computer.
2.2.1 Income
For the class Yes (Buys Computer):
P(Income = High | Yes) = 0/3 = 0
For the class No (Does Not Buy Computer):
P(Income = High | No) = 2/3 ≈ 0.67
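Because P(Income = High | Yes) = 0, an unsmoothed Naive Bayes product for the class Yes collapses to zero regardless of the other features, which is what motivates the Laplacian correction discussed next. A brief Python sketch of the effect; the class priors and the Student-Status likelihoods are placeholder values, since the full table is not reproduced here:

```python
# Illustrative only: the Student-Status likelihoods and the class priors below are
# placeholder values, since the full "buys computer" table is not reproduced here.
likelihood_income_high = {"Yes": 0.0, "No": 2 / 3}   # from the calculation above
likelihood_student_yes = {"Yes": 0.5, "No": 0.5}     # placeholder values
prior = {"Yes": 0.5, "No": 0.5}                      # placeholder values

for cls in ("Yes", "No"):
    score = prior[cls] * likelihood_income_high[cls] * likelihood_student_yes[cls]
    print(cls, score)
# The "Yes" score is exactly 0 no matter what the other factors are, which is the
# zero-probability problem that Laplacian correction addresses.
```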
1 Problem Statement
Consider a feature called Weather with three possible values:
• Sunny
• Rainy
• Cloudy
We are predicting whether people will go Outdoors (Yes/No) based on the weather. The training
dataset is shown below:
2 Conclusion
The use of k in Laplacian correction is essential for maintaining the sum of probabilities as 1. Without
k, the smoothing effect is not properly distributed across all possible values of the feature, leading to
overestimated probabilities. By including k, we ensure that the probabilities are correctly normalized
and that unseen values do not disproportionately influence the classification. ■
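As a sanity check on this point, here is a small Python sketch of add-one (Laplacian) correction for a three-valued feature such as Weather; the raw counts are illustrative assumptions, not taken from the training table:

```python
# Hypothetical counts of Weather values among the "Outdoors = Yes" examples.
counts = {"Sunny": 4, "Rainy": 0, "Cloudy": 2}   # assumed counts, not from the text
k = len(counts)                                   # number of possible feature values (3)
total = sum(counts.values())

smoothed = {v: (c + 1) / (total + k) for v, c in counts.items()}
print(smoothed)                 # e.g. {'Sunny': 0.556, 'Rainy': 0.111, 'Cloudy': 0.333}
print(sum(smoothed.values()))   # sums to 1 (up to floating-point rounding), thanks to the +k in the denominator
```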
2 Questions
1. Calculate the Prior Probabilities for both classes (Pass/Fail).
2. Calculate the Conditional Probabilities for each feature given the class, using Laplacian
Correction to handle any zero probabilities.
3. Classify the following student using the Naive Bayesian algorithm:
• Attendance: Medium
• Study Hours: Medium
• Previous Grades: Pass
4. Compare the result with the original dataset and discuss how Laplacian correction helps prevent
issues with zero probabilities.
1 Problem Statement
We will predict whether a person will buy a computer based on their:
• Age (categorical)
• Income (continuous)
2 Dataset
• Income = 400
• Student Status = Yes
3.2.1 Age:
Since we have three categories for the Age attribute (Youth, Middle, and Senior), we set k = 3.
For Class = Yes:
There are 2 people with Age = Senior who bought a computer. Applying Laplacian correction:
P(Age = Senior | Yes) = (Count of Senior and Yes + 1) / (Total Yes instances + k) = (2 + 1) / (3 + 3) = 3/6 = 0.5
For Class = No:
There are zero people with Age = Senior who did not buy a computer. Applying Laplacian correction:
P(Age = Senior | No) = (Count of Senior and No + 1) / (Total No instances + k) = (0 + 1) / (3 + 3) = 1/6 ≈ 0.167
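The same arithmetic can be checked with a two-line Python helper (only a verification of the two values just computed):

```python
def laplace(count, total, k):
    """Add-one (Laplacian) corrected conditional probability."""
    return (count + 1) / (total + k)

k = 3  # Age has three categories: Youth, Middle, Senior
print(laplace(2, 3, k))  # P(Age = Senior | Yes) = 3/6 = 0.5
print(laplace(0, 3, k))  # P(Age = Senior | No)  = 1/6 ≈ 0.167
```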
P(x | μ, σ) = (1 / √(2πσ²)) · exp( −(x − μ)² / (2σ²) )
Where:
• x is the income value for the new instance (in this case, 400).
• µ is the mean income for the class.
• σ is the standard deviation of income for the class.
Let’s calculate the likelihood of Income = 400 given the class.
P(Income = 400 | No) = (1 / √(2π(47.14)²)) · exp( −(400 − 333.33)² / (2(47.14)²) )
= (1 / √(2π(47.14)²)) · exp( −(66.67)² / (2(47.14)²) )
= (1 / √13962) · exp(−1)
≈ 0.00846 × 0.3679
P(Income = 400 | No) ≈ 0.0031
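To double-check this density value, a one-function Python sketch of the Gaussian likelihood used above:

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Gaussian probability density used by Gaussian Naive Bayes for continuous features."""
    return (1.0 / math.sqrt(2 * math.pi * sigma ** 2)) * math.exp(-((x - mu) ** 2) / (2 * sigma ** 2))

print(gaussian_pdf(400, 333.33, 47.14))  # ≈ 0.0031
```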
4 Conclusion
For continuous attributes like Income, the Gaussian Naive Bayes classifier is used, and for categorical
attributes, standard probability calculations are performed. Laplacian correction is applied where
needed to avoid zero probabilities. ■
• Salary (continuous)
• Education Level (High School, Bachelor’s, Master’s)
• Buys Car (Yes/No)
The dataset is as follows:
2 Questions
1. Calculate the Prior Probabilities for the class "Buys Car" (Yes/No).
2. For the continuous attributes Age and Salary, assume they follow a Gaussian distribution.
For each class:
• Calculate the mean and standard deviation for Age and Salary.
3. Classify the following employee using the Gaussian Naive Bayesian algorithm:
• Age: 37
• Salary: 48000
• Education Level: Bachelor’s
4. Show your calculations for the probability density function of the Gaussian distribution for
continuous values of Age and Salary.
5. Compare the final result with the original dataset and discuss how continuous features are handled
differently from categorical features in Naive Bayes.
1 Historical Data
The table below shows the historical data of 10 days, with hidden weather states (Sunny or Rainy) and
observable activities (Walk, Shop, Clean).
We are given a Hidden Markov Model (HMM) for weather prediction with the following parameters:
• Transition Probabilities:
The transition matrix A for a Hidden Markov Model is defined as the N × N matrix with entries aij ,
Where:
– aij = P (St = j | St−1 = i), represents the probability of transitioning from state i to state j,
– St is the hidden state at time step t,
– N is the number of hidden states.
The sum of each row in the transition matrix must equal 1:
∑_{j=1}^{N} aij = 1, ∀i
Thus, the transition matrix A shows the probability of moving from one hidden weather state
(Sunny or Rainy) to another.
A =
[ P(Sunny|Sunny) = 1/5    P(Rainy|Sunny) = 4/5 ]
[ P(Sunny|Rainy) = 3/4    P(Rainy|Rainy) = 1/4 ]
A =
[ 0.20  0.80 ]
[ 0.75  0.25 ]
• Emission Probabilities:
The emission matrix B for a Hidden Markov Model is defined as:
B =
[ b11  b12  ···  b1M ]
[ b21  b22  ···  b2M ]
[  ⋮    ⋮    ⋱    ⋮  ]
[ bN1  bN2  ···  bNM ]
Where:
– bij = P (oj | Si ), represents the probability of observing oj (the j-th observation) given the
hidden state Si (the i-th hidden state),
– Si is the hidden state at time t,
– oj is an observation,
– N is the number of hidden states,
– M is the number of possible observations.
The sum of each row in the emission matrix must equal 1:
∑_{j=1}^{M} bij = 1, ∀i
Thus, the emission matrix B represents the probabilities of observing activities (Walk, Shop, Clean)
given a weather state (Sunny or Rainy).
B =
[ P(Walk|Sunny) = 2/5    P(Shop|Sunny) = 2/5    P(Clean|Sunny) = 1/5 ]
[ P(Walk|Rainy) = 1/5    P(Shop|Rainy) = 1/5    P(Clean|Rainy) = 3/5 ]
B =
[ 0.4  0.4  0.2 ]
[ 0.2  0.2  0.6 ]
• Initial Probabilities:
π = {P (Sunny) = 0.5, P (Rainy) = 0.5}
We want to compute the probability of the observation sequence O = {Walk, Shop, Clean} using the
forward algorithm.
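Before stepping through the algorithm, the parameters above can be written down directly; the following NumPy transcription is provided here for illustration, with rows indexed by (Sunny, Rainy) and the columns of B by (Walk, Shop, Clean):

```python
import numpy as np

states = ["Sunny", "Rainy"]               # hidden states, index 0 and 1
observations = ["Walk", "Shop", "Clean"]  # possible observations, index 0, 1, 2

A = np.array([[0.20, 0.80],               # transition matrix: A[i, j] = P(state j | state i)
              [0.75, 0.25]])
B = np.array([[0.4, 0.4, 0.2],            # emission matrix: B[i, k] = P(observation k | state i)
              [0.2, 0.2, 0.6]])
pi = np.array([0.5, 0.5])                 # initial state distribution

# Each row of A and B is a probability distribution, so the rows sum to 1.
assert np.allclose(A.sum(axis=1), 1.0) and np.allclose(B.sum(axis=1), 1.0)
```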
2 Solution
2.1 Initialization
The initialization step involves calculating the forward probabilities for each hidden state at time t = 1,
which corresponds to the first observation (Walk ). The formula for initialization is:
α1 (i) = πi · bi (o1 )
Where:
• α1 (i) is the forward probability for state i at time t = 1,
• πi is the initial probability of state i,
• bi (o1 ) is the emission probability of observing o1 in state i.
Using the initial probabilities and emission probabilities for the first observation (Walk ), we calculate:
α1(Sunny) = π(Sunny) · P(Walk | Sunny) = 0.5 · 0.4 = 0.2
α1(Rainy) = π(Rainy) · P(Walk | Rainy) = 0.5 · 0.2 = 0.1
2.2 Recursion
For each subsequent time step t, the forward probabilities are recursively calculated using the formula:
αt(j) = [ ∑_{i=1}^{N} αt−1(i) · aij ] · bj(ot)
Where:
• αt (j) is the forward probability for state j at time t,
• αt−1 (i) is the forward probability for state i at time t − 1,
• aij is the transition probability from state i to state j,
• bj (ot ) is the emission probability of observing ot in state j.
At t = 2 (second observation: Shop), we calculate:
α2 (Sunny) = [α1 (Sunny) · P (Sunny | Sunny) + α1 (Rainy) · P (Sunny | Rainy)] · P (Shop | Sunny)
α2(Sunny) = [0.2 · 0.20 + 0.1 · 0.75] · 0.4
= [0.04 + 0.075] · 0.4
= 0.115 · 0.4 = 0.046
α2 (Rainy) = [α1 (Sunny) · P (Rainy | Sunny) + α1 (Rainy) · P (Rainy | Rainy)] · P (Shop | Rainy)
α2(Rainy) = [0.2 · 0.80 + 0.1 · 0.25] · 0.2
= [0.16 + 0.025] · 0.2
= 0.185 · 0.2 = 0.037
At t = 3 (third observation: Clean), we calculate:
α3 (Sunny) = [α2 (Sunny) · P (Sunny | Sunny) + α2 (Rainy) · P (Sunny | Rainy)] · P (Clean | Sunny)
α3(Sunny) = [0.046 · 0.20 + 0.037 · 0.75] · 0.2
= [0.0092 + 0.02775] · 0.2
= 0.03695 · 0.2 ≈ 0.00739
α3 (Rainy) = [α2 (Sunny) · P (Rainy | Sunny) + α2 (Rainy) · P (Rainy | Rainy)] · P (Clean | Rainy)
α3(Rainy) = [0.046 · 0.80 + 0.037 · 0.25] · 0.6
= [0.0368 + 0.00925] · 0.6
= 0.04605 · 0.6 ≈ 0.02763
2.3 Termination
The total probability of observing the sequence is calculated by summing the forward probabilities at
the final time step T :
P(O | λ) = ∑_{i=1}^{N} αT(i)
Where αT(i) is the forward probability of state i at the final time step (here T = 3). Summing the values from the recursion step:
P(O | λ) = α3(Sunny) + α3(Rainy) = 0.00739 + 0.02763 ≈ 0.0350
Thus, the probability of observing the sequence Walk, Shop, Clean given the HMM is approximately P(O | λ) = 0.0350.
3 Conclusion
In this example, we used the forward algorithm to calculate the probability of the observation sequence
Walk, Shop, Clean using the Hidden Markov Model’s transition and emission matrices. The result shows
how likely this sequence is under the given HMM parameters. ■
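For completeness, here is a short, self-contained Python implementation of the forward algorithm with the parameters from this example (written for this walkthrough, not taken from the original text); it reproduces the probability computed above:

```python
import numpy as np

A = np.array([[0.20, 0.80],      # P(next state | Sunny), rows indexed by the current state
              [0.75, 0.25]])     # P(next state | Rainy)
B = np.array([[0.4, 0.4, 0.2],   # P(Walk / Shop / Clean | Sunny)
              [0.2, 0.2, 0.6]])  # P(Walk / Shop / Clean | Rainy)
pi = np.array([0.5, 0.5])        # initial distribution over (Sunny, Rainy)

def forward(obs_sequence):
    """Return P(O | lambda) for a sequence of observation indices (0=Walk, 1=Shop, 2=Clean)."""
    alpha = pi * B[:, obs_sequence[0]]       # initialization: alpha_1(i) = pi_i * b_i(o_1)
    for obs in obs_sequence[1:]:
        alpha = (alpha @ A) * B[:, obs]      # recursion: alpha_t(j) = [sum_i alpha_{t-1}(i) * a_ij] * b_j(o_t)
    return alpha.sum()                       # termination: sum the final forward probabilities

print(round(forward([0, 1, 2]), 5))  # Walk, Shop, Clean -> approximately 0.03502
```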
2 Historical Data
The following table represents 12 days of human mood conditions (hidden states) and observed activities.
3 Questions
3.1 Calculate the Transition Matrix
The transition matrix represents the probability of moving from one mood to another between consecutive
days (e.g., from Happy to Sad).
Instructions
• Transition Matrix Calculation: Count the transitions from Happy to Happy, Happy to Sad,
Sad to Happy, and Sad to Sad from the raw data.
• Emission Matrix Calculation: Count how many times each activity (Exercise, Read, Watch
TV) occurred given the mood of the person.
• Forward Algorithm: Use the forward algorithm step-by-step to calculate the final probability of
the observation sequence O = (Exercise, Read, Watch TV).
Hints
• For the transition matrix, the probabilities should sum to 1 for each row (from a given mood).
• For the emission matrix, the probabilities should sum to 1 for each mood.
• Use the forward algorithm formulas to compute the final probability.
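As a starting point for the transition-matrix question, the counting step can be sketched as follows; the mood sequence in the snippet is a made-up placeholder, not the 12-day table from this exercise:

```python
from collections import Counter

# Placeholder mood sequence; replace it with the 12 hidden states from the table.
moods = ["Happy", "Happy", "Sad", "Sad", "Happy", "Sad"]

transitions = Counter(zip(moods, moods[1:]))   # count consecutive-day (previous, next) pairs
for prev in ("Happy", "Sad"):
    total = sum(c for (a, _), c in transitions.items() if a == prev)
    row = {nxt: transitions[(prev, nxt)] / total for nxt in ("Happy", "Sad")}
    print(prev, row)   # each row of the estimated transition matrix sums to 1
```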