Assignment3 Bayesian HMM
Filtering
Deadline: 11:59 PM, 24/11/2024
Instructions:
3. Please read the instructions given in the questions carefully. In case of any ambiguity,
post your queries on Google Classroom at least a week before the deadline. No TA will
be responsible for responding to the queries after this.
4. A part of the assignment evaluation involves automatic testing of your submitted code
on private test cases. Please make sure that you do not change the structure of
the methods provided in the boilerplate code.
5. All the TAs will strictly follow the rubric provided. No requests will be entertained
related to scoring strategy.
6. The use of generative tools (such as ChatGPT, Gemini, etc.) is strictly prohibited.
Failure to comply may result in severe consequences related to plagiarism.
2. (10 marks) Given the statements below, answer the questions. Round the computed
probability values to 3 decimal places. A statement may contain more than one
proposition.
• About 91.0% of people either read books or access academic journals regularly.
• Of the people who read books, 40% also access academic journals, and 60% only
read books.
• Given that a person reads books, the probability that they participate in book clubs
is 0.320, regardless of whether they access academic journals or not.
• About 22.7% of people access academic journals but do not read books.
• There is a 0.090 probability that a person neither reads books nor accesses academic
journals.
• Given that a person does not read books, the probability that they access academic
journals is 0.716.
• The probability that a person participates in book clubs and accesses academic
journals is 0.088.
• About 63.1% of people either participate in book clubs or access academic journals.
• There is a 40.0% chance that a person accesses academic journals, given that they
participate in book clubs.
• There is a 50.0% chance that a person accesses academic journals, whether or not
they read books.
• Given that a person does not read books, the probability that they participate in
book clubs is 0.0044, regardless of whether they access academic journals or not.
(a) (2.5 marks) Identify the random variables in the statements above and write
each statement using symbols for random variables, logical connectives where
necessary, and conditional probability notation.
(b) (2.5 marks) Verify that these propositions create a valid probability distribution.
List the set of axioms that they satisfy.
(c) (2.5 marks) Populate the full joint probability distribution table.
(d) (2.5 marks) Use the joint distribution table and check for conditional independence
between all the random variables that you have identified.
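As a rough illustration of part (c), the sketch below assembles a joint table with the chain rule P(B, J, C) = P(B) P(J | B) P(C | B, J). The variable names B (reads books), J (accesses academic journals), and C (participates in book clubs) are assumptions of this sketch, not notation fixed by the assignment. It uses only a subset of the statements; the remaining statements can serve as cross-checks.

```python
from itertools import product

# Assumed names: B = reads books, J = accesses journals, C = book clubs.
p_notb_j = 0.227         # P(not B and J), from the statements
p_notb_notj = 0.090      # P(not B and not J), from the statements
p_b = 1.0 - p_notb_j - p_notb_notj   # P(B), since the three cases sum to 1
p_j_given_b = 0.40       # P(J | B)
p_c_given_b = 0.320      # P(C | B), stated to hold regardless of J
p_c_given_notb = 0.0044  # P(C | not B), stated to hold regardless of J


def p_bj(b, j):
    """Joint probability of the (B, J) pair."""
    if b:
        return p_b * (p_j_given_b if j else 1.0 - p_j_given_b)
    return p_notb_j if j else p_notb_notj


# Chain rule: P(B, J, C) = P(B, J) * P(C | B, J), with P(C | B, J) = P(C | B).
joint = {}
for b, j, c in product([True, False], repeat=3):
    p_c = p_c_given_b if b else p_c_given_notb
    joint[(b, j, c)] = p_bj(b, j) * (p_c if c else 1.0 - p_c)

total = sum(joint.values())
p_c_and_j = sum(p for (b, j, c), p in joint.items() if c and j)

for key, p in joint.items():
    print(key, round(p, 3))
```

Marginals obtained by summing matching rows, such as P(C and J), can then be compared with the stated values after rounding to 3 decimal places.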
3. (10 marks) In the context of adversarial machine learning, consider a machine learning
model used for image classification. Two different types of adversarial attacks can cause
the model to misclassify an input: adversarial perturbations (small, imperceptible
modifications to input data) and backdoor attacks (where the model is trained to
misclassify inputs that contain a specific trigger). Both types of attacks can trigger a
misclassification, leading to a "misclassification alarm" being raised.
Now, suppose you observe a misclassification alarm after querying the model with an
input. Initially, adversarial perturbations and backdoor attacks are considered independent
events. However, you come across a report that backdoor triggers have been increasingly
present in recent datasets. How does this new information about the prevalence of
backdoor attacks change your belief regarding the likelihood of adversarial perturbations
causing the misclassification?
(a) (5 marks) Formulate this problem using Bayesian inference.
(b) (2.5 marks) Define the probabilities involved (prior, likelihood, and posterior).
(c) (2.5 marks) Explain how conditioning on the detection of a backdoor attack (from
recent reports) changes your belief about the role of adversarial perturbations in
causing the misclassification.
Hint: Consider how the observation of the common effect (the misclassification alarm)
influences your belief about the independent causes (adversarial perturbations and
backdoor attacks).
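The effect described in the hint can be checked numerically with a small sketch. All numbers below (the priors and the alarm likelihood table) are hypothetical values chosen for illustration, and the names A (adversarial perturbation), B (backdoor attack), and M (misclassification alarm) are assumptions of this sketch rather than notation given in the question.

```python
from itertools import product


def posterior_perturbation(p_a, p_b, p_alarm_given):
    """P(A = 1 | M = 1) when A and B are independent a priori."""
    numerator = 0.0
    evidence = 0.0
    for a, b in product([0, 1], repeat=2):
        prior = (p_a if a else 1.0 - p_a) * (p_b if b else 1.0 - p_b)
        joint = prior * p_alarm_given[(a, b)]   # P(A=a, B=b, M=1)
        evidence += joint
        if a:
            numerator += joint
    return numerator / evidence


# Hypothetical likelihoods P(M = 1 | A = a, B = b), keyed by (a, b).
alarm_cpt = {(0, 0): 0.01, (1, 0): 0.90, (0, 1): 0.90, (1, 1): 0.99}

prior_a = 0.10
before_report = posterior_perturbation(prior_a, 0.10, alarm_cpt)  # backdoors rare
after_report = posterior_perturbation(prior_a, 0.60, alarm_cpt)   # backdoors common

print(f"P(A | alarm), backdoors rare:   {before_report:.3f}")
print(f"P(A | alarm), backdoors common: {after_report:.3f}")
```

Under these assumed numbers, observing the alarm raises the belief in A above its prior, and raising the prior of B lowers that belief again. This is the pattern the question asks you to explain.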
Coding (60 marks)
Use the provided requires.txt file to set up your environment for both coding
questions. Your code should execute without any errors in this environment;
otherwise, you will not be marked. Follow the folder structure below for submission.
A3_RollNumber_Report.pdf
roomba_class.py
estimated_paths.csv
test_model.py
4. (30 marks) Bayesian network for fare classification. In this assignment, you will
develop a Bayesian network model for fare classification using a public transportation
dataset. The dataset contains information about different bus routes, stops, distances,
and fare categories. Your task is to build an initial Bayesian network, then apply pruning
techniques to improve the model’s efficiency, and optimize it further using structure
refinement methods. All 3 models need to be evaluated on the validation set provided.
The models will be tested on the private test set for final evaluation. Return the .pkl
files for each model along with your report.
Boilerplate code and dataset can be accessed here
Dataset Features
You are expected to use the following features for constructing the Bayesian network:
Testing
The evaluation code is provided along with the boilerplate code in test_model.py.
Test your models on the validation subset and report the accuracies in the assignment
report. You will have to write your own code to measure the runtime of each network
construction (initialization and training). For evaluation purposes, DO NOT FORGET
TO RETURN the .pkl FILES along with your code and report.
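One possible way to measure construction time and write a .pkl file is sketched below using only the Python standard library. The helper names timed and save_model, and the placeholder build_dummy_model, are hypothetical and are not part of the boilerplate. The boilerplate may already provide its own saving routine, in which case that should be used instead.

```python
import pickle
import time


def timed(fn, *args, **kwargs):
    """Run fn and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


def save_model(model, path):
    """Serialize a model object to a .pkl file."""
    with open(path, "wb") as f:
        pickle.dump(model, f)


# Placeholder standing in for network initialization and training.
def build_dummy_model(n):
    return {"edges": [(i, i + 1) for i in range(n)]}


model, seconds = timed(build_dummy_model, 1000)
print(f"construction took {seconds:.4f} s")
# save_model(model, "model_a.pkl")  # hypothetical file name
```

time.perf_counter is used here because it is a monotonic, high-resolution clock intended for measuring short durations.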
Tasks
1. Task 1: Construct the initial Bayesian Network (A) for fare classification.
(10 Marks)
(a) Build the Bayesian network using the provided features.
(b) Ensure that the structure includes dependencies between all possible feature
pairs.
(c) You should provide a visualization of the initial Bayesian network in the
assignment report.
2. Task 2: Prune the initial Bayesian Network (A) to enhance performance.
(10 Marks)
(a) Apply pruning techniques such as Edge Pruning, Node Pruning, or simplifying
Conditional Probability Tables (CPTs).
(b) Clearly explain the pruning method applied and how it improves the model’s
efficiency (time taken to fit the data) and/or prediction accuracy.
(c) Provide a visualization of the pruned Bayesian Network (B) with fewer edges
or simplified structure.
3. Task 3: Optimize the Bayesian Network (A) by adjusting parameters or
using structure refinement methods. (10 Marks)
(a) Apply optimization techniques such as structure learning (e.g., Hill Climbing)
to refine the Bayesian network structure.
(b) Compare the performance of the optimized Bayesian network with the initial
network (A) and explain how the optimization improves the model’s accuracy
and/or efficiency.
(c) Provide a visualization of the optimized network.
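The structural idea behind Tasks 1 and 2 can be sketched without any Bayesian network library. The feature names below are hypothetical placeholders, and the pruning rule is purely illustrative. In practice the choice of which edges to drop would rest on a statistical criterion such as an independence test or a scoring function, and the resulting edge list would be passed to the library used in the boilerplate.

```python
from itertools import combinations

# Hypothetical feature names; substitute the features listed in the assignment.
features = ["feature_a", "feature_b", "feature_c", "feature_d"]


def fully_connected_dag(nodes):
    """Connect every pair as (earlier -> later) so the graph stays acyclic."""
    return list(combinations(nodes, 2))


def prune_edges(edges, keep):
    """Edge pruning: retain only the edges accepted by the keep predicate."""
    return [edge for edge in edges if keep(edge)]


edges_a = fully_connected_dag(features)

# Illustrative rule only: keep edges between neighbours in the ordering.
position = {name: i for i, name in enumerate(features)}
edges_b = prune_edges(edges_a, lambda e: position[e[1]] - position[e[0]] <= 1)

print(len(edges_a), "edges in network A")
print(len(edges_b), "edges in network B")
```

Directing every edge from the earlier to the later node in a fixed ordering is one simple way to include all feature pairs while guaranteeing that no cycle is created.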
5. (30 marks) Tracking a Roomba Using the Viterbi Algorithm. Imagine you have a
Roomba robotic vacuum cleaner that autonomously cleans your home while you’re away.
The Roomba operates based on specific movement policies specified in the Roomba class
member functions.
You’ve installed sensors that provide noisy observations of the Roomba’s location at
discrete time intervals. Due to sensor limitations, these observations are not always
accurate. Your goal is to model the Roomba’s movement using a Hidden Markov Model
(HMM) and implement the Viterbi algorithm to track its most likely path based on the
noisy sensor observations. Boilerplate code can be accessed here
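For orientation, a minimal log-space Viterbi sketch for a generic discrete HMM is given below. It is not the boilerplate's interface: the function signature, the two-cell toy world, and all probabilities are assumptions made for illustration, and the actual transition and emission models must come from the Roomba class and the sensor description.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most likely hidden state sequence for a discrete HMM."""

    def log(p):
        return math.log(p) if p > 0 else float("-inf")

    # best[t][s]: log-probability of the best path that ends in state s at time t
    best = [{s: log(start_p[s]) + log(emit_p[s][observations[0]]) for s in states}]
    back = [{}]
    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            prev, score = max(
                ((r, best[t - 1][r] + log(trans_p[r][s])) for r in states),
                key=lambda item: item[1],
            )
            best[t][s] = score + log(emit_p[s][observations[t]])
            back[t][s] = prev

    # Follow the back-pointers from the best final state.
    path = [max(states, key=lambda s: best[-1][s])]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path


# Toy model (hypothetical): two cells, a sticky robot, a sensor that is right 90% of the time.
states = ["left", "right"]
start_p = {"left": 0.5, "right": 0.5}
trans_p = {"left": {"left": 0.8, "right": 0.2}, "right": {"left": 0.2, "right": 0.8}}
emit_p = {"left": {"left": 0.9, "right": 0.1}, "right": {"left": 0.1, "right": 0.9}}

path = viterbi(["left", "left", "right", "right"], states, start_p, trans_p, emit_p)
print(path)
```

Working with log-probabilities avoids numerical underflow on long observation sequences, which matters when the path spans many time steps.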