Module 4
Bayesian learning is a probabilistic approach to machine learning that uses Bayes' Theorem to
update the probability of a hypothesis as new evidence or data becomes available. It treats
learning as a process of probabilistic inference: the goal is to estimate the posterior
distribution over hypotheses given the observed data, combining prior beliefs with the likelihood
of the data under those hypotheses, in order to make informed predictions and decisions.
Bayes' Theorem: Central to Bayesian learning, it mathematically relates the prior probability
of a hypothesis P(h), the likelihood of the observed data given the hypothesis P(D|h),
and the posterior probability P(h|D), the updated belief about the hypothesis after
seeing the data:

P(h|D) = P(D|h) P(h) / P(D)
Prior Probability P(h): Represents initial beliefs about hypotheses before seeing the data.
Posterior Probability P(h|D): Updated probability of the hypothesis after considering the data.
Likelihood P(D|h): How likely the observed data is under each hypothesis.
Example: Drug test
If a person uses the drug, the test correctly detects it 98% of the time (true positive
rate).
If a person does not use the drug, the test correctly shows negative 98% of the time
(true negative rate).
A person tests positive. What is the probability that this person actually uses the drug?
Solution
Let A = "the person uses the drug" and B = "the test is positive". From the problem:
P(B|A) = 0.98 (true positive rate)
P(B|¬A) = 0.02 (false positive rate: the test is positive even though the person does not use the drug)
By Bayes' theorem:
P(A|B) = P(B|A) P(A) / [P(B|A) P(A) + P(B|¬A) P(¬A)]
Evaluating this requires the prevalence P(A), the fraction of people who actually use the drug;
when P(A) is small, the posterior is far lower than the test's 98% accuracy might suggest.
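The statement above does not give the prevalence P(A), so a numeric answer requires assuming one. A minimal sketch in Python, with a hypothetical prevalence of 0.5%:

```python
# Posterior probability of drug use given a positive test, via Bayes' theorem.
# The prevalence p_drug is an assumed value (0.5%); the problem statement
# above does not specify it.

p_pos_given_drug = 0.98    # true positive rate, P(B|A)
p_pos_given_clean = 0.02   # false positive rate, P(B|not A)
p_drug = 0.005             # assumed prevalence, P(A)

# Total probability of a positive test: P(B) = P(B|A)P(A) + P(B|not A)P(not A)
p_pos = p_pos_given_drug * p_drug + p_pos_given_clean * (1 - p_drug)

# Bayes' theorem: P(A|B) = P(B|A)P(A) / P(B)
p_drug_given_pos = p_pos_given_drug * p_drug / p_pos
print(f"P(user | positive) = {p_drug_given_pos:.3f}")  # ~0.198 with these numbers
```

With these assumed numbers the posterior is only about 20%, far below 98%, because true users are rare and most positives come from the much larger non-user population.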
Bayesian methods matter in machine learning for two main reasons:
1. To calculate explicit probabilities for hypotheses, as practical learning algorithms such as
the naive Bayes classifier do.
2. To provide a useful perspective for understanding learning algorithms that do not explicitly
manipulate probabilities; for example, a Bayesian analysis shows that minimizing the mean squared
error in a neural network corresponds to finding the maximum likelihood hypothesis under
Gaussian-noise assumptions.
Key features
Probabilistic predictions: It can handle hypotheses that make probabilistic, rather than
deterministic, predictions about data
Limitations
They typically require initial knowledge of many probabilities (priors and likelihoods), which
must be estimated when not known in advance.
Finding the optimal hypothesis can carry a significant computational cost (in the general case,
linear in the number of candidate hypotheses).
The Brute Force MAP (Maximum A Posteriori) Learning algorithm is a straightforward Bayesian
concept learning method that finds the most probable hypothesis given training data by exhaustively
evaluating all hypotheses in a finite hypothesis space.
Key assumptions (where h is a hypothesis in the finite space H and D is the training data):
The training data D is noise-free.
The target concept is contained in the hypothesis space H.
There is no a priori reason to prefer one hypothesis over another, so the prior is uniform:
P(h) = 1/|H|.
The algorithm computes the posterior P(h|D) = P(D|h) P(h) / P(D) for every h in H and outputs
the maximally probable hypothesis hMAP = argmax over h of P(h|D).
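A minimal sketch of the brute-force procedure over a toy hypothesis space; the candidate hypotheses, coin-flip data, and uniform prior here are invented for illustration:

```python
# Brute-force MAP: score every hypothesis in a finite space and keep the best.
# Toy setup: hypotheses are candidate coin biases; data is a sequence of flips.

hypotheses = [0.1, 0.3, 0.5, 0.7, 0.9]               # candidate P(heads)
prior = {h: 1 / len(hypotheses) for h in hypotheses}  # uniform prior P(h)
data = [1, 1, 0, 1, 1, 1]                             # observed flips (1 = heads)

def likelihood(h, data):
    """P(D|h): probability of the flip sequence under bias h."""
    p = 1.0
    for flip in data:
        p *= h if flip == 1 else (1 - h)
    return p

# The posterior is proportional to P(D|h) * P(h); the normalizer P(D) is the
# same for every h, so it can be ignored when taking the argmax.
scores = {h: likelihood(h, data) * prior[h] for h in hypotheses}
h_map = max(scores, key=scores.get)
print(f"MAP hypothesis: P(heads) = {h_map}")
```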
NAÏVE-BAYES CLASSIFIER
The Naive Bayes classifier is a simple and popular supervised machine learning algorithm used for
classification tasks, such as text classification or spam detection. It is based on Bayes’ Theorem and
assumes that all features (predictors) are conditionally independent given the class label; this
is called the naive independence assumption.
For a class c and features x1, x2, ..., xn, the classifier applies Bayes' theorem together with
the independence assumption:

P(c|x1, x2, ..., xn) ∝ P(c) · P(x1|c) · P(x2|c) · ... · P(xn|c)

The denominator P(x1, x2, ..., xn) is constant for all classes and is therefore omitted in
classification; the predicted class is the c that maximizes the product above.
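A minimal sketch of such a classifier using scikit-learn's CountVectorizer and MultinomialNB; the training messages and labels are made up for illustration:

```python
# Naive Bayes text classification: bag-of-words counts + MultinomialNB.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Tiny invented training set: 1 = spam, 0 = not spam.
messages = [
    "win a free prize now",
    "limited offer, claim your free gift",
    "meeting rescheduled to friday",
    "please review the attached report",
]
labels = [1, 1, 0, 0]

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(messages)  # word-count features x1..xn

model = MultinomialNB()                 # applies the naive independence assumption
model.fit(X, labels)

test = vectorizer.transform(["claim your free prize"])
print(model.predict(test))              # prints [1] (spam) for this toy data
```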
A Bayesian Belief Network (BBN) is like a smart map that shows how different things (variables) are
connected and influence each other, using probabilities. Imagine you want to understand how
weather, traffic, and being late to work are related. A BBN draws arrows from one factor to another
to show which causes which, and uses numbers (probabilities) to express how likely things are given
other things.
In simple terms:
It’s a diagram made of nodes and arrows; each node is a variable (like "Rain" or "Traffic
Jam").
The arrows show cause-and-effect relationships (e.g., rain can cause traffic jams).
Each node has a table that tells you the chance of that variable happening given its causes.
When you get new information (like it’s raining), the network updates the chances of related
events (like traffic jams or being late).
This helps in making decisions or predictions when things are uncertain by combining what you
know with new evidence.
Mathematical definition
The joint probability distribution over all variables X1, X2, ..., Xn in a Bayesian Belief
Network (BBN) can be expressed as the product of the conditional probabilities of each variable
given its parents in the network:

P(X1, X2, ..., Xn) = ∏ (i = 1 to n) P(Xi | Parents(Xi))
Why this factorization helps:
Instead of calculating the probability of every possible combination of all variables together
(which grows exponentially and becomes infeasible for many variables),
The Bayesian network breaks down the joint probability into smaller, manageable pieces.
Each variable depends only on its parent variables (the nodes with arrows pointing to it).
By multiplying these conditional probabilities, you get the overall joint probability, as shown
in the sketch below.
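For the rain/traffic/late example above, the factorization reads P(Rain, Traffic, Late) = P(Rain) · P(Traffic|Rain) · P(Late|Traffic). A minimal sketch, with all CPT numbers assumed for illustration:

```python
# Joint probability in a toy Bayesian network: Rain -> Traffic -> Late.
# Each table below is a CPT; the probabilities are assumed values.

p_rain = {True: 0.3, False: 0.7}                         # P(Rain)
p_traffic_given_rain = {True: {True: 0.8, False: 0.2},
                        False: {True: 0.2, False: 0.8}}  # P(Traffic | Rain)
p_late_given_traffic = {True: {True: 0.6, False: 0.4},
                        False: {True: 0.1, False: 0.9}}  # P(Late | Traffic)

def joint(rain, traffic, late):
    """P(Rain, Traffic, Late) = P(Rain) * P(Traffic|Rain) * P(Late|Traffic)."""
    return (p_rain[rain]
            * p_traffic_given_rain[rain][traffic]
            * p_late_given_traffic[traffic][late])

# Probability that it rains, there is a traffic jam, and you are late:
print(joint(rain=True, traffic=True, late=True))  # 0.3 * 0.8 * 0.6 = 0.144
```

Note that only three small tables are needed here, instead of one table over all 2^3 joint outcomes; this saving grows dramatically as the number of variables increases.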
Gradient ascent training of Bayesian networks is an optimization method that updates the
conditional probabilities in the network by following the gradient of the log-likelihood of observed
data, improving the network’s fit to data iteratively.
Each node has a CPT, where each entry wijk represents the probability that variable Yi takes
its j-th value yij given that its parents Ui take the k-th combination of values uik.
The goal is to find the set of wijk values that maximizes the probability of the observed
training data D, i.e., maximizes P(D|h), where h represents the hypothesis defined by the
CPTs.
Derivation
Taking the derivative of the log-likelihood ln P(D|h) with respect to each CPT entry gives

∂ ln P(D|h) / ∂ wijk = Σ (d in D) P(Yi = yij, Ui = uik | d) / wijk

so each entry is updated by gradient ascent as

wijk ← wijk + η Σ (d in D) P(Yi = yij, Ui = uik | d) / wijk

where η is a small learning rate. After each step the entries are renormalized so that the sum
over j of wijk equals 1 and each wijk stays in [0, 1], keeping every CPT a valid probability
table.
Example
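A minimal sketch of one gradient-ascent step for a single node Y with one binary parent U, using fully observed data so that P(yij, uik | d) reduces to an indicator of whether example d matches; the learning rate and data are assumed values:

```python
# One gradient-ascent step on the CPT entries w[k][j] = P(Y = y_j | U = u_k).
# With fully observed data, P(y_j, u_k | d) is 1 when example d matches
# (u, y) = (u_k, y_j) and 0 otherwise, so the gradient term is a count / weight.

eta = 0.05                           # assumed learning rate
w = {0: {0: 0.5, 1: 0.5},            # initial guess for P(Y | U = 0)
     1: {0: 0.5, 1: 0.5}}            # initial guess for P(Y | U = 1)

# Observed (u, y) pairs; invented for illustration.
data = [(0, 0), (0, 0), (0, 1), (1, 1), (1, 1), (1, 0), (1, 1)]

for k in w:
    for j in w[k]:
        # Gradient of the log-likelihood: sum over examples of
        # P(y_j, u_k | d) / w[k][j]; the numerator is an indicator here.
        grad = sum(1.0 for (u, y) in data if u == k and y == j) / w[k][j]
        w[k][j] += eta * grad

# Renormalize each row so it remains a valid probability distribution.
for k in w:
    total = sum(w[k].values())
    for j in w[k]:
        w[k][j] /= total

print(w)  # rows now lean toward the observed frequencies
```

In a real network with hidden variables, the term P(yij, uik | d) cannot be read off by counting and must instead be computed by probabilistic inference over the network.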