ML-9: Graphical Model
Prepared and Edited by: Mayank Yadav | Designed by: Kussh Prajapati
www.collegpt.com | [email protected]
Graphical Model
⭐What are graphical models? : Graphical models are a powerful way to represent probability
distributions over a set of variables. They use a graph structure to depict the relationships between
these variables, making them particularly useful for tasks like classification and reasoning under
uncertainty.
Key Concepts:
● Nodes: Represent random variables in the domain. These variables can be discrete (like weather:
sunny, rainy) or continuous (like temperature).
● Edges: Depict conditional dependencies between variables. An edge connecting two nodes
indicates that the probability of one variable is influenced by the state of the other.
● Markov Property: A core principle in graphical models. It states that a variable is conditionally
independent of its non-descendants in the graph given its parents. In simpler terms, once the
values of a variable's parents are known, its non-descendants carry no additional information
about its probability (see the sketch after this list).
● Directed Acyclic Graphs (DAGs): The most common type, where edges have a direction. The
direction implies a causal relationship between the connected variables (e.g., a parent variable
influences a child variable). A classic example is a Bayesian Network, where each node
represents a variable, and directed edges depict causal relationships.
● Undirected Graphical Models (UGMs): Edges in these models are undirected, indicating a
mutual dependence between the connected variables. They capture statistical relationships
without implying causality. Markov Random Fields (MRFs) are a popular example, often used in
image processing and computer vision.
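To make the Markov property concrete, here is a minimal Python sketch. The three-node chain Cloudy → Rain → WetGrass and all of its numbers are illustrative assumptions, not from a real dataset. The joint distribution factorizes over the DAG as P(C) · P(R|C) · P(W|R), and the check at the end confirms numerically that, given its parent Rain, WetGrass is independent of its non-descendant Cloudy:

# Chain-structured model: Cloudy -> Rain -> WetGrass (illustrative numbers).
p_cloudy = {True: 0.5, False: 0.5}
p_rain = {True: {True: 0.8, False: 0.2},      # P(Rain | Cloudy)
          False: {True: 0.1, False: 0.9}}
p_wet = {True: {True: 0.9, False: 0.1},       # P(WetGrass | Rain)
         False: {True: 0.05, False: 0.95}}

def joint(c, r, w):
    # Markov property: the joint factorizes over the DAG.
    return p_cloudy[c] * p_rain[c][r] * p_wet[r][w]

def p_wet_given(r, c):
    # P(WetGrass = True | Rain = r, Cloudy = c), by conditioning the joint.
    return joint(c, r, True) / (joint(c, r, True) + joint(c, r, False))

# Given its parent Rain, WetGrass ignores Cloudy: both values print as 0.9.
print(round(p_wet_given(True, True), 3), round(p_wet_given(True, False), 3))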
Advantages of Graphical Models:
● Visual Representation: The graph structure provides a clear and intuitive understanding of the
relationships between variables, making them easier to analyze and interpret.
● Efficient Inference: Algorithms can exploit the conditional independence properties encoded in
the graph to perform efficient probabilistic inference tasks like calculating marginal probabilities
or finding the most probable configuration of variables.
● Scalability: Graphical models can handle complex problems with many variables by leveraging
the sparse structure of the graph (not all variables are connected).
Applications of Graphical Models:
● Classification: Naive Bayes (explained in a later section) is a prominent example, used for spam
filtering, sentiment analysis, and image recognition.
● Recommendation Systems: Can model user preferences and item relationships to recommend
relevant items.
● Natural Language Processing: Used for tasks like part-of-speech tagging and sentiment analysis.
⭐Bayesian Networks : A Bayesian network is a probabilistic graphical model which represents a set
of variables and their conditional dependencies using a directed acyclic graph.
● It is also called a Bayes network, belief network, decision network, or Bayesian model.
● Bayesian networks are probabilistic, because these networks are built from a probability
distribution, and also use probability theory for prediction and anomaly detection.
● Real world applications are probabilistic in nature, and to represent the relationship between
multiple events, we need a Bayesian network. It can also be used in various tasks including
prediction, anomaly detection, diagnostics, automated insight, reasoning, time series prediction,
and decision making under uncertainty.
● A Bayesian Network can be used for building models from data and experts' opinions, and it
consists of two parts: a Directed Acyclic Graph (the structure) and a table of conditional
probabilities for each node given its parents (the parameters). A minimal sketch of both parts
follows below.
The generalized form of Bayesian network that represents and solves decision problems under uncertain
knowledge is known as an Influence diagram.
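To see the two parts in action, here is a hedged Python sketch; the toy network Cloudy → Rain → WetGrass, its probability tables, and the query are all illustrative assumptions. The DAG plus its conditional probability tables are enough to answer a query by simple enumeration:

from itertools import product

# Part 1 - the DAG: Cloudy -> Rain -> WetGrass (encoded in joint() below).
# Part 2 - one conditional probability table per node (illustrative numbers).
P_c = {True: 0.5, False: 0.5}                     # P(Cloudy)
P_r = {True: {True: 0.8, False: 0.2},             # P(Rain | Cloudy)
       False: {True: 0.1, False: 0.9}}
P_w = {True: {True: 0.9, False: 0.1},             # P(WetGrass | Rain)
       False: {True: 0.05, False: 0.95}}

def joint(c, r, w):
    return P_c[c] * P_r[c][r] * P_w[r][w]

# Query by enumeration: P(Rain = True | WetGrass = True).
num = sum(joint(c, True, True) for c in (True, False))
den = sum(joint(c, r, True) for c, r in product((True, False), repeat=2))
print(round(num / den, 2))   # ≈ 0.94 with these numbers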
Note: You can also include some theory from the Graphical Models section above.
⭐Bayes Theorem : Bayes' theorem is one of the most popular machine learning concepts; it helps to
calculate the probability of one event occurring, under uncertain knowledge, given that another event
has already occurred.
Bayes' theorem can be derived using the product rule and the conditional probability of event X given
event Y:
● According to the product rule, the probability of events X and Y occurring together can be
expressed as:
P(X ∩ Y) = P(X|Y) P(Y) {equation 1}
● Conditioning on X instead gives the same joint probability the other way around:
P(X ∩ Y) = P(Y|X) P(X) {equation 2}
● Equating the right-hand sides of equations 1 and 2 and dividing both sides by P(Y) gives:
P(X|Y) = P(Y|X) P(X) / P(Y)
=> The above equation is called the Bayes Rule or Bayes Theorem. Note that X and Y here are
generally dependent events; if they were independent, the theorem would reduce to P(X|Y) = P(X).
● P(X|Y) is called the posterior probability: the probability of event X occurring given that event Y
has already happened. This is what we want to find after considering the new evidence (Y).
● P(Y|X) is called the likelihood: the probability of observing event Y given that event X is true. This
represents how likely the evidence (Y) is under the assumption of hypothesis (X).
● P(X) is called the prior probability: the initial probability of event X occurring before considering
any evidence. This can be based on domain knowledge or historical data.
● P(Y) is called the marginal probability: the total probability of observing event Y, regardless of
whether X is true or false. This can sometimes be difficult to calculate directly (a numeric example
follows below).
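Here is a small numeric sanity check of the rule above; the scenario and every number in it are made-up assumptions. Let X = "the email is spam" and Y = "the email contains the word 'offer'":

# Bayes' rule with illustrative numbers.
p_x = 0.20                # prior P(spam)
p_y_given_x = 0.60        # likelihood P('offer' | spam)
p_y_given_not_x = 0.05    # P('offer' | not spam)

# Marginal P(Y) via the law of total probability, then the posterior.
p_y = p_y_given_x * p_x + p_y_given_not_x * (1 - p_x)
p_x_given_y = p_y_given_x * p_x / p_y
print(round(p_y, 2), round(p_x_given_y, 2))   # 0.16 0.75

Seeing the word "offer" raises the probability of spam from the prior 0.20 to the posterior 0.75.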
Limitations of Bayes Theorem:
● Reliance on Prior Probabilities: The accuracy of the results depends heavily on the quality of the
prior probabilities used. Biased or inaccurate priors can lead to misleading results.
● Computational Challenges: Calculating the marginal probability P(Y) can be computationally
expensive for complex problems with many variables.
Applications of Bayes Theorem:
● Spam Filtering: Classifies emails as spam or not spam based on the presence of specific words or
patterns.
● Recommendation Systems: Recommends products or content to users based on their past
preferences and browsing behavior (considering prior user interactions).
● Anomaly Detection: Identifies data points that deviate significantly from the expected patterns,
potentially indicating fraud or system failures.
⭐Naïve Bayes Classifier : Naïve Bayes algorithm is a supervised learning algorithm, which is based
on Bayes theorem and used for solving classification problems.
● The Naïve Bayes Classifier is one of the simplest and most effective classification algorithms; it
helps in building fast machine learning models that can make quick predictions.
● It is a probabilistic classifier, which means it predicts on the basis of the probability of an object.
● Some popular applications of the Naïve Bayes algorithm are spam filtering, sentiment analysis, and
classifying articles.
Why is it called Naïve Bayes? : The name Naïve Bayes combines two words, Naïve and Bayes, which can
be described as:
● Naïve: It is called Naïve because it assumes that the occurrence of a certain feature is
independent of the occurrence of other features. For example, if a fruit is identified on the basis of
color, shape, and taste, then a red, spherical, and sweet fruit is recognized as an apple; each feature
individually contributes to identifying it as an apple, without depending on the others.
● Bayes: It is called Bayes because it depends on the principle of Bayes' theorem, described above.
Working of Naïve Bayes' Classifier: Working of Naïve Bayes' Classifier can be understood with the
help of the below example:
Suppose we have a dataset of weather conditions and a corresponding target variable "Play". Using this
dataset, we need to decide whether we should play or not on a particular day according to the weather
conditions. To solve this problem, we follow the steps below:
⭐Algorithm⭐
1. Convert the given dataset into frequency tables.
2. Generate a likelihood table by finding the probabilities of the given features.
3. Use Bayes' theorem to calculate the posterior probability for each class.
Problem: If the weather is sunny, should the player play or not?
From the frequency table (14 days in total: 10 "Yes" and 4 "No", with 5 sunny days split 3 "Yes" / 2 "No"):
👉P(Yes|Sunny) = P(Sunny|Yes) * P(Yes) / P(Sunny)
P(Sunny|Yes) = 3/10 = 0.3
P(Sunny) = 5/14 ≈ 0.35
P(Yes) = 10/14 ≈ 0.71
So P(Yes|Sunny) = 0.3 * 0.71 / 0.35 ≈ 0.60
👉P(No|Sunny) = P(Sunny|No) * P(No) / P(Sunny)
P(Sunny|No) = 2/4 = 0.5
P(No) = 4/14 ≈ 0.29
So P(No|Sunny) = 0.5 * 0.29 / 0.35 ≈ 0.41
Since P(Yes|Sunny) > P(No|Sunny), the player should play on a sunny day. The same calculation is
reproduced in code below.
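A plain-Python version of the calculation; the counts are recovered from the probabilities stated above (14 days, 10 "Yes", 4 "No", 5 sunny days split 3/2):

# Naive Bayes posterior for Weather = Sunny, computed from raw counts.
n_yes, n_no = 10, 4
n_sunny_yes, n_sunny_no = 3, 2
n_total = n_yes + n_no                                  # 14 days

p_yes, p_no = n_yes / n_total, n_no / n_total           # ≈ 0.71 and 0.29
p_sunny = (n_sunny_yes + n_sunny_no) / n_total          # 5/14 ≈ 0.36

p_yes_given_sunny = (n_sunny_yes / n_yes) * p_yes / p_sunny
p_no_given_sunny = (n_sunny_no / n_no) * p_no / p_sunny
print(round(p_yes_given_sunny, 2), round(p_no_given_sunny, 2))   # 0.6 0.4

With exact fractions the posteriors are 0.6 and 0.4; the 0.60 and 0.41 above differ only because P(Sunny) and P(No) were rounded before dividing.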
There are three types of Naive Bayes Model, which are given below:
● Gaussian: The Gaussian model assumes that features follow a normal distribution. This means if
predictors take continuous values instead of discrete, then the model assumes that these values
are sampled from the Gaussian distribution.
● Multinomial: The Multinomial Naïve Bayes classifier is used when the data is multinomially
distributed. It is primarily used for document classification problems, i.e., deciding which category
a particular document belongs to, such as Sports, Politics, or Education.
The classifier uses the frequencies of words as the predictors.
● Bernoulli: The Bernoulli classifier works similarly to the Multinomial classifier, but the predictor
variables are independent Boolean variables, such as whether a particular word is present or not in
a document. This model is also popular for document classification tasks. (A scikit-learn sketch of
these variants follows below.)
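A minimal scikit-learn sketch of the Multinomial variant, assuming scikit-learn is installed; the tiny corpus and its labels are made up for illustration. GaussianNB and BernoulliNB from the same sklearn.naive_bayes module can be swapped in the same way for continuous or Boolean features:

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Made-up toy corpus: sports (1) vs politics (0).
texts = ["the team won the match", "parliament passed the bill",
         "a great goal in the final", "the minister gave a speech"]
labels = [1, 0, 1, 0]

vec = CountVectorizer()                # word frequencies as predictors
X = vec.fit_transform(texts)
model = MultinomialNB().fit(X, labels)
print(model.predict(vec.transform(["the team scored a goal"])))   # expected: [1]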
⭐Parameter Learning : Parameter learning is the process of estimating the values of a model's
parameters from training data. Why it matters:
● Model Performance: The effectiveness of an ML model heavily relies on the chosen values for its
parameters. Finding the optimal parameters allows the model to learn the underlying patterns in
the data and make accurate predictions.
● Generalizability: The goal of parameter learning is not just to fit the training data but also to
generalize well to unseen data. This means the model should perform well on new data points it
hasn't encountered during training.
Common techniques for parameter learning:
● Optimization Algorithms: These algorithms iteratively adjust the model parameters to minimize a
specific loss function, which quantifies the discrepancy between the model's predictions and the
actual targets. Popular optimization algorithms include gradient descent, stochastic gradient
descent, and Adam (a minimal gradient-descent sketch follows these lists).
● Maximum Likelihood Estimation (MLE): This technique aims to find the parameter values that
maximize the likelihood of the training data under the model's assumptions. It is commonly used
for models with probabilistic outputs.
● Bayesian Inference: This approach incorporates prior knowledge or beliefs about the parameters
(through prior probability distributions) to guide the learning process. It allows for updating
these beliefs based on the observed data (posterior probability).
Factors that influence parameter learning:
● Model Complexity
● Data Quality and Quantity
● Choice of Optimization Algorithm
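To make the optimization bullets above concrete, here is a minimal gradient-descent sketch; it is illustrative rather than a production trainer, and the data is synthetic. It fits a line y = w·x + b by repeatedly stepping both parameters against the gradient of the mean squared error:

import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 100)
y = 2.0 * x + 0.5 + rng.normal(0, 0.1, 100)   # true w = 2.0, b = 0.5, plus noise

w, b, lr = 0.0, 0.0, 0.1                      # lr is the learning rate
for _ in range(500):
    err = (w * x + b) - y                     # prediction error per point
    w -= lr * 2 * np.mean(err * x)            # gradient of MSE w.r.t. w
    b -= lr * 2 * np.mean(err)                # gradient of MSE w.r.t. b
print(round(w, 2), round(b, 2))               # lands close to 2.0 and 0.5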
⭐What are HyperParameters? : Hyperparameters are defined as the parameters that are
explicitly defined by the user to control the learning process.
● Here the prefix "hyper" suggests that the parameters are top-level parameters that are used in
controlling the learning process.
● The value of the Hyperparameter is selected and set by the machine learning engineer before
the learning algorithm begins training the model.
● Hence, these are external to the model, and their values cannot be changed during the training
process (see the sketch below).
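A quick sketch of the distinction, assuming scikit-learn; the toy data is made up. Here n_neighbors is a hyperparameter: the engineer fixes it before fit() is called, and training never changes it:

from sklearn.neighbors import KNeighborsClassifier

X = [[0], [1], [2], [10], [11], [12]]          # made-up 1-D points
y = [0, 0, 0, 1, 1, 1]

model = KNeighborsClassifier(n_neighbors=3)    # hyperparameter: set by hand
model.fit(X, y)                                # training does not alter n_neighbors
print(model.predict([[1.5], [10.5]]))          # expected: [0 1]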
⭐Hidden Markov Model or HMM : A Hidden Markov Model (HMM) is a probabilistic model that
consists of a sequence of hidden states, each of which generates an observation.
The hidden states are usually not directly observable, and the goal of an HMM is to estimate the
sequence of hidden states based on a sequence of observations. An HMM is defined by the components
listed under "Key Components" below.
Example : Imagine a sequence of coin flips (heads or tails). While you see the results (heads/tails), you
don't directly observe the underlying state of the coin (fair/biased). HMMs model such scenarios, where
the sequence (observations) is influenced by hidden states that generate them.
Key Components:
● Hidden States: A set of unobservable states that represent the underlying process generating
the data. These states could be "fair coin" or "biased coin" in our example.
● Observations: The actual sequence of data points you observe. In our case, these would be the
coin flip outcomes (heads/tails).
● State Transition Probabilities: These probabilities define the likelihood of transitioning from one
hidden state to another. For example, the probability of transitioning from "fair coin" to "biased
coin" in consecutive flips.
● Emission Probabilities: These probabilities represent the likelihood of observing a specific data
point given a particular hidden state. For instance, the probability of observing "heads" when the
coin is in the "fair coin" state (see the sketch below).
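Here is a hedged sketch of the coin example using the forward algorithm; every probability below is an illustrative assumption. It computes the total probability of an observed flip sequence and the posterior over the hidden states after seeing it:

import numpy as np

states = ["fair", "biased"]
pi = np.array([0.5, 0.5])             # initial hidden-state probabilities
A = np.array([[0.9, 0.1],             # state transition probabilities
              [0.1, 0.9]])
B = np.array([[0.5, 0.5],             # emissions from fair coin: P(heads), P(tails)
              [0.8, 0.2]])            # emissions from biased coin

obs = [0, 0, 1]                       # observed flips: heads, heads, tails

# Forward algorithm: alpha[s] = P(observations so far, current state = s).
alpha = pi * B[:, obs[0]]
for o in obs[1:]:
    alpha = (alpha @ A) * B[:, o]

print("P(sequence):", alpha.sum())
print("P(state | sequence):", alpha / alpha.sum())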
Benefits of HMMs:
● Handling Hidden Processes: Allows modeling complex systems where the underlying
mechanisms are not directly observable.
● Efficient Sequence Analysis: Provides a framework for analyzing and making predictions about
sequential data.
● Flexibility: Can be adapted to various data types by defining appropriate hidden states and
emission probabilities.
Limitations of HMMs:
● Assumption of Simplicity: Assumes the process is memoryless (the next hidden state depends only
on the current state, the Markov assumption) and that each observation depends only on the
current hidden state. This might not hold true for all real-world scenarios.
● Parameter Estimation Challenges: Learning the model parameters, especially with a large
number of hidden states, can be computationally expensive.
Applications of HMMs:
● Speech Recognition: HMMs are used to recognize spoken words based on the sequence of
acoustic features extracted from audio signals.
● Part-of-Speech Tagging: Used to identify the grammatical function (noun, verb, etc.) of each
word in a sentence, by modeling the parts-of-speech as hidden states and the sequence of words
as observations.
● Bioinformatics: Can be used to analyze and predict protein structures, by modeling the underlying
structural elements as hidden states and the observed amino-acid sequence as emissions.
More Examples
Weather Prediction: Imagine a weather forecasting system that uses an HMM. The hidden states could
represent different weather conditions (sunny, rainy, cloudy). The observations might be daily
temperature readings, precipitation levels, and cloud cover. By analyzing a sequence of these
observations, the HMM can predict the most likely sequence of hidden states, which translates to the
weather forecast for the coming days (a minimal decoding sketch follows this paragraph).
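A hedged decoding sketch for this weather example; all probabilities are illustrative assumptions. The Viterbi algorithm recovers the single most likely sequence of hidden weather states from the observations:

import numpy as np

states = ["Sunny", "Rainy"]
pi = np.array([0.6, 0.4])             # initial state probabilities
A = np.array([[0.7, 0.3],             # transitions from Sunny
              [0.4, 0.6]])            # transitions from Rainy
B = np.array([[0.9, 0.1],             # emissions from Sunny: P(dry), P(wet)
              [0.2, 0.8]])            # emissions from Rainy

obs = [0, 0, 1]                       # observed: dry, dry, wet

# Viterbi: delta[t, s] = best probability of any path ending in state s at t.
T, S = len(obs), len(states)
delta = np.zeros((T, S))
psi = np.zeros((T, S), dtype=int)     # back-pointers for the best path
delta[0] = pi * B[:, obs[0]]
for t in range(1, T):
    for s in range(S):
        trans = delta[t - 1] * A[:, s]
        psi[t, s] = int(np.argmax(trans))
        delta[t, s] = trans.max() * B[s, obs[t]]

path = [int(np.argmax(delta[-1]))]    # trace back from the best final state
for t in range(T - 1, 0, -1):
    path.append(int(psi[t, path[-1]]))
path.reverse()
print([states[s] for s in path])      # ['Sunny', 'Sunny', 'Rainy']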
Stock Market Analysis: HMMs can be used to model stock price movements. The hidden states could
represent different market conditions (bull market, bear market, consolidation). Daily closing prices or
other financial indicators would be the observations. Analyzing historical data with an HMM can help
identify patterns in market behavior and potentially predict future trends (keeping in mind the
limitations of HMMs and the inherent uncertainty in financial markets).
ColleGPT
Visit: www.collegpt.com