
Machine Learning CT604EN

ColleGPT

9 Graphical Model

Prepared and Edited by: Mayank Yadav | Designed by: Kussh Prajapati

Get Prepared Together

www.collegpt.com [email protected]

Graphical Model

⭐What are graphical models? : Graphical models are a powerful way to represent probability
distributions over a set of variables. They use a graph structure to depict the relationships between
these variables, making them particularly useful for tasks like classification and reasoning under
uncertainty.

Key Concepts:

● Nodes: Represent random variables in the domain. These variables can be discrete (like weather:
sunny, rainy) or continuous (like temperature).

● Edges: Depict conditional dependencies between variables. An edge connecting two nodes
indicates that the probability of one variable is influenced by the state of the other.

● Markov Property: A core principle in graphical models. It states that a variable is conditionally
independent of its non-descendants in the graph given its parents. In simpler terms, knowing the
values of its parents is sufficient to determine the probability of a variable, and any other
information (non-descendants) becomes irrelevant.
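The Markov property can be made concrete with a tiny sketch. Below is a minimal Python example (the chain A -> B -> C and all numbers are illustrative assumptions, not from the notes): given its parent B, node C is conditionally independent of A, so the conditional computed from the joint distribution matches the CPT entry directly.

```python
# Minimal sketch of the Markov property on the chain A -> B -> C
# (all probabilities are made-up illustrative values).

p_a = {True: 0.3, False: 0.7}                      # P(A)
p_b_given_a = {True: {True: 0.8, False: 0.2},      # P(B | A)
               False: {True: 0.1, False: 0.9}}
p_c_given_b = {True: {True: 0.6, False: 0.4},      # P(C | B): C's only parent is B
               False: {True: 0.05, False: 0.95}}

def joint(a, b, c):
    """Chain-rule factorisation implied by the DAG A -> B -> C."""
    return p_a[a] * p_b_given_a[a][b] * p_c_given_b[b][c]

# P(C=True | A=True, B=True), computed from the joint distribution ...
num = joint(True, True, True)
den = sum(joint(True, True, c) for c in (True, False))
print(round(num / den, 3))        # 0.6

# ... equals P(C=True | B=True) read straight off the CPT: knowing the
# parent B makes the non-descendant A irrelevant (the Markov property).
print(p_c_given_b[True][True])    # 0.6
```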

Types of Graphical Models:

● Directed Acyclic Graphs (DAGs): The most common type, where edges have a direction. The
direction implies a causal relationship between the connected variables (e.g., a parent variable
influences a child variable). A classic example is a Bayesian Network, where each node
represents a variable, and directed edges depict causal relationships.

● Undirected Graphical Models (UGMs): Edges in these models are undirected, indicating a
mutual dependence between the connected variables. They capture statistical relationships
without implying causality. Markov Random Fields (MRFs) are a popular example, often used in
image processing and computer vision.

Benefits of Graphical Models:

● Visual Representation: The graph structure provides a clear and intuitive understanding of the
relationships between variables, making them easier to analyze and interpret.

● Efficient Inference: Algorithms can exploit the conditional independence properties encoded in
the graph to perform efficient probabilistic inference tasks like calculating marginal probabilities
or finding the most probable configuration of variables.

● Scalability: Graphical models can handle complex problems with many variables by leveraging
the sparse structure of the graph (not all variables are connected).

Applications of Graphical Models:

● Classification: Naive Bayes (explained in the next section) is a prominent example used for spam
filtering, sentiment analysis, and image recognition.

● Recommendation Systems: Can model user preferences and item relationships to recommend
relevant items.

● Natural Language Processing: Used for tasks like part-of-speech tagging and sentiment analysis.

● Computer Vision: Employed in image segmentation and object recognition.



⭐Bayesian Networks : A Bayesian network is a probabilistic graphical model which represents a set
of variables and their conditional dependencies using a directed acyclic graph.

● It is also called a Bayes network, belief network, decision network, or Bayesian model.

● Bayesian networks are probabilistic because they are built from a probability
distribution and also use probability theory for prediction and anomaly detection.

● Many real-world applications are probabilistic in nature, and a Bayesian network is a natural way
to represent the relationships between multiple events. It can also be used in various tasks,
including prediction, anomaly detection, diagnostics, automated insight, reasoning, time-series
prediction, and decision making under uncertainty.

● A Bayesian network can be used for building models from data and expert opinions, and it
consists of two parts:

● Directed Acyclic Graph


● Table of conditional probabilities.

The generalized form of Bayesian network that represents and solves decision problems under uncertain
knowledge is known as an Influence diagram.
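To make the two parts concrete, here is a minimal Python sketch of a tiny Bayesian network. The structure (Rain -> WetGrass <- Sprinkler) and all probability values are illustrative assumptions, not taken from the notes; the point is only how a DAG plus conditional probability tables define a joint distribution that can then be queried.

```python
# Minimal Bayesian network sketch: DAG  Rain -> WetGrass <- Sprinkler
# plus conditional probability tables (all numbers are assumptions).

p_rain      = {True: 0.2, False: 0.8}          # P(Rain)
p_sprinkler = {True: 0.4, False: 0.6}          # P(Sprinkler)
p_wet = {                                      # P(WetGrass | Rain, Sprinkler)
    (True,  True):  {True: 0.99, False: 0.01},
    (True,  False): {True: 0.90, False: 0.10},
    (False, True):  {True: 0.80, False: 0.20},
    (False, False): {True: 0.05, False: 0.95},
}

def joint(rain, sprinkler, wet):
    """Joint probability factorised along the DAG."""
    return p_rain[rain] * p_sprinkler[sprinkler] * p_wet[(rain, sprinkler)][wet]

# Probability of one complete assignment of all three variables:
print(round(joint(True, False, True), 3))      # 0.2 * 0.6 * 0.90 = 0.108

# A simple inference query, answered by summing the joint:
# P(Rain=True | WetGrass=True)
num = sum(joint(True, s, True) for s in (True, False))
den = sum(joint(r, s, True) for r in (True, False) for s in (True, False))
print(round(num / den, 3))                     # ~0.401
```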

Note: You can also include some theory from the Graphical Models section above.

⭐Bayes Theorem : Bayes' theorem is one of the most widely used concepts in machine learning. It helps
to calculate the probability of one event occurring, under uncertain knowledge, given that another event
has already occurred.

Bayes' theorem can be derived using the product rule and the conditional probability of event X given
event Y:

● According to the product rule, the joint probability of events X and Y can be written in terms of X
conditioned on Y as follows:
P(X ∩ Y) = P(X|Y) P(Y) {equation 1}

● Similarly, the same joint probability can be written in terms of Y conditioned on X:

P(X ∩ Y) = P(Y|X) P(X) {equation 2}


Mathematically, Bayes' theorem is obtained by equating the right-hand sides of the two equations and
dividing by P(Y). We get:

P(X|Y) = P(Y|X) P(X) / P(Y)

Note that Bayes' theorem is useful precisely when events X and Y are dependent; if the two events were
independent, P(X|Y) would simply equal P(X), and observing Y would tell us nothing about X.

=> The above equation is called the Bayes Rule or Bayes Theorem.

● P(X|Y) is called the posterior probability: the probability of event X occurring given that event Y
has already happened. This is what we want to find after considering the new evidence (Y).

● P(Y|X) is called the likelihood: the probability of observing event Y given that event X is true. This
represents how likely the evidence (Y) is under the assumption of hypothesis (X).

● P(X) is called the prior probability: the initial probability of event X occurring before considering
any evidence. This can be based on domain knowledge or historical data.

● P(Y) is called the marginal probability: the total probability of observing event Y, regardless of
whether X is true or false. This can sometimes be difficult to calculate directly.
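A small Python sketch of the rule, using an assumed toy spam-filter example (all numbers are made up for illustration): X is "the email is spam" and Y is "the email contains the word 'offer'".

```python
# Minimal Bayes' rule sketch (illustrative numbers only).

def posterior(prior_x, likelihood_y_given_x, marginal_y):
    """P(X|Y) = P(Y|X) * P(X) / P(Y)."""
    return likelihood_y_given_x * prior_x / marginal_y

p_spam = 0.20              # prior P(X): an email is spam
p_word_given_spam = 0.60   # likelihood P(Y|X): "offer" appears in spam emails
p_word_given_ham = 0.05    # P(Y | not X): "offer" appears in non-spam emails

# marginal P(Y), obtained by summing over both cases of X
p_word = p_word_given_spam * p_spam + p_word_given_ham * (1 - p_spam)

print(round(posterior(p_spam, p_word_given_spam, p_word), 2))   # 0.75
```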

Benefits of Bayes' Theorem in Machine Learning:

● Incorporating Prior Knowledge: Allows incorporating existing knowledge or beliefs (prior
probability) into the decision-making process.
● Reasoning Under Uncertainty: Enables reasoning about probabilities in situations where
complete information might not be available.
● Improved Classification Performance: Can be used in algorithms like Naive Bayes for effective
classification tasks by calculating the posterior probability of different classes given the features
of a data point.

Limitations of Bayes' Theorem:

● Reliance on Prior Probabilities: The accuracy of the results depends heavily on the quality of the
prior probabilities used. Biased or inaccurate priors can lead to misleading results.
● Computational Challenges: Calculating the marginal probability P(Y) can be computationally
expensive for complex problems with many variables.

Examples of Applications in Machine Learning:

● Spam Filtering: Classifies emails as spam or not spam based on the presence of specific words or
patterns.
● Recommendation Systems: Recommends products or content to users based on their past
preferences and browsing behavior (considering prior user interactions).
● Anomaly Detection: Identifies data points that deviate significantly from the expected patterns,
potentially indicating fraud or system failures.

⭐Naïve Bayes Classifier : Naïve Bayes algorithm is a supervised learning algorithm, which is based
on Bayes theorem and used for solving classification problems.

● It is mainly used for text classification problems, which typically involve a high-dimensional
training dataset.

● The Naïve Bayes classifier is one of the simplest and most effective classification algorithms; it
helps build fast machine learning models that can make quick predictions.

● It is a probabilistic classifier, which means it predicts on the basis of the probability that an
object belongs to each class.

● Some popular applications of the Naïve Bayes algorithm are spam filtering, sentiment analysis, and
classifying articles.

Why is it called Naïve Bayes? : The name Naïve Bayes is made up of two words, Naïve and Bayes,
which can be described as:

● Naïve: It is called Naïve because it assumes that the occurrence of a certain feature is
independent of the occurrence of other features.

For example, if a fruit is identified on the basis of color, shape, and taste, then a red, spherical,
and sweet fruit is recognized as an apple. Each feature individually contributes to identifying it as
an apple, without depending on the others.

● Bayes: It is called Bayes because it depends on the principle of Bayes' Theorem.

Working of Naïve Bayes' Classifier: The working of the Naïve Bayes classifier can be understood with
the help of the example below.

Suppose we have a dataset of weather conditions and a corresponding target variable "Play". Using this
dataset, we need to decide whether we should play or not on a particular day, according to the weather
conditions. To solve this problem, we follow the steps below:

⭐Algorithm⭐
1. Convert the given dataset into frequency tables.

2. Generate Likelihood table by finding the probabilities of given features.

3. Now, use Bayes theorem to calculate the posterior probability.



Problem: If the weather is Sunny, should the player play or not?

Solution: To solve this, first consider the dataset: 14 days of weather observations with Play = Yes on
10 days and No on 4, where the weather is Sunny on 5 of those days (3 Yes, 2 No).

1. Convert the given dataset into frequency tables.

2. Generate the likelihood table by finding the probabilities of the given features.

3. Apply Bayes Theorem :

👉P(Yes|Sunny) = P(Sunny|Yes) * P(Yes) / P(Sunny)

P(Sunny|Yes) = 3/10 = 0.30

P(Sunny) = 5/14 ≈ 0.36

P(Yes) = 10/14 ≈ 0.71

=> So P(Yes|Sunny) = (3/10 × 10/14) / (5/14) = 0.60

👉P(No|Sunny) = P(Sunny|No) * P(No) / P(Sunny)

P(Sunny|No) = 2/4 = 0.50

P(No) = 4/14 ≈ 0.29

P(Sunny) = 5/14 ≈ 0.36

=> So P(No|Sunny) = (2/4 × 4/14) / (5/14) = 0.40

As we can see from the above calculation, P(Yes|Sunny) > P(No|Sunny).

∴ Hence, on a Sunny day, the player can play the game.
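The same calculation can be reproduced in a few lines of Python. The counts below are the ones implied by the probabilities above (14 days in total, Play = Yes on 10 and No on 4, Sunny on 5 days of which 3 are Yes and 2 are No); they are inferred from those probabilities, not copied from the original table.

```python
# Naive Bayes "play or not" check for a Sunny day, from the implied counts.
total, yes, no = 14, 10, 4
sunny_yes, sunny_no, sunny = 3, 2, 5

p_yes, p_no = yes / total, no / total       # priors P(Yes), P(No)
p_sunny = sunny / total                     # evidence P(Sunny)
p_sunny_given_yes = sunny_yes / yes         # likelihood P(Sunny|Yes) = 0.30
p_sunny_given_no = sunny_no / no            # likelihood P(Sunny|No)  = 0.50

p_yes_given_sunny = p_sunny_given_yes * p_yes / p_sunny
p_no_given_sunny = p_sunny_given_no * p_no / p_sunny

print(round(p_yes_given_sunny, 2), round(p_no_given_sunny, 2))   # 0.6 0.4
print("Play" if p_yes_given_sunny > p_no_given_sunny else "Don't play")
```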



Advantages of Naïve Bayes Classifier:


● Naïve Bayes is one of the fastest and easiest ML algorithms for predicting the class of a dataset.
● It can be used for Binary as well as Multi-class Classifications.
● It performs well in Multi-class predictions as compared to the other Algorithms.
● It is the most popular choice for text classification problems.

Disadvantages of Naïve Bayes Classifier:


● Naive Bayes assumes that all features are independent or unrelated, so it cannot learn the
relationship between features.

Applications of Naïve Bayes Classifier:


● It is used for Credit Scoring.
● It is used in medical data classification.
● It can be used in real-time predictions because Naïve Bayes Classifier is an eager learner.
● It is used in Text classification such as Spam filtering and Sentiment analysis.

Types of Naïve Bayes Model:

There are three types of Naive Bayes Model, which are given below:

● Gaussian: The Gaussian model assumes that features follow a normal distribution. This means if
predictors take continuous values instead of discrete, then the model assumes that these values
are sampled from the Gaussian distribution.

● Multinomial: The Multinomial Naïve Bayes classifier is used when the data is multinomial
distributed. It is primarily used for document classification problems, it means a particular
document belongs to which category such as Sports, Politics, education, etc.
The classifier uses the frequency of words for the predictors.

● Bernoulli: The Bernoulli classifier works similarly to the Multinomial classifier, but the predictor
variables are independent Boolean variables, such as whether a particular word is present or not in
a document. This model is also well known for document classification tasks.
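For reference, a short scikit-learn sketch of the three variants; the toy data below is invented purely for illustration.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB, MultinomialNB, BernoulliNB

y = np.array([0, 0, 1, 1])                     # two classes, two samples each

# Gaussian: continuous features (e.g. measurements).
X_cont = np.array([[1.0, 2.1], [0.9, 1.8], [3.2, 4.0], [3.0, 4.2]])
print(GaussianNB().fit(X_cont, y).predict([[3.1, 3.9]]))

# Multinomial: word counts per document.
X_counts = np.array([[3, 0, 1], [2, 0, 0], [0, 4, 1], [0, 3, 2]])
print(MultinomialNB().fit(X_counts, y).predict([[0, 2, 1]]))

# Bernoulli: binary word presence/absence per document.
X_bin = (X_counts > 0).astype(int)
print(BernoulliNB().fit(X_bin, y).predict([[0, 1, 1]]))
```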

⭐What is Parameter Learning? : Parameter learning is a fundamental concept in Machine Learning (ML)
that refers to the process of estimating the optimal values for the parameters within a chosen machine
learning model. These parameters are the adjustable variables within the model that control its behavior
and influence its predictions.

Importance of Parameter Learning:

● Model Performance: The effectiveness of an ML model heavily relies on the chosen values for its
parameters. Finding the optimal parameters allows the model to learn the underlying patterns in
the data and make accurate predictions.
● Generalizability: The goal of parameter learning is not just to fit the training data but also to
generalize well to unseen data. This means the model should perform well on new data points it
hasn't encountered during training.

Common Parameter Learning Techniques:

● Optimization Algorithms: These algorithms iteratively adjust the model parameters in a way that
minimizes a specific loss function. The loss function quantifies the discrepancy between the
model's predictions and the actual targets. Popular optimization algorithms include gradient
descent, stochastic gradient descent, and Adam (a small sketch follows this list).
● Maximum Likelihood Estimation (MLE): This technique aims to find the parameter values that
maximize the likelihood of the training data occurring under the model's assumption. It's
commonly used for models with probabilistic outputs.
● Bayesian Inference: This approach incorporates prior knowledge or beliefs about the parameters
(through prior probability distributions) to guide the learning process. It allows for updating
these beliefs based on the observed data (posterior probability).
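As a concrete illustration of the optimization-algorithm route (and of MLE, since minimising squared error is maximum likelihood under Gaussian noise), here is a minimal gradient-descent sketch that learns the two parameters of a straight line from synthetic data; the data and the "true" parameter values are assumptions made up for the example.

```python
# Parameter learning by gradient descent: fit w and b of y = w*x + b
# by minimising the mean squared error on synthetic data.
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 1, 100)
y = 3.0 * x + 1.0 + rng.normal(0, 0.1, 100)    # "true" parameters: w=3, b=1

w, b, lr = 0.0, 0.0, 0.1                       # lr (learning rate) is a hyperparameter
for _ in range(2000):
    pred = w * x + b
    grad_w = 2 * np.mean((pred - y) * x)       # d(MSE)/dw
    grad_b = 2 * np.mean(pred - y)             # d(MSE)/db
    w -= lr * grad_w
    b -= lr * grad_b

print(round(w, 2), round(b, 2))                # close to 3.0 and 1.0
```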

Factors Affecting Parameter Learning:

● Model Complexity
● Data Quality and Quantity
● Choice of Optimization Algorithm

Benefits of Effective Parameter Learning:

● Improved Model Performance


● Reduced Overfitting
● Enhanced Interpretability

⭐What are HyperParameters? : Hyperparameters are defined as the parameters that are
explicitly defined by the user to control the learning process.

● Here the prefix "hyper" suggests that the parameters are top-level parameters that are used in
controlling the learning process.
● The value of the Hyperparameter is selected and set by the machine learning engineer before
the learning algorithm begins training the model.
● Hence, these are external to the model, and their values cannot be changed during the training
process.

Some examples of Hyperparameters in Machine Learning

● The k in kNN or K-Nearest Neighbour algorithm


● Learning rate for training a neural network
● Train-test split ratio
● Batch Size
● Number of Epochs
● Branches in Decision Tree
● Number of clusters in Clustering Algorithm
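A short sketch of how such a hyperparameter is typically chosen in practice: a grid search with cross-validation over candidate values of k for kNN (the dataset and the candidate list are arbitrary choices for illustration).

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

search = GridSearchCV(
    KNeighborsClassifier(),
    param_grid={"n_neighbors": [1, 3, 5, 7, 9]},   # candidate values of k
    cv=5,                                          # 5-fold cross-validation
)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```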

⭐Hidden Markov Model or HMM : A Hidden Markov Model (HMM) is a probabilistic model that
consists of a sequence of hidden states, each of which generates an observation.

The hidden states are usually not directly observable, and the goal of HMM is to estimate the sequence
of hidden states based on a sequence of observations. HMM is defined by the following components:

● A set of N hidden states, S = {s1, s2, ..., sN}.


● A set of M observations, O = {o1, o2, ..., oM}.
● An initial state probability distribution, π = {π1, π2, ..., πN}, which specifies the probability of
starting in each hidden state.
● A transition probability matrix, A = [aij], defines the probability of moving from one hidden state
to another.
● An emission probability matrix, B = [bjk], defines the probability of emitting an observation from
a given hidden state.

Example : Imagine a sequence of coin flips (heads or tails). While you see the results (heads/tails), you
don't directly observe the underlying state of the coin (fair/biased). HMMs model such scenarios, where
the sequence (observations) is influenced by hidden states that generate them.

Key Components:

● Hidden States: A set of unobservable states that represent the underlying process generating
the data. These states could be "fair coin" or "biased coin" in our example.
● Observations: The actual sequence of data points you observe. In our case, these would be the
coin flip outcomes (heads/tails).
● State Transition Probabilities: These probabilities define the likelihood of transitioning from one
hidden state to another. For example, the probability of transitioning from "fair coin" to "biased
coin" in consecutive flips.
● Emission Probabilities: These probabilities represent the likelihood of observing a specific data
point given a particular hidden state. For instance, the probability of observing "heads" when the
coin is in the "fair coin" state.
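A minimal Python sketch of these components for the coin example, with assumed numbers; the forward algorithm below computes the probability of an observed flip sequence under the model.

```python
import numpy as np

# Hidden states: 0 = fair coin, 1 = biased coin. Observations: 0 = heads, 1 = tails.
pi = np.array([0.5, 0.5])          # initial state distribution
A = np.array([[0.9, 0.1],          # state transition probabilities a_ij
              [0.2, 0.8]])
B = np.array([[0.5, 0.5],          # emission probabilities b_jk
              [0.9, 0.1]])         # biased coin: heads 0.9, tails 0.1

obs = [0, 0, 1]                    # observed sequence: heads, heads, tails

alpha = pi * B[:, obs[0]]          # forward variables at t = 0
for o in obs[1:]:
    # alpha_t(j) = (sum_i alpha_{t-1}(i) * a_ij) * b_j(o_t)
    alpha = (alpha @ A) * B[:, o]

print(alpha.sum())                 # P(observation sequence | model)
```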

Benefits of HMMs:

● Handling Hidden Processes: Allows modeling complex systems where the underlying
mechanisms are not directly observable.
● Efficient Sequence Analysis: Provides a framework for analyzing and making predictions about
sequential data.
● Flexibility: Can be adapted to various data types by defining appropriate hidden states and
emission probabilities.

Limitations of HMMs:

● Assumption of Simplicity: Assumes the process is memoryless (the next hidden state depends only
on the current state) and that each observation depends only on the current hidden state. This
might not hold true for all real-world scenarios.
● Parameter Estimation Challenges: Learning the model parameters, especially with a large
number of hidden states, can be computationally expensive.

Applications of HMMs:

● Speech Recognition: HMMs are used to recognize spoken words based on the sequence of
acoustic features extracted from audio signals.
● Part-of-Speech Tagging: Used to identify the grammatical function (noun, verb, etc.) of each
word in a sentence by modeling the sequence of words as hidden states and the parts-of-speech
as observations.
● Bioinformatics: Can be used to analyze and predict protein structures by modeling the sequence
of amino acids as hidden states and the observed structural features as emissions.

More Examples
Weather Prediction: Imagine a weather forecasting system that uses an HMM. The hidden states could
represent different weather conditions (sunny, rainy, cloudy). The observations might be daily
temperature readings, precipitation levels, and cloud cover. By analyzing a sequence of these
observations, the HMM can predict the most likely sequence of hidden states, which translates to the
weather forecast for the coming days.
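The "most likely sequence of hidden states" mentioned above is usually found with the Viterbi algorithm. Here is a minimal sketch with two weather states and made-up probabilities (0 below stands for a "dry" observation and 1 for a "wet" one; all numbers are assumptions).

```python
import numpy as np

states = ["sunny", "rainy"]
pi = np.array([0.6, 0.4])                  # initial state probabilities
A = np.array([[0.7, 0.3], [0.4, 0.6]])     # transition probabilities
B = np.array([[0.9, 0.1], [0.2, 0.8]])     # emission probabilities (dry, wet)

obs = [0, 1, 1]                            # observed: dry, wet, wet

delta = pi * B[:, obs[0]]                  # best path score ending in each state
backpointers = []
for o in obs[1:]:
    scores = delta[:, None] * A            # score of each (previous, current) pair
    backpointers.append(scores.argmax(axis=0))
    delta = scores.max(axis=0) * B[:, o]

# Trace the best final state back through the stored predecessors.
path = [int(delta.argmax())]
for ptr in reversed(backpointers):
    path.append(int(ptr[path[-1]]))
path.reverse()

print([states[s] for s in path])           # ['sunny', 'rainy', 'rainy']
```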

Stock Market Analysis: HMMs can be used to model stock price movements. The hidden states could
represent different market conditions (bull market, bear market, consolidation). Daily closing prices or
other financial indicators would be the observations. Analyzing historical data with an HMM can help
identify patterns in market behavior and potentially predict future trends (keeping in mind the
limitations of HMMs and the inherent uncertainty in financial markets).
ColleGPT

All the Best


"Enjoyed these notes? Feel free to share them with

your friends and provide valuable feedback in your

review. If you come across any inaccuracies, don't

hesitate to reach out to the author for clarification.

Your input helps us improve!"

Visit: www.collegpt.com

www.collegpt.com ColleGPT [email protected]
