Module 3

This module introduces Naive Bayes classifiers, simple probabilistic models based on Bayes' Theorem that are often used for tasks like spam filtering and text classification despite their assumption of feature independence. It also discusses Hidden Markov Models (HMMs), which represent systems with hidden states and are applied in areas like speech recognition and bioinformatics; the Viterbi algorithm is covered for state decoding and the Baum-Welch algorithm for parameter estimation. The advantages and limitations of both models, including their efficiency and their independence assumptions, are highlighted.

Naive Bayes Classifier: Introduction

Naive Bayes is a family of simple probabilistic classifiers based on Bayes' Theorem. It is termed "naive" because it assumes that all features (input variables) are independent of each other, which is rarely the case in real-world data. Despite this strong assumption, Naive Bayes classifiers often perform very well, especially in applications like spam filtering, text classification, and sentiment analysis.

Example: Suppose you are trying to classify emails as "spam" or "not spam" based on the words they contain. Words like "buy", "cheap", and "now" are often associated with spam emails. The Naive Bayes classifier computes probabilities for each word and uses them to classify the email.

Understanding Bayes’ Theorem

Bayes' Theorem provides a way to calculate the probability of a hypothesis given prior knowledge and new evidence.

The formula is:

    P(H | E) = P(E | H) * P(H) / P(E)

where P(H | E) is the posterior probability of the hypothesis H given the evidence E, P(E | H) is the likelihood of the evidence under the hypothesis, P(H) is the prior probability of the hypothesis, and P(E) is the overall probability of the evidence.
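
A quick worked sketch of the formula applied to the spam example, with made-up numbers (the prior, likelihood, and evidence probability below are assumptions for illustration, not values from this module):

    # Hypothetical application of Bayes' Theorem to the spam example.
    # All numbers are assumed purely for illustration.
    p_spam = 0.2             # P(H): prior probability that an email is spam
    p_word_given_spam = 0.5  # P(E | H): probability a spam email contains "cheap"
    p_word = 0.14            # P(E): overall probability an email contains "cheap"

    # P(H | E) = P(E | H) * P(H) / P(E)
    p_spam_given_word = p_word_given_spam * p_spam / p_word
    print(round(p_spam_given_word, 2))  # 0.71: seeing "cheap" raises the spam probability
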
The Naive Assumption of Feature Independence

The Naive Bayes classifier assumes that all features are conditionally
independent of each other, given the class label. This is the “naive”
assumption, and it simplifies the computation of the likelihood.

Why this matters: In real-world problems, many features are not independent (e.g., in text classification, words are often related to each other). Despite this unrealistic assumption, Naive Bayes often works well in practice, because classification only requires ranking the classes correctly, not estimating their probabilities exactly.

Example: In classifying whether a person will buy a car based on age and
income, Naive Bayes assumes that age and income are independent, even
though they may be correlated (younger people may have lower incomes).
The assumption simplifies the calculations by ignoring this relationship.

Naive Bayes Classifier Formula

To classify a new instance using Naive Bayes, the classifier computes the
posterior probability for each class and assigns the instance to the class
with the highest probability.

The formula is:

    P(C | x1, x2, ..., xn) ∝ P(C) * P(x1 | C) * P(x2 | C) * ... * P(xn | C)

The denominator P(X) is the same for every class, so it can be dropped; the predicted class is the C that maximizes this product of the prior and the per-feature likelihoods.
Example: For email classification:

 C: "spam" or "not spam".
 X: the words in the email (e.g., "buy", "cheap", "now").
 The classifier calculates the probabilities for both classes and assigns the email to the class with the higher probability, as sketched below.
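
The following is a minimal sketch of this calculation, assuming a tiny hand-built table of word counts; the counts, priors, vocabulary, and the add-one smoothing are all assumptions made for illustration:

    # Minimal Naive Bayes spam sketch (all counts and priors are assumed).
    from collections import Counter

    word_counts = {
        "spam":     Counter({"buy": 20, "cheap": 15, "now": 10, "meeting": 1}),
        "not spam": Counter({"buy": 2, "cheap": 1, "now": 5, "meeting": 30}),
    }
    priors = {"spam": 0.4, "not spam": 0.6}
    vocab = {"buy", "cheap", "now", "meeting"}

    def posterior_score(words, label):
        # P(C) times the product of P(word | C), with add-one smoothing
        # so that unseen words do not zero out the whole product.
        total = sum(word_counts[label].values())
        score = priors[label]
        for w in words:
            score *= (word_counts[label][w] + 1) / (total + len(vocab))
        return score

    email = ["buy", "cheap", "now"]
    scores = {label: posterior_score(email, label) for label in priors}
    print(max(scores, key=scores.get))  # "spam" for these assumed counts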

Advantages of Naive Bayes

 Efficiency: It requires less computational power and smaller amounts of data to estimate the parameters compared to other algorithms.
 Simplicity: Naive Bayes is easy to understand and implement, making it a great starting point for classification tasks.
 Fast and Scalable: It works well with large datasets and provides good performance in many practical applications.
 Handles Missing Data: Naive Bayes can handle missing data by ignoring the probability terms for missing features.

Limitations of Naive Bayes

 Independence Assumption: Naive Bayes assumes that all features are independent, which may not hold in many real-world applications where features are correlated.
 Zero Probability Problem: If a class never encountered a particular feature in training, Naive Bayes assigns a probability of zero to that class when the feature appears in the test data. This is known as the "zero-frequency" problem.
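
A small illustration of why this matters, with assumed likelihoods: one unseen word contributes a factor of zero, which wipes out the entire product regardless of the other evidence (add-one smoothing, as used in the sketch above, is a common remedy):

    # One zero likelihood forces the whole score for "spam" to zero.
    # The likelihood values are assumed for illustration.
    likelihoods = [0.5, 0.4, 0.3, 0.0]  # P(word | spam) for the email's words
    prior = 0.4

    score = prior
    for p in likelihoods:
        score *= p
    print(score)  # 0.0, so "spam" can never be chosen for this email
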
Hidden Markov Models (HMM)

1. Introduction to Hidden Markov Models (HMM)

A Hidden Markov Model is a statistical model that represents a system evolving over time through a sequence of hidden states, where the actual state is not directly observable but the outputs (observations) it generates are. It is widely used in areas such as speech recognition, bioinformatics, and pattern recognition.

An HMM consists of:

 States (S): The hidden states the system passes through.
 Observations (O): The visible outputs generated from the states.
 Transition probabilities (A): The probability of transitioning from one state to another.
 Emission probabilities (B): The probability of an observation given a state.
 Initial state probabilities (π): The probability of starting in a particular state.
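
As a concrete picture, the five components can be held in plain data structures. This minimal sketch uses the states, observations, and probabilities of the weather example developed later in this document; the dictionary representation itself is just an assumption for illustration:

    # Assumed representation of the five HMM components (weather example).
    states = ["Sunny", "Rainy"]                         # S: hidden states
    observations = ["Walking", "Shopping", "Cleaning"]  # O: visible outputs

    A = {  # transition probabilities P(next state | current state)
        "Sunny": {"Sunny": 0.7, "Rainy": 0.3},
        "Rainy": {"Sunny": 0.4, "Rainy": 0.6},
    }
    B = {  # emission probabilities P(observation | state)
        "Sunny": {"Walking": 0.6, "Shopping": 0.3, "Cleaning": 0.1},
        "Rainy": {"Walking": 0.1, "Shopping": 0.4, "Cleaning": 0.5},
    }
    pi = {"Sunny": 0.8, "Rainy": 0.2}  # initial state probabilities
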
Learning Problem: Estimating the model parameters (A, B, π) from observed sequences.
Solution: Baum-Welch Algorithm (an Expectation-Maximization algorithm).

Viterbi Algorithm (Decoding Problem)

The Viterbi algorithm finds the most likely sequence of hidden states that
produces a given sequence of observations.

Steps:

1. Initialization: Compute the initial probabilities for each state based on the first observation.
2. Recursion: For each subsequent observation, compute the maximum probability of reaching each state by considering all previous states.
3. Termination: Identify the state with the maximum probability at the final time step.
4. Backtracking: Trace back through the states to find the most likely sequence.

Formula: The initialization step sets δ1(j) = π(j) * B(j, o1), and the recursion step computes, for each state j and time t,

    δt(j) = max over i of [ δ(t-1)(i) * A(i, j) ] * B(j, ot)

where δt(j) is the probability of the most likely state sequence ending in state j at time t, A(i, j) is the transition probability from state i to state j, and B(j, ot) is the probability of emitting observation ot from state j.
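
Below is a minimal Python sketch of these four steps, run on the weather model described later in this document; it is an illustrative implementation, not a prescribed one:

    # Viterbi sketch; parameters are those of the weather example below.
    states = ["Sunny", "Rainy"]
    A = {"Sunny": {"Sunny": 0.7, "Rainy": 0.3}, "Rainy": {"Sunny": 0.4, "Rainy": 0.6}}
    B = {"Sunny": {"Walking": 0.6, "Shopping": 0.3, "Cleaning": 0.1},
         "Rainy": {"Walking": 0.1, "Shopping": 0.4, "Cleaning": 0.5}}
    pi = {"Sunny": 0.8, "Rainy": 0.2}

    def viterbi(obs):
        # 1. Initialization: delta_1(j) = pi(j) * B(j, o1)
        delta = [{s: pi[s] * B[s][obs[0]] for s in states}]
        back = []
        # 2. Recursion: delta_t(j) = max_i delta_{t-1}(i) * A(i, j) * B(j, o_t)
        for o in obs[1:]:
            prev = delta[-1]
            delta.append({})
            back.append({})
            for j in states:
                best_i = max(states, key=lambda i: prev[i] * A[i][j])
                back[-1][j] = best_i
                delta[-1][j] = prev[best_i] * A[best_i][j] * B[j][o]
        # 3. Termination: pick the best final state.
        last = max(states, key=lambda s: delta[-1][s])
        # 4. Backtracking: follow the stored pointers to the start.
        path = [last]
        for pointers in reversed(back):
            path.append(pointers[path[-1]])
        return list(reversed(path)), delta[-1][last]

    print(viterbi(["Walking", "Shopping", "Cleaning"]))
    # most likely sequence is ['Sunny', 'Rainy', 'Rainy'] with probability ≈ 0.0173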

Applications of HMMs

 Speech Recognition: Identifying words or phonemes based on acoustic signals.
 Bioinformatics: Finding gene sequences or protein structure alignments.
 Natural Language Processing: Part-of-speech tagging, named entity recognition.
 Finance: Modeling market conditions or stock prices as hidden states.

Challenges in HMM

 Scalability: The number of possible hidden-state sequences grows exponentially with the length of the observation sequence, so naive enumeration is infeasible, and even the efficient dynamic-programming algorithms scale with the square of the number of states.
 Markov Assumption: HMM assumes that the current state depends only on the previous state, which may not always be true in complex real-world systems.

Weather Prediction Example with Detailed Explanations

1. States (S)

In this example, the hidden states are the weather conditions. The two
possible states are:

 Sunny (S1)
 Rainy (S2)

The weather condition is hidden because it is not directly observed; what we observe instead is the person's activity.

2. Observations (O)

The observations are what we can see or measure, which give us some
idea about the current state. Here, we have three activities:

 Walking (O1)
 Shopping (O2)
 Cleaning (O3)

The observation is assumed to be influenced by the weather. For instance, a person is more likely to walk outside on a sunny day, while they might clean indoors if it is rainy.

3. Transition Matrix (A)

The transition matrix (A) defines the probabilities of moving from one state (weather condition) to another. With rows as today's weather and columns as tomorrow's, it is:

              Sunny   Rainy
    Sunny      0.7     0.3
    Rainy      0.4     0.6

This means:

 If today is Sunny, there is a 70% chance it will be Sunny again tomorrow, and a 30% chance it will be Rainy.
 If today is Rainy, there is a 40% chance it will be Sunny tomorrow, and a 60% chance it will continue to be Rainy.

4. Emission Matrix (B)

The emission matrix (B) defines the probability of observing each activity given the current weather state:

              Walking   Shopping   Cleaning
    Sunny       0.6        0.3        0.1
    Rainy       0.1        0.4        0.5

This means:

 On a Sunny day:
o There is a 60% chance that the person will go Walking.
o There is a 30% chance that the person will go Shopping.
o There is a 10% chance that the person will be Cleaning.
 On a Rainy day:
o There is a 10% chance that the person will go Walking.
o There is a 40% chance that the person will go Shopping.
o There is a 50% chance that the person will be Cleaning.

5. Initial State Probabilities (π)

The initial state distribution (π) represents the probability of starting in each state:

    π = (Sunny: 0.8, Rainy: 0.2)

This means that there is an 80% chance that the first day is Sunny and a 20% chance that it is Rainy.

6. Likelihood Calculation using Forward Algorithm

Suppose we observe the following sequence of activities over three days: Walking, Shopping, Cleaning. We want to find the probability of observing this sequence given our HMM.

Initialization: Calculate the probability of starting in each state and generating the first observation (Walking):

    α1(Sunny) = π(Sunny) * B(Sunny, Walking) = 0.8 * 0.6 = 0.48
    α1(Rainy) = π(Rainy) * B(Rainy, Walking) = 0.2 * 0.1 = 0.02
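
A minimal sketch of the full forward computation for this three-day sequence, using the matrices above (the numbers come from this example; the code structure is just one way to write it):

    # Forward algorithm sketch for the weather example.
    states = ["Sunny", "Rainy"]
    A = {"Sunny": {"Sunny": 0.7, "Rainy": 0.3}, "Rainy": {"Sunny": 0.4, "Rainy": 0.6}}
    B = {"Sunny": {"Walking": 0.6, "Shopping": 0.3, "Cleaning": 0.1},
         "Rainy": {"Walking": 0.1, "Shopping": 0.4, "Cleaning": 0.5}}
    pi = {"Sunny": 0.8, "Rainy": 0.2}

    def forward(obs):
        # Initialization: alpha_1(j) = pi(j) * B(j, o1)
        alpha = {s: pi[s] * B[s][obs[0]] for s in states}
        # Recursion: alpha_t(j) = (sum_i alpha_{t-1}(i) * A(i, j)) * B(j, o_t)
        for o in obs[1:]:
            alpha = {j: sum(alpha[i] * A[i][j] for i in states) * B[j][o]
                     for j in states}
        # Termination: P(observations | model) = sum_j alpha_T(j)
        return sum(alpha.values())

    print(forward(["Walking", "Shopping", "Cleaning"]))  # ≈ 0.0439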
