Lecture # 7
Naïve Bayes Classifier
Naïve Bayes Classifier
Thomas Bayes
1702 - 1761
We will start off with a visual intuition, before looking at the math…
Background
There are three broad approaches to building a classifier:
a) Model a classification rule directly
   Examples: k-NN, decision trees, perceptron, SVM
b) Model the probability of class membership given the input data
   Example: multi-layer perceptron with the cross-entropy cost
c) Build a probabilistic model of the data within each class
   Examples: naive Bayes, model-based classifiers
a) and b) are examples of discriminative classification
c) is an example of generative classification
b) and c) are both examples of probabilistic classification
[Figure: scatter plot of the insect data, Abdomen Length (x-axis) vs. Antenna Length (y-axis), with grasshoppers and katydids as the two classes.]
Remember this example? Let's get lots more data…
With a lot of data, we can build a histogram. Let us just build one for "Antenna Length" for now…
[Figure: histograms of Antenna Length for katydids and grasshoppers, each summarized by a fitted normal distribution.]
We can leave the histograms as they are, or we can summarize them with two normal distributions.
p(cj | d) = probability of class cj, given that we have observed d

[Figure: the two class-conditional distributions queried at antenna lengths 3, 7 and 5 — one case that clearly favors one class, one that clearly favors the other, and one ambiguous case where the two classes are about equally likely.]
Bayes Classifiers
That was a visual intuition for a simple case of the Bayes classifier, also called:
• Idiot Bayes
• Naïve Bayes
• Simple Bayes
We are about to see some of the mathematical formalisms, and more examples,
but keep in mind the basic idea.
Find out the probability of the previously unseen instance belonging to each
class, then simply pick the most probable class.
Self Study
Concepts related to probability
Probability Basics
• Prior, conditional and joint probability
  – Prior probability: P(X)
  – Conditional probability: P(X1 | X2), P(X2 | X1)
  – Joint probability: X = (X1, X2), P(X) = P(X1, X2)
  – Relationship: P(X1, X2) = P(X2 | X1) P(X1) = P(X1 | X2) P(X2)
  – Independence: P(X2 | X1) = P(X2), P(X1 | X2) = P(X1), P(X1, X2) = P(X1) P(X2)
• Bayes' rule: P(C | X) = P(X | C) P(C) / P(X)
  – MAP (maximum a posteriori) rule: assign the class that maximizes the posterior P(C | X)
Applied to the PlayTennis example with test instance x′ = (Sunny, Cool, High, Strong):
P(Yes | x′) ∝ [P(Sunny|Yes) P(Cool|Yes) P(High|Yes) P(Strong|Yes)] P(Play=Yes) = 0.0053
P(No | x′) ∝ [P(Sunny|No) P(Cool|No) P(High|No) P(Strong|No)] P(Play=No) = 0.0206
Since 0.0206 > 0.0053, the MAP rule labels x′ as No.
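These numbers can be reproduced directly. Below is a minimal Python sketch, assuming the standard 14-example PlayTennis data (9 Yes, 5 No), from which these conditional probabilities are usually estimated; that table is not shown on the slide.

```python
# Hedged sketch: conditional probabilities below are the usual estimates
# from the 14-example PlayTennis table (an assumption, not given here).
priors = {"Yes": 9/14, "No": 5/14}
likelihoods = {
    "Yes": {"Sunny": 2/9, "Cool": 3/9, "High": 3/9, "Strong": 3/9},
    "No":  {"Sunny": 3/5, "Cool": 1/5, "High": 4/5, "Strong": 3/5},
}
x_prime = ["Sunny", "Cool", "High", "Strong"]

scores = {}
for c in priors:
    score = priors[c]
    for value in x_prime:
        score *= likelihoods[c][value]        # naive independence assumption
    scores[c] = score

print(scores)                        # {'Yes': ~0.0053, 'No': ~0.0206}
print(max(scores, key=scores.get))   # 'No' -- the MAP decision
```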
Officer Drew: is Drew male or female? Apply Bayes' rule using a small database of names:

p(cj | d) = p(d | cj) p(cj) / p(d)

Name      Sex
Drew      Male
Claudia   Female
Drew      Female
Drew      Female
Alberto   Male
Karin     Female
Nina      Female
Sergio    Male
p(male | drew)   = (1/3 × 3/8) / (3/8) = 0.125 / (3/8) ≈ 0.33
p(female | drew) = (2/5 × 5/8) / (3/8) = 0.250 / (3/8) ≈ 0.67

Officer Drew is more likely to be a Female.
(And indeed, Officer Drew IS a female!)
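The same arithmetic can be written in a few lines of Python; a minimal sketch using the eight-row name/sex table above:

```python
# Single-attribute Bayes computation for the Officer Drew example.
data = [("Drew", "Male"), ("Claudia", "Female"), ("Drew", "Female"),
        ("Drew", "Female"), ("Alberto", "Male"), ("Karin", "Female"),
        ("Nina", "Female"), ("Sergio", "Male")]

def posterior(sex, name):
    in_class = [n for n, s in data if s == sex]
    p_name_given_class = sum(n == name for n in in_class) / len(in_class)  # p(d | cj)
    p_class = len(in_class) / len(data)                                    # p(cj)
    p_name = sum(n == name for n, _ in data) / len(data)                   # p(d)
    return p_name_given_class * p_class / p_name

print(posterior("Male", "Drew"))     # ~0.33
print(posterior("Female", "Drew"))   # ~0.67 -> Drew is more likely female
```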
So far we have only considered Bayes classification when we have one attribute (the "antennae length", or the "name"). But we may have many features.

p(cj | d) = p(d | cj) p(cj) / p(d)

How do we use all the features?

Officer Drew is blue-eyed, over 170 cm tall, and has long hair:

p(officer drew | Female) = 2/5 * 3/5 * ….
p(officer drew | Male) = 2/3 * 2/3 * ….
The naive Bayes classifier is often represented as this type of graph: the class node cj at the top, with an arrow from cj to each feature node. Note the direction of the arrows, which state that each class causes certain features, with a certain probability.

[Figure: naive Bayes graphical model with class node cj and its feature nodes.]
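To make the picture concrete, here is a minimal sketch of multi-feature naive Bayes prediction; the function name and the toy probability tables are illustrative, not taken from the slides:

```python
# Posterior score = prior * product of per-feature likelihoods (the arrows
# in the graph: the class generates each feature independently).
def naive_bayes_predict(x, priors, likelihoods):
    """x: {feature: value}; likelihoods[c][feature][value] = p(value | c)."""
    scores = {}
    for c, prior in priors.items():
        score = prior
        for feature, value in x.items():
            score *= likelihoods[c][feature].get(value, 0.0)
        scores[c] = score
    return max(scores, key=scores.get), scores

# Hypothetical two-feature example with made-up probabilities:
priors = {"A": 0.6, "B": 0.4}
likelihoods = {
    "A": {"color": {"red": 0.7, "blue": 0.3}, "shape": {"circle": 0.8, "square": 0.2}},
    "B": {"color": {"red": 0.2, "blue": 0.8}, "shape": {"circle": 0.4, "square": 0.6}},
}
print(naive_bayes_predict({"color": "red", "shape": "circle"}, priors, likelihoods))
```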
Relevant Issues
• Violation of the independence assumption
  – For many real-world tasks, P(X1, …, Xn | C) ≠ P(X1 | C) ⋯ P(Xn | C)
  – Nevertheless, naïve Bayes works surprisingly well anyway!
• Zero conditional probability problem
  – If no training example has the attribute value Xj = ajk, then P̂(Xj = ajk | C = ci) = 0
  – In this circumstance, P̂(x1 | ci) ⋯ P̂(ajk | ci) ⋯ P̂(xn | ci) = 0 during test
  – As a remedy, conditional probabilities are estimated with the m-estimate:

      P̂(Xj = ajk | C = ci) = (nc + m·p) / (n + m)

    nc : number of training examples for which Xj = ajk and C = ci
    n  : number of training examples for which C = ci
    p  : prior estimate (usually p = 1/t for t possible values of Xj)
    m  : weight given to the prior (number of "virtual" examples, m ≥ 1)
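A minimal sketch of the m-estimate above; the function name and arguments are mine, only the formula comes from the slide:

```python
def m_estimate(n_c, n, t, m=1.0):
    """P-hat(Xj = ajk | C = ci) = (n_c + m*p) / (n + m), with prior p = 1/t."""
    p = 1.0 / t
    return (n_c + m * p) / (n + m)

# An attribute value never seen with class ci (n_c = 0) among n = 10 examples
# of ci, with t = 3 possible attribute values:
print(m_estimate(0, 10, 3))   # ~0.03 instead of a hard zero
```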
Estimating Probabilities
• Normally, probabilities are estimated based on observed
frequencies in the training data.
• If D contains nk examples in category yk, and nijk of these nk examples have the j-th value xij for feature Xi, then:

      P(Xi = xij | Y = yk) = nijk / nk
• However, estimating such probabilities from small training
sets is error-prone.
Naïve Bayes Example
Probability        positive   negative
P(Y)               0.5        0.5
P(medium | Y)      0.1        0.2
P(red | Y)         0.9        0.3
P(circle | Y)      0.9        0.3

Test instance: <medium, red, circle>
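A quick check of what the table implies for the test instance, as a small Python sketch (unnormalized posteriors, then normalized):

```python
# Score each class for <medium, red, circle> using the table above.
pos = 0.5 * 0.1 * 0.9 * 0.9   # P(positive) * P(medium|pos) * P(red|pos) * P(circle|pos)
neg = 0.5 * 0.2 * 0.3 * 0.3   # P(negative) * P(medium|neg) * P(red|neg) * P(circle|neg)
print(pos, neg)               # ~0.0405 vs ~0.009 -> predict positive
print(pos / (pos + neg))      # ~0.82 posterior for positive after normalization
```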
Laplace Smoothing Example
• Assume training set contains 10 positive examples:
– 4: small
– 0: medium
– 6: large
• Estimate parameters as follows (if m=1, p=1/3)
– P(small | positive) = (4 + 1/3) / (10 + 1) = 0.394
– P(medium | positive) = (0 + 1/3) / (10 + 1) = 0.03
– P(large | positive) = (6 + 1/3) / (10 + 1) = 0.576
– P(small or medium or large | positive) = 1.0
Numerical Stability
• It is often the case that machine learning algorithms need to
work with very small numbers
– Imagine computing the probability of 2000 independent coin flips
– MATLAB thinks that (0.5)^2000 = 0
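The same underflow happens in any double-precision environment, not just MATLAB; a quick check in Python:

```python
import math

# 0.5**2000 is about 8.7e-603, far below the smallest positive double (~5e-324),
# so it underflows to exactly 0.0; its logarithm is perfectly representable.
print(0.5 ** 2000)            # 0.0
print(2000 * math.log(0.5))   # -1386.29...
```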
Stochastic Language Models
• Models probability of generating strings (each
word in turn) in the language (commonly all
strings over ∑). E.g., unigram model
Model M (unigram):
  the    0.2
  a      0.1
  guy    0.01
  fruit  0.01
  said   0.03
  likes  0.02
  …

the    guy    likes   the    fruit
0.2    0.01   0.02    0.2    0.01

Multiply: P(s | M) = 0.00000008
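A minimal Python sketch of scoring the string under the unigram model M above:

```python
# Unigram model: multiply the per-word probabilities.
M = {"the": 0.2, "a": 0.1, "guy": 0.01, "fruit": 0.01, "said": 0.03, "likes": 0.02}
s = "the guy likes the fruit".split()

p = 1.0
for word in s:
    p *= M[word]
print(p)   # ~8e-08, i.e. P(s | M) = 0.00000008
```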
Numerical Stability
• Instead of comparing P(Y=5|X1,…,Xn) with P(Y=6|X1,…,Xn),
– Compare their logarithms
Underflow Prevention
• Multiplying lots of probabilities, which are
between 0 and 1 by definition, can result in
floating-point underflow.
• Since log(xy) = log(x) + log(y), it is better to
perform all computations by summing logs of
probabilities rather than multiplying
probabilities.
• Class with highest final un-normalized log
probability score is still the most probable.
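A minimal sketch of the log-space computation, reusing the numbers from the earlier medium/red/circle example (the function name is mine):

```python
import math

def log_score(prior, feature_likelihoods):
    # Sum of log-probabilities instead of a product of probabilities.
    return math.log(prior) + sum(math.log(p) for p in feature_likelihoods)

score_pos = log_score(0.5, [0.1, 0.9, 0.9])   # log(0.0405)
score_neg = log_score(0.5, [0.2, 0.3, 0.3])   # log(0.009)
print(score_pos, score_neg)
print(score_pos > score_neg)   # True -- same decision as before, no underflow risk
```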
Relevant Issues
• Continuous-valued input attributes
  – Infinitely many possible values for an attribute
  – Conditional probability modeled with the normal distribution:

      P̂(Xj | C = ci) = 1 / (sqrt(2π) σji) · exp( −(Xj − μji)² / (2 σji²) )

    μji : mean (average) of the attribute values Xj of examples for which C = ci
    σji : standard deviation of the attribute values Xj of examples for which C = ci
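A minimal sketch of that Gaussian class-conditional; the mean and standard deviation would be estimated per class from training data (the values used below are placeholders):

```python
import math

def gaussian_likelihood(x, mu, sigma):
    """P-hat(Xj = x | C = ci) under a normal with class mean mu and std sigma."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)

# E.g., the likelihood of a height of 6 feet under a hypothetical class whose
# heights have mean 5.85 ft and standard deviation 0.19 ft:
print(gaussian_likelihood(6.0, 5.85, 0.19))
```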
sex      height (feet)   weight (lbs)   foot size (inches)
sample   6               130            8

Since the posterior numerator is greater in the female case, we predict the sample is female.
Advantages/Disadvantages of Naïve Bayes
• Advantages:
– Fast to train (single scan). Fast to classify
– Not sensitive to irrelevant features
– Handles real and discrete data
– Handles streaming data well
• Disadvantages:
– Assumes independence of features
Conclusions
Naïve Bayes is based on the independence assumption
Training is very easy and fast; it just requires considering each attribute in each class separately
Testing is straightforward; it just requires looking up tables or calculating conditional probabilities with normal distributions
It is a popular generative model
Performance is competitive with most state-of-the-art classifiers even when the independence assumption is violated
Many successful applications, e.g., spam mail filtering
Apart from classification, naïve Bayes can do more…
Conclusions
• Naïve Bayes is:
– Really easy to implement and often works well
– Often a good first thing to try
– Commonly used as a “punching bag” for smarter
algorithms
Acknowledgements
Material in these slides has been taken from the following resources:
Introduction to Machine Learning, E. Alpaydin
Statistical Pattern Recognition: A Review – A. K. Jain et al., IEEE Trans. PAMI 22(1), 2000