
Republic of the Philippines

Province of Cotabato
Municipality of Makilala
MAKILALA INSTITUTE OF SCIENCE AND TECHNOLOGY
Makilala, Cotabato

COLLEGE OF TECHNOLOGY AND INFORMATION SYSTEMS


Bachelor of Science in Information Systems

Course Number : PROF EL 3
Course Description : DATA MINING
Credit Units : 3 units (3 hours lecture; 2 hours laboratory)
Module Number : 7
Duration : 2 Weeks
Instructor : RONALD L. BAJADOR
Mobile Number : +639075182943
Email Address : [email protected]
I. LEARNING OUTCOMES

Upon completion of this material, you should be able to:


 discuss an overview of the Naive Bayes Algorithm
 familiarize yourself with the formulas and equations for basic probability
 generate a probabilistic model for classification
 apply the Naive Bayes Algorithm to a given data set
 discuss the advantages and disadvantages of the Naive Bayes Algorithm

II. TOPIC(S) - Naive Bayes Algorithm


Lesson 1: Naive Bayes Algorithm Background
Lesson 2: Probability Basics
Lesson 3: Probabilistic Classification
Lesson 4: Naive Bayes and Its Example
Lesson 5: Advantages and Disadvantages of the Naive Bayes Algorithm

III. REFERENCES

 Main Textbook

 Tan, P.-N., Steinbach, M., Karpatne, A., & Kumar, V. (2019). Introduction to Data Mining, 2nd Edition.

 Han, J., Kamber, M., & Pei, J. (2015). Data Mining: Concepts and Techniques, 3rd Edition.

 Witten, I. H., Frank, E., Hall, M. A., & Pal, C. J. (2016). Data Mining: Practical Machine Learning Tools and Techniques, 4th Edition.

IV. COURSE CONTENT

Lesson 1: Naive Bayes Algorithm Background

• Naive Bayes is a classification technique based on Bayes’ Theorem with an assumption of independence among predictors.
• It is a probabilistic machine learning algorithm used in a wide variety of classification tasks.
• The Naive Bayes classifier assumes that the presence of a particular feature in a class is unrelated to the presence of any other feature.
- For example, a fruit may be considered an apple if it is red, round, and about 3 inches in diameter.


- Even if these features depend on each other or on the existence of the other features, all of these properties independently contribute to the probability that the fruit is an apple, which is why the method is called ‘naive’.

• There are three methods to establish a classifier:

a) Model a classification rule directly
   Examples: k-NN, decision trees, perceptron, SVM
b) Model the probability of class memberships given input data
   Example: multi-layer perceptron with the cross-entropy cost
c) Make a probabilistic model of the data within each class
   Examples: naive Bayes, model-based classifiers

• a) and b) are examples of discriminative classification
• c) is an example of generative classification
• b) and c) are both examples of probabilistic classification

Lesson 2: Probability Basics

• Prior, conditional, and joint probability
– Prior probability: P(X)
– Conditional probability: P(X1|X2), P(X2|X1)
– Joint probability: X = (X1, X2), P(X) = P(X1, X2)
– Relationship: P(X1, X2) = P(X2|X1)P(X1) = P(X1|X2)P(X2)
– Independence: P(X2|X1) = P(X2), P(X1|X2) = P(X1), P(X1, X2) = P(X1)P(X2)

• Bayesian rule

P(C|X) = P(X|C)P(C) / P(X), i.e. Posterior = (Likelihood × Prior) / Evidence
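To make the rule concrete, here is a minimal Python sketch; the three input numbers are hypothetical, chosen only for illustration:

def posterior(likelihood, prior, evidence):
    # Bayes' rule: P(C|X) = P(X|C) * P(C) / P(X)
    return likelihood * prior / evidence

# Hypothetical values: P(X|C) = 0.6, P(C) = 0.3, P(X) = 0.45
print(posterior(0.6, 0.3, 0.45))  # -> 0.4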


Lesson 3: Probabilistic Classification

• Establishing a probabilistic model for classification
– Discriminative model: P(C|X), with C = c1, ..., cL and X = (X1, ..., Xn)
– Generative model: P(X|C), with C = c1, ..., cL and X = (X1, ..., Xn)

• MAP classification rule
– MAP: Maximum A Posteriori
– Assign x to c* if P(C = c*|X = x) > P(C = c|X = x) for every c ≠ c*, c = c1, ..., cL

• Generative classification with the MAP rule
– Apply the Bayesian rule to convert: P(C|X) = P(X|C)P(C) / P(X) ∝ P(X|C)P(C)

[Figure: class-conditional feature histograms P(x|C1) and P(x|C2), and the resulting posterior probability P(C|x), which ranges between 0 and 1 as a function of x.]
Lesson 4: Naive Bayes
 Bayes’ Theorem is a simple mathematical formula used for calculating conditional probabilities.
 Conditional probability is a measure of the probability of an event occurring given that another event has (by assumption, presumption, assertion, or evidence) occurred.

The formula is:

P(A|B) = P(B|A)P(A) / P(B)

 This tells us how often A happens given that B happens, written P(A|B) and also called the posterior probability, when we know: how often B happens given that A happens, written P(B|A); how likely A is on its own, written P(A); and how likely B is on its own, written P(B).

 Bayes classification: P(C|X) ∝ P(X|C)P(C) = P(X1, ..., Xn|C)P(C)

Difficulty: learning the joint probability P(X1, ..., Xn|C)

 Naïve Bayes classification
– Make the assumption that all input attributes are conditionally independent:
P(X1, X2, ..., Xn|C) = P(X1|X2, ..., Xn; C) P(X2, ..., Xn|C)
                     = P(X1|C) P(X2, ..., Xn|C)
                     = P(X1|C) P(X2|C) ... P(Xn|C)

– MAP classification rule: assign x = (x1, ..., xn) to c* if
[P(x1|c*) ... P(xn|c*)] P(c*) > [P(x1|c) ... P(xn|c)] P(c), for every c ≠ c*, c = c1, ..., cL
 Naïve Bayes Algorithm (for discrete input attributes)
– Learning Phase: Given a training set S,
  for each target value ci (ci = c1, ..., cL):
    P^(C = ci) ← estimate P(C = ci) with examples in S
    for every attribute value ajk of each attribute xj (j = 1, ..., n; k = 1, ..., Nj):
      P^(Xj = ajk|C = ci) ← estimate P(Xj = ajk|C = ci) with examples in S
  Output: conditional probability tables; for xj, Nj × L elements
– Test Phase: Given an unknown instance X' = (a'1, ..., a'n),
  look up the tables to assign the label c* to X' if
  [P^(a'1|c*) ... P^(a'n|c*)] P^(c*) > [P^(a'1|c) ... P^(a'n|c)] P^(c), for every c ≠ c*, c = c1, ..., cL
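A minimal Python sketch of the two phases above for discrete attributes; the function and variable names are ours, for illustration only:

from collections import Counter, defaultdict

def learn(S):
    # Learning phase: S is a list of (attributes, class_label) pairs.
    class_counts = Counter(c for _, c in S)
    value_counts = defaultdict(Counter)  # (attribute index j, class) -> counts of values
    for attrs, c in S:
        for j, a in enumerate(attrs):
            value_counts[(j, c)][a] += 1
    priors = {c: n / len(S) for c, n in class_counts.items()}          # P^(C = ci)
    cond = {jc: {a: n / class_counts[jc[1]] for a, n in cnt.items()}   # P^(Xj = ajk | C = ci)
            for jc, cnt in value_counts.items()}
    return priors, cond

def classify(priors, cond, x):
    # Test phase: MAP rule -- choose the class maximizing P^(c) * prod_j P^(x_j | c).
    def score(c):
        s = priors[c]
        for j, a in enumerate(x):
            s *= cond.get((j, c), {}).get(a, 0.0)  # unseen value -> "zero frequency" problem
        return s
    return max(priors, key=score)

Applied to the play-tennis records, learn() would reproduce the probability tables used in Example 1 below.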


Example 1: Play Tennis

• Learning Phase

P(Play = Yes) = 9/14        P(Play = No) = 5/14

• Test Phase
– Given a new instance,
  x' = (Outlook = Sunny, Temperature = Cool, Humidity = High, Wind = Strong)
– Look up tables:
  P(Outlook = Sunny|Play = Yes) = 2/9      P(Outlook = Sunny|Play = No) = 3/5
  P(Temperature = Cool|Play = Yes) = 3/9   P(Temperature = Cool|Play = No) = 1/5
  P(Humidity = High|Play = Yes) = 3/9      P(Humidity = High|Play = No) = 4/5
  P(Wind = Strong|Play = Yes) = 3/9        P(Wind = Strong|Play = No) = 3/5
  P(Play = Yes) = 9/14                     P(Play = No) = 5/14
– MAP rule
  P(Yes|x') ∝ [P(Sunny|Yes) P(Cool|Yes) P(High|Yes) P(Strong|Yes)] P(Play = Yes) = 0.0053
  P(No|x') ∝ [P(Sunny|No) P(Cool|No) P(High|No) P(Strong|No)] P(Play = No) = 0.0206

Given the fact that P(Yes|x') < P(No|x'), we label x' as "No".
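The two scores can be verified directly with the table values above:

# Posterior scores for x', up to the common normalizer P(x'):
p_yes = (2/9) * (3/9) * (3/9) * (3/9) * (9/14)
p_no  = (3/5) * (1/5) * (4/5) * (3/5) * (5/14)
print(round(p_yes, 4), round(p_no, 4))  # -> 0.0053 0.0206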
Example 2: Car Stolen

 Frequency and likelihood tables for all three predictors (the tables themselves are not reproduced in this copy):

 Frequency and likelihood tables of ‘Color’

 Frequency and likelihood tables of ‘Type’

 Frequency and likelihood tables of ‘Origin’

So in this example we have 3 predictors in X.

 As per the equations discussed above, we can calculate the posterior scores as:

P(Yes|X) ∝ P(Red|Yes) P(SUV|Yes) P(Domestic|Yes) = 0.048
P(No|X) ∝ P(Red|No) P(SUV|No) P(Domestic|No) = 0.144

Since 0.144 > 0.048, given the features Red, SUV, and Domestic, the example is classified as "No": the car is not stolen.
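As a check, the sketch below recomputes both scores. Since the frequency tables are not reproduced in this copy, the ten-row training set used here is the commonly circulated version of this example and should be treated as an assumption:

# Assumed training data: (Color, Type, Origin, Stolen?)
data = [
    ("Red", "Sports", "Domestic", "Yes"), ("Red", "Sports", "Domestic", "No"),
    ("Red", "Sports", "Domestic", "Yes"), ("Yellow", "Sports", "Domestic", "No"),
    ("Yellow", "Sports", "Imported", "Yes"), ("Yellow", "SUV", "Imported", "No"),
    ("Yellow", "SUV", "Imported", "Yes"), ("Yellow", "SUV", "Domestic", "No"),
    ("Red", "SUV", "Imported", "No"), ("Red", "Sports", "Imported", "Yes"),
]

def likelihood(col, value, label):
    # P(attribute value | class label), estimated by counting rows of that class
    rows = [r for r in data if r[3] == label]
    return sum(r[col] == value for r in rows) / len(rows)

x = ("Red", "SUV", "Domestic")
for label in ("Yes", "No"):
    score = 1.0
    for col, value in enumerate(x):
        score *= likelihood(col, value, label)
    print(label, round(score, 3))  # -> Yes 0.048, No 0.144

With this assumed data the two class priors are equal (5/10 each), so they cancel in the comparison and only the likelihood products matter.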

Lesson 5: Advantages and Disadvantages of the Naive Bayes Algorithm


 Advantages

 It is easy and fast to predict the class of a test data set, and it also performs well in multi-class prediction.
 When the assumption of independence holds, a Naive Bayes classifier performs better compared to other models such as logistic regression, and it needs less training data.
 It performs well with categorical input variables compared to numerical variables. For numerical variables, a normal distribution is assumed (a bell curve, which is a strong assumption).

 Disadvantages

 If a categorical variable has a category in the test data set that was not observed in the training data set, the model will assign it a probability of zero and will be unable to make a prediction. This is often known as "zero frequency". To solve it, we can use a smoothing technique; one of the simplest is Laplace estimation (see the sketch after this list).
 Naive Bayes is also known to be a bad estimator, so the probability outputs from predict_proba should not be taken too seriously.
 Another limitation of Naive Bayes is the assumption of independent predictors. In real life, it is almost impossible to get a set of predictors that are completely independent.
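A minimal sketch of the Laplace (add-one) estimation mentioned above; the counts in the example call are made up for illustration:

def laplace_estimate(count, class_total, n_values, alpha=1.0):
    # Smoothed P^(Xj = ajk | C = ci): add alpha to every count so that a value
    # unseen in training (count = 0) never receives exactly zero probability.
    return (count + alpha) / (class_total + alpha * n_values)

# A value never observed with this class still gets a small, nonzero estimate:
print(laplace_estimate(0, 9, 3))  # -> 0.0833... instead of 0.0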

V. ACTIVITY/EXERCISES/EVALUATION

1. The following table lists decision-making factors for buying a computer. Apply the Naive Bayes Algorithm to classify.

Age     Income   Student   Credit Rating   Buy Computer
<=30    High     No        Fair            No
<=30    High     No        Excellent       No
31-40   High     No        Fair            Yes
>40     Medium   No        Fair            Yes
>40     Low      Yes       Fair            Yes
>40     Low      Yes       Excellent       No
31-40   Low      Yes       Excellent       Yes
<=30    Medium   No        Fair            No
<=30    Low      Yes       Fair            Yes
>40     Medium   Yes       Fair            Yes
<=30    Medium   Yes       Excellent       Yes
31-40   Medium   No        Excellent       Yes
31-40   High     Yes       Fair            Yes
>40     Medium   No        Excellent       No

2. Discuss some advantages and disadvantages of the Naive Bayes Algorithm.

