ML Lab 7 - Naive Bayes
Naive Bayes
Machine Learning
BITS F464
I Semester 2023-24
The Naive Bayes (NB) algorithm is a simple classifier built directly on Bayes' theorem. It is
called "naive" because it makes a couple of simplifying assumptions about the data that rarely
hold exactly in practice:
1. All of the features in the dataset are equally important and independent.
2. It assumes class-conditional independence, which means that events are independent so long
as they are conditioned on the same class value.
Consider the sample dataset below, which comprises the target concept PlayTennis and four
features: Outlook, Temperature, Humidity, and Windy.
The Naive Bayes classification algorithm used in the preceding example can be summarized
by the following formula. The probability of level L for class C, given the evidence provided by
features F1 through Fn, is proportional to the prior probability of the class level multiplied by
the product of the probabilities of each piece of evidence conditioned on the class level.
Dividing this quantity by its sum over all class levels rescales it into a probability.
$$P(C_L \mid F_1, \ldots, F_n) \propto P(C_L) \prod_{i=1}^{n} P(F_i \mid C_L)$$
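The formula can be checked by hand with base R. The sketch below is only an illustration: the data frame `toy` and its values are made up for this example and are not the lab's WeatherData.csv, and only two features are used to keep it short.

```r
# Hypothetical toy data (an assumption for illustration, NOT WeatherData.csv)
toy <- data.frame(
  Outlook    = c("sunny", "sunny", "rain", "rain", "overcast", "rain"),
  Windy      = c("weak", "strong", "weak", "strong", "weak", "weak"),
  PlayTennis = c("no", "no", "yes", "no", "yes", "yes"),
  stringsAsFactors = TRUE
)

# Prior P(C_L): relative frequency of each class level
prior <- prop.table(table(toy$PlayTennis))

# Conditional tables P(F_i | C_L): one row per class level, each row sums to 1
cond_outlook <- prop.table(table(toy$PlayTennis, toy$Outlook), margin = 1)
cond_windy   <- prop.table(table(toy$PlayTennis, toy$Windy), margin = 1)

# Unnormalised score P(C_L) * prod_i P(F_i | C_L) for X = (Outlook = rain, Windy = weak)
score <- prior * cond_outlook[, "rain"] * cond_windy[, "weak"]

# Dividing by the sum over class levels turns the scores into posterior probabilities
posterior <- score / sum(score)
posterior
```

The class level with the largest entry in `posterior` is the predicted class for X.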
# Load the e1071 package (run install.packages("e1071") once if it is not yet installed)
library("e1071")
#Load the dataset
data = read.csv("Lab3/WeatherData.csv",stringsAsFactors = FALSE)
data
#Display the structure of dataset
str(data)
#Encode the target vector as factor
data$PlayTennis <- factor(data$PlayTennis)
#Display the frequency of each target class type
table(data$PlayTennis)
#Display the probability of each target class type
prop.table(table(data$PlayTennis))
#Build the model from the features only; the target column must not be passed as a predictor
classifier <- naiveBayes(PlayTennis ~ ., data = data)
classifier
#Generate test sample (X)
Outlook <- "rain"
Temperature <- "mild"
Humidity <- "high"
Windy <- "strong"
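#Load the instance X as dataframe. Each feature reuses the sorted values seen in training
#as its factor levels, because predict() looks up the conditional probability tables
#by level position
test_data <- data.frame(
  Outlook = factor(Outlook, levels = sort(unique(as.character(data$Outlook)))),
  Temperature = factor(Temperature, levels = sort(unique(as.character(data$Temperature)))),
  Humidity = factor(Humidity, levels = sort(unique(as.character(data$Humidity)))),
  Windy = factor(Windy, levels = sort(unique(as.character(data$Windy))))
)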
#Predict the target class for a given instance X
test_pred <- predict(classifier,test_data)
test_pred
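By default predict() returns only the winning class. A minimal sketch of how the posterior probabilities themselves might be inspected is given below, assuming the e1071 package is installed. The data frame `toy`, the object names `model` and `new_x`, and all values are hypothetical and stand in for the lab's dataset.

```r
library(e1071)

# Hypothetical toy data (an assumption for illustration, NOT WeatherData.csv)
toy <- data.frame(
  Outlook    = c("sunny", "sunny", "rain", "rain", "overcast", "rain"),
  Windy      = c("weak", "strong", "weak", "strong", "weak", "weak"),
  PlayTennis = c("no", "no", "yes", "no", "yes", "yes"),
  stringsAsFactors = TRUE
)

# Fit the classifier on the two features
model <- naiveBayes(PlayTennis ~ ., data = toy)

# New instance; its factor levels are copied from the training data
new_x <- data.frame(
  Outlook = factor("rain", levels = levels(toy$Outlook)),
  Windy   = factor("weak", levels = levels(toy$Windy))
)

# type = "class" (the default) gives the most probable class level
pred_class <- predict(model, new_x)

# type = "raw" gives one posterior probability per class level
pred_raw <- predict(model, new_x, type = "raw")

pred_class
pred_raw
```

Each row of `pred_raw` sums to 1, and the column with the largest value corresponds to `pred_class`.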
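Laplace smoothing adds a small count to every feature value, so that a value never observed
with a class does not force the whole product of probabilities to zero. The smoothed estimate is
$$L_i = \frac{C_i + S}{N + S \cdot K}$$
where
C_i: count of tuples satisfying the test condition
S: Laplacian parameter (a small number added to each count), usually 1
N: total number of tuples belonging to that class value
K: count of distinct values of the particular feature
With S = 1 this reduces to (C_i + 1) / (N + K).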
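The smoothed estimate can be written as a small base R helper. This is only a sketch: the function name `laplace_estimate` and the counts used below are hypothetical. The denominator uses S times K so that the estimates over all K values of a feature sum to 1; with the usual S = 1 this is the same as N + K.

```r
# Laplace-smoothed estimate of P(feature value | class)
#   count: tuples of the class that have this feature value (C_i)
#   n:     total tuples of the class (N)
#   k:     number of distinct values of the feature (K)
#   s:     Laplacian parameter (S), 1 by default
laplace_estimate <- function(count, n, k, s = 1) {
  (count + s) / (n + s * k)
}

# Hypothetical counts: a feature with 3 distinct values, observed 0, 2 and 3 times
# among the 5 tuples of one class
counts <- c(0, 2, 3)
smoothed <- laplace_estimate(counts, n = sum(counts), k = length(counts))
smoothed
```

The value that was never observed now receives a small non-zero probability instead of 0, while the three estimates still sum to 1.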
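#Load the e1071 package
library("e1071")
#Load the dataset
data1 <- read.csv("Lab3/WeatherData.csv", stringsAsFactors = FALSE)
data1
#Display the structure of dataset
str(data1)
#Encode the Humidity feature as factor
data1$Humidity <- factor(data1$Humidity)
#Display the frequency of each Humidity value
table(data1$Humidity)
#Display the proportion of each Humidity value
prop.table(table(data1$Humidity))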
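The e1071 package can apply this smoothing itself through the `laplace` argument of naiveBayes(). The sketch below assumes e1071 is installed and again uses a made-up data frame `toy` rather than the lab's file. The features are stored as factors so that the number of distinct values K is known to the model.

```r
library(e1071)

# Hypothetical toy data (an assumption for illustration, NOT WeatherData.csv).
# "overcast" never occurs together with PlayTennis = "no".
toy <- data.frame(
  Outlook    = c("sunny", "sunny", "rain", "rain", "overcast", "rain"),
  Windy      = c("weak", "strong", "weak", "strong", "weak", "weak"),
  PlayTennis = c("no", "no", "yes", "no", "yes", "yes"),
  stringsAsFactors = TRUE
)

# Without smoothing the unseen combination gets probability 0
plain <- naiveBayes(PlayTennis ~ ., data = toy)
p_plain <- plain$tables$Outlook["no", "overcast"]

# With laplace = 1 every count is increased by 1 before normalising
smooth <- naiveBayes(PlayTennis ~ ., data = toy, laplace = 1)
p_smooth <- smooth$tables$Outlook["no", "overcast"]

p_plain
p_smooth
```

Here N = 3 tuples have PlayTennis = "no" and Outlook has K = 3 distinct values, so the smoothed entry equals (0 + 1) / (3 + 3).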