Ex.no.5_Naïve Bayesian classifier
Description:
The main idea behind the Naive Bayes classifier is to use Bayes' theorem to classify data
based on the probability of each class given the features of a data point. It is used mostly
for high-dimensional problems such as text classification. The Naive Bayes classifier is a simple
probabilistic classifier with very few parameters, so models built with it can be trained and make
predictions faster than many other classification algorithms. Naive Bayes is called “naive” because
it assumes that the features of a data point are independent of each other given the class. The
Naïve Bayes algorithm is used in spam filtering, sentiment analysis, article classification, etc.
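Written out, the “naive” independence assumption lets the posterior for a class y with features x_1, ..., x_n factor into one term per feature (y and x_i are notation introduced here for illustration):

```latex
\begin{aligned}
P(y \mid x_1, \dots, x_n) &= \frac{P(y)\,\prod_{i=1}^{n} P(x_i \mid y)}{P(x_1, \dots, x_n)}
  \;\propto\; P(y)\,\prod_{i=1}^{n} P(x_i \mid y) \\
\hat{y} &= \arg\max_{y} \; P(y)\,\prod_{i=1}^{n} P(x_i \mid y)
\end{aligned}
```

Because the denominator is the same for every class, it can be dropped when only the most probable class is needed.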
Bayes' Theorem:
Bayes' theorem, also known as Bayes' rule or Bayes' law, is used to determine the probability
of a hypothesis given prior knowledge. It is based on conditional probability.
The formula for Bayes' theorem is:
$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$
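where:
P(A|B) is the posterior probability: the probability of hypothesis A given the observed event B.
P(B|A) is the likelihood: the probability of the evidence B given that hypothesis A is true.
P(A) is the prior probability: the probability of hypothesis A before observing the evidence.
P(B) is the marginal probability: the probability of the evidence B.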
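As a quick numerical check of the formula, the sketch below computes P(A|B) for a yes/no hypothesis, obtaining P(B) from the law of total probability. The probabilities are made-up values for illustration, and `bayes_posterior` and `total_probability` are hypothetical helper names, not part of any library.

```python
def bayes_posterior(likelihood: float, prior: float, evidence: float) -> float:
    """Return P(A|B) = P(B|A) * P(A) / P(B)."""
    if evidence <= 0:
        raise ValueError("P(B) must be positive")
    return likelihood * prior / evidence


def total_probability(likelihood: float, prior: float, likelihood_not: float) -> float:
    """P(B) = P(B|A) P(A) + P(B|not A) P(not A) for a binary hypothesis A."""
    return likelihood * prior + likelihood_not * (1.0 - prior)


# Hypothetical values, for illustration only:
# P(A) = 0.3, P(B|A) = 0.8, P(B|not A) = 0.2
p_b = total_probability(0.8, 0.3, 0.2)       # 0.24 + 0.14 = 0.38
posterior = bayes_posterior(0.8, 0.3, p_b)   # 0.24 / 0.38
print(round(posterior, 4))                   # → 0.6316
```

Observing B raises the probability of A from the prior 0.3 to about 0.63, because B is four times as likely when A is true.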
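The Naive Bayes classifier calculates the probability of an event in the following steps:
Step 1: Calculate the prior probability of each class label.
Step 2: Find the likelihood of each feature value for each class.
Step 3: Put these values into Bayes' formula to calculate the posterior probability of each class.
Step 4: Assign the input to the class with the highest posterior probability.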
For example, consider a dataset of weather conditions and whether sports were played. The task
is to classify whether the players will play or not, based on the weather condition.
The frequency table contains the number of occurrences of each label for every feature value.
There are two likelihood tables: Likelihood Table 1 shows the prior probabilities of the labels,
and Likelihood Table 2 shows the posterior probabilities.
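The counts behind such tables can be produced with a few counters. The sketch below assumes a small illustrative set of 14 (outlook, play) records (the classic “play tennis” toy data, not the table from the original figure), so the numbers are for demonstration only. It builds the frequency table, the label priors P(play), and the conditional probabilities P(outlook | play) that feed Bayes' theorem.

```python
from collections import Counter

# Illustrative (outlook, play) records; an assumption, not the exercise's original table.
records = [
    ("Sunny", "No"), ("Sunny", "No"), ("Overcast", "Yes"), ("Rain", "Yes"),
    ("Rain", "Yes"), ("Rain", "No"), ("Overcast", "Yes"), ("Sunny", "No"),
    ("Sunny", "Yes"), ("Rain", "Yes"), ("Sunny", "Yes"), ("Overcast", "Yes"),
    ("Overcast", "Yes"), ("Rain", "No"),
]
n = len(records)

# Frequency table: how often each (outlook, label) pair occurs.
frequency = Counter(records)

# Prior probability of each label, P(play).
label_counts = Counter(play for _, play in records)
prior = {label: count / n for label, count in label_counts.items()}

# Conditional probability of each outlook within each label, P(outlook | play).
outlooks = sorted({outlook for outlook, _ in records})
likelihood = {
    (outlook, label): frequency[(outlook, label)] / label_counts[label]
    for outlook in outlooks
    for label in label_counts
}

# Print the frequency table one outlook per row.
for outlook in outlooks:
    print(outlook, {label: frequency[(outlook, label)] for label in sorted(label_counts)})
print("P(play):", prior)
```

`Counter` returns 0 for pairs that never occur, which is why the (Overcast, No) cell needs no special handling.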
Suppose you want to calculate the probability of playing when the weather is overcast.
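Probability of playing:
$$P(\text{Yes} \mid \text{Overcast}) = \frac{P(\text{Overcast} \mid \text{Yes})\,P(\text{Yes})}{P(\text{Overcast})}$$
Probability of not playing:
$$P(\text{No} \mid \text{Overcast}) = \frac{P(\text{Overcast} \mid \text{No})\,P(\text{No})}{P(\text{Overcast})}$$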
The probability of the 'Yes' class is higher, so if the weather is overcast, the players
will play the sport.
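The four steps of the classifier (estimate priors, estimate likelihoods, combine them with Bayes' theorem, pick the most probable class) can be sketched as a small categorical Naive Bayes. This is a minimal illustration without smoothing; `train` and `predict_proba` are hypothetical names, and the outlook/play lists are illustrative toy data (the classic “play tennis” set), not the exercise's original table.

```python
from collections import Counter, defaultdict


def train(rows, labels):
    """Steps 1-2: priors P(c) and per-feature counts used for P(x_i = v | c)."""
    n = len(labels)
    class_counts = Counter(labels)
    priors = {c: k / n for c, k in class_counts.items()}
    # counts[c][i][v] = number of class-c rows whose i-th feature equals v
    counts = defaultdict(lambda: defaultdict(Counter))
    for row, c in zip(rows, labels):
        for i, v in enumerate(row):
            counts[c][i][v] += 1
    return priors, counts, class_counts


def predict_proba(x, priors, counts, class_counts):
    """Step 3: P(c) * prod_i P(x_i | c) for each class, normalized to sum to 1."""
    scores = {}
    for c, prior in priors.items():
        score = prior
        for i, v in enumerate(x):
            score *= counts[c][i][v] / class_counts[c]  # no smoothing: unseen values give 0
        scores[c] = score
    total = sum(scores.values())  # the evidence P(x) under the naive model
    if total == 0:
        return scores  # every class scored 0, so there is nothing to normalize
    return {c: s / total for c, s in scores.items()}


# Illustrative one-feature data (outlook -> play); an assumption, not the original table.
outlook = ["Sunny", "Sunny", "Overcast", "Rain", "Rain", "Rain", "Overcast",
           "Sunny", "Sunny", "Rain", "Sunny", "Overcast", "Overcast", "Rain"]
play = ["No", "No", "Yes", "Yes", "Yes", "No", "Yes",
        "No", "Yes", "Yes", "Yes", "Yes", "Yes", "No"]

priors, counts, class_counts = train([(o,) for o in outlook], play)
posterior = predict_proba(("Overcast",), priors, counts, class_counts)
prediction = max(posterior, key=posterior.get)  # Step 4: most probable class
print(posterior, prediction)  # {'No': 0.0, 'Yes': 1.0} Yes
```

With a single feature this reduces exactly to Bayes' theorem: on this toy data P(Yes | Overcast) = (4/9)(9/14)/(4/14) = 1. Without smoothing, a feature value never seen with a class zeroes that class's score, which is why P(No | Overcast) is exactly 0; Laplace (add-one) smoothing is the usual remedy.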
Problem statements:
2. Given a dataset of breast cancer patient records, the task is to build a Naïve Bayes
Classifier to predict whether a tumor is malignant or benign based on various features
such as tumor size, texture, and other medical measurements.
Breast Cancer dataset
mean_radius  mean_texture  mean_perimeter  mean_area  mean_smoothness  diagnosis
17.99        10.38         122.8           1001       0.1184           0
20.57        17.77         132.9           1326       0.08474          0
19.69        21.25         130             1203       0.1096           0
11.42        20.38         77.58           386.1      0.1425           0
20.29        14.34         135.1           1297       0.1003           0
12.45        15.7          82.57           477.1      0.1278           0
16.13        20.68         108.1           798.8      0.117            0
19.81        22.15         130             1260       0.09831          0
13.54        14.36         87.46           566.3      0.09779          1
13.08        15.71         85.63           520        0.1075           1
9.504        12.44         60.34           273.9      0.1024           1
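For continuous measurements like these, a common choice is Gaussian Naive Bayes, which models each feature within each class as a normal distribution. The sketch below fits one from scratch on the 11 sample rows above using only the standard library. It is an illustration, not a validated model: 11 rows are far too few and there is no train/test split. The helper names `fit`, `log_gaussian`, and `predict_proba` are hypothetical, the `1e-9` variance floor is an assumed safeguard, and the diagnosis column is assumed to follow scikit-learn's encoding of the Wisconsin data (0 = malignant, 1 = benign), since the table itself does not say.

```python
import math
from statistics import fmean, pvariance

# Rows copied from the table above:
# (mean_radius, mean_texture, mean_perimeter, mean_area, mean_smoothness), diagnosis
data = [
    ((17.99, 10.38, 122.8, 1001, 0.1184), 0),
    ((20.57, 17.77, 132.9, 1326, 0.08474), 0),
    ((19.69, 21.25, 130, 1203, 0.1096), 0),
    ((11.42, 20.38, 77.58, 386.1, 0.1425), 0),
    ((20.29, 14.34, 135.1, 1297, 0.1003), 0),
    ((12.45, 15.7, 82.57, 477.1, 0.1278), 0),
    ((16.13, 20.68, 108.1, 798.8, 0.117), 0),
    ((19.81, 22.15, 130, 1260, 0.09831), 0),
    ((13.54, 14.36, 87.46, 566.3, 0.09779), 1),
    ((13.08, 15.71, 85.63, 520, 0.1075), 1),
    ((9.504, 12.44, 60.34, 273.9, 0.1024), 1),
]


def fit(data, var_floor=1e-9):
    """Estimate a prior and per-feature mean/variance for each class."""
    by_class = {}
    for x, y in data:
        by_class.setdefault(y, []).append(x)
    model = {}
    for y, rows in by_class.items():
        columns = list(zip(*rows))  # one tuple per feature
        means = [fmean(col) for col in columns]
        variances = [pvariance(col) + var_floor for col in columns]  # floor avoids zero variance
        model[y] = (len(rows) / len(data), means, variances)
    return model


def log_gaussian(v, mu, var):
    """Log of the normal density N(v; mu, var)."""
    return -0.5 * math.log(2 * math.pi * var) - (v - mu) ** 2 / (2 * var)


def predict_proba(model, x):
    """Posterior P(class | x), computed in log space for numerical stability."""
    log_scores = {
        y: math.log(prior) + sum(log_gaussian(v, m, s) for v, m, s in zip(x, means, variances))
        for y, (prior, means, variances) in model.items()
    }
    top = max(log_scores.values())
    weights = {y: math.exp(s - top) for y, s in log_scores.items()}  # log-sum-exp trick
    total = sum(weights.values())
    return {y: w / total for y, w in weights.items()}


model = fit(data)
for x, y in data:
    proba = predict_proba(model, x)
    print("true:", y, "predicted:", max(proba, key=proba.get), "P(0):", round(proba[0], 3))
```

For real work on this problem, scikit-learn's `GaussianNB` (from `sklearn.naive_bayes`) together with a held-out test set would be the usual route; the from-scratch version only shows what that estimator computes.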