K-Means Clustering and Naive Bayes
K-Means Clustering is an unsupervised learning algorithm that groups an unlabeled dataset
into different clusters. Here, K defines the number of pre-defined clusters to be created in the
process; for example, if K=2 there will be two clusters, for K=3 there will be three clusters, and
so on.
• It is an iterative algorithm that divides the unlabeled dataset into K different clusters in
such a way that each data point belongs to only one group of points with similar properties.
• It allows us to cluster the data into different groups and provides a convenient way to
discover the categories of groups in the unlabeled dataset on its own, without the need for
any training.
• It is a centroid-based algorithm, where each cluster is associated with a centroid. The
main aim of the algorithm is to minimize the sum of distances between the data points and
their corresponding cluster centroids (written formally after this list).
• The algorithm takes the unlabeled dataset as input, divides it into K clusters, and repeats
the process until the best clusters are found. The value of K must be predetermined.
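The centroid-based objective from the list above can be stated formally. In its standard form, k-means minimizes the within-cluster sum of squared Euclidean distances, where C_i denotes the i-th cluster and μ_i its centroid:

```latex
J = \sum_{i=1}^{K} \sum_{x \in C_i} \lVert x - \mu_i \rVert^2
```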
The algorithm works in the following steps:
Step-1: Select the number K to decide how many clusters are to be formed.
Step-2: Select K random points as the initial centroids (they need not come from the input dataset).
Step-3: Assign each data point to its closest centroid using the Euclidean distance formula; this
forms the K predefined clusters.
Step-4: Calculate the variance and place a new centroid for each cluster, i.e. move each centroid
to the mean of the points assigned to it.
Step-5: Repeat the third step: reassign each data point to the new closest centroid of each cluster.
Step-6: If any point changed its cluster, go back to Step-4; otherwise the clusters have stabilized
and the model is ready.
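To make these steps concrete, here is a minimal Python sketch of the procedure using plain NumPy. The function name `kmeans` and its parameters are illustrative; because the initial centroids are chosen at random, different seeds can converge to different clusters.

```python
import numpy as np

def kmeans(points, k, max_iters=100, seed=0):
    """Minimal k-means sketch following the steps above."""
    rng = np.random.default_rng(seed)
    points = np.asarray(points, dtype=float)
    # Step-2: pick K random data points as the initial centroids.
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(max_iters):
        # Step-3: assign each point to its closest centroid (Euclidean distance).
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step-4: move each centroid to the mean of the points assigned to it
        # (keeping the old centroid if a cluster happens to be empty).
        new_centroids = np.array([
            points[labels == i].mean(axis=0) if np.any(labels == i) else centroids[i]
            for i in range(k)
        ])
        # Step-6: stop when no centroid moves, i.e. the clusters have stabilized.
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids
```

Since the initialization is random, the cluster labels may come out in a different order from one run to the next, and an unlucky seed can land in a different local optimum.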
Problem 1:
Now that we have discussed the algorithm, let us solve a numerical problem on k-means
clustering. You are given the following 15 points in the Cartesian coordinate system.
Point Coordinates
A1 (2,10)
A2 (2,6)
A3 (11,11)
A4 (6,9)
A5 (6,4)
A6 (1,2)
A7 (5,10)
A8 (4,9)
A9 (10,12)
A10 (7,5)
A11 (9,11)
A12 (4,6)
A13 (3,10)
A14 (3,8)
A15 (6,11)
Input Dataset
We are also given that we need to make 3 clusters, i.e. K=3. We will solve this problem
using the approach discussed below.
First, we will randomly choose 3 centroids from the given data. Let us consider A2 (2,6), A7 (5,10),
and A15 (6,11) as the centroids of the initial clusters. Hence, we will take centroid 1 = (2,6),
centroid 2 = (5,10), and centroid 3 = (6,11).
Now we will find the Euclidean distance between each point and each centroid, using
d = √((x₁ − x₂)² + (y₁ − y₂)²). Based on the minimum distance of each point from the centroids,
we will assign the points to a cluster. The distances of the given points from the centroids are
tabulated below.
Point        Distance from C1 (2, 6)   Distance from C2 (5, 10)   Distance from C3 (6, 11)   Assigned Cluster
A1 (2,10)    4.000                     3.000                      4.123                      Cluster 2
A2 (2,6)     0.000                     5.000                      6.403                      Cluster 1
A3 (11,11)   10.296                    6.083                      5.000                      Cluster 3
A4 (6,9)     5.000                     1.414                      2.000                      Cluster 2
A5 (6,4)     4.472                     6.083                      7.000                      Cluster 1
A6 (1,2)     4.123                     8.944                      10.296                     Cluster 1
A7 (5,10)    5.000                     0.000                      1.414                      Cluster 2
A8 (4,9)     3.606                     1.414                      2.828                      Cluster 2
A9 (10,12)   10.000                    5.385                      4.123                      Cluster 3
A10 (7,5)    5.099                     5.385                      6.083                      Cluster 1
A11 (9,11)   8.602                     4.123                      3.000                      Cluster 3
A12 (4,6)    2.000                     4.123                      5.385                      Cluster 1
A13 (3,10)   4.123                     2.000                      3.162                      Cluster 2
A14 (3,8)    2.236                     2.828                      4.243                      Cluster 1
A15 (6,11)   6.403                     1.414                      0.000                      Cluster 3
Results from the 1st iteration of k-means clustering
At this point, we have completed the first iteration of the k-means clustering algorithm and assigned
each point to a cluster. In the table above, you can observe that each point is assigned to the cluster
whose centroid is closest to it.
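As a quick sanity check, a few lines of Python reproduce A1's row of the table:

```python
import math

def euclidean(p, q):
    """Euclidean distance between two 2-D points."""
    return math.sqrt((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2)

a1 = (2, 10)
for name, c in [("C1 (2,6)", (2, 6)), ("C2 (5,10)", (5, 10)), ("C3 (6,11)", (6, 11))]:
    print(name, round(euclidean(a1, c), 3))
# Prints 4.0, 3.0, 4.123 -- A1 is closest to C2, so it joins cluster 2.
```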
Next, we calculate a new centroid for each cluster (verified in the sketch after this list):
• In cluster 1, we have 6 points, i.e. A2 (2,6), A5 (6,4), A6 (1,2), A10 (7,5), A12 (4,6), and
A14 (3,8). To calculate the new centroid for cluster 1, we find the mean of the x and y
coordinates of each point in the cluster. Hence, the new centroid for cluster 1 is (3.833,
5.167).
• In cluster 2, we have 5 points, i.e. A1 (2,10), A4 (6,9), A7 (5,10), A8 (4,9), and A13
(3,10). Hence, the new centroid for cluster 2 is (4, 9.6).
• In cluster 3, we have 4 points, i.e. A3 (11,11), A9 (10,12), A11 (9,11), and A15 (6,11).
Hence, the new centroid for cluster 3 is (9, 11.25).
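These centroid means can be verified with a short snippet (cluster memberships taken from the list above):

```python
import numpy as np

cluster1 = np.array([(2, 6), (6, 4), (1, 2), (7, 5), (4, 6), (3, 8)])  # A2, A5, A6, A10, A12, A14
cluster2 = np.array([(2, 10), (6, 9), (5, 10), (4, 9), (3, 10)])       # A1, A4, A7, A8, A13
cluster3 = np.array([(11, 11), (10, 12), (9, 11), (6, 11)])            # A3, A9, A11, A15

# Each new centroid is the mean of the x and y coordinates of its cluster's points.
for i, cluster in enumerate((cluster1, cluster2, cluster3), start=1):
    print(f"New centroid {i}:", cluster.mean(axis=0).round(3))
# New centroid 1: [3.833 5.167], new centroid 2: [4. 9.6], new centroid 3: [9. 11.25]
```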
Now that we have calculated new centroids for each cluster, we will calculate the distance of each
data point from the new centroids. Then, we will assign the points to clusters based on their
distance from the centroids. The results for this process have been given in the following table.
Point        Distance from C1 (3.833, 5.167)   Distance from C2 (4, 9.6)   Distance from C3 (9, 11.25)   Assigned Cluster
A1 (2,10)    5.169                             2.040                       7.111                         Cluster 2
A2 (2,6)     2.014                             4.118                       8.750                         Cluster 1
A3 (11,11)   9.241                             7.139                       2.016                         Cluster 3
A4 (6,9)     4.403                             2.088                       3.750                         Cluster 2
A5 (6,4)     2.461                             5.946                       7.846                         Cluster 1
A6 (1,2)     4.249                             8.171                       12.230                        Cluster 1
A7 (5,10)    4.972                             1.077                       4.191                         Cluster 2
A8 (4,9)     3.837                             0.600                       5.483                         Cluster 2
A9 (10,12)   9.204                             6.462                       1.250                         Cluster 3
A10 (7,5)    3.171                             5.492                       6.562                         Cluster 1
A11 (9,11)   7.792                             5.192                       0.250                         Cluster 3
A12 (4,6)    0.850                             3.600                       7.250                         Cluster 1
A13 (3,10)   4.904                             1.077                       6.129                         Cluster 2
A14 (3,8)    2.953                             1.887                       6.824                         Cluster 2
A15 (6,11)   6.223                             2.441                       3.010                         Cluster 2
Results from the 2nd iteration of k-means clustering
Now, we have completed the second iteration of the k-means clustering algorithm and assigned
each point to an updated cluster. In the table above, you can observe that each point is assigned
to the cluster with the closest new centroid.
Now, we will calculate the new centroid for each cluster for the third iteration.
• In cluster 1, we have 5 points, i.e. A2 (2,6), A5 (6,4), A6 (1,2), A10 (7,5), and A12 (4,6).
To calculate the new centroid for cluster 1, we find the mean of the x and y
coordinates of each point in the cluster. Hence, the new centroid for cluster 1 is (4, 4.6).
• In cluster 2, we have 7 points, i.e. A1 (2,10), A4 (6,9), A7 (5,10), A8 (4,9), A13 (3,10),
A14 (3,8), and A15 (6,11). Hence, the new centroid for cluster 2 is (4.143, 9.571).
• In cluster 3, we have 3 points, i.e. A3 (11,11), A9 (10,12), and A11 (9,11). Hence, the new
centroid for cluster 3 is (10, 11.333).
At this point, we have calculated new centroids for each cluster. Now, we will calculate the distance
of each data point from the new centroids. Then, we will assign the points to clusters based on
their distance from the centroids. The results for this process have been given in the following
table.
Point        Distance from C1 (4, 4.6)   Distance from C2 (4.143, 9.571)   Distance from C3 (10, 11.333)   Assigned Cluster
A1 (2,10)    5.758                       2.186                             8.110                           Cluster 2
A2 (2,6)     2.441                       4.165                             9.615                           Cluster 1
A3 (11,11)   9.485                       7.004                             1.054                           Cluster 3
A4 (6,9)     4.833                       1.943                             4.631                           Cluster 2
A5 (6,4)     2.088                       5.873                             8.353                           Cluster 1
A6 (1,2)     3.970                       8.198                             12.966                          Cluster 1
A7 (5,10)    5.492                       0.958                             5.175                           Cluster 2
A8 (4,9)     4.400                       0.589                             6.438                           Cluster 2
A9 (10,12)   9.527                       6.341                             0.667                           Cluster 3
A10 (7,5)    3.027                       5.390                             7.008                           Cluster 1
A11 (9,11)   8.122                       5.063                             1.054                           Cluster 3
A12 (4,6)    1.400                       3.574                             8.028                           Cluster 1
A13 (3,10)   5.492                       1.221                             7.126                           Cluster 2
A14 (3,8)    3.544                       1.943                             7.753                           Cluster 2
A15 (6,11)   6.705                       2.343                             4.014                           Cluster 2
Results from the 3rd iteration of k-means clustering
Now, we have completed the third iteration of the k-means clustering algorithm and assigned each
point to an updated cluster. In the table above, you can observe that each point is assigned to the
cluster with the closest new centroid.
Now, we will once again calculate the new centroid for each cluster.
• In cluster 1, we have 5 points, i.e. A2 (2,6), A5 (6,4), A6 (1,2), A10 (7,5), and A12 (4,6). To
calculate the new centroid for cluster 1, we find the mean of the x and y coordinates of
each point in the cluster. Hence, the new centroid for cluster 1 is (4, 4.6).
• In cluster 2, we have 7 points, i.e. A1 (2,10), A4 (6,9), A7 (5,10), A8 (4,9), A13 (3,10), A14
(3,8), and A15 (6,11). Hence, the new centroid for cluster 2 is (4.143, 9.571).
• In cluster 3, we have 3 points, i.e. A3 (11,11), A9 (10,12), and A11 (9,11). Hence, the new
centroid for cluster 3 is (10, 11.333).
Here, you can observe that no point has changed its cluster compared to the previous iteration.
Because of this, the centroids also remain unchanged. Therefore, we say that the clusters have
stabilized, and the clusters obtained after the third iteration are the final clusters for the given
dataset. If we plot the clusters on a graph, they look as follows.
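A short matplotlib sketch that draws the final clusters and marks the final centroids (cluster memberships taken from the third iteration above):

```python
import matplotlib.pyplot as plt

# Final clusters from the third iteration.
clusters = {
    "Cluster 1": [(2, 6), (6, 4), (1, 2), (7, 5), (4, 6)],
    "Cluster 2": [(2, 10), (6, 9), (5, 10), (4, 9), (3, 10), (3, 8), (6, 11)],
    "Cluster 3": [(11, 11), (10, 12), (9, 11)],
}
centroids = [(4, 4.6), (4.143, 9.571), (10, 11.333)]

for (label, pts), marker in zip(clusters.items(), ("o", "s", "^")):
    xs, ys = zip(*pts)
    plt.scatter(xs, ys, marker=marker, label=label)
cx, cy = zip(*centroids)
plt.scatter(cx, cy, marker="x", s=100, color="black", label="Centroids")
plt.xlabel("x")
plt.ylabel("y")
plt.legend()
plt.show()
```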
Naive Bayes
Using Bayes' theorem, you can build a learner that predicts the probability that the
response variable belongs to a particular class, given a new set of attributes.
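In symbols, for a class Y and attributes X1, …, Xn, Naive Bayes applies Bayes' theorem under the simplifying assumption that the attributes are conditionally independent given the class:

```latex
P(Y \mid X_1, \dots, X_n)
  = \frac{P(X_1 \mid Y)\, P(X_2 \mid Y) \cdots P(X_n \mid Y)\, P(Y)}
         {P(X_1)\, P(X_2) \cdots P(X_n)}
```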
Example:
Consider a situation where you have 1000 fruits, each of which is either
'banana', 'apple' or 'other'. These are the possible classes of the
variable Y.
The data contains the following X variables, all of which are binary (0
or 1):
• Long
• Sweet
• Yellow
The training data is summarized in the table below.
Type     Long   Not Long   Sweet   Not Sweet   Yellow   Not Yellow   Total
Banana   400    100        350     150         450      50           500
Apple    0      300        150     150         300      0            300
Other    100    100        150     50          50       150          200
Total    500    500        650     350         800      200          1000
Consider a case where you're given that a fruit is long, sweet and yellow,
and you need to predict what type of fruit it is. This amounts to
predicting Y for a new observation whose X attributes are known, and
you can solve it with Naive Bayes.
Step 1:
First, compute the proportion of each fruit class out of all the fruits
in the population; this is the prior probability of each class.
The training dataset contains 1000 records. Of these, 500 are bananas,
300 are apples and 200 are other fruits, so the priors are 0.5, 0.3 and
0.2 respectively.
Step 2:
Next, compute the probability of evidence, i.e. the overall proportion of
each attribute: P(Long) = 500/1000 = 0.5, P(Sweet) = 650/1000 = 0.65 and
P(Yellow) = 800/1000 = 0.8. Hence P(Evidence) = 0.5 × 0.65 × 0.8 = 0.26.
Step 3:
Then, compute the probability of likelihood of the evidences for each
class. For 'Banana': P(Long|Banana) = 400/500 = 0.8, P(Sweet|Banana) =
350/500 = 0.7 and P(Yellow|Banana) = 450/500 = 0.9.
Step 4:
The last step is to substitute the results of the first three steps into
the mathematical expression of Naive Bayes to get the probability.
P(Banana | Long, Sweet, Yellow)
= [P(Long|Banana) × P(Sweet|Banana) × P(Yellow|Banana) × P(Banana)] / [P(Long) × P(Sweet) × P(Yellow)]
= (0.8 × 0.7 × 0.9 × 0.5) / 0.26 = 0.252 / 0.26 ≈ 0.97
In a similar way, you can also compute the probabilities for 'Apple' and
'Other'. The denominator is the same for every class, so it can even be
dropped when you only need to compare classes; the fruit is assigned to
the class with the highest posterior probability.
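Putting the whole calculation in code, here is a small sketch that computes all three posteriors from the counts in the table above (the dictionary layout and variable names are illustrative):

```python
# Counts from the fruit table: class -> (total, long, sweet, yellow).
counts = {
    "Banana": (500, 400, 350, 450),
    "Apple":  (300,   0, 150, 300),
    "Other":  (200, 100, 150,  50),
}
n = 1000

# P(Long) * P(Sweet) * P(Yellow): the evidence, identical for every class.
evidence = (500 / n) * (650 / n) * (800 / n)  # = 0.26

for fruit, (total, n_long, n_sweet, n_yellow) in counts.items():
    prior = total / n
    likelihood = (n_long / total) * (n_sweet / total) * (n_yellow / total)
    posterior = likelihood * prior / evidence
    print(f"P({fruit} | Long, Sweet, Yellow) = {posterior:.3f}")
# Banana: 0.969, Apple: 0.000, Other: 0.072 -> the fruit is classified as a banana.
```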