K-Means Clustering
What is K-Means Clustering?
K-Means divides objects into clusters such that objects within the same cluster are "similar" to each other and "dissimilar" to objects belonging to other clusters.
Can you explain this with an example?
Sure. To understand K-Means better, let's take an example from cricket.
Task: Identify bowlers and batsmen
• The data contains runs scored and wickets taken in the last 10 matches
• So, bowlers will have more wickets, and batsmen will have higher runs

[Chart: player scores]
Assign data points
Here, we have our dataset with x and y coordinates. Now, we want to cluster this data using K-Means.

[Scatter plot: Runs vs. Wickets]
Cluster 1
We can see that this cluster has players with high runs and low wickets.

[Scatter plot: Runs vs. Wickets, with Cluster 1 highlighted]
Cluster 2
And here, we can see that this cluster has players with high wickets and low runs.

[Scatter plot: Runs vs. Wickets, with Cluster 2 highlighted]
Consider the same cricket data set and solve the problem using K-Means.
Initially, two centroids are assigned randomly. The Euclidean distance is used to find out which centroid is closest to each data point, and the data points are assigned to the corresponding centroids.
Reposition the two centroids for optimization.
The process is repeated iteratively until the centroids become static.
What’s in it for you?
Types of Clustering
What is K-Means Clustering?
Applications of K-Means clustering
Common distance measures
How does K-Means clustering work?
K-Means Clustering Algorithm
Demo: K-Means Clustering
Use Case: Color Compression
Types of Clustering
Clustering
• Hierarchical Clustering: Agglomerative, Divisive
• Partitional Clustering: K-Means, Fuzzy C-Means
Hierarchical Clustering
Clusters have a tree-like structure or a parent-child relationship.
Agglomerative ("bottom-up") approach: Begin with each element as a separate cluster and merge them into successively larger clusters.

[Dendrogram: a, b, c, d, e, f merge into bc, de, def, bcdef, and finally abcdef]
Types of Clustering
“Top down“ approach begin with the
whole set and proceed to divide it into
successively smaller clusters.
a b c fd e
de
def
bcdef
abcdef
bc
Clustering
Hierarchical
Clustering
Agglomerative Divisive
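For illustration, here is a minimal Python sketch (not from the original deck) of the agglomerative approach on six toy points standing in for the elements a-f; the coordinates are made up, and SciPy's hierarchy module is used because the bottom-up approach is the one with a standard library routine:

import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Six toy points standing in for elements a-f (coordinates are illustrative)
points = np.array([[0.0, 0.0], [0.2, 0.1], [0.3, 0.0],   # a, b, c
                   [5.0, 5.0], [5.1, 5.2], [5.3, 4.9]])  # d, e, f

# "Bottom-up" merging: each point starts as its own cluster, and the two
# closest clusters are merged repeatedly, producing a tree (dendrogram)
Z = linkage(points, method='ward')

# Cut the tree into 2 flat clusters: {a, b, c} and {d, e, f}
labels = fcluster(Z, t=2, criterion='maxclust')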
Partitional Clustering: K-Means
Division of objects into clusters such that each object is in exactly one cluster, not several.
Partitional Clustering: Fuzzy C-Means
Division of objects into clusters such that each object can belong to multiple clusters.
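As an illustration of the contrast (a sketch of the standard Fuzzy C-Means membership formula, not code from the deck; the point and centroid values are made up): K-Means gives each point exactly one label, while Fuzzy C-Means gives it a degree of membership in every cluster:

import numpy as np

# One toy point and two fixed centroids (values made up for illustration)
x = np.array([2.0, 0.0])
centroids = np.array([[0.0, 0.0], [5.0, 0.0]])
m = 2.0  # fuzzifier; m > 1 controls how soft the memberships are

# Fuzzy C-Means membership of x in cluster i:
# u_i = 1 / sum_k (d_i / d_k)^(2 / (m - 1)), where d_i = ||x - c_i||
d = np.linalg.norm(x - centroids, axis=1)                            # [2.0, 3.0]
u = 1.0 / np.sum((d[:, None] / d[None, :]) ** (2.0 / (m - 1.0)), axis=1)
print(u)  # ~[0.69, 0.31]: x belongs mostly, but not only, to cluster 1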
Applications of K-Means Clustering
• Academic performance
• Wireless sensor networks
• Diagnostic systems
• Search engines
Distance Measure
• Euclidean distance measure
• Squared Euclidean distance measure
• Manhattan distance measure
• Cosine distance measure

The distance measure determines the similarity between two elements and influences the shape of the clusters.
Euclidean Distance Measure
• The Euclidean distance is the "ordinary" straight-line distance
• It is the distance between two points in Euclidean space

$d = \sqrt{\sum_{i=1}^{n} (q_i - p_i)^2}$
Squared Euclidean Distance Measure
The Euclidean squared distance metric uses the same equation as the
Euclidean distance metric, but does not take the square root.
$d = \sum_{i=1}^{n} (q_i - p_i)^2$
Manhattan Distance Measure
The Manhattan distance is the simple sum of the horizontal and vertical components, i.e., the distance between two points measured along axes at right angles:

$d = |q_x - p_x| + |q_y - p_y|$
Cosine Distance Measure
The cosine distance measures the angle between the two vectors:

$d = \frac{\sum_{i=0}^{n-1} q_i p_i}{\sqrt{\sum_{i=0}^{n-1} (q_i)^2} \times \sqrt{\sum_{i=0}^{n-1} (p_i)^2}}$
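For illustration, here is a minimal NumPy sketch (not from the original deck; the point values are made up) computing all four measures for one pair of points:

import numpy as np

# Two illustrative points
p = np.array([3.0, 4.0])
q = np.array([1.0, 2.0])

euclidean = np.sqrt(np.sum((q - p) ** 2))   # ~2.83
squared_euclidean = np.sum((q - p) ** 2)    # 8.0
manhattan = np.sum(np.abs(q - p))           # 4.0

# Cosine similarity measures the angle between the two vectors;
# the cosine distance is commonly taken as 1 - similarity
cos_sim = np.dot(p, q) / (np.linalg.norm(p) * np.linalg.norm(q))
cosine_distance = 1.0 - cos_sim             # ~0.016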
How does K-Means clustering work?
[Flowchart] Start → Elbow point (k) → Measure the distance → Grouping based on minimum distance → Reposition the centroids → Convergence: if clusters are stable (+), stop; if clusters are unstable (−), repeat.
Elbow point
• Let's say you have a dataset for a grocery shop
• Now, the important question is: "how would you choose the optimum number of clusters?"
• The best way to do this is the elbow method
• The idea of the elbow method is to run K-Means clustering on the dataset, where 'k' is the number of clusters
• Within sum of squares (WSS) is defined as the sum of the squared distances between each member of a cluster and its centroid

$WSS = \sum_{i=1}^{m} (x_i - c_i)^2$

where $x_i$ is a data point and $c_i$ is the centroid closest to $x_i$
• Now, we draw a curve between WSS (within sum of squares) and the number of clusters
• Here, we can see a very slow change in the value of WSS after k=2, so you should take that elbow-point value as the final number of clusters

[Plot: WSS vs. number of clusters, with the elbow point at k=2]
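Here is a sketch of how such an elbow curve could be produced (illustrative code, not from the deck); scikit-learn's KMeans exposes WSS as its inertia_ attribute:

import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Toy data standing in for the grocery-shop dataset
X, _ = make_blobs(n_samples=300, centers=2, random_state=0)

# WSS (inertia_) for k = 1..9; the "elbow" in the curve marks the optimum k
wss = [KMeans(n_clusters=k, random_state=0).fit(X).inertia_
       for k in range(1, 10)]

plt.plot(range(1, 10), wss, marker='o')
plt.xlabel('Number of clusters (k)')
plt.ylabel('WSS')
plt.show()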
Step 1: The given data points below are assumed to be delivery points

[Scatter plot of delivery points]
Step 2: We can randomly initialize two points, called the cluster centroids. The Euclidean distance is used to find out which centroid each data point is closest to.

[Plot: delivery points with randomly initialized centroids c1 and c2]
Step 3: Based upon the distance from the c1 and c2 centroids, the data points group themselves into clusters

[Plot: points grouped into two clusters around c1 and c2]
Step 4: Compute the centroid of the data points inside the blue cluster
Step 5: Reposition the centroid of the blue cluster to the new centroid

[Plot: blue cluster centroid repositioned]
Step 6: Now, compute the centroid of the data points inside the orange cluster
Step 7: Reposition the centroid of the orange cluster to the new centroid

[Plot: orange cluster centroid repositioned]
Step 8: Once the clusters become static, the K-Means clustering algorithm is said to have converged

[Plot: final converged clusters with centroids c1 and c2]
K-Means Clustering Algorithm
Assuming we have inputs x1, x2, x3, …, and a value of K:
Step 1: Pick K random points as cluster centers, called centroids
Step 2: Assign each xi to the nearest cluster by calculating its distance to each centroid
Step 3: Find the new cluster center by taking the average of the assigned points
Step 4: Repeat steps 2 and 3 until none of the cluster assignments change
Step 1:
We randomly pick K cluster centers (centroids). Let's assume these are $c_1, c_2, \ldots, c_k$, and we can say that

$C = \{c_1, c_2, \ldots, c_k\}$

is the set of all centroids.

Step 2:
In this step, we assign each data point to its closest center; this is done by calculating the Euclidean distance:

$\underset{c_i \in C}{\arg\min} \; dist(c_i, x)^2$

where dist() is the Euclidean distance.
Step 3:
In this step, we find the new centroid by taking the average of all the points assigned to that cluster:

$c_i = \frac{1}{|S_i|} \sum_{x_i \in S_i} x_i$

where $S_i$ is the set of all points assigned to the i-th cluster.

Step 4:
In this step, we repeat steps 2 and 3 until none of the cluster assignments change. That means we repeat the algorithm until our clusters remain stable.
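To connect these formulas to code before the demo, here is a minimal pure-NumPy sketch of one assignment-and-update iteration (an illustration with made-up points, not the deck's own code):

import numpy as np

X = np.array([[1.0, 1.0], [1.5, 2.0], [8.0, 8.0], [9.0, 9.5]])  # toy points
C = np.array([[1.0, 1.0], [9.0, 9.0]])                          # K = 2 centroids

# Step 2: assign each x_i to the nearest centroid (squared Euclidean distance)
dists = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)  # shape (n, K)
labels = dists.argmin(axis=1)

# Step 3: new centroid c_i = mean of the points assigned to cluster i
C = np.array([X[labels == i].mean(axis=0) for i in range(len(C))])

# Step 4 would repeat the two steps above until the assignments stop changing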
Demo: K-Means Clustering
Problem Statement
• Walmart wants to open a chain of stores across Florida and wants to find the optimal store locations to maximize revenue
Solution
• Walmart already has a strong e-commerce presence
• Walmart can use its online customer data to analyze the customer locations along with the monthly
sales
%matplotlib inline
import matplotlib.pyplot as plt
import seaborn as sns; sns.set()  # for plot styling
import numpy as np
# note: sklearn.datasets.samples_generator was removed in newer scikit-learn;
# make_blobs now lives in sklearn.datasets
from sklearn.datasets import make_blobs

X, y_true = make_blobs(n_samples=300, centers=4, cluster_std=0.60, random_state=0)
plt.scatter(X[:, 0], X[:, 1], s=50);
[Output: scatter plot of the four generated blobs]
# assign four clusters
from sklearn.cluster import KMeans
kmeans = KMeans(n_clusters=4)
kmeans.fit(X)
y_kmeans = kmeans.predict(X)

# import library
from sklearn.metrics import pairwise_distances_argmin

def find_clusters(X, n_clusters, rseed=2):
    # 1. randomly choose clusters
    rng = np.random.RandomState(rseed)
    i = rng.permutation(X.shape[0])[:n_clusters]
    centers = X[i]
    while True:
        # 2. assign labels based on closest center
        labels = pairwise_distances_argmin(X, centers)
        # 3. find new centers from means of points
        new_centers = np.array([X[labels == i].mean(0)
                                for i in range(n_clusters)])
        # 4. check for convergence
        if np.all(centers == new_centers):
            break
        centers = new_centers
    return centers, labels

centers, labels = find_clusters(X, 4)
plt.scatter(X[:, 0], X[:, 1], c=y_kmeans, s=50, cmap='viridis')
plt.scatter(centers[:, 0], centers[:, 1], c='black', s=200, alpha=0.5);
[Output: the four clusters colored, with black centroid markers]
Conclusion
Congratulations! We have demonstrated K-Means clustering by choosing Walmart store locations across Florida in an optimized way.
Use Case: K-Means for Color Compression
Problem Statement
To perform color compression on images using the K-Means algorithm
# example 1:
# note: this requires the ``pillow`` package to be installed
from sklearn.datasets import load_sample_image
flower = load_sample_image("flower.jpg")
ax = plt.axes(xticks=[], yticks=[])
ax.imshow(flower);

[Output: the flower sample image]
# returns the dimensions of the array
flower.shape

# reshape the data to [n_samples x n_features], and rescale the colors so that they lie between 0 and 1
data = flower / 255.0  # use 0...1 scale
data = data.reshape(427 * 640, 3)
data.shape

# visualize these pixels in this color space, using a subset of 10,000 pixels for efficiency
def plot_pixels(data, title, colors=None, N=10000):
    if colors is None:
        colors = data
    # choose a random subset
    rng = np.random.RandomState(0)
    i = rng.permutation(data.shape[0])[:N]
    colors = colors[i]
    R, G, B = data[i].T

    fig, ax = plt.subplots(1, 2, figsize=(16, 6))
    ax[0].scatter(R, G, color=colors, marker='.')
    ax[0].set(xlabel='Red', ylabel='Green', xlim=(0, 1), ylim=(0, 1))
    ax[1].scatter(R, B, color=colors, marker='.')
    ax[1].set(xlabel='Red', ylabel='Blue', xlim=(0, 1), ylim=(0, 1))
    fig.suptitle(title, size=20);
plot_pixels(data, title='Input color space: 16 million possible colors')
# suppress warnings (e.g., from NumPy)
import warnings; warnings.simplefilter('ignore')

# reducing these 16 million colors to just 16 colors
from sklearn.cluster import MiniBatchKMeans
kmeans = MiniBatchKMeans(16)
kmeans.fit(data)
new_colors = kmeans.cluster_centers_[kmeans.predict(data)]

plot_pixels(data, colors=new_colors,
            title="Reduced color space: 16 colors")
flower_recolored = new_colors.reshape(flower.shape)
fig, ax = plt.subplots(1, 2, figsize=(16, 6), subplot_kw=dict(xticks=[], yticks=[]))
fig.subplots_adjust(wspace=0.05)
ax[0].imshow(flower)
ax[0].set_title('Original Image', size=16)
ax[1].imshow(flower_recolored)
ax[1].set_title('16-color Image', size=16);
# the result is a re-coloring of the original pixels, where each pixel is assigned the color of its closest cluster center

[Output: original flower image next to its 16-color version]
# example 2:
from sklearn.datasets import load_sample_image
china = load_sample_image("china.jpg")
ax = plt.axes(xticks=[], yticks=[])
ax.imshow(china);
# returns the dimensions of the array
china.shape

# reshape the data to [n_samples x n_features], and rescale the colors so that they lie between 0 and 1
data = china / 255.0  # use 0...1 scale
data = data.reshape(427 * 640, 3)
data.shape

# visualize these pixels in this color space using the same plot_pixels helper defined in example 1
plot_pixels(data, title='Input color space: 16 million possible colors')
# reducing these 16 million colors to just 16 colors, as in example 1
kmeans = MiniBatchKMeans(16)
kmeans.fit(data)
new_colors = kmeans.cluster_centers_[kmeans.predict(data)]

plot_pixels(data, colors=new_colors,
            title="Reduced color space: 16 colors")
china_recolored = new_colors.reshape(china.shape)
fig, ax = plt.subplots(1, 2, figsize=(16, 6), subplot_kw=dict(xticks=[], yticks=[]))
fig.subplots_adjust(wspace=0.05)
ax[0].imshow(china)
ax[0].set_title('Original Image', size=16)
ax[1].imshow(china_recolored)
ax[1].set_title('16-color Image', size=16);
# the result is a re-coloring of the original pixels, where each pixel is assigned the color of its closest cluster center

[Output: original china image next to its 16-color version]
Conclusion
Congratulations! We have demonstrated K-Means for color compression. This hands-on example will help you take on any K-Means project in the future.
Key Takeaways