K-means Clustering
What is clustering?
• Clustering is the classification of objects into different groups, or
more precisely, the partitioning of a data set into subsets (clusters), so
that the data in each subset (ideally) share some common trait - often
according to some defined distance measure.
Types of clustering:
1. Hierarchical algorithms: these find successive clusters using previously
established clusters.
   • Agglomerative ("bottom-up"): agglomerative algorithms begin with each
     element as a separate cluster and merge them into successively larger
     clusters.
   • Divisive ("top-down"): divisive algorithms begin with the whole set and
     proceed to divide it into successively smaller clusters.
2. Partitional clustering: partitional algorithms determine all clusters at
once. They include:
   • K-means and derivatives
   • Fuzzy c-means clustering
   • QT clustering algorithm
Common distance measures:
• The distance measure determines how the similarity of two elements is
calculated, and it influences the shape of the clusters.
They include:
1. The Euclidean distance (also called 2-norm distance), given by
   $d(x, y) = \sqrt{\sum_{i=1}^{p} (x_i - y_i)^2}$.
2. The Manhattan distance (also called taxicab norm or 1-norm), given by
   $d(x, y) = \sum_{i=1}^{p} |x_i - y_i|$.
3. The maximum norm, given by
   $d(x, y) = \max_{1 \le i \le p} |x_i - y_i|$.
4. The Mahalanobis distance corrects the data for different scales and
   correlations in the variables.
5. Inner product space: the angle between two vectors can be used as a
   distance measure when clustering high-dimensional data.
6. Hamming distance (sometimes called edit distance) measures the minimum
   number of substitutions required to change one member into another.
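The sketch below is a minimal NumPy illustration of these measures (an addition, not part of the original slides); the function names and the assumption of equal-length 1-D vectors are illustrative choices.

```python
import numpy as np

def euclidean(x, y):
    return np.linalg.norm(x - y)                  # 2-norm

def manhattan(x, y):
    return np.linalg.norm(x - y, ord=1)           # 1-norm / taxicab

def max_norm(x, y):
    return np.linalg.norm(x - y, ord=np.inf)      # maximum (Chebyshev) norm

def mahalanobis(x, y, cov):
    # cov is the covariance matrix of the data; corrects for scale and correlation
    diff = x - y
    return float(np.sqrt(diff @ np.linalg.inv(cov) @ diff))

def cosine_angle(x, y):
    # angle between two vectors, usable as a dissimilarity in high dimensions
    cos = np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))
    return float(np.arccos(np.clip(cos, -1.0, 1.0)))

def hamming(x, y):
    # number of positions at which two equal-length sequences differ
    return int(np.sum(np.asarray(x) != np.asarray(y)))
```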
K-MEANS CLUSTERING
• The k-means algorithm clusters n objects into k partitions based on their
attributes, where k < n.
• It is similar to the expectation-maximization algorithm for
mixtures of Gaussians in that they both attempt to find the
centers of natural clusters in the data.
• It assumes that the object attributes form a vector space.
• An algorithm for partitioning (or clustering) N data points into K
disjoint subsets $S_j$, each containing a subset of the data points, so as to
minimize the sum-of-squares criterion
$J = \sum_{j=1}^{K} \sum_{n \in S_j} \| x_n - \mu_j \|^2$,
where $x_n$ is a vector representing the nth data point and $\mu_j$ is the
geometric centroid of the data points in $S_j$.
• Simply speaking, k-means clustering is an algorithm that classifies or
groups objects into K groups based on their attributes/features.
• K is a positive integer.
• The grouping is done by minimizing the sum of squared distances between
each data point and the centroid of its cluster.
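As a concrete reading of this criterion, the short sketch below (an illustrative addition; the names are hypothetical) computes the sum of squared distances $J$ for a given assignment of points to clusters.

```python
import numpy as np

def within_cluster_sum_of_squares(X, labels, centroids):
    """J = sum over clusters j, and points n in S_j, of ||x_n - mu_j||^2."""
    J = 0.0
    for j, mu in enumerate(centroids):
        members = X[labels == j]              # points currently assigned to cluster j
        J += np.sum((members - mu) ** 2)      # squared Euclidean distances to the centroid
    return float(J)
```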
How does the K-Means clustering algorithm work?
• Step 1: Begin with a decision on the value of k, the number of clusters.
• Step 2: Choose any initial partition that classifies the data into k
clusters. You may assign the training samples randomly, or systematically as
follows:
1. Take the first k training samples as single-element clusters.
2. Assign each of the remaining (N - k) training samples to the cluster
with the nearest centroid. After each assignment, recompute the centroid
of the gaining cluster.
• Step 3: Take each sample in sequence and compute its distance from the
centroid of each of the clusters. If a sample is not currently in the cluster
with the closest centroid, switch it to that cluster and update the centroids
of the cluster gaining the new sample and the cluster losing it.
• Step 4: Repeat Step 3 until convergence is achieved, that is, until a pass
through the training samples causes no new assignments.
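A minimal Python/NumPy sketch of Steps 1-4 is given below. It is an illustrative implementation, not code from the original slides: it seeds the centroids with the first k samples (a simplified, batch version of Step 2) and then reassigns points and updates centroids until a full pass causes no change.

```python
import numpy as np

def k_means(X, k, max_iters=100):
    """Cluster the rows of X into k groups; returns (labels, centroids)."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    # Steps 1-2 (simplified): take the first k samples as seeds, then assign
    # every point to its nearest seed in one batch pass.
    centroids = X[:k].copy()
    labels = np.argmin(
        np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2), axis=1)
    for j in range(k):                        # centroids of the initial partition
        if np.any(labels == j):
            centroids[j] = X[labels == j].mean(axis=0)

    for _ in range(max_iters):
        changed = False
        # Step 3: visit each sample; if it is closer to another centroid,
        # switch it and update the centroids of the gaining and losing clusters.
        for i in range(n):
            nearest = int(np.argmin(np.linalg.norm(centroids - X[i], axis=1)))
            if nearest != labels[i]:
                old = labels[i]
                labels[i] = nearest
                changed = True
                for j in (old, nearest):
                    if np.any(labels == j):   # an emptied cluster keeps its centroid
                        centroids[j] = X[labels == j].mean(axis=0)
        # Step 4: stop once a full pass causes no new assignments.
        if not changed:
            break
    return labels, centroids
```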
Example 1
(The original slides work through a numerical example in a sequence of figures; the figures are not reproduced here.)
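In place of the figure-based example, here is a hypothetical run of the k_means sketch above on a small 2-D data set (the data values are illustrative and are not the numbers used in the original slides).

```python
import numpy as np

# two well-separated groups of 2-D points (illustrative data);
# assumes the k_means function from the sketch above is defined
X = np.array([[1.0, 1.0], [1.5, 2.0], [1.2, 0.8],
              [8.0, 8.0], [8.5, 7.5], [7.8, 8.2]])

labels, centroids = k_means(X, k=2)
print("labels:   ", labels)        # -> [0 0 0 1 1 1]
print("centroids:", centroids)     # -> approximately [[1.23, 1.27], [8.10, 7.90]]
```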
Weaknesses of K-Means Clustering
1. When the number of data points is small, the initial grouping largely
determines the resulting clusters.
2. The number of clusters, K, must be determined beforehand. The algorithm
also does not yield the same result on every run, since the resulting
clusters depend on the initial random assignments.
3. With the same data we never know the "true" clusters: if the data are
presented in a different order, the algorithm may produce different clusters,
especially when the number of data points is small.
4. It is sensitive to the initial conditions: different initial conditions
may produce different clusterings, and the algorithm may become trapped in a
local optimum (see the sketch after this list).
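To illustrate points 2-4, the sketch below (an addition; the shuffle-and-restart strategy is a common mitigation and is not described in the slides) runs the k_means sketch several times on differently ordered copies of the data and keeps the run with the lowest sum of squared distances.

```python
import numpy as np

# Assumes the k_means and within_cluster_sum_of_squares sketches above.
def k_means_restarts(X, k, n_restarts=10, seed=0):
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    best = None
    for _ in range(n_restarts):
        order = rng.permutation(len(X))            # present the data in a new order
        labels, centroids = k_means(X[order], k)
        J = within_cluster_sum_of_squares(X[order], labels, centroids)
        if best is None or J < best[0]:
            unshuffled = np.empty(len(X), dtype=int)
            unshuffled[order] = labels             # undo the shuffle for the labels
            best = (J, unshuffled, centroids)
    return best   # (J, labels, centroids) of the lowest-cost run
```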
Applications of K-Means Clustering
• It is relatively efficient and fast: its running time is O(tkn), where n is
the number of objects or points, k is the number of clusters, and t is the
number of iterations.
• k-means clustering is widely applied in machine learning and data mining.
• It is used on acoustic data in speech understanding to map waveforms into
one of k categories (known as vector quantization); related ideas are used
for image segmentation.
• It is also used for choosing color palettes on old-fashioned graphical
display devices and for image quantization (see the sketch below).
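As an illustration of the palette/quantization use case, the sketch below (an addition; it uses randomly generated pixels in place of a real image) treats each RGB pixel as a point in 3-D color space, clusters the pixels with the k_means sketch above, and replaces every pixel by the nearest of k palette colors.

```python
import numpy as np

# Assumes the k_means function from the sketch above.
rng = np.random.default_rng(1)
pixels = rng.integers(0, 256, size=(500, 3)).astype(float)   # 500 stand-in RGB pixels

labels, palette = k_means(pixels, k=8)   # the k centroids act as the color palette
quantized = palette[labels]              # every pixel replaced by its palette color
print("palette (8 colors):\n", np.round(palette).astype(int))
```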
CONCLUSION
• The k-means algorithm is useful for undirected knowledge discovery and is
relatively simple. It has found widespread use in many fields, including
unsupervised learning for neural networks, pattern recognition,
classification analysis, artificial intelligence, image processing, machine
vision, and many others.