04 LEC Data Science Kmeans


Data Science (CO314)

By:
Nidhi S. Periwal,
SVNIT, Surat.
Outline
• K-means clustering
• Reference book: "Data Mining: Concepts and Techniques", 3rd Edition, by Jiawei Han and Micheline Kamber
• K-means based on MapReduce
• Reference material: Weizhong Zhao et al., "Parallel K-Means Clustering Based on MapReduce", LNCS, Springer, 2009
Clustering
• The process of grouping a set of data objects into multiple groups, or clusters, so that objects within a cluster have high similarity but are very dissimilar to objects in other clusters.
• Clustering is also called data segmentation in some applications, because clustering partitions large data sets into groups according to their similarity.
• Also used for outlier detection.
Clustering
• Techniques of Clustering
• Partitioning methods
• Hierarchical methods
• Density-based methods
• Grid-based methods
Clustering
• Techniques of Clustering
• Partitioning methods
• K-means clustering
K-means Algorithm
• K-means clustering first selects k objects from the whole data set to serve as the initial cluster centers.
• Each remaining object is assigned to the cluster to which it is the most similar, based on the distance between the object and the cluster center.
• The new mean of each cluster is then calculated.
• The last two steps are repeated until the criterion function converges (a short code sketch is given below).
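A minimal sketch of this procedure in plain Python, assuming one-dimensional numeric data and absolute (Euclidean) distance as in the worked example that follows; the function name kmeans_1d and the stopping test on unchanged means are illustrative choices, not part of the lecture material.

# Minimal 1-D K-means sketch (illustrative).
def kmeans_1d(data, initial_centers, max_iter=100):
    centers = list(initial_centers)
    for _ in range(max_iter):
        # Assignment step: each sample goes to the cluster of its closest center.
        clusters = [[] for _ in centers]
        for x in data:
            idx = min(range(len(centers)), key=lambda i: abs(x - centers[i]))
            clusters[idx].append(x)
        # Update step: recompute each center as the mean of its cluster.
        new_centers = [sum(c) / len(c) if c else centers[i]
                       for i, c in enumerate(clusters)]
        if new_centers == centers:   # criterion: the means did not change
            break
        centers = new_centers
    return centers, clusters

# The slide example: Data = {2,3,4,10,11,12,20,25,30}, k = 2, m1 = 4, m2 = 12.
print(kmeans_1d([2, 3, 4, 10, 11, 12, 20, 25, 30], [4, 12]))
# -> ([7.0, 25.0], [[2, 3, 4, 10, 11, 12], [20, 25, 30]])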
K-means Algorithm
• K-means clustering first selects k objects from the whole data set to serve as the initial cluster centers.
• E.g.:
• Data = {2, 3, 4, 10, 11, 12, 20, 25, 30}
• Let k = 2
• Randomly select 4 and 12 as the initial cluster centers
• So, let m1 = 4 and m2 = 12
K-means Algorithm
• K-means clustering first selects k objects from the whole data set to serve as the initial cluster centers.
• E.g.: ATTRIBUTE: 2, 3, 4, 10, 11, 12, 20, 25, 30
K-means Algorithm
• K-means clustering first selects k objects from the whole data set to serve as the initial cluster centers.
• Each remaining object is assigned to the cluster to which it is the most similar, based on the distance between the object and the cluster center.
K-means Algorithm
• Data = {2}
• Dist1(2, 4) = sqrt((2 - 4)^2) = 2
• Dist2(2, 12) = sqrt((2 - 12)^2) = 10
• Compare Dist1 with Dist2:
• If Dist1 < Dist2, put sample i in Cluster1
• Else, put sample i in Cluster2

Cluster1 with m1 = 4: 2
Cluster2 with m2 = 12: (empty)
K-means Algorithm
• Data = {2, 3, 4, 10, 11, 12, 20, 25, 30}

Cluster1 with m1 = 4: 2, 3, 4
Cluster2 with m2 = 12: 10, 11, 12, 20, 25, 30
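The same assignment can be reproduced with a few lines of plain Python (an illustrative sketch, reusing the slide's data and the initial centers m1 = 4 and m2 = 12):

# Assignment step for the initial centers (illustrative sketch).
data = [2, 3, 4, 10, 11, 12, 20, 25, 30]
m1, m2 = 4, 12
cluster1 = [x for x in data if abs(x - m1) < abs(x - m2)]
cluster2 = [x for x in data if abs(x - m1) >= abs(x - m2)]
print(cluster1)   # [2, 3, 4]
print(cluster2)   # [10, 11, 12, 20, 25, 30]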
K-means Algorithm
• K-means clustering first selects k objects from the whole data set to serve as the initial cluster centers.
• Each remaining object is assigned to the cluster to which it is the most similar, based on the distance between the object and the cluster center.
• The new mean of each cluster is then calculated.
K-means Algorithm
• Data = {2, 3, 4, 10, 11, 12, 20, 25, 30}

Cluster1 with m1 = 4: 2, 3, 4
Cluster2 with m2 = 12: 10, 11, 12, 20, 25, 30

New means:
m1 = (2 + 3 + 4)/3 = 3
m2 = (10 + 11 + 12 + 20 + 25 + 30)/6 = 18
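The corresponding update step in plain Python (an illustrative sketch of the mean computation only):

# Update step: recompute each center as the mean of its cluster (illustrative).
cluster1 = [2, 3, 4]
cluster2 = [10, 11, 12, 20, 25, 30]
m1 = sum(cluster1) / len(cluster1)   # (2 + 3 + 4) / 3 = 3.0
m2 = sum(cluster2) / len(cluster2)   # (10 + 11 + 12 + 20 + 25 + 30) / 6 = 18.0
print(m1, m2)                        # 3.0 18.0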
K-means Algorithm
• ITERATION 2:
• Data = {2, 3, 4, 10, 11, 12, 20, 25, 30}

Cluster1 with m1 = 3: 2, 3, 4, 10
Cluster2 with m2 = 18: 11, 12, 20, 25, 30

New means:
m1 = (2 + 3 + 4 + 10)/4 = 4.75 ≈ 5
m2 = (11 + 12 + 20 + 25 + 30)/5 = 19.6 ≈ 20
K-means Algorithm
• ITERATION 3:
• Data = {2, 3, 4, 10, 11, 12, 20, 25, 30}

Cluster1 with m1 = 5: 2, 3, 4, 10, 11, 12
Cluster2 with m2 = 20: 20, 25, 30

New means:
m1 = (2 + 3 + 4 + 10 + 11 + 12)/6 = 7
m2 = (20 + 25 + 30)/3 = 25
K-means Algorithm
• ITERATION 4:
• Data = {2, 3, 4, 10, 11, 12, 20, 25, 30}

Cluster1 with m1 = 7: 2, 3, 4, 10, 11, 12
Cluster2 with m2 = 25: 20, 25, 30

New means:
m1 = 7
m2 = 25
• As the mean values do not change after the fourth iteration, the final clusters are formed: Cluster1 = {2, 3, 4, 10, 11, 12} and Cluster2 = {20, 25, 30}.
K-means with MapReduce
• The map function performs the procedure of assigning each sample to its closest center.
• The reduce function performs the procedure of updating the new centers.
• To decrease the cost of network communication, a combiner function is developed to perform a partial combination of the intermediate values with the same key within the same map task.
K-means with MapReduce
• Algo: Mapper
• Input: global variable centers, the offset key, the sample value
• Output: <key', value'> pair, where key' is the index of the closest center point and value' is a string comprised of the sample information
K-means with MapReduce
Algo: Mapper
1. Construct the sample instance from value;
2. minDis = Double.MAX_VALUE;
3. index = -1;
4. For i = 0 to centers.length do
       dis = ComputeDist(instance, centers[i]);
       If dis < minDis {
           minDis = dis;
           index = i; }
5. End For
6. Take index as key';
7. Construct value' as a string comprised of the values of the different dimensions;
8. Output <key', value'> pair;
9. End
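A plain-Python sketch of this map step for the one-dimensional slide data (illustrative only, not Hadoop code; the name kmeans_map and the use of a generator are assumptions):

# Map step sketch: emit (index of the closest center, sample) for each sample.
def kmeans_map(samples, centers):
    for x in samples:
        index = min(range(len(centers)), key=lambda i: abs(x - centers[i]))
        yield index, x

# One map task over the first part of the slide data, centers m1 = 4, m2 = 12.
print(list(kmeans_map([2, 3, 4, 10], [4, 12])))
# -> [(0, 2), (0, 3), (0, 4), (1, 10)]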
K-means with MapReduce
Algo: Combiner
• Input: key is the index of the cluster, V is the list of the samples assigned to the same cluster
• Output: <key', value'> pair,
• where key' is the index of the cluster,
• value' is a string comprised of the sum of the samples in the same cluster and the sample number
K-means with MapReduce
Algo: Combiner
1. Initialize one array to record the sum of the values of each dimension of the samples contained in the same cluster, i.e. the samples in the list V;
2. Initialize a counter num as 0 to record the number of samples in the same cluster;
3. while (V.hasNext()) {
       Construct the sample instance from V.next();
       Add the values of the different dimensions of instance to the array;
       num++; }
4. Take key as key';
5. Construct value' as a string comprised of the sum values of the different dimensions and num;
6. Output <key', value'> pair;
7. End
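A matching plain-Python sketch of the combiner (illustrative; kmeans_combine is an assumed name): for one map task it collapses the map output into a partial sum and a sample count per cluster.

# Combiner sketch: per map task, reduce each cluster's samples to (sum, count).
from collections import defaultdict

def kmeans_combine(mapped_pairs):
    partial = defaultdict(lambda: [0.0, 0])
    for index, x in mapped_pairs:
        partial[index][0] += x    # running sum of the sample values
        partial[index][1] += 1    # running count of the samples
    return {index: (s, n) for index, (s, n) in partial.items()}

# Map output of the first task (samples 2, 3, 4, 10 with centers 4 and 12).
print(kmeans_combine([(0, 2), (0, 3), (0, 4), (1, 10)]))
# -> {0: (9.0, 3), 1: (10.0, 1)}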
K-means with MapReduce
Algo: Reducer
• Input: key is the index of the cluster, V is the list of the partial sums from different hosts
• Output: <key', value'> pair,
• where key' is the index of the cluster,
• value' is a string representing the new center
K-means with MapReduce
Algo: Reducer
1. Initialize one array to record the sum of the values of each dimension of the samples contained in the same cluster, i.e. the partial sums in the list V;
2. Initialize a counter NUM as 0 to record the total number of samples in the same cluster;
3. while (V.hasNext()) {
       Construct the sample instance from V.next();
       Add the values of the different dimensions of instance to the array;
       NUM += num; }
4. Divide the entries of the array by NUM to get the new center's coordinates;
5. Take key as key';
6. Construct value' as a string comprised of the new center's coordinates;
7. Output <key', value'> pair;
8. End
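Finally, a plain-Python sketch of the reducer (illustrative; kmeans_reduce is an assumed name): it merges the partial sums and counts coming from the combiners of different map tasks and emits the new centers. With the slide data split across two map tasks and the initial centers 4 and 12, the result matches iteration 1 of the worked example (new means 3 and 18).

# Reducer sketch: merge partial (sum, count) pairs and compute the new centers.
from collections import defaultdict

def kmeans_reduce(partials):
    totals = defaultdict(lambda: [0.0, 0])
    for part in partials:
        for index, (s, n) in part.items():
            totals[index][0] += s
            totals[index][1] += n
    return {index: s / n for index, (s, n) in totals.items()}

# Combiner outputs of two map tasks over the slide data, centers 4 and 12:
partials = [{0: (9.0, 3), 1: (10.0, 1)},   # from samples [2, 3, 4, 10]
            {1: (98.0, 5)}]                # from samples [11, 12, 20, 25, 30]
print(kmeans_reduce(partials))
# -> {0: 3.0, 1: 18.0}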
THANK YOU!!!
