
Cluster Analysis

(K-means)
• Presenter: Vikas Dubey
Introduction:

1. Cluster analysis is a group of segmentation techniques designed to
classify respondents into groups or segments called clusters.
2. Cluster analysis, or clustering, is the task of grouping a set of objects
in such a way that objects in the same group (called a cluster) are
more similar (in some sense or another) to each other than to those in
other groups (clusters).
3. Clustering algorithms segment records by minimizing within-cluster
variance and maximizing between-cluster variation.
Objective of Cluster Analysis

• The main objective of cluster analysis is to address the heterogeneity in
a data set. Other objectives of cluster analysis are:
• Taxonomy description – identifying groups within the data
• Data simplification – the ability to analyze groups of similar
observations instead of all individual observations
• Hypothesis generation or testing – developing hypotheses based on the
nature of the data, or testing previously stated hypotheses
• Relationship identification – the simplified structure from cluster
analysis that describes relationships
Introduction of k-means

• An algorithm for partitioning (or clustering) N data points into K
disjoint subsets Sj so as to minimize the sum-of-squares criterion

    J = sum_{j=1}^{K} sum_{x_n in S_j} || x_n - mu_j ||^2

where x_n is a vector representing the n-th data point and mu_j is the
geometric centroid (cluster center) of the data points in Sj.
Introduction of k-means (Continue...)

• Simply speaking, k-means clustering is an algorithm that classifies or
groups objects, based on their attributes/features, into K groups.
• K is a positive integer.
• The grouping is done by minimizing the sum of squared distances
between each data point and its corresponding cluster centroid.
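The criterion being minimized can be written in a few lines of code. The following is a minimal sketch (an assumed implementation, not taken from the slides; the function name is ours):

```python
import numpy as np

# Sum-of-squares criterion J for a given assignment of points to clusters:
# J = sum over clusters j of sum over x_n in S_j of ||x_n - mu_j||^2.
def sum_of_squares(points, labels, centroids):
    points = np.asarray(points, dtype=float)
    centroids = np.asarray(centroids, dtype=float)
    diffs = points - centroids[labels]   # x_n - mu_j for each point's cluster j
    return float((diffs ** 2).sum())

# Two points assigned to a centroid at the origin, one sitting on its centroid:
J = sum_of_squares([[1, 0], [0, 1], [3, 4]], [0, 0, 1], [[0, 0], [3, 4]])
# J = 1 + 1 + 0 = 2.0
```

K-means searches over assignments and centroids to make this quantity as small as possible.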
Basic Data Requirement

• K-means can only be applied to continuous (interval) data.
• It seeks homogeneity of variance within each cluster and heterogeneity
of variance between clusters.
• It assumes that you have selected the appropriate number of clusters and
have included all relevant variables.
• The sample size required for a k-means cluster analysis depends on the
expected number of resulting clusters (each final group should have a
representative base size).
• It is best to include a variety of statements so that it is easier to
identify different groups.
Steps of K-means:
Step 1: Decide the number of clusters, i.e., the value of K.
Step 2: Select initial cluster centres.
Step 3: Find the distance of each data point from all cluster
centres. Assign each data point to the cluster for which it has the minimum
distance.
Step 4: Update the cluster centres by taking the mean of the data points that fall
in each respective cluster.
Step 5: Repeat steps 3 and 4 until no object moves from one cluster
to another.
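The five steps above can be sketched as follows. This is an assumed implementation (not taken from the slides), using plain Euclidean distance:

```python
import numpy as np

def k_means(points, init_centroids, max_iter=100):
    points = np.asarray(points, dtype=float)
    centroids = np.asarray(init_centroids, dtype=float)  # Steps 1-2: K and initial centres
    labels = None
    for _ in range(max_iter):
        # Step 3: distance of each point from every centre; assign to the nearest.
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        # Step 5: stop once no point moves to a different cluster.
        if labels is not None and np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # Step 4: each centre becomes the mean of the points assigned to it.
        centroids = np.array([points[labels == j].mean(axis=0)
                              for j in range(len(centroids))])
    return labels, centroids

# The seven individuals from the worked example below, with its initial centres:
data = [[1.0, 1.0], [1.5, 2.0], [3.0, 4.0], [5.0, 7.0],
        [3.5, 5.0], [4.5, 5.0], [3.5, 4.5]]
labels, centroids = k_means(data, [[1.0, 1.0], [5.0, 7.0]])
# labels -> [0, 0, 1, 1, 1, 1, 1], i.e. clusters {1, 2} and {3, 4, 5, 6, 7}
```

Note that the result can depend on the initial centres; this sketch simply takes them as given.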
A Simple example showing the
implementation of k-means algorithm
(using K=2)
Step 1:
Initialization: We randomly choose the following two centroids (K = 2) for the two
clusters.
In this case the two centroids are m1 = (1.0, 1.0) and m2 = (5.0, 7.0).


Individual Variable 1 Variable 2
1 1.0 1.0
2 1.5 2.0
3 3.0 4.0
4 5.0 7.0
5 3.5 5.0
6 4.5 5.0
7 3.5 4.5
Step 2:

• Now we calculate the distance between each observation and each centroid.
• The formula used is the Euclidean distance, given as

    d(a, b) = sqrt((a1 - b1)^2 + (a2 - b2)^2)

• Some calculations are, for example:

    d(individual 2, m1) = sqrt((1.5 - 1.0)^2 + (2.0 - 1.0)^2) = 1.12
    d(individual 2, m2) = sqrt((1.5 - 5.0)^2 + (2.0 - 7.0)^2) = 6.10
Continue..

Individual   Distance from Centroid 1   Distance from Centroid 2
1            0.00                       7.21
2            1.12                       6.10
3            3.61                       3.61
4            7.21                       0.00
5            4.72                       2.50
6            5.31                       2.06
7            4.30                       2.92

• Thus, we obtain two clusters containing {1, 2, 3} and {4, 5, 6, 7}
(individual 3 is equidistant from both centroids; it is assigned to cluster 1).
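The distance table above can be reproduced with a few lines of Python (a sketch of ours, not from the slides):

```python
import math

# Recomputing the step-2 distance table for the worked example.
data = [(1.0, 1.0), (1.5, 2.0), (3.0, 4.0), (5.0, 7.0),
        (3.5, 5.0), (4.5, 5.0), (3.5, 4.5)]
m1, m2 = (1.0, 1.0), (5.0, 7.0)   # initial centroids from step 1

def euclid(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

for i, p in enumerate(data, start=1):
    print(i, round(euclid(p, m1), 2), round(euclid(p, m2), 2))
# Individual 2, for instance, is 1.12 from m1 and 6.1 from m2,
# matching the table; individuals 1-3 are nearer m1, 4-7 nearer m2.
```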
Continue…

The new centroids are the means of the points in each cluster:

m1 = ((1.0 + 1.5 + 3.0)/3, (1.0 + 2.0 + 4.0)/3) = (1.83, 2.33)
m2 = ((5.0 + 3.5 + 4.5 + 3.5)/4, (7.0 + 5.0 + 5.0 + 4.5)/4) = (4.13, 5.38)
Continue..

Step 3:
• Now using these centroids we compute the Euclidean distance of each
object, as shown in the table.

Individual   Distance from Centroid 1   Distance from Centroid 2
1            1.57                       5.38
2            0.47                       4.28
3            2.04                       1.78
4            5.64                       1.84
5            3.15                       0.73
6            3.78                       0.54
7            2.74                       1.08

• Therefore, the new clusters are {1, 2} and {3, 4, 5, 6, 7}.
• The next centroids are m1 = (1.25, 1.5) and m2 = (3.9, 5.1).
Continue..

Step 4:
• Computing the distances from these new centroids gives:

Individual   Distance from Centroid 1   Distance from Centroid 2
1            0.56                       5.02
2            0.56                       3.92
3            3.05                       1.42
4            6.66                       2.20
5            4.16                       0.41
6            4.78                       0.61
7            3.75                       0.72

• The clusters obtained are again {1, 2} and {3, 4, 5, 6, 7}.
• Therefore, there is no change in the clusters.
• Thus, the algorithm halts here, and the final result consists of the 2
clusters {1, 2} and {3, 4, 5, 6, 7}.
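The halting condition can be checked numerically. The sketch below (ours, not from the slides) recomputes the centroids of the final clusters and confirms that reassignment changes nothing:

```python
import math

# Verify that the partition {1,2} / {3,4,5,6,7} is stable under one more step.
data = [(1.0, 1.0), (1.5, 2.0), (3.0, 4.0), (5.0, 7.0),
        (3.5, 5.0), (4.5, 5.0), (3.5, 4.5)]
cluster1, cluster2 = data[:2], data[2:]

def centroid(points):
    xs, ys = zip(*points)
    return (sum(xs) / len(points), sum(ys) / len(points))

m1, m2 = centroid(cluster1), centroid(cluster2)   # (1.25, 1.5) and (3.9, 5.1)

def nearest(p):
    d1 = math.hypot(p[0] - m1[0], p[1] - m1[1])
    d2 = math.hypot(p[0] - m2[0], p[1] - m2[1])
    return 1 if d1 <= d2 else 2

assignments = [nearest(p) for p in data]
# assignments -> [1, 1, 2, 2, 2, 2, 2]: no point changes cluster, so k-means halts.
```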
Weaknesses of K-Mean Clustering

1. When the base size is low, the initial grouping will determine
the clusters significantly.
2. The number of clusters, K, must be determined beforehand.
3. We never know the real clusters: with the same data, if the points
are input in a different order, the algorithm may produce different
clusters when the number of data points is small.
Any Questions…..
