Storage Technologies: Digital Assignment 1
STORAGE
TECHNOLOGIES
ITE2009
DIGITAL ASSIGNMENT 1
SUMIT PATIL
SLOT: D1+TD2
FACULTY BHAVANI S
PAPER 1
A Modified K-Means Algorithm for Big Data Clustering
SK Ahammad Fahad, IBAIS University, Dhaka, Bangladesh
Md. Mahbub Alam, DUET, Dhaka, Bangladesh
MODIFIED ALGORITHM
RESULTS
PAPER 2
The k-means algorithm is one of the most popular algorithms for data clustering. It tries to group data of similar types together out of a large data set using a brute-force strategy of repeated calculations, so its computational complexity is very high. Several studies have been carried out to reduce this complexity. This paper presents the result of our research, which proposes a modified version of the k-means algorithm with an improved technique to divide the data set into a specified number of clusters with the help of several check-point values. It requires less computation and has better accuracy than the traditional k-means algorithm, as well as some modified variants of traditional k-means.
MODIFIED ALGORITHM
Step 1:
Find the Euclidean distance of each data object from the origin (0, 0, ..., 0).
Here we take the origin as the initial reference point for the N data objects, and then find the Euclidean distance of each data object with respect to it.
Step 2:
Sort the N data objects in ascending order according to the distances found in the previous step.
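Steps 1 and 2 can be sketched in a few lines of Python (the data values are illustrative, not from the paper):

```python
import numpy as np

# Illustrative 2-D data set (not from the paper).
data = np.array([[3.0, 4.0], [1.0, 1.0], [6.0, 8.0], [0.0, 2.0]])

# Step 1: Euclidean distance of each data object from the origin.
dist = np.linalg.norm(data, axis=1)

# Step 2: sort the objects in ascending order of that distance.
order = np.argsort(dist)
sorted_data = data[order]
```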
Step 3:
Divide the data set into K equal clusters. K is determined by the user requirement or by the type of the data set. These act as the primary clusters.
This step is necessary to set up the initial clusters. Depending on the number of clusters needed, we divide the whole data set into equal portions. The portions may not always be exactly equal, since the number of objects may not divide evenly. For example, if we have 1000 data objects and have to divide them into 3 clusters, the clusters may contain 333, 333, and 334 objects.
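The 1000-object example can be reproduced with NumPy's `array_split`, which splits as evenly as possible (it yields portions of 334, 333 and 333, handing the remainder to the leading portions, but the idea is the same):

```python
import numpy as np

objects = np.arange(1000)            # stand-ins for 1000 data objects
clusters = np.array_split(objects, 3)
sizes = [len(c) for c in clusters]   # three nearly equal portions
```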
Step 4:
For each cluster, consider the middle object as the primary cluster center. That is, if there are N data objects and K clusters, the primary cluster center will be the ((N/K)/2)-th object of each cluster.
Since this ordering was obtained from the distances to the initial origin, the center points will be the most significant points in each cluster, from which all the data objects will have a roughly uniform distance.
Step 5:
Find the distance between the cluster centers. If there are K clusters, there will be K distances. Divide each distance by 2 and store the value in Dij (i, j = 0, 1, ..., k). Here Dij denotes the middle point of the distance from cluster center i to cluster center j. This Dij will be used as a check-point value.
For example, if clusters A and B have cluster centers Ai and Bi, then the middle point of the distance between Ai and Bi denotes the point beyond which an object is closer to the other cluster's center.
Step 6:
Find the Euclidean distance of each data object di (i = 1, ..., N) from the cluster center it is assigned to.
Step 7:
If the distance is less than or equal to Dij, the object stays in its previous cluster.
That is, the distance from the current cluster center is less than the distance to the middle point between the two cluster centers, so we can conclude that the object is closer to its current cluster. Hence we do not need to calculate the distances to the other cluster centers. This check-point value ensures that less computation is needed.
Otherwise, calculate the Euclidean distance of the data object with respect to the center whose check-point value was crossed. That is, if Dij is exceeded and the object was previously in the cluster with center i, then compute its distance with respect to cluster center j.
This means the object may be closer to the other cluster center; to be sure, we have to calculate its distance to that center. Now compare the distances and assign the data object to the cluster whose center is nearer.
Recalculate the cluster centers by taking the mean of all objects currently present in each cluster. This point may be an imaginary point that does not exist in the current data set, or it may coincide with an actual object of the data set; either way, it does not affect the outcome of the algorithm.
Go back to Step 4 and repeat until the convergence criterion is met, i.e. no data object moves from one cluster to another after the cluster centers are updated. The membership of each cluster then remains the same, so the centers also remain unchanged. At that point we can conclude that we have reached the final clusters: similar objects are grouped together within each cluster and differ from those in other clusters.
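The whole procedure above can be sketched in Python. This is a minimal sketch, not the authors' implementation: it simplifies Step 7 by testing each point against the tightest check-point of its own center (half the distance to the nearest other center) and, when that is crossed, comparing against all centers rather than only the crossed one. All names and values are illustrative.

```python
import numpy as np

def modified_kmeans(data, k, max_iter=100):
    """Minimal sketch of the check-point based k-means summarized above.
    `data` is an (N, d) float array; names and values are illustrative."""
    # Steps 1-4: sort objects by distance from the origin, split into k
    # primary clusters, and take each cluster's middle object as its center.
    order = np.argsort(np.linalg.norm(data, axis=1))
    parts = np.array_split(order, k)
    centers = np.array([data[p[len(p) // 2]] for p in parts])
    labels = np.empty(len(data), dtype=int)
    for lab, p in enumerate(parts):
        labels[p] = lab

    for _ in range(max_iter):
        # Step 5: check-point Dij = half the distance between center pairs;
        # keep the tightest one per center (a simplification of the paper).
        pair = np.linalg.norm(centers[:, None] - centers[None, :], axis=2) / 2
        np.fill_diagonal(pair, np.inf)
        checkpoint = pair.min(axis=1)

        # Steps 6-7: a point stays if it lies within its center's check-point;
        # otherwise compare it against all centers and reassign.
        new_labels = labels.copy()
        for i, x in enumerate(data):
            if np.linalg.norm(x - centers[labels[i]]) > checkpoint[labels[i]]:
                new_labels[i] = np.argmin(np.linalg.norm(centers - x, axis=1))
        moved = not np.array_equal(new_labels, labels)
        labels = new_labels

        # Recompute each center as the mean of its current members.
        for j in range(k):
            if np.any(labels == j):
                centers[j] = data[labels == j].mean(axis=0)
        if not moved:            # convergence: no object changed cluster
            break
    return labels, centers
```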
PAPER 3
An Improvement in K-mean Clustering Algorithm Using
Better Time and Accuracy
Er. Nikhil Chaturvedi and Er. Anand Rajavat
Cluster analysis, or clustering, is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense or another) to each other than to those in other groups (clusters). K-means is one of the simplest unsupervised learning algorithms that solves the well-known clustering problem. In the k-means process, the data are partitioned into K clusters and are initially assigned to the clusters at random. This paper proposes a new k-means clustering algorithm in which the initial centroids are calculated systematically instead of being assigned at random, which improves both accuracy and running time.
MODIFIED ALGORITHM
Phase 1: For the initial centroids
Steps:
1. Set p = 1;
2. Compute the distance between each datum and all the other data in the set D;
3. Find the closest pair of data in D and form a data set Ap (1 <= p <= k) containing these two data; delete them from the set D;
4. Find the datum in D that is closest to the data set Ap; add it to Ap and delete it from D;
6. For each Ap (1 <= p <= k), find the mean of the data in Ap. These means will be the initial centroids.
Phase 2: Assign data points to clusters
Steps:
2. For each datum di, find the closest centroid ci and assign di to that cluster j;
6. Repeat:
7.1 Compute the distance from the centroid of its closest cluster;
7.2 If this distance is less than or equal to the present closest distance, the data point stays in its cluster; else assign it to its nearest centroid;
8. For each cluster j (1 <= j <= k), recalculate the centroids; until the convergence criterion is met.
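Phase 1 above can be sketched in Python. The summary skips step 5, so the stopping rule for each seed set is an assumption here (each set grows to roughly n/k members); the function name and data are illustrative, not the authors' code.

```python
import numpy as np

def seed_centroids(data, k):
    """Sketch of Phase 1: grow k seed sets starting from the closest pairs.
    The per-set size of n/k is an assumption; the paper's exact stopping
    rule per set is not given in the summary above."""
    remaining = list(range(len(data)))
    target = max(2, len(data) // k)
    centroids = []
    for _ in range(k):
        pts = data[remaining]
        # Step 3: find the closest pair among the remaining data.
        dists = np.linalg.norm(pts[:, None] - pts[None, :], axis=2)
        np.fill_diagonal(dists, np.inf)
        i, j = np.unravel_index(np.argmin(dists), dists.shape)
        group = [remaining[i], remaining[j]]
        for idx in sorted((i, j), reverse=True):
            remaining.pop(idx)
        # Step 4: repeatedly pull in the datum closest to the growing set.
        while len(group) < target and remaining:
            pts = data[remaining]
            d = np.linalg.norm(pts[:, None] - data[group][None, :],
                               axis=2).min(axis=1)
            group.append(remaining.pop(int(np.argmin(d))))
        # Step 6: the mean of each seed set is an initial centroid.
        centroids.append(data[group].mean(axis=0))
    return np.array(centroids)
```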
RESULTS
PAPER 4
Clustering techniques are among the most important parts of data analysis, and k-means is the oldest and most popular clustering technique in use. The paper discusses the traditional k-means algorithm with its advantages and disadvantages. It also surveys enhanced k-means variants proposed by various authors and presents techniques to improve traditional k-means for better accuracy and efficiency. There are two areas of concern for improving k-means: 1) selecting the initial centroids, and 2) assigning data points to the nearest cluster using equations for calculating the mean and the distance between two data points. The time complexity of the proposed k-means technique is expected to be lower than that of the traditional one, with an increase in accuracy and efficiency. The main purpose of the article is to propose techniques for deriving the initial centroids and for assigning data points to their nearest clusters. The clustering technique proposed in this paper improves accuracy and time complexity, but it still needs further improvement; in future work it would also be viable to include efficient techniques for selecting the number of initial clusters (k). Experimental results show that the improved method can effectively improve the speed and accuracy of clustering, reducing the computational complexity of k-means.
MODIFIED ALGORITHM
Part 1: Determine initial centroids
Step 1.4: Find the distance of each data point from the mean value using Equation (Equ).
IF
	the distance from the mean value is minimal, then the data point is stored where it is, and the data points divided into the k clusters do not need to move to other clusters.
ELSE
	recalculate the distance of each data point from the mean value using Equation (Equ), until the data set is divided into k clusters.
Step 2.1: Calculate the distance from each data point to the centroids, assign each data point to its nearest centroid to form clusters, and store the value for each data point.
Step 2.3: Calculate the distance from all centroids to each data point, for all data points.
IF
	the newly calculated distance is less than or equal to the stored distance, the data point stays in its cluster.
ELSE
	from the distances calculated, assign the data point to its nearest centroid by comparing its distances to the different centroids.
Step 2.5: Recalculate the centroids for these new clusters, until the convergence criterion is met.
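Steps 2.1 and 2.3 boil down to a point-to-centroid distance matrix and a nearest-centroid assignment; a minimal sketch with illustrative values (not the paper's data):

```python
import numpy as np

# Illustrative data points and centroids (not from the paper).
data = np.array([[0.0, 0.0], [1.0, 1.0], [9.0, 9.0]])
centroids = np.array([[0.5, 0.5], [9.0, 9.0]])

# Distance from every data point to every centroid (Steps 2.1 / 2.3).
dists = np.linalg.norm(data[:, None] - centroids[None, :], axis=2)

# Assign each point to its nearest centroid by comparing those distances.
labels = dists.argmin(axis=1)
```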
RESULTS
PAPER 5
MODIFIED ALGORITHM
Step 1:
Take a reference point: (0, 0) if the data contain 2 attributes, (0, 0, 0) if they contain 3 attributes; for n attributes we have to take an n-dimensional point.
Step 2:
Calculate the distance of all the points from the reference point you have taken. The distance can be calculated by the Euclidean distance formula.
Step 3:
Calculate the mean of the distances computed in Step 2: M = (Σ di) / N, where di = sqrt((X - Xi)^2 + (Y - Yi)^2) is the distance of the i-th point and N is the total number of points.
Step 4:
E=D/N
Step 5:
Similarly, we continue in this way up to N.
Step 6:
X = (Σ xi) / n
Y = (Σ yi) / n
Step 7:
Now that we have obtained the centroid, we repeat the steps of the traditional method.
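Steps 2, 3 and 6 above can be checked with a few lines of Python (the points are illustrative, not from the paper):

```python
import numpy as np

pts = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])  # illustrative points

# Step 2: Euclidean distance of every point from the reference origin.
d = np.linalg.norm(pts, axis=1)

# Step 3: mean of those distances, M = (sum of di) / N.
M = d.sum() / len(pts)

# Step 6: X = (sum of xi)/n, Y = (sum of yi)/n -- the mean point.
X = pts[:, 0].sum() / len(pts)
Y = pts[:, 1].sum() / len(pts)
```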
RESULTS