Lab8 K Mean Clustering

This lab report applies k-means clustering to random data sets. The students run k-means with k = 3 on 100 random points whose coordinates range from 2 to 1000, with a cap of 100 iterations, and then on a small five-row data matrix. They also analyze the effect of varying k (4, 8, 12 and 20) and the iteration cap (100 and 500) on the 100-point data set. The conclusion describes how k-means works: each data point is assigned to its closest centroid, and the centroid locations are updated in each iteration until the cluster assignments stop changing.

Uploaded by

Jahangir Awan

Department of Computer Science
HITEC University, Taxila
BS Computer Science Program (Batch 2021)

CS-428 Introduction to Machine Learning
3 (2+1)

Lab Report # 8
Name: Hammad Ali
Zaryab Ali Haider
Registration No: 21-cs-039
21-cs-119

Instructor: Miss. Faiza Jahangir


TASK
Apply k-means clustering to a random data set whose values range from 2
to 1000.

CODE:
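% 100 random 2-D points with coordinates between 2 and 1000
data = 2 + rand(100, 2) * (1000 - 2);
K = 3;
% Pick K distinct data points as the initial centroids
centroids = datasample(data, K, 'Replace', false);

% Maximum number of iterations for the algorithm
maxIterations = 100;

previousAssignments = zeros(size(data, 1), 1);

for iter = 1:maxIterations
    % Distance from every point to every centroid
    distances = pdist2(data, centroids);
    % Assign each point to its nearest centroid
    [~, clusterAssignments] = min(distances, [], 2);

    % Move each centroid to the mean of the points assigned to it
    for k = 1:K
        clusterPoints = data(clusterAssignments == k, :);
        if ~isempty(clusterPoints)  % keep the old centroid if the cluster is empty
            centroids(k, :) = mean(clusterPoints, 1);
        end
    end

    % Stop as soon as the assignments no longer change
    if isequal(clusterAssignments, previousAssignments)
        break;
    end
    previousAssignments = clusterAssignments;
end
scatter(data(:, 1), data(:, 2), 30, clusterAssignments, 'filled');
title('K-MEAN-CLUSTERING');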

Output:
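As a cross-check on the hand-written loop, the same task can be handed to the built-in kmeans function from the Statistics and Machine Learning Toolbox (the toolbox that also provides pdist2 and datasample). This is only a sketch and not part of the original lab: the choice of 5 replicates and the variable names idx, C, sumd and clusterSizes are assumptions made for illustration.

```matlab
% Sketch (not the lab's code): built-in kmeans on the same kind of data.
data = 2 + rand(100, 2) * (1000 - 2);   % 100 random points in [2, 1000]
K = 3;

% idx  : cluster index of every point
% C    : K-by-2 matrix of final centroids
% sumd : within-cluster sum of squared distances, one value per cluster
% 'Replicates', 5 is an illustrative choice: rerun from 5 random starts
% and keep the solution with the lowest total sumd.
[idx, C, sumd] = kmeans(data, K, 'MaxIter', 100, 'Replicates', 5);

clusterSizes = accumarray(idx, 1, [K 1]);   % number of points per cluster
for k = 1:K
    fprintf('Cluster %d: %d points, centroid (%.1f, %.1f), sumd = %.4g\n', ...
        k, clusterSizes(k), C(k, 1), C(k, 2), sumd(k));
end
```

Because the starting centroids are random in both versions, the cluster labels and exact boundaries will generally differ from run to run; only the overall structure is comparable.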
TASK
Apply the k-means clustering algorithm to the data matrix given below:
Data = [
1 150 15.4 50400200 18
2 144 11.3 42100650 15
3 120 9.9 39440420 12
4 110 12.5 36500520 16
5 100 9.7 40650005 10]

CODE
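% First column (1-5) restored from the data matrix in the task statement
data = [
    1, 150, 15.4, 50400200, 18;
    2, 144, 11.3, 42100650, 15;
    3, 120,  9.9, 39440420, 12;
    4, 110, 12.5, 36500520, 16;
    5, 100,  9.7, 40650005, 10
];

K = 2;
% Pick K distinct rows as the initial centroids
centroids = datasample(data, K, 'Replace', false);

maxIterations = 100;

previousAssignments = zeros(size(data, 1), 1);

for iter = 1:maxIterations
    % Distance from every row to every centroid
    distances = pdist2(data, centroids);
    % Assign each row to its nearest centroid
    [~, clusterAssignments] = min(distances, [], 2);

    % Move each centroid to the mean of the rows assigned to it
    for k = 1:K
        clusterPoints = data(clusterAssignments == k, :);
        if ~isempty(clusterPoints)  % keep the old centroid if the cluster is empty
            centroids(k, :) = mean(clusterPoints, 1);
        end
    end

    % Stop as soon as the assignments no longer change
    if isequal(clusterAssignments, previousAssignments)
        break;
    end
    previousAssignments = clusterAssignments;
end
% Only the first two columns are plotted; clustering uses all five
scatter(data(:, 1), data(:, 2), 30, clusterAssignments, 'filled');
title('K-MEAN-CLUSTERING');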
Output:
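One thing worth noting about this data matrix is that the fourth column is in the tens of millions while the others are at most a few hundred, so the Euclidean distance is decided almost entirely by that one column. The sketch below is not part of the original lab: it assumes all five columns are meant as features, and it uses z-score standardisation as one possible remedy. The names mu, sigma, scaled, rawShare and scaledShare are introduced here for illustration.

```matlab
% Sketch (assumption: all five columns are features to be clustered).
data = [
    1, 150, 15.4, 50400200, 18;
    2, 144, 11.3, 42100650, 15;
    3, 120,  9.9, 39440420, 12;
    4, 110, 12.5, 36500520, 16;
    5, 100,  9.7, 40650005, 10
];
n = size(data, 1);

% Z-score every column: subtract the column mean, divide by the column std
mu = mean(data, 1);
sigma = std(data, 0, 1);
scaled = (data - repmat(mu, n, 1)) ./ repmat(sigma, n, 1);

% Share of the squared distance between rows 1 and 2 that comes from column 4
rawDiff = data(1, :) - data(2, :);
scaledDiff = scaled(1, :) - scaled(2, :);
rawShare = rawDiff(4)^2 / sum(rawDiff .^ 2);
scaledShare = scaledDiff(4)^2 / sum(scaledDiff .^ 2);

fprintf('Column 4 share of squared distance, raw data:    %.6f\n', rawShare);
fprintf('Column 4 share of squared distance, scaled data: %.6f\n', scaledShare);
```

Running the clustering loop on scaled instead of data lets every column contribute to the assignment; whether that is appropriate depends on what the columns represent, which the task does not state.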

TASK
Analyze the working of k-means clustering on the data from Task 1, taking
k = 4, k = 8, k = 12 and k = 20, for 100 and 500 iterations.

CODE
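% 100 random 2-D points with coordinates between 2 and 1000
data = 2 + rand(100, 2) * (1000 - 2);

K = 20;              % also run with K = 4, 8 and 12
% Pick K distinct data points as the initial centroids
centroids = datasample(data, K, 'Replace', false);

maxIterations = 500; % also run with maxIterations = 100

previousAssignments = zeros(size(data, 1), 1);

for iter = 1:maxIterations
    % Distance from every point to every centroid
    distances = pdist2(data, centroids);
    % Assign each point to its nearest centroid
    [~, clusterAssignments] = min(distances, [], 2);

    % Move each centroid to the mean of the points assigned to it
    for k = 1:K
        clusterPoints = data(clusterAssignments == k, :);
        if ~isempty(clusterPoints)  % keep the old centroid if the cluster is empty
            centroids(k, :) = mean(clusterPoints, 1);
        end
    end

    % Stop as soon as the assignments no longer change
    if isequal(clusterAssignments, previousAssignments)
        break;
    end
    previousAssignments = clusterAssignments;
end
scatter(data(:, 1), data(:, 2), 30, clusterAssignments, 'filled');
title('K-MEAN-CLUSTERING');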
OUTPUT:
K=20 AND ITERATION=500
K=12 AND ITERATION=100
K=8 AND ITERATION=100
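
The plots can be complemented with a number. The sketch below, which is not part of the original lab, runs the same algorithm for K = 4, 8, 12 and 20 and reports the within-cluster sum of squares (WCSS), that is, the total squared distance from each point to its assigned centroid. It uses only base MATLAB (randperm instead of datasample, a manual distance loop instead of pdist2); the names Ks, wcss and minDist are assumptions introduced for illustration.

```matlab
% Sketch: WCSS for several values of K on the Task 1 data recipe.
data = 2 + rand(100, 2) * (1000 - 2);
n = size(data, 1);
Ks = [4 8 12 20];
maxIterations = 100;
wcss = zeros(size(Ks));

for j = 1:numel(Ks)
    K = Ks(j);
    perm = randperm(n);
    centroids = data(perm(1:K), :);      % K distinct points as initial centroids
    previous = zeros(n, 1);
    for iter = 1:maxIterations
        % Squared Euclidean distance from every point to every centroid
        d = zeros(n, K);
        for k = 1:K
            diffs = data - repmat(centroids(k, :), n, 1);
            d(:, k) = sum(diffs .^ 2, 2);
        end
        [minDist, assignments] = min(d, [], 2);
        if isequal(assignments, previous)
            break;                       % converged: assignments unchanged
        end
        % Move each centroid to the mean of its members
        for k = 1:K
            members = data(assignments == k, :);
            if ~isempty(members)
                centroids(k, :) = mean(members, 1);
            end
        end
        previous = assignments;
    end
    % Squared distances measured at the last assignment step
    wcss(j) = sum(minDist);
    fprintf('K = %2d: stopped after %3d iterations, WCSS = %.4g\n', K, iter, wcss(j));
end
```

A larger K gives smaller clusters and therefore a lower WCSS, so this number alone cannot say which K is best; it only quantifies how tight the clusters in each plot are.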

CONCLUSION
In this lab we learned how to write a k-means clustering algorithm and use it. K is the
number of clusters; the K initial centroids are picked at random from the data set.
maxIterations is only an upper limit on the number of iterations; how many are actually
needed depends on the data. If the cluster assignments in the current iteration are the
same as in the previous one, we break out of the loop. In Task 3, changing the limit from
100 to 500 makes no difference: with only 100 rows the assignments stop changing long
before the 100th iteration, so the loop exits early in both cases. A larger data set would
be needed before the higher limit could matter. A higher value of k still gives a good
result in this case, as there is enough data to support 20 centroids. After every cluster
assignment, each centroid is updated to the average of the points in its cluster, so the
centroids gradually settle at the centres of their clusters.
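
The point about 100 versus 500 iterations can be checked directly. The following sketch is not part of the original lab: it starts both runs from the same randomly chosen centroids and records how many iterations each one actually uses. K = 4 is an arbitrary choice, and the names startCentroids, caps, itersUsed and finalAssignments are assumptions introduced for illustration.

```matlab
% Sketch: same data, same starting centroids, two different iteration caps.
data = 2 + rand(100, 2) * (1000 - 2);
n = size(data, 1);
K = 4;
perm = randperm(n);
startCentroids = data(perm(1:K), :);     % K distinct points, shared by both runs

caps = [100 500];
itersUsed = zeros(size(caps));
finalAssignments = zeros(n, numel(caps));

for c = 1:numel(caps)
    centroids = startCentroids;
    previous = zeros(n, 1);
    for iter = 1:caps(c)
        % Squared Euclidean distance from every point to every centroid
        d = zeros(n, K);
        for k = 1:K
            diffs = data - repmat(centroids(k, :), n, 1);
            d(:, k) = sum(diffs .^ 2, 2);
        end
        [~, assignments] = min(d, [], 2);
        if isequal(assignments, previous)
            break;                       % converged: assignments unchanged
        end
        % Move each centroid to the mean of its members
        for k = 1:K
            members = data(assignments == k, :);
            if ~isempty(members)
                centroids(k, :) = mean(members, 1);
            end
        end
        previous = assignments;
    end
    itersUsed(c) = iter;
    finalAssignments(:, c) = assignments;
end

fprintf('Iterations used with cap 100: %d\n', itersUsed(1));
fprintf('Iterations used with cap 500: %d\n', itersUsed(2));
fprintf('Identical assignments: %d\n', ...
    isequal(finalAssignments(:, 1), finalAssignments(:, 2)));
```

On data of this size the loop typically stops well short of either cap, which is why both runs end with the same assignments.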
