
BHARATIYA VIDYA BHAVAN’S

SARDAR PATEL INSTITUTE OF TECHNOLOGY


(Empowered Autonomous Institute Affiliated to University of Mumbai)
[Knowledge is Nectar]

Department of Computer Science Engineering


Course - Data Analytics

UID 2021600022, 2021600033

Name Mahek Gupta, Shruti Kedari

Class and Batch BE AIML, Batch B

Date 10-11-2024

Lab 10

Aim To perform clustering on a dataset.

Objective Clustering groups similar data points together, revealing hidden patterns or structures. It
helps in tasks like customer segmentation, anomaly detection, and image recognition by
organizing data into meaningful clusters for better insights and decision-making.

Theory
K-Means Clustering Theory:
K-Means is a popular unsupervised clustering algorithm that divides a dataset into K
distinct clusters based on feature similarity. The objective is to minimize the variance (or
sum of squared distances) within each cluster, ensuring that data points in the same cluster
are as similar as possible.
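
Formally, the quantity K-Means minimizes is the within-cluster sum of squared distances (the value scikit-learn reports as inertia_):

J = \sum_{k=1}^{K} \sum_{x_i \in C_k} \lVert x_i - \mu_k \rVert^2

where C_k is the set of points assigned to cluster k and \mu_k is that cluster's centroid.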

Steps in K-Means Algorithm:


1. Initialization:
○ Select K initial cluster centroids randomly or using some heuristic (like
K-means++ for better initial centroids).
2. Assignment Step:
○ Assign each data point to the nearest centroid based on a distance metric
(typically Euclidean distance).
3. Update Step:
○ Recalculate the centroids of the clusters by taking the mean of all the data
points assigned to each cluster.
4. Repeat:
○ Repeat steps 2 and 3 until convergence (i.e., when the assignments no
longer change, or the centroids stabilize); a minimal code sketch of these
four steps follows this list.
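
As a rough illustration of the four steps above (not the implementation used in this experiment, which relies on scikit-learn), a minimal NumPy sketch with random initialization instead of k-means++ could look like this:

import numpy as np

def kmeans_sketch(X, k, n_iters=100, seed=0):
    rng = np.random.default_rng(seed)
    # Step 1: initialization - pick k distinct data points as starting centroids
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # Step 2: assignment - label each point with its nearest centroid (Euclidean distance)
        distances = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # Step 3: update - recompute each centroid as the mean of its assigned points
        # (empty clusters are not handled in this sketch)
        new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        # Step 4: repeat until convergence - stop once the centroids stabilize
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids, labels

Calling kmeans_sketch(data_scaled, k=5), for example, would mirror the scikit-learn fit performed later, up to differences in initialization.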

Distance Metric:
● The Euclidean distance is typically used to measure the distance between data
points and centroids:
d(x, y) = \sqrt{(x_1 - y_1)^2 + (x_2 - y_2)^2 + \dots + (x_n - y_n)^2}

where x and y are data points, and x_1, x_2, \dots, x_n are their respective feature values.
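
As a small illustrative check in code (the vectors below are arbitrary, not drawn from the dataset used later in this experiment):

import numpy as np

x = np.array([2.0, 4.0, 6.0])
y = np.array([1.0, 1.0, 2.0])

# sqrt of the sum of squared feature-wise differences: sqrt(1 + 9 + 16) ≈ 5.10
d = np.sqrt(np.sum((x - y) ** 2))

# equivalently, using the built-in vector norm
d_alt = np.linalg.norm(x - y)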

Key Concepts:
1. Centroids: The central point of each cluster, typically the mean of all data points in
the cluster.
2. K: The number of clusters you want to divide your dataset into. Selecting the
optimal K is important and can be done using methods like the Elbow Method,
where the sum of squared distances (within-cluster variance) is plotted for different
values of K to find the "elbow" point, indicating the optimal number of clusters.
3. Convergence: K-means converges when either the centroids no longer change
significantly between iterations or a predefined maximum number of iterations is
reached; the scikit-learn call sketched after this list exposes both as parameters.
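
These concepts map directly onto parameters of scikit-learn's KMeans. The values below are illustrative rather than prescribed by this experiment, except n_clusters, which matches the K chosen later:

from sklearn.cluster import KMeans

km = KMeans(
    n_clusters=5,        # K: the number of clusters
    init='k-means++',    # centroid initialization (mitigates poor random starts)
    n_init=10,           # number of different initializations to try
    max_iter=300,        # upper bound on assignment/update iterations
    tol=1e-4,            # centroid-movement threshold used to declare convergence
)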

Advantages of K-Means:
● Scalability: K-means is computationally efficient and works well with large
datasets.
● Simplicity: The algorithm is easy to implement and understand.
● Efficiency: Converges quickly, especially when the data is well-separated.

Limitations of K-Means:
● Choosing K: The value of K must be specified in advance, and choosing the
correct K can be challenging.
● Sensitive to Initialization: Poor initialization of centroids can lead to suboptimal
clustering. This is addressed with techniques like K-means++ (compared against
random initialization in the sketch after this list).
● Assumes Spherical Clusters: K-means assumes that clusters are spherical and
equally sized, which may not always be the case.
● Outliers: K-means is sensitive to outliers, as they can distort the mean of the
cluster.
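
To see the initialization sensitivity in practice, one could compare a single random initialization against k-means++ for the same K. This is only a sketch on synthetic placeholder data, not part of the recorded experiment:

import numpy as np
from sklearn.cluster import KMeans

X = np.random.rand(300, 4)   # placeholder data standing in for data_scaled

# With n_init=1 each run uses exactly one initialization, so the effect of the
# starting centroids is visible; lower inertia for the same K means a tighter fit.
for init in ('random', 'k-means++'):
    km = KMeans(n_clusters=5, init=init, n_init=1, random_state=0).fit(X)
    print(init, km.inertia_)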

Applications:
● Market segmentation (grouping customers with similar buying behaviors).
● Document clustering (grouping similar texts or articles).
● Image compression (grouping similar pixel values).

Implementation / Code


# importing required libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
from sklearn.cluster import KMeans

# Load the dataset
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/00292/Wholesale%20customers%20data.csv'
data = pd.read_csv(url)

# Display the first few rows
print(data.head())

# statistics of the data
data.describe()

# standardizing the data
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
data_scaled = scaler.fit_transform(data)

# statistics of scaled data
pd.DataFrame(data_scaled).describe()

# defining and fitting an initial k-means model so its inertia can be inspected
# (k = 2 is an arbitrary starting choice; the elbow curve below selects the final K)
kmeans = KMeans(n_clusters=2, init='k-means++')
kmeans.fit(data_scaled)

# inertia on the fitted data
kmeans.inertia_

# fitting multiple k-means algorithms and storing the inertia values in an empty list
SSE = []
for cluster in range(1, 20):
    kmeans = KMeans(n_clusters=cluster, init='k-means++')
    kmeans.fit(data_scaled)
    SSE.append(kmeans.inertia_)

# converting the results into a dataframe and plotting them
frame = pd.DataFrame({'Cluster':range(1,20), 'SSE':SSE})
plt.figure(figsize=(12,6))
plt.plot(frame['Cluster'], frame['SSE'], marker='o')
plt.xlabel('Number of clusters')
plt.ylabel('Inertia')
# k means using 5 clusters and k-means++ initialization
kmeans = KMeans( n_clusters = 5, init='k-means++')
kmeans.fit(data_scaled)
pred = kmeans.predict(data_scaled)
frame = pd.DataFrame(data_scaled)
frame['cluster'] = pred
frame['cluster'].value_counts()
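
A possible follow-up (not part of the recorded output): attach the predicted labels to the original, unscaled dataframe and profile each cluster, using the data and pred objects defined above.

# attach the cluster labels to the original (unscaled) data
data['cluster'] = pred

# per-cluster mean of each column, useful for interpreting the segments
print(data.groupby('cluster').mean())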

Output

Conclusion K-Means is an efficient and widely used clustering algorithm that groups data into K
clusters based on similarity. It works by iteratively assigning data points to the nearest
centroid and updating centroids until convergence. While it is computationally efficient and
simple, it requires selecting the optimal K and can be sensitive to initialization and outliers.
Despite these limitations, K-Means is widely applied in areas like market segmentation,
image compression, and pattern recognition.

References https://www.analyticsvidhya.com/blog/2019/08/comprehensive-guide-k-means-clustering/
