k-means-clustering

The document outlines a k-means clustering analysis using Python, including data import, visualization, and clustering execution. It demonstrates the elbow method for determining the optimal number of clusters and provides the resulting cluster labels and centroids. Additionally, it includes a silhouette score to evaluate the clustering quality.

Uploaded by

Arundhathi


k-means-clustering

November 13, 2024

[2]: # k means clustering

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

[3]: data = pd.read_excel(r"C:\Users\lenovo\Downloads\Clustering_ex.xlsx.xlsx")

[4]: data

[4]:     Variable_1  Variable_2
     0           12          30
     1           20          36
     2           28          30
     3           18          52
     4           29          54
     5           33          46
     6           24          55
     7           45          59
     8           45          63
     9           52          70
     10          51          66
     11          52          63
     12          55          58
     13          53          23
     14          55          14
     15          61           8
     16          64          19
     17          69           7
     18          72          24
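
The Excel file read in cell [3] sits on the author's machine, so the later cells cannot be rerun as they stand. A minimal sketch, assuming the table printed above is the complete dataset, rebuilds the same DataFrame inline:

```python
import pandas as pd

# Values copied row by row from the table printed above (rows 0..18).
data = pd.DataFrame({
    "Variable_1": [12, 20, 28, 18, 29, 33, 24, 45, 45, 52,
                   51, 52, 55, 53, 55, 61, 64, 69, 72],
    "Variable_2": [30, 36, 30, 52, 54, 46, 55, 59, 63, 70,
                   66, 63, 58, 23, 14, 8, 19, 7, 24],
})

print(data.shape)
```

With this in place, the remaining cells should run without `pd.read_excel`.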

[5]: fig = plt.figure(figsize = (5,5))

x = data["Variable_1"]
y = data["Variable_2"]
n = range(0,19)
plt.grid()
plt.scatter(x, y, marker = 'o', c = 'red' )
plt.xlabel('Variable_1')
plt.ylabel('Variable_2')
for i, txt in enumerate(n):
    plt.annotate(txt, (x[i], y[i]))

[51]: from sklearn.cluster import KMeans

# fit k-means for k = 1..4 and record the inertia
# (within-cluster sum of squares) of each fit
individual_clustering_score = []
for i in range(1,5):
    kmeans = KMeans(n_clusters = i)
    kmeans.fit(data)
    individual_clustering_score.append(kmeans.inertia_)

C:\Users\lenovo\anaconda3\Lib\site-packages\sklearn\cluster\_kmeans.py:870:
FutureWarning: The default value of `n_init` will change from 10 to 'auto' in
1.4. Set the value of `n_init` explicitly to suppress the warning
warnings.warn(
C:\Users\lenovo\anaconda3\Lib\site-packages\sklearn\cluster\_kmeans.py:1382:
UserWarning: KMeans is known to have a memory leak on Windows with MKL, when
there are less chunks than available threads. You can avoid it by setting the
environment variable OMP_NUM_THREADS=1.
warnings.warn(
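
Both warnings name their own remedy: pass `n_init` explicitly and set `OMP_NUM_THREADS=1`. The sketch below applies those two suggestions to the loop from cell [51]. The inline data is copied from the table in cell [4], and `random_state=0` is an assumption added here for repeatability; it is not in the original notebook.

```python
import os

# Set before scikit-learn is imported, as the UserWarning recommends.
os.environ["OMP_NUM_THREADS"] = "1"

import pandas as pd
from sklearn.cluster import KMeans

# Data copied from the table in cell [4].
data = pd.DataFrame({
    "Variable_1": [12, 20, 28, 18, 29, 33, 24, 45, 45, 52,
                   51, 52, 55, 53, 55, 61, 64, 69, 72],
    "Variable_2": [30, 36, 30, 52, 54, 46, 55, 59, 63, 70,
                   66, 63, 58, 23, 14, 8, 19, 7, 24],
})

individual_clustering_score = []
for k in range(1, 5):
    # n_init=10 keeps the old default; random_state=0 is an assumption.
    kmeans = KMeans(n_clusters=k, n_init=10, random_state=0)
    kmeans.fit(data)
    individual_clustering_score.append(kmeans.inertia_)

print(individual_clustering_score)
```

Setting the environment variable inside the notebook only helps if it happens before the numerical libraries are first loaded; otherwise it may need to be set in the shell that starts Jupyter.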

[61]: individual_clustering_score

[61]: [13773.57894736842, 5352.166666666667, 1794.142857142857, 1063.75]
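
One way to read these four numbers before plotting them is to look at how much the inertia falls with each extra cluster. The sketch below uses only the values printed above; the name `drops` is introduced here for illustration.

```python
# Inertia values copied from the output of cell [61], for k = 1..4.
inertia = [13773.57894736842, 5352.166666666667, 1794.142857142857, 1063.75]

# Relative fall in inertia when moving from k to k + 1 clusters.
drops = []
for k in range(1, len(inertia)):
    drop = (inertia[k - 1] - inertia[k]) / inertia[k - 1]
    drops.append(drop)
    print(f"k={k} -> k={k + 1}: inertia falls by {drop:.1%}")
```

A clearly smaller fall at some step is what the elbow plot shows as a bend. Since the loop stops at k = 4, these numbers cannot show whether the curve keeps flattening beyond that point.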

[62]: plt.figure(figsize=(10,6))
plt.plot(range(1,5), individual_clustering_score)
plt.title("Elbow method")
plt.xlabel("Number of clusters")
plt.ylabel("Within cluster sum of squares")
plt.show()

[63]: # kmeans is the last model fitted in the loop above (n_clusters = 4)
labels = kmeans.predict(data)

[64]: labels

[64]: array([3, 3, 3, 0, 0, 0, 0, 2, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1])

[65]: kmeans.labels_

[65]: array([3, 3, 3, 0, 0, 0, 0, 2, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1])

[66]: centroids = kmeans.cluster_centers_

[60]: # cluster centers

centroids

[60]: array([[26.        , 51.75      ],
             [62.33333333, 15.83333333],
             [50.        , 63.16666667],
             [20.        , 32.        ]])
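
Each centroid is the mean of the points assigned to its cluster, and the inertia is the sum of squared distances from each point to its centroid. A short NumPy sketch, using the data from cell [4] and the labels from cell [64], recomputes both without scikit-learn; the variable names are chosen here for illustration.

```python
import numpy as np

# Data from cell [4] and labels from cell [64].
points = np.array([
    [12, 30], [20, 36], [28, 30], [18, 52], [29, 54], [33, 46], [24, 55],
    [45, 59], [45, 63], [52, 70], [51, 66], [52, 63], [55, 58],
    [53, 23], [55, 14], [61, 8], [64, 19], [69, 7], [72, 24],
], dtype=float)
labels = np.array([3, 3, 3, 0, 0, 0, 0, 2, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1])

# Centroid k is the mean of the points labelled k.
centroids = np.array([points[labels == k].mean(axis=0) for k in range(4)])

# Inertia: squared distance of every point to its own centroid, summed.
inertia = float(((points - centroids[labels]) ** 2).sum())

print(centroids)
print(inertia)
```

The recomputed centroids should agree with `kmeans.cluster_centers_`, and the inertia with the last entry of `individual_clustering_score`.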

[84]: fig = plt.figure(figsize = (5,5))

# dictionary: map cluster numbers to colors
colmap = {1:'m', 2:'b', 3:'g', 4:'k'}
# map assigns a color to each label
# (labels start at 0, colmap keys start at 1, hence x+1)
colors = map(lambda x: colmap[x+1], labels)

colors1 = list(colors)
plt.scatter(x, y, color = colors1, alpha = 0.5)
# plotting each centroid in the color of its cluster
for idx, centroid in enumerate(centroids):
    plt.scatter(*centroid, color = colmap[idx+1])
# labeling the points as 0,1,2,....18
for i, txt in enumerate(n):
    plt.annotate(txt, (x[i], y[i]))
plt.grid()

<map object at 0x0000025EB57B2080>

[16]: from sklearn.metrics import silhouette_score

silhouette_score(data, labels)

[16]: 0.6179376814567372
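
For reference, the silhouette of a point is (b - a) / max(a, b), where a is its mean distance to the other members of its own cluster and b is the smallest mean distance to the members of any other cluster; the score is the average over all points. The sketch below is a hand-written NumPy version of that definition with Euclidean distance, not scikit-learn's implementation. It uses the labels from cell [64]; cell [16] was executed at a different point in the session, so it is an assumption that the same labels were in use there.

```python
import numpy as np

def silhouette(points, labels):
    """Mean silhouette coefficient with Euclidean distance (illustrative)."""
    # Pairwise distances between all points.
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    scores = []
    for i in range(len(points)):
        same = labels == labels[i]
        n_same = int(same.sum())
        if n_same == 1:
            scores.append(0.0)  # usual convention for one-point clusters
            continue
        # a: mean distance to the other members of the same cluster
        # (dist[i, i] is 0, so dividing by n_same - 1 excludes the point itself).
        a = dist[i, same].sum() / (n_same - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(dist[i, labels == k].mean()
                for k in np.unique(labels) if k != labels[i])
        scores.append((b - a) / max(a, b))
    return float(np.mean(scores))

points = np.array([
    [12, 30], [20, 36], [28, 30], [18, 52], [29, 54], [33, 46], [24, 55],
    [45, 59], [45, 63], [52, 70], [51, 66], [52, 63], [55, 58],
    [53, 23], [55, 14], [61, 8], [64, 19], [69, 7], [72, 24],
], dtype=float)
labels = np.array([3, 3, 3, 0, 0, 0, 0, 2, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1])

score = silhouette(points, labels)
print(score)
```

Values near 1 indicate compact, well-separated clusters; values near 0 indicate overlapping ones.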

[86]: print(colors1)

['k', 'k', 'k', 'm', 'm', 'm', 'm', 'g', 'g', 'g', 'g', 'g', 'g', 'b', 'b', 'b',
'b', 'b', 'b']
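
The same label-to-color mapping can be written as a list comprehension, which avoids the intermediate `map` object. This is a small sketch using the labels from cell [64] and the `colmap` from cell [84].

```python
# Labels from cell [64] and the color dictionary from cell [84].
labels = [3, 3, 3, 0, 0, 0, 0, 2, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1]
colmap = {1: 'm', 2: 'b', 3: 'g', 4: 'k'}

# Labels start at 0 while the colmap keys start at 1, hence label + 1.
colors1 = [colmap[label + 1] for label in labels]
print(colors1)
```

The result is already a plain list, so it can be passed straight to `plt.scatter` or printed.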
