AAM 7th prac
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler # For feature scaling (often important for K-Means)

# 1. Load Data
# Option 1: Synthetic dataset for demonstration (replace with your data)
X, _ = make_blobs(n_samples=300, centers=4, random_state=42)
# Option 2: Loading from a CSV file (replace 'your_dataset.csv' with your file)
# dataset = pd.read_csv('your_dataset.csv')
# X = dataset.iloc[:, [0, 1]].values # Select the columns you want to use for clustering (e.g., columns 0 and 1)

# 2. Feature Scaling (K-Means is sensitive to feature scales)
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# 3. Elbow Method: compute WCSS for k = 1..10
wcss = []
for k in range(1, 11):
    kmeans = KMeans(n_clusters=k, init='k-means++', n_init=10, random_state=42)
    kmeans.fit(X_scaled)
    wcss.append(kmeans.inertia_)
plt.plot(range(1, 11), wcss, marker='o')
plt.title('Elbow Method')
plt.xlabel('Number of clusters')
plt.ylabel('WCSS')
plt.show()
# Based on the Elbow Method plot, choose the optimal k (number of clusters)
optimal_k = 4 # Replace with the k value you determined from the elbow plot

# Fit K-Means with the chosen k and get a cluster label for each sample
kmeans = KMeans(n_clusters=optimal_k, init='k-means++', n_init=10, random_state=42)
y_kmeans = kmeans.fit_predict(X_scaled)

plt.figure(figsize=(8, 6))
colors = ['red', 'blue', 'green', 'cyan', 'magenta', 'orange', 'purple', 'pink', 'gray', 'brown'] # Add more colors if needed
for i in range(optimal_k):
    plt.scatter(X_scaled[y_kmeans == i, 0], X_scaled[y_kmeans == i, 1],
                c=colors[i], label=f'Cluster {i + 1}')
# Plot the cluster centers (centroids) in yellow so they stand out
plt.scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:, 1],
            s=200, c='yellow', edgecolors='black', label='Centroids')
plt.title('K-Means Clustering')
plt.xlabel('Feature 1 (scaled)') # Important: the x and y axes are in scaled units
plt.ylabel('Feature 2 (scaled)')
plt.legend()
plt.show()
# If you used a CSV, you can add the cluster labels back to the DataFrame:
# dataset['Cluster'] = y_kmeans
# print(dataset.head())
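As a quick aside (not part of the script above), the effect of StandardScaler can be checked directly: after scaling, each feature has roughly zero mean and unit variance, so no single feature dominates the Euclidean distances K-Means uses. A minimal sketch with made-up values:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Two features on very different scales: e.g. income (~tens of thousands) and age (~tens)
X_demo = np.array([[50000.0, 25.0],
                   [52000.0, 60.0],
                   [51000.0, 30.0]])

X_demo_scaled = StandardScaler().fit_transform(X_demo)

# After scaling, each column has mean ~0 and standard deviation ~1
print(X_demo_scaled.mean(axis=0))  # close to [0, 0]
print(X_demo_scaled.std(axis=0))   # close to [1, 1]
```

Without this step, distances between samples would be driven almost entirely by the large-scale feature.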
1. Synthetic Data (or CSV): The code provides two options:
   o It creates a synthetic dataset with make_blobs for demonstration purposes, which is useful for testing and understanding the algorithm.
   o It includes commented-out code to load data from a CSV file, which you can uncomment and adapt to your own data.
2. Feature Scaling: K-Means is sensitive to the scales of the features, so feature scaling (using StandardScaler) is applied to the data before clustering.
3. Elbow Method: The code implements the Elbow Method to help you determine the optimal number of clusters (k). It plots the within-cluster sum of squares (WCSS) for different values of k, and you visually inspect the "elbow" point to choose the best k.
4. Clearer Visualization: The visualization is improved:
   o It uses a list of colors so you can easily distinguish clusters.
   o It plots the cluster centers (centroids) in yellow, making them stand out.
   o The plot includes axis labels, a title, and a legend.
   o Important: the axes are labeled as "scaled" to indicate that feature scaling has been applied.
5. Adding Cluster Labels to DataFrame (Optional): The commented-out code shows how to add the cluster assignments (y_kmeans) back to your original Pandas DataFrame if you loaded from a CSV. This is useful for further analysis.
6. Comments and Explanations: The code has comments explaining each step.
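As a cross-check on the Elbow Method (not part of the script above), the silhouette score from sklearn.metrics can also guide the choice of k: it ranges from -1 to 1, and higher values indicate better-separated clusters. A sketch on synthetic blobs:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

X, _ = make_blobs(n_samples=300, centers=4, random_state=42)
X_scaled = StandardScaler().fit_transform(X)

# Silhouette score needs at least 2 clusters, so start k at 2
scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=42).fit_predict(X_scaled)
    scores[k] = silhouette_score(X_scaled, labels)
    print(f"k={k}: silhouette={scores[k]:.3f}")
```

If the elbow is ambiguous, picking the k with the highest silhouette score is a reasonable tiebreaker.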
How to Use:
1. Choose Data Loading Method: Decide whether you'll use the synthetic data or load from a CSV. If using a CSV, uncomment and adapt the relevant lines, making sure the column indices in X = dataset.iloc[:, [0, 1]].values are correct.
2. Run the Code: Run the Python script. The Elbow Method plot appears first; examine it to choose the optimal k value.
3. Set optimal_k: Replace the placeholder optimal_k = 4 with the value you determined from the Elbow Method plot.
4. Run Again: Run the code again. This time it performs K-Means clustering with your chosen k and displays the cluster visualization.
5. Analyze Results: If you loaded from a CSV, uncomment the lines that add the cluster labels back to your DataFrame and analyze the clusters.
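For step 5, once the labels are attached to the DataFrame, a simple groupby summarizes each cluster. A sketch using synthetic data as a stand-in for a loaded CSV, with hypothetical column names 'f1' and 'f2' (adapt to your own file):

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler

# Stand-in for a CSV with two numeric feature columns
X, _ = make_blobs(n_samples=300, centers=4, random_state=42)
dataset = pd.DataFrame(X, columns=['f1', 'f2'])

X_scaled = StandardScaler().fit_transform(dataset[['f1', 'f2']].values)
y_kmeans = KMeans(n_clusters=4, n_init=10, random_state=42).fit_predict(X_scaled)

# Attach labels and summarize each cluster (mean of each feature, cluster size)
dataset['Cluster'] = y_kmeans
summary = dataset.groupby('Cluster')[['f1', 'f2']].agg(['mean', 'count'])
print(summary)
```

Per-cluster means in the original (unscaled) units are usually easier to interpret than the scaled coordinates used for clustering.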