Yogesh Siddiq Edited

The document outlines the implementation of K-means clustering algorithms using Python on three different datasets: the Iris dataset, the Breast Cancer dataset, and the Diabetes dataset. Each program includes data loading, preprocessing, model fitting, cluster prediction, and visualization of the results. The document provides code snippets and outputs for each dataset, demonstrating the clustering process and the shapes of the datasets.

Uploaded by

Yuvarani Aruchamy


IMPLEMENTATION OF CLUSTERING ALGORITHMS

AIM: To write a Python program for implementing clustering algorithms for different datasets.

PROGRAM 1:

K-MEANS CLUSTERING USING IRIS DATASET

import matplotlib.pyplot as plt

import pandas as pd
from sklearn.datasets import load_iris
from sklearn.cluster import KMeans

# Load the iris dataset

iris_data = load_iris()
X = iris_data.data # Assigning dataset to X
y = iris_data.target # Target labels

# Convert to pandas DataFrame for viewing with head()

iris_df = pd.DataFrame(X, columns=iris_data.feature_names)
iris_df['species'] = iris_data.target_names[y]

# Display the first 5 rows of the dataset using head()

print(iris_df.head(5))

# Display the shape of the DataFrame (4 features plus the added 'species' column)

print("Shape of the dataset (samples, features):", iris_df.shape)

# Initialize KMeans with 3 clusters (since the iris dataset has 3 classes)
kmeans = KMeans(n_clusters=3, random_state=42, n_init=10)

# Fit the model to the data

kmeans.fit(X)

# Predict the clusters

predicted_clusters = kmeans.predict(X)

# Visualize the clusters (for sepal length and sepal width features)
plt.figure(figsize=(8, 6))
plt.scatter(X[:, 0], X[:, 1], c=predicted_clusters, s=50, cmap='viridis')

# Plot the cluster centers

centers = kmeans.cluster_centers_
plt.scatter(centers[:, 0], centers[:, 1], c='red', s=200, alpha=0.75, label='Cluster Centers')

# Labels and title

plt.xlabel(iris_data.feature_names[0]) # Sepal length
plt.ylabel(iris_data.feature_names[1]) # Sepal width
plt.title('K-means Clustering on Iris Dataset')
plt.legend()
# Show the plot
plt.show()

OUTPUT:

   sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)  \
0                5.1               3.5                1.4               0.2
1                4.9               3.0                1.4               0.2
2                4.7               3.2                1.3               0.2
3                4.6               3.1                1.5               0.2
4                5.0               3.6                1.4               0.2

  species
0  setosa
1  setosa
2  setosa
3  setosa
4  setosa

Shape of the dataset (samples, features): (150, 5)
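
Program 1 picks 3 clusters because the Iris dataset has 3 classes, but it never checks how well the clusters line up with the species. The following is a minimal sketch of one way to check, assuming the adjusted Rand index and a confusion matrix from sklearn.metrics are acceptable measures (neither appears in the original program). Cluster ids are arbitrary, so a direct accuracy comparison against the target labels would be misleading.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.metrics import adjusted_rand_score, confusion_matrix

# Same data and model settings as Program 1
iris_data = load_iris()
X, y = iris_data.data, iris_data.target

kmeans = KMeans(n_clusters=3, random_state=42, n_init=10)
predicted_clusters = kmeans.fit_predict(X)

# The adjusted Rand index ignores how cluster ids are numbered:
# 1.0 means identical groupings, values near 0 mean chance-level agreement.
ari = adjusted_rand_score(y, predicted_clusters)
print(f"Adjusted Rand index: {ari:.3f}")

# Rows are true species (0, 1, 2), columns are cluster ids (0, 1, 2)
table = confusion_matrix(y, predicted_clusters)
print(table)
```

Each row of the table shows how the samples of one species are spread across the three clusters.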
PROGRAM 2:
K-MEANS CLUSTERING USING CANCER DATASET

# Import necessary libraries

from sklearn.datasets import load_breast_cancer
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler
import matplotlib.pyplot as plt
import pandas as pd # For using the head() command

# Load the breast cancer dataset

cancer_data = load_breast_cancer()
X = cancer_data.data

# Convert to a pandas DataFrame

cancer_df = pd.DataFrame(X, columns=cancer_data.feature_names)

# Standardize the features

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Check the shape of the original and scaled datasets

print(f"\nShape of the original dataset (X): {X.shape}")
print(f"Shape of the scaled dataset (X_scaled): {X_scaled.shape}")

# Display the first 5 rows of the dataset

print(cancer_df.head())

# Initialize KMeans with 2 clusters (since the breast cancer dataset has 2 classes: malignant and benign)
# n_init is set explicitly, as in the other programs, to avoid a FutureWarning
kmeans = KMeans(n_clusters=2, random_state=42, n_init=10)

# Fit the model to the scaled data

kmeans.fit(X_scaled)

# Predict the clusters

predicted_clusters = kmeans.predict(X_scaled)

# Visualize the clusters using the first two scaled features (mean radius and mean texture)
plt.figure(figsize=(8, 6))
plt.scatter(X_scaled[:, 0], X_scaled[:, 1], c=predicted_clusters, s=50, cmap='viridis')

# Plot the cluster centers

centers = kmeans.cluster_centers_
plt.scatter(centers[:, 0], centers[:, 1], c='red', s=200, alpha=0.75, label='Cluster Centers')

plt.xlabel('First feature (scaled)')

plt.ylabel('Second feature (scaled)')
plt.title('K-means Clustering on Breast Cancer Dataset')
plt.legend()
plt.show()
OUTPUT:

Shape of the original dataset (X): (569, 30)

Shape of the scaled dataset (X_scaled): (569, 30)

   mean radius  mean texture  mean perimeter  mean area  mean smoothness  \
0        17.99         10.38          122.80     1001.0          0.11840
1        20.57         17.77          132.90     1326.0          0.08474
2        19.69         21.25          130.00     1203.0          0.10960
3        11.42         20.38           77.58      386.1          0.14250
4        20.29         14.34          135.10     1297.0          0.10030

[5 rows x 30 columns]
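
The scatter plot in Program 2 uses only the first two of the 30 scaled features, so most of the information that K-means clustered on is not visible in it. The following is a minimal sketch of an alternative view, assuming a 2-component PCA projection is wanted (PCA is not part of the original program). It projects both the data and the cluster centers into the same 2-D space.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Same preprocessing and model settings as Program 2
X = load_breast_cancer().data
X_scaled = StandardScaler().fit_transform(X)

kmeans = KMeans(n_clusters=2, random_state=42, n_init=10)
predicted_clusters = kmeans.fit_predict(X_scaled)

# Fit PCA on the scaled data, then project the centers with the same transform
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_scaled)
centers_pca = pca.transform(kmeans.cluster_centers_)

print("Projected data shape:", X_pca.shape)
print("Explained variance ratio:", pca.explained_variance_ratio_)
```

To plot the result, pass X_pca[:, 0] and X_pca[:, 1] to plt.scatter in place of the two scaled feature columns, and centers_pca in place of centers.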
PROGRAM 3:

K-MEANS CLUSTERING USING DIABETES DATASET

# Import necessary libraries

import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_diabetes
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler
import pandas as pd # For using the head() command

# Load the diabetes dataset

diabetes_data = load_diabetes()
X = diabetes_data.data

# Convert to a pandas DataFrame

diabetes_df = pd.DataFrame(X, columns=diabetes_data.feature_names)

# Standardize the features (must run before X_scaled is used below)

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Check the shape of the original and scaled datasets

print(f"\nShape of the original dataset (X): {X.shape}")
print(f"Shape of the scaled dataset (X_scaled): {X_scaled.shape}")

# Display the first 5 rows of the dataset

print(diabetes_df.head())

# Initialize KMeans with 2 clusters (as an example)

kmeans = KMeans(n_clusters=2, random_state=42, n_init=10)
# Added n_init to avoid FutureWarning

# Fit the model to the scaled data

kmeans.fit(X_scaled)

# Predict the clusters

predicted_clusters = kmeans.predict(X_scaled)

# Visualize the clusters (for the first two features)

plt.figure(figsize=(8, 6))
plt.scatter(X_scaled[:, 0], X_scaled[:, 1], c=predicted_clusters, s=50, cmap='Greens')

# Get cluster centers

centers = kmeans.cluster_centers_  # Fixed incorrect attribute reference
plt.scatter(centers[:, 0], centers[:, 1], c='red', s=200, alpha=0.75, label='Centers')

plt.xlabel(diabetes_data.feature_names[0])
plt.ylabel(diabetes_data.feature_names[1])
plt.title('K-means Clustering on Diabetes Dataset')
plt.legend()
plt.show()

OUTPUT:
Shape of the original dataset (X): (442, 10)
Shape of the scaled dataset (X_scaled): (442, 10)
        age       sex       bmi        bp        s1        s2        s3  \
0  0.038076  0.050680  0.061696  0.021872 -0.044223 -0.034821 -0.043401
1 -0.001882 -0.044642 -0.051474 -0.026328 -0.008449 -0.019163  0.074412
2  0.085299  0.050680  0.044451 -0.005670 -0.045599 -0.034194 -0.032356
3 -0.089063 -0.044642 -0.011595 -0.036656  0.012191  0.024991 -0.036038
4  0.005383 -0.044642 -0.036385  0.021872  0.003935  0.015596  0.008142

         s4        s5        s6
0 -0.002592  0.019907 -0.017646
1 -0.039493 -0.068332 -0.092204
2 -0.002592  0.002861 -0.025930
3  0.034309  0.022688 -0.009362
4 -0.002592 -0.031988 -0.046641
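
Program 3 uses 2 clusters only "as an example", and the diabetes dataset has no class labels to suggest a value. The following is a minimal sketch of one common way to compare candidate values of k, assuming inertia (within-cluster sum of squares) and the silhouette score are acceptable criteria. The range 2 to 6 is an arbitrary choice for illustration.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_diabetes
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Same preprocessing as Program 3
X_scaled = StandardScaler().fit_transform(load_diabetes().data)

# Fit one model per candidate k and record both criteria
results = {}
for k in range(2, 7):
    model = KMeans(n_clusters=k, random_state=42, n_init=10)
    labels = model.fit_predict(X_scaled)
    results[k] = (model.inertia_, silhouette_score(X_scaled, labels))

for k, (inertia, silhouette) in results.items():
    print(f"k={k}: inertia={inertia:.1f}, silhouette={silhouette:.3f}")
```

Inertia usually keeps falling as k grows, so look for the point where the drop levels off. A higher silhouette score indicates better separated clusters.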
