Yogesh Siddiq Edited

The document outlines the implementation of K-means clustering algorithms using Python on three different datasets: the Iris dataset, the Breast Cancer dataset, and the Diabetes dataset. Each program includes data loading, preprocessing, model fitting, cluster prediction, and visualization of the results. The document provides code snippets and outputs for each dataset, demonstrating the clustering process and the shapes of the datasets.

IMPLEMENTATION OF CLUSTERING ALGORITHMS

AIM: To write a Python program for implementing clustering algorithms for different datasets.

PROGRAM 1:

K-MEANS CLUSTERING USING IRIS DATASET

import matplotlib.pyplot as plt
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.cluster import KMeans

# Load the iris dataset
iris_data = load_iris()
X = iris_data.data    # Feature matrix (samples x features)
y = iris_data.target  # Target labels (used for display only, not for clustering)

# Convert to pandas DataFrame for viewing with head()
iris_df = pd.DataFrame(X, columns=iris_data.feature_names)
iris_df['species'] = iris_data.target_names[y]

# Display the shape of the feature matrix
# (X.shape is used because iris_df also holds the added 'species' column)
print("Shape of the dataset (samples, features):", X.shape)

# Display the first 5 rows of the dataset using head()
print(iris_df.head(5))

# Initialize KMeans with 3 clusters (since the iris dataset has 3 classes)
kmeans = KMeans(n_clusters=3, random_state=42, n_init=10)

# Fit the model to the data
kmeans.fit(X)

# Predict the clusters
predicted_clusters = kmeans.predict(X)

# Visualize the clusters (for sepal length and sepal width features)
plt.figure(figsize=(8, 6))
plt.scatter(X[:, 0], X[:, 1], c=predicted_clusters, s=50, cmap='viridis')

# Plot the cluster centers
centers = kmeans.cluster_centers_
plt.scatter(centers[:, 0], centers[:, 1], c='red', s=200, alpha=0.75, label='Cluster Centers')

# Labels and title
plt.xlabel(iris_data.feature_names[0])  # Sepal length
plt.ylabel(iris_data.feature_names[1])  # Sepal width
plt.title('K-means Clustering on Iris Dataset')
plt.legend()

# Show the plot
plt.show()

OUTPUT:

Shape of the dataset (samples, features):

sepal length (cm) sepal width (cm) petal length (cm) petal width (cm) \
0 5.1 3.5 1.4 0.2
1 4.9 3.0 1.4 0.2
2 4.7 3.2 1.3 0.2
3 4.6 3.1 1.5 0.2
4 5.0 3.6 1.4 0.2

species
0 setosa
1 setosa
2 setosa
3 setosa
4 setosa
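
Because the Iris dataset ships with species labels, one way to check how closely the three clusters follow the three classes is to compare them after fitting. The following is a minimal sketch and not part of the original program: the use of adjusted_rand_score and pd.crosstab, and the names ari and contingency, are assumptions added for illustration. Cluster numbers are arbitrary, so the table is read by looking at which cluster holds most of each species rather than by matching indices.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.metrics import adjusted_rand_score

# Same data and model settings as Program 1
iris_data = load_iris()
X = iris_data.data
y = iris_data.target

kmeans = KMeans(n_clusters=3, random_state=42, n_init=10)
predicted_clusters = kmeans.fit_predict(X)

# Adjusted Rand index: 1.0 means identical groupings,
# values near 0 mean agreement no better than chance
ari = adjusted_rand_score(y, predicted_clusters)
print(f"Adjusted Rand index: {ari:.3f}")

# Cross-tabulate species names against cluster numbers
contingency = pd.crosstab(
    iris_data.target_names[y],
    predicted_clusters,
    rownames=['species'],
    colnames=['cluster'],
)
print(contingency)
```

The labels are used here only to judge the result afterwards; K-means itself never sees them.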

PROGRAM 2:

K-MEANS CLUSTERING USING CANCER DATASET

# Import necessary libraries
from sklearn.datasets import load_breast_cancer
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler
import matplotlib.pyplot as plt
import pandas as pd  # For using the head() command

# Load the breast cancer dataset
cancer_data = load_breast_cancer()
X = cancer_data.data

# Convert to a pandas DataFrame
cancer_df = pd.DataFrame(X, columns=cancer_data.feature_names)

# Standardize the features (zero mean and unit variance per column)
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Check the shapes of the original and scaled datasets
print(f"\nShape of the original dataset (X): {X.shape}")
print(f"Shape of the scaled dataset (X_scaled): {X_scaled.shape}")

# Display the first 5 rows of the (unscaled) dataset
print(cancer_df.head())

# Initialize KMeans with 2 clusters (since breast cancer dataset has 2 classes: malignant and benign)
# n_init is set explicitly, as in the other programs, to avoid a FutureWarning
kmeans = KMeans(n_clusters=2, random_state=42, n_init=10)

# Fit the model to the scaled data
kmeans.fit(X_scaled)

# Predict the clusters
predicted_clusters = kmeans.predict(X_scaled)

# Visualize the clusters using the first two scaled features (mean radius and mean texture).
# These are two of the 30 original features, not principal components.
plt.figure(figsize=(8, 6))
plt.scatter(X_scaled[:, 0], X_scaled[:, 1], c=predicted_clusters, s=50, cmap='viridis')

# Plot the cluster centers
centers = kmeans.cluster_centers_
plt.scatter(centers[:, 0], centers[:, 1], c='red', s=200, alpha=0.75, label='Cluster Centers')

plt.xlabel('First feature (scaled)')
plt.ylabel('Second feature (scaled)')
plt.title('K-means Clustering on Breast Cancer Dataset')
plt.legend()
plt.show()

OUTPUT:

Shape of the original dataset (X): (569, 30)
Shape of the scaled dataset (X_scaled): (569, 30)

mean radius mean texture mean perimeter mean area mean smoothness \
0 17.99 10.38 122.80 1001.0 0.11840
1 20.57 17.77 132.90 1326.0 0.08474
2 19.69 21.25 130.00 1203.0 0.10960
3 11.42 20.38 77.58 386.1 0.14250
4 20.29 14.34 135.10 1297.0 0.10030

[5 rows x 30 columns]
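
The scatter plot in Program 2 shows only two of the 30 features, although the clusters are computed in all 30 dimensions. If a view along principal components is wanted, the scaled data can be projected with PCA before plotting. The following is a minimal sketch under that assumption: the PCA step and the names pca, X_2d, centers_2d and explained are additions for illustration and do not appear in the original program.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Same data, scaling and model settings as Program 2
X = load_breast_cancer().data
X_scaled = StandardScaler().fit_transform(X)

kmeans = KMeans(n_clusters=2, random_state=42, n_init=10)
predicted_clusters = kmeans.fit_predict(X_scaled)

# Project the 30 scaled features onto the first two principal components
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X_scaled)

# Map the cluster centers into the same 2-D space
centers_2d = pca.transform(kmeans.cluster_centers_)

# Fraction of the total variance kept by the two components
explained = pca.explained_variance_ratio_.sum()
print(f"Projected data shape: {X_2d.shape}")
print(f"Variance explained by 2 components: {explained:.2%}")
```

To plot this view, X_2d[:, 0] and X_2d[:, 1] can be passed to plt.scatter in place of the first two scaled columns, with centers_2d used for the center markers.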

PROGRAM 3:

K-MEANS CLUSTERING USING DIABETES DATASET

# Import necessary libraries
import matplotlib.pyplot as plt
from sklearn.datasets import load_diabetes
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler
import pandas as pd  # For using the head() command

# Load the diabetes dataset
diabetes_data = load_diabetes()
X = diabetes_data.data

# Convert to a pandas DataFrame
diabetes_df = pd.DataFrame(X, columns=diabetes_data.feature_names)

# Standardize the features (must run before X_scaled is used below)
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Check the shape of the dataset
print(f"\nShape of the original dataset (X): {X.shape}")

# Check the shape of the scaled dataset
print(f"Shape of the scaled dataset (X_scaled): {X_scaled.shape}")

# Display the first 5 rows of the dataset
print(diabetes_df.head())

# Initialize KMeans with 2 clusters (as an example)
# n_init is set explicitly to avoid a FutureWarning
kmeans = KMeans(n_clusters=2, random_state=42, n_init=10)

# Fit the model to the scaled data
kmeans.fit(X_scaled)

# Predict the clusters
predicted_clusters = kmeans.predict(X_scaled)

# Visualize the clusters (for the first two features)
plt.figure(figsize=(8, 6))
plt.scatter(X_scaled[:, 0], X_scaled[:, 1], c=predicted_clusters,
            s=50, cmap='Greens')

# Get cluster centers
centers = kmeans.cluster_centers_
plt.scatter(centers[:, 0], centers[:, 1], c='red', s=200, alpha=0.75,
            label='Centers')

plt.xlabel(diabetes_data.feature_names[0])
plt.ylabel(diabetes_data.feature_names[1])
plt.title('K-means Clustering on Diabetes Dataset')
plt.legend()
plt.show()

OUTPUT:
Shape of the original dataset (X): (442, 10)
Shape of the scaled dataset (X_scaled): (442, 10)
age sex bmi bp s1 s2 s3 \
0 0.038076 0.050680 0.061696 0.021872 -0.044223 -0.034821 -0.043401
1 -0.001882 -0.044642 -0.051474 -0.026328 -0.008449 -0.019163 0.074412
2 0.085299 0.050680 0.044451 -0.005670 -0.045599 -0.034194 -0.032356
3 -0.089063 -0.044642 -0.011595 -0.036656 0.012191 0.024991 -0.036038
4 0.005383 -0.044642 -0.036385 0.021872 0.003935 0.015596 0.008142
s4 s5 s6
0 -0.002592 0.019907 -0.017646
1 -0.039493 -0.068332 -0.092204
2 -0.002592 0.002861 -0.025930
3 0.034309 0.022688 -0.009362
4 -0.002592 -0.031988 -0.046641
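
Program 3 uses 2 clusters only as an example, since the diabetes dataset has a continuous target and no natural class count. One common way to compare candidate values of k is to look at the inertia (within-cluster sum of squares) and the silhouette score for each. The following is a minimal sketch, not part of the original program: the range of k from 2 to 6 and the names results and best_k are assumptions chosen for illustration.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_diabetes
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Same data and scaling as Program 3
X_scaled = StandardScaler().fit_transform(load_diabetes().data)

# Fit K-means for several cluster counts and record two quality measures
results = []
for k in range(2, 7):
    model = KMeans(n_clusters=k, random_state=42, n_init=10)
    labels = model.fit_predict(X_scaled)
    results.append((k, model.inertia_, silhouette_score(X_scaled, labels)))

for k, inertia, silhouette in results:
    print(f"k={k}  inertia={inertia:.1f}  silhouette={silhouette:.3f}")

# Pick the k with the highest silhouette score among those tried
best_k = max(results, key=lambda row: row[2])[0]
print(f"Highest silhouette score at k={best_k}")
```

Inertia generally keeps falling as k grows, so it is usually read for an "elbow" rather than a minimum; the silhouette score is higher when clusters are compact and well separated.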
