
N.B.K.R. INSTITUTE OF SCIENCE AND TECHNOLOGY: VIDYANAGAR

Department of Computer Science and Engineering

Machine Learning Lab Manual

Academic Year : 2024-2025


Year and Semester : II-II
Branch : CSE-AI & ML

Prepared by
C. Jyothsna, Assistant Professor, CSE

Index
S. No Experiment Name
1 Compute Central Tendency Measures: Mean, Median, Mode; Measures of
Dispersion: Variance, Standard Deviation.
2 Apply the following Pre-processing techniques for a given dataset.
a. Attribute selection
b. Handling Missing Values
c. Discretization
d. Elimination of Outliers
3 Apply KNN algorithm for classification and regression
4 Demonstrate decision tree algorithm for a classification problem and perform
parameter tuning for better results
5 Demonstrate decision tree algorithm for a regression problem
6 Apply Random Forest algorithm for classification and regression
7 Demonstrate Naïve Bayes Classification algorithm.
8 Apply Support Vector algorithm for classification
9 Demonstrate simple linear regression algorithm for a regression problem
10 Apply Logistic regression algorithm for a classification problem
11 Demonstrate Multi-layer Perceptron algorithm for a classification problem
12 Implement the K-means algorithm and apply it to the data you selected. Evaluate
performance by measuring the sum of the Euclidean distance of each example
from its class centre. Test the performance of the algorithm as a function of the
parameter K.
13 Demonstrate the use of Fuzzy C-Means Clustering
14 Demonstrate the use of Expectation Maximization based clustering algorithm

Experiment-1
Aim: 1. Compute Central Tendency Measures: Mean, Median, Mode; Measures of Dispersion: Variance, Standard Deviation.

Source Code:
import numpy as np
# Sample data (replace with your actual data)
data = [1, 2, 3, 4, 5, 5, 6, 7, 8, 9]

# Calculate mean
mean = np.mean(data)
print(f"Mean: {mean}")

# Calculate median
median = np.median(data)
print(f"Median: {median}")

# Calculate mode
# (keepdims=True keeps the array-style result, so mode.mode[0] works on
# SciPy >= 1.9; older SciPy versions returned arrays by default)
from scipy import stats
mode = stats.mode(data, keepdims=True)
print(f"Mode: {mode.mode[0]}")

# Calculate variance
variance = np.var(data)
print(f"Variance: {variance}")

# Calculate standard deviation


std_dev = np.std(data)
print(f"Standard Deviation: {std_dev}")

Output:

Mean: 5.0
Median: 5.0
Mode: 5
Variance: 6.0
Standard Deviation: 2.449489742783178
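
Note that np.var and np.std compute the population statistics (ddof=0). A minimal cross-check using Python's built-in statistics module, whose variance is the sample (n-1) version:

import numpy as np
import statistics

data = [1, 2, 3, 4, 5, 5, 6, 7, 8, 9]

print(np.var(data))                # 6.0 (population variance, ddof=0)
print(np.var(data, ddof=1))        # 6.666... (sample variance)
print(statistics.variance(data))   # 6.666... (statistics uses the sample version)
print(statistics.mode(data))       # 5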

Aim: 2. Apply the following Pre-processing techniques for a given dataset.
a. Attribute selection
b. Handling Missing Values
c. Discretization
d. Elimination of Outliers

Attribute selection

import pandas as pd
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif

# Load dataset
data = load_iris()
X = pd.DataFrame(data.data, columns=data.feature_names)
y = pd.Series(data.target)

# Select the top 2 features
selector = SelectKBest(score_func=f_classif, k=2)
X_selected = selector.fit_transform(X, y)
print("Selected Features:\n", X.columns[selector.get_support()])

Handling Missing Values

import pandas as pd
import numpy as np

# Create a sample DataFrame
data = {
    'A': [1, 2, np.nan, 4],
    'B': [np.nan, 2, 3, 4],
    'C': [1, 2, 3, 4]
}
df = pd.DataFrame(data)

# Option 1: Drop rows with missing values
df_dropped = df.dropna()

# Option 2: Fill missing values with the mean
df_filled = df.fillna(df.mean())

print("DataFrame after dropping missing values:\n", df_dropped)
print("DataFrame after filling missing values:\n", df_filled)

Discretization

import numpy as np
import pandas as pd
from sklearn.preprocessing import KBinsDiscretizer

# Create a sample DataFrame
data = {
    'A': [1, 2, 6, 4, 5, 8, 7, 3]
}
df = pd.DataFrame(data)

# Discretize the data into 3 bins
discretizer = KBinsDiscretizer(n_bins=3, encode='ordinal', strategy='uniform')
df['A_binned'] = discretizer.fit_transform(df[['A']])
print("Discretized DataFrame:\n", df)

Elimination of Outliers

import pandas as pd

# Create a sample DataFrame
data = {
    'A': [1, 2, 3, 4, 5, 100]  # 100 is an outlier
}
df = pd.DataFrame(data)

# One common completion is the IQR rule (the Sales.csv example below
# uses a z-score threshold instead):
Q1, Q3 = df['A'].quantile([0.25, 0.75])
IQR = Q3 - Q1
df_no_outliers = df[(df['A'] >= Q1 - 1.5 * IQR) & (df['A'] <= Q3 + 1.5 * IQR)]
print("DataFrame without outliers:\n", df_no_outliers)
Source Code:

from google.colab import files
uploaded = files.upload()
# Alternatively, mount Google Drive and read the file from there:
# from google.colab import drive
# drive.mount('/content/drive')

import pandas as pd
import numpy as np
df = pd.read_csv('Sales.csv')

# a. Attribute Selection

# With only a few attributes in this dataset, formal selection isn't necessary.
# For many attributes, you'd use feature importance from models or correlation analysis.
# Here we simply filter the rows by attribute values of interest:

print(df[df['Product Category']=='Electronics'])
print(df[df['Customer Region']=='West'])

# b. Handling Missing Values (if any)


# Check for missing values
print("Missing Values:", df.isnull().sum())
# If there were missing values, you could replace them using imputation techniques like:
# df['value'].fillna(df['value'].mean(), inplace=True) # Mean imputation
# df['value'].fillna(df['value'].median(), inplace=True) # Median imputation
# or more advanced methods like k-NN imputation.
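# A sketch of k-NN imputation with scikit-learn's KNNImputer, using the
# numeric columns of this dataset (assumes all selected columns are numeric):
# from sklearn.impute import KNNImputer
# imputer = KNNImputer(n_neighbors=3)
# df[['Sales Amount','Quantity']] = imputer.fit_transform(df[['Sales Amount','Quantity']])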

# c. Discretization
# Equal-width binning
num_bins = 3
df['Discretized_value'] = pd.cut(df['Sales Amount'], bins=num_bins, labels=False)
#labels=False provides integer labels
print(df)

# d. Elimination of Outliers (using z-score method)


from scipy import stats
z=np.abs(stats.zscore(df['Sales Amount']))
threshold=2
df_no_outliers=df[(z<threshold)]
print(f"\n Dataframe without outliers: \n {df_no_outliers}")

Output:

Order ID 0
Date 0
Product Category 0
Sales Amount 0
Quantity 0
Customer Region 0
dtype: int64

Dataframe without outliers:
Order ID Date Product Category Sales Amount Quantity \
0 1 1/1/2024 Toys 490 5
1 2 2/1/2024 Apparel 105 2
2 3 3/1/2024 Electronics 238 1
3 4 4/1/2024 Apparel 943 3
4 5 5/1/2024 Apparel 741 4
5 6 6/1/2024 Apparel 655 4
6 7 7/1/2024 Apparel 937 4
7 8 8/1/2024 Electronics 749 4
8 9 9/1/2024 Apparel 785 3
9 10 10/1/2024 Toys 608 4

Aim: 3. Apply KNN algorithm for classification and regression.

Source Code (Classification):


# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

from google.colab import files


uploaded = files.upload()

# Alternatively, mount Google Drive and read the file from there:
# from google.colab import drive
# drive.mount('/content/drive')

# Importing the dataset


dataset = pd.read_csv('Social_Network_Ads.csv')
X = dataset.iloc[:, [2, 3]].values
y = dataset.iloc[:, -1].values

# Splitting the dataset into the Training set and Test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.20, random_state = 0)

# Feature Scaling
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)

# Training the K-NN model on the Training set


from sklearn.neighbors import KNeighborsClassifier

classifier = KNeighborsClassifier(n_neighbors = 5, metric = 'minkowski', p = 2)
classifier.fit(X_train, y_train)

# Predicting the Test set results


y_pred = classifier.predict(X_test)

# Making the Confusion Matrix


from sklearn.metrics import confusion_matrix, accuracy_score
cm = confusion_matrix(y_test, y_pred)
ac = accuracy_score(y_test, y_pred)

print (cm)
print(ac)

Output:

[[55 3]
[ 1 21]]

0.95

Source Code (Regression):

import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler,StandardScaler
from sklearn.model_selection import train_test_split,GridSearchCV
from sklearn.neighbors import KNeighborsRegressor
from sklearn.metrics import r2_score,mean_absolute_error,mean_squared_error
import matplotlib.pyplot as plt
import seaborn as sns

from google.colab import files


uploaded = files.upload()

df = pd.read_csv('./diamonds.csv')
df
df.shape
df.drop('Unnamed: 0',inplace=True,axis=1)
df.info()
df.isna().sum()
df.isnull().sum()
df_new=pd.get_dummies(df,drop_first=True)
df_new
X=df_new.drop('price',axis=1)
Y=df_new['price']
train_x,test_x,train_y,test_y=train_test_split(X,Y,test_size=0.2,random_state=100)
train_y=train_y.to_numpy().reshape(-1,1)
test_y=test_y.to_numpy().reshape(-1,1)
scale_x = MinMaxScaler().fit(train_x)

scale_y = MinMaxScaler().fit(train_y)
train_x = scale_x.transform(train_x)
train_y = scale_y.transform(train_y)

# (Chaining MinMax and Standard scaling like this is redundant; a single
# scaler usually suffices.)
tran_x = StandardScaler().fit(train_x)
tran_y = StandardScaler().fit(train_y)
train_x = tran_x.transform(train_x)
train_y = tran_y.transform(train_y)
test_x=scale_x.transform(test_x)
test_x=tran_x.transform(test_x)
test_y=scale_y.transform(test_y)
test_y=tran_y.transform(test_y)
para = {
'n_neighbors':[3,5,7,12],
'weights' : ['uniform', 'distance']
}
dia_reg=GridSearchCV(KNeighborsRegressor(),para,cv=10)
dia_reg.fit(train_x,train_y)
dia_reg.best_score_
dia_reg.best_params_
reg = KNeighborsRegressor(n_neighbors=5, weights='distance')
reg.fit(train_x,train_y)
pred=reg.predict(test_x)
test_y=tran_y.inverse_transform(test_y)
test_y=scale_y.inverse_transform(test_y)
pred=tran_y.inverse_transform(pred)
pred=scale_y.inverse_transform(pred)
# sklearn metrics expect (y_true, y_pred) in that order
r2_score(test_y, pred)
mean_absolute_error(test_y, pred)
mean_squared_error(test_y, pred)
np.sqrt(mean_squared_error(test_y, pred))

Output:

0.951350450423818
391.5401970196721
695982.7207807746
834.2557885809211

Aim: 4. Demonstrate decision tree algorithm for a classification problem


and perform parameter tuning for better results.

Source Code

import pandas as pd
from google.colab import files
uploaded = files.upload()

df = pd.read_csv("salaries.csv")
df.head()
inputs = df.drop('salary_more_then_100k',axis='columns')
target = df['salary_more_then_100k']
from sklearn.preprocessing import LabelEncoder
le_company = LabelEncoder()
le_job = LabelEncoder()
le_degree = LabelEncoder()
inputs['company_n'] = le_company.fit_transform(inputs['company'])
inputs['job_n'] = le_job.fit_transform(inputs['job'])
inputs['degree_n'] = le_degree.fit_transform(inputs['degree'])
inputs
inputs_n = inputs.drop(['company','job','degree'],axis='columns')
inputs_n
target
from sklearn import tree
model = tree.DecisionTreeClassifier()
model.fit(inputs_n, target)
model.score(inputs_n,target)
# Is salary of Google, Computer Engineer, Bachelors degree > 100 k ?
model.predict([[2,1,0]])
#Is salary of Google, Computer Engineer, Masters degree > 100 k ?
model.predict([[2,1,1]])

Output:

1.0
array([0], dtype=int64)
array([1], dtype=int64)
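
The aim also calls for parameter tuning, which the listing above omits. A minimal sketch using GridSearchCV over a few common DecisionTreeClassifier hyperparameters (the grid values are illustrative; it reuses inputs_n and target from above):

from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

param_grid = {
    'criterion': ['gini', 'entropy'],
    'max_depth': [2, 3, 4, None],
    'min_samples_split': [2, 4, 6]
}
# cv=3 because salaries.csv is a small dataset
grid = GridSearchCV(DecisionTreeClassifier(random_state=0), param_grid, cv=3)
grid.fit(inputs_n, target)
print(grid.best_params_)
print(grid.best_score_)
tuned_model = grid.best_estimator_  # use this for further predictions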

Aim-5: Demonstrate decision tree algorithm for a regression problem

Source Code:

import numpy as np
import pandas as pd
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

from sklearn.metrics import r2_score,mean_squared_error
from google.colab import files
uploaded = files.upload()
df = pd.read_csv('DT-Regression-Data.csv')
df.head()
df.shape
sns.scatterplot(x=df.x, y=df.y, data=df)
x = df.x.values.reshape(-1, 1)
y = df.y.values.reshape(-1, 1)
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.30, random_state=42)
DecisionTreeRegModel = DecisionTreeRegressor()
DecisionTreeRegModel.fit(x_train,y_train)
y_pred = DecisionTreeRegModel.predict(x_test)
r2_score(y_test, y_pred)
mse = mean_squared_error(y_test, y_pred)
mse
rmse = np.sqrt(mse)
rmse

Output
0.7875383967595564
7.425010498369913
2.724887245074539

Aim-6: Apply Random Forest algorithm for classification and regression

Source Code: (Classification)

from sklearn import datasets


from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification
iris = datasets.load_iris()

print(iris.feature_names)
print(iris.target_names)
iris.data
iris.target
X = iris.data
Y = iris.target
X.shape
Y.shape
clf = RandomForestClassifier()
clf.fit(X, Y)
print(clf.feature_importances_)
X[0]
print(clf.predict([[5.1, 3.5, 1.4, 0.2]]))
print(clf.predict(X[[0]]))
print(clf.predict_proba(X[[0]]))
clf.fit(iris.data, iris.target_names[iris.target])
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2)
X_train.shape, Y_train.shape
X_test.shape, Y_test.shape
clf.fit(X_train, Y_train)
print(clf.predict([[5.1, 3.5, 1.4, 0.2]]))
print(clf.predict_proba([[5.1, 3.5, 1.4, 0.2]]))
print(clf.predict(X_test))
print(Y_test)
print(clf.score(X_test, Y_test))

Output:
[2 1 0 1 1 2 1 0 1 0 2 1 1 1 1 1 1 2 2 0 0 2 0 0 0 1 1 1 1 0]
[2 1 0 1 1 2 1 0 1 0 2 1 2 1 1 2 2 2 2 0 0 2 0 0 0 1 1 1 1 0]
0.9

Source Code: (Regression)

# Importing the libraries


import numpy as np # for array operations
import pandas as pd # for working with DataFrames
import requests, io # for HTTP requests and I/O commands
import matplotlib.pyplot as plt # for data visualization
%matplotlib inline

# scikit-learn modules
from sklearn.model_selection import train_test_split # for splitting the data
from sklearn.metrics import mean_squared_error # for calculating the cost function
from sklearn.ensemble import RandomForestRegressor # for building the model

# Importing the dataset from the url of the data set


url = "https://drive.google.com/u/0/uc?
id=1mVmGNx6cbfvRHC_DvF12ZL3wGLSHD9f_&export=download"
data = requests.get(url).content

# Reading the data


dataset = pd.read_csv(io.StringIO(data.decode('utf-8')))
dataset.head()

x = dataset.drop('Petrol_Consumption', axis = 1) # Features


y = dataset['Petrol_Consumption'] # Target

# Splitting the dataset into training and testing set (80/20)


x_train, x_test, y_train, y_test = train_test_split(x, y, test_size = 0.2, random_state = 28)

# Initializing the Random Forest Regression model with 10 decision trees


model = RandomForestRegressor(n_estimators = 10, random_state = 0)

# Fitting the Random Forest Regression model to the data


model.fit(x_train, y_train)

# Predicting the target values of the test set


y_pred = model.predict(x_test)

# RMSE (Root Mean Square Error)


rmse = float(format(np.sqrt(mean_squared_error(y_test, y_pred)),'.3f'))
print("\nRMSE:\n",rmse)

Output:

RMSE:
96.389

Aim: 7. Demonstrate Naïve Bayes Classification algorithm.

Source Code:

import pandas as pd
from google.colab import files
uploaded = files.upload()
df = pd.read_csv("titanic.csv")

df.head()
df.drop(['PassengerId','Name','SibSp','Parch','Ticket','Cabin','Embarked'],axis='columns',inplace=True)
df.head()
inputs = df.drop('Survived',axis='columns')
target = df.Survived
dummies = pd.get_dummies(inputs.Sex)
dummies.head(3)
inputs = pd.concat([inputs,dummies],axis='columns')
inputs.head(3)
inputs.drop(['Sex','male'],axis='columns',inplace=True)
inputs.head(3)
inputs.columns[inputs.isna().any()]
inputs.Age[:10]
inputs.Age = inputs.Age.fillna(inputs.Age.mean())
inputs.head()
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(inputs,target,test_size=0.3)
from sklearn.naive_bayes import GaussianNB
model = GaussianNB()
model.fit(X_train,y_train)
model.score(X_test,y_test)
X_test[0:10]
y_test[0:10]
model.predict(X_test[0:10])
model.predict_proba(X_test[:10])
from sklearn.model_selection import cross_val_score
cross_val_score(GaussianNB(),X_train, y_train, cv=5)
Output:

array([0.8 , 0.744 , 0.8 , 0.72580645, 0.80645161])

Aim-8: Apply Support Vector algorithm for classification

Source Code:

import pandas as pd
from sklearn.datasets import load_iris
iris = load_iris()
iris.feature_names
iris.target_names

df = pd.DataFrame(iris.data,columns=iris.feature_names)
df.head()
df['target'] = iris.target
df.head()
df[df.target==1].head()
df[df.target==2].head()
df['flower_name'] =df.target.apply(lambda x: iris.target_names[x])
df.head()
df[45:55]
df0 = df[:50]
df1 = df[50:100]
df2 = df[100:]
import matplotlib.pyplot as plt
%matplotlib inline
plt.xlabel('Sepal Length')
plt.ylabel('Sepal Width')
plt.scatter(df0['sepal length (cm)'], df0['sepal width (cm)'],color="green",marker='+')
plt.scatter(df1['sepal length (cm)'], df1['sepal width (cm)'],color="blue",marker='.')
plt.xlabel('Petal Length')
plt.ylabel('Petal Width')
plt.scatter(df0['petal length (cm)'], df0['petal width (cm)'],color="green",marker='+')
plt.scatter(df1['petal length (cm)'], df1['petal width (cm)'],color="blue",marker='.')
from sklearn.model_selection import train_test_split
X = df.drop(['target','flower_name'], axis='columns')
y = df.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
len(X_train)
len(X_test)
from sklearn.svm import SVC
model = SVC()
model.fit(X_train, y_train)
model.score(X_test, y_test)
model.predict([[4.8,3.0,1.5,0.3]])

Output:
0.9333333333333333
array([0])

Aim-9: Demonstrate simple linear regression algorithm for a regression


problem

Source Code:

import pandas as pd
import numpy as np
from sklearn import linear_model
import matplotlib.pyplot as plt
from google.colab import files

uploaded = files.upload()
df = pd.read_csv('homeprices.csv')
df
%matplotlib inline
plt.xlabel('area')
plt.ylabel('price')
plt.scatter(df.area,df.price,color='red',marker='+')
new_df = df.drop('price',axis='columns')
new_df
price = df.price
price
# Create linear regression object
reg = linear_model.LinearRegression()
reg.fit(new_df,price)
reg.predict([[3300]])
reg.coef_
reg.intercept_
3300*135.78767123 + 180616.43835616432
reg.predict([[5000]])
Output:
628715.7534151643
array([859554.79452055])

Aim-10: Apply Logistic regression algorithm for a classification problem

Source Code

import pandas as pd
from matplotlib import pyplot as plt
%matplotlib inline
from google.colab import files
uploaded = files.upload()
df = pd.read_csv("insurance_data.csv")

df.head()
plt.scatter(df.age,df.bought_insurance,marker='+',color='red')
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(df[['age']],df.bought_insurance,train_size=0.8)
X_test
from sklearn.linear_model import LogisticRegression
model = LogisticRegression()
model.fit(X_train, y_train)
X_test
y_predicted = model.predict(X_test)
model.predict_proba(X_test)
model.score(X_test,y_test)
y_predicted
X_test
model.coef_
model.intercept_
import math
def sigmoid(x):
    return 1 / (1 + math.exp(-x))

def prediction_function(age):
    z = 0.042 * age - 1.53  # 0.04150133 ~ 0.042 and -1.52726963 ~ -1.53
    y = sigmoid(z)
    return y
age = 35
prediction_function(age)
age = 43
prediction_function(age)

Output
0.4850044983805899
0.485 is less than 0.5, which means a person aged 35 will not buy insurance.

0.568565299077705
0.569 is more than 0.5, which means a person aged 43 will buy insurance.

Aim-11: Demonstrate Multi-layer Perceptron algorithm for a classification


problem

Source Code

# Kick off by importing libraries, and outlining the Iris dataset


import pandas as pd
import sklearn
from sklearn import preprocessing
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neural_network import MLPClassifier

from sklearn.metrics import classification_report, confusion_matrix

url = "https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"
# Let's start by naming the features
names = ['sepal-length', 'sepal-width', 'petal-length', 'petal-width', 'Class']
# Reading the dataset through a Pandas function
irisdata = pd.read_csv(url, names=names)
# Takes first 4 columns and assign them to variable "X"
X = irisdata.iloc[:, 0:4]
# Takes the 5th column (the class label) and assigns it to variable "y". Object dtype refers to strings.
y = irisdata.select_dtypes(include=[object])
X.head()
y.head()
# y actually contains all categories or classes:
y.Class.unique()
# Now transforming categorial into numerical values
le = preprocessing.LabelEncoder()
y = y.apply(le.fit_transform)
y.head()
# Now for train and test split (80% of dataset into training set and other 20% into test data)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.20)
# Feature scaling
scaler = StandardScaler()
scaler.fit(X_train)
X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)
# Finally for the MLP- Multilayer Perceptron
mlp = MLPClassifier(hidden_layer_sizes=(10, 10, 10), max_iter=1000)
mlp.fit(X_train, y_train.values.ravel())
predictions = mlp.predict(X_test)
print(predictions)
# Last thing: evaluation of algorithm performance in classifying flowers
print(confusion_matrix(y_test,predictions))
print(classification_report(y_test,predictions))

Output:
[0 0 2 2 0 2 2 2 2 2 1 1 1 2 2 1 1 2 1 2 2 1 0 0 2 2 1 2 2 1]

[[ 5 0 0]
[ 0 8 0]
[ 0 1 16]]
precision recall f1-score support

0 1.00 1.00 1.00 5
1 0.89 1.00 0.94 8
2 1.00 0.94 0.97 17

accuracy 0.97 30
macro avg 0.96 0.98 0.97 30
weighted avg 0.97 0.97 0.97 30

Aim-12: Implement the K-means algorithm and apply it to the data you
selected. Evaluate performance by measuring the sum of the Euclidean
distance of each example from its class centre. Test the performance of the
algorithm as a function of the parameter K.

Source Code

from sklearn.cluster import KMeans


import pandas as pd
from sklearn.preprocessing import MinMaxScaler
from matplotlib import pyplot as plt
%matplotlib inline
from google.colab import files
uploaded = files.upload()
df = pd.read_csv("income.csv")
df.head()
plt.scatter(df.Age,df['Income($)'])
plt.xlabel('Age')
plt.ylabel('Income($)')
km = KMeans(n_clusters=3)
y_predicted = km.fit_predict(df[['Age','Income($)']])
y_predicted
df['cluster']=y_predicted
df.head()
km.cluster_centers_
df1 = df[df.cluster==0]
df2 = df[df.cluster==1]
df3 = df[df.cluster==2]
plt.scatter(df1.Age,df1['Income($)'],color='green')
plt.scatter(df2.Age,df2['Income($)'],color='red')
plt.scatter(df3.Age,df3['Income($)'],color='black')
plt.scatter(km.cluster_centers_[:,0],km.cluster_centers_[:,1],color='purple',marker='*',label='centroid')
plt.xlabel('Age')
plt.ylabel('Income ($)')
plt.legend()
scaler = MinMaxScaler()

scaler.fit(df[['Income($)']])
df['Income($)'] = scaler.transform(df[['Income($)']])

scaler.fit(df[['Age']])
df['Age'] = scaler.transform(df[['Age']])

df.head()
plt.scatter(df.Age,df['Income($)'])
km = KMeans(n_clusters=3)
y_predicted = km.fit_predict(df[['Age','Income($)']])
y_predicted
df['cluster']=y_predicted
df.head()
km.cluster_centers_
df1 = df[df.cluster==0]
df2 = df[df.cluster==1]
df3 = df[df.cluster==2]
plt.scatter(df1.Age,df1['Income($)'],color='green')
plt.scatter(df2.Age,df2['Income($)'],color='red')
plt.scatter(df3.Age,df3['Income($)'],color='black')
plt.scatter(km.cluster_centers_[:,0],km.cluster_centers_[:,1],color='purple',marker='*',label='centroid')
plt.legend()
sse = []
k_rng = range(1,10)
for k in k_rng:
    km = KMeans(n_clusters=k)
    km.fit(df[['Age','Income($)']])
    sse.append(km.inertia_)
plt.xlabel('K')
plt.ylabel('Sum of squared error')
plt.plot(k_rng,sse)

Output:

array([0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 2, 2, 2, 2, 2, 2])
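
Note that km.inertia_ above is the sum of squared distances. Since the aim asks for the sum of the Euclidean distances of each example from its class centre, a small sketch that computes it directly for each K (reusing the scaled df and k_rng from above):

import numpy as np

for k in k_rng:
    km = KMeans(n_clusters=k, n_init=10)
    labels = km.fit_predict(df[['Age','Income($)']])
    # Euclidean distance of every point to its assigned cluster centre
    dists = np.linalg.norm(df[['Age','Income($)']].values - km.cluster_centers_[labels], axis=1)
    print(k, dists.sum())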

Aim-13: Demonstrate the use of Fuzzy C-Means Clustering

Source Code:

import pandas as pd # reading all required header files
import numpy as np
import random
import operator
import math
import matplotlib.pyplot as plt
from scipy.stats import multivariate_normal
from sklearn.datasets import load_iris
iris = load_iris()
df = pd.DataFrame(iris.data)
df.head()
#number of data
n = len(df)

#number of clusters
k=3

#dimension of cluster
d=4

# m parameter
m=2

#number of iterations
MAX_ITERS = 12
plt.figure(0,figsize=(5,5)) #scatter plot of sepal length vs sepal width
plt.scatter(list(df.iloc[:,0]), list(df.iloc[:,1]), marker='o')
plt.axis('equal')
plt.xlabel('Sepal Length', fontsize=16)
plt.ylabel('Sepal Width', fontsize=16)
plt.title('Sepal Plot', fontsize=25,color='b')
plt.grid()
plt.show()

plt.figure(1,figsize=(5,5)) #scatter plot of sepal length vs sepal width


plt.scatter(list(df.iloc[:,2]), list(df.iloc[:,3]), marker='o')
plt.axis('equal')
plt.xlabel('petal Length', fontsize=16)
plt.ylabel('petal Width', fontsize=16)
plt.title('Petal Plot', fontsize=25,color='b')
plt.grid()
plt.show()
def initializeMembershipWeights():
    """
    membership_mat = []
    for i in range(n):
        wts = []
        sum = 0
        for j in range(k):
            weight = np.random.random_integers(1,10)
            wts.append(weight)
            sum = sum + weight
        weights = [w/sum for w in wts]
        membership_mat.append(weights)
    print(membership_mat)
    """
    weight = np.random.dirichlet(np.ones(k),n)
    weight_arr = np.array(weight)
    return weight_arr
def computeCentroids(weight_arr):
    C = []
    for i in range(k):
        weight_sum = np.power(weight_arr[:,i],m).sum()
        Cj = []
        for x in range(d):
            numerator = (df.iloc[:,x].values * np.power(weight_arr[:,i],m)).sum()
            c_val = numerator/weight_sum
            Cj.append(c_val)
        C.append(Cj)
    return C
def updateWeights(weight_arr,C):
    denom = np.zeros(n)
    for i in range(k):
        dist = (df.iloc[:,:].values - C[i])**2
        dist = np.sum(dist, axis=1)
        dist = np.sqrt(dist)
        denom = denom + np.power(1/dist,1/(m-1))

    for i in range(k):
        dist = (df.iloc[:,:].values - C[i])**2
        dist = np.sum(dist, axis=1)
        dist = np.sqrt(dist)
        weight_arr[:,i] = np.divide(np.power(1/dist,1/(m-1)),denom)
    return weight_arr
def plotData(z,C):
    plt.subplot(4,3,z+1)  # scatter plot of petal length vs petal width
    plt.scatter(list(df.iloc[:,2]), list(df.iloc[:,3]), marker='o')
    for center in C:
        plt.scatter(center[2],center[3], marker='o',color='r')
    plt.axis('equal')
    plt.xlabel('Petal Length', fontsize=16)
    plt.ylabel('Petal Width', fontsize=16)
    plt.grid()
def FuzzyMeansAlgorithm():
    weight_arr = initializeMembershipWeights()
    plt.figure(figsize=(50,50))
    for z in range(MAX_ITERS):
        C = computeCentroids(weight_arr)
        updateWeights(weight_arr,C)
        plotData(z,C)
    plt.show()
    return (weight_arr,C)

final_weights,Centers = FuzzyMeansAlgorithm()
df_sepal = df.iloc[:,0:2]
df_petal = df.iloc[:,2:5]
plt.figure(0,figsize=(5,5)) #scatter plot of sepal length vs sepal width
plt.scatter(list(df_sepal.iloc[:,0]), list(df_sepal.iloc[:,1]), marker='o')
plt.axis('equal')
plt.xlabel('Sepal Length', fontsize=16)
plt.ylabel('Sepal Width', fontsize=16)
plt.title('Sepal Plot', fontsize=25,color='b')
plt.grid()
for center in Centers:
    plt.scatter(center[0],center[1], marker='o',color='r')
plt.show()

plt.figure(1,figsize=(5,5)) #scatter plot of sepal length vs sepal width


plt.scatter(list(df_petal.iloc[:,0]), list(df_petal.iloc[:,1]), marker='o')
plt.axis('equal')
plt.xlabel('petal Length', fontsize=16)
plt.ylabel('petal Width', fontsize=16)
plt.title('Petal Plot', fontsize=25,color='b')
plt.grid()
for center in Centers:
    plt.scatter(center[2],center[3], marker='o',color='r')
plt.show()
X = np.zeros((n,1))
plt.figure(0,figsize=(8,8)) #scatter plot of sepal length vs sepal width
plt.axis('equal')
plt.xlabel('Sepal Length', fontsize=16)
plt.ylabel('Sepal Width', fontsize=16)
plt.title('Sepal Plot', fontsize=25,color='b')
plt.grid()
for center in Centers:
    plt.scatter(center[0],center[1], marker='D',color='r')
clr = 'b'
for i in range(n):
    cNumber = np.where(final_weights[i] == np.amax(final_weights[i]))
    if cNumber[0][0]==0:
        clr = 'y'
    elif cNumber[0][0]==1:
        clr = 'g'
    elif cNumber[0][0]==2:
        clr = 'm'
    plt.scatter(list(df_sepal.iloc[i:i+1,0]), list(df_sepal.iloc[i:i+1,1]), alpha=0.25, s=100, color=clr)
plt.show()
X = np.zeros((n,1))
plt.figure(0,figsize=(8,8)) #scatter plot of sepal length vs sepal width

plt.axis('equal')
plt.xlabel('Petal Length', fontsize=16)
plt.ylabel('Petal Width', fontsize=16)
plt.title('Petal Plot', fontsize=25,color='b')
plt.grid()
for center in Centers:
    plt.scatter(center[2],center[3], marker='D',color='r')
clr = 'b'
for i in range(n):
    cNumber = np.where(final_weights[i] == np.amax(final_weights[i]))
    if cNumber[0][0]==0:
        clr = 'y'
    elif cNumber[0][0]==1:
        clr = 'g'
    elif cNumber[0][0]==2:
        clr = 'm'
    plt.scatter(list(df_petal.iloc[i:i+1,0]), list(df_petal.iloc[i:i+1,1]), alpha=0.25, s=100, color=clr)
plt.show()

Output:

(Scatter plots of the sepal and petal features showing the fuzzy cluster assignments and the final cluster centres.)
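
For comparison, the third-party scikit-fuzzy package ships a ready-made fuzzy c-means. A minimal sketch, assuming scikit-fuzzy is installed (pip install scikit-fuzzy) and noting that cmeans expects the data shaped (features, samples):

import numpy as np
import skfuzzy as fuzz
from sklearn.datasets import load_iris

iris = load_iris()
data = iris.data.T  # cmeans expects shape (n_features, n_samples)

# c=3 clusters and fuzzifier m=2, matching the manual implementation above
cntr, u, u0, dist, jm, n_iter, fpc = fuzz.cluster.cmeans(
    data, c=3, m=2, error=1e-5, maxiter=1000)
labels = np.argmax(u, axis=0)  # hard labels from the membership matrix
print(cntr)  # cluster centres
print(fpc)   # fuzzy partition coefficient (closer to 1 is better)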
Aim-14: Demonstrate the use of Expectation Maximization based
clustering algorithm

Source Code:

import numpy as np # import numpy
import random # for random.choice below
from numpy.linalg import inv # for matrix inverse
import matplotlib.pyplot as plt # import matplotlib.pyplot for plotting framework
from scipy.stats import multivariate_normal # for generating pdf
m1 = [1,1] # consider a random mean and covariance value
m2 = [7,7]
cov1 = [[3, 2], [2, 3]]
cov2 = [[2, -1], [-1, 2]]
x = np.random.multivariate_normal(m1, cov1, size=(200,)) # generating 200 samples for each mean and covariance
y = np.random.multivariate_normal(m2, cov2, size=(200,))
d = np.concatenate((x, y), axis=0)
plt.figure(figsize=(10,10))
plt.scatter(d[:,0], d[:,1], marker='o')

plt.axis('equal')
plt.xlabel('X-Axis', fontsize=16)
plt.ylabel('Y-Axis', fontsize=16)
plt.title('Ground Truth', fontsize=22)
plt.grid()
plt.show()
m1 = random.choice(d)
m2 = random.choice(d)
cov1 = np.cov(np.transpose(d))
cov2 = np.cov(np.transpose(d))
pi = 0.5
x1 = np.linspace(-4,11,200)
x2 = np.linspace(-4,11,200)
X, Y = np.meshgrid(x1,x2)

Z1 = multivariate_normal(m1, cov1)
Z2 = multivariate_normal(m2, cov2)

pos = np.empty(X.shape + (2,)) # a new array of given shape and type, without initializing entries
pos[:, :, 0] = X; pos[:, :, 1] = Y

plt.figure(figsize=(10,10)) # creating the figure and assigning the size


plt.scatter(d[:,0], d[:,1], marker='o')
plt.contour(X, Y, Z1.pdf(pos), colors="r" ,alpha = 0.5)
plt.contour(X, Y, Z2.pdf(pos), colors="b" ,alpha = 0.5)
plt.axis('equal') # making both the axis equal
plt.xlabel('X-Axis', fontsize=16) # X-Axis
plt.ylabel('Y-Axis', fontsize=16) # Y-Axis
plt.title('Initial State', fontsize=22) # Title of the plot
plt.grid() # displaying gridlines
plt.show()
## Expectation step
def Estep(lis1):
    m1=lis1[0]
    m2=lis1[1]
    cov1=lis1[2]
    cov2=lis1[3]
    pi=lis1[4]

    pt2 = multivariate_normal.pdf(d, mean=m2, cov=cov2)
    pt1 = multivariate_normal.pdf(d, mean=m1, cov=cov1)
    w1 = pi * pt2
    w2 = (1-pi) * pt1
    eval1 = w1/(w1+w2)  # responsibility of the second component for each point
    return(eval1)
## Maximization step
def Mstep(eval1):
    num_mu1,din_mu1,num_mu2,din_mu2=0,0,0,0
    for i in range(0,len(d)):
        num_mu1 += (1-eval1[i]) * d[i]
        din_mu1 += (1-eval1[i])
        num_mu2 += eval1[i] * d[i]
        din_mu2 += eval1[i]

    mu1 = num_mu1/din_mu1
    mu2 = num_mu2/din_mu2

    num_s1,din_s1,num_s2,din_s2=0,0,0,0
    for i in range(0,len(d)):
        q1 = np.matrix(d[i]-mu1)
        num_s1 += (1-eval1[i]) * np.dot(q1.T, q1)
        din_s1 += (1-eval1[i])

        q2 = np.matrix(d[i]-mu2)
        num_s2 += eval1[i] * np.dot(q2.T, q2)
        din_s2 += eval1[i]

    s1 = num_s1/din_s1
    s2 = num_s2/din_s2

    pi = sum(eval1)/len(d)

    lis2=[mu1,mu2,s1,s2,pi]
    return(lis2)
def plot(lis1):
    mu1=lis1[0]
    mu2=lis1[1]
    s1=lis1[2]
    s2=lis1[3]
    Z1 = multivariate_normal(mu1, s1)
    Z2 = multivariate_normal(mu2, s2)

    pos = np.empty(X.shape + (2,)) # a new array of given shape and type, without initializing entries
    pos[:, :, 0] = X; pos[:, :, 1] = Y

    plt.figure(figsize=(10,10)) # creating the figure and assigning the size
    plt.scatter(d[:,0], d[:,1], marker='o')
    plt.contour(X, Y, Z1.pdf(pos), colors="r", alpha=0.5)
    plt.contour(X, Y, Z2.pdf(pos), colors="b", alpha=0.5)
    plt.axis('equal') # making both the axis equal
    plt.xlabel('X-Axis', fontsize=16) # X-Axis
    plt.ylabel('Y-Axis', fontsize=16) # Y-Axis
    plt.grid() # displaying gridlines
    plt.show()

iterations = 20
lis1=[m1,m2,cov1,cov2,pi]
for i in range(0,iterations):
    lis2 = Mstep(Estep(lis1))
    lis1=lis2
    if(i==0 or i == 4 or i == 9 or i == 14 or i == 19):
        plot(lis1)

Output:

(Contour plots of the two Gaussian components over the data at iterations 1, 5, 10, 15, and 20, showing the mixture converging to the two ground-truth clusters.)
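
The same fit can be obtained with scikit-learn's GaussianMixture, which runs EM internally. A minimal sketch reusing the data array d generated above:

from sklearn.mixture import GaussianMixture

gm = GaussianMixture(n_components=2, max_iter=20, random_state=0)
gm.fit(d)                 # EM fit of a two-component Gaussian mixture
print(gm.means_)          # compare with the ground-truth means m1, m2
print(gm.covariances_)    # estimated covariance matrices
labels = gm.predict(d)    # hard cluster assignment for each sample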
