Machine Learning Lab: Raheel Aslam (74-FET/BSEE/F16)

This document contains code to implement LDA (Linear Discriminant Analysis) on the iris data set to classify iris flowers into three categories based on their sepal and petal attributes. The code loads the iris data, calculates within-class and between-class scatter matrices, computes eigenvectors and eigenvalues, projects the data onto the first two LDA dimensions for visualization, and compares LDA's performance to PCA and a decision tree classifier.
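For reference, the two scatter matrices the code constructs are the standard LDA definitions (summarized here from the textbook formulation, not taken from the document itself):

S_W = \sum_c \sum_{x \in c} (x - m_c)(x - m_c)^T,   S_B = \sum_c n_c (m_c - m)(m_c - m)^T,

where m_c is the mean vector of class c, n_c is the number of samples in class c, and m is the overall mean. The discriminant directions are the leading eigenvectors of S_W^{-1} S_B, i.e. the solutions of S_W^{-1} S_B w = \lambda w.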


LAB#08 Raheel Aslam (74-FET/BSEE/F16)

Machine Learning Lab


Code for the LDA algorithm using the Iris data set:
from sklearn.datasets import load_iris
import pandas as pd
import numpy as np
np.set_printoptions(precision=4)
from matplotlib import pyplot as plt
import seaborn as sns
sns.set()
from sklearn.preprocessing import LabelEncoder
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix

# Load the iris data into a DataFrame with named feature columns
iris = load_iris()
X = pd.DataFrame(iris.data, columns=iris.feature_names)
y = pd.Categorical.from_codes(iris.target, iris.target_names)
X.shape
X.head()
iris.target_names
df = X.join(pd.Series(y, name='class'))

# Per-class mean of each feature (one column per class)
class_feature_means = pd.DataFrame(columns=iris.target_names)
for c, rows in df.groupby('class'):
    class_feature_means[c] = rows.drop('class', axis=1).mean()
class_feature_means
# Within-class scatter matrix S_W: sum of each class's scatter around its own mean
within_class_scatter_matrix = np.zeros((4, 4))
for c, rows in df.groupby('class'):
    rows = rows.drop(['class'], axis=1)
    s = np.zeros((4, 4))
    for index, row in rows.iterrows():
        x, mc = row.values.reshape(4, 1), class_feature_means[c].values.reshape(4, 1)
        s += (x - mc).dot((x - mc).T)
    within_class_scatter_matrix += s
# Between-class scatter matrix S_B: spread of the class means around the overall mean
feature_means = df.drop('class', axis=1).mean()
between_class_scatter_matrix = np.zeros((4, 4))
for c in class_feature_means:
    n = len(df.loc[df['class'] == c].index)
    mc, m = class_feature_means[c].values.reshape(4, 1), feature_means.values.reshape(4, 1)
    between_class_scatter_matrix += n * (mc - m).dot((mc - m).T)
# Solve the eigenproblem for S_W^{-1} S_B; its leading eigenvectors are the discriminants
eigen_values, eigen_vectors = np.linalg.eig(
    np.linalg.inv(within_class_scatter_matrix).dot(between_class_scatter_matrix))

# Sort the (eigenvalue, eigenvector) pairs by eigenvalue magnitude, largest first
pairs = [(np.abs(eigen_values[i]), eigen_vectors[:, i]) for i in range(len(eigen_values))]
pairs = sorted(pairs, key=lambda x: x[0], reverse=True)
for pair in pairs:
    print(pair[0])

# Fraction of the total eigenvalue mass carried by each discriminant
eigen_value_sums = sum(eigen_values)
print('Explained Variance')

for i, pair in enumerate(pairs):
    print('Eigenvector {}: {}'.format(i, (pair[0] / eigen_value_sums).real))
# Keep the two leading eigenvectors and project the data onto them
w_matrix = np.hstack((pairs[0][1].reshape(4, 1), pairs[1][1].reshape(4, 1))).real
X_lda = np.array(X.dot(w_matrix))
le = LabelEncoder()
y = le.fit_transform(df['class'])
plt.xlabel('LD1')
plt.ylabel('LD2')
plt.scatter(X_lda[:, 0], X_lda[:, 1], c=y, cmap='rainbow', alpha=0.7, edgecolors='b')
plt.show()
# Same projection using scikit-learn's LinearDiscriminantAnalysis
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
lda = LinearDiscriminantAnalysis()
X_lda = lda.fit_transform(X, y)
lda.explained_variance_ratio_
plt.xlabel('LD1')
plt.ylabel('LD2')
plt.scatter(X_lda[:, 0], X_lda[:, 1], c=y, cmap='rainbow', alpha=0.7, edgecolors='b')
plt.show()
# PCA projection for comparison (unsupervised, so the class labels are not used)
from sklearn.decomposition import PCA
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X)
plt.xlabel('PC1')
plt.ylabel('PC2')
plt.scatter(X_pca[:, 0], X_pca[:, 1], c=y, cmap='rainbow', alpha=0.7, edgecolors='b')
plt.show()
# Train a decision tree on the 2-D LDA features and evaluate on a held-out split
X_train, X_test, y_train, y_test = train_test_split(X_lda, y, random_state=1)
dt = DecisionTreeClassifier()
dt.fit(X_train, y_train)
y_pred = dt.predict(X_test)
confusion_matrix(y_test, y_pred)

Output:
3.3340137930233347
0.027034739042874168
3.379438918053471e-16
3.379438918053471e-16
Explained Variance
Eigenvector 0: 0.9919564568065745
Eigenvector 1: 0.008043543193425574
Eigenvector 2: 1.0054716216715731e-16
Eigenvector 3: 1.0054716216715731e-16

As expected for a three-class problem, only two eigenvalues are non-negligible, and the first discriminant alone accounts for about 99.2% of the between-class variance, so the two-dimensional LDA projection preserves essentially all of the class-separating information.
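The listing above evaluates the decision tree only on the LDA features. To make the LDA-versus-PCA comparison explicit, a minimal sketch (assuming the X_lda, X_pca, and y variables produced by the listing; accuracy_score is scikit-learn's standard accuracy metric):

from sklearn.metrics import accuracy_score

# Fit the same tree on each two-dimensional projection and compare held-out accuracy
for name, Z in [('LDA', X_lda), ('PCA', X_pca)]:
    Z_train, Z_test, yz_train, yz_test = train_test_split(Z, y, random_state=1)
    clf = DecisionTreeClassifier(random_state=1)
    clf.fit(Z_train, yz_train)
    print(name, 'accuracy:', accuracy_score(yz_test, clf.predict(Z_test)))

Because LDA chooses its projection using the class labels while PCA does not, the LDA features would be expected to separate the three species at least as well as the PCA features here.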
