KR&AI-ML-DM Practical Journal ANS

This laboratory journal documents the machine learning and deep learning experiments conducted by Shradha Ulhas Badhe (roll number 011). The journal includes 24 experiments conducted between September 2022 and December 2022 on topics such as linear regression, logistic regression, KNN, SVM, clustering, and neural networks, using Python programming and various datasets. The performance of each model is evaluated and compared.


IT31L– KR&AI, ML, DL

Laboratory Journal

MCA - Semester-III

Name: Shradha Ulhas Badhe.

Roll No.:011

Academic Year
2022-2023
-INDEX-

Sr. No. | Date | Lab Title
1 | 28-Nov-22 | Write a Python program to find the correlation matrix.
2 | 29-Nov-22 | Plot the correlation plot on a dataset and visualize it, giving an overview of relationships among the data, on the iris data.
3 | 2-Dec-22 | Implementing ANOVA testing on the iris dataset, using only one independent variable, i.e. Species (iris-setosa, iris-versicolor, iris-virginica), which is categorical, and sepal width as a continuous variable.
4 | 08-Sep-22 | Write a Python program to predict mpg (miles per gallon) for a car based on variable wt by applying simple linear regression on the 'mtcars' dataset (use Training data 80% and Testing Data 20%). Record the performance of the model in terms of MAE, MSE, RMSE and R-squared value. Change Training data to 70% and Testing Data to 30%; compare and interpret the performance of your model.
5 | 16-Sep-22 | Write a Python program to predict mpg for a car based on variables wt, cyl & disp by applying multi-linear regression on the 'mtcars' dataset (use Training data 80% and Testing Data 20%). Record the performance of the model in terms of MAE, MSE, RMSE and R-squared value. Remove variable disp from the feature set and check the performance again; compare and interpret the performance of your model.
6 | 23-Sep-22 | Write a Python program to predict mpg for a car based on variables wt, cyl & disp by applying multi-linear regression on the 'mtcars' dataset (use Training data 80% and Testing Data 20%). Record the performance of the model in terms of MAE, MSE, RMSE and R-squared value. Replace disp by the drat variable in the feature set and check the performance again; interpret the performance of your model.
7 | 30-Sep-22 | Write a Python program to predict fruit (Apple or Orange) based on its size & weight by applying logistic regression on the 'apples_and_oranges' dataset (use Training data 80% and Testing Data 20%). Evaluate the performance of the model using Accuracy Score metric, Classification Report & Confusion Matrix, and AUC ROC score, and interpret the model performance.
8 | 07-Oct-22 | Write a Python program to predict fruit (Apple or Orange) based on its size & weight by applying the K-Nearest Neighbour (KNN) model on the 'apples_and_oranges' dataset (use Training data 80% and Testing Data 20%). Evaluate the performance of the model using Accuracy Score metric, Classification Report & Confusion Matrix, and AUC ROC score, and interpret the model performance.
9 | 5-Dec-22 | Implementing the K-means algorithm on unsupervised data of a mall that contains basic information (ID, age, gender, income, spending score) about the customers. Finding the clusters based on income and spending.
10 | 6-Dec-22 | Implementing the Agglomerative Hierarchical Clustering algorithm on unsupervised data of a mall that contains basic information (ID, age, gender, income, spending score) about the customers. Finding the clusters based on income and spending.
11 | 5-Dec-22 | Write a Python program to create an Association algorithm for supervised classification on any dataset.
12 | 14-Oct-22 | Write a Python program to predict fruit (Apple or Orange) based on its size & weight by applying the Support Vector Machine (SVM) model on the 'apples_and_oranges' dataset (use Training data 80% and Testing Data 20%). Evaluate the performance of the model using Accuracy Score metric, Classification Report & Confusion Matrix, and AUC ROC score, and interpret the model performance.
13 | 21-Oct-22 | Write a Python program to predict species (Setosa, Versicolor, or Virginica) for a new iris flower based on length & width of its petals and sepals by applying Logistic Regression, KNN and SVM models on the 'iris' dataset (use Training data 80% and Testing Data 20%). Evaluate the performance of the models using Accuracy Score metric, Classification Report & Confusion Matrix, and AUC ROC score for each model, and suggest the best model.
14 | 04-Nov-22 | Write a Python program to predict species (Setosa, Versicolor, or Virginica) for a new iris flower based on length & width of its petals and sepals by applying the Naive Bayes classification model on the 'iris' dataset (use Training data 80% and Testing Data 20%). Evaluate the performance of the model using Accuracy Score metric, Classification Report & Confusion Matrix, and AUC ROC score, and interpret the model performance.
15 | 11-Nov-22 | Write a Python program to predict species (Setosa, Versicolor, or Virginica) for a new iris flower based on length & width of its petals and sepals by applying the Decision Tree model on the 'iris' dataset (use Training data 80% and Testing Data 20%). Evaluate the performance of the model using Accuracy Score metric, Classification Report & Confusion Matrix, and AUC ROC score, and interpret the model performance.
16 | 18-Nov-22 | Write a Python program to predict whether or not a patient has diabetes based on the diagnostic measurements in the Pima Indians Diabetes Database "Diabetes.csv" by applying Logistic Regression, KNN, SVM, Naive Bayes and Decision Tree models (use Training data 80% and Testing Data 20%). Evaluate and compare the performance of all the models using Accuracy Score metric, Classification Report & Confusion Matrix, and AUC ROC score for each model, and suggest the best model for diabetes prediction.
17 | 5-Dec-22 | Python program to implement Text Mining basics: i. Tokenization, ii. Finding frequency distribution, iii. Removing punctuation, iv. Stemming.
18 | 5-Dec-22 | Program to implement Text Mining: Sentiment Analysis, using an RNN LSTM learning model on a dataset of airline tweets.
19 | 30-Nov-22 | Implementing Python visualizations on cluster data.
20 | 7-Dec-22 | Creating & visualizing a simple ANN problem to understand the implementation of an artificial neuron using Python.
21 | 7-Dec-22 | Program to pre-process data of Australian weather and implement an Artificial Neural Network to predict the weather.
22 | 9-Dec-22 | Write a Python program to prepare data to be given to a convolutional neural network (CNN) and create an image classifier. Use the cat and dog training and test datasets.
23 | 12-Dec-22 | Write a Python program to implement an RNN by building a character-level prediction RNN and train it on the text of "Harry Potter and the Philosopher's Stone".
24 | 12-Dec-22 | Write a Python program to implement a GAN to create a curve resembling a sine wave. The Python library PyTorch must be used to set a random generator.

Signature of Faculty Incharge

Internal Examiner :

External Examiner :

Date :
Q1.) Write a Python program to Find the correlation matrix.

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import pearsonr
x = [7, 8, 9, 11, 15, 20]
y = [130, 135, 140, 142, 147, 156]
corr, _ = pearsonr(x, y)
print('Pearsons Correlation: %.3f' % corr)
plt.scatter(x, y)
plt.plot(np.unique(x), np.poly1d(np.polyfit(x, y, 1))(np.unique(x)), color='blue')

OUTPUT:

Pearsons Correlation: 0.971
[<matplotlib.lines.Line2D at 0x7f37cd8b9c10>]
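The question also asks for the correlation matrix itself; a minimal addition using NumPy's corrcoef on the same x and y (the off-diagonal entries equal the Pearson coefficient printed above) could be:

# 2x2 correlation matrix of x and y.
print(np.corrcoef(x, y))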
Q.2) Plot the correlation plot on dataset and visualize giving an overview of
relationships among data on iris data

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import f_oneway

performance1=[89,89,88,78,79]
performance2=[93,92,94,89,88]
performance3=[89,88,89,93,90]
performance4=[81,78,81,92,82]
#Conduct the one-way ANOVA
print(f_oneway(performance1,performance2,performance3,performance4))

F_onewayResult(statistic=4.625000000000002,
pvalue=0.016336459839780215)
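The code above reproduces a one-way ANOVA example rather than a correlation plot; a minimal sketch of a correlation plot (heatmap) on the iris data, assuming the same Iris.csv file and column names used in the later labs, could look like:

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Load the iris data (path assumed, as in the later labs) and keep the numeric measurement columns.
iris = pd.read_csv('/content/drive/MyDrive/Dataset/Iris.csv')
measurements = iris[['SepalLengthCm', 'SepalWidthCm', 'PetalLengthCm', 'PetalWidthCm']]

# Correlation matrix and heatmap giving an overview of relationships among the variables.
sns.heatmap(measurements.corr(), annot=True, cmap='coolwarm')
plt.title('Correlation plot of iris measurements')
plt.show()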
Q.3) Implementing the ANOVA testing on iris dataset. Using only one independent
variable i.e. Species (iris-setosa, iris-versicolor, iris-virginica) which are categorical
and sepal width as a continuous variable.

# importing the necessary libraries
from sklearn.datasets import load_iris
import pandas as pd
import seaborn as sns
from sklearn.feature_selection import f_classif
from sklearn.feature_selection import SelectKBest
from scipy.stats import shapiro
from scipy import stats
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.stats.multicomp import pairwise_tukeyhsd
from statsmodels.sandbox.stats.multicomp import TukeyHSDResults
from statsmodels.graphics.factorplots import interaction_plot
from pandas.plotting import scatter_matrix

# loading the dataset
df = pd.read_csv('/content/drive/MyDrive/Dataset/Iris.csv')
df.head()

dataframe_iris=pd.DataFrame(df,columns=['SepalLengthCm','SepalWidthCm','PetalLengthCm','PetalWidthCm'])

# Visualising the dataframe by plotting
scatter_matrix(dataframe_iris[['SepalLengthCm','SepalWidthCm','PetalLengthCm','PetalWidthCm']],figsize=(15,10))
plt.show()

ID=[]
for i in range(0,150):
    ID.append(i)
dataframe=pd.DataFrame(ID,columns=['ID'])
dataframe_iris_new=pd.concat([dataframe_iris,dataframe],axis=1)
dataframe_iris_new.columns

##fig = interaction_plot(dataframe_iris_new.SepalWidthCm,dataframe_iris_new.target,dataframe_iris_new.ID,colors=['red','blue','green'], ms=12)

dataframe_iris_new.info()

dataframe_iris_new.describe()

# To implement the ANOVA test we have to create a null hypothesis and an alternate hypothesis
# Null hypothesis: the sample means are equal
# Alternate hypothesis: the sample means are not equal

##print(dataframe_iris_new['SepalWidthCm'].groupby(dataframe_iris_new['target']).mean())

dataframe_iris_new.mean()

# ANOVA calculates an f-value and a p-value.
# P-value: the p-value is used to evaluate the hypothesis results.
# If p-value < 0.05 we reject the null hypothesis; if p-value > 0.05 we accept it.
# F-value: the f-value is the ratio of variance between groups to variance within groups.
# If the f-value is close to 1 we say that our null hypothesis is true.
# To check whether variances between groups are equal, ANOVA uses the Levene/Bartlett test.
# Check normal distribution of the data (Shapiro-Wilk test).

##stats.shapiro(dataframe_iris_new['SepalWidthCm'][dataframe_iris_new['target']])

# Check equality of variance between groups(levene/bartlett test)
##p_value=stats.levene(dataframe_iris_new['SepalWidthCm'],dataframe_iris_new['target'])
##p_value

##F_value,P_value=stats.f_oneway(dataframe_iris_new['SepalWidthCm'],dataframe_iris_new['target'])
##print("F_value=",F_value,",","P_value=",P_value)

OUTPUT:
Q.4) Write a Python program to predict mpg (miles per gallon) for a car based on
variable wt by applying simple linear regression on 'mtcars' dataset (Use Training
data 80% and Testing Data 20%).
Record the performance of model in terms of MAE, MSE, RMSE and R-squared value.
Change Training data to 70% and Testing Data 30%, compare & interpret the
performance of your model.

from google.colab import drive
drive.mount("/content/drive",force_remount=True)

from google.colab import drive
import pandas as pd
import numpy as np
df =pd.read_csv("/content/drive/MyDrive/Dataset/mtcars.csv")
print(df.head(5))
x =df.iloc[:,[6]].values
y =df.iloc[:,1].values
print(x,y)

from sklearn.model_selection import train_test_split
x_train,x_test,y_train,y_test=train_test_split(x,y,test_size=.3,random_state=42)
print(x,y)

from sklearn.linear_model import LinearRegression
LinRegressor=LinearRegression()
LinRegressor.fit(x_train,y_train)
print(x,y)

y_pred=LinRegressor.predict(x_test)
wt=float(input("Enter Weight of a Car: "))
y_pred1=LinRegressor.predict([[wt]])
print("The Predicted MPG Value is: ",y_pred1)
from math import sqrt
from sklearn.metrics import mean_absolute_error,mean_squared_error,r2_score
print('Mean Absolute Error: %.2f' % mean_absolute_error(y_test,y_pred))
print('Mean Squared Error: %.2f'% mean_squared_error(y_test,y_pred))
print("Root Mean Squared Error: %.2f" % sqrt(mean_squared_error(y_test,y_pred)))
print("R2_Score: %.2f" % r2_score(y_test, y_pred))
Q.5) Write a Python program to predict mpg (miles per gallon) for a car based on variables
wt, cyl & disp by applying multi-linear regression on 'mtcars' dataset (Use Training data
80% and Testing Data 20%).
Record the performance of model in terms of MAE, MSE, RMSE and R-squared value.
Remove variable disp from the feature set and check the performance again. Compare &
interpret the performance of your
model.

from google.colab import drive
drive.mount("/content/drive",force_remount=True)

import pandas as pd
import numpy as np
df =pd.read_csv("/content/drive/MyDrive/Dataset/mtcars.csv")
print(df.head(10))
x =df.iloc[:,[2,3,6]].values
y =df.iloc[:,1].values
print(x,y)

import pandas as pd
import numpy as np
df=pd.read_csv("/content/drive/MyDrive/Dataset/mtcars.csv")
print(df.head(5))
x=df.iloc[:,[2,6]].values  # Remove variable disp from the feature set and check the performance again.
y=df.iloc[:,1].values
print(x,y)

from sklearn.model_selection import train_test_split
x_train,x_test,y_train,y_test=train_test_split(x,y,test_size=.2,random_state=42)  # 80% training / 20% testing as specified
print(x,y)
from sklearn.linear_model import LinearRegression
LinRegressor=LinearRegression()
LinRegressor.fit(x_train,y_train)
print(x,y)

y_pred=LinRegressor.predict(x_test)
cyl=float(input("Enter Cyl of a Car: "))
wt=float(input("Enter Weight of a Car: "))
y_pred1=LinRegressor.predict([[cyl,wt]])  # feature order matches x = [cyl, wt] after removing disp
print("The Predicted MPG Value is: ",y_pred1)

from math import sqrt
from sklearn.metrics import mean_absolute_error,mean_squared_error,r2_score
print('Mean Absolute Error: %.2f' % mean_absolute_error(y_test,y_pred))
print("Mean Squared Error: %.2f" % mean_squared_error(y_test,y_pred))
print("Root Mean Squared Error: %.2f" % sqrt(mean_squared_error(y_test,y_pred)))
print("R2_Score: %.2f" % r2_score(y_test,y_pred))
Q.6) Write a Python program to predict mpg (miles per gallon) for a car based on
variables wt, cyl & disp by applying multi-linear regression on 'mtcars' dataset (Use
Training data 80% and Testing Data 20%).
Record the performance of model in terms of MAE, MSE, RMSE and R-squared value .
Replace disp by drat variable in the feature set and check the performance again.
Interpret the performance of your model.

from google.colab import drive
drive.mount("/content/drive",force_remount=True)

import pandas as pd
import numpy as np
df =pd.read_csv("/content/drive/MyDrive/Dataset/mtcars.csv")
print(df.head(10))
x =df.iloc[:,[2,3,6]].values
y =df.iloc[:,1].values
print(x,y)

import pandas as pd
import numpy as np
df =pd.read_csv("/content/drive/MyDrive/Dataset/mtcars.csv")
print(df.head(10))
x =df.iloc[:,[2,5,6]].values #Replace disp by drat variable
y =df.iloc[:,1].values
print(x,y)

from sklearn.model_selection import train_test_split
x_train,x_test,y_train,y_test=train_test_split(x,y,test_size=.2,random_state=42)  # 80% training / 20% testing as specified
print(x,y)
from sklearn.linear_model import LinearRegression
LinRegressor=LinearRegression()
LinRegressor.fit(x_train,y_train)
print(x,y)

y_pred=LinRegressor.predict(x_test)
cyl=float(input("Enter Cyl of a Car: "))
drat=float(input("Enter Drat of a Car: "))
wt=float(input("Enter Weight of a Car: "))
y_pred1=LinRegressor.predict([[cyl,drat,wt]])  # feature order matches x = [cyl, drat, wt]
print("The Predicted MPG Value is: ",y_pred1)

from math import sqrt
from sklearn.metrics import mean_absolute_error,mean_squared_error,r2_score
print('Mean Absolute Error: %.2f' % mean_absolute_error(y_test,y_pred))
print("Mean Squared Error: %.2f" % mean_squared_error(y_test,y_pred))
print("Root Mean Squared Error: %.2f" % sqrt(mean_squared_error(y_test,y_pred)))
print("R2_Score: %.2f" % r2_score(y_test,y_pred))
Q.7) Write a Python program to predict fruit (Apple or Orange) based on its size &
weight by applying logistic regression on 'apples_and_oranges' dataset (Use Training
data 80% and Testing Data 20%).
Evaluate the performance of the model using Accuracy Score
metric, Classification Report & Confusion Matrix, AUC ROC score for the model and
interpret the model performance.

from google.colab import drive
drive.mount("/content/drive",force_remount=True)

import pandas as pd
import numpy as np
df =pd.read_csv("/content/drive/MyDrive/Dataset/Fruits.xls")
print(df.head(10))
x =df.iloc[:,0:2].values
y =df.iloc[:,2].values
print(x,y)

from sklearn.model_selection import train_test_split
x_train,x_test,y_train,y_test=train_test_split(x,y,test_size=.2,random_state=42)  # 80% training / 20% testing as specified
print(x,y)

from sklearn.linear_model import LogisticRegression
knn=LogisticRegression()
print(x_train,y_train)
knn.fit(x_train,y_train)

y_pred=knn.predict(x_test)
pred_prob=knn.predict_proba(x_test)

from sklearn.metrics import accuracy_score,classification_report,confusion_matrix,roc_auc_score
print("Accuracy score metric=", accuracy_score(y_test,y_pred))
print("Classification_report=", classification_report(y_test,y_pred))
print("confusion_matrix=", confusion_matrix(y_test,y_pred))
print("roc_auc_score", roc_auc_score(y_test,pred_prob[:,1]))
Q.8) Write a Python program to predict fruit (Apple or Orange) based on its size &
weight by applying K-Nearest Neighbour (KNN) model on 'apples_and_oranges'
dataset (Use Training data 80% and Testing Data 20%). Evaluate the performance
of the model using Accuracy Score metric, Classification Report & Confusion Matrix,
AUC ROC score for the model and interpret the model performance.

from google.colab import drive
drive.mount("/content/drive",force_remount=True)

from IPython.core.interactiveshell import no_op
import pandas as pd
import numpy as np
df =pd.read_csv("/content/drive/MyDrive/Dataset/Fruits.xls")
print(df.head(10))
x =df.iloc[:,0:2].values
y =df.iloc[:,2].values
print(x,y)

from sklearn.model_selection import train_test_split
x_train,x_test,y_train,y_test=train_test_split(x,y,test_size=.2,random_state=9)  # 80% training / 20% testing as specified

from sklearn.neighbors import KNeighborsClassifier
knn=KNeighborsClassifier(n_neighbors=5)
knn.fit(x_train, y_train)

y_pred=knn.predict(x_test)
pred_prob=knn.predict_proba(x_test)

from sklearn.metrics import accuracy_score,classification_report,confusion_matrix,roc_auc_score
print("Accuracy score metric=", accuracy_score(y_test,y_pred))
print("Classification_report=", classification_report(y_test,y_pred))
print("confusion_matrix=", confusion_matrix(y_test,y_pred))
print("roc_auc_score", roc_auc_score(y_test,pred_prob[:,1]))
Q.9) Implementing the K-mean Algorithm on unsupervised data of a mall, that
contains the basic information (ID, age, gender, income, spending score) about the
customers. Finding the clusters based on the income and spending.

# importing the necessary libraries
from sklearn.preprocessing import StandardScaler
import numpy as np 
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline 
import os
import warnings
warnings.filterwarnings('ignore')

# loading the dataset
df = pd.read_csv('/content/drive/MyDrive/Dataset/Mall_Customers.csv')
df.head()

# renaming the heads
df.rename(index=str, columns={'Annual Income (k$)': 'Income','Spending Score (1-100)': 'Score'}, 
inplace=True)

# data in a detailed way with pairplot
X = df.drop(['CustomerID', 'Gender'], axis=1)
sns.pairplot(df.drop('CustomerID', axis=1), hue='Gender', aspect=1.5)
plt.show()

from sklearn.cluster import KMeans

clusters = []

for i in range(1, 11):
    km = KMeans(n_clusters=i).fit(X)
    clusters.append(km.inertia_)

fig, ax = plt.subplots(figsize=(12, 8))
sns.lineplot(x=list(range(1, 11)), y=clusters, ax=ax)
ax.set_title('Searching for Elbow')
ax.set_xlabel('Clusters')
ax.set_ylabel('Inertia')

# Annotate arrow
ax.annotate('Possible Elbow Point', xy=(3, 140000), xytext=(3, 50000), xycoords='data',          
             arrowprops=dict(arrowstyle='->', connectionstyle='arc3', color='blue', lw=2))

ax.annotate('Possible Elbow Point', xy=(5, 80000), xytext=(5, 150000), xycoords='data',          
             arrowprops=dict(arrowstyle='->', connectionstyle='arc3', color='blue', lw=2))

plt.show()

# based on the elbow points we can have 3 or 5 clusters; creating 5 clusters to classify based on income and spending

km5 = KMeans(n_clusters=5).fit(X)
X['Labels'] = km5.labels_
plt.figure(figsize=(12, 8))
sns.scatterplot(X['Income'], X['Score'], hue=X['Labels'], palette=sns.color_palette('hls', 5))
plt.title('KMeans with 5 Clusters')
plt.show()
Q.10) Implementing the Agglomerative Hierarchical Clustering Algorithm on
unsupervised data of a mall, that contains the basic information (ID, age, gender,
income, spending score) about thecustomers. Finding the clusters based on the income
and spending.

# importing the necessary libraries
from sklearn.preprocessing import StandardScaler

import numpy as np 
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline 

import os
import warnings

warnings.filterwarnings('ignore')

# loading the dataset
df = pd.read_csv('/content/drive/MyDrive/Dataset/Mall_Customers.csv')
df.head()

# renaming the heads
df.rename(index=str, columns={'Annual Income (k$)': 'Income',
                              'Spending Score (1-100)': 'Score'}, inplace=True)

# data in a detailed way with pairplot
X = df.drop(['CustomerID', 'Gender'], axis=1)
sns.pairplot(df.drop('CustomerID', axis=1), hue='Gender', aspect=1.5)
plt.show()

from sklearn.cluster import AgglomerativeClustering 

agglom = AgglomerativeClustering(n_clusters=5, linkage='average').fit(X)

X['Labels'] = agglom.labels_
plt.figure(figsize=(12, 8))
sns.scatterplot(X['Income'], X['Score'], hue=X['Labels'], palette=sns.color_palette('hls', 5))
plt.title('Agglomerative with 5 Clusters')
plt.show()
Q.11) Write a Python program to create an Association algorithm for supervised
classification on any dataset
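No code was recorded for this lab; a minimal sketch of association rule mining with the Apriori algorithm, assuming the mlxtend library is installed and using a small hand-made transaction list, could look like:

# Minimal Apriori sketch on a hand-made transaction list (assumes mlxtend is installed).
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, association_rules

transactions = [['milk', 'bread', 'butter'],
                ['bread', 'butter'],
                ['milk', 'bread'],
                ['milk', 'butter'],
                ['bread', 'butter', 'jam']]

# One-hot encode the transactions into a boolean DataFrame.
te = TransactionEncoder()
onehot = pd.DataFrame(te.fit(transactions).transform(transactions), columns=te.columns_)

# Frequent itemsets with at least 40% support, then rules with confidence >= 60%.
frequent = apriori(onehot, min_support=0.4, use_colnames=True)
rules = association_rules(frequent, metric='confidence', min_threshold=0.6)
print(rules[['antecedents', 'consequents', 'support', 'confidence']])

In practice the same steps would be applied to a real transactional dataset instead of the hand-made list used here.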
Q.12) Write a Python program to predict fruit (Apple or Orange) based on its size & weight
by applying Support Vector Machine (SVM) model on 'apples_and_oranges' dataset (Use
Training data 80% and Testing Data 20%). Evaluate the performance of the model using
Accuracy Score metric, Classification Report & Confusion Matrix, AUC ROC score for the
model and interpret the model performance.

from google.colab import drive
drive.mount("/content/drive",force_remount=True)

from IPython.core.interactiveshell import no_op
import pandas as pd
import numpy as np
df =pd.read_csv("/content/drive/MyDrive/Dataset/Fruits.xls")
print(df.head(5))
x =df.iloc[:,0:2].values
y =df.iloc[:,2].values
print(x,y)

from sklearn.model_selection import train_test_split
x_train,x_test,y_train,y_test=train_test_split(x,y,test_size=.2,random_state=5)  # 80% training / 20% testing as specified
print(x,y)

from sklearn.svm import SVC
svm= SVC(kernel='rbf', C=50, gamma=5, probability=True)
print(x_train,y_train)
svm.fit(x_train, y_train)

y_pred=svm.predict(x_test)
pred_prob=svm.predict_proba(x_test)
print(y_test)
print(pred_prob)
from sklearn.metrics import accuracy_score,classification_report,confusion_matrix,roc_auc_score
print("Accuracy score matrics=",accuracy_score(y_test,y_pred))
print("Classifiction matrics=",classification_report(y_test,y_pred))
print("Confusion matrics=",confusion_matrix(y_test,y_pred))
print("Roc_auc_score",roc_auc_score(y_test,pred_prob[:,1]))
Q.13) Write a Python program to predict species (Setosa, Versicolor, or Virginica) for a new iris flower based on length & width of its petals and sepals by applying Logistic Regression, KNN and SVM models on the 'iris' dataset (Use Training data 80% and Testing Data 20%). Evaluate the performance of the models using Accuracy Score metric, Classification Report & Confusion Matrix, and AUC ROC score for each model, and suggest the best model.

13.1-Logistic Regression:

from google.colab import drive
drive.mount("/content/drive",force_remount=True)

from IPython.core.interactiveshell import no_op
import pandas as pd
import numpy as np
df =pd.read_csv("/content/drive/MyDrive/Dataset/Iris.csv")
print(df.head(10))
x =df.iloc[:,1:5].values
y =df.iloc[:,5].values
print(x,y)

from sklearn.model_selection import train_test_split
x_train,x_test,y_train,y_test=train_test_split(x,y,test_size=0.2,random_state=4)
print(x,y)

from sklearn.linear_model import LogisticRegression #Logistic Regression
knn=LogisticRegression()
print(x_train,y_train)
knn.fit(x_train,y_train)

y_pred=knn.predict(x_test)
pred_prob=knn.predict_proba(x_test)
print(x,y)

from sklearn.metrics import accuracy_score,classification_report,confusion_matrix,roc_auc_score
print("Accuracy Score=",accuracy_score(y_test,y_pred))
print("Classifiction Report=",classification_report(y_test,y_pred))
print("Confution Matrix=",confusion_matrix(y_test,y_pred))
print("Roc_Auc_Score",roc_auc_score(y_test,pred_prob,multi_class='ovo'))
13.2-K-Nearest Neighbour(KNN):

from google.colab import drive
drive.mount("/content/drive",force_remount=True)

from IPython.core.interactiveshell import no_op
import pandas as pd
import numpy as np
df =pd.read_csv("/content/drive/MyDrive/Dataset/Iris.csv")
print(df.head(10))
x =df.iloc[:,1:5].values
y =df.iloc[:,5].values
print(x,y)

from sklearn.model_selection import train_test_split
x_train,x_test,y_train,y_test=train_test_split(x,y,test_size=0.2,random_state=4)
print(x,y)

from sklearn.neighbors import KNeighborsClassifier #K-Nearest Neighbour(KNN)
knn=KNeighborsClassifier(n_neighbors=5)
knn.fit(x_train, y_train)
print(x,y)

y_pred=knn.predict(x_test)
pred_prob=knn.predict_proba(x_test)
print(x,y)
from sklearn.metrics import accuracy_score,classification_report,confusion_matrix,roc_auc_score
print("Accuracy Score=",accuracy_score(y_test,y_pred))
print("Classifiction Report=",classification_report(y_test,y_pred))
print("Confution Matrix=",confusion_matrix(y_test,y_pred))
print("Roc_Auc_Score",roc_auc_score(y_test,pred_prob,multi_class='ovo'))

13.3-Support Vector Machine(SVM):

from google.colab import drive
drive.mount("/content/drive",force_remount=True)

from IPython.core.interactiveshell import no_op
import pandas as pd
import numpy as np
df =pd.read_csv("/content/drive/MyDrive/Dataset/Iris.csv")
print(df.head(10))
x =df.iloc[:,1:5].values
y =df.iloc[:,5].values
print(x,y)

from sklearn.model_selection import train_test_split
x_train,x_test,y_train,y_test=train_test_split(x,y,test_size=0.2,random_state=4)
print(x,y)
from sklearn.svm import SVC #Support Vector Machine(SVM)
svm= SVC(kernel='rbf', C=50, gamma=5, probability=True)
print(x_train,y_train)
svm.fit(x_train, y_train)

y_pred=svm.predict(x_test)
pred_prob=svm.predict_proba(x_test)
print(x,y)

from sklearn.metrics import accuracy_score,classification_report,confusion_matrix,roc_auc_score
print("Accuracy Score=",accuracy_score(y_test,y_pred))
print("Classifiction Report=",classification_report(y_test,y_pred))
print("Confution Matrix=",confusion_matrix(y_test,y_pred))
print("Roc_Auc_Score",roc_auc_score(y_test,pred_prob,multi_class='ovo'))
Q.14) Write a Python program to predict species (Setosa, Versicolor, or Virginica) for a new iris flower based on length & width of its petals and sepals by applying the Naive Bayes classification model on the 'iris' dataset (Use Training data 80% and Testing Data 20%). Evaluate the performance of the model using Accuracy Score metric, Classification Report & Confusion Matrix, and AUC ROC score for the model, and interpret the model performance.

from google.colab import drive
drive.mount("/content/drive",force_remount=True)

from IPython.core.interactiveshell import no_op
import pandas as pd
import numpy as np
df =pd.read_csv("/content/drive/MyDrive/Dataset/Iris.csv")
print(df.head(10))
x=df.iloc[:,1:5].values
y=df.iloc[:,5].values
print(x,y)

from sklearn.model_selection import train_test_split
x_train,x_test,y_train,y_test=train_test_split(x,y,test_size=0.2,random_state=4)
print(x,y)

from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
x_train = sc.fit_transform(x_train)
x_test = sc.transform(x_test)
print(x,y)

from sklearn.naive_bayes import GaussianNB
classifier = GaussianNB()
classifier.fit(x_train, y_train)
print(x,y)

y_pred = classifier.predict(x_test) 
y_pred
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)
from sklearn.metrics import accuracy_score 
print ("Accuracy : ", accuracy_score(y_test, y_pred))

from sklearn.metrics import accuracy_score,classification_report,confusion_matrix,roc_auc_score
print("Accuracy Score=",accuracy_score(y_test,y_pred))
print("Classifiction Report=",classification_report(y_test,y_pred))
print("Confution Matrix=",confusion_matrix(y_test,y_pred))
Q.15) Write a Python program to predict species (Setosa, Versicolor, or Virginica) for a new iris flower based on length & width of its petals and sepals by applying the Decision Tree model on the 'iris' dataset (Use Training data 80% and Testing Data 20%). Evaluate the performance of the model using Accuracy Score metric, Classification Report & Confusion Matrix, and AUC ROC score for the model, and interpret the model performance.

from google.colab import drive
drive.mount("/content/drive",force_remount=True)

from IPython.core.interactiveshell import no_op
import pandas as pd
import numpy as np
df =pd.read_csv("/content/drive/MyDrive/Dataset/Iris.csv")
print(df.head(5))
x =df.iloc[:,1:5].values
y =df.iloc[:,5].values
print(x,y)

from sklearn.model_selection import train_test_split
x_train,x_test,y_train,y_test=train_test_split(x,y,test_size=0.2,random_state=4)
print(x,y)

from sklearn.tree import DecisionTreeClassifier 
dt= DecisionTreeClassifier()
print(x_train,y_train)
dt.fit(x_train,y_train)

y_pred=dt.predict(x_test)
pred_prob=dt.predict_proba(x_test)
print(x,y)

from sklearn.metrics import accuracy_score,classification_report,confusion_matrix,roc_auc_score
print("Accuracy Score=",accuracy_score(y_test,y_pred))
print("Classifiction Report=",classification_report(y_test,y_pred))
print("Confution Matrix=",confusion_matrix(y_test,y_pred))
print("Roc_Auc_Score",roc_auc_score(y_test,pred_prob,multi_class='ovo'))
Q.16) Write a Python program to predict whether or not a patient has diabetes based on the diagnostic measurements included in the Pima Indians Diabetes Database "Diabetes.csv" by applying Logistic Regression, KNN, SVM, Naive Bayes and Decision Tree models (Use Training data 80% and Testing Data 20%). Evaluate & compare the performance of all the models using Accuracy Score metric, Classification Report & Confusion Matrix, and AUC ROC score for each model, and suggest the best model for diabetes prediction.

16.1-Logistic Regression:

from google.colab import drive
drive.mount("/content/drive",force_remount=True)

import pandas as pd
import numpy as np
df =pd.read_csv("/content/drive/MyDrive/Dataset/diabetes.csv")
print(df)
x =df.iloc[:,0:7].values
y =df.iloc[:,8].values
print(x,y)

from sklearn.model_selection import train_test_split
x_train,x_test,y_train,y_test=train_test_split(x,y,test_size=.2,random_state=42)  # 80% training / 20% testing as specified
print(x,y)

from sklearn.linear_model import LogisticRegression
knn=LogisticRegression()
print(x_train,y_train)
knn.fit(x_train,y_train)

y_pred=knn.predict(x_test)
pred_prob=knn.predict_proba(x_test)
print(x,y)

from sklearn.metrics import accuracy_score,classification_report,confusion_matrix,roc_auc_score
print("Accuracy score metric=", accuracy_score(y_test,y_pred))
print("Classification_report=", classification_report(y_test,y_pred))
print("confusion_matrix=", confusion_matrix(y_test,y_pred))
print("roc_auc_score", roc_auc_score(y_test,pred_prob[:,1]))

16.2-KNN:

from google.colab import drive
drive.mount("/content/drive",force_remount=True)

from IPython.core.interactiveshell import no_op
import pandas as pd
import numpy as np
df =pd.read_csv("/content/drive/MyDrive/Dataset/diabetes.csv")
print(df.head(10))
x =df.iloc[:,0:7].values
y =df.iloc[:,8].values
print(x,y)

from sklearn.model_selection import train_test_split
x_train,x_test,y_train,y_test=train_test_split(x,y,test_size=.2,random_state=9)  # 80% training / 20% testing as specified

from sklearn.neighbors import KNeighborsClassifier
knn=KNeighborsClassifier(n_neighbors=5)
knn.fit(x_train, y_train)

y_pred=knn.predict(x_test)
pred_prob=knn.predict_proba(x_test)
from sklearn.metrics import accuracy_score,classification_report,confusion_matrix,roc_auc_score
print("Accuracy score metric=", accuracy_score(y_test,y_pred))
print("Classification_report=", classification_report(y_test,y_pred))
print("confusion_matrix=", confusion_matrix(y_test,y_pred))
print("roc_auc_score", roc_auc_score(y_test,pred_prob[:,1]))

16.3-SVM:

from google.colab import drive
drive.mount("/content/drive",force_remount=True

from IPython.core.interactiveshell import no_op
import pandas as pd
import numpy as np
df =pd.read_csv("/content/drive/MyDrive/Dataset/diabetes.csv ")
print(df.head(5))
x =df.iloc[:,0:2].values
y =df.iloc[:,2].values
print(x,y)

from sklearn.model_selection import train_test_split
x_train,x_test,y_train,y_test=train_test_split(x,y,test_size=.2,random_state=5)  # 80% training / 20% testing as specified
print(x,y)
from sklearn.svm import SVC
svm= SVC(kernel='rbf', C=50, gamma=5, probability=True)
print(x_train,y_train)
svm.fit(x_train, y_train)

y_pred=svm.predict(x_test)
pred_prob=svm.predict_proba(x_test)
print(y_test)
print(pred_prob)

from sklearn.metrics import accuracy_score,classification_report,confusion_matrix,roc_auc_score
print("Accuracy score matrics=",accuracy_score(y_test,y_pred))
print("Classifiction matrics=",classification_report(y_test,y_pred))
print("Confusion matrics=",confusion_matrix(y_test,y_pred))
print("Roc_Auc_Score",roc_auc_score(y_test,pred_prob,multi_class='ovo'))

16.4-Naive Bayes:

from google.colab import drive
drive.mount("/content/drive",force_remount=True)

from IPython.core.interactiveshell import no_op
import pandas as pd
import numpy as np
df =pd.read_csv("/content/drive/MyDrive/Dataset/diabetes.csv")
print(df.head(10))
x=df.iloc[:,0:7].values
y=df.iloc[:,8].values
print(x,y)
from sklearn.model_selection import train_test_split
x_train,x_test,y_train,y_test=train_test_split(x,y,test_size=0.2,random_state=4)
print(x,y)

from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
x_train = sc.fit_transform(x_train)
x_test = sc.transform(x_test)
print(x,y)

from sklearn.naive_bayes import GaussianNB
classifier = GaussianNB()
classifier.fit(x_train, y_train)
print(x,y)

y_pred = classifier.predict(x_test) 
y_pred

from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)
from sklearn.metrics import accuracy_score 
print ("Accuracy : ", accuracy_score(y_test, y_pred))

from sklearn.metrics import accuracy_score,classification_report,confusion_matrix,roc_auc_score
print("Accuracy Score=",accuracy_score(y_test,y_pred))
print("Classifiction Report=",classification_report(y_test,y_pred))
print("Confution Matrix=",confusion_matrix(y_test,y_pred))
16.5-Decision Tree:

from google.colab import drive
drive.mount("/content/drive",force_remount=True)

from IPython.core.interactiveshell import no_op
import pandas as pd
import numpy as np
df =pd.read_csv("/content/drive/MyDrive/Dataset/diabetes.csv")
print(df.head(5))
x =df.iloc[:,0:7].values
y =df.iloc[:,8].values
print(x,y)

from sklearn.model_selection import train_test_split
x_train,x_test,y_train,y_test=train_test_split(x,y,test_size=0.2,random_state=4)
print(x,y)

from sklearn.tree import DecisionTreeClassifier 
dt= DecisionTreeClassifier()
print(x_train,y_train)
dt.fit(x_train,y_train)
y_pred=dt.predict(x_test)
pred_prob=dt.predict_proba(x_test)
print(x,y)

from sklearn.metrics import accuracy_score,classification_report,confusion_matrix,roc_auc_score
print("Accuracy Score=",accuracy_score(y_test,y_pred))
print("Classifiction Report=",classification_report(y_test,y_pred))
print("Confution Matrix=",confusion_matrix(y_test,y_pred))
print("roc_auc_score", roc_auc_score(y_test,pred_prob[:,1]))
Q.17) Python Program to implement Text Mining Basics:
i. Tokenization
ii. Finding frequency distinct
iii. Removing punctuations
iv. Stemming

# Tokenization
# Importing necessary library
import pandas as pd
import numpy as np
import nltk
import os
import nltk.corpus
nltk.download('punkt')  # tokenizer models required by word_tokenize
# sample text for performing tokenization
text = "We are learning text mining basics with python. python will help in implementing different algorithms"
# importing word_tokenize from nltk
from nltk.tokenize import word_tokenize
# Passing the string text into word tokenize for breaking the sentences
token = word_tokenize(text)
token

Output:

['We', 'are', 'learning', 'text', 'mining', 'basics', 'with', 'python', '.', 'python', 'will', 'help', 'in', 'implementing', 'different', 'algorithms']

Program :Finding frequency distinct in the text

# finding the frequency distinct in the tokens


# Importing FreqDist library from nltk and passing token into FreqDist
from nltk.probability import FreqDist
fdist = FreqDist(token)
fdist
# To find the frequency of top 10 words
fdist1 = fdist.most_common(10)
fdist1

Output:

FreqDist({'python': 2, 'We': 1, 'are': 1, 'learning': 1, 'text': 1, 'mining': 1, 'basics': 1, 'with': 1, 'will': 1, 'help': 1, 'in': 1, 'implementing': 1, 'different': 1, 'algorithms': 1})

[('python', 2), ('We', 1), ('are', 1), ('learning', 1), ('text', 1), ('mining', 1), ('basics', 1), ('with', 1), ('will', 1), ('help', 1), ('in', 1), ('implementing', 1), ('different', 1), ('algorithms', 1)]
# remove punctuation
import string
text = "Thank you! For learning. Just adding, a few notes, diagrams and ppts."
punct = set(string.punctuation)
text = "".join([ch for ch in text if ch not in punct])
print(text)

Output:

Thank you For learning Just adding a few notes diagrams and ppts

# program for the example of stemming

import nltk
from nltk.stem.porter import PorterStemmer
words = ["walk", "walking", "walked", "walks", "ran", "run", "running", "runs"]
stemmer = PorterStemmer()

for word in words:
    print(word + " ---> " + stemmer.stem(word))

Output:

walk ---> walk


walking ---> walk
walked ---> walk
walks ---> walk
ran ---> ran
run ---> run
running ---> run
runs ---> run
Q.18) Program to implement Text Mining: Sentiment Analysis, using an RNN LSTM learning model on a dataset of airline tweets.

# Sentimental analysis using RNN


# Setting up the data for model creation

import pandas as pd

df = pd.read_excel(r"D:\KR&AI\Lab\DataSet\Tweets.xlsx")

# Check the column names


df.columns

# Removing neutral Reviews


review_df = df[df['airline_sentiment'] != 'neutral']

print(review_df.shape)
review_df.head(5)

# convert the categorical values to numeric using the factorize() method


sentiment_label = review_df.airline_sentiment.factorize()

# retrieve all the text data from the dataset.


tweet = review_df.text.values

# Tokenize all the words in the text 


from tensorflow.keras.preprocessing.text import Tokenizer

tokenizer = Tokenizer(num_words=5000)
tokenizer.fit_on_texts(tweet)
encoded_docs = tokenizer.texts_to_sequences(tweet)

from tensorflow.keras.preprocessing.sequence import pad_sequences


padded_sequence = pad_sequences(encoded_docs, maxlen=200)

# Sentimental analysis using RNN


# Building the text classifier, using RNN LSTM model. 

from tensorflow.keras.models import Sequential


from tensorflow.keras.layers import LSTM,Dense, Dropout, SpatialDropout1D
from tensorflow.keras.layers import Embedding

vocab_size = len(tokenizer.word_index) + 1  # vocabulary size for the Embedding layer (not defined above)
embedding_vector_length = 32
model = Sequential()
model.add(Embedding(vocab_size, embedding_vector_length, input_length=200))
model.add(SpatialDropout1D(0.25))
model.add(LSTM(50, dropout=0.5, recurrent_dropout=0.5))
model.add(Dropout(0.2))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy',optimizer='adam', metrics=['accuracy'])

print(model.summary())

# Train the sentiment analysis model for 5 epochs on the whole dataset with a batch size of 32 and a validation split of 20%.

history = model.fit(padded_sequence, sentiment_label[0], validation_split=0.2, epochs=5, batch_size=32)

Output:

# Sentimental analysis using RNN


# Testing the sentiment analysis model on new data
# Define a function that takes a text as input and outputs its prediction label.

def predict_sentiment(text):
    tw = tokenizer.texts_to_sequences([text])
    tw = pad_sequences(tw, maxlen=200)
    prediction = int(model.predict(tw).round().item())
    print("Predicted label: ", sentiment_label[1][prediction])

test_sentence1 = "I enjoyed my journey on this flight."


predict_sentiment(test_sentence1)

test_sentence2 = "This is the worst flight experience of my life!"


predict_sentiment(test_sentence2)

Output:
Q.19) Implementing python visualizations on cluster data.
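No code was recorded for this lab; a minimal sketch of Python visualizations on cluster data, assuming the mall-customer DataFrame X with Income, Score and a Labels column produced in labs 9 and 10, could look like:

import matplotlib.pyplot as plt
import seaborn as sns

# Scatter plot of the clusters in the Income/Score plane, coloured by cluster label.
plt.figure(figsize=(10, 6))
sns.scatterplot(x='Income', y='Score', hue='Labels', data=X, palette='hls')
plt.title('Customer clusters by income and spending score')
plt.show()

# Cluster sizes and per-cluster income distributions as simple summary visualizations.
X['Labels'].value_counts().plot(kind='bar', title='Cluster sizes')
plt.show()

sns.boxplot(x='Labels', y='Income', data=X)
plt.title('Income distribution within each cluster')
plt.show()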
Q.20) Creating & visualizing a simple ANN problem to understand the implementation
of an artificial neuron using python

Training Data:

Input 1 | Input 2 | Input 3 | Output
   0    |    1    |    1    |   1
   1    |    0    |    0    |   0
   1    |    0    |    1    |   1

Test Data:

   1    |    0    |    1    |   ?

import numpy as np

class NeuralNetwork():
    
    def __init__(self):
      # seeding for random number generation
        np.random.seed(1)
        #converting weights to a 3 by 1 matrix with values from -1 to 1 and mean of 0
        self.synaptic_weights = 2 * np.random.random((3, 1)) - 1

    def sigmoid(self, x):
        #applying the sigmoid function
        return 1 / (1 + np.exp(-x))

    def sigmoid_derivative(self, x):
        #computing derivative to the Sigmoid function
        return x * (1 - x)

    def train(self, training_inputs, training_outputs, training_iterations):

      #training the model to make accurate predictions while adjusting weights continually
        for iteration in range(training_iterations):
            #siphon the training data via  the neuron
            output = self.think(training_inputs)

            #computing error rate for back-propagation
            error = training_outputs - output
            
            #performing weight adjustments
            adjustments = np.dot(training_inputs.T, error * self.sigmoid_derivative(output))

            self.synaptic_weights += adjustments

    def think(self, inputs):
        #passing the inputs via the neuron to get output   
        #converting values to floats
        
        inputs = inputs.astype(float)
        output = self.sigmoid(np.dot(inputs, self.synaptic_weights))
        return output

if __name__ == "__main__":

    #initializing the neuron class
    neural_network = NeuralNetwork()

    print("Beginning Randomly Generated Weights: ")
    print(neural_network.synaptic_weights)

    #training data consisting of 4 examples--3 input values and 1 output
    training_inputs = np.array([[0,0,1],
                                [1,1,1],
                                [1,0,1],
                                [0,1,1]])

    training_outputs = np.array([[0,1,1,0]]).T

    #training taking place
    neural_network.train(training_inputs, training_outputs, 15000)

    print("Ending Weights After Training: ")
    print(neural_network.synaptic_weights)

    user_input_one = str(input("User Input One: "))
    user_input_two = str(input("User Input Two: "))
    user_input_three = str(input("User Input Three: "))
    
    print("Considering New Situation: ", user_input_one, user_input_two, user_input_three)
    print("New Output data: ")
    print(neural_network.think(np.array([user_input_one, user_input_two, user_input_three])))
    print("Wow, we did it!")

Q.21) Program to pre-process data of Australian weather and implement an Artificial Neural Network to predict the weather.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import datetime
from sklearn.preprocessing import LabelEncoder
from sklearn import preprocessing
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from keras.layers import Dense, BatchNormalization, Dropout, LSTM
from keras.models import Sequential
from keras.utils import to_categorical
from keras.optimizers import Adam
from tensorflow.keras import regularizers
from sklearn.metrics import (precision_score, recall_score, confusion_matrix,
                             classification_report, accuracy_score, f1_score)
from keras import callbacks

np.random.seed(0)

# Loading the dataset file


data = pd.read_csv(r"D:\SIBAR MCA\KR&AI\Lab\DataSet\weatherAUS.csv")
data.head()

# Print the data details


data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 145460 entries, 0 to 145459
Data columns (total 23 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Date 145460 non-null object
1 Location 145460 non-null object
2 MinTemp 143975 non-null float64
3 MaxTemp 144199 non-null float64
4 Rainfall 142199 non-null float64
5 Evaporation 82670 non-null float64
6 Sunshine 75625 non-null float64
7 WindGustDir 135134 non-null object
8 WindGustSpeed 135197 non-null float64
9 WindDir9am 134894 non-null object
10 WindDir3pm 141232 non-null object
11 WindSpeed9am 143693 non-null float64
12 WindSpeed3pm 142398 non-null float64
13 Humidity9am 142806 non-null float64
14 Humidity3pm 140953 non-null float64
15 Pressure9am 130395 non-null float64
16 Pressure3pm 130432 non-null float64
17 Cloud9am 89572 non-null float64
18 Cloud3pm 86102 non-null float64
19 Temp9am 143693 non-null float64
20 Temp3pm 141851 non-null float64
21 RainToday 142199 non-null object
22 RainTomorrow 142193 non-null object
dtypes: float64(16), object(7)
memory usage: 25.5+ MB
#Parsing datetime
# exploring the length of date objects
lengths = data["Date"].str.len()
lengths.value_counts()

#There don't seem to be any error in dates so parsing values into datetime
data['Date']= pd.to_datetime(data["Date"])
#Creating a collumn of year
data['year'] = data.Date.dt.year

# function to encode datetime into cyclic parameters.
# As I am planning to use this data in a neural network I prefer the months and days in a cyclic continuous feature.

def encode(data, col, max_val):
    data[col + '_sin'] = np.sin(2 * np.pi * data[col]/max_val)
    data[col + '_cos'] = np.cos(2 * np.pi * data[col]/max_val)
    return data

data['month'] = data.Date.dt.month
data = encode(data, 'month', 12)

data['day'] = data.Date.dt.day
data = encode(data, 'day', 31)

data.head()

# roughly a year's span section


# To see if the "year" attribute of data repeats
section = data[:360]
tm = section["day"].plot(color="#C2C4E2")
tm.set_title("Distribution Of Days Over Year")
tm.set_ylabel("Days In month")
tm.set_xlabel("Days In Year")
# Splitting months and days into a sine and cosine combination provides the cyclical continuous feature. This can be used as input features to the ANN.
# Splitting of Month

cyclic_month = sns.scatterplot(x="month_sin",y="month_cos",data=data, color="#C2C4E2")


cyclic_month.set_title("Cyclic Encoding of Month")
cyclic_month.set_ylabel("Cosine Encoded Months")
cyclic_month.set_xlabel("Sine Encoded Months")

# Splitting of Day

cyclic_day = sns.scatterplot(x='day_sin',y='day_cos',data=data, color="#C2C4E2")


cyclic_day.set_title("Cyclic Encoding of Day")
cyclic_day.set_ylabel("Cosine Encoded Day")
cyclic_day.set_xlabel("Sine Encoded Day")

# Processing the data for missing values

# Categorical and numerical column lists (assumed from the dtypes, since they are not defined earlier in this journal)
object_cols = [col for col in data.columns if data[col].dtype == 'object']
num_cols = [col for col in data.columns if data[col].dtype in ('float64', 'int64')]

# Filling missing values with the mode of the column, for categorical variables
for i in object_cols:
    data[i].fillna(data[i].mode()[0], inplace=True)

# Filling missing values with the median of the column, for numerical variables
for i in num_cols:
    data[i].fillna(data[i].median(), inplace=True)
# Printing the Set
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 145460 entries, 0 to 145459
Data columns (total 30 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Date 145460 non-null datetime64[ns]
1 Location 145460 non-null object
2 MinTemp 145460 non-null float64
3 MaxTemp 145460 non-null float64
4 Rainfall 145460 non-null float64
5 Evaporation 145460 non-null float64
6 Sunshine 145460 non-null float64
7 WindGustDir 145460 non-null object
8 WindGustSpeed 145460 non-null float64
9 WindDir9am 145460 non-null object
10 WindDir3pm 145460 non-null object
11 WindSpeed9am 145460 non-null float64
12 WindSpeed3pm 145460 non-null float64
13 Humidity9am 145460 non-null float64
14 Humidity3pm 145460 non-null float64
15 Pressure9am 145460 non-null float64
16 Pressure3pm 145460 non-null float64
17 Cloud9am 145460 non-null float64
18 Cloud3pm 145460 non-null float64
19 Temp9am 145460 non-null float64
20 Temp3pm 145460 non-null float64
21 RainToday 145460 non-null object
22 RainTomorrow 145460 non-null object
23 year 145460 non-null int64
24 month 145460 non-null int64
25 month_sin 145460 non-null float64
26 month_cos 145460 non-null float64
27 day 145460 non-null int64
28 day_sin 145460 non-null float64
29 day_cos 145460 non-null float64
dtypes: datetime64[ns](1), float64(20), int64(3), object(6)
memory usage: 33.3+ MB

# Apply label encoder to each column with categorical data

label_encoder = LabelEncoder()
for i in object_cols:
    data[i] = label_encoder.fit_transform(data[i])

# Preparing attributes of scaled data

features = data.drop(['RainTomorrow', 'Date', 'day', 'month'], axis=1)  # dropping target and extra columns

target = data['RainTomorrow']

#Set up a standard scaler for the features


col_names = list(features.columns)
s_scaler = preprocessing.StandardScaler()
features = s_scaler.fit_transform(features)
features = pd.DataFrame(features, columns=col_names)

features.describe().T

# Creating Model
# Assigning X and y the status of attributes and tags (the target was already separated above)
X = features
y = target

# Splitting test and training sets


X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 42)

X.shape

#Early stopping
early_stopping = callbacks.EarlyStopping(
min_delta=0.001, # minimium amount of change to count as an improvement
patience=20, # how many epochs to wait before stopping
restore_best_weights=True,
)

# Initialising the NN
model = Sequential()

# layers

model.add(Dense(units = 32, kernel_initializer = 'uniform', activation = 'relu', input_dim = 26))


model.add(Dense(units = 32, kernel_initializer = 'uniform', activation = 'relu'))
model.add(Dense(units = 16, kernel_initializer = 'uniform', activation = 'relu'))
model.add(Dropout(0.25))
model.add(Dense(units = 8, kernel_initializer = 'uniform', activation = 'relu'))
model.add(Dropout(0.5))
model.add(Dense(units = 1, kernel_initializer = 'uniform', activation = 'sigmoid'))

# Compiling the ANN


opt = Adam(learning_rate=0.00009)
model.compile(optimizer = opt, loss = 'binary_crossentropy', metrics = ['accuracy'])

# Train the ANN


history = model.fit(X_train, y_train, batch_size = 32, epochs = 150, callbacks=[early_stopping],
validation_split=0.2)

Output:
Epoch 1/150
2551/2551 [==============================] - 5s 2ms/step - loss: 0.5967 - accuracy:
0.7805 - val_loss: 0.3964 - val_accuracy: 0.7860
Epoch 2/150
2551/2551 [==============================] - 4s 2ms/step - loss: 0.4413 - accuracy:
0.7919 - val_loss: 0.3860 - val_accuracy: 0.8388
Epoch 3/150
2551/2551 [==============================] - 4s 2ms/step - loss: 0.4290 - accuracy:
0.8257 - val_loss: 0.3761 - val_accuracy: 0.8400
Epoch 4/150
2551/2551 [==============================] - 4s 2ms/step - loss: 0.4174 - accuracy:
0.8295 - val_loss: 0.3712 - val_accuracy: 0.8421
Epoch 5/150
2551/2551 [==============================] - 4s 2ms/step - loss: 0.4137 - accuracy:
0.8327 - val_loss: 0.3693 - val_accuracy: 0.8436
Epoch 6/150
2551/2551 [==============================] - 4s 2ms/step - loss: 0.4091 - accuracy:
0.8338 - val_loss: 0.3669 - val_accuracy: 0.8443
Epoch 7/150
2551/2551 [==============================] - 4s 2ms/step - loss: 0.4082 - accuracy:
0.8348 - val_loss: 0.3665 - val_accuracy: 0.8441
Epoch 8/150
2551/2551 [==============================] - 4s 2ms/step - loss: 0.4049 - accuracy:
0.8354 - val_loss: 0.3650 - val_accuracy: 0.8439
Epoch 9/150
2551/2551 [==============================] - 4s 2ms/step - loss: 0.4020 - accuracy:
0.8357 - val_loss: 0.3642 - val_accuracy: 0.8441
Epoch 10/150
2551/2551 [==============================] - 4s 2ms/step - loss: 0.3977 - accuracy:
0.8363 - val_loss: 0.3635 - val_accuracy: 0.8445
Epoch 11/150
2551/2551 [==============================] - 4s 2ms/step - loss: 0.3984 - accuracy:
0.8353 - val_loss: 0.3615 - val_accuracy: 0.8445
Epoch 12/150
2551/2551 [==============================] - 4s 2ms/step - loss: 0.3953 - accuracy:
0.8368 - val_loss: 0.3618 - val_accuracy: 0.8443
Epoch 13/150
2551/2551 [==============================] - 4s 2ms/step - loss: 0.3975 - accuracy:
0.8340 - val_loss: 0.3608 - val_accuracy: 0.8444
Epoch 14/150
2551/2551 [==============================] - 4s 2ms/step - loss: 0.3908 - accuracy:
0.8373 - val_loss: 0.3597 - val_accuracy: 0.8449
Epoch 15/150
2551/2551 [==============================] - 4s 2ms/step - loss: 0.3859 - accuracy:
0.8383 - val_loss: 0.3597 - val_accuracy: 0.8445
Epoch 16/150
2551/2551 [==============================] - 4s 2ms/step - loss: 0.3899 - accuracy:
0.8355 - val_loss: 0.3593 - val_accuracy: 0.8433
Epoch 17/150
2551/2551 [==============================] - 4s 2ms/step - loss: 0.3889 - accuracy:
0.8364 - val_loss: 0.3581 - val_accuracy: 0.8441
Epoch 18/150
2551/2551 [==============================] - 4s 2ms/step - loss: 0.3924 - accuracy:
0.8336 - val_loss: 0.3580 - val_accuracy: 0.8438
Epoch 19/150
2551/2551 [==============================] - 4s 2ms/step - loss: 0.3886 - accuracy:
0.8361 - val_loss: 0.3582 - val_accuracy: 0.8431
Epoch 20/150
2551/2551 [==============================] - 4s 2ms/step - loss: 0.3860 - accuracy:
0.8352 - val_loss: 0.3578 - val_accuracy: 0.8421

#Plotting training and validation loss over epochs

history_df = pd.DataFrame(history.history)

plt.plot(history_df.loc[:, ['loss']], "#BDE2E2", label='Training loss')


plt.plot(history_df.loc[:, ['val_loss']],"#C2C4E2", label='Validation loss')
plt.title('Training and Validation loss')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend(loc="best")

plt.show()

#Plotting training and validation accuracy over epochs

history_df = pd.DataFrame(history.history)

plt.plot(history_df.loc[:, ['accuracy']], "#BDE2E2", label='Training accuracy')


plt.plot(history_df.loc[:, ['val_accuracy']], "#C2C4E2", label='Validation accuracy')

plt.title('Training and Validation accuracy')


plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.legend()
plt.show()

# Testing the model on Test Data


# Predicting the test set results
y_pred = model.predict(X_test)
y_pred = (y_pred > 0.5)

print(classification_report(y_test, y_pred))

Q.22) Write a Python program to prepare data to be given to a convolutional neural network (CNN) and create an Image Classifier. Use the cat and dog training and test dataset.

# Importing the required libraries


import cv2
import os
import numpy as np
from random import shuffle
from tqdm import tqdm

'''Setting up the env'''

TRAIN_DIR = 'D:/SIBAR MCA/KR&AI 2021/Lab AI/DataSet/train'
TEST_DIR = 'D:/SIBAR MCA/KR&AI 2021/Lab AI/DataSet/test1'
IMG_SIZE = 50
LR = 1e-3

'''Setting up the model which will help with tensorflow models'''


MODEL_NAME = 'dogsvscats-{}-{}.model'.format(LR, '6conv-basic')

'''Labelling the dataset'''

def label_img(img):
    word_label = img.split('.')[-3]
    # DIY One hot encoder
    if word_label == 'cat': return [1, 0]
    elif word_label == 'dog': return [0, 1]

'''Creating the training data'''


def create_train_data():
# Creating an empty list where we should store the training data
# after a little preprocessing of the data
training_data = []

# tqdm is only used for interactive loading


# loading the training data
for img in tqdm(os.listdir(TRAIN_DIR)):

# labeling the images


label = label_img(img)

path = os.path.join(TRAIN_DIR, img)

# loading the image from the path and then converting them into
# grayscale for easier covnet prob
img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)

# resizing the image for processing them in the covnet


img = cv2.resize(img, (IMG_SIZE, IMG_SIZE))

# final step-forming the training data list with numpy array of the images
training_data.append([np.array(img), np.array(label)])

# shuffling the training data so that cat and dog images are randomly mixed
shuffle(training_data)

# saving our trained data for further uses if required


np.save('train_data.npy', training_data)
return training_data

'''Processing the given test data'''


# Almost the same as processing the training data, but
# we don't have to label it.
def process_test_data():
testing_data = []
for img in tqdm(os.listdir(TEST_DIR)):
path = os.path.join(TEST_DIR, img)
img_num = img.split('.')[0]
img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
img = cv2.resize(img, (IMG_SIZE, IMG_SIZE))
testing_data.append([np.array(img), img_num])

shuffle(testing_data)
np.save('test_data.npy', testing_data)
return testing_data

'''Running the training and the testing in the dataset for our model'''
train_data = create_train_data()
test_data = process_test_data()

# train_data = np.load('train_data.npy')
# test_data = np.load('test_data.npy')
'''Creating the neural network using tensorflow'''
# Importing the required libraries
import tflearn
from tflearn.layers.conv import conv_2d, max_pool_2d
from tflearn.layers.core import input_data, dropout, fully_connected
from tflearn.layers.estimator import regression

import tensorflow as tf
tf.reset_default_graph()
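# Note (assumption): tf.reset_default_graph() exists only in TensorFlow 1.x, which
# tflearn targets. Under TensorFlow 2.x the equivalent call would be:
# tf.compat.v1.reset_default_graph()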
convnet = input_data(shape =[None, IMG_SIZE, IMG_SIZE, 1], name ='input')

convnet = conv_2d(convnet, 32, 5, activation ='relu')


convnet = max_pool_2d(convnet, 5)

convnet = conv_2d(convnet, 64, 5, activation ='relu')


convnet = max_pool_2d(convnet, 5)

convnet = conv_2d(convnet, 128, 5, activation ='relu')


convnet = max_pool_2d(convnet, 5)

convnet = conv_2d(convnet, 64, 5, activation ='relu')


convnet = max_pool_2d(convnet, 5)

convnet = conv_2d(convnet, 32, 5, activation ='relu')


convnet = max_pool_2d(convnet, 5)

convnet = fully_connected(convnet, 1024, activation ='relu')


convnet = dropout(convnet, 0.8)

convnet = fully_connected(convnet, 2, activation ='softmax')


convnet = regression(convnet, optimizer ='adam', learning_rate = LR,
loss ='categorical_crossentropy', name ='targets')

model = tflearn.DNN(convnet, tensorboard_dir ='log')

# Splitting the testing data and training data


train = train_data[:-500]
test = train_data[-500:]

'''Setting up the features and labels'''


# X-Features & Y-Labels
X = np.array([i[0] for i in train]).reshape(-1, IMG_SIZE, IMG_SIZE, 1)
Y = [i[1] for i in train]
test_x = np.array([i[0] for i in test]).reshape(-1, IMG_SIZE, IMG_SIZE, 1)
test_y = [i[1] for i in test]

'''Fitting the data into our model'''


# epoch = 5 taken
model.fit({'input': X}, {'targets': Y}, n_epoch = 5,
validation_set =({'input': test_x}, {'targets': test_y}),
snapshot_step = 500, show_metric = True, run_id = MODEL_NAME)
model.save(MODEL_NAME)
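
# On a later run, the fitted weights could be restored instead of retraining
# (a sketch; assumes the checkpoint files written by model.save() are present):
# if os.path.exists('{}.meta'.format(MODEL_NAME)):
#     model.load(MODEL_NAME)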

'''Testing the data'''


import matplotlib.pyplot as plt
# if you need to create the data:
# test_data = process_test_data()
# if you already have some saved:
test_data = np.load('test_data.npy')

fig = plt.figure()

for num, data in enumerate(test_data[:20]):


# cat: [1, 0]
# dog: [0, 1]

img_num = data[1]
img_data = data[0]

y = fig.add_subplot(4, 5, num + 1)
orig = img_data
data = img_data.reshape(IMG_SIZE, IMG_SIZE, 1)

# model_out = model.predict([data])[0]
model_out = model.predict([data])[0]

if np.argmax(model_out) == 1:
    str_label = 'Dog'
else:
    str_label = 'Cat'

y.imshow(orig, cmap ='gray')


plt.title(str_label)
y.axes.get_xaxis().set_visible(False)
y.axes.get_yaxis().set_visible(False)
plt.show()

Q.23) Write a Python program to implement an RNN by building a character-level prediction RNN and train it on the text of "Harry Potter and the Philosopher's Stone".

import numpy as np
import matplotlib.pyplot as plt
class ReccurentNN:
def __init__(self, char_to_idx, idx_to_char, vocab, h_size=75,
seq_len=20, clip_value=5, epochs=50, learning_rate=1e-2):
self.n_h = h_size
self.seq_len = seq_len # number of characters in each batch/time steps
self.clip_value = clip_value # maximum allowed value for the gradients
self.epochs = epochs
self.learning_rate = learning_rate
self.char_to_idx = char_to_idx # dictionary that maps characters to an index
self.idx_to_char = idx_to_char # dictionary that maps indices to characters
self.vocab = vocab # number of unique characters in the training text
# smoothing out loss as batch SGD is noisy
self.smooth_loss = -np.log(1.0 / self.vocab) * self.seq_len

# initialize parameters
self.params = {}

self.params["W_xh"] = np.random.randn(self.vocab, self.n_h) * 0.01


self.params["W_hh"] = np.identity(self.n_h) * 0.01
self.params["b_h"] = np.zeros((1, self.n_h))
self.params["W_hy"] = np.random.randn(self.n_h, self.vocab) * 0.01
self.params["b_y"] = np.zeros((1, self.vocab))

self.h0 = np.zeros((1, self.n_h)) # value of the hidden state at time step t = -1

# initialize gradients and memory parameters for Adagrad


self.grads = {}
self.m_params = {}
for key in self.params:
self.grads["d" + key] = np.zeros_like(self.params[key])
self.m_params["m" + key] = np.zeros_like(self.params[key])
def _encode_text(self, X):
X_encoded = []
for char in X:
X_encoded.append(self.char_to_idx[char])
return X_encoded

def _prepare_batches(self, X, index):


X_batch_encoded = X[index: index + self.seq_len]
y_batch_encoded = X[index + 1: index + self.seq_len + 1]

X_batch = []
y_batch = []

for i in X_batch_encoded:
one_hot_char = np.zeros((1, self.vocab))
one_hot_char[0][i] = 1
X_batch.append(one_hot_char)

for j in y_batch_encoded:
one_hot_char = np.zeros((1, self.vocab))
one_hot_char[0][j] = 1
y_batch.append(one_hot_char)
return X_batch, y_batch

def _softmax(self, x):


# max value is subtracted for numerical stability
# https://stats.stackexchange.com/a/338293
e_x = np.exp(x - np.max(x))
return e_x / np.sum(e_x)

def _forward_pass(self, X):

h = {} # stores hidden states


h[-1] = self.h0 # set initial hidden state at t=-1

y_pred = {} # stores softmax output probabilities

# iterate over each character in the input sequence


for t in range(self.seq_len):
h[t] = np.tanh(np.dot(X[t], self.params["W_xh"]) + np.dot(h[t - 1], self.params["W_hh"]) + self.params["b_h"])
y_pred[t] = self._softmax(np.dot(h[t], self.params["W_hy"]) + self.params["b_y"])

self.h0 = h[t]  # carry the final hidden state forward to the next batch
return y_pred, h
def _backward_pass(self, X, y, y_pred, h):
dh_next = np.zeros_like(h[0])
for t in reversed(range(self.seq_len)):
dy = np.copy(y_pred[t])
dy[0][np.argmax(y[t])] -= 1 # predicted y - actual y

self.grads["dW_hy"] += np.dot(h[t].T, dy)


self.grads["db_y"] += dy

dhidden = (1 - h[t] ** 2) * (np.dot(dy, self.params["W_hy"].T) + dh_next)


dh_next = np.dot(dhidden, self.params["W_hh"].T)

self.grads["dW_hh"] += np.dot(h[t - 1].T, dhidden)


self.grads["dW_xh"] += np.dot(X[t].T, dhidden)
self.grads["db_h"] += dhidden

# clip gradients to mitigate exploding gradients


for key in self.grads:
np.clip(self.grads[key], -self.clip_value, self.clip_value, out=self.grads[key])
return
def _update(self):
for key in self.params:
self.m_params["m" + key] += self.grads["d" + key] * self.grads["d" + key]
self.params[key] -= self.grads["d" + key] * self.learning_rate / (np.sqrt(self.m_params["m" + key]) + 1e-8)
def test(self, test_size, start_index):
res = ""
x = np.zeros((1, self.vocab))
x[0][start_index] = 1
for i in range(test_size):
# forward propagation
h = np.tanh(np.dot(x, self.params["W_xh"]) + np.dot(self.h0, self.params["W_hh"]) + self.params["b_h"])
y_pred = self._softmax(np.dot(h, self.params["W_hy"]) + self.params["b_y"])

# get a random index from the probability distribution of y


index = np.random.choice(range(self.vocab), p=y_pred.ravel())

# set x-one_hot_vector for the next character


x = np.zeros((1, self.vocab))
x[0][index] = 1

# find the char with the index and concat to the output string
char = self.idx_to_char[index]
res += char
return res
def train(self, X):
J = []
num_batches = len(X) // self.seq_len
X_trimmed = X[:num_batches * self.seq_len]  # trim the end of the input text so that only full sequences remain
X_encoded = self._encode_text(X_trimmed)  # transform characters to indices to enable processing
for i in range(self.epochs):
for j in range(0, len(X_encoded) - self.seq_len, self.seq_len):
X_batch, y_batch = self._prepare_batches(X_encoded, j)
y_pred, h = self._forward_pass(X_batch)
loss = 0
for t in range(self.seq_len):
loss += -np.log(y_pred[t][0, np.argmax(y_batch[t])])
self.smooth_loss = self.smooth_loss * 0.999 + loss * 0.001
J.append(self.smooth_loss)
self._backward_pass(X_batch, y_batch, y_pred, h)
self._update()
print('Epoch:', i + 1, "\tLoss:", loss, "")
return J, self.params
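
The character-to-index mapping and one-hot encoding used by the class can be sanity-checked on a toy string before training on the full text. A minimal, self-contained sketch (the toy string and variable names are illustrative only):

toy_text = "harry"
toy_chars = sorted(set(toy_text))                        # ['a', 'h', 'r', 'y']
toy_char_to_idx = {c: i for i, c in enumerate(toy_chars)}

one_hot = np.zeros((len(toy_text), len(toy_chars)))
for pos, ch in enumerate(toy_text):
    one_hot[pos][toy_char_to_idx[ch]] = 1
print(one_hot)   # one row per character, with a single 1 marking that character's index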
with open('Harry-Potter.txt') as f:
text = f.read().lower()
# use only a part of the text to make the process faster
text = text[:20000]
# text = [char for char in text if char not in ["(", ")", "\"", "'", ".", "?", "!", ",", "-"]]
# text = [char for char in text if char not in ["(", ")", "\"", "'"]]
chars = set(text)
vocab = len(chars)
# print(f"Length of training text {len(text)}")
# print(f"Size of vocabulary {vocab}")

# creating the encoding decoding dictionaries


char_to_idx = {w: i for i, w in enumerate(chars)}
idx_to_char = {i: w for i, w in enumerate(chars)}

parameter_dict = {
'char_to_idx': char_to_idx,
'idx_to_char': idx_to_char,
'vocab': vocab,
'h_size': 75,
'seq_len': 20,
# keep small to avoid diminishing/exploding gradients
'clip_value': 5,
'epochs': 50,
'learning_rate': 1e-2,
}

model = ReccurentNN(**parameter_dict)
loss, params = model.train(text)
plt.figure(figsize=(12, 8))
plt.plot([i for i in range(len(loss))], loss)
plt.ylabel("Loss")
plt.xlabel("Epochs")
plt.show()
print(model.test(50,10))

OUTPUT:
Epoch: 1 Loss: 56.938160313575075
Epoch: 2 Loss: 49.479841032771944
Epoch: 3 Loss: 44.287300754487774
Epoch: 4 Loss: 42.75894603770088
Epoch: 5 Loss: 40.962449282519785
Epoch: 6 Loss: 41.06907316142755
Epoch: 7 Loss: 39.77795494997328
Epoch: 8 Loss: 41.059521063295485
Epoch: 9 Loss: 39.848893648177594
Epoch: 10 Loss: 40.42097045126549
Epoch: 11 Loss: 39.183043247471126
Epoch: 12 Loss: 40.09713939411275
Epoch: 13 Loss: 38.786694845855145
Epoch: 14 Loss: 39.41259563289025
Epoch: 15 Loss: 38.87094988626352
Epoch: 16 Loss: 38.80896936130275
Epoch: 17 Loss: 38.65301294936609
Epoch: 18 Loss: 38.2922486206415
Epoch: 19 Loss: 38.120326247610286
Epoch: 20 Loss: 37.94743442371039
Epoch: 21 Loss: 37.781826419304245
Epoch: 22 Loss: 38.02242197941186
Epoch: 23 Loss: 37.34639374983505
Epoch: 24 Loss: 37.383830387022115
Epoch: 25 Loss: 36.863261576664286
Epoch: 26 Loss: 36.81717706027801
Epoch: 27 Loss: 35.98781618662626
Epoch: 28 Loss: 34.883143187020806
Epoch: 29 Loss: 35.74233839750379
Epoch: 30 Loss: 34.17457373354039
Epoch: 31 Loss: 34.3659838303625
Epoch: 32 Loss: 34.6155982440106
Epoch: 33 Loss: 33.428021716569035
Epoch: 34 Loss: 33.06226727751935
Epoch: 35 Loss: 33.23334401686566
Epoch: 36 Loss: 32.9818416477839
Epoch: 37 Loss: 33.155764725505655
Epoch: 38 Loss: 32.937205806520474
Epoch: 39 Loss: 32.93063638107538
Epoch: 40 Loss: 32.943368437981256
Epoch: 41 Loss: 32.92520056534523
Epoch: 42 Loss: 32.96074563399301
Epoch: 43 Loss: 32.974579784369666
Epoch: 44 Loss: 32.86483014312194
Epoch: 45 Loss: 33.10532379921245
Epoch: 46 Loss: 32.89950584889016
Epoch: 47 Loss: 33.11303116056217
Epoch: 48 Loss: 32.731237824441756
Epoch: 49 Loss: 32.742918023080314
Epoch: 50 Loss: 32.421869906086144
is othe on. ogofostheodindearidut wlethallle, st oserarey d -lers amoathe y
thasathey at dll tos dn t s med d.). t t ile brs t d g htherive, d ogostare d.
ay shag hythay boumay tey thas ot havininggon
Q.24) Write a Python program to implement a GAN that creates a curve resembling a sine wave. Use the Python library PyTorch and set a random generator seed.

#importing the necessary libraries:


import torch
from torch import nn

import math
import matplotlib.pyplot as plt

# Set up the random generator seed; 111 is the chosen seed value


torch.manual_seed(111)

#Preparing the Training Data


train_data_length = 1024
train_data = torch.zeros((train_data_length, 2))
train_data[:, 0] = 2 * math.pi * torch.rand(train_data_length)
train_data[:, 1] = torch.sin(train_data[:, 0])
train_labels = torch.zeros(train_data_length)
train_set = [
(train_data[i], train_labels[i]) for i in range(train_data_length) ]

# Plotting the training data, each point (x₁, x₂)


plt.plot(train_data[:, 0], train_data[:, 1], ".")

# Create a PyTorch data loader


batch_size = 32
train_loader = torch.utils.data.DataLoader(
train_set, batch_size=batch_size, shuffle=True
)

# Implementing the Discriminator: in PyTorch, the neural network models are represented by
# classes that inherit from nn.Module
class Discriminator(nn.Module):
def __init__(self):
super().__init__()
self.model = nn.Sequential(
nn.Linear(2, 256),
nn.ReLU(),
nn.Dropout(0.3),
nn.Linear(256, 128),
nn.ReLU(),
nn.Dropout(0.3),
nn.Linear(128, 64),
nn.ReLU(),
nn.Dropout(0.3),
nn.Linear(64, 1),
nn.Sigmoid(),
)
def forward(self, x):
output = self.model(x)
return output

#instantiate a Discriminator object
discriminator = Discriminator()

#Implementing the Generator, create a Generator class that inherits from nn.Module


class Generator(nn.Module):
def __init__(self):
super().__init__()
self.model = nn.Sequential(
nn.Linear(2, 16),
nn.ReLU(),
nn.Linear(16, 32),
nn.ReLU(),
nn.Linear(32, 2),
)

def forward(self, x):


output = self.model(x)
return output

generator = Generator()

#set up parameters to use during training


lr = 0.001
num_epochs = 300
loss_function = nn.BCELoss()

#Create the optimizers using torch.optim


optimizer_discriminator = torch.optim.Adam(discriminator.parameters(), lr=lr)
optimizer_generator = torch.optim.Adam(generator.parameters(), lr=lr)

# Implement the training loop


for epoch in range(num_epochs):
for n, (real_samples, _) in enumerate(train_loader):
# Data for training the discriminator
real_samples_labels = torch.ones((batch_size, 1))
latent_space_samples = torch.randn((batch_size, 2))
generated_samples = generator(latent_space_samples)
generated_samples_labels = torch.zeros((batch_size, 1))
all_samples = torch.cat((real_samples, generated_samples))
all_samples_labels = torch.cat(
(real_samples_labels, generated_samples_labels)
)

# Training the discriminator


discriminator.zero_grad()
output_discriminator = discriminator(all_samples)
loss_discriminator = loss_function(
output_discriminator, all_samples_labels)
loss_discriminator.backward()
optimizer_discriminator.step()

# Data for training the generator


latent_space_samples = torch.randn((batch_size, 2))

# Training the generator


generator.zero_grad()
generated_samples = generator(latent_space_samples)
output_discriminator_generated = discriminator(generated_samples)
loss_generator = loss_function(
output_discriminator_generated, real_samples_labels
)
loss_generator.backward()
optimizer_generator.step()

# Show loss
if epoch % 10 == 0 and n == batch_size - 1:
print(f"Epoch: {epoch} Loss D.: {loss_discriminator}")
print(f"Epoch: {epoch} Loss G.: {loss_generator}")

# Checking the Samples Generated by the GAN


latent_space_samples = torch.randn(100, 2)
generated_samples = generator(latent_space_samples)
generated_samples = generated_samples.detach()
plt.plot(generated_samples[:, 0], generated_samples[:, 1], ".")
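
As an optional visual check (not part of the original output), the generated points can be overlaid on the true sine curve to judge how closely the generator matches the target distribution:

import numpy as np

x = np.linspace(0, 2 * math.pi, 200)
plt.plot(x, np.sin(x), label="true sine")                                    # reference curve
plt.plot(generated_samples[:, 0], generated_samples[:, 1], ".", label="generated")
plt.legend()
plt.show()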

Output (after 300 epochs): scatter plot of the samples generated by the trained generator.
