KR&AI-ML-DM Practical Journal ANS
Laboratory Journal
MCA - Semester-III
Roll No.:011
Academic Year
2022-2023
-INDEX-
Sr. No. | Date | Lab Title
4 | 08-Sep-22 | Write a Python program to predict mpg (miles per gallon) for a car based on variable wt by applying simple linear regression on the 'mtcars' dataset (use training data 80% and testing data 20%). Record the performance of the model in terms of MAE, MSE, RMSE and R-squared value. Change training data to 70% and testing data to 30%; compare and interpret the performance of your model.
5 | 16-Sep-22 | Write a Python program to predict mpg (miles per gallon) for a car based on variables wt, cyl & disp by applying multi-linear regression on the 'mtcars' dataset (use training data 80% and testing data 20%). Record the performance of the model in terms of MAE, MSE, RMSE and R-squared value. Remove variable disp from the feature set and check the performance again. Compare and interpret the performance of your model.
6 | 23-Sep-22 | Write a Python program to predict mpg (miles per gallon) for a car based on variables wt, cyl & disp by applying multi-linear regression on the 'mtcars' dataset (use training data 80% and testing data 20%). Record the performance of the model in terms of MAE, MSE, RMSE and R-squared value. Replace disp by the drat variable in the feature set and check the performance again. Interpret the performance of your model.
7 | 30-Sep-22 | Write a Python program to predict fruit (Apple or Orange) based on its size & weight by applying logistic regression on the 'apples_and_oranges' dataset (use training data 80% and testing data 20%). Evaluate the performance of the model using the Accuracy Score metric, Classification Report & Confusion Matrix, and AUC ROC score, and interpret the model performance.
8 | 07-Oct-22 | Write a Python program to predict fruit (Apple or Orange) based on its size & weight by applying the K-Nearest Neighbour (KNN) model on the 'apples_and_oranges' dataset (use training data 80% and testing data 20%). Evaluate the performance of the model using the Accuracy Score metric, Classification Report & Confusion Matrix, and AUC ROC score, and interpret the model performance.
12 | 14-Oct-22 | Write a Python program to predict fruit (Apple or Orange) based on its size & weight by applying the Support Vector Machine (SVM) model on the 'apples_and_oranges' dataset (use training data 80% and testing data 20%). Evaluate the performance of the model using the Accuracy Score metric, Classification Report & Confusion Matrix, and AUC ROC score, and interpret the model performance.
Internal Examiner :
External Examiner :
Date :
Q.1) Write a Python program to find the correlation matrix.
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import pearsonr
x = np.array([7, 8, 9, 11, 15, 20])
y = np.array([130, 135, 140, 142, 147, 156])
corr, _ = pearsonr(x, y)
print("Pearson's Correlation: %.3f" % corr)
plt.scatter(x, y)
plt.plot(np.unique(x), np.poly1d(np.polyfit(x, y, 1))(np.unique(x)), color='blue')
plt.show()
OUTPUT:
Q.2) Plot the correlation plot on a dataset and visualize it, giving an overview of the
relationships among the data, on the iris data.
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import f_oneway
performance1=[89,89,88,78,79]
performance2=[93,92,94,89,88]
performance3=[89,88,89,93,90]
performance4=[81,78,81,92,82]
#Conduct the one-way ANOVA
print(f_oneway(performance1,performance2,performance3,performance4))
Output:
F_onewayResult(statistic=4.625000000000002, pvalue=0.016336459839780215)
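The listing above actually runs a one-way ANOVA; for the correlation plot the question asks for, a minimal sketch (assuming the same Iris.csv path used elsewhere in this journal) could be:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
iris = pd.read_csv('/content/drive/MyDrive/Dataset/Iris.csv')
# correlation matrix of the four numeric measurements
corr = iris[['SepalLengthCm','SepalWidthCm','PetalLengthCm','PetalWidthCm']].corr()
sns.heatmap(corr, annot=True, cmap='coolwarm') # annotated heatmap gives the overview of relationships
plt.title('Iris feature correlations')
plt.show()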
Q.3) Implementing the ANOVA testing on iris dataset. Using only one independent
variable i.e. Species (iris-setosa, iris-versicolor, iris-virginica) which are categorical
and sepal width as a continuous variable.
# importing the necessary libraries
from sklearn.datasets import load_iris
import pandas as pd
import seaborn as sns
from sklearn.feature_selection import f_classif
from sklearn.feature_selection import SelectKBest
from scipy.stats import shapiro
from scipy import stats
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.stats.multicomp import pairwise_tukeyhsd
from statsmodels.sandbox.stats.multicomp import TukeyHSDResults
from statsmodels.graphics.factorplots import interaction_plot
from pandas.plotting import scatter_matrix
# loading the dataset
df = pd.read_csv('/content/drive/MyDrive/Dataset/Iris.csv')
df.head()
dataframe_iris=pd.DataFrame(df,columns=['SepalLengthCm','SepalWidthCm','PetalLengthCm','PetalWidthCm'])
# Visualising the dataframe by plotting
scatter_matrix(dataframe_iris[['SepalLengthCm','SepalWidthCm','PetalLengthCm','PetalWidthCm']], figsize=(15,10))
plt.show()
ID=[]
for i in range(0,150):
    ID.append(i)
dataframe=pd.DataFrame(ID,columns=['ID'])
dataframe_iris_new=pd.concat([dataframe_iris,dataframe],axis=1)
dataframe_iris_new.columns
##fig = interaction_plot(dataframe_iris_new.SepalWidthCm, dataframe_iris_new.target, dataframe_iris_new.ID, colors=['red','blue','green'], ms=12)
dataframe_iris_new.info()
dataframe_iris_new.describe()
# To run an ANOVA test we first state a null and an alternate hypothesis
# Null hypothesis: the sample means are equal
# Alternate hypothesis: the sample means are not equal
##print(dataframe_iris_new['SepalWidthCm'].groupby(dataframe_iris_new['target']).mean())
dataframe_iris_new.mean()
# ANOVA yields an f-value and a p-value.
# P-value: used to evaluate the hypothesis test.
# If p-value < 0.05 we reject the null hypothesis; if p-value > 0.05 we fail to reject it.
# F-value: the ratio of variance between groups to variance within groups.
# If the f-value is close to 1, the null hypothesis is plausible.
# ANOVA assumes equal variances between groups (checked with the Levene/Bartlett test)
# and normally distributed data (checked with the Shapiro-Wilk test).
##stats.shapiro(dataframe_iris_new['SepalWidthCm'][dataframe_iris_new['target']])
# Check equality of variance between groups(levene/bartlett test)
##p_value=stats.levene(dataframe_iris_new['SepalWidthCm'],dataframe_iris_new['target'])
##p_value
##F_value,P_value=stats.f_oneway(dataframe_iris_new['SepalWidthCm'],dataframe_iris_new['target'])
##print("F_value=",F_value,",","P_value=",P_value)
OUTPUT:
Q.4) Write a Python program to predict mpg (miles per gallon) for a car based on
variable wt by applying simple linear regression on 'mtcars' dataset (Use Training
data 80% and Testing Data 20%).
Record the performance of model in terms of MAE, MSE, RMSE and R-squared value.
Change Training data to 70% and Testing Data 30%, compare & interpret the
performance of your model.
from google.colab import drive
drive.mount("/content/drive",force_remount=True)
import pandas as pd
import numpy as np
df =pd.read_csv("/content/drive/MyDrive/Dataset/mtcars.csv")
print(df.head(5))
x =df.iloc[:,[6]].values
y =df.iloc[:,1].values
print(x,y)
from sklearn.model_selection import train_test_split
x_train,x_test,y_train,y_test=train_test_split(x,y,test_size=.3,random_state=42) # test_size=.2 gives the 80/20 split; .3 gives the 70/30 split asked for in the comparison
print(x,y)
from sklearn.linear_model import LinearRegression
LinRegressor=LinearRegression()
LinRegressor.fit(x_train,y_train)
print(x,y)
y_pred=LinRegressor.predict(x_test)
wt=float(input("Enter Weight of a Car: "))
y_pred1=LinRegressor.predict([[wt]])
print("The Predicted MPG Value is: ",y_pred1)
from math import sqrt
from sklearn.metrics import mean_absolute_error,mean_squared_error,r2_score
print('Mean Absolute Error: %.2f' % mean_absolute_error(y_test,y_pred))
print('Mean Squared Error: %.2f'% mean_squared_error(y_test,y_pred))
print("Root Mean Squared Error: %.2f" % sqrt(mean_squared_error(y_test,y_pred)))
print("R2_Score: %.2f" % r2_score(y_test, y_pred))
Q.5) Write a Python program to predict mpg (miles per gallon) for a car based on variables
wt, cyl & disp by applying multi-linear regression on 'mtcars' dataset (Use Training data
80% and Testing Data 20%).
Record the performance of model in terms of MAE, MSE, RMSE and R-squared value.
Remove variable disp from the feature set and check the performance again. Compare &
interpret the performance of your
model.
from google.colab import drive
drive.mount("/content/drive",force_remount=True)
import pandas as pd
import numpy as np
df =pd.read_csv("/content/drive/MyDrive/Dataset/mtcars.csv")
print(df.head(10))
x =df.iloc[:,[2,3,6]].values
y =df.iloc[:,1].values
print(x,y)
import pandas as pd
import numpy as np
df=pd.read_csv("/content/drive/MyDrive/Dataset/mtcars.csv")
print(df.head(5))
x=df.iloc[:,[2,6]].values # Remove variable disp from the feature set and check the performance again.
y=df.iloc[:,1].values
print(x,y)
from sklearn.model_selection import train_test_split
x_train,x_test,y_train,y_test=train_test_split(x,y,test_size=.3,random_state=42) # the question asks for an 80/20 split; test_size=.2 would match it
print(x,y)
from sklearn.linear_model import LinearRegression
LinRegressor=LinearRegression()
LinRegressor.fit(x_train,y_train)
print(x,y)
y_pred=LinRegressor.predict(x_test)
wt=float(input("Enter Weight of a Car: "))
cyl=float(input("Enter Cyl of a Car: "))
y_pred1=LinRegressor.predict([[cyl,wt]]) # the model was trained on [cyl, wt] only; disp was removed
print("The Predicted MPG Value is: ",y_pred1)
from math import sqrt
from sklearn.metrics import mean_absolute_error,mean_squared_error,r2_score
print('Mean Absolute Error: %.2f' % mean_absolute_error(y_test,y_pred))
print("Mean Squared Error: %.2f" % mean_squared_error(y_test,y_pred))
print("Root Mean Squared Error: %.2f" % sqrt(mean_squared_error(y_test,y_pred)))
print("R2_Score: %.2f" % r2_score(y_test,y_pred))
Q.6) Write a Python program to predict mpg (miles per gallon) for a car based on
variables wt, cyl & disp by applying multi-linear regression on 'mtcars' dataset (Use
Training data 80% and Testing Data 20%).
Record the performance of the model in terms of MAE, MSE, RMSE and R-squared value.
Replace disp by drat variable in the feature set and check the performance again.
Interpret the performance of your model.
from google.colab import drive
drive.mount("/content/drive",force_remount=True)
import pandas as pd
import numpy as np
df =pd.read_csv("/content/drive/MyDrive/Dataset/mtcars.csv")
print(df.head(10))
x =df.iloc[:,[2,3,6]].values
y =df.iloc[:,1].values
print(x,y)
import pandas as pd
import numpy as np
df =pd.read_csv("/content/drive/MyDrive/Dataset/mtcars.csv")
print(df.head(10))
x =df.iloc[:,[2,5,6]].values #Replace disp by drat variable
y =df.iloc[:,1].values
print(x,y)
from sklearn.model_selection import train_test_split
x_train,x_test,y_train,y_test=train_test_split(x,y,test_size=.3,random_state=42) # the question asks for an 80/20 split; test_size=.2 would match it
print(x,y)
from sklearn.linear_model import LinearRegression
LinRegressor=LinearRegression()
LinRegressor.fit(x_train,y_train)
print(x,y)
y_pred=LinRegressor.predict(x_test)
wt=float(input("Enter Weight of a Car: "))
cyl=float(input("Enter Cyl of a Car: "))
drat=float(input("Enter Drat of a Car: "))
y_pred1=LinRegressor.predict([[cyl,drat,wt]]) # feature order matches columns [2,5,6] = cyl, drat, wt
print("The Predicted MPG Value is: ",y_pred1)
from math import sqrt
from sklearn.metrics import mean_absolute_error,mean_squared_error,r2_score
print('Mean Absolute Error: %.2f' % mean_absolute_error(y_test,y_pred))
print("Mean Squared Error: %.2f" % mean_squared_error(y_test,y_pred))
print("Root Mean Squared Error: %.2f" % sqrt(mean_squared_error(y_test,y_pred)))
print("R2_Score: %.2f" % r2_score(y_test,y_pred))
Q.7) Write a Python program to predict fruit (Apple or Orange) based on its size &
weight by applying logistic regression on 'apples_and_oranges' dataset (Use Training
data 80% and Testing Data 20%).
Evaluate the performance of the model using Accuracy Score
metric, Classification Report & Confusion Matrix, AUC ROC score for the model and
interpret the model performance.
from google.colab import drive
drive.mount("/content/drive",force_remount=True)
import pandas as pd
import numpy as np
df =pd.read_csv("/content/drive/MyDrive/Dataset/Fruits.xls")
print(df.head(10))
x =df.iloc[:,0:2].values
y =df.iloc[:,2].values
print(x,y)
from sklearn.model_selection import train_test_split
x_train,x_test,y_train,y_test=train_test_split(x,y,test_size=.3,random_state=42) # the question asks for an 80/20 split; test_size=.2 would match it
print(x,y)
from sklearn.linear_model import LogisticRegression
logreg=LogisticRegression()
print(x_train,y_train)
logreg.fit(x_train,y_train)
y_pred=logreg.predict(x_test)
pred_prob=logreg.predict_proba(x_test)
from sklearn.metrics import accuracy_score,classification_report,confusion_matrix,roc_auc_score
print("Accuracy score metric=", accuracy_score(y_test,y_pred))
print("Classification_report=", classification_report(y_test,y_pred))
print("confusion_matrix=", confusion_matrix(y_test,y_pred))
print("roc_auc_score", roc_auc_score(y_test,pred_prob[:,1]))
Q.8) Write a Python program to predict fruit (Apple or Orange) based on its size &
weight by applying K-Nearest Neighbour (KNN) model on 'apples_and_oranges'
dataset (Use Training data 80% and Testing Data 20%). Evaluate the performance
of the model using Accuracy Score metric, Classification Report & Confusion Matrix,
AUC ROC score for the model and interpret the model performance.
from google.colab import drive
drive.mount("/content/drive",force_remount=True)
import pandas as pd
import numpy as np
df =pd.read_csv("/content/drive/MyDrive/Dataset/Fruits.xls")
print(df.head(10))
x =df.iloc[:,0:2].values
y =df.iloc[:,2].values
print(x,y)
from sklearn.model_selection import train_test_split
x_train,x_test,y_train,y_test=train_test_split(x,y,test_size=.3,random_state=9) # the question asks for an 80/20 split; test_size=.2 would match it
from sklearn.neighbors import KNeighborsClassifier
knn=KNeighborsClassifier(n_neighbors=5)
knn.fit(x_train, y_train)
y_pred=knn.predict(x_test)
pred_prob=knn.predict_proba(x_test)
from sklearn.metrics import accuracy_score,classification_report,confusion_matrix,roc_auc_score
print("Accuracy score metric=", accuracy_score(y_test,y_pred))
print("Classification_report=", classification_report(y_test,y_pred))
print("confusion_matrix=", confusion_matrix(y_test,y_pred))
print("roc_auc_score", roc_auc_score(y_test,pred_prob[:,1]))
Q.9) Implementing the K-means Algorithm on unsupervised data of a mall that
contains the basic information (ID, age, gender, income, spending score) about the
customers. Finding the clusters based on the income and spending.
# importing the necessary libraries
from sklearn.preprocessing import StandardScaler
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
import os
import warnings
warnings.filterwarnings('ignore')
# loading the dataset
df = pd.read_csv('/content/drive/MyDrive/Dataset/Mall_Customers.csv')
df.head()
# renaming the heads
df.rename(index=str, columns={'Annual Income (k$)': 'Income','Spending Score (1-100)': 'Score'},
inplace=True)
# data in a detailed way with pairplot
X = df.drop(['CustomerID', 'Gender'], axis=1)
sns.pairplot(df.drop('CustomerID', axis=1), hue='Gender', aspect=1.5)
plt.show()
from sklearn.cluster import KMeans
clusters = []
for i in range(1, 11):
    km = KMeans(n_clusters=i).fit(X)
    clusters.append(km.inertia_)
fig, ax = plt.subplots(figsize=(12, 8))
sns.lineplot(x=list(range(1, 11)), y=clusters, ax=ax)
ax.set_title('Searching for Elbow')
ax.set_xlabel('Clusters')
ax.set_ylabel('Inertia')
# Annotate arrow
ax.annotate('Possible Elbow Point', xy=(3, 140000), xytext=(3, 50000), xycoords='data',
arrowprops=dict(arrowstyle='->', connectionstyle='arc3', color='blue', lw=2))
ax.annotate('Possible Elbow Point', xy=(5, 80000), xytext=(5, 150000), xycoords='data',
arrowprops=dict(arrowstyle='->', connectionstyle='arc3', color='blue', lw=2))
plt.show()
# based on the elbow points we can have 3 or 5 clusters; creating 5 clusters to classify based on income and spending
km5 = KMeans(n_clusters=5).fit(X)
X['Labels'] = km5.labels_
plt.figure(figsize=(12, 8))
sns.scatterplot(x='Income', y='Score', hue='Labels', data=X, palette=sns.color_palette('hls', 5))
plt.title('KMeans with 5 Clusters')
plt.show()
Q.10) Implementing the Agglomerative Hierarchical Clustering Algorithm on
unsupervised data of a mall that contains the basic information (ID, age, gender,
income, spending score) about the customers. Finding the clusters based on the income
and spending.
# importing the necessary libraries
from sklearn.preprocessing import StandardScaler
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
import os
import warnings
warnings.filterwarnings('ignore')
# loading the dataset
df = pd.read_csv('/content/drive/MyDrive/Dataset/Mall_Customers.csv')
df.head()
# renaming the heads
df.rename(index=str, columns={'Annual Income (k$)': 'Income',
'Spending Score (1-100)': 'Score'}, inplace=True)
# data in a detailed way with pairplot
X = df.drop(['CustomerID', 'Gender'], axis=1)
sns.pairplot(df.drop('CustomerID', axis=1), hue='Gender', aspect=1.5)
plt.show()
#
from sklearn.cluster import AgglomerativeClustering
agglom = AgglomerativeClustering(n_clusters=5, linkage='average').fit(X)
X['Labels'] = agglom.labels_
plt.figure(figsize=(12, 8))
sns.scatterplot(x='Income', y='Score', hue='Labels', data=X, palette=sns.color_palette('hls', 5))
plt.title('Agglomerative with 5 Clusters')
plt.show()
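A dendrogram is the usual companion plot for agglomerative clustering and helps justify the choice of five clusters; a short sketch using scipy (an addition, not part of the original listing):
from scipy.cluster.hierarchy import dendrogram, linkage
plt.figure(figsize=(12, 6))
dendrogram(linkage(X[['Income', 'Score']], method='average')) # same average linkage as above
plt.title('Dendrogram (average linkage)')
plt.xlabel('Customers')
plt.ylabel('Distance')
plt.show()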
Q.11) Write a Python program to create an Association algorithm for supervised
classification on any dataset
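No solution is recorded in the journal for this question; a minimal association-rule-mining sketch using the Apriori algorithm from the mlxtend package (assumed installed, e.g. via pip install mlxtend) on a small hand-made transaction dataset could be:
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, association_rules
transactions = [['milk', 'bread', 'butter'],
                ['bread', 'butter'],
                ['milk', 'bread'],
                ['milk', 'butter'],
                ['bread', 'butter', 'jam']]
te = TransactionEncoder() # one-hot encode the transactions
onehot = pd.DataFrame(te.fit(transactions).transform(transactions), columns=te.columns_)
frequent = apriori(onehot, min_support=0.4, use_colnames=True) # frequent itemsets
rules = association_rules(frequent, metric='confidence', min_threshold=0.6)
print(rules[['antecedents', 'consequents', 'support', 'confidence']])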
Q.12) Write a Python program to predict fruit (Apple or Orange) based on its size & weight
by applying Support Vector Machine (SVM) model on 'apples_and_oranges' dataset (Use
Training data 80% and Testing Data 20%). Evaluate the performance of the model using
Accuracy Score metric, Classification Report & Confusion Matrix, AUC ROC score for the
model and interpret the model performance.
from google.colab import drive
drive.mount("/content/drive",force_remount=True)
import pandas as pd
import numpy as np
df =pd.read_csv("/content/drive/MyDrive/Dataset/Fruits.xls")
print(df.head(5))
x =df.iloc[:,0:2].values
y =df.iloc[:,2].values
print(x,y)
from sklearn.model_selection import train_test_split
x_train,x_test,y_train,y_test=train_test_split(x,y,test_size=.3,random_state=5) # the question asks for an 80/20 split; test_size=.2 would match it
print(x,y)
from sklearn.svm import SVC
svm= SVC(kernel='rbf', C=50, gamma=5, probability=True)
print(x_train,y_train)
svm.fit(x_train, y_train)
y_pred=svm.predict(x_test)
pred_prob=svm.predict_proba(x_test)
print(y_test)
print(pred_prob)
from sklearn.metrics import accuracy_score,classification_report,confusion_matrix,roc_auc_score
print("Accuracy score matrics=",accuracy_score(y_test,y_pred))
print("Classifiction matrics=",classification_report(y_test,y_pred))
print("Confusion matrics=",confusion_matrix(y_test,y_pred))
print("Roc_auc_score",roc_auc_score(y_test,pred_prob[:,1]))
Q.13) Write a Python program to predict species (Setosa, Versicolor, or Virginica) for a new
iris flower based on length & width of its petals and sepals by applying Logistic Regression,
KNN and SVM models on the 'iris' dataset (Use Training data 80% and Testing Data 20%). Evaluate
the performance of the model using Accuracy Score metric, Classification Report & Confusion
Matrix, AUC ROC score for each model and suggest the best model.
13.1-Logistic Regression:
from google.colab import drive
drive.mount("/content/drive",force_remount=True)
import pandas as pd
import numpy as np
df =pd.read_csv("/content/drive/MyDrive/Dataset/Iris.csv")
print(df.head(10))
x =df.iloc[:,1:5].values
y =df.iloc[:,5].values
print(x,y)
from sklearn.model_selection import train_test_split
x_train,x_test,y_train,y_test=train_test_split(x,y,test_size=0.2,random_state=4)
print(x,y)
from sklearn.linear_model import LogisticRegression #Logistic Regression
logreg=LogisticRegression()
print(x_train,y_train)
logreg.fit(x_train,y_train)
y_pred=logreg.predict(x_test)
pred_prob=logreg.predict_proba(x_test)
print(x,y)
from sklearn.metrics import accuracy_score,classification_report,confusion_matrix,roc_auc_score
print("Accuracy Score=",accuracy_score(y_test,y_pred))
print("Classifiction Report=",classification_report(y_test,y_pred))
print("Confution Matrix=",confusion_matrix(y_test,y_pred))
print("Roc_Auc_Score",roc_auc_score(y_test,pred_prob,multi_class='ovo'))
13.2-K-Nearest Neighbour(KNN):
from google.colab import drive
drive.mount("/content/drive",force_remount=True)
import pandas as pd
import numpy as np
df =pd.read_csv("/content/drive/MyDrive/Dataset/Iris.csv")
print(df.head(10))
x =df.iloc[:,1:5].values
y =df.iloc[:,5].values
print(x,y)
from sklearn.model_selection import train_test_split
x_train,x_test,y_train,y_test=train_test_split(x,y,test_size=0.2,random_state=4)
print(x,y)
from sklearn.neighbors import KNeighborsClassifier #K-Nearest Neighbour(KNN)
knn=KNeighborsClassifier(n_neighbors=5)
knn.fit(x_train, y_train)
print(x,y)
y_pred=knn.predict(x_test)
pred_prob=knn.predict_proba(x_test)
print(x,y)
from sklearn.metrics import accuracy_score,classification_report,confusion_matrix,roc_auc_score
print("Accuracy Score=",accuracy_score(y_test,y_pred))
print("Classifiction Report=",classification_report(y_test,y_pred))
print("Confution Matrix=",confusion_matrix(y_test,y_pred))
print("Roc_Auc_Score",roc_auc_score(y_test,pred_prob,multi_class='ovo'))
13.3-Support Vector Machine(SVM):
from google.colab import drive
drive.mount("/content/drive",force_remount=True)
import pandas as pd
import numpy as np
df =pd.read_csv("/content/drive/MyDrive/Dataset/Iris.csv")
print(df.head(10))
x =df.iloc[:,1:5].values
y =df.iloc[:,5].values
print(x,y)
from sklearn.model_selection import train_test_split
x_train,x_test,y_train,y_test=train_test_split(x,y,test_size=0.2,random_state=4)
print(x,y)
from sklearn.svm import SVC #Support Vector Machine(SVM)
svm= SVC(kernel='rbf', C=50, gamma=5, probability=True)
print(x_train,y_train)
svm.fit(x_train, y_train)
y_pred=svm.predict(x_test)
pred_prob=svm.predict_proba(x_test)
print(x,y)
from sklearn.metrics import accuracy_score,classification_report,confusion_matrix,roc_auc_score
print("Accuracy Score=",accuracy_score(y_test,y_pred))
print("Classifiction Report=",classification_report(y_test,y_pred))
print("Confution Matrix=",confusion_matrix(y_test,y_pred))
print("Roc_Auc_Score",roc_auc_score(y_test,pred_prob,multi_class='ovo'))
Q.14) Write a Python program to predict species (Setosa, Versicolor, or Virginica) for a new
iris flower based on length & width of its petals and sepals by applying the Naive Bayes
Classification model on 'iris' dataset (Use Training data 80% and Testing Data 20%). Evaluate
the performance of the model using Accuracy Score metric, Classification Report & Confusion
Matrix, AUC ROC score for the model and interpret the model performance.
from google.colab import drive
drive.mount("/content/drive",force_remount=True)
import pandas as pd
import numpy as np
df =pd.read_csv("/content/drive/MyDrive/Dataset/Iris.csv")
print(df.head(10))
x=df.iloc[:,1:5].values # all four sepal/petal measurements, as the question specifies
y=df.iloc[:,5].values
print(x,y)
from sklearn.model_selection import train_test_split
x_train,x_test,y_train,y_test=train_test_split(x,y,test_size=0.2,random_state=4)
print(x,y)
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
x_train = sc.fit_transform(x_train)
x_test = sc.transform(x_test)
print(x,y)
from sklearn.naive_bayes import GaussianNB
classifier = GaussianNB()
classifier.fit(x_train, y_train)
print(x,y)
y_pred = classifier.predict(x_test)
y_pred
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)
from sklearn.metrics import accuracy_score
print ("Accuracy : ", accuracy_score(y_test, y_pred))
from sklearn.metrics import accuracy_score,classification_report,confusion_matrix,roc_auc_score
print("Accuracy Score=",accuracy_score(y_test,y_pred))
print("Classification Report=",classification_report(y_test,y_pred))
print("Confusion Matrix=",confusion_matrix(y_test,y_pred))
pred_prob=classifier.predict_proba(x_test) # AUC ROC, as the question requires
print("Roc_Auc_Score",roc_auc_score(y_test,pred_prob,multi_class='ovo'))
Q.15) Write a Python program to predict species (Setosa, Versicolor, or Virginica) for a new
iris flower based on length & width of its petals and sepals by applying the Decision Tree model
on the 'iris' dataset (Use Training data 80% and Testing Data 20%). Evaluate the performance
of the model using Accuracy Score metric, Classification Report & Confusion Matrix, AUC
ROC score for the model and interpret the model performance.
from google.colab import drive
drive.mount("/content/drive",force_remount=True)
import pandas as pd
import numpy as np
df =pd.read_csv("/content/drive/MyDrive/Dataset/Iris.csv")
print(df.head(5))
x =df.iloc[:,1:5].values
y =df.iloc[:,5].values
print(x,y)
from sklearn.model_selection import train_test_split
x_train,x_test,y_train,y_test=train_test_split(x,y,test_size=0.2,random_state=4)
print(x,y)
from sklearn.tree import DecisionTreeClassifier
dt= DecisionTreeClassifier()
print(x_train,y_train)
dt.fit(x_train,y_train)
y_pred=dt.predict(x_test)
pred_prob=dt.predict_proba(x_test)
print(x,y)
from sklearn.metrics import accuracy_score,classification_report,confusion_matrix,roc_auc_score
print("Accuracy Score=",accuracy_score(y_test,y_pred))
print("Classifiction Report=",classification_report(y_test,y_pred))
print("Confution Matrix=",confusion_matrix(y_test,y_pred))
print("Roc_Auc_Score",roc_auc_score(y_test,pred_prob,multi_class='ovo'))
Q.16) Write a Python program to predict whether or not a patient has diabetes based on given
diagnostic measurements included in the dataset Pima Indians Diabetes Database
"Diabetes.csv" by applying Logistic Regression, KNN, SVM, Naive Bays, Decision Tree models.
(Use Training data 80% and Testing Data 20%).Evaluate & Compare the performance of all
the models using Accuracy Score metric, Classification Report & Confusion Matrix, AUC ROC
score for each model and suggest the best model for Diabetes prediction.
16.1-Logistic Regression:
from google.colab import drive
drive.mount("/content/drive",force_remount=True)
import pandas as pd
import numpy as np
df =pd.read_csv("/content/drive/MyDrive/Dataset/diabetes.csv")
print(df)
x =df.iloc[:,0:7].values
y =df.iloc[:,8].values
print(x,y)
from sklearn.model_selection import train_test_split
x_train,x_test,y_train,y_test=train_test_split(x,y,test_size=.3,random_state=42) # the question asks for an 80/20 split; test_size=.2 would match it
print(x,y)
from sklearn.linear_model import LogisticRegression
logreg=LogisticRegression()
print(x_train,y_train)
logreg.fit(x_train,y_train)
y_pred=logreg.predict(x_test)
pred_prob=logreg.predict_proba(x_test)
print(x,y)
from sklearn.metrics import accuracy_score,classification_report,confusion_matrix,roc_auc_score
print("Accuracy score metric=", accuracy_score(y_test,y_pred))
print("Classification_report=", classification_report(y_test,y_pred))
print("confusion_matrix=", confusion_matrix(y_test,y_pred))
print("roc_auc_score", roc_auc_score(y_test,pred_prob[:,1]))
16.2-KNN:
from google.colab import drive
drive.mount("/content/drive",force_remount=True)
import pandas as pd
import numpy as np
df =pd.read_csv("/content/drive/MyDrive/Dataset/diabetes.csv")
print(df.head(10))
x =df.iloc[:,0:7].values
y =df.iloc[:,8].values
print(x,y)
from sklearn.model_selection import train_test_split
x_train,x_test,y_train,y_test=train_test_split(x,y,test_size=.3,random_state=9) # the question asks for an 80/20 split; test_size=.2 would match it
from sklearn.neighbors import KNeighborsClassifier
knn=KNeighborsClassifier(n_neighbors=5)
knn.fit(x_train, y_train)
y_pred=knn.predict(x_test)
pred_prob=knn.predict_proba(x_test)
from sklearn.metrics import accuracy_score,classification_report,confusion_matrix,roc_auc_score
print("Accuracy score metric=", accuracy_score(y_test,y_pred))
print("Classification_report=", classification_report(y_test,y_pred))
print("confusion_matrix=", confusion_matrix(y_test,y_pred))
print("roc_auc_score", roc_auc_score(y_test,pred_prob[:,1]))
16.3-SVM:
from google.colab import drive
drive.mount("/content/drive",force_remount=True)
import pandas as pd
import numpy as np
df =pd.read_csv("/content/drive/MyDrive/Dataset/diabetes.csv")
print(df.head(5))
x =df.iloc[:,0:7].values # same diagnostic feature set as 16.1/16.2
y =df.iloc[:,8].values
print(x,y)
from sklearn.model_selection import train_test_split
x_train,x_test,y_train,y_test=train_test_split(x,y,test_size=.3,random_state=5) # the question asks for an 80/20 split; test_size=.2 would match it
print(x,y)
from sklearn.svm import SVC
svm= SVC(kernel='rbf', C=50, gamma=5, probability=True)
print(x_train,y_train)
svm.fit(x_train, y_train)
y_pred=svm.predict(x_test)
pred_prob=svm.predict_proba(x_test)
print(y_test)
print(pred_prob)
from sklearn.metrics import accuracy_score,classification_report,confusion_matrix,roc_auc_score
print("Accuracy score matrics=",accuracy_score(y_test,y_pred))
print("Classifiction matrics=",classification_report(y_test,y_pred))
print("Confusion matrics=",confusion_matrix(y_test,y_pred))
print("Roc_Auc_Score",roc_auc_score(y_test,pred_prob,multi_class='ovo'))
16.4-Naive Bayes:
from google.colab import drive
drive.mount("/content/drive",force_remount=True)
import pandas as pd
import numpy as np
df =pd.read_csv("/content/drive/MyDrive/Dataset/diabetes.csv")
print(df.head(10))
x=df.iloc[:,0:7].values
y=df.iloc[:,8].values
print(x,y)
from sklearn.model_selection import train_test_split
x_train,x_test,y_train,y_test=train_test_split(x,y,test_size=0.2,random_state=4)
print(x,y)
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
x_train = sc.fit_transform(x_train)
x_test = sc.transform(x_test)
print(x,y)
from sklearn.naive_bayes import GaussianNB
classifier = GaussianNB()
classifier.fit(x_train, y_train)
print(x,y)
y_pred = classifier.predict(x_test)
y_pred
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)
from sklearn.metrics import accuracy_score
print ("Accuracy : ", accuracy_score(y_test, y_pred))
from sklearn.metrics import accuracy_score,classification_report,confusion_matrix,roc_auc_score
print("Accuracy Score=",accuracy_score(y_test,y_pred))
print("Classification Report=",classification_report(y_test,y_pred))
print("Confusion Matrix=",confusion_matrix(y_test,y_pred))
pred_prob=classifier.predict_proba(x_test) # AUC ROC, as the question requires
print("roc_auc_score", roc_auc_score(y_test,pred_prob[:,1]))
16.5-Decision Tree:
from google.colab import drive
drive.mount("/content/drive",force_remount=True)
import pandas as pd
import numpy as np
df =pd.read_csv("/content/drive/MyDrive/Dataset/diabetes.csv")
print(df.head(5))
x =df.iloc[:,0:7].values
y =df.iloc[:,8].values
print(x,y)
from sklearn.model_selection import train_test_split
x_train,x_test,y_train,y_test=train_test_split(x,y,test_size=0.2,random_state=4)
print(x,y)
from sklearn.tree import DecisionTreeClassifier
dt= DecisionTreeClassifier()
print(x_train,y_train)
dt.fit(x_train,y_train)
y_pred=dt.predict(x_test)
pred_prob=dt.predict_proba(x_test)
print(x,y)
from sklearn.metrics import accuracy_score,classification_report,confusion_matrix,roc_auc_score
print("Accuracy Score=",accuracy_score(y_test,y_pred))
print("Classifiction Report=",classification_report(y_test,y_pred))
print("Confution Matrix=",confusion_matrix(y_test,y_pred))
print("roc_auc_score", roc_auc_score(y_test,pred_prob[:,1]))
Q.17) Python Program to implement Text Mining Basics:
i. Tokenization
ii. Finding the frequency distribution
iii. Removing punctuations
iv. Stemming
# Tokenization
# Importing necessary library
import pandas as pd
import numpy as np
import nltk
import os
import nltk.corpus
# sample text for performing tokenization
text = "We are learning text mining basics with python. python will help in implementing different algorithms"
# importing word_tokenize from nltk
from nltk.tokenize import word_tokenize
nltk.download('punkt') # the tokenizer models must be downloaded once
# Passing the string text into word tokenize for breaking the sentences
token = word_tokenize(text)
token
Output:
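The code for task ii (frequency distribution) is not in the listing; a minimal sketch using nltk's FreqDist on the tokens produced above:
from nltk.probability import FreqDist
fdist = FreqDist(token) # count how often each token occurs
print(fdist.most_common(5)) # the five most frequent tokens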
Output:
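Likewise for task iii (removing punctuations), a short sketch filtering the same token list:
import string
tokens_no_punct = [w for w in token if w not in string.punctuation]
print(tokens_no_punct)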
Output:
import nltk
from nltk.stem.porter import PorterStemmer
words = ["walk", "walking", "walked", "walks", "ran", "run", "running", "runs"]
stemmer = PorterStemmer()
Output:
import pandas as pd
print(review_df.shape)
review_df.head(5)
# Tokenizer is the Keras text tokenizer; the import is assumed (not shown in the original listing)
from tensorflow.keras.preprocessing.text import Tokenizer
tokenizer = Tokenizer(num_words=5000)
tokenizer.fit_on_texts(tweet) # 'tweet': the text column of the dataset loaded above
encoded_docs = tokenizer.texts_to_sequences(tweet)
embedding_vector_length = 32
# layer imports assumed: from tensorflow.keras.models import Sequential
# and: from tensorflow.keras.layers import Embedding, SpatialDropout1D, LSTM, Dropout, Dense
vocab_size = len(tokenizer.word_index) + 1 # assumed definition, not shown in the original listing
model = Sequential()
model.add(Embedding(vocab_size, embedding_vector_length, input_length=200))
model.add(SpatialDropout1D(0.25))
model.add(LSTM(50, dropout=0.5, recurrent_dropout=0.5))
model.add(Dropout(0.2))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy',optimizer='adam', metrics=['accuracy'])
print(model.summary())
# Train the sentiment analysis model for 5 epochs on the whole dataset with a batch size of 32 and a validation split of 20%.
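The fit call itself is missing from the listing; a plausible version matching the comment above (padded_sequence and sentiment_label are assumed names, e.g. from pad_sequences and pd.factorize):
from tensorflow.keras.preprocessing.sequence import pad_sequences
padded_sequence = pad_sequences(encoded_docs, maxlen=200) # pad/truncate every sequence to length 200
history = model.fit(padded_sequence, sentiment_label[0], # sentiment_label: (codes, uniques) from pd.factorize
                    validation_split=0.2, epochs=5, batch_size=32)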
Output:
def predict_sentiment(text):
    tw = tokenizer.texts_to_sequences([text])
    tw = pad_sequences(tw,maxlen=200)
    prediction = int(model.predict(tw).round().item())
    print("Predicted label: ", sentiment_label[1][prediction])
Output:
Q.19) Implementing python visualizations on cluster data.
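No listing is recorded for this question; a minimal sketch that visualizes the mall-customer clusters built in Q.9 (it assumes the X DataFrame with 'Income', 'Score' and 'Labels' columns from that listing):
import matplotlib.pyplot as plt
import seaborn as sns
fig, axes = plt.subplots(1, 2, figsize=(14, 6))
sns.scatterplot(x='Income', y='Score', hue='Labels', data=X, palette='hls', ax=axes[0])
axes[0].set_title('Clusters: Income vs Score')
sns.boxplot(x='Labels', y='Income', data=X, ax=axes[1]) # per-cluster income spread
axes[1].set_title('Income distribution per cluster')
plt.show()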
Q.20) Creating & visualizing a simple ANN problem to understand the implementation
of an artificial neuron using python
Training Data:
Input 1 | Input 2 | Input 3 | Output
   0    |    1    |    1    |   1
   1    |    0    |    0    |   0
   1    |    0    |    1    |   1
Test Data:
Input 1 | Input 2 | Input 3 | Output
   1    |    0    |    1    |   ?
import numpy as np
class NeuralNetwork():
    def __init__(self):
        # seeding for random number generation
        np.random.seed(1)
        # converting weights to a 3 by 1 matrix with values from -1 to 1 and mean of 0
        self.synaptic_weights = 2 * np.random.random((3, 1)) - 1

    def sigmoid(self, x):
        # applying the sigmoid function
        return 1 / (1 + np.exp(-x))

    def sigmoid_derivative(self, x):
        # computing derivative to the Sigmoid function
        return x * (1 - x)

    def train(self, training_inputs, training_outputs, training_iterations):
        # training the model to make accurate predictions while adjusting weights continually
        for iteration in range(training_iterations):
            # siphon the training data via the neuron
            output = self.think(training_inputs)
            # computing error rate for back-propagation
            error = training_outputs - output
            # performing weight adjustments
            adjustments = np.dot(training_inputs.T, error * self.sigmoid_derivative(output))
            self.synaptic_weights += adjustments

    def think(self, inputs):
        # passing the inputs via the neuron to get output
        # converting values to floats
        inputs = inputs.astype(float)
        output = self.sigmoid(np.dot(inputs, self.synaptic_weights))
        return output

if __name__ == "__main__":
    # initializing the neuron class
    neural_network = NeuralNetwork()
    print("Beginning Randomly Generated Weights: ")
    print(neural_network.synaptic_weights)
    # training data consisting of 4 examples--3 input values and 1 output
    training_inputs = np.array([[0,0,1],
                                [1,1,1],
                                [1,0,1],
                                [0,1,1]])
    training_outputs = np.array([[0,1,1,0]]).T
    # training taking place
    neural_network.train(training_inputs, training_outputs, 15000)
    print("Ending Weights After Training: ")
    print(neural_network.synaptic_weights)
    user_input_one = str(input("User Input One: "))
    user_input_two = str(input("User Input Two: "))
    user_input_three = str(input("User Input Three: "))
    print("Considering New Situation: ", user_input_one, user_input_two, user_input_three)
    print("New Output data: ")
    print(neural_network.think(np.array([user_input_one, user_input_two, user_input_three])))
    print("Wow, we did it!")
np.random.seed(0)
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 145460 entries, 0 to 145459
Data columns (total 23 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Date 145460 non-null object
1 Location 145460 non-null object
2 MinTemp 143975 non-null float64
3 MaxTemp 144199 non-null float64
4 Rainfall 142199 non-null float64
5 Evaporation 82670 non-null float64
6 Sunshine 75625 non-null float64
7 WindGustDir 135134 non-null object
8 WindGustSpeed 135197 non-null float64
9 WindDir9am 134894 non-null object
10 WindDir3pm 141232 non-null object
11 WindSpeed9am 143693 non-null float64
12 WindSpeed3pm 142398 non-null float64
13 Humidity9am 142806 non-null float64
14 Humidity3pm 140953 non-null float64
15 Pressure9am 130395 non-null float64
16 Pressure3pm 130432 non-null float64
17 Cloud9am 89572 non-null float64
18 Cloud3pm 86102 non-null float64
19 Temp9am 143693 non-null float64
20 Temp3pm 141851 non-null float64
21 RainToday 142199 non-null object
22 RainTomorrow 142193 non-null object
dtypes: float64(16), object(7)
memory usage: 25.5+ MB
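The listing below calls an encode helper that is never defined in the journal; given the month_sin/month_cos and day_sin/day_cos columns that appear in the info() output further down, a cyclical sin/cos encoding like the following is the likely intent (offered as an assumption):
import numpy as np
def encode(data, col, max_val):
    # map a cyclic feature (month, day) onto a circle so that e.g. month 12 and month 1 stay close
    data[col + '_sin'] = np.sin(2 * np.pi * data[col] / max_val)
    data[col + '_cos'] = np.cos(2 * np.pi * data[col] / max_val)
    return data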
#Parsing datetime
# exploring the length of date objects
lengths = data["Date"].str.len()
lengths.value_counts()
#There don't seem to be any error in dates so parsing values into datetime
data['Date']= pd.to_datetime(data["Date"])
#Creating a column of year
data['year'] = data.Date.dt.year
data['month'] = data.Date.dt.month
data = encode(data, 'month', 12)
data['day'] = data.Date.dt.day
data = encode(data, 'day', 31)
data.head()
# Splitting of Day
# Filling missing values with the median of the column, for numerical variables
for i in num_cols: # num_cols: list of numeric column names (assumed defined earlier)
    data[i].fillna(data[i].median(), inplace=True)
# Printing the Set
data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 145460 entries, 0 to 145459
Data columns (total 30 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Date 145460 non-null datetime64[ns]
1 Location 145460 non-null object
2 MinTemp 145460 non-null float64
3 MaxTemp 145460 non-null float64
4 Rainfall 145460 non-null float64
5 Evaporation 145460 non-null float64
6 Sunshine 145460 non-null float64
7 WindGustDir 145460 non-null object
8 WindGustSpeed 145460 non-null float64
9 WindDir9am 145460 non-null object
10 WindDir3pm 145460 non-null object
11 WindSpeed9am 145460 non-null float64
12 WindSpeed3pm 145460 non-null float64
13 Humidity9am 145460 non-null float64
14 Humidity3pm 145460 non-null float64
15 Pressure9am 145460 non-null float64
16 Pressure3pm 145460 non-null float64
17 Cloud9am 145460 non-null float64
18 Cloud3pm 145460 non-null float64
19 Temp9am 145460 non-null float64
20 Temp3pm 145460 non-null float64
21 RainToday 145460 non-null object
22 RainTomorrow 145460 non-null object
23 year 145460 non-null int64
24 month 145460 non-null int64
25 month_sin 145460 non-null float64
26 month_cos 145460 non-null float64
27 day 145460 non-null int64
28 day_sin 145460 non-null float64
29 day_cos 145460 non-null float64
dtypes: datetime64[ns](1), float64(20), int64(3), object(6)
memory usage: 33.3+ MB
target = data['RainTomorrow']
features.describe().T
# Creating Model
#Assigning X and y the status of attributes and tags
X = features.drop(["RainTomorrow"], axis=1)
y = features["RainTomorrow"]
X.shape
#Early stopping
from tensorflow.keras import callbacks # imports assumed; not shown in the original listing
from tensorflow.keras.models import Sequential
early_stopping = callbacks.EarlyStopping(
    min_delta=0.001, # minimum amount of change to count as an improvement
    patience=20, # how many epochs to wait before stopping
    restore_best_weights=True,
)
# Initialising the NN
model = Sequential()
# layers (the layer stack is not shown in the original listing; see the sketch below)
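A plausible binary classifier consistent with the 150-epoch training log below (layer widths are assumptions, and X_train/y_train are assumed to come from a train_test_split of X and the 0/1-encoded y above):
from tensorflow.keras.layers import Dense, Dropout
model.add(Dense(32, activation='relu', input_shape=(X_train.shape[1],)))
model.add(Dropout(0.2))
model.add(Dense(16, activation='relu'))
model.add(Dense(1, activation='sigmoid')) # single sigmoid unit for the binary RainTomorrow target
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
history = model.fit(X_train, y_train, batch_size=32, epochs=150,
                    callbacks=[early_stopping], validation_split=0.2)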
Output:
Epoch 1/150
2551/2551 [==============================] - 5s 2ms/step - loss: 0.5967 - accuracy:
0.7805 - val_loss: 0.3964 - val_accuracy: 0.7860
Epoch 2/150
2551/2551 [==============================] - 4s 2ms/step - loss: 0.4413 - accuracy:
0.7919 - val_loss: 0.3860 - val_accuracy: 0.8388
Epoch 3/150
2551/2551 [==============================] - 4s 2ms/step - loss: 0.4290 - accuracy:
0.8257 - val_loss: 0.3761 - val_accuracy: 0.8400
Epoch 4/150
2551/2551 [==============================] - 4s 2ms/step - loss: 0.4174 - accuracy:
0.8295 - val_loss: 0.3712 - val_accuracy: 0.8421
Epoch 5/150
2551/2551 [==============================] - 4s 2ms/step - loss: 0.4137 - accuracy:
0.8327 - val_loss: 0.3693 - val_accuracy: 0.8436
Epoch 6/150
2551/2551 [==============================] - 4s 2ms/step - loss: 0.4091 - accuracy:
0.8338 - val_loss: 0.3669 - val_accuracy: 0.8443
Epoch 7/150
2551/2551 [==============================] - 4s 2ms/step - loss: 0.4082 - accuracy:
0.8348 - val_loss: 0.3665 - val_accuracy: 0.8441
Epoch 8/150
2551/2551 [==============================] - 4s 2ms/step - loss: 0.4049 - accuracy:
0.8354 - val_loss: 0.3650 - val_accuracy: 0.8439
Epoch 9/150
2551/2551 [==============================] - 4s 2ms/step - loss: 0.4020 - accuracy:
0.8357 - val_loss: 0.3642 - val_accuracy: 0.8441
Epoch 10/150
2551/2551 [==============================] - 4s 2ms/step - loss: 0.3977 - accuracy:
0.8363 - val_loss: 0.3635 - val_accuracy: 0.8445
Epoch 11/150
2551/2551 [==============================] - 4s 2ms/step - loss: 0.3984 - accuracy:
0.8353 - val_loss: 0.3615 - val_accuracy: 0.8445
Epoch 12/150
2551/2551 [==============================] - 4s 2ms/step - loss: 0.3953 - accuracy:
0.8368 - val_loss: 0.3618 - val_accuracy: 0.8443
Epoch 13/150
2551/2551 [==============================] - 4s 2ms/step - loss: 0.3975 - accuracy:
0.8340 - val_loss: 0.3608 - val_accuracy: 0.8444
Epoch 14/150
2551/2551 [==============================] - 4s 2ms/step - loss: 0.3908 - accuracy:
0.8373 - val_loss: 0.3597 - val_accuracy: 0.8449
Epoch 15/150
2551/2551 [==============================] - 4s 2ms/step - loss: 0.3859 - accuracy:
0.8383 - val_loss: 0.3597 - val_accuracy: 0.8445
Epoch 16/150
2551/2551 [==============================] - 4s 2ms/step - loss: 0.3899 - accuracy:
0.8355 - val_loss: 0.3593 - val_accuracy: 0.8433
Epoch 17/150
2551/2551 [==============================] - 4s 2ms/step - loss: 0.3889 - accuracy:
0.8364 - val_loss: 0.3581 - val_accuracy: 0.8441
Epoch 18/150
2551/2551 [==============================] - 4s 2ms/step - loss: 0.3924 - accuracy:
0.8336 - val_loss: 0.3580 - val_accuracy: 0.8438
Epoch 19/150
2551/2551 [==============================] - 4s 2ms/step - loss: 0.3886 - accuracy:
0.8361 - val_loss: 0.3582 - val_accuracy: 0.8431
Epoch 20/150
2551/2551 [==============================] - 4s 2ms/step - loss: 0.3860 - accuracy:
0.8352 - val_loss: 0.3578 - val_accuracy: 0.8421
history_df = pd.DataFrame(history.history)
# plotting code is not shown in the original; a typical loss curve would be:
history_df[['loss', 'val_loss']].plot()
plt.show()
print(classification_report(y_test, y_pred))
# loading the image from the path and then converting them into
# grayscale for easier convnet processing
img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
# final step-forming the training data list with numpy array of the images
training_data.append([np.array(img), np.array(label)])
# shuffling of the training data to preserve the random state of our data
shuffle(training_data)
shuffle(testing_data)
np.save('test_data.npy', testing_data)
return testing_data
'''Running the training and the testing in the dataset for our model'''
train_data = create_train_data()
test_data = process_test_data()
# train_data = np.load('train_data.npy')
# test_data = np.load('test_data.npy')
'''Creating the neural network using tensorflow'''
# Importing the required libraries
import tflearn
from tflearn.layers.conv import conv_2d, max_pool_2d
from tflearn.layers.core import input_data, dropout, fully_connected
from tflearn.layers.estimator import regression
import tensorflow as tf
tf.reset_default_graph()
convnet = input_data(shape =[None, IMG_SIZE, IMG_SIZE, 1], name ='input')
fig = plt.figure()
img_num = data[1]
img_data = data[0]
y = fig.add_subplot(4, 5, num + 1)
orig = img_data
data = img_data.reshape(IMG_SIZE, IMG_SIZE, 1)
# model_out = model.predict([data])[0]
model_out = model.predict([data])[0]
import numpy as np
import matplotlib.pyplot as plt
class RecurrentNN:
    def __init__(self, char_to_idx, idx_to_char, vocab, h_size=75,
                 seq_len=20, clip_value=5, epochs=50, learning_rate=1e-2):
        self.n_h = h_size
        self.seq_len = seq_len # number of characters in each batch/time steps
        self.clip_value = clip_value # maximum allowed value for the gradients
        self.epochs = epochs
        self.learning_rate = learning_rate
        self.char_to_idx = char_to_idx # dictionary that maps characters to an index
        self.idx_to_char = idx_to_char # dictionary that maps indices to characters
        self.vocab = vocab # number of unique characters in the training text
        # smoothing out loss as batch SGD is noisy
        self.smooth_loss = -np.log(1.0 / self.vocab) * self.seq_len
        # initialize parameters
        self.params = {}
X_batch = []
y_batch = []
for i in X_batch_encoded:
one_hot_char = np.zeros((1, self.vocab))
one_hot_char[0][i] = 1
X_batch.append(one_hot_char)
for j in y_batch_encoded:
one_hot_char = np.zeros((1, self.vocab))
one_hot_char[0][j] = 1
y_batch.append(one_hot_char)
return X_batch, y_batch
self.ho = h[t]
return y_pred, h
    def _backward_pass(self, X, y, y_pred, h):
        dh_next = np.zeros_like(h[0])
        for t in reversed(range(self.seq_len)):
            dy = np.copy(y_pred[t])
            dy[0][np.argmax(y[t])] -= 1 # predicted y - actual y
# find the char with the index and concat to the output string
char = self.idx_to_char[index]
res += char
return res
    def train(self, X):
        J = []
        num_batches = len(X) // self.seq_len
        X_trimmed = X[:num_batches * self.seq_len] # trim end of the input text so that we have full sequences
        X_encoded = self._encode_text(X_trimmed) # transform words to indices to enable processing
        for i in range(self.epochs):
            for j in range(0, len(X_encoded) - self.seq_len, self.seq_len):
                X_batch, y_batch = self._prepare_batches(X_encoded, j)
                y_pred, h = self._forward_pass(X_batch)
                loss = 0
                for t in range(self.seq_len):
                    loss += -np.log(y_pred[t][0, np.argmax(y_batch[t])])
                self.smooth_loss = self.smooth_loss * 0.999 + loss * 0.001
                J.append(self.smooth_loss)
                self._backward_pass(X_batch, y_batch, y_pred, h)
                self._update()
            print('Epoch:', i + 1, "\tLoss:", loss, "")
        return J, self.params
with open('Harry-Potter.txt') as f:
    text = f.read().lower()
# use only a part of the text to make the process faster
text = text[:20000]
# text = [char for char in text if char not in ["(", ")", "\"", "'", ".", "?", "!", ",", "-"]]
# text = [char for char in text if char not in ["(", ")", "\"", "'"]]
chars = set(text)
vocab = len(chars)
char_to_idx = {ch: i for i, ch in enumerate(chars)} # assumed definitions; the listing uses
idx_to_char = {i: ch for i, ch in enumerate(chars)} # these maps but does not show them
# print(f"Length of training text {len(text)}")
# print(f"Size of vocabulary {vocab}")
parameter_dict = {
'char_to_idx': char_to_idx,
'idx_to_char': idx_to_char,
'vocab': vocab,
'h_size': 75,
'seq_len': 20,
# keep small to avoid diminishing/exploding gradients
'clip_value': 5,
'epochs': 50,
'learning_rate': 1e-2,
}
model = RecurrentNN(**parameter_dict)
loss, params = model.train(text)
plt.figure(figsize=(12, 8))
plt.plot([i for i in range(len(loss))], loss)
plt.ylabel("Loss")
plt.xlabel("Epochs")
plt.show()
print(model.test(50,10))
OUTPUT:
Epoch: 1 Loss: 56.938160313575075
Epoch: 2 Loss: 49.479841032771944
Epoch: 3 Loss: 44.287300754487774
Epoch: 4 Loss: 42.75894603770088
Epoch: 5 Loss: 40.962449282519785
Epoch: 6 Loss: 41.06907316142755
Epoch: 7 Loss: 39.77795494997328
Epoch: 8 Loss: 41.059521063295485
Epoch: 9 Loss: 39.848893648177594
Epoch: 10 Loss: 40.42097045126549
Epoch: 11 Loss: 39.183043247471126
Epoch: 12 Loss: 40.09713939411275
Epoch: 13 Loss: 38.786694845855145
Epoch: 14 Loss: 39.41259563289025
Epoch: 15 Loss: 38.87094988626352
Epoch: 16 Loss: 38.80896936130275
Epoch: 17 Loss: 38.65301294936609
Epoch: 18 Loss: 38.2922486206415
Epoch: 19 Loss: 38.120326247610286
Epoch: 20 Loss: 37.94743442371039
Epoch: 21 Loss: 37.781826419304245
Epoch: 22 Loss: 38.02242197941186
Epoch: 23 Loss: 37.34639374983505
Epoch: 24 Loss: 37.383830387022115
Epoch: 25 Loss: 36.863261576664286
Epoch: 26 Loss: 36.81717706027801
Epoch: 27 Loss: 35.98781618662626
Epoch: 28 Loss: 34.883143187020806
Epoch: 29 Loss: 35.74233839750379
Epoch: 30 Loss: 34.17457373354039
Epoch: 31 Loss: 34.3659838303625
Epoch: 32 Loss: 34.6155982440106
Epoch: 33 Loss: 33.428021716569035
Epoch: 34 Loss: 33.06226727751935
Epoch: 35 Loss: 33.23334401686566
Epoch: 36 Loss: 32.9818416477839
Epoch: 37 Loss: 33.155764725505655
Epoch: 38 Loss: 32.937205806520474
Epoch: 39 Loss: 32.93063638107538
Epoch: 40 Loss: 32.943368437981256
Epoch: 41 Loss: 32.92520056534523
Epoch: 42 Loss: 32.96074563399301
Epoch: 43 Loss: 32.974579784369666
Epoch: 44 Loss: 32.86483014312194
Epoch: 45 Loss: 33.10532379921245
Epoch: 46 Loss: 32.89950584889016
Epoch: 47 Loss: 33.11303116056217
Epoch: 48 Loss: 32.731237824441756
Epoch: 49 Loss: 32.742918023080314
Epoch: 50 Loss: 32.421869906086144
is othe on. ogofostheodindearidut wlethallle, st oserarey d -lers amoathe y
thasathey at dll tos dn t s med d.). t t ile brs t d g htherive, d ogostare d.
ay shag hythay boumay tey thas ot havininggon
Q.24) Write a Python program to implement a GAN that creates a curve resembling a sine
wave. The Python library PyTorch must be used to set up the random generator.
import math
import matplotlib.pyplot as plt
import torch
from torch import nn # torch imports assumed; the listing below requires nn.Module
torch.manual_seed(111) # seed the random generator, as the question requires (seed value assumed)
#Implementing the Discriminator; in PyTorch, neural network models are represented by classes that inherit from nn.Module
class Discriminator(nn.Module):
    def __init__(self):
        super().__init__()
        self.model = nn.Sequential(
            nn.Linear(2, 256),
            nn.ReLU(),
            nn.Dropout(0.3),
            nn.Linear(256, 128),
            nn.ReLU(),
            nn.Dropout(0.3),
            nn.Linear(128, 64),
            nn.ReLU(),
            nn.Dropout(0.3),
            nn.Linear(64, 1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        output = self.model(x)
        return output
#instantiate a Discriminator object
discriminator = Discriminator()
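The Generator class is instantiated below but never defined in the listing; a minimal sketch consistent with the two-dimensional (x, sin x) samples this GAN is meant to produce (layer widths are assumptions):
class Generator(nn.Module):
    def __init__(self):
        super().__init__()
        self.model = nn.Sequential(
            nn.Linear(2, 16),
            nn.ReLU(),
            nn.Linear(16, 32),
            nn.ReLU(),
            nn.Linear(32, 2), # two outputs: a point (x, y) on the curve
        )
    def forward(self, x):
        return self.model(x)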
generator = Generator()
# Show loss (fragment of the training loop; the loop itself is not part of the listing)
if epoch % 10 == 0 and n == batch_size - 1:
    print(f"Epoch: {epoch} Loss D.: {loss_discriminator}")
    print(f"Epoch: {epoch} Loss G.: {loss_generator}")