
Machine Learning Final Lab File

By: Harshanth Raja, CSE-2


Roll no.: 2022UCS1631
COCSC17
Group -2
Lab Activity – 1

Aim: To understand the basic concepts of the following Python libraries:

1. NumPy, for numerical computation
2. Pandas, for working with tabular data (data frames)

Introduction:
Python provides many libraries for working in different domains, solving different problems, and building various solutions. Here we study two libraries, NumPy and Pandas, which are widely used when learning the concepts of Machine Learning.

Datatypes in Python:
1. Integer or int: Eg: 1,2,3,4,5
2. String: “Hello World”, “Machine 123”
3. Float: 0.1, 0.11, 0.111
4. Tuple: (12, 34, “Power scheme of India”, 7.41)
5. Dictionary:
data = {'Name': ['John', 'Mary', 'Peter', 'Tom'],
'Age': [25, 30, 35, 40],
'Country': ['USA', 'Canada', 'Australia', 'UK']}
6. List: My_list = [1, 2, 3, "hello", True]

NumPy
NumPy is a library for numerical computing in Python. It provides fast and efficient
multidimensional array operations. Arrays are the main data structure in NumPy.
They can be created using the array() function.
Pandas
Pandas is a library for data manipulation and analysis in Python. It provides a fast
and efficient way to work with structured data. Data frames are the main data
structure in Pandas. They can be created using the DataFrame() function.
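
For reference, a minimal example of both constructors mentioned above (the sample values are illustrative):

import numpy as np
import pandas as pd

# A NumPy array created with the array() function
arr = np.array([1, 2, 3, 4, 5])
print(arr * 2)        # element-wise multiplication: [ 2  4  6  8 10]

# A Pandas DataFrame created with the DataFrame() function
df = pd.DataFrame({'Name': ['John', 'Mary'], 'Age': [25, 30]})
print(df)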

Exercises:

1. Create a Python function that takes two integers as inputs and returns their
sum.

def add_numbers(a, b):
    return a + b

a = int(input("Enter the first number: "))
b = int(input("Enter the second number: "))
print(add_numbers(a, b))
2. Create a NumPy array with 10 random integers between 1 and 100.

import numpy as np

# Create a NumPy array with 10 random integers between 1 and 100
random_integers = np.random.randint(1, 101, size=10)
print(random_integers)
3. Create a Pandas data frame with the following data:

Name   Age  Country    Food Choice
John   25   USA        Burger
Mary   30   Canada     Pizza
Peter  35   Australia  Pizza
Tom    40   UK         Noodles

import pandas as pd

# Data for the DataFrame


data = {
'Name': ['John', 'Mary', 'Peter', 'Tom'],
'Age': [25, 30, 35, 40],
'Country': ['USA', 'Canada', 'Australia', 'UK'],
'Food Choice': ['Burger', 'Pizza', 'Pizza', 'Noodles']
}

# Create the DataFrame


df = pd.DataFrame(data)

# Display the DataFrame


print(df)
4. Filter the Pandas data frame from Exercise 3 to include only the rows where
Age is greater than 30.

import pandas as pd

# Data for the DataFrame


data = {
'Name': ['John', 'Mary', 'Peter', 'Tom'],
'Age': [25, 30, 35, 40],
'Country': ['USA', 'Canada', 'Australia', 'UK'],
'Food Choice': ['Burger', 'Pizza', 'Pizza', 'Noodles']
}

# Create the DataFrame


df = pd.DataFrame(data)

# Filter the DataFrame to include only rows where Age is greater than 30
filtered_df = df[df['Age'] > 30]
# Display the filtered DataFrame
print(filtered_df)

5. Calculate the median age of the Pandas data frame from Exercise 3
import pandas as pd

# Data for the DataFrame


data = {
'Name': ['John', 'Mary', 'Peter', 'Tom'],
'Age': [25, 30, 35, 40],
'Country': ['USA', 'Canada', 'Australia', 'UK'],
'Food Choice': ['Burger', 'Pizza', 'Pizza',
'Noodles']
}

# Create the DataFrame


df = pd.DataFrame(data)

# Calculate the median age


median_age = df['Age'].median()
# Display the median age
print("The median age is:", median_age)
Lab Assignment – 3 Linear Regression

1. Test with cv = 10 and scoring method as Mean Absolute Error using Linear Regression with a sample (test) size of 30%.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import fetch_california_housing

df = fetch_california_housing()
dataset = pd.DataFrame(df.data)
dataset.columns = df.feature_names
X = dataset
Y = df.target

# train test split with 30% test size
from sklearn.model_selection import train_test_split
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.3)

# Standardizing the data using the StandardScaler class
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# importing the training model and cross validation score to estimate the accuracy of the model
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# making objects
reg = LinearRegression()
reg.fit(X_train, Y_train)

# check cross validation score with cv = 10 and scoring = 'neg_mean_absolute_error'
mean_score = cross_val_score(reg, X_train, Y_train,
                             scoring='neg_mean_absolute_error', cv=10)
print(np.mean(mean_score))

# prediction
reg_pred = reg.predict(X_test)

# plotting a displot of the residuals for cv = 10 and scoring = 'neg_mean_absolute_error'
import seaborn as sns
sns.displot(Y_test - reg_pred, kind='kde')

2. Test with cv = 10 and scoring method as Mean Absolute Error using Linear Regression with a sample (test) size of 40%.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import fetch_california_housing

df = fetch_california_housing()
dataset = pd.DataFrame(df.data)
dataset.columns = df.feature_names
X = dataset
Y = df.target

# train test split with 40% test size
from sklearn.model_selection import train_test_split
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.4)

# Standardizing the data using the StandardScaler class
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# importing the training model and cross validation score to estimate the accuracy of the model
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# making objects
reg = LinearRegression()
reg.fit(X_train, Y_train)

# check cross validation score with cv = 10 and scoring = 'neg_mean_absolute_error'
mean_score = cross_val_score(reg, X_train, Y_train,
                             scoring='neg_mean_absolute_error', cv=10)
print(np.mean(mean_score))

# prediction
reg_pred = reg.predict(X_test)

# plotting a displot of the residuals for cv = 10 and scoring = 'neg_mean_absolute_error'
import seaborn as sns
sns.displot(Y_test - reg_pred, kind='kde')

Lab activity – 4 Logistic Regression

Implementation code:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
df=sns.load_dataset('iris')
df.head()
df
df.species.unique()
df.isnull().sum()
sns.scatterplot(x='petal_length', y='species', data=df,
hue='species')
plt.show()
sns.scatterplot(x='petal_width', y='species', data=df,
hue='species')

plt.show()
sns.scatterplot(x='sepal_width', y='species', data=df,
hue='species')

plt.show()
sns.scatterplot(x='sepal_length', y='species', data=df,
hue='species')

plt.show()
df[df['species']!='setosa']
df=df[df['species']!='setosa']
df
df['species']=df['species'].map({'versicolor':0,'virginica':1})
X=df.iloc[:,:-1]
y=df.iloc[:,-1]
X
y
from sklearn.model_selection import train_test_split
x_train,x_test,y_train,y_test=train_test_split(X,y,test_size=0.25,
random_state=42)
x_train
x_test
len(x_test)
len(X)
from sklearn.linear_model import LogisticRegression
model=LogisticRegression()
model.fit(x_train,y_train)
y_pred=model.predict(x_test)
y_pred
from sklearn.metrics import accuracy_score, classification_report
score=accuracy_score(y_test,y_pred)
print(score)
print(classification_report(y_test,y_pred))

LAB ASSIGNMENT – 5

Principal Component Analysis

We are going to perform Principal Component Analysis (PCA) on the Breast Cancer Dataset available in scikit-learn.

1.
We are familiar with matplotlib, numpy, and seaborn.


%matplotlib inline : It is a command specific to Jupyter Notebooks. It ensures that
all the plots you create using matplotlib will be displayed directly inside the Jupyter
Notebook, instead of in a separate window.

2.
We import the breast cancer dataset from scikit-learn and store it in cancer_data.
cancer_data is a dictionary-like object.
The keys() method returns a list of all the keys in the cancer_data object. These
keys represent different parts of the dataset, such as:
● 'data': The features of the dataset (i.e., the input variables).
● 'target': The labels (i.e., the output variable, indicating whether the tumor is
malignant or benign).
● 'target_names': The names corresponding to the labels.
● 'feature_names': The names of the features.
● 'DESCR': A description of the dataset.
● 'filename': The path to the dataset file (if available).
● 'frame': DataFrame representation of the data (if pandas is available).
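
A minimal sketch of the loading step described above (variable names follow the write-up):

from sklearn.datasets import load_breast_cancer
import pandas as pd

cancer_data = load_breast_cancer()    # dictionary-like Bunch object
print(cancer_data.keys())             # 'data', 'target', 'target_names', 'feature_names', ...

# Put the features into a DataFrame for convenience
df = pd.DataFrame(cancer_data['data'], columns=cancer_data['feature_names'])
print(df.head())                      # first 5 rows of the dataset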
3.
This is a description of the dataset.

Dataset Characteristics

Characteristic        Details
Number of instances   569
Number of attributes  30 numeric features and 1 target variable (class)
Missing values        None
Classes               2 (Malignant and Benign)
Class distribution    Malignant: 212, Benign: 357
Date collected        November, 1995
Donor                 Nick Street
Original source       UCI Machine Learning Repository

There are 30 numerical features. These features are grouped into 3 categories
for each of the 10 original features.
● Mean
● Standard Error
● Worst (Largest Value)
The 10 original features are :
● Radius
● Texture
● Perimeter
● Area
● Smoothness
● Compactness
● Concavity
● Concave Points
● Symmetry
● Fractal Dimension

4.

The first 5 rows of the dataset will be displayed.

5.

from sklearn.preprocessing import StandardScaler : This line imports the StandardScaler class from the sklearn.preprocessing module. StandardScaler is used to standardize features by removing the mean and scaling to unit variance. We want to ensure that each feature contributes equally to the model.

scaler = StandardScaler() : We create an object of the StandardScaler class called scaler.

scaler.fit(df) : The mean and standard deviation are calculated for each feature and stored in the scaler object.

scaled_data = scaler.transform(df) : We standardize each feature using the mean and standard deviation calculated during fit() so that each feature has a mean of 0 and a standard deviation of 1:
scaled value = (original value - mean) / standard deviation

scaled_data will store the NumPy array containing all the scaled features.

scaled_data : We display the scaled features.
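
A sketch of the scaling step, assuming df holds the 30 features as in the earlier sketch:

from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
scaler.fit(df)                        # learn mean and standard deviation of each feature
scaled_data = scaler.transform(df)    # (value - mean) / std for every feature
print(scaled_data.shape)              # (569, 30)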

6.

Principal Component Analysis reduces the number of dimensions (features) whilst retaining the most important information.

pca = PCA(n_components = 2) : Here, we apply PCA to reduce the number of dimensions to 2. The first principal component captures the most variance, the second component captures the next most variance. The 2 principal components are combinations of the 30 features.

pca.fit(scaled_data) : It computes the principal components of the scaled data.

data_pca = pca.transform(scaled_data) : It applies the PCA to the data to reduce the dimensions from 30 to 2. A 2D array will be created and stored in data_pca.

data_pca : It displays the data in the form of 2 principal components.
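
A sketch of the PCA step described above, assuming scaled_data from the previous sketch:

from sklearn.decomposition import PCA

pca = PCA(n_components=2)
pca.fit(scaled_data)                      # compute the principal components
data_pca = pca.transform(scaled_data)     # project 30 features onto 2 components
print(data_pca.shape)                     # (569, 2)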

7.

It displays the variance explained by the 2 principal components.


The higher the variance explained by a component, the more information from
the original dataset is captured by that component.
8.

First, we set the figure size to 8 inches wide and 6 inches tall.
We represent the First Principal Component on the x axis and the Second Principal Component on the y axis of the scatter plot.

c = cancer_data.target : The color of the points in the scatter plot is determined by the target variable from the breast cancer dataset. We have 2 classes in the target (Malignant and Benign).

cmap = 'plasma' : The cmap parameter specifies the colormap to be used. The 'plasma' colormap will apply a gradient of colors to the points based on their class.

This plot is used to visualize how well the PCA transformation has separated the different classes in the data. If the points form distinct clusters, it indicates that PCA has captured the essential structure of the data.
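
A sketch of the explained-variance check and the scatter plot described in steps 7 and 8:

import matplotlib.pyplot as plt

print(pca.explained_variance_ratio_)      # variance captured by each of the 2 components

plt.figure(figsize=(8, 6))
plt.scatter(data_pca[:, 0], data_pca[:, 1],
            c=cancer_data.target, cmap='plasma')
plt.xlabel('First Principal Component')
plt.ylabel('Second Principal Component')
plt.show()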

Experimenting with PCA

With 3 principal components, we can capture more variance from the original dataset compared to using only 2 components. This means the 3D representation of the data will be a more accurate reflection of the original data's structure.

We can see that the array contains 3 elements, i.e. 3 principal components. This is a 3D scatter plot consisting of the 3 principal components.

2. We will retain all 30 dimensions to see how PCA performs without dimension reduction.

We see that the first few principal components capture most of the
variance. The later components capture much less, indicating that they
contribute less to the overall structure of the data.
Hence, we can understand why reducing dimensions (to 2 or 3) might still retain
most of the information.
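
A sketch of the two experiments, assuming scaled_data from the earlier sketch:

from sklearn.decomposition import PCA

# 3 principal components: captures more variance than 2
pca3 = PCA(n_components=3)
data_pca3 = pca3.fit_transform(scaled_data)
print(pca3.explained_variance_ratio_)     # array with 3 elements

# Keep all 30 components: the first few dominate the explained variance
pca_full = PCA(n_components=30)
pca_full.fit(scaled_data)
print(pca_full.explained_variance_ratio_)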

Linear Discriminant Analysis

We are going to perform Linear Discriminant Analysis on the Wine Dataset available in scikit-learn.

1. We import the wine dataset from scikit-learn and store it in wine_data. wine_data is a dictionary-like object. The keys() method returns a list of all the keys in the wine_data object. These keys represent different parts of the dataset.
2. We store the data (features for all samples, in a 2D array) in x. We store the target label for each sample in y. We then get the unique classes using np.unique(y).

There are 3 unique classes. There are 13 features. There are 178 samples.

3. We create an object of the LinearDiscriminantAnalysis class and store it in 'LDA'.

LDA_transformed = LDA.fit_transform(x, y) performs 2 functions :
● fit(x, y): This part of the code trains the LDA model using the feature data x and target labels y. The LDA algorithm learns how to project the data into a new space where the different classes are well-separated.
● transform(x): After fitting, this part projects the feature data x onto the new lower-dimensional space defined by LDA. This transformation reduces the number of dimensions to 'Number of classes - 1', i.e. 2, while maximizing the separation between classes.

4. We get an array in which each value represents the proportion of the total variance that is explained by each linear discriminant. It's a measure of how much information (or variance) about class separability is retained in the new dimensions.

5. First, we set the figure size to 8 inches wide and 6 inches tall. We represent the First Linear Discriminant on the x axis and the Second Linear Discriminant on the y axis of the scatter plot. We have already discussed the 'c' and 'cmap' parameters in PCA. edgecolors = 'y' adds a yellow edge around each point in the scatter plot.

Each point represents a wine sample.


The color of each point corresponds to its class.
We see that LDA has separated the different classes in the reduced
dimensional space.
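
A sketch of the LDA workflow on the Wine dataset, as described above (variable names follow the write-up):

import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_wine
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

wine_data = load_wine()
x, y = wine_data.data, wine_data.target
print(np.unique(y), x.shape)              # 3 classes, (178, 13)

LDA = LinearDiscriminantAnalysis(n_components=2)
LDA_transformed = LDA.fit_transform(x, y) # reduces 13 features to (classes - 1) = 2
print(LDA.explained_variance_ratio_)

plt.figure(figsize=(8, 6))
plt.scatter(LDA_transformed[:, 0], LDA_transformed[:, 1],
            c=y, cmap='plasma', edgecolors='y')
plt.show()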
1. What will happen if we apply datasets which are not clearly separated ?
Answer : PCA tries to simplify your data by finding the directions where the data varies the most. It does this without considering what the classes are. PCA might show you clusters of data, but it won't necessarily show you clear boundaries between classes. The plot might show that the data is spread out in certain directions, but it won't tell you how well different classes are separated.
LDA looks at the class labels and tries to make sure that data points from different
classes are as far apart as possible. If the classes are not clearly separable (i.e., they
overlap), LDA might still show some separation but won’t completely separate
them. If the classes are too similar or overlap a lot, the plot might still show
overlapping regions.
2. Try with different dataset such as the load_digit dataset.
Answer : The load_digits dataset is used for classifying handwritten digits.
Here, we import the load_digits dataset from scikit-learn and store it in digits_data. The keys() method returns the keys in the dictionary-like object digits_data.

load_digits contains images of handwritten digits (0-9) that are 8*8 pixels in size. The name 'pixel_i_j' refers to the pixel in the ith row and jth column. Since the images are 8*8, there are 64 pixels.
Each pixel has a value that indicates the intensity (brightness) of that pixel, typically on a scale from 0 (black) to 16 (white).
Hence, the pixels are the feature names.
We store the features in x and the target labels in y.
There are 10 unique classes: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9.
Each class represents a digit from 0 to 9.
There are 64 features, i.e. pixels.
There are 1797 samples.

We apply LDA to the load_digits dataset to reduce the number of dimensions to 'Number of classes – 1', i.e. 9.
We get an array in which each value represents the proportion of the total variance that is explained by each linear discriminant.
It's a measure of how much information (or variance) about class separability is retained in the new dimensions. For visualization purposes, we are using a 2D scatter plot with 2 Linear Discriminants.
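
A sketch of the same steps on load_digits:

import numpy as np
from sklearn.datasets import load_digits
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

digits_data = load_digits()
x, y = digits_data.data, digits_data.target
print(np.unique(y), x.shape)              # 10 classes, (1797, 64)

LDA = LinearDiscriminantAnalysis()        # n_components defaults to classes - 1 = 9
digits_lda = LDA.fit_transform(x, y)
print(LDA.explained_variance_ratio_)      # 9 values

# For visualization, only the first 2 discriminants are plotted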
3. What is the difference between LDA and PCA ?
Answer :

Principal Component Analysis vs. Linear Discriminant Analysis

● PCA reduces dimensionality by identifying directions (principal components) that maximize the variance in the data. LDA reduces dimensionality by finding the linear combinations of features that best separate different classes.
● PCA is used when we want to reduce the number of features while preserving as much variance (information) as possible. LDA is used when we want to maximize class separability and reduce the feature space.
● PCA does not consider any class labels and focuses solely on the variance of the data. LDA considers class labels and finds the axes that maximize the separation between multiple classes.
● The principal components are ranked by how much variance they capture: the first component captures the most variance, the second captures the next most, and so on. The linear discriminants are ranked by their ability to separate the classes: the first discriminant provides the best separation, the second provides the next best, and so on.

4. Why is there only one argument in pca.transform while LDA.fit_transform takes 2 arguments ?
Answer : PCA doesn't need to know anything about class labels since it focuses only on the features (dimensions). Hence, only 1 argument, i.e. the features, is required for pca.transform.
LDA requires 2 arguments, i.e. the features and the class labels 'y', in order to separate the classes. It requires 2 arguments for fit() but only requires the features for transform().

t-Distributed Stochastic Neighbour Embedding (t-SNE)

We are going to perform t-distributed stochastic neighbour embedding on the Wine Dataset
and the Breast Cancer Dataset available in scikit-learn.

1.

We create an object 'tsne' of the TSNE class.

t-SNE reduces the number of dimensions in the data while trying to maintain the structure of the data. The goal is to capture the relationships between data points, such that points that are close in high-dimensional space stay close in the lower-dimensional space.
We are reducing the data to 2 dimensions.
random_state = 0 ensures that the t-SNE algorithm produces the same result every time we run it.
tsne.fit_transform(x) first fits the t-SNE model to our data 'x' and then transforms the data into the lower-dimensional space.
We store the transformed data in tsne_transformed.

Hence, we get a 2D array where rows represent data points and the 2 columns represent the reduced dimensions.
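
A sketch of the t-SNE step described above, assuming x holds the Wine features from the LDA section:

from sklearn.manifold import TSNE

tsne = TSNE(n_components=2, random_state=0)
tsne_transformed = tsne.fit_transform(x)  # rows = samples, 2 columns = reduced dimensions
print(tsne_transformed.shape)             # (178, 2)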

2. We plot the 2 t-SNE components as a scatter plot.

Clusters suggest that the data points within each cluster are similar to each other based on the original features. This could indicate different classes of wine that share similar characteristics.
Well-separated clusters indicate that the different classes of wine are distinct from each other based on their features. If clusters overlap, it suggests that the features of those classes are similar.
If any data points are isolated from the main clusters, these could be outliers and may represent rare or unusual types of wine.
3. We learnt about standardization of data in the Breast Cancer Dataset. We have applied the same concept here. scaled_data is the data after standardization, i.e. the mean of each feature is 0 and the standard deviation is 1.

5. Here, we can see that there is better separation between clusters, since all the features contribute equally to the distance calculations in t-SNE.

Questions based on Standardization


1. What does standardization do to data ?
Answer : Standardization scales each feature of a dataset such that each feature has a mean of 0 and a standard deviation of 1.
All features contribute equally to the model.
Visualization is clearer and the effect of different features can be seen more easily.
scaled_data = (original_data – mean) / standard deviation

2. What are Mean and Standard Deviation after standardization ?


Answer : Mean is 0 and Standard Deviation is 1 after Standardization.

3. Should the mean of scaled_data be 0 above ? Why/ Why not ? If yes why is it not zero
above ?
Answer : Here, we are talking about the scaled data of the Breast Cancer Dataset. Ideally, the mean of scaled_data should be 0 so that each feature is centered around the origin and hence contributes equally to the model; it also becomes easier to compare the features since they now have the same scale.

Here, the mean is not exactly 0 but it is very close to 0.


It may be due to the floating point arithmetic and numerical precision limitations of
the computer.
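
A quick check of this, assuming scaled_data from the PCA section:

print(scaled_data.mean(axis=0))   # values on the order of 1e-15, not exactly 0
print(scaled_data.std(axis=0))    # approximately 1 for every feature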

We are asked to apply a classification model on the reduced dimensions obtained after applying the 3 techniques.
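
A minimal sketch of this, using logistic regression on the 2 PCA components as an example (the same pattern applies to the LDA and t-SNE outputs; the classifier choice is an assumption):

from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# data_pca and cancer_data.target come from the PCA section above
X_train, X_test, y_train, y_test = train_test_split(
    data_pca, cancer_data.target, test_size=0.25, random_state=42)

clf = LogisticRegression()
clf.fit(X_train, y_train)
print(accuracy_score(y_test, clf.predict(X_test)))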
Lab Activity – 6 ANN
What is an Artificial Neural Network ?
The term "Artificial neural network" refers to a biologically inspired
sub-field of artificial intelligence modeled after the brain. An Artificial
neural network is usually a computational network based on
biological neural networks that construct the structure of the human
brain. Similar to how a human brain has neurons interconnected to
each other, artificial neural networks also have neurons that are
linked to each other in various layers of the networks.
UNDERSTANDINGS FROM THE NOTEBOOK

Importing the necessary libraries and csv file named “Churn_Modelling” .

Splitting the dataset into two data sets:

X (independent variables) :

● X represents the features or variables that will be used to predict the outcome.
● dataset.iloc[:, 3:13] selects columns 3 to 12 (from index 3 up to, but not including, 13) from all rows ( : ).
● These columns are chosen because, as the notebook comment suggests, RowNumber, CustomerId, and Surname (columns 0, 1, 2) are not useful for predicting if a customer will leave the bank.

y (dependent variable) :

● y is the target or label we want to predict.

● dataset.iloc[:, 13] selects column 13, which likely contains the information about whether the customer will leave the bank or not (this is the outcome we want to predict).

This transformation helps machine learning models work better by turning text categories into numbers.

It converts the "Gender" column into dummy variables (e.g., Male = 1, Female = 0), turning text (Gender) into numbers for machine learning models.

It converts the "Geography" and "Gender" columns into numeric dummy variables and drops the first category in each to avoid redundancy.

It removes the "Geography" and "Gender" columns from the DataFrame X; axis=1 indicates that it is dropping columns (not rows).
The code then merges the DataFrames horizontally into one larger DataFrame.

The code first splits the data into training and test sets, then it standardizes
the training and test feature sets to ensure they are on the same scale for
better model performance.
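
A sketch of the preprocessing described above, assuming the usual Churn_Modelling column layout (the exact column names and the 20% test size are assumptions):

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

dataset = pd.read_csv('Churn_Modelling.csv')
X = dataset.iloc[:, 3:13]                 # drop RowNumber, CustomerId, Surname
y = dataset.iloc[:, 13]                   # target: whether the customer leaves

# Turn text categories into numeric dummy variables (drop_first avoids redundancy)
geography = pd.get_dummies(X['Geography'], drop_first=True)
gender = pd.get_dummies(X['Gender'], drop_first=True)
X = X.drop(['Geography', 'Gender'], axis=1)
X = pd.concat([X, geography, gender], axis=1)

# test size is an assumption; the notebook does not state it
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)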

APPLYING ANN

The code builds and compiles a basic neural network with an input layer,
two hidden layers, and an output layer for binary classification, using the
adam optimizer and binary cross-entropy loss.
This code trains using 33% of the data for validation, processes the data in
batches of 10 samples, and repeats the training process 50 times, with an
option for early stopping not shown here.
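
A sketch of the network and training call consistent with the description above (the hidden-layer sizes are assumptions; the optimizer, loss, validation split, batch size, and epoch count are as stated):

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Input

model = Sequential()
model.add(Input(shape=(X_train.shape[1],)))
model.add(Dense(6, activation='relu'))        # hidden layer 1 (size assumed)
model.add(Dense(6, activation='relu'))        # hidden layer 2 (size assumed)
model.add(Dense(1, activation='sigmoid'))     # output layer for binary classification

model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

history = model.fit(X_train, y_train,
                    validation_split=0.33, batch_size=10, epochs=50)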

This code plots a graph showing how the model's accuracy on both training
and validation data changes over each epoch of training.
This code plots a graph showing how the model's loss on both training and
validation data changes over each epoch of training.

This code predicts the results for the test data using the model and converts
the predictions to binary values (0 or 1) based on whether they are greater
than 0.5.
Accuracy score is 86.3%.
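
A sketch of the plotting and prediction steps described above, assuming the model and history from the previous sketch:

import matplotlib.pyplot as plt

# Accuracy and loss per epoch, for training and validation data
plt.plot(history.history['accuracy'], label='train accuracy')
plt.plot(history.history['val_accuracy'], label='validation accuracy')
plt.legend()
plt.show()

plt.plot(history.history['loss'], label='train loss')
plt.plot(history.history['val_loss'], label='validation loss')
plt.legend()
plt.show()

# Convert predicted probabilities to 0/1 using a 0.5 threshold
y_pred = (model.predict(X_test) > 0.5)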

Exercise :

Q.1] How would we know at which epoch the accuracy is best. Find
a way (write/modify in the above code) so that epochs stop by
themselves when the model reaches peak accuracy.

Answer :
To stop training when the model reaches its best accuracy we can use a
feature called “Early Stopping”. This feature watches the model's
performance on a validation set. It prevents wasting time on more epochs
when the model is already performing well.

It can be implemented by making changes to the code as shown below.
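
A sketch of this modification (monitoring validation accuracy and a patience of 5 epochs are reasonable choices, not values fixed by the lab):

from tensorflow.keras.callbacks import EarlyStopping

early_stop = EarlyStopping(monitor='val_accuracy',   # watch validation accuracy
                           patience=5,               # stop after 5 epochs with no improvement
                           restore_best_weights=True)

history = model.fit(X_train, y_train,
                    validation_split=0.33, batch_size=10, epochs=50,
                    callbacks=[early_stop])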

Q.2] How can we know the weights that are used by the above
models in their different iterations?

Answer :
To see the weights used by the model at different times:
1. Save Weights: Save the model's weights to files during training.
2. Load Weights: Load these files to check the weights.
3. Access Weights: Use functions in your machine learning library to view or extract weights from the model, as sketched below.
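
A sketch of these three steps in Keras (file names are illustrative):

from tensorflow.keras.callbacks import ModelCheckpoint

# 1. Save weights to a file at the end of every epoch
checkpoint = ModelCheckpoint('weights_epoch_{epoch:02d}.weights.h5',
                             save_weights_only=True)
model.fit(X_train, y_train, epochs=50, batch_size=10,
          validation_split=0.33, callbacks=[checkpoint])

# 2. Load a saved file back into the model (assuming that epoch's file was produced)
model.load_weights('weights_epoch_10.weights.h5')

# 3. Access the current weights directly
for layer in model.layers:
    print(layer.get_weights())        # list of weight and bias arrays per layer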

Q.3] What is the confusion matrix? What are the values in the
confusion matrix? Find the accuracy analytically by the formula of
accuracy given below and compare it with the accuracy given by
ANN.

ANSWER
A confusion matrix is a table used to evaluate the performance of a
classification model. It shows how well the model's predictions match the
actual labels.

Values in the Confusion Matrix

1. True Positives (TP) : Correctly predicted positive cases.

2. True Negatives (TN) : Correctly predicted negative cases.

3. False Positives (FP) : Incorrectly predicted as positive.

4. False Negatives (FN) : Incorrectly predicted as negative.


Accuracy Formula

The accuracy of a model is calculated using:

Accuracy = ( TP + TN ) / ( TP + TN + FN + FP )
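
A sketch of this check using scikit-learn, assuming y_test and y_pred from the ANN above:

from sklearn.metrics import confusion_matrix, accuracy_score

cm = confusion_matrix(y_test, y_pred)
tn, fp, fn, tp = cm.ravel()               # 2x2 matrix for binary classification

manual_accuracy = (tp + tn) / (tp + tn + fn + fp)
print(cm)
print(manual_accuracy, accuracy_score(y_test, y_pred))   # the two values should match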
Q.4] Why do we need an activation function? Name some activation functions with their formulas.

Answer :

An activation function is needed to:

1. Introduce Non-linearity : It helps the model learn complex patterns that are not just linear.
2. Control Output : It determines the output of a neuron and helps in scaling the output to a specific range.
3. Enable Learning : It allows the neural network to make decisions and learn from data more effectively.

Common activation functions are:
● Sigmoid : f(x) = 1 / (1 + e^(-x))
● Tanh : f(x) = (e^x - e^(-x)) / (e^x + e^(-x))
● ReLU (Rectified Linear Unit) : f(x) = max(0, x)
● Leaky ReLU : f(x) = x if x > 0, else 0.01x

Q.5] What is Adam optimizer?

Answer :
Adam (Adaptive Moment Estimation) is an optimization algorithm for gradient descent. The method is really efficient when working with large problems involving a lot of data or parameters, and it requires relatively little memory.
Lab Activity- 7 CNN

What is CNN ?

A Convolutional Neural Network (CNN) is a neural


network designed to analyze images. It uses layers like
convolution and pooling to automatically detect patterns,
such as edges and objects, in images. CNNs are powerful
for tasks like image classification, object detection, and
pattern recognition in visual data.
Understandings from the notebook

Importing the necessary libraries and loading the dataset.

datasets.cifar10.load_data(): It is a method in TensorFlow used to automatically download the CIFAR-10 dataset.
It contains 60,000 32x32 color images (RGB), split into 10 classes: 'airplane', 'automobile', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck'.

The dataset is split into:


● train_images : 50,000 training images.

● train_labels : Corresponding labels for the training images.


● test_images : 10,000 testing images.

● test_labels : Corresponding labels for the testing images.

The pixel values in images typically range from 0 to 255 (as they are 8-bit RGB images).
To normalize them between 0 and 1, each pixel value is divided by 255.0.
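
A sketch of the loading and normalization steps described above:

from tensorflow.keras import datasets

(train_images, train_labels), (test_images, test_labels) = datasets.cifar10.load_data()
print(train_images.shape, test_images.shape)   # (50000, 32, 32, 3) (10000, 32, 32, 3)

# Scale pixel values from 0-255 down to 0-1
train_images, test_images = train_images / 255.0, test_images / 255.0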

Loading the first 25 images from the dataset along with their class names.

● Three convolutional layers ( Conv2D ) with ReLU activations, responsible for feature extraction.
● Two max pooling layers ( MaxPooling2D ) to downsample the input and reduce the spatial dimensions.
● The first layer accepts input images of size 32x32x3.

The model.summary() call will display a detailed description of the architecture, including layer shapes and parameters.

model.add(layers.Flatten()):
● Converts the multi-dimensional output (from the convolutional layers) into a 1D vector, so it can be fed into fully connected layers.

model.add(layers.Dense(64, activation='relu')):
● Adds a fully connected layer with 64 neurons and ReLU activation. This layer learns patterns from the flattened features.

model.add(layers.Dense(10)):
● Adds a final output layer with 10 neurons (one for each class in CIFAR-10). The raw outputs (logits) from this layer are typically passed to a softmax function to predict class probabilities.
Training the model

Evaluating the model


Accuracy of the model is 70.70%.
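
A sketch of the architecture and training procedure described above, assuming the normalized train_images/test_images from the earlier sketch (the filter counts and the epoch count of 10 are assumptions; the layer types and order follow the notebook description):

from tensorflow.keras import layers, models
from tensorflow.keras.losses import SparseCategoricalCrossentropy

model = models.Sequential()
model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(32, 32, 3)))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))

model.add(layers.Flatten())
model.add(layers.Dense(64, activation='relu'))
model.add(layers.Dense(10))                       # logits for the 10 CIFAR-10 classes
model.summary()

model.compile(optimizer='adam',
              loss=SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])

history = model.fit(train_images, train_labels, epochs=10,
                    validation_data=(test_images, test_labels))

test_loss, test_acc = model.evaluate(test_images, test_labels)
print(test_acc)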

Questions
Q.1] An input image has been converted into a matrix of size 12 X 12 along with a filter
of size 3 X 3 with a Stride of 1. Determine the size of the convoluted matrix.

Answer :

Size of the convoluted matrix = ((Input Size - Filter Size + 2 * Padding) / Stride) + 1

Here, ((12 - 3 + 2*0) / 1) + 1 = 10, so the convoluted matrix is of size 10 x 10.

Q.2] In the above question do you think there will be a loss of information?
Why / Why not ? What measures can we take to prevent that?

Answer:

Yes, there will be a loss of information because the filter is smaller than the input matrix and no padding is applied. This leads to loss of information at the edges.
Measures to prevent loss:
1. Use Padding: Add zero-padding around the input matrix to maintain the input
size. This ensures that the edges of the input are considered during convolution.
2. Use larger filters or smaller strides: Smaller strides or larger filters capture more
detailed patterns.

Q.3] Explain the significance of the RELU Activation function in the Convolution Neural
Network.

Answer:

Significance of Rectified Linear Unit in CNN is to

1. Non-linearity : ReLU introduces non-linearity into the model, which allows the
network to learn complex patterns. Without this, the CNN would behave like a
linear model, limiting its learning capability.

2. Efficiency : ReLU is computationally efficient because it involves simple thresholding, which speeds up the training process.

3. Sparse Activation : ReLU outputs zero for all negative inputs, which introduces sparsity, making the model more efficient and potentially less prone to overfitting.

Q.4] What is the difference between a convolution layer and a pooling layer?

Answer:

1. Operation:

Convolution Layer: Performs element-wise multiplication between the input


and a set of learnable filters followed by a summation, generating feature maps.

Pooling Layer: Takes small regions of the feature map (like 2x2 or 3x3) and
applies a down-sampling operation, such as selecting the maximum value (max
pooling) or averaging (average pooling).

2. Learning :

Convolution Layer: The weights of the filters are learned during the training
process.
Pooling Layer: No learning takes place; it's a fixed operation used to reduce the
size of the feature map.

3. Impact on Feature Maps :

Convolution Layer: Helps extract local patterns from the input.

Pooling Layer: Helps in reducing the feature map size, making the network more computationally efficient and reducing overfitting.

Q.5] A formula is given for evaluating the size of the convoluted matrix, but it omits stride and padding. Give a general formula for computing the size of the convoluted matrix which takes stride and padding into account.

Answer :

Size of the convoluted matrix = ((Input Size - Filter Size + 2 * Padding) / Stride) + 1
Lab activity – 8 Naïve Bayes

Q. Apply Naive Bayes on Social_Network_Ads.csv.


Code:-
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

# Load the dataset


df = pd.read_csv('Social_Network_Ads.csv')

# Drop 'User ID' column


df = df.drop(['User ID'], axis=1)

# Encode 'Gender' column


labelencoder = LabelEncoder()
df['Gender'] = labelencoder.fit_transform(df['Gender'])

# Split data into features (X) and target (y)


X = df[['Gender', 'Age', 'EstimatedSalary']].values
y = df['Purchased'].values

# Split the dataset into training and testing sets


X_train, X_test, y_train, y_test = train_test_split(X, y,
test_size=0.25, random_state=0)

# Scale features
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Initialize and train the Naive Bayes classifier


classifier = GaussianNB()
classifier.fit(X_train, y_train)

# Predict the test set results


y_pred = classifier.predict(X_test)

# Accuracy score
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy * 100:.2f}%")

# Confusion matrix
cm = confusion_matrix(y_test, y_pred)
print("Confusion Matrix:\n", cm)

# Classification report
report = classification_report(y_test, y_pred)
print("Classification Report:\n", report)
Output:-
Q. Why are we getting a 3x3 confusion matrix rather than a 2x2?
In a confusion matrix, the dimensions are determined by the number of classes in the target variable.
For the Wine dataset, there are 3 classes representing different types of wine, typically labeled as 0, 1, and 2. When you apply a classifier to this dataset, the confusion matrix will reflect this by being of size 3 x 3, corresponding to the three different classes.
Explanation of a 3x3 Confusion Matrix
If your confusion matrix is 3x3, it means:
● Rows represent the actual classes of the wine samples.
● Columns represent the predicted classes by the classifier.
A 3x3 matrix looks like this (rows are actual classes, columns are predicted classes):

                 Predicted Class 0   Predicted Class 1   Predicted Class 2
Actual Class 0   TP for class 0      misclassification   misclassification
Actual Class 1   misclassification   TP for class 1      misclassification
Actual Class 2   misclassification   misclassification   TP for class 2

For any one class, the diagonal cell counts its true positives; the other cells in its row are its false negatives, and the other cells in its column are its false positives.

● Diagonal entries (e.g., [0,0], [1,1], [2,2]) represent correctly classified samples.
● Off-diagonal entries represent misclassifications.

Why You Might Expect a 2x2 Matrix

If you're familiar with binary classification problems, where the target has only two classes (e.g., 0 and 1), you might expect a 2x2 confusion matrix. However, with multi-class datasets like the Wine dataset, the confusion matrix grows to n x n, where n is the number of classes in the dataset.
Lab activity – 9 Decision Trees

Q1: How is Post-Pruning Different from Pre-Pruning?


● Pre-Pruning (Early Stopping): This is the process of
stopping the growth of a decision tree before it becomes
overly complex. Pre-pruning places constraints on the tree-
building process, such as:
○ Limiting the maximum depth of the tree.
○ Setting a minimum number of samples required to split
a node.
○ Restricting the minimum number of samples required
in leaf nodes.
● By restricting the tree's growth, we aim to avoid overfitting early on. Pre-pruning is computationally more efficient because it stops unnecessary splits during training.
● Post-Pruning (Pruning after Tree Growth): This involves growing the full decision tree without any constraints, allowing it to overfit, and then pruning it back. During post-pruning, nodes or branches that add little predictive power are removed based on a validation set or cross-validation to reduce overfitting. Post-pruning typically improves accuracy by simplifying the tree after it's fully grown, though it can be more computationally intensive than pre-pruning (see the sketch below).
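
A sketch of both ideas in scikit-learn: pre-pruning via constructor constraints, and post-pruning via cost-complexity pruning (ccp_alpha). The constraint values and the choice of alpha are illustrative, not prescribed by the lab:

from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Pre-pruning: constrain the tree while it grows
pre_pruned = DecisionTreeClassifier(max_depth=3, min_samples_split=10,
                                    min_samples_leaf=5, random_state=42)
pre_pruned.fit(X_train, y_train)

# Post-pruning: compute the cost-complexity pruning path, then refit with a chosen alpha
path = DecisionTreeClassifier(random_state=42).cost_complexity_pruning_path(X_train, y_train)
ccp_alpha = path.ccp_alphas[len(path.ccp_alphas) // 2]   # one candidate alpha (illustrative choice)
post_pruned = DecisionTreeClassifier(ccp_alpha=ccp_alpha, random_state=42)
post_pruned.fit(X_train, y_train)

print(pre_pruned.score(X_test, y_test), post_pruned.score(X_test, y_test))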

Q2: How Do We Stop the Depth of the Decision Tree at, Let's Say, 2?
If you're using scikit-learn in Python, you can control the depth of
the decision tree by setting the max_depth parameter when
creating the model. Setting max_depth=2 will stop the tree from
growing beyond two levels of depth.
Code:-
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Load dataset
data = load_iris()
X, y = data.data, data.target

# Split dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
random_state=42)

# Create a Decision Tree with max depth of 2


clf = DecisionTreeClassifier(max_depth=2, random_state=42)
clf.fit(X_train, y_train)

# Check the depth of the trained tree


print("Tree Depth:", clf.get_depth())

Output:-
