CS601 Machine Learning Unit 3
Course Outcome:
Students will be able to design CNN algorithms to solve related real-life problems.
Introduction to Convolution Neural Network
Convolutional neural networks, also known as ConvNets or CNNs, are a special
kind of neural network for processing data that has a known grid-like topology,
such as time-series data (1D) or images (2D).
Layers in a Convolutional Neural Network (CNN):
1. Convolution Layer
2. Pooling Layer
3. Fully Connected Layer
Inspiration:
• The visual cortex of the brain.
Introduced by:
• Yann LeCun in 1998 at AT&T Labs (for bank cheque scanning).
• Microsoft later built many OCR and handwritten character recognition tools on CNNs.
• Today, CNNs are used in facial recognition and self-driving cars.
Why not use a plain ANN?
• 1. High computation cost
• 2. Overfitting
• 3. Loss of important information, such as the spatial arrangement of pixels
CNN Intuition
Convolution Operation
Basics of Images
[Figure: a 6x6 low-resolution black-and-white image and a 3x3 filter/kernel]
Working with colour images
Convolution Operation
Padding
We can observe that the size of the output is smaller than that of the input. To
keep the output dimensions the same as the input, we use padding.
Padding is the process of adding zeros to the input matrix symmetrically.
In the following example, the extra grey blocks denote the padding; it is
used to make the dimensions of the output the same as the input.
Formula After Padding
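For an n x n input, an f x f filter, padding p, and stride s (a standard convention; the examples on these slides use stride 1), the output size of a convolution is:

output size = floor((n + 2p - f) / s) + 1

With stride 1 and "same" padding, p = (f - 1) / 2, so the output keeps the input size; for example, a 6x6 input with a 3x3 filter and p = 1 gives a 6x6 output.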
The original image is scanned with multiple convolution and ReLU layers to locate
the features.
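A minimal sketch, assuming TensorFlow/Keras (the library used in the implementation section at the end of this unit) and the 6x6 input with a 3x3 filter from the earlier example, showing how 'valid' (no padding) shrinks the output while 'same' (zero padding) keeps the input size, with ReLU applied after the convolution:

import tensorflow as tf

# A 6x6 single-channel input (batch of 1)
x = tf.random.normal((1, 6, 6, 1))

# 'valid' = no padding: the 3x3 filter shrinks the output from 6x6 to 4x4
valid = tf.keras.layers.Conv2D(1, 3, padding="valid", activation="relu")(x)
print(valid.shape)   # (1, 4, 4, 1)

# 'same' = zero padding so the output keeps the 6x6 input size
same = tf.keras.layers.Conv2D(1, 3, padding="same", activation="relu")(x)
print(same.shape)    # (1, 6, 6, 1)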
Pooling in CNN
• In Convolutional Neural Networks (CNNs), pooling layers downsample feature maps, reducing
spatial dimensions while retaining crucial information. This helps reduce computational
complexity and prevents overfitting.
Two types of pooling are commonly used (see the short example after this list):
1. Max pooling: This works by selecting the maximum value from every pool. Max
Pooling retains the most prominent features of the feature map, and the returned
image is sharper than the original image.
2. Average pooling: This pooling layer works by taking the average of each pool.
Average pooling retains the average values of the features in the feature map. It
smooths the image while keeping the essence of the features.
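A minimal sketch, assuming TensorFlow/Keras and a dummy 4x4 single-channel feature map, showing how a 2x2 pool halves the spatial dimensions for both pooling types:

import numpy as np
import tensorflow as tf

# A dummy 4x4 single-channel feature map (batch of 1)
fmap = np.arange(16, dtype="float32").reshape(1, 4, 4, 1)

# 2x2 max pooling keeps the largest value in each pool: (1, 4, 4, 1) -> (1, 2, 2, 1)
max_pooled = tf.keras.layers.MaxPooling2D(pool_size=2)(fmap)

# 2x2 average pooling averages each pool; the output shape is the same
avg_pooled = tf.keras.layers.AveragePooling2D(pool_size=2)(fmap)

print(max_pooled.shape, avg_pooled.shape)   # (1, 2, 2, 1) (1, 2, 2, 1)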
Flattening
The next step in the process is flattening, where all the resulting 2D
arrays from the pooled feature maps are converted into a single,
continuous linear vector.
This flattened vector is then passed as input to the fully connected
layer for image classification.
The flatten layer
The flatten layer is a component of convolutional neural networks (CNNs). A
complete convolutional neural network can be broken down into two parts:
• CNN: the convolutional part, comprising the convolution and pooling layers.
• ANN: the fully connected (dense) part that performs the classification.
The flatten layer lies between the CNN and the ANN, and its job is to convert the
output of the CNN into an input that the ANN can process.
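A minimal sketch, assuming TensorFlow/Keras and an example pooled output of shape (5, 5, 16), showing what the flatten step does:

import tensorflow as tf

# Pooled feature maps: batch of 1, 5x5 spatial size, 16 channels
pooled = tf.random.normal((1, 5, 5, 16))

# Flatten converts them into one long vector for the dense layers
flat = tf.keras.layers.Flatten()(pooled)
print(flat.shape)   # (1, 400), since 5 * 5 * 16 = 400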
Dense layer
• In neural networks, a "dense layer," also known as a "fully connected layer," is a
layer where each neuron receives input from every neuron in the preceding layer,
forming a complete network of connections.
Softmax/Logistic Layer
The Softmax or Logistic layer is the final layer in a CNN, positioned after the
fully connected layers. This layer plays a key role in classification.
• Logistic Function: Used for binary classification tasks, where there are only two
possible classes (e.g., cat or not cat). It outputs a probability score between 0 and
1, helping to decide which of the two classes the input belongs to.
• Softmax Function: Used for multi-class classification tasks, where there are
more than two possible classes (e.g., identifying digits 0–9 in MNIST). Softmax
converts the output into a probability distribution across all classes, with the
highest probability indicating the predicted class.
In summary, the choice between Logistic and Softmax depends on the type of
classification required — binary or multi-class.
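A small illustration, assuming TensorFlow/Keras, of how softmax turns raw scores into a probability distribution, together with the final Dense layer that each case would use:

import tensorflow as tf

# Softmax converts raw scores (logits) into a probability distribution over classes
logits = tf.constant([[2.0, 1.0, 0.1]])
probs = tf.nn.softmax(logits)
print(probs.numpy())             # approx. [[0.66 0.24 0.10]], sums to 1
print(tf.argmax(probs, axis=1))  # class 0 has the highest probability

# Corresponding final layers in a Keras model:
binary_head = tf.keras.layers.Dense(1, activation="sigmoid")       # binary (logistic)
multiclass_head = tf.keras.layers.Dense(10, activation="softmax")  # e.g. 10 MNIST digits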
Output Layer
The output layer provides the final prediction as a one-hot encoded label.
In one-hot encoding, each possible class is represented as a unique vector
where only one element is “1” (indicating the chosen class) and all others
are “0.” For example, in a classification task with three classes (say, “cat,”
“dog,” and “bird”), the output for each class would be represented as
follows:
• Cat: [1, 0, 0]
• Dog: [0, 1, 0]
• Bird: [0, 0, 1]
Loss Layer
In Convolutional Neural Networks (CNNs), the loss layer, or loss function, quantifies
the difference between the model's predictions and the actual (ground truth) labels,
guiding the training process by measuring how well the model is performing and
enabling adjustments to minimize errors.
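A minimal sketch, assuming TensorFlow/Keras and an MNIST-style setup, showing a loss function quantifying the gap between a predicted probability distribution and the ground-truth label:

import tensorflow as tf

y_true = tf.constant([2])                 # ground-truth class index
y_pred = tf.constant([[0.1, 0.2, 0.7]])   # predicted probabilities for 3 classes

# Sparse categorical cross-entropy: -log(probability assigned to the true class)
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy()
print(loss_fn(y_true, y_pred).numpy())    # -log(0.7), approx. 0.357

# During training the loss is attached via model.compile(loss=..., optimizer=..., metrics=[...])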
1x1 Convolution
1x1 convolutions were first proposed in the Network-in-Network (NiN) architecture and were
later used heavily in the GoogLeNet architecture. Main features of such layers (illustrated in the example below):
• Reduce or increase dimensionality (the number of channels)
• Apply a nonlinearity again after the convolution
• Can be considered as "feature pooling"
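A minimal sketch, assuming TensorFlow/Keras and a feature map with 256 channels, showing a 1x1 convolution reducing the channel dimension while leaving the spatial size untouched:

import tensorflow as tf

# A feature map: batch of 1, 28x28 spatial size, 256 channels
x = tf.random.normal((1, 28, 28, 256))

# A 1x1 convolution with 64 filters reduces 256 channels to 64 and
# applies a fresh nonlinearity (ReLU) on top of the combined features
reduce = tf.keras.layers.Conv2D(64, kernel_size=1, activation="relu")
print(reduce(x).shape)   # (1, 28, 28, 64)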
Inception Network
Building a more powerful deep neural network is possible by increasing the number of
layers in the network. Two problems with this approach are that:
1. increasing the number of layers of a neural network may lead to overfitting,
especially if you have limited labeled training data, and
2. the computational requirement increases.
Inception networks were created with the idea of increasing the capability of a
deep neural network while using computational resources efficiently.
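A simplified inception-style block, sketched in TensorFlow/Keras (the branch sizes are illustrative, not the exact GoogLeNet configuration): parallel 1x1, 3x3 and 5x5 convolutions plus a pooling branch, with 1x1 convolutions keeping the computation manageable, all concatenated along the channel axis:

import tensorflow as tf
from tensorflow.keras import layers

def inception_block(x, f1, f3, f5, fpool):
    # 1x1 branch
    branch1 = layers.Conv2D(f1, 1, padding="same", activation="relu")(x)
    # 1x1 reduction followed by a 3x3 convolution
    branch3 = layers.Conv2D(f3, 1, padding="same", activation="relu")(x)
    branch3 = layers.Conv2D(f3, 3, padding="same", activation="relu")(branch3)
    # 1x1 reduction followed by a 5x5 convolution
    branch5 = layers.Conv2D(f5, 1, padding="same", activation="relu")(x)
    branch5 = layers.Conv2D(f5, 5, padding="same", activation="relu")(branch5)
    # pooling branch with a 1x1 projection
    branch_pool = layers.MaxPooling2D(3, strides=1, padding="same")(x)
    branch_pool = layers.Conv2D(fpool, 1, padding="same", activation="relu")(branch_pool)
    # concatenate all branches along the channel axis
    return layers.Concatenate()([branch1, branch3, branch5, branch_pool])

inputs = tf.keras.Input(shape=(32, 32, 64))
outputs = inception_block(inputs, f1=16, f3=24, f5=8, fpool=8)
print(tf.keras.Model(inputs, outputs).output_shape)   # (None, 32, 32, 56)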
Input channels
In convolutional neural networks (CNNs), input channels are the number of feature
maps (or channels) in the input data (e.g., RGB images have 3 input channels), and
output channels are the number of feature maps generated by a convolutional
layer, determined by the number of filters used.
Example:
If you have an input image with 3 channels (RGB) and use a convolutional layer with
32 filters, the output will have 32 channels (feature maps).
These 32 feature maps will be the input to the next layer in the network.
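A minimal sketch, assuming TensorFlow/Keras, confirming the channel counts described above:

import tensorflow as tf

# A batch with one 64x64 RGB image: 3 input channels
images = tf.random.normal((1, 64, 64, 3))

# 32 filters produce 32 output channels (feature maps)
conv = tf.keras.layers.Conv2D(filters=32, kernel_size=3, padding="same")
features = conv(images)
print(features.shape)      # (1, 64, 64, 32)

# Each 3x3 filter spans all 3 input channels, so the kernel shape is (3, 3, 3, 32)
print(conv.kernel.shape)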
Transfer Learning (Fine Tuning vs Feature
Extraction)
Problems with training your own model from scratch:
1. Models are data-hungry: you may need millions of labeled examples.
2. Training the model takes a lot of time.
Solution:
• Use models pre-trained on large datasets such as ImageNet or MNIST.
Transfer learning is a machine learning technique in which knowledge gained through one task or
dataset is used to improve model performance on another related task and/or different dataset.
In other words, transfer learning uses what has been learned in one setting to improve
generalization in another setting.
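A minimal sketch, assuming TensorFlow/Keras with MobileNetV2 pre-trained on ImageNet as the base model (the base model, input size and classification head are illustrative choices), contrasting feature extraction with fine tuning:

import tensorflow as tf

base = tf.keras.applications.MobileNetV2(
    input_shape=(160, 160, 3), include_top=False, weights="imagenet")

# Feature extraction: freeze the pre-trained base and train only a new head
base.trainable = False
model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(2, activation="softmax"),   # e.g. a 2-class task
])

# Fine tuning: unfreeze the base and keep training with a very small
# learning rate so the pre-trained weights change only slightly
base.trainable = True
model.compile(optimizer=tf.keras.optimizers.Adam(1e-5),
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

In feature extraction only the new head is trained; in fine tuning some or all of the pre-trained layers are also updated, typically with a much smaller learning rate.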
One Shot Learning
One-shot learning is a machine learning approach where a model learns to
classify or recognize objects from a single training example (or very few
examples) per class, making it particularly useful in scenarios with limited
data.
How it works:
One-shot learning often employs techniques like Siamese Networks, which
train two neural networks to compare input examples and learn a similarity
metric.
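A minimal Siamese-style sketch, assuming TensorFlow/Keras (the architecture and sizes are illustrative): one shared embedding network maps both inputs to vectors, and the distance between the two embeddings acts as the learned similarity metric:

import tensorflow as tf

# Shared embedding network: maps a 28x28 grayscale image to a 64-dimensional vector
embedding_net = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, 3, activation="relu", input_shape=(28, 28, 1)),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(64, activation="relu"),
])

input_a = tf.keras.Input(shape=(28, 28, 1))
input_b = tf.keras.Input(shape=(28, 28, 1))

# The SAME network (shared weights) embeds both inputs
emb_a = embedding_net(input_a)
emb_b = embedding_net(input_b)

# Euclidean distance between the embeddings: small distance = likely same class
distance = tf.keras.layers.Lambda(
    lambda t: tf.norm(t[0] - t[1], axis=1, keepdims=True))([emb_a, emb_b])

siamese = tf.keras.Model(inputs=[input_a, input_b], outputs=distance)
siamese.summary()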
Related terms:
One-shot learning is closely related to few-shot learning, where the model learns
from a small number of examples per class, and zero-shot learning, where the model
must recognize classes for which it has seen no labeled examples at all.
Dimensionality Reduction
Dimensionality reduction is a method for representing a given dataset using a
lower number of features (that is, dimensions) while still capturing the original
data's meaningful properties.
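A small illustration of the idea, using scikit-learn's PCA as one common dimensionality-reduction technique (the data here is random and purely illustrative):

import numpy as np
from sklearn.decomposition import PCA

# 200 samples described by 50 features, reduced to 2 features that
# capture the directions of greatest variance in the data
X = np.random.rand(200, 50)
X_reduced = PCA(n_components=2).fit_transform(X)
print(X_reduced.shape)   # (200, 2)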
Implementation of CNN with TensorFlow, Keras
Handwritten Digit (0-9) Classification
import tensorflow as tf

# Load the MNIST handwritten digit dataset (28x28 grayscale images)
mnist = tf.keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train            # array of training images
x_train.shape      # (60000, 28, 28)
x_train[0].shape   # (28, 28), a single image
x_train[0]         # raw pixel values (0-255) of the first training image
y_train            # integer labels (0-9) for the training images
y_train.shape      # (60000,)
y_train[0].shape   # (), each label is a single scalar
y_train[0]         # label of the first training image
x_test             # array of test images
x_test.shape       # (10000, 28, 28)
x_test[0].shape    # (28, 28)
x_test[0]          # raw pixel values of the first test image
y_test             # integer labels (0-9) for the test images
y_test.shape       # (10000,)
y_test[0].shape    # (), a single scalar label
y_test[0]          # label of the first test image

# Scale pixel values from 0-255 down to the range 0-1
x_test = x_test / 255
x_train = x_train / 255
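The model definition and the reshaped arrays x_trainr and x_testr used below are not shown on these slides; a minimal sketch, assuming a small Conv2D/MaxPooling architecture (the lecture's exact architecture may differ):

import numpy as np
import tensorflow as tf

# Add the channel dimension that Conv2D expects: (60000, 28, 28, 1) and (10000, 28, 28, 1)
x_trainr = np.array(x_train).reshape(-1, 28, 28, 1)
x_testr = np.array(x_test).reshape(-1, 28, 28, 1)

model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, 3, activation="relu", input_shape=(28, 28, 1)),
    tf.keras.layers.MaxPooling2D(2),
    tf.keras.layers.Conv2D(64, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(2),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),   # 10 digit classes
])

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

model.fit(x_trainr, y_train, epochs=5, validation_split=0.1)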
import cv2
import numpy as np
import matplotlib.pyplot as plt

model.summary()                     # layer-by-layer architecture and parameter counts

# Predict on the reshaped test set and inspect the first prediction
predictions = model.predict(x_testr)
print(predictions)                  # one probability vector (10 values) per test image
print(np.argmax(predictions[0]))    # class with the highest probability for image 0
plt.imshow(x_test[0])               # display that image for comparison
# Read a custom handwritten-digit image and prepare it for the model
img = cv2.imread('two.png')                        # loaded as a BGR colour image
plt.imshow(img)
img.shape                                          # (height, width, 3)
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)       # drop the colour channels
gray.shape                                         # (height, width)
resized = cv2.resize(gray, (28, 28))               # match the MNIST input size
resized.shape                                      # (28, 28)
newimg = (resized / 255.0).reshape(-1, 28, 28, 1)  # scale and add batch/channel dims
newimg.shape                                       # (1, 28, 28, 1)
predictions = model.predict(newimg)
print(np.argmax(predictions))                      # predicted digit