Lecture 3 Updated

The document discusses Convolutional Neural Networks (CNNs), highlighting their advantages over traditional neural networks by reducing the number of parameters and computational time through convolution operations. It explains key concepts such as convolution kernels, padding, stride, and pooling layers, which contribute to feature extraction and dimensionality reduction. Additionally, it describes the structure of CNNs, including fully connected layers for final output predictions.

Uploaded by

roycetheebanedu

EC 9170

Deep Learning for Electrical & Computer Engineers

Convolutional networks

Faculty of Engineering, University of Jaffna


Why Convolutional Networks?

Input (RGB image): 1000 x 1000 pixels

Input features = 1000 x 1000 x 3 = 3 x 10^6
If the 1st hidden layer has 1000 perceptrons, then
Total no. of weights = 3 x 10^6 x 10^3 = 3 x 10^9
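The weight count above can be checked with a few lines of Python. This is only a sketch: the function names are made up for illustration, biases are ignored as on the slide, and the convolutional layer used for comparison (1000 filters of size 3 x 3) is an assumption, not something the lecture specifies.

```python
def dense_weights(n_inputs, n_units):
    """Weights of a fully connected layer (biases ignored, as on the slide)."""
    return n_inputs * n_units


def conv_weights(f, in_channels, n_filters):
    """Weights of a convolutional layer with f x f kernels (biases ignored)."""
    return f * f * in_channels * n_filters


n_inputs = 1000 * 1000 * 3          # RGB image, 1000 x 1000 pixels
fc = dense_weights(n_inputs, 1000)  # 1st hidden layer with 1000 perceptrons
conv = conv_weights(3, 3, 1000)     # assumed: 1000 filters of size 3 x 3 x 3

print(fc)    # 3000000000
print(conv)  # 27000
```

Under these assumptions the convolutional layer needs about five orders of magnitude fewer weights, because its weights do not depend on the image size.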

Drawbacks of traditional neural networks


• A very large number of parameters must be trained
• The number of operations is therefore high, and can exceed what the available GPU/CPU can handle
• Computational time is too long
• Overfitting can occur, which reduces the performance of the model
Why Convolutional Networks?

• Some patterns are much smaller than the whole image

A neuron does not have to see the whole image to discover the pattern
Can represent a small region with fewer parameters

“beak” detector
Why Convolutional Networks?
The same pattern appears in different places, e.g. an “upper-left beak” detector and a “middle beak” detector.
What about training a lot of such “small” detectors, where each detector must “move around” the image?
They can be compressed to the same parameters.
Convolutional Networks
• Convolutional networks are also known as convolutional neural
networks or CNNs.
• The name “convolutional neural network” indicates that the
network employs a mathematical operation called convolution.
• Convolution is a specialized kind of linear operation.
• Convolutional networks are neural networks that use convolution
instead of general matrix multiplication in at least one layer.
• A convolutional layer has a number of filters that perform convolution
operations.
Convolutional Networks
• The main idea of CNNs is to use kernels or filters

Convolution Kernels
• A kernel is a small 2D matrix whose values depend on the operation to be performed.
• A kernel slides over the input image; at each position its values are multiplied element-wise
with the pixels underneath and the products are added up. Without padding, the output is
smaller than the input and therefore easier to work with.
• For input images with 3 or more channels, such as RGB, a filter is applied.
• Filters are one dimension higher than kernels and can be seen as multiple kernels stacked on
each other, where every kernel is for a particular channel.
A Convolution Operation
Input: grey scale image (5 x 5)
 21  19  17  25  28
 71  76  73  68  59
153 164 164 157 155
200 201 190 185 180
205 210 215 230 232
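A Convolution Operation

(n x n) * (f x f) = (n-f+1) x (n-f+1)
(5 x 5) * (3 x 3) = 3 x 3

Kernel (3 x 3), as implied by the products below: 8 at the centre, -1 elsewhere
-1 -1 -1
-1  8 -1
-1 -1 -1

First output value (top-left 3 x 3 window):
21*(-1) + 71*(-1) + 153*(-1) + 19*(-1) + 76*8 + 164*(-1) + 17*(-1) + 73*(-1) + 164*(-1) = -74

Second output value (window moved one pixel to the right):
19*(-1) + 76*(-1) + 164*(-1) + 17*(-1) + 73*8 + 164*(-1) + 25*(-1) + 68*(-1) + 157*(-1) = -106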
➢ The convolution operation is responsible for detecting edges and other features
in the images.

The kernel above is an example of a kernel used for edge detection; similar kernels are
used to sharpen an image (enhance the edges).
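The worked products above can be reproduced with a short pure-Python sketch. The kernel (8 at the centre, -1 elsewhere) is inferred from those products, and the function name `convolve2d_valid` is made up for illustration. As is usual in CNNs, the kernel is not flipped; for this symmetric kernel that makes no difference.

```python
image = [
    [21, 19, 17, 25, 28],
    [71, 76, 73, 68, 59],
    [153, 164, 164, 157, 155],
    [200, 201, 190, 185, 180],
    [205, 210, 215, 230, 232],
]

# Kernel inferred from the worked example: 8 at the centre, -1 elsewhere.
kernel = [
    [-1, -1, -1],
    [-1, 8, -1],
    [-1, -1, -1],
]


def convolve2d_valid(img, k):
    """Slide k over img with stride 1 and no padding: output is (n-f+1) x (n-f+1)."""
    n, f = len(img), len(k)
    out_size = n - f + 1
    out = []
    for i in range(out_size):
        row = []
        for j in range(out_size):
            # Element-wise product of the kernel and the window, then sum.
            total = 0
            for a in range(f):
                for b in range(f):
                    total += img[i + a][j + b] * k[a][b]
            row.append(total)
        out.append(row)
    return out


result = convolve2d_valid(image, kernel)
print(len(result), len(result[0]))  # 3 3
print(result[0][0])                 # -74
print(result[0][1])                 # -106
```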
Convolution with two filters

6 x 6 image:
1 0 0 0 0 1
0 1 0 0 1 0
0 0 1 1 0 0
1 0 0 0 1 0
0 1 0 0 1 0
0 0 1 0 1 0

Filter 1:
 1 -1 -1
-1  1 -1
-1 -1  1

Filter 2:
-1  1 -1
-1  1 -1
-1  1 -1

……

The filter values are the network parameters to be learned.
Each filter detects a small pattern (3 x 3).
Filter 1, stride = 1: the dot product of Filter 1 with the top-left 3 x 3 window of the 6 x 6 image is 3; moving the filter one pixel to the right gives -1.
Stride is a parameter of the convolution operation that refers to the number of pixels by
which the filter matrix moves across the input matrix.
Filter 1, if stride = 2: the first dot product is still 3; the filter then jumps two pixels to the right, giving -3.
Filter 1, stride = 1: sliding the filter over every position of the 6 x 6 image gives a 4 x 4 output:
 3 -1 -3 -1
-3  1  0 -3
-3 -3  0  1
 3 -2 -2 -1
Repeat this for each filter (Filter 2, stride = 1). Each filter produces one feature map.

Feature map from Filter 1:
 3 -1 -3 -1
-3  1  0 -3
-3 -3  0  1
 3 -2 -2 -1

Feature map from Filter 2:
-1 -1 -1 -1
-1 -1 -2  1
-1 -1 -2  1
-1  0 -4  3

The 6 x 6 image becomes two 4 x 4 images, forming a 4 x 4 x 2 matrix.
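The two feature maps, and the effect of the stride, can be reproduced with the sketch below. The function name `convolve2d` is illustrative and the code assumes a square image, a square filter and no padding.

```python
image = [
    [1, 0, 0, 0, 0, 1],
    [0, 1, 0, 0, 1, 0],
    [0, 0, 1, 1, 0, 0],
    [1, 0, 0, 0, 1, 0],
    [0, 1, 0, 0, 1, 0],
    [0, 0, 1, 0, 1, 0],
]
filter1 = [[1, -1, -1], [-1, 1, -1], [-1, -1, 1]]
filter2 = [[-1, 1, -1], [-1, 1, -1], [-1, 1, -1]]


def convolve2d(img, k, stride=1):
    """Dot product of k with every window of img, moving `stride` pixels at a time."""
    n, f = len(img), len(k)
    out_size = (n - f) // stride + 1
    return [
        [
            sum(
                img[i * stride + a][j * stride + b] * k[a][b]
                for a in range(f)
                for b in range(f)
            )
            for j in range(out_size)
        ]
        for i in range(out_size)
    ]


fmap1 = convolve2d(image, filter1)            # 4 x 4 feature map of Filter 1
fmap2 = convolve2d(image, filter2)            # 4 x 4 feature map of Filter 2
strided = convolve2d(image, filter1, stride=2)  # 2 x 2 output when stride = 2

for row in fmap1:
    print(row)
print(strided[0])  # [3, -3]
```

Stacking `fmap1` and `fmap2` gives the 4 x 4 x 2 output described above; with stride 2 the same filter yields only a 2 x 2 map.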

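For a grey scale image and c filters of size f x f → (n x n x 1) * (f x f x c) = (n-f+1) x (n-f+1) x c

Color image: RGB, 3 channels. A filter has one kernel per channel, and a single filter gives a single output channel:

(n x n x 3) * (f x f x 3) = (n-f+1) x (n-f+1) x 1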
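A minimal sketch of why one filter over three channels gives a single output channel: the products from all channels are added into the same output value. The name `convolve_multichannel` and the all-ones test data are assumptions chosen only to make the result easy to check by hand.

```python
def convolve_multichannel(img, filt):
    """img: [channels][n][n], filt: [channels][f][f] -> one (n-f+1) x (n-f+1) map."""
    channels, n, f = len(img), len(img[0]), len(filt[0])
    out_size = n - f + 1
    out = [[0] * out_size for _ in range(out_size)]
    for i in range(out_size):
        for j in range(out_size):
            # Sum over every channel as well as every position in the window.
            out[i][j] = sum(
                img[ch][i + a][j + b] * filt[ch][a][b]
                for ch in range(channels)
                for a in range(f)
                for b in range(f)
            )
    return out


# Hypothetical data: a 4 x 4 RGB image of ones and a 3 x 3 x 3 filter of ones.
rgb = [[[1] * 4 for _ in range(4)] for _ in range(3)]
filt = [[[1] * 3 for _ in range(3)] for _ in range(3)]

out = convolve_multichannel(rgb, filt)
print(out)  # [[27, 27], [27, 27]]
```

Each output value is 27 = 3 channels x 9 positions, and the output is 2 x 2 x 1, matching (n-f+1) x (n-f+1) x 1 with n = 4 and f = 3.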
Padding in Convolutional Neural Networks
• Process of adding additional layers of pixels around the border of an image.
• When we perform a convolution operation, we slide a filter across the image. If we only slide the
filter across the original pixels of the image, the resulting output will be smaller than the input
image. i.e. (n x n) * (f x f) = (n-f+1) x (n-f+1)
• In some cases, it’s beneficial to maintain the spatial dimensions (i.e., the width and the height)
of the output the same as the input.
• By adding a layer of zeros around the border of the image, we can apply the filter to more
positions, effectively preserving the spatial dimensions of the output.
Padding in Convolutional Neural Networks
There are two main types of padding:
1. Valid Padding (No padding): In this case, the filter is applied only to valid positions inside the
image, not going beyond the border. This results in smaller output dimensions.
2. Same Padding: The image is padded with enough zeros around the border so that the output
dimensions after the convolution operation are the same as the input dimensions.
(n x n) → (n+2p) x (n+2p) ; p = number of layers of pixels added on each side of the image

Padding serves several purposes:


1. Dimension Preservation: As mentioned, padding can help maintain the spatial dimensions of
the input through the layers of the network.
2. Border Information: Padding allows the network to take into account information at the borders
of the image. Without padding, the filters would mostly be applied to the central pixels of the
image, which could lead to losing information from the edges of the image.
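Zero padding can be sketched in a few lines. The helper names are illustrative, the image is assumed to be square, and `same_padding` assumes stride 1 and an odd filter size f, in which case p = (f - 1) / 2 keeps the output the same size as the input.

```python
def zero_pad(img, p):
    """Add p layers of zeros around a square image: (n x n) -> (n+2p) x (n+2p)."""
    width = len(img) + 2 * p
    padded = [[0] * width for _ in range(p)]           # top border
    for row in img:
        padded.append([0] * p + list(row) + [0] * p)   # left and right borders
    padded += [[0] * width for _ in range(p)]          # bottom border
    return padded


def same_padding(f):
    """Padding that preserves the size for stride 1 and odd f (an assumption)."""
    return (f - 1) // 2


img = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
p = same_padding(3)
padded = zero_pad(img, p)
for row in padded:
    print(row)

# Output size of a 3 x 3 convolution on the padded image: (n + 2p) - f + 1
print(len(padded) - 3 + 1)  # 3, the same as the input size
```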
Stride in Convolutional Neural Networks
• Stride is a parameter of the convolution operation that refers to the number of pixels by which the
filter matrix moves across the input matrix.
• When the stride is 1, the filter moves across the input matrix 1 pixel at a time. When the stride is 2,
the filter jumps 2 pixels at a time as we slide it around.

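$(n \times n) * (f \times f) = \left(\left\lfloor \frac{n-f}{s} \right\rfloor + 1\right) \times \left(\left\lfloor \frac{n-f}{s} \right\rfloor + 1\right)$

⌊·⌋ is the floor function: the greatest integer which is less than or equal to the given number
n x n – image size
f x f – filter size
s – stride length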
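The output-size formula translates directly into code. In this sketch the function name is made up, and the optional padding argument `p` (default 0) is added so that the same helper also covers padded convolutions; Python's `//` performs the floor for non-negative values.

```python
def conv_output_size(n, f, s=1, p=0):
    """floor((n - f + 2p) / s) + 1 for an n x n input and an f x f filter."""
    return (n - f + 2 * p) // s + 1


print(conv_output_size(5, 3))        # 3  (5 x 5 image, 3 x 3 kernel)
print(conv_output_size(6, 3, s=1))   # 4  (6 x 6 image, stride 1)
print(conv_output_size(6, 3, s=2))   # 2  (6 x 6 image, stride 2)
print(conv_output_size(8, 3, s=2))   # 3  (5 / 2 = 2.5 is floored to 2)
```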
Stride in Convolutional Neural Networks
The stride’s value impacts the model in a few ways:

1. Dimensionality Reduction: Larger stride values result in smaller output dimensions, effectively
performing a form of dimensionality reduction.
2. Computation Speed: Larger strides can also speed up computation, as the filter needs to be
applied fewer times.
3. Model Capacity: However, larger strides may result in the model losing some detailed information
because the filter doesn’t cover every single pixel; it’s skipping over some. This could potentially
reduce model accuracy, particularly in tasks that require capturing fine-grained details.
The whole CNN
[Figure: Convolution → Max Pooling → Convolution → Max Pooling → Flattened → Fully Connected Feedforward network → cat, dog, ……]
The Convolution and Max Pooling steps can repeat many times.
Pooling layer
• In practice, (max) pooling layers are placed after convolutional
layers in a CNN.
• After a convolutional layer extracts features from the input image,
the max pooling layer reduces the spatial size of the convolved
feature map, keeping only the most salient information.
• This process is repeated for multiple convolutional and pooling
layers, allowing the network to learn a hierarchy of features at
various levels of abstraction.
Why Pooling
1. Pooling layers are used to reduce the dimensions of the feature maps. This
reduces the number of parameters to learn and the amount of computation
performed in the network.
6 x 6 image → Conv → Max Pooling → a new but smaller image (2 x 2 for each filter):

From Filter 1:    From Filter 2:
3 0               -1 1
3 1                0 3

Each filter is a channel.
Why Pooling
2. Enhances Features
• Types of Pooling:

1. Max Pooling
• Max pooling is a pooling operation that selects the maximum element from
the region of the feature map covered by the filter. Thus, the output after
max-pooling layer would be a feature map containing the most prominent
features of the previous feature map.
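Example: 2 x 2 max pooling applied to the feature maps of Filter 1 and Filter 2

Feature map from Filter 1:    Feature map from Filter 2:
 3 -1 -3 -1                   -1 -1 -1 -1
-3  1  0 -3                   -1 -1 -2  1
-3 -3  0  1                   -1 -1 -2  1
 3 -2 -2 -1                   -1  0 -4  3

After max pooling:            After max pooling:
3 0                           -1 1
3 1                            0 3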
2. Average Pooling
• Average pooling computes the average of the elements present in the region
of feature map covered by the filter. Thus, while max pooling gives the most
prominent feature in a particular patch of the feature map, average pooling
gives the average of features present in a patch.
3. Global Pooling
• Global pooling reduces each channel in the feature map to a single value.
Thus, an n_h x n_w x n_c feature map is reduced to a 1 x 1 x n_c feature map.
• This is equivalent to using a filter of dimensions n_h x n_w, i.e. the dimensions of
the feature map.
• Further, it can be either global max pooling or global average pooling.
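The three pooling types can be compared on the two 4 x 4 feature maps from the convolution example. This is a sketch: `pool2d`, `global_pool` and `mean` are illustrative names, and a 2 x 2 window with stride 2 is assumed for max and average pooling.

```python
fmap1 = [[3, -1, -3, -1], [-3, 1, 0, -3], [-3, -3, 0, 1], [3, -2, -2, -1]]
fmap2 = [[-1, -1, -1, -1], [-1, -1, -2, 1], [-1, -1, -2, 1], [-1, 0, -4, 3]]


def mean(values):
    return sum(values) / len(values)


def pool2d(fmap, size=2, stride=2, op=max):
    """Apply op (max or mean) to every size x size window of a square feature map."""
    out_size = (len(fmap) - size) // stride + 1
    return [
        [
            op([fmap[i * stride + a][j * stride + b]
                for a in range(size) for b in range(size)])
            for j in range(out_size)
        ]
        for i in range(out_size)
    ]


def global_pool(fmap, op=max):
    """Reduce one channel to a single value (window = whole feature map)."""
    return op([v for row in fmap for v in row])


print(pool2d(fmap1))            # [[3, 0], [3, 1]]
print(pool2d(fmap2))            # [[-1, 1], [0, 3]]
print(pool2d(fmap1, op=mean))   # [[0.0, -1.75], [-1.25, -0.5]]
print(global_pool(fmap1))       # 3
print(global_pool(fmap1, mean)) # -0.875
```

The max-pooled maps match the 2 x 2 images in the slides; average pooling on the same windows gives noticeably different values because the negative responses are not discarded.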
Flattening
The two 2 x 2 pooled feature maps

3 0      -1 1
3 1       0 3

are flattened into a single vector of 8 values, which is the input to the fully connected feedforward network.
Fully connected (FC) layers (Dense layer)
• Used in artificial neural networks where each neuron or node from
the previous layer is connected to each neuron of the current
layer.
• FC layers are typically found towards the end of a neural network
architecture and are responsible for producing final output
predictions.
Fully connected (FC) layers (Dense layer)
• Key Features:
• In CNNs, FC layers often follow the convolutional and pooling layers. The 2D spatial
feature maps are first flattened into a 1D vector, which the FC layers then process
for tasks like classification.
• The weights and biases in FC layers are learned during the training process,
making them adapt to the specific problem at hand.
• The number of neurons in the final FC layer usually matches the number of
output classes in a classification problem. For instance, in a 10-class digit
classification problem, there would be 10 neurons in the final FC layer, each
outputting a score for one class.
Convolution vs. Fully Connected

Convolution: the 6 x 6 image is convolved with the 3 x 3 filters (Filter 1 and Filter 2), so each output value depends only on a 3 x 3 region of the image.

Fully connected: the 6 x 6 image is flattened into 36 inputs x1, x2, ..., x36, and every input is connected to every neuron.
• The flattened array will be used as input to the fully connected layer.
• Every neuron of the layer is connected to all the neurons in the previous layer and the next
layer. Thus, it is called a “Fully Connected Layer.”
• The final FC layer (output layer) has as many neurons as there are class labels.
• In the output layer, softmax activation is used to classify the image into one of several classes.
• For binary classification, sigmoid activation is used.
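The output stage described above can be sketched as follows. The weights, biases and input vector are hypothetical values picked for illustration; in a real network they are learned during training.

```python
import math


def dense(x, weights, biases):
    """Fully connected layer: every output neuron sees every input value."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, biases)]


def softmax(scores):
    """Turn class scores into probabilities that sum to 1."""
    m = max(scores)                       # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(z):
    """Single-output activation for binary classification."""
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical flattened input and a 3-class output layer.
x = [1.0, 2.0]
weights = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # one row per output neuron
biases = [0.0, 0.0, 0.0]

scores = dense(x, weights, biases)
probs = softmax(scores)
predicted_class = probs.index(max(probs))

print(scores)           # [1.0, 2.0, 3.0]
print(predicted_class)  # 2
print(sigmoid(0.0))     # 0.5
```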
The whole CNN
Convolution followed by Max Pooling gives a new image:

3 0      -1 1
3 1       0 3

• The new image is smaller than the original image.
• The number of channels is the number of filters.
• Convolution and Max Pooling can repeat many times, each time producing a new image.
• The last image is flattened and passed to the fully connected feedforward network, which outputs the classes (cat, dog, ……).
CNN for image classification
[Figure: CNN for image classification, with Layer #1 and Layer #2 marked]
CNN in Keras
Input_shape = (28, 28, 1): 28 x 28 pixels; the last value is 1 for black/white, 3 for RGB.

Layer                              Output         Parameters for each filter
Input                              28 x 28 x 1
Convolution (25 filters, 3 x 3)    26 x 26 x 25   9
Max Pooling                        13 x 13 x 25
Convolution (50 filters, 3 x 3)    11 x 11 x 50   225 = 25 x 9
Max Pooling                        5 x 5 x 50
Flattened                          1250
Fully connected feedforward network → Output

How many total parameters are computed?
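The shapes and per-filter weight counts in this example can be traced with plain Python (this is not Keras code; the helper names are illustrative). It assumes 3 x 3 filters, no padding, stride 1, 2 x 2 max pooling, and counts weights only, without biases, as the slide does.

```python
def conv_layer(shape, n_filters, f=3):
    """Return the output shape and the number of weights in each filter."""
    h, w, c = shape
    weights_per_filter = f * f * c          # one f x f kernel per input channel
    return (h - f + 1, w - f + 1, n_filters), weights_per_filter


def max_pool(shape, size=2):
    """2 x 2 max pooling halves height and width (rounding down)."""
    h, w, c = shape
    return (h // size, w // size, c)


s0 = (28, 28, 1)
s1, per_filter_1 = conv_layer(s0, 25)   # (26, 26, 25), 9 weights per filter
s2 = max_pool(s1)                       # (13, 13, 25)
s3, per_filter_2 = conv_layer(s2, 50)   # (11, 11, 50), 225 weights per filter
s4 = max_pool(s3)                       # (5, 5, 50)
flattened = s4[0] * s4[1] * s4[2]       # 1250

total_conv_weights = 25 * per_filter_1 + 50 * per_filter_2
print(s1, s2, s3, s4, flattened)
print(per_filter_1, per_filter_2, total_conv_weights)
```

Under these assumptions the two convolutional layers hold 25 x 9 + 50 x 225 = 11,475 weights in total.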
Image classification using CNN
Speech recognition using CNN
The spectrogram (time on one axis, frequency on the other) is treated as an image and fed to the CNN. The filters move in the frequency direction.
Text classification using CNN

Source of image:
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.703.6858&rep=rep1&type=pdf
Popular CNNs

• LeNet, 1998
• AlexNet, 2012
• VGGNet, 2014
• ResNet, 2015
VGGNet

• 16 layers
• Only 3 x 3 convolutions
• 138 million parameters
ResNet

• 152 layers
• ResNet50
Computational complexity
• The memory bottleneck
• GPU, a few GB
Question - 1
The CNN architecture has 3 convolutional layers, each followed by a pooling layer, then a flatten layer and
3 fully connected layers. The 1st, 2nd and 3rd convolutional layers use 32, 64 and 128 filters of size 3 x 3,
respectively. Each pooling layer has a 2 x 2 filter. The fully connected layers have 1024, 512 and 10 neurons.
This CNN architecture is used to classify RGB images of size 256 x 256. Unless specified, assume no padding
and stride 1 where appropriate.

a. Draw a CNN architecture described above.

b. State the dimensions of outputs of each layer in CNN

c. How many total parameters are in all the layers?

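$(n \times n) * (f \times f) = \left(\left\lfloor \frac{n-f+2P}{s} \right\rfloor + 1\right) \times \left(\left\lfloor \frac{n-f+2P}{s} \right\rfloor + 1\right)$

⌊·⌋ is the floor function: the greatest integer which is less than or equal to the given number
n x n – image size
f x f – filter size
s – stride length
P – padding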
Layer        Activation volume dimensions   Number of parameters
Input        256 x 256 x 3                  0
CONV3-32     254 x 254 x 32                 (3 × 3 × 3 + 1) × 32 = 896
ReLU         254 x 254 x 32                 0
POOL-2       127 x 127 x 32                 0
CONV3-64     125 x 125 x 64                 (3 × 3 × 32 + 1) × 64 = 18,496
ReLU         125 x 125 x 64                 0
POOL-2       62 x 62 x 64                   0
CONV3-128    60 x 60 x 128                  (3 × 3 × 64 + 1) × 128 = 73,856
ReLU         60 x 60 x 128                  0
POOL-2       30 x 30 x 128                  0
FLATTEN      115200                         0
FC-1024      1024                           (115200 + 1) × 1024 = 117,965,824
FC-512       512                            (1024 + 1) × 512 = 524,800
FC-10        10                             (512 + 1) × 10 = 5,130

The "+ 1" in each expression is the bias term for each filter (or neuron).

Adding up all parameters:
896 + 18,496 + 73,856 + 117,965,824 + 524,800 + 5,130 = 118,589,002
This is the total number of weights and biases in the model.
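The table for Question 1 can be checked programmatically. This is a sketch with illustrative names; it assumes, as the table does, that each 2 x 2 pooling layer uses stride 2 and that every filter and neuron has one bias term.

```python
def out_size(n, f, s=1, p=0):
    """floor((n - f + 2p) / s) + 1"""
    return (n - f + 2 * p) // s + 1


n, c = 256, 3            # 256 x 256 RGB input
total = 0
shapes = []

for n_filters in (32, 64, 128):
    total += (3 * 3 * c + 1) * n_filters   # weights plus one bias per filter
    n = out_size(n, f=3)                   # CONV3, no padding, stride 1
    c = n_filters
    shapes.append((n, n, c))
    n = out_size(n, f=2, s=2)              # POOL-2, assumed stride 2
    shapes.append((n, n, c))

flat = n * n * c                           # FLATTEN
units_in = flat
for units in (1024, 512, 10):
    total += (units_in + 1) * units        # weights plus one bias per neuron
    units_in = units

print(shapes)
print(flat)    # 115200
print(total)   # 118589002
```

Almost all of the parameters sit in the first fully connected layer, which is why the flattened size matters so much for the total.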
Question - 2
Consider the convolutional neural network defined by the layers in the left column below. Fill in the
shape of the output volume and the number of parameters at each layer. You can write the
activation shapes in the format (H, W, C), where H, W, C are the height, width and channel
dimensions, respectively. Unless specified, assume padding 1 and stride 1 where appropriate.
Layer        Activation volume dimensions   Number of parameters
Input        32 x 32 x 3                    0
CONV3-8      32 x 32 x 8                    (3 × 3 × 3 + 1) × 8 = 224
ReLU         32 x 32 x 8                    0
POOL-2       16 x 16 x 8                    0
BATCHNORM    16 x 16 x 8                    8 scales (γ) + 8 shifts (β) + 8 mean + 8 variance = 32
CONV3-16     16 x 16 x 16                   (3 × 3 × 8 + 1) × 16 = 1,168
ReLU         16 x 16 x 16                   0
POOL-2       8 x 8 x 16                     0
FLATTEN      1024                           0
FC-10        10                             (1024 + 1) × 10 = 10,250

Adding up all parameters:
224 + 32 + 1,168 + 10,250 = 11,674
This is the total number of weights and biases in the model.
Question - 3
Assume padding 1 and stride 2 where appropriate.

Layer             Activation volume dimensions   Number of parameters
Input             32 x 32 x 3                    0
CONV3-8           16 x 16 x 8                    (3 × 3 × 3 + 1) × 8 = 224
ReLU              16 x 16 x 8                    0
POOL-2 (s = 4)    4 x 4 x 8                      0
BATCHNORM         4 x 4 x 8                      8 scales (γ) + 8 shifts (β) + 8 mean + 8 variance = 32
CONV3-16          2 x 2 x 16                     (3 × 3 × 8 + 1) × 16 = 1,168
ReLU              2 x 2 x 16                     0
POOL-2 (s = 4)    1 x 1 x 16                     0
FLATTEN           16                             0
FC-512            512                            (16 + 1) × 512 = 8,704
FC-256            256                            (512 + 1) × 256 = 131,328
FC-10             10                             (256 + 1) × 10 = 2,570

Total Parameters Calculation:
• Convolutional + BatchNorm Layers: 224 + 32 + 1,168 = 1,424
• Fully Connected Layers: 8,704 + 131,328 + 2,570 = 142,602
• Total Parameters: 144,026
• Trainable Parameters: 144,010
• Non-Trainable Parameters: 16 (from batch normalization)
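The split between trainable and non-trainable parameters in Question 3 can be reproduced with the sketch below. It assumes, following the table, that the convolutions use padding 1 and stride 2, that the pooling layers use a 2 x 2 window with stride 4 and no padding, and that batch normalization keeps a trainable scale and shift plus a non-trainable running mean and variance per channel.

```python
def out_size(n, f, s=1, p=0):
    """floor((n - f + 2p) / s) + 1"""
    return (n - f + 2 * p) // s + 1


n, c = 32, 3
trainable, non_trainable = 0, 0

# CONV3-8 (padding 1, stride 2), then POOL-2 with stride 4
trainable += (3 * 3 * c + 1) * 8
n, c = out_size(n, 3, s=2, p=1), 8        # 16 x 16 x 8
n = out_size(n, 2, s=4)                   # 4 x 4 x 8

# BATCHNORM: scale and shift are learned; mean and variance are running statistics
trainable += 2 * c
non_trainable += 2 * c

# CONV3-16 (padding 1, stride 2), then POOL-2 with stride 4
trainable += (3 * 3 * c + 1) * 16
n, c = out_size(n, 3, s=2, p=1), 16       # 2 x 2 x 16
n = out_size(n, 2, s=4)                   # 1 x 1 x 16

units_in = n * n * c                      # FLATTEN: 16
for units in (512, 256, 10):
    trainable += (units_in + 1) * units
    units_in = units

print(trainable)                  # 144010
print(non_trainable)              # 16
print(trainable + non_trainable)  # 144026
```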
References
1. Goodfellow, I., Bengio, Y., and Courville, A., “Deep Learning,” MIT Press, 2016.

2. Slides: 6.S191, Dana Erlich, Param Vir Singh, David Gifford, Alexander Amini, Ava Soleimany.

3. Hung-yi Lee, “Convolutional Neural Network”.

4. https://d2l.ai/chapter_convolutional-neural-networks/index.html
Thank you!
