Convolutional Neural Networks
Introduction
A convolutional neural network (or ConvNet) is a type of feed-forward artificial neural network.
The architecture of a ConvNet is designed to take advantage of the 2D structure of an input image.
A ConvNet comprises one or more convolutional layers (often each followed by a pooling step),
followed by one or more fully connected layers, as in a standard multilayer neural network.
Motivation behind ConvNets
Consider an image of size 200x200x3 (200 wide, 200 high, 3 color channels)
– a single fully-connected neuron in the first hidden layer of a regular neural network would have 200*200*3 = 120,000 weights.
– With several such neurons, this full connectivity is wasteful, and the huge number of parameters would quickly lead to overfitting.
However, in a ConvNet, the neurons in a layer will only be connected to a small region of the layer
before it, instead of all of the neurons in a fully-connected manner.
– the final output layer would have dimensions 1x1xN, because by the end of the ConvNet architecture we will
reduce the full image into a single vector of class scores (for N classes), arranged along the depth
dimension
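To make the parameter comparison concrete, here is a small plain-Python sanity check (my own illustration; the 3x3 filter size is an assumed example, not from the slides):

# One fully connected neuron that sees the whole 200x200x3 image
fc_weights = 200 * 200 * 3
print(fc_weights)        # 120000 weights for a single neuron

# One 3x3 convolutional filter spanning all 3 colour channels;
# the same weights are reused at every image position (weight sharing)
conv_weights = 3 * 3 * 3
print(conv_weights)      # 27 weights, regardless of image size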
Vanishing Gradient Problem in MLP
MLP vs ConvNet
[Figure: a multilayer perceptron with all layers fully connected vs. a convolutional neural network with a partially connected convolution layer; both consist of input, hidden, and output layers.]
How ConvNet Works
For example, a ConvNet can take an image as input and classify it as an 'X' or an 'O'.
[Figure: a two-dimensional array of pixels goes into the CNN, which outputs 'X' or 'O'.]
In a simple case, 'X' would look like:
How ConvNet Works
[Figure: the CNN maps an image of an 'X' to the label X, and an image of an 'O' to the label O.]
How ConvNet Works – What Computer Sees
The computer sees only grids of numbers (pixel values of 1 and -1). The stored reference 'X':

-1 -1 -1 -1 -1 -1 -1 -1 -1
-1  1 -1 -1 -1 -1 -1  1 -1
-1 -1  1 -1 -1 -1  1 -1 -1
-1 -1 -1  1 -1  1 -1 -1 -1
-1 -1 -1 -1  1 -1 -1 -1 -1
-1 -1 -1  1 -1  1 -1 -1 -1
-1 -1  1 -1 -1 -1  1 -1 -1
-1  1 -1 -1 -1 -1 -1  1 -1
-1 -1 -1 -1 -1 -1 -1 -1 -1

and a new, shifted 'X' that should also be classified as an 'X':

-1 -1 -1 -1 -1 -1 -1 -1 -1
-1 -1 -1 -1 -1 -1  1 -1 -1
-1  1 -1 -1 -1  1 -1 -1 -1
-1 -1  1  1 -1  1 -1 -1 -1
-1 -1 -1 -1  1 -1 -1 -1 -1
-1 -1 -1  1 -1  1  1 -1 -1
-1 -1 -1  1 -1 -1 -1  1 -1
-1 -1  1 -1 -1 -1 -1 -1 -1
-1 -1 -1 -1 -1 -1 -1 -1 -1
How ConvNet Works
[Figure: checking the two grids for equality value by value: the test fails, because many individual pixels differ even though both images depict an 'X'.]
How ConvNet Works – What Computer Sees
Since the patterns do not match exactly, the computer cannot classify this as an 'X'. The cells marked X below are the pixels where the two images disagree:
-1 -1 -1 -1 -1 -1 -1 -1 -1
-1 X -1 -1 -1 -1 X X -1
-1 X X -1 -1 X X -1 -1
-1 -1 X 1 -1 1 -1 -1 -1
-1 -1 -1 -1 1 -1 -1 -1 -1
-1 -1 -1 1 -1 1 X -1 -1
-1 -1 X X -1 -1 X X -1
-1 X X -1 -1 -1 -1 X -1
-1 -1 -1 -1 -1 -1 -1 -1 -1
ConvNet Layers (At a Glance)
CONV layer will compute the output of neurons that are connected to local regions in the input,
each computing a dot product between their weights and a small region they are connected to in
the input volume.
RELU layer will apply an elementwise activation function, such as the max(0,x) thresholding at
zero. This leaves the size of the volume unchanged.
POOL layer will perform a downsampling operation along the spatial dimensions (width, height).
FC (i.e. fully-connected) layer will compute the class scores, resulting in a volume of size [1x1xN],
where each of the N numbers corresponds to the score of one of the N categories.
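As a concrete sketch of this CONV → RELU → POOL → FC pipeline, here is a minimal PyTorch model (my own illustration; the layer sizes and input resolution are assumptions, not values from the slides):

import torch
import torch.nn as nn

class TinyConvNet(nn.Module):
    # CONV -> RELU -> POOL -> FC, for 3-channel 32x32 inputs and N=2 classes
    def __init__(self, num_classes=2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 8, kernel_size=3, padding=1),  # CONV: local dot products
            nn.ReLU(),                                  # RELU: max(0, x), size unchanged
            nn.MaxPool2d(2),                            # POOL: halve width and height
        )
        self.classifier = nn.Linear(8 * 16 * 16, num_classes)  # FC: class scores

    def forward(self, x):              # x: [batch, 3, 32, 32]
        x = self.features(x)           # -> [batch, 8, 16, 16]
        x = x.flatten(1)               # -> [batch, 2048]
        return self.classifier(x)      # -> [batch, num_classes], i.e. 1x1xN scores

print(TinyConvNet()(torch.randn(1, 3, 32, 32)).shape)  # torch.Size([1, 2])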
Recall – What Computer Sees
As before, the new pattern does not match the stored 'X' exactly, so an exact pixel-by-pixel comparison fails.
What got changed? Only the positions of the pieces of the 'X' moved; the pieces themselves (the diagonal strokes and the crossing) are still present. This suggests matching features instead of whole images.
Convolutional Layer
The convolution layer identifies patterns (features) rather than individual pixels.
Convolutional Layer - Filters
Three 3x3 filters, one per feature of the 'X' (the two diagonal strokes and the central crossing):

Filter 1 (\ diagonal):    Filter 2 (crossing):    Filter 3 (/ diagonal):
 1 -1 -1                   1 -1  1                 -1 -1  1
-1  1 -1                  -1  1 -1                 -1  1 -1
-1 -1  1                   1 -1  1                  1 -1 -1
Multiple Filters
A convolutional layer applies several such filters; each filter produces its own feature map.
Convolutional Layer – Filters – Computation Example
To compute one value of a feature map, place a filter on the image, multiply each filter value by the pixel underneath, and average the nine products; a result of 1.00 means the filter matches that patch of the image perfectly.
– Filters: 3 (the three 3x3 filters above)
– Stride: 1
– Input: the 9x9 'X' image
Each filter produces one complete 7x7 feature map. For example, the middle (crossing) filter yields:

 0.33 -0.55  0.11 -0.11  0.11 -0.55  0.33
-0.55  0.55 -0.55  0.33 -0.55  0.55 -0.55
 0.11 -0.55  0.55 -0.77  0.55 -0.55  0.11
-0.11  0.33 -0.77  1.00 -0.77  0.33 -0.11
 0.11 -0.55  0.55 -0.77  0.55 -0.55  0.11
-0.55  0.55 -0.55  0.33 -0.55  0.55 -0.55
 0.33 -0.55  0.11 -0.11  0.11 -0.55  0.33

Conclusion: stacking the three 7x7 feature maps gives an output volume of size 7x7x3.
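The sketch below reproduces this computation in NumPy (my own illustration, not code from the slides); convolve_valid averages the element-wise products, which is the normalization used in the example above:

import numpy as np

def convolve_valid(image, kernel):
    # Slide the kernel over the image (stride 1, no padding); at each
    # position, average the element-wise products, giving values in [-1, 1].
    H, W = image.shape
    k = kernel.shape[0]
    out = np.empty((H - k + 1, W - k + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.mean(image[i:i+k, j:j+k] * kernel)
    return out

# The 9x9 'X' image (1 = stroke, -1 = background)
X = -np.ones((9, 9))
for i in range(1, 8):
    X[i, i] = X[i, 8 - i] = 1          # the two diagonals of the 'X'

# The middle filter: the central crossing of the 'X'
crossing = np.array([[ 1, -1,  1],
                     [-1,  1, -1],
                     [ 1, -1,  1]], dtype=float)

print(np.round(convolve_valid(X, crossing), 2))   # 7x7 map, 1.00 at the centre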
Rectified Linear Units (ReLUs)
ReLU replaces every negative value in a feature map with zero and leaves the rest unchanged. Applied to one row of a feature map:

before: 0.77 -0.11 0.11 0.33 0.55 -0.11 0.33
after:  0.77  0    0.11 0.33 0.55  0    0.33
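The same operation as a one-liner in NumPy (the row values are taken from the example above):

import numpy as np

def relu(x):
    return np.maximum(0.0, x)   # element-wise max(0, x); shape is unchanged

row = np.array([0.77, -0.11, 0.11, 0.33, 0.55, -0.11, 0.33])
print(relu(row))                # [0.77 0.   0.11 0.33 0.55 0.   0.33]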
Pooling Layer
Its function is to progressively reduce the spatial size of the representation, which cuts down the number of parameters and the amount of computation in the network.
The pooling layer often uses the max operation to perform the downsampling.
Pooling
[Figure sequence: a small (e.g. 2x2) window slides across the rectified feature map in steps of two; at each position only the maximum value inside the window is kept (1.00 for the first window, then 0.33, and so on). The 7x7 feature map shrinks to 4x4 while the strongest responses survive.]
Pooling Layer: Average Pooling
Average pooling works the same way but keeps the mean of each window instead of the maximum.
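Both pooling variants in NumPy (my own sketch; the 2x2 window and stride of 2 are assumptions chosen to match the 7x7 → 4x4 shrink described above):

import numpy as np

def pool2d(x, size=2, stride=2, op=np.max):
    # Apply op (np.max or np.mean) over size x size windows taken every
    # `stride` pixels; partial windows at the border are still pooled.
    rows = range(0, x.shape[0], stride)
    cols = range(0, x.shape[1], stride)
    return np.array([[op(x[i:i+size, j:j+size]) for j in cols] for i in rows])

fmap = np.random.rand(7, 7)              # stand-in for a rectified feature map
print(pool2d(fmap).shape)                # (4, 4) -- max pooling
print(pool2d(fmap, op=np.mean).shape)    # (4, 4) -- average pooling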
Layers Get Stacked - Example
[Figure: the input passes through a convolution layer with 64 filters and then a pooling (downsampling) layer, producing a smaller stack of feature maps.]
Deep stacking
Conv/ReLU/pool blocks can be repeated several times; each round extracts higher-level features from an ever smaller map.
[Figure: the 9x9 'X' image is reduced by repeated convolution, ReLU, and pooling to small stacks of 2x2 feature maps with values such as 1.00 and 0.55.]
Fully connected layer
The 2x2 feature maps from the final pooling layer are flattened into a single vector of values (0.55, 1.00, 0.55, ...), and every value is connected to every output node.
A summation of the products of inputs and weights at each output node determines the final prediction.
[Figure: the flattened feature values feed the two output nodes 'X' and 'O'; for this input the weighted sum at the 'X' node is the larger one.]
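A tiny NumPy sketch of this voting step (the feature values echo the slides, but the weights here are random placeholders rather than trained values):

import numpy as np

rng = np.random.default_rng(0)

features = np.array([0.55, 1.00, 0.55, 1.00, 0.55, 1.00,
                     0.55, 0.55, 1.00, 0.55, 0.55, 1.00])  # flattened 2x2 maps
W = rng.normal(size=(2, features.size))   # one weight vector per output node
scores = W @ features                     # summation of products per node
print(dict(zip(["X", "O"], scores)))      # the higher score is the prediction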
Putting it all together
[Figure: the complete pipeline: the 9x9 pixel grid passes through convolution, ReLU, and pooling layers, is flattened into a single vector, and the fully connected layer outputs a higher score for 'X' than for 'O'.]
Example of ConvNet
The spatial output size of a convolution is (W - F + 2P)/S + 1, where W is the input width, F the filter size, P the padding, and S the stride. For example, with W = 37, F = 5, P = 0, and S = 2:
(37 - 5 + 2*0)/2 + 1 = 16 + 1 = 17
Different CNN Architectures
– LeNet-5
– AlexNet
– VGG-19
– ResNet
Sample Exercises:
Design a CNN architecture for image classification, where the input images are RGB images of size 128x128 pixels. Choose the following hyperparameters (a worked shape calculation follows this list):
Convolution
– Filter Size
– Number of Filters
– Padding
– Stride
Pooling
– Window Size
– Stride
Fully Connected
– Number of neurons
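One possible set of choices, with the resulting shapes computed from the output-size formula above (all hyperparameters below are my own illustrative picks, not a prescribed solution):

def conv_out(w, f, p, s):
    # Spatial output size: (W - F + 2P) / S + 1
    return (w - f + 2 * p) // s + 1

w = 128                            # 128x128x3 RGB input
w = conv_out(w, f=5, p=2, s=1)     # conv: 16 filters of 5x5, pad 2 -> 128x128x16
w = conv_out(w, f=2, p=0, s=2)     # pool: 2x2 window, stride 2   -> 64x64x16
print(w)                           # 64
print(64 * 64 * 16)                # 65536 inputs to the fully connected layer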
Case Studies
LeNet – 1998
AlexNet – 2012
ZFNet – 2013
VGG – 2014
GoogLeNet – 2014
ResNet – 2015
Deep vs Shallow Networks
What happens when we continue stacking deeper layers on a “plain”
convolutional neural network?
[Figure: training error (left) and test error (right) versus iterations for 20-layer and 56-layer plain networks; the 56-layer network has higher error on both, so the problem is not simply overfitting.]
ResNet's answer is the residual block. In a plain network, the activation a[l] passes through two layers to give a[l+2] = g(z[l+2]). A residual block adds a skip connection that feeds a[l] forward and adds it in before the activation:

a[l+2] = g(z[l+2] + a[l])

The stacked layers then only have to learn a residual on top of the identity mapping, which makes very deep networks trainable.
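A minimal PyTorch residual block matching this equation (the channel count is illustrative; real ResNet blocks also include batch normalization, omitted here for brevity):

import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    # a[l+2] = g(z[l+2] + a[l]): two conv layers plus a skip connection
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)
        self.relu = nn.ReLU()

    def forward(self, a):
        z = self.conv2(self.relu(self.conv1(a)))  # z[l+2]
        return self.relu(z + a)                   # add the skip, then activate

x = torch.randn(1, 16, 8, 8)
print(ResidualBlock(16)(x).shape)   # torch.Size([1, 16, 8, 8]) -- shape preserved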