
Convolutional Neural Networks

Introduction

 A convolutional neural network (or ConvNet) is a type of feed-forward artificial neural network.
 The architecture of a ConvNet is designed to take advantage of the 2D structure of an input image.
 A ConvNet consists of one or more convolutional layers (often with a pooling step), followed by one or more fully connected layers as in a standard multilayer neural network.

Motivation behind ConvNets

 Consider an image of size 200x200x3 (200 wide, 200 high, 3 color channels):
– a single fully-connected neuron in the first hidden layer of a regular neural network would have 200*200*3 = 120,000 weights;
– with several such neurons, this full connectivity is wasteful, and the huge number of parameters would quickly lead to overfitting.
 In a ConvNet, however, the neurons in a layer are connected only to a small region of the layer before it, instead of to all of its neurons in a fully-connected manner.
– the final output layer has dimensions 1x1xN, because by the end of the ConvNet architecture the full image is reduced to a single vector of class scores (for N classes), arranged along the depth dimension.
 The vanishing gradient problem in deep MLPs is a further motivation.

MLP VS ConvNet

[Diagram: a multilayer perceptron (input, hidden, output) with all fully connected layers, next to a convolutional neural network with partially connected convolution layers.]

MLP vs ConvNet

 A regular 3-layer neural network.
 A ConvNet arranges its neurons in three dimensions (width, height, depth), as visualized in one of the layers.

How ConvNet Works

 For example, a ConvNet can take an image as input and classify it as 'X' or 'O'.

[Figure: a two-dimensional array of pixels is fed to the CNN, which outputs "X" or "O".]

 In a simple case, 'X' would look like the 9x9 pixel grid shown on the following slides.

How ConvNet Works

 What about trickier cases, where the 'X' or 'O' is shifted or distorted? The CNN should still output the correct class.

[Figure: a distorted 'X' and a distorted 'O', each fed through the CNN and still classified correctly.]

How ConvNet Works – What Computer Sees

[Figure: the 9x9 grid of -1/+1 pixel values for the ideal 'X' next to the grid for a new 'X' image; to the computer each is just a 2-D array of numbers, and the two arrays are not identical.]

How ConvNet Works

[Figure: a naive pixel-by-pixel comparison of the two 9x9 grids; many positions do not match.]

How ConvNet Works – What Computer Sees

 Since the pixel patterns do not match exactly, a literal pixel-by-pixel comparison cannot classify this image as 'X'.

[Figure: the 9x9 grid with the mismatched pixel positions highlighted.]

ConvNet Layers (At a Glance)

 CONV layer will compute the output of neurons that are connected to local regions in the input,
each computing a dot product between their weights and a small region they are connected to in
the input volume.

 RELU layer will apply an elementwise activation function, such as the max(0,x) thresholding at
zero. This leaves the size of the volume unchanged.

 POOL layer will perform a downsampling operation along the spatial dimensions (width, height).

 FC (i.e. fully-connected) layer will compute the class scores, resulting in a volume of size [1x1xN], where each of the N numbers corresponds to the score of one of the N categories.

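To make the four layer types concrete, here is a minimal sketch in PyTorch (the slides do not name a framework, so the library choice and the 32x32 input size are illustrative assumptions), showing how each stage changes the shape of the volume:

```python
# Minimal PyTorch sketch of the four layer types (framework choice and the 32x32
# input size are illustrative assumptions; the slides are framework-agnostic).
import torch
import torch.nn as nn

x = torch.randn(1, 3, 32, 32)                 # one RGB image, 32x32

conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, padding=1)
relu = nn.ReLU()
pool = nn.MaxPool2d(kernel_size=2, stride=2)
fc   = nn.Linear(16 * 16 * 16, 10)            # flattened pooled volume -> N = 10 class scores

h = conv(x)                 # [1, 16, 32, 32]  local dot products with learnable filters
h = relu(h)                 # [1, 16, 32, 32]  elementwise max(0, x), size unchanged
h = pool(h)                 # [1, 16, 16, 16]  spatial downsampling
scores = fc(h.flatten(1))   # [1, 10]          class scores, i.e. the 1x1xN output
print(scores.shape)
```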
Recall – What Computer Sees

 Since the pattern does not match exactly, the computer cannot classify this image as 'X' by direct comparison.

[Figure: the 9x9 grid with the changed pixel positions highlighted.]

 What got changed?

Convolutional Layer

 The convolution layer works to identify patterns (features) instead of individual pixels.

Convolutional Layer - Filters

 The CONV layer's parameters consist of a set of learnable filters.
 Every filter is small spatially (along width and height), but extends through the full depth of the input volume.
 During the forward pass, we slide (more precisely, convolve) each filter across the width and height of the input volume and compute dot products between the entries of the filter and the input at each position.

 The three 3x3 filters used in the 'X' example:

Filter 1          Filter 2          Filter 3
 1 -1 -1           1 -1  1          -1 -1  1
-1  1 -1          -1  1 -1          -1  1 -1
-1 -1  1           1 -1  1           1 -1 -1

Multiple Filters

[Figure-only slide: the same input image convolved with several different filters, each producing its own feature map.]

Convolutional Layer – Filters – Computation Example

 Input Size (W): 9x9
 Filter Size (F): 3x3
 Stride (S): 1
 Number of Filters: 1
 Padding (P): 0

[Figure: the 9x9 input convolved with one 3x3 filter produces a 7x7 feature map; each entry is the normalized dot product of the filter with the corresponding 3x3 patch, e.g. 1.00 where the filter matches the patch exactly.]

 Feature Map Size = 1 + (W – F + 2P)/S = 1 + (9 – 3 + 2×0)/1 = 7

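The sliding dot product can be sketched in a few lines of NumPy (an illustrative implementation, not taken from the slides). Dividing each dot product by the 9 filter entries reproduces values such as 1.00 = 9/9 and 0.77 = 7/9 seen in the feature map, which is how the slide's numbers appear to be normalized:

```python
# NumPy sketch of the sliding dot product (a "valid" convolution, i.e. no padding).
# With W = 9, F = 3, P = 0, S = 1 the output is 1 + (9 - 3 + 2*0)/1 = 7, a 7x7 map.
import numpy as np

def conv2d_valid(image, kernel, stride=1):
    W = image.shape[0]
    F = kernel.shape[0]
    out = (W - F) // stride + 1
    fmap = np.zeros((out, out))
    for i in range(out):
        for j in range(out):
            patch = image[i*stride:i*stride + F, j*stride:j*stride + F]
            fmap[i, j] = np.sum(patch * kernel) / kernel.size   # normalize by the 9 entries
    return fmap

diag_filter = np.array([[ 1, -1, -1],
                        [-1,  1, -1],
                        [-1, -1,  1]])
image = -np.ones((9, 9))      # stand-in; use the -1/+1 'X' grid from the slides instead
print(conv2d_valid(image, diag_filter).shape)   # (7, 7)
```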
Convolutional Layer – Filters – Output Feature Map

 Output feature maps of one complete convolution:
– Filters: 3
– Filter Size: 3x3
– Stride: 1
 Conclusion:
– Input Image: 9x9
– Output of Convolution: 7x7x3 (one 7x7 feature map per filter, stacked along the depth dimension)

[Figure: the 9x9 'X' image convolved with each of the three 3x3 filters, giving three 7x7 feature maps with values between -1 and 1.]

Convolutional Layer – Output

[Figure: the full output of the convolution stage — the 9x9 input image and the stack of three 7x7 feature maps produced by the three filters.]

Rectified Linear Units (ReLUs)

 The ReLU stage applies max(0, x) to every value in the feature map: negative values become 0, positive values are unchanged, and the size of the map stays the same.
 For example, the first row of the 7x7 feature map

0.77  -0.11  0.11  0.33  0.55  -0.11  0.33

becomes

0.77  0  0.11  0.33  0.55  0  0.33

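A minimal NumPy sketch of the ReLU stage applied to the first feature map from the slides:

```python
# NumPy sketch: ReLU applied elementwise to the first 7x7 feature map from the slides.
import numpy as np

fmap = np.array([
    [ 0.77, -0.11,  0.11,  0.33,  0.55, -0.11,  0.33],
    [-0.11,  1.00, -0.11,  0.33, -0.11,  0.11, -0.11],
    [ 0.11, -0.11,  1.00, -0.33,  0.11, -0.11,  0.55],
    [ 0.33,  0.33, -0.33,  0.55, -0.33,  0.33,  0.33],
    [ 0.55, -0.11,  0.11, -0.33,  1.00, -0.11,  0.11],
    [-0.11,  0.11, -0.11,  0.33, -0.11,  1.00, -0.11],
    [ 0.33, -0.11,  0.55,  0.33,  0.11, -0.11,  0.77],
])
relu_out = np.maximum(0.0, fmap)   # max(0, x): negatives -> 0, size unchanged
print(relu_out[0])                 # [0.77 0.   0.11 0.33 0.55 0.   0.33]
```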
Pooling Layer

 The pooling layer down-samples the previous layer's feature map.
 Its function is to progressively reduce the spatial size of the representation, which reduces the number of parameters and the amount of computation in the network.
 The pooling layer often uses the max operation to perform the downsampling.

Pooling

 Pooling Filter Size: 2x2, Stride: 2
 Sliding a 2x2 window over the 7x7 rectified feature map and keeping the maximum value in each window produces a 4x4 pooled map:

1.00 0.33 0.55 0.33
0.33 1.00 0.33 0.55
0.55 0.33 1.00 0.11
0.33 0.55 0.11 0.77

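A NumPy sketch of the same 2x2, stride-2 max pooling (illustrative code, not from the slides); because 7 does not divide evenly by 2, the last window in each row and column is smaller, which is how the 7x7 map shrinks to 4x4:

```python
# NumPy sketch of 2x2 max pooling with stride 2. Applying it to the rectified 7x7
# map above yields exactly the 4x4 result shown on the slide.
import numpy as np

def max_pool(fmap, size=2, stride=2):
    H, W = fmap.shape
    out_h = (H + stride - 1) // stride
    out_w = (W + stride - 1) // stride
    pooled = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            window = fmap[i*stride:i*stride + size, j*stride:j*stride + size]
            pooled[i, j] = window.max()   # keep the largest value in each window
    return pooled

print(max_pool(np.random.rand(7, 7)).shape)   # (4, 4)
```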
Pooling Layer: Average Pooling

 Average pooling is an alternative to max pooling: each 2x2 window is replaced by the average of its values instead of the maximum.

Pooling

 Applying 2x2 max pooling (stride 2) to each of the three rectified 7x7 feature maps gives three 4x4 pooled maps:

1.00 0.33 0.55 0.33    0.55 0.33 0.55 0.33    0.33 0.55 1.00 0.77
0.33 1.00 0.33 0.55    0.33 1.00 0.55 0.11    0.55 0.55 1.00 0.33
0.55 0.33 1.00 0.11    0.55 0.55 0.55 0.11    1.00 1.00 0.11 0.55
0.33 0.55 0.11 0.77    0.33 0.11 0.11 0.33    0.77 0.33 0.55 0.33

Layers get stacked

 A complete pass takes the original 9x9 image through convolution, ReLU and pooling, turning it into a stack of three 4x4 feature maps.

[Figure: the 9x9 'X' image on the left and the three 4x4 pooled feature maps it produces on the right.]

Layers Get Stacked - Example

224x224x3 input  →  convolution with 64 filters  →  224x224x64  →  pooling (downsampling)  →  112x112x64

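A quick shape check of this example in PyTorch; the 3x3 kernel with 'same' padding is an assumption, since the slide only specifies 64 filters and 2x2 downsampling:

```python
# Shape check for the 224x224x3 -> 224x224x64 -> 112x112x64 example.
import torch
import torch.nn as nn

x = torch.randn(1, 3, 224, 224)                    # 224 x 224 x 3 input
conv = nn.Conv2d(3, 64, kernel_size=3, padding=1)  # 64 filters, spatial size preserved
pool = nn.MaxPool2d(kernel_size=2, stride=2)

h = torch.relu(conv(x))
print(h.shape)         # torch.Size([1, 64, 224, 224])
print(pool(h).shape)   # torch.Size([1, 64, 112, 112])
```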
Deep stacking

 Convolution, ReLU and pooling layers can be repeated ("deep stacking"), shrinking the maps further; in the 'X' example the three 4x4 maps reduce to three 2x2 maps:

1.00 0.55    1.00 0.55    0.55 1.00
0.55 1.00    0.55 0.55    1.00 0.55

Fully connected layer

 Fully connected layers are the normal flat feed-forward neural network layers.
 These layers may have a non-linear activation function or a softmax activation in order to predict classes.
 To compute the output, we simply re-arrange the final output matrices (the three 2x2 maps) into a 1-D array of 12 values.

Fully connected layer

 A summation of products of the inputs and weights at each output node determines the final prediction.

[Figure: the 12-element feature vector fully connected to the two output nodes 'X' and 'O'; the node with the larger weighted sum gives the predicted class.]

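A NumPy sketch of this step (the weights below are random stand-ins, not the trained values from the slides): the three 2x2 pooled maps are flattened into a 12-element vector, and each output node computes a weighted sum of it.

```python
# Fully connected step: flatten the pooled maps, then weighted sums per class.
import numpy as np

rng = np.random.default_rng(0)
pooled = rng.random((3, 2, 2))     # 3 filters x 2 x 2 after deep stacking
features = pooled.reshape(-1)      # 1-D array of 12 values

W = rng.standard_normal((2, 12))   # one weight row per class ('X', 'O')
b = np.zeros(2)
scores = W @ features + b          # summation of products at each output node
print(dict(zip(["X", "O"], scores)))
```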
Putting it all together

[Figure: the full pipeline — the 9x9 input image passes through convolution, ReLU and pooling layers (twice), is flattened into a 1-D vector, and the fully connected layer outputs scores for 'X' and 'O'.]

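Putting the same pipeline into code, here is a hedged end-to-end sketch in PyTorch. The layer sizes follow the slides (three 3x3 filters, 2x2 max pooling, two classes 'X'/'O'); the framework, the padding on the second convolution and the untrained random weights are illustrative assumptions.

```python
# End-to-end sketch of the X/O pipeline.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(1, 3, kernel_size=3),               # 1x9x9  -> 3x7x7 (three 3x3 filters)
    nn.ReLU(),
    nn.MaxPool2d(2, stride=2, ceil_mode=True),    # 3x7x7  -> 3x4x4
    nn.Conv2d(3, 3, kernel_size=3, padding=1),    # 3x4x4  -> 3x4x4 (deep stacking)
    nn.ReLU(),
    nn.MaxPool2d(2, stride=2),                    # 3x4x4  -> 3x2x2
    nn.Flatten(),                                 # -> 12-element vector
    nn.Linear(12, 2),                             # class scores for 'X' and 'O'
)

x = torch.randn(1, 1, 9, 9)      # one single-channel 9x9 image of -1/+1 values
print(model(x).shape)            # torch.Size([1, 2])
```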
Example of CNN: ReLU Function, Pooling Layer, Stacking Up the Layers

[Figure-only slides.]

Example of ConvNet

Output size for a 37x37 input with a 5x5 filter, padding P = 0 and stride S = 2 (see the helper function below):
(W – F + 2P)/S + 1 = (37 – 5 + 2×0)/2 + 1 = 16 + 1 = 17
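The same feature-map size formula as a small helper function (illustrative):

```python
# Feature-map size: O = (W - F + 2P)/S + 1
def conv_out_size(W, F, P=0, S=1):
    return (W - F + 2 * P) // S + 1

print(conv_out_size(37, 5, P=0, S=2))   # (37 - 5 + 0)/2 + 1 = 16 + 1 = 17
print(conv_out_size(9, 3, P=0, S=1))    # 7, the earlier 9x9 / 3x3 example
```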
Different CNN Architectures

LeNet-5
AlexNet
VGG-19
ResNet
Sample Exercises:

CNN architecture for image classification: The input images are RGB images with
dimensions 128x128 pixels.

Design a CNN architecture with the following components:


• Two convolutional layers with 3x3 filters, ReLU activation, and 32 filters each.
• Max pooling (2x2) after each convolutional layer.
• A fully connected layer with 128 neurons and ReLU activation.
• Output layer with 10 neurons and softmax activation for multiclass classification.
• Calculate the total number of parameters in the convolutional layers, the fully
connected layer, and the entire network. Draw the architecture and show your
calculations step by step (a sketch of the calculation follows below).
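One possible way to organize the parameter-count part of this exercise (a sketch, not an official solution; it assumes 'same' padding so that only the pooling layers change the spatial size, 128 → 64 → 32):

```python
# Parameter counts: conv = (F*F*C_in + 1)*C_out, fully connected = (n_in + 1)*n_out.
def conv_params(f, c_in, c_out):
    return (f * f * c_in + 1) * c_out     # +1 bias per filter

def fc_params(n_in, n_out):
    return (n_in + 1) * n_out             # +1 bias per output neuron

conv1 = conv_params(3, 3, 32)             # 3x3x3 filters, 32 of them   ->    896
conv2 = conv_params(3, 32, 32)            # 3x3x32 filters, 32 of them  ->  9,248
fc1   = fc_params(32 * 32 * 32, 128)      # flattened 32x32x32 volume -> 128 neurons
out   = fc_params(128, 10)                # 128 -> 10 class scores
print(conv1, conv2, fc1, out, conv1 + conv2 + fc1 + out)
```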
Hyperparameters (knobs)

 Convolution
– Filter Size
– Number of Filters
– Padding
– Stride
 Pooling
– Window Size
– Stride
 Fully Connected
– Number of neurons

Case Studies

 LeNet – 1998
 AlexNet – 2012
 ZFNet – 2013
 VGG – 2014
 GoogLeNet – 2014
 ResNet – 2015

 The ImageNet Large Scale Visual Recognition Challenge (ILSVRC)

Deep vs Shallow Networks
What happens when we continue stacking deeper layers on a "plain" convolutional neural network?

[Plot: training error and test error vs. iterations for a 20-layer and a 56-layer plain network; the 56-layer curves sit above the 20-layer curves in both plots.]

The 56-layer model performs worse on both training and test error.
-> The deeper model performs worse, but this is not caused by overfitting!
Deeper models are harder to optimize.
The deeper model should be able to perform at least as well as the shallower model: a solution by construction is copying the learned layers from the shallower model and setting the additional layers to identity mappings.
Challenges
• Deeper neural networks start to degrade in performance.
• Vanishing/exploding gradients – may require extremely careful parameter initialization to make training work, and can still occur even with the best initialization.
• Long training times – due to the very large number of training parameters.
Partial Solutions for Vanishing/Exploding Gradients
• Batch normalization – rescales the activations over each mini-batch.
• Smart initialization of weights – for example, Xavier initialization.
• Training portions of the network individually.
Related Prior Work - Highway networks

• Adding features from previous layers (or time steps) has been used in various tasks.
• Most notable of these are Highway networks, proposed by Srivastava et al.
• Highway networks feature gated skip connections. Residual networks have the form
  y = f(x) + x
• Highway networks have the form
  y = f(x) · sigmoid(Wx + b) + x · (1 − sigmoid(Wx + b))
Highway networks – cont.
Highway networks enable information flow from earlier layers, but the gating function can attenuate or close that shortcut, unlike the always-open identity shortcut used in residual networks (compare the two forms in the sketch below).
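A tiny NumPy comparison of the two forms above (W, b and the tanh stand-in for f(x) are arbitrary illustrative choices):

```python
# Residual vs. highway connection: the residual shortcut always passes x through,
# while the highway gate scales it by 1 - sigmoid(Wx + b).
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
x = rng.standard_normal(4)
W = rng.standard_normal((4, 4))
b = rng.standard_normal(4)
f = np.tanh(x)                            # stand-in for the layer's transformation f(x)

y_residual = f + x                        # y = f(x) + x
gate = sigmoid(W @ x + b)                 # T(x) = sigmoid(Wx + b)
y_highway = f * gate + x * (1 - gate)     # y = f(x)*T(x) + x*(1 - T(x))
print(y_residual)
print(y_highway)
```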
ResNet

• A specialized network introduced by Microsoft.
• Connects the input of a layer to a later part of the network, creating "shortcuts".
• A simple idea that yields great improvements in both performance and training time.
Plain Network

a[l]  →  a[l+1]  →  a[l+2]

z[l+1] = W[l+1] a[l] + b[l+1]        ("linear")
a[l+1] = g(z[l+1])                   ("relu")
z[l+2] = W[l+2] a[l+1] + b[l+2]      ("linear")
a[l+2] = g(z[l+2])                   ("relu on output")
Residual Blocks

a[l]  →  a[l+1]  →  a[l+2]   (with a shortcut carrying a[l] to the second activation)

z[l+1] = W[l+1] a[l] + b[l+1]        ("linear")
a[l+1] = g(z[l+1])                   ("relu")
z[l+2] = W[l+2] a[l+1] + b[l+2]      ("linear")
a[l+2] = g(z[l+2] + a[l])            ("relu on output plus input")
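The same residual block written as a PyTorch module (an illustrative sketch: the slides use generic linear layers W·a + b, while actual ResNets use convolutions; an identity shortcut is assumed, so input and output channel counts match):

```python
# Minimal residual block following the equations above.
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.relu = nn.ReLU()

    def forward(self, a):
        z1 = self.conv1(a)           # z[l+1] = W[l+1] a[l] + b[l+1]
        a1 = self.relu(z1)           # a[l+1] = g(z[l+1])
        z2 = self.conv2(a1)          # z[l+2] = W[l+2] a[l+1] + b[l+2]
        return self.relu(z2 + a)     # a[l+2] = g(z[l+2] + a[l])

block = ResidualBlock(64)
print(block(torch.randn(1, 64, 8, 8)).shape)   # torch.Size([1, 64, 8, 8])
```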
Skip Connections “shortcuts”

• Such connections are referred to as skip connections or shortcuts; in general, similar models can skip over several layers.
• The residual part of the network is treated as a unit with an input and an output.
• The unit's input is added directly to its output – the dimensions are usually the same.
• Another option is to use a projection to match the output space.
• With identity shortcuts, no additional training parameters are used.
Residual Blocks (skip connections)
Deeper Bottleneck Architecture (Cont.)

• Addresses the high training time of very deep networks.
• Keeps the time complexity the same as the two-layer convolution block.
• Allows the number of layers to be increased.
• Allows the model to converge much faster.
• The 152-layer ResNet has 11.3 billion FLOPs, while the VGG-16/19 nets have 15.3/19.6 billion FLOPs.
Why Do ResNets Work Well?

• Having a "regular" network that is very deep might actually hurt performance because of vanishing and exploding gradients.
• In most cases, ResNets will simply stop improving rather than degrade in performance.
• a[l+2] = g(z[l+2] + a[l]) = g(W[l+2] a[l+1] + b[l+2] + a[l])
• If the layer is not "useful", L2 regularization will bring its parameters very close to zero, resulting in a[l+2] = g(a[l]) = a[l] (when using ReLU).
Why Do ResNets Work Well? (Cont)

• In theory a ResNet has the same expressive power as the corresponding plain network, but in practice, because of the above, convergence is much faster.
• No additional training parameters are introduced.
• No additional complexity is introduced.
Training ResNet in practice

• Batch Normalization after every CONV layer.


• Xavier/2 initialization from He et al.
• SGD + Momentum (0.9)
• Learning rate: 0.1, divided by 10 when validation
error plateaus.
• Mini-batch size 256.
• Weight decay of 1e-5.
• No dropout used.
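As a sketch, this recipe maps onto PyTorch roughly as follows (the framework and the ReduceLROnPlateau scheduler are assumptions; the slides only state the hyperparameter values):

```python
# Rough translation of the training recipe above.
import torch

model = torch.nn.Linear(10, 10)    # placeholder; use the actual ResNet here
optimizer = torch.optim.SGD(model.parameters(),
                            lr=0.1,              # initial learning rate
                            momentum=0.9,        # SGD + momentum
                            weight_decay=1e-5)   # weight decay from the slide
# divide the learning rate by 10 when validation error plateaus:
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, factor=0.1)
# per epoch, after validation: scheduler.step(val_error)
```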
Loss Function

• For measuring the loss of the model, a combination of softmax and cross-entropy was used.
• The network's outputs are normalized with the softmax function, and the cross-entropy loss is computed on the result.
Reduce Learning Time with Random Layer Drops

• Layers are dropped during training, and the full network is used at test time.
• Residual blocks are used as the network's building blocks.
• During training, the input flows through both the shortcut and the weighted path.
• Training: each block has a "survival probability" and is randomly dropped.
• Testing: all blocks are kept active, and each block's output is re-calibrated according to its survival probability from training.
Thank you!!
