
Lecture 7:

Convolutional Networks

Justin Johnson Lecture 7 - 1 January 31, 2022




Lecture Format
- We will remain remote for at least another 2-3 weeks
- Idea: book a conference room for “watch parties?”
Or just use lecture hall
- COVID cases in MI have (hopefully!) peaked? If they continue to drop, we will consider in-person OH in the next 1-2 weeks
- May revisit after Spring Break
- Feel free to raise hand to ask questions in Zoom!
- Midterm will be remote (but still working on exact format)

Justin Johnson Lecture 7 - 4 January 31, 2022


Reminder: A2

Due last Friday

Justin Johnson Lecture 7 - 5 January 31, 2022


A3

Will be released tonight, covering:

- Backpropagation with modular API


- Different update rules (Momentum, RMSProp, Adam, etc)
- Batch Normalization
- Dropout
- Convolutional Networks

Justin Johnson Lecture 7 - 6 January 31, 2022


Last Time: Backpropagation
Represent complex expressions as computational graphs.
During the backward pass, each node in the graph receives upstream gradients and multiplies them by local gradients to compute downstream gradients.

[Figure: computational graph of a linear classifier. x and W feed a multiply node (*) producing scores s, which feed a hinge loss; W also feeds a regularizer R; the two terms are added (+) to give the loss L. A generic node f receives an upstream gradient, multiplies it by local gradients, and emits downstream gradients.]

Forward pass computes outputs
Backward pass computes gradients

Justin Johnson Lecture 7 - 7 January 31, 2022


f(x,W) = Wx
Problem: So far our classifiers don't respect the spatial structure of images!

[Figure: stretch pixels into a column. A 2x2 input image with pixel values 56, 231, 24, 2 is flattened into a (4,) column; in general a 3072-dim input x passes through W1 to a hidden layer h of size 100 and through W2 to scores s of size 10.]

Justin Johnson Lecture 7 - 8 January 31, 2022


f(x,W) = Wx
Problem: So far our classifiers don't respect the spatial structure of images!
Solution: Define new computational nodes that operate on images!

[Figure: same pixel-stretching diagram as above.]
Justin Johnson Lecture 7 - 9 January 31, 2022


Components of a Fully-Connected Network
Fully-Connected Layers Activation Function

x h s

Justin Johnson Lecture 7 - 10 January 31, 2022


Components of a Convolutional Network
Fully-Connected Layers Activation Function

x h s

Convolution Layers Pooling Layers Normalization

x̂ᵢⱼ = (xᵢⱼ − μⱼ) / √(σⱼ² + ε)

Justin Johnson Lecture 7 - 11 January 31, 2022




Fully-Connected Layer
32x32x3 image -> stretch to 3072 x 1

Input: 3072 x 1 vector
Weights: 10 x 3072 matrix
Output: 10 x 1 vector

Justin Johnson Lecture 7 - 13 January 31, 2022


Fully-Connected Layer
32x32x3 image -> stretch to 3072 x 1

Input: 3072 x 1 vector
Weights: 10 x 3072 matrix
Output: 10 x 1 vector
Each output element is 1 number: the result of taking a dot product between a row of W and the input (a 3072-dimensional dot product)

Justin Johnson Lecture 7 - 14 January 31, 2022


Convolution Layer
3x32x32 image: preserve spatial structure (3 depth / channels, 32 height, 32 width)

Justin Johnson Lecture 7 - 15 January 31, 2022
Convolution Layer
3x32x32 image
3x5x5 filter
Convolve the filter with the image, i.e. "slide over the image spatially, computing dot products"

Justin Johnson Lecture 7 - 16 January 31, 2022
Convolution Layer
Filters always extend the full depth of the input volume
3x32x32 image
3x5x5 filter
Convolve the filter with the image, i.e. "slide over the image spatially, computing dot products"

Justin Johnson Lecture 7 - 17 January 31, 2022
Convolution Layer
3x32x32 image
3x5x5 filter
1 number: the result of taking a dot product between the filter and a small 3x5x5 chunk of the image
(i.e. 3*5*5 = 75-dimensional dot product + bias): w^T x + b

Justin Johnson Lecture 7 - 18 January 31, 2022
Convolution Layer
3x32x32 image
3x5x5 filter
Convolve (slide) over all spatial locations, producing a 1x28x28 activation map

Justin Johnson Lecture 7 - 19 January 31, 2022


Convolution Layer
3x32x32 image
3x5x5 filter
Consider repeating with a second (green) filter: convolve (slide) over all spatial locations, producing two 1x28x28 activation maps

Justin Johnson Lecture 7 - 20 January 31, 2022


Convolution Layer
3x32x32 image
Consider 6 filters, each 3x5x5 (a 6x3x5x5 filter tensor)
Stack activations to get a 6x28x28 output image: 6 activation maps, each 1x28x28

Justin Johnson Lecture 7 - 21 January 31, 2022
Convolution Layer
3x32x32 image
6x3x5x5 filters; also a 6-dim bias vector
Stack activations to get a 6x28x28 output image: 6 activation maps, each 1x28x28

Justin Johnson Lecture 7 - 22 January 31, 2022
Convolution Layer
Another view of the output: a 28x28 grid, with a 6-dim vector at each point

Justin Johnson Lecture 7 - 23 January 31, 2022
Convolution Layer
Batch of images: 2x3x32x32
6x3x5x5 filters; also a 6-dim bias vector
Batch of outputs: 2x6x28x28

Justin Johnson Lecture 7 - 24 January 31, 2022


Convolution Layer
Batch of images: N x Cin x H x W
Filters: Cout x Cin x Kh x Kw, plus a Cout-dim bias vector
Batch of outputs: N x Cout x H' x W'

Justin Johnson Lecture 7 - 25 January 31, 2022


Stacking Convolutions

Input: N x 3 x 32 x 32
Conv W1: 6x3x5x5, b1: 6 -> First hidden layer: N x 6 x 28 x 28
Conv W2: 10x6x3x3, b2: 10 -> Second hidden layer: N x 10 x 26 x 26
Conv W3: 12x10x3x3, b3: 12 -> ...

Justin Johnson Lecture 7 - 26 January 31, 2022
Stacking Convolutions
Q: What happens if we stack two convolution layers?
(Same stack of convolutions as above.)

Justin Johnson Lecture 7 - 27 January 31, 2022
Stacking Convolutions
Q: What happens if we stack two convolution layers?
A: We get another convolution! (Recall y = W2 W1 x is a linear classifier)

Justin Johnson Lecture 7 - 28 January 31, 2022
Stacking Convolutions
A: We get another convolution!
Solution: Add activation function between conv layers:

Input: N x 3 x 32 x 32
Conv W1: 6x3x5x5, b1: 6 -> ReLU -> First hidden layer: N x 6 x 28 x 28
Conv W2: 10x6x3x3, b2: 10 -> ReLU -> Second hidden layer: N x 10 x 26 x 26
Conv W3: 12x10x3x3, b3: 12 -> ReLU -> ...

Justin Johnson Lecture 7 - 29 January 31, 2022
What do convolutional filters learn?
(Conv -> ReLU stack as above; see the shape-check sketch below.)
Justin Johnson Lecture 7 - 30 January 31, 2022
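A minimal PyTorch sketch of the stack above (layer sizes follow the slide; the code itself is illustrative, not from the lecture), just to confirm the intermediate shapes:

import torch
import torch.nn as nn

# Shapes follow the slide: N x 3 x 32 x 32 -> N x 6 x 28 x 28 -> N x 10 x 26 x 26
x = torch.randn(2, 3, 32, 32)            # batch of N=2 images
conv1 = nn.Conv2d(3, 6, kernel_size=5)   # W1: 6x3x5x5, b1: 6
conv2 = nn.Conv2d(6, 10, kernel_size=3)  # W2: 10x6x3x3, b2: 10
relu = nn.ReLU()

h1 = relu(conv1(x))
h2 = relu(conv2(h1))
print(h1.shape)  # torch.Size([2, 6, 28, 28])
print(h2.shape)  # torch.Size([2, 10, 26, 26])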
What do convolutional filters learn?
Linear classifier: One template per class

Justin Johnson Lecture 7 - 31 January 31, 2022
What do convolutional filters learn?
MLP: Bank of whole-image templates

Justin Johnson Lecture 7 - 32 January 31, 2022
What do convolutional filters learn?
First-layer conv filters: local image templates
(Often learns oriented edges, opposing colors)
AlexNet: 64 filters, each 3x11x11

Justin Johnson Lecture 7 - 33 January 31, 2022


A closer look at spatial dimensions

(Consider the first conv layer from above: input N x 3 x 32 x 32, Conv with 6x3x5x5 filters and b1: 6, followed by ReLU, output N x 6 x 28 x 28.)
Justin Johnson Lecture 7 - 34 January 31, 2022
A closer look at spatial dimensions
Input: 7x7
Filter: 3x3

Justin Johnson Lecture 7 - 35 January 31, 2022
A closer look at spatial dimensions
Input: 7x7
Filter: 3x3
Output: 5x5

Justin Johnson Lecture 7 - 39 January 31, 2022
A closer look at spatial dimensions
Input: 7x7
Filter: 3x3
Output: 5x5
In general:
Input: W
Filter: K
Output: W – K + 1
Problem: Feature maps "shrink" with each layer!
Justin Johnson Lecture 7 - 40 January 31, 2022
A closer look at spatial dimensions
Input: 7x7
Filter: 3x3
Output: 5x5
In general:
Input: W
Filter: K
Output: W – K + 1
Problem: Feature maps "shrink" with each layer!
Solution: padding. Add zeros around the input.
[Figure: the 7x7 input surrounded by a border of zeros]
Justin Johnson Lecture 7 - 41 January 31, 2022
A closer look at spatial dimensions
Input: 7x7
Filter: 3x3
Output: 5x5
In general:
Input: W
Filter: K
Padding: P
Output: W – K + 1 + 2P
Very common: Set P = (K – 1) / 2 to make output have same size as input!

Justin Johnson Lecture 7 - 42 January 31, 2022


Receptive Fields
For convolution with kernel size K, each element in the
output depends on a K x K receptive field in the input


Justin Johnson Lecture 7 - 43 January 31, 2022


Receptive Fields
Each successive convolution adds K – 1 to the receptive field size
With L layers the receptive field size is 1 + L * (K – 1)

Be careful: "receptive field in the input" vs "receptive field in the previous layer".
Hopefully clear from context!

Justin Johnson Lecture 7 - 44 January 31, 2022
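A quick sketch of the 1 + L * (K – 1) growth rule (plain Python, assuming stride-1 convolutions; the helper name is my own):

def receptive_field(num_layers, kernel_size):
    """Receptive field in the input after stacking stride-1 convolutions."""
    return 1 + num_layers * (kernel_size - 1)

# Each 3x3 conv adds K - 1 = 2 to the receptive field:
for L in [1, 2, 3, 10]:
    print(L, receptive_field(L, 3))  # 1 -> 3, 2 -> 5, 3 -> 7, 10 -> 21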


Receptive Fields
Each successive convolution adds K – 1 to the receptive field size
With L layers the receptive field size is 1 + L * (K – 1)

Problem: For large images we need many layers for each output to "see" the whole image

Justin Johnson Lecture 7 - 45 January 31, 2022


Receptive Fields
Each successive convolution adds K – 1 to the receptive field size
With L layers the receptive field size is 1 + L * (K – 1)

Problem: For large images we need many layers for each output to "see" the whole image
Solution: Downsample inside the network

Justin Johnson Lecture 7 - 46 January 31, 2022


Strided Convolution
Input: 7x7
Filter: 3x3
Stride: 2

Justin Johnson Lecture 7 - 47 January 31, 2022




Strided Convolution
Input: 7x7
Filter: 3x3
Stride: 2
Output: 3x3

Justin Johnson Lecture 7 - 49 January 31, 2022


Strided Convolution
Input: 7x7
Filter: 3x3
Stride: 2
Output: 3x3
In general:
Input: W
Filter: K
Padding: P
Stride: S
Output: (W – K + 2P) / S + 1
Justin Johnson Lecture 7 - 50 January 31, 2022
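A small helper (a sketch; the function name is my own) that implements the output-size formula and reproduces the 7x7 / 3x3 / stride-2 example above:

def conv_output_size(W, K, P, S):
    """Spatial output size of a convolution: (W - K + 2P) / S + 1."""
    return (W - K + 2 * P) // S + 1

print(conv_output_size(W=7, K=3, P=0, S=2))  # 3, matching the slide
print(conv_output_size(W=7, K=3, P=1, S=1))  # 7, "same" padding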
Convolution Example

Input volume: 3 x 32 x 32
10 5x5 filters with stride 1, pad 2

Output volume size: ?

Justin Johnson Lecture 7 - 51 January 31, 2022


Convolution Example

Input volume: 3 x 32 x 32
10 5x5 filters with stride 1, pad 2

Output volume size:


(32+2*2-5)/1+1 = 32 spatially, so
10 x 32 x 32

Justin Johnson Lecture 7 - 52 January 31, 2022


Convolution Example

Input volume: 3 x 32 x 32
10 5x5 filters with stride 1, pad 2

Output volume size: 10 x 32 x 32


Number of learnable parameters: ?

Justin Johnson Lecture 7 - 53 January 31, 2022


Convolution Example

Input volume: 3 x 32 x 32
10 5x5 filters with stride 1, pad 2

Output volume size: 10 x 32 x 32


Number of learnable parameters: 760
Parameters per filter: 3*5*5 + 1 (for bias) = 76
10 filters, so total is 10 * 76 = 760

Justin Johnson Lecture 7 - 54 January 31, 2022


Convolution Example

Input volume: 3 x 32 x 32
10 5x5 filters with stride 1, pad 2

Output volume size: 10 x 32 x 32


Number of learnable parameters: 760
Number of multiply-add operations: ?

Justin Johnson Lecture 7 - 55 January 31, 2022


Convolution Example

Input volume: 3 x 32 x 32
10 5x5 filters with stride 1, pad 2

Output volume size: 10 x 32 x 32


Number of learnable parameters: 760
Number of multiply-add operations: 768,000
10*32*32 = 10,240 outputs; each output is the inner product
of two 3x5x5 tensors (75 elems); total = 75*10240 = 768K
Justin Johnson Lecture 7 - 56 January 31, 2022
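A sketch that re-derives these numbers, and (assuming PyTorch is available) cross-checks the parameter count against nn.Conv2d:

import torch.nn as nn

Cin, H, W = 3, 32, 32
Cout, K, P, S = 10, 5, 2, 1

H_out = (H - K + 2 * P) // S + 1               # 32
W_out = (W - K + 2 * P) // S + 1               # 32
params = Cout * (Cin * K * K + 1)              # 10 * (75 + 1) = 760
macs = (Cout * H_out * W_out) * (Cin * K * K)  # 10,240 outputs * 75 = 768,000

print(H_out, W_out, params, macs)              # 32 32 760 768000

# Cross-check the learnable parameter count with PyTorch:
conv = nn.Conv2d(Cin, Cout, kernel_size=K, stride=S, padding=P)
print(sum(p.numel() for p in conv.parameters()))  # 760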
Example: 1x1 Convolution
Input: 64 x 56 x 56
1x1 CONV with 32 filters (each filter has size 1x1x64, and performs a 64-dimensional dot product)
Output: 32 x 56 x 56

Justin Johnson Lecture 7 - 57 January 31, 2022


Example: 1x1 Convolution
Input: 64 x 56 x 56
1x1 CONV with 32 filters (each filter has size 1x1x64, and performs a 64-dimensional dot product)
Output: 32 x 56 x 56
Stacking 1x1 conv layers gives an MLP operating on each input position
Lin et al, "Network in Network", ICLR 2014

Justin Johnson Lecture 7 - 58 January 31, 2022
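A minimal check (a sketch assuming PyTorch; the weight-copying is my own illustration) that a 1x1 convolution is the same as applying one fully-connected layer independently at every spatial position:

import torch
import torch.nn as nn

x = torch.randn(1, 64, 56, 56)            # 64 x 56 x 56 input
conv = nn.Conv2d(64, 32, kernel_size=1)   # 1x1 conv with 32 filters

fc = nn.Linear(64, 32)                    # same weights viewed as a linear layer
fc.weight.data = conv.weight.data.view(32, 64)
fc.bias.data = conv.bias.data

y_conv = conv(x)                                          # 1 x 32 x 56 x 56
y_fc = fc(x.permute(0, 2, 3, 1)).permute(0, 3, 1, 2)      # apply per position
print(torch.allclose(y_conv, y_fc, atol=1e-5))            # True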


Convolution Summary
Input: Cin x H x W
Hyperparameters:
- Kernel size: KH x KW
- Number filters: Cout
- Padding: P
- Stride: S
Weight matrix: Cout x Cin x KH x KW
giving Cout filters of size Cin x KH x KW
Bias vector: Cout
Output size: Cout x H’ x W’ where:
- H’ = (H – K + 2P) / S + 1
- W’ = (W – K + 2P) / S + 1
Justin Johnson Lecture 7 - 59 January 31, 2022
Convolution Summary
Input: Cin x H x W
Hyperparameters:
- Kernel size: KH x KW
- Number filters: Cout
- Padding: P
- Stride: S
Weight matrix: Cout x Cin x KH x KW, giving Cout filters of size Cin x KH x KW
Bias vector: Cout
Output size: Cout x H' x W' where:
- H' = (H – K + 2P) / S + 1
- W' = (W – K + 2P) / S + 1
Common settings:
- KH = KW (Small square filters)
- P = (K – 1) / 2 ("Same" padding)
- Cin, Cout = 32, 64, 128, 256 (powers of 2)
- K = 3, P = 1, S = 1 (3x3 conv)
- K = 5, P = 2, S = 1 (5x5 conv)
- K = 1, P = 0, S = 1 (1x1 conv)
- K = 3, P = 1, S = 2 (Downsample by 2)
Justin Johnson Lecture 7 - 60 January 31, 2022
Other types of convolution
So far: 2D Convolution
Input: Cin x H x W
Weights: Cout x Cin x K x K

Justin Johnson Lecture 7 - 61 January 31, 2022
Other types of convolution
So far: 2D Convolution
Input: Cin x H x W
Weights: Cout x Cin x K x K

1D Convolution
Input: Cin x W
Weights: Cout x Cin x K
Justin Johnson Lecture 7 - 62 January 31, 2022
Other types of convolution
So far: 2D Convolution
Input: Cin x H x W
Weights: Cout x Cin x K x K

3D Convolution (a Cin-dim vector at each point in the volume)
Input: Cin x H x W x D
Weights: Cout x Cin x K x K x K
Justin Johnson Lecture 7 - 63 January 31, 2022
PyTorch Convolution Layer

Justin Johnson Lecture 7 - 64 January 31, 2022


PyTorch Convolution Layers

Justin Johnson Lecture 7 - 65 January 31, 2022
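The documentation screenshots are not reproduced here; as a stand-in, a hedged sketch of the corresponding torch.nn layers (1D, 2D, and 3D convolution; the specific channel counts are my own choices) and their input/output shapes:

import torch
import torch.nn as nn

conv1d = nn.Conv1d(in_channels=16, out_channels=32, kernel_size=3, padding=1)
conv2d = nn.Conv2d(in_channels=3,  out_channels=6,  kernel_size=5)
conv3d = nn.Conv3d(in_channels=3,  out_channels=8,  kernel_size=3)

print(conv1d(torch.randn(2, 16, 100)).shape)        # (2, 32, 100): Cin x W input
print(conv2d(torch.randn(2, 3, 32, 32)).shape)      # (2, 6, 28, 28): Cin x H x W input
print(conv3d(torch.randn(2, 3, 16, 16, 16)).shape)  # (2, 8, 14, 14, 14): Cin x H x W x D input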


Components of a Convolutional Network
Fully-Connected Layers Activation Function

x h s

Convolution Layers Pooling Layers Normalization

x̂ᵢⱼ = (xᵢⱼ − μⱼ) / √(σⱼ² + ε)

Justin Johnson Lecture 7 - 66 January 31, 2022


Pooling Layers: Another way to downsample
64 x 224 x 224 -> 64 x 112 x 112

Hyperparameters:
Kernel Size
Stride
Pooling function

Justin Johnson Lecture 7 - 67 January 31, 2022


Max Pooling
Single depth slice of the input, x:
1 1 2 4
5 6 7 8
3 2 1 0
1 2 3 4
Max pooling with 2x2 kernel size and stride 2 gives y:
6 8
3 4
Introduces invariance to small spatial shifts
No learnable parameters!
Justin Johnson Lecture 7 - 68 January 31, 2022
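A sketch (assuming PyTorch) that reproduces the 4x4 example above with a 2x2 max pool of stride 2:

import torch
import torch.nn.functional as F

x = torch.tensor([[1., 1., 2., 4.],
                  [5., 6., 7., 8.],
                  [3., 2., 1., 0.],
                  [1., 2., 3., 4.]]).view(1, 1, 4, 4)  # N x C x H x W

y = F.max_pool2d(x, kernel_size=2, stride=2)
print(y.view(2, 2))
# tensor([[6., 8.],
#         [3., 4.]])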
Pooling Summary
Input: C x H x W
Hyperparameters:
- Kernel size: K
- Stride: S
- Pooling function (max, avg)
Output: C x H' x W' where
- H' = (H – K) / S + 1
- W' = (W – K) / S + 1
Learnable parameters: None!
Common settings:
- max, K = 2, S = 2
- max, K = 3, S = 2 (AlexNet)
Justin Johnson Lecture 7 - 69 January 31, 2022
Components of a Convolutional Network
Fully-Connected Layers Activation Function

x h s

Convolution Layers Pooling Layers Normalization

x̂ᵢⱼ = (xᵢⱼ − μⱼ) / √(σⱼ² + ε)

Justin Johnson Lecture 7 - 70 January 31, 2022


Convolutional Networks
Classic architecture: [Conv, ReLU, Pool] x N, flatten, [FC, ReLU] x N, FC

Example: LeNet-5

Lecun et al, “Gradient-based learning applied to document recognition”, 1998

Justin Johnson Lecture 7 - 71 January 31, 2022


Example: LeNet-5
Layer Output Size Weight Size
Input 1 x 28 x 28
Conv (Cout=20, K=5, P=2, S=1) 20 x 28 x 28 20 x 1 x 5 x 5
ReLU 20 x 28 x 28
MaxPool(K=2, S=2) 20 x 14 x 14
Conv (Cout=50, K=5, P=2, S=1) 50 x 14 x 14 50 x 20 x 5 x 5
ReLU 50 x 14 x 14
MaxPool(K=2, S=2) 50 x 7 x 7
Flatten 2450
Linear (2450 -> 500) 500 2450 x 500
ReLU 500
Linear (500 -> 10) 10 500 x 10
Lecun et al, “Gradient-based learning applied to document recognition”, 1998

Justin Johnson Lecture 7 - 72 January 31, 2022
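A hedged PyTorch sketch of this LeNet-5 variant, following the table above (layer sizes from the slide; an illustrative re-implementation, not the original 1998 model):

import torch
import torch.nn as nn

lenet5 = nn.Sequential(
    nn.Conv2d(1, 20, kernel_size=5, padding=2, stride=1),   # 20 x 28 x 28
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=2, stride=2),                  # 20 x 14 x 14
    nn.Conv2d(20, 50, kernel_size=5, padding=2, stride=1),  # 50 x 14 x 14
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=2, stride=2),                  # 50 x 7 x 7
    nn.Flatten(),                                           # 2450
    nn.Linear(2450, 500),
    nn.ReLU(),
    nn.Linear(500, 10),
)

scores = lenet5(torch.randn(1, 1, 28, 28))
print(scores.shape)  # torch.Size([1, 10])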






Example: LeNet-5
As we go through the network:
- Spatial size decreases (using pooling or strided conv)
- Number of channels increases (total "volume" is preserved!)
Some modern architectures break this trend -- stay tuned!
Lecun et al, "Gradient-based learning applied to document recognition", 1998
Justin Johnson Lecture 7 - 81 January 31, 2022
Problem: Deep Networks very hard to train!

Justin Johnson Lecture 7 - 82 January 31, 2022


Components of a Convolutional Network
Fully-Connected Layers Activation Function

x h s

Convolution Layers Pooling Layers Normalization

x̂ᵢⱼ = (xᵢⱼ − μⱼ) / √(σⱼ² + ε)

Justin Johnson Lecture 7 - 83 January 31, 2022


Batch Normalization
Idea: “Normalize” the outputs of a layer so they have zero mean
and unit variance

Why? Helps reduce “internal covariate shift”, improves optimization

We can normalize a batch of activations like this:


x̂ = (x − E[x]) / √(Var[x])

This is a differentiable function, so we can use it as an operator in our networks and backprop through it!
Ioffe and Szegedy, “Batch normalization: Accelerating deep network training by reducing internal covariate shift”, ICML 2015

Justin Johnson Lecture 7 - 84 January 31, 2022


Batch Normalization
Input: x ∈ ℝ^(N×D)

μⱼ = (1/N) Σᵢ xᵢⱼ   (Per-channel mean, shape is D)

σⱼ² = (1/N) Σᵢ (xᵢⱼ − μⱼ)²   (Per-channel std, shape is D)

x̂ᵢⱼ = (xᵢⱼ − μⱼ) / √(σⱼ² + ε)   (Normalized x, shape is N x D)
Ioffe and Szegedy, “Batch normalization: Accelerating deep network training by reducing internal covariate shift”, ICML 2015

Justin Johnson Lecture 7 - 85 January 31, 2022


Batch Normalization
Input: x ∈ ℝ^(N×D)
Per-channel mean μⱼ, per-channel std σⱼ, and normalized x̂ᵢⱼ as on the previous slide.

Problem: What if zero-mean, unit variance is too hard of a constraint?
Ioffe and Szegedy, “Batch normalization: Accelerating deep network training by reducing internal covariate shift”, ICML 2015

Justin Johnson Lecture 7 - 86 January 31, 2022


Batch Normalization
Input: x ∈ ℝ^(N×D)
μⱼ = (1/N) Σᵢ xᵢⱼ   (Per-channel mean, shape is D)
σⱼ² = (1/N) Σᵢ (xᵢⱼ − μⱼ)²   (Per-channel std, shape is D)
x̂ᵢⱼ = (xᵢⱼ − μⱼ) / √(σⱼ² + ε)   (Normalized x, shape is N x D)

Learnable scale and shift parameters: γ, β ∈ ℝ^D
yᵢⱼ = γⱼ x̂ᵢⱼ + βⱼ   (Output, shape is N x D)

Learning γ = σ, β = μ will recover the identity function (in expectation)

Justin Johnson Lecture 7 - 87 January 31, 2022
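A minimal sketch of the training-time forward pass above (plain PyTorch tensor ops; the eps value and function name are my own choices):

import torch

def batchnorm_train(x, gamma, beta, eps=1e-5):
    """x: (N, D); gamma, beta: (D,). Returns y of shape (N, D)."""
    mu = x.mean(dim=0)                        # per-channel mean, shape (D,)
    var = x.var(dim=0, unbiased=False)        # per-channel variance, shape (D,)
    x_hat = (x - mu) / torch.sqrt(var + eps)  # normalized, shape (N, D)
    return gamma * x_hat + beta               # scale and shift

x = torch.randn(8, 4) * 3 + 2
y = batchnorm_train(x, gamma=torch.ones(4), beta=torch.zeros(4))
print(y.mean(dim=0), y.std(dim=0, unbiased=False))  # ~0 and ~1 per channel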


Batch Normalization
Problem: Estimates of μ and σ depend on the minibatch; can't do this at test-time!
(Per-channel mean, std, normalization, and learnable scale/shift as on the previous slide.)

Justin Johnson Lecture 7 - 88 January 31, 2022


Batch Normalization: Test-Time
Input: x ∈ ℝ^(N×D)
μⱼ = (Running) average of values seen during training   (Per-channel mean, shape is D)
σⱼ² = (Running) average of values seen during training   (Per-channel std, shape is D)
x̂ᵢⱼ = (xᵢⱼ − μⱼ) / √(σⱼ² + ε)   (Normalized x, shape is N x D)

Learnable scale and shift parameters: γ, β ∈ ℝ^D
yᵢⱼ = γⱼ x̂ᵢⱼ + βⱼ   (Output, shape is N x D)
Learning γ = σ, β = μ will recover the identity function (in expectation)

Justin Johnson Lecture 7 - 89 January 31, 2022


Batch Normalization: Test-Time
Input: x ∈ ℝ^(N×D)
At test time, μⱼ and σⱼ² are replaced by (running) averages of the values seen during training:

For each training iteration:
μⱼ = (1/N) Σᵢ xᵢⱼ
μⱼ^test = 0.99 μⱼ^test + 0.01 μⱼ
(Similar for σ)

x̂ᵢⱼ = (xᵢⱼ − μⱼ) / √(σⱼ² + ε)   (Normalized x, shape is N x D)
yᵢⱼ = γⱼ x̂ᵢⱼ + βⱼ   (Output, shape is N x D)

Justin Johnson Lecture 7 - 90 January 31, 2022
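A sketch of the running-average bookkeeping (momentum 0.99/0.01 as on the slide; function and variable names are mine, not a library API):

import torch

def batchnorm_update_running_stats(x, running_mu, running_var, momentum=0.99):
    """Training step: compute batch stats and update the test-time running averages in place."""
    mu = x.mean(dim=0)
    var = x.var(dim=0, unbiased=False)
    running_mu.mul_(momentum).add_((1 - momentum) * mu)
    running_var.mul_(momentum).add_((1 - momentum) * var)
    return mu, var

def batchnorm_test(x, running_mu, running_var, gamma, beta, eps=1e-5):
    """Test time: use running averages instead of batch statistics."""
    x_hat = (x - running_mu) / torch.sqrt(running_var + eps)
    return gamma * x_hat + beta

running_mu, running_var = torch.zeros(4), torch.ones(4)
for _ in range(100):
    batchnorm_update_running_stats(torch.randn(8, 4), running_mu, running_var)
y = batchnorm_test(torch.randn(8, 4), running_mu, running_var, torch.ones(4), torch.zeros(4))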




Batch Normalization: Test-Time
At test time, μⱼ and σⱼ² are (running) averages of the values seen during training, so:
x̂ᵢⱼ = (xᵢⱼ − μⱼ) / √(σⱼ² + ε)
yᵢⱼ = γⱼ x̂ᵢⱼ + βⱼ

During testing batchnorm becomes a linear operator!
Can be fused with the previous fully-connected or conv layer

Justin Johnson Lecture 7 - 92 January 31, 2022


Batch Normalization for ConvNets

Batch Normalization for fully-connected networks:
x: N × D
μ, σ: 1 × D
γ, β: 1 × D
y = γ (x − μ) / σ + β

Batch Normalization for convolutional networks (Spatial Batchnorm, BatchNorm2D):
x: N × C × H × W
μ, σ: 1 × C × 1 × 1
γ, β: 1 × C × 1 × 1
y = γ (x − μ) / σ + β
Justin Johnson Lecture 7 - 93 January 31, 2022
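A hedged sketch of the corresponding PyTorch modules and the parameter shapes given above (feature counts are my own example values):

import torch
import torch.nn as nn

bn_fc = nn.BatchNorm1d(num_features=64)    # for x: N x D
bn_conv = nn.BatchNorm2d(num_features=32)  # spatial batchnorm, for x: N x C x H x W

print(bn_fc(torch.randn(8, 64)).shape)            # (8, 64)
print(bn_conv(torch.randn(8, 32, 14, 14)).shape)  # (8, 32, 14, 14)
print(bn_conv.weight.shape, bn_conv.bias.shape)   # gamma, beta: both shape (32,)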
Batch Normalization
Usually inserted after Fully Connected or Convolutional layers, and before nonlinearity:
... -> FC -> BN -> tanh -> FC -> BN -> tanh -> ...

x̂ = (x − E[x]) / √(Var[x])

Ioffe and Szegedy, "Batch normalization: Accelerating deep network training by reducing internal covariate shift", ICML 2015

Justin Johnson Lecture 7 - 94 January 31, 2022


Batch Normalization
- Makes deep networks much easier to train!
- Allows higher learning rates, faster convergence
- Networks become more robust to initialization
- Acts as regularization during training
- Zero overhead at test-time: can be fused with conv!

[Figure: ImageNet accuracy vs. training iterations, comparing training with and without batch normalization]

Ioffe and Szegedy, "Batch normalization: Accelerating deep network training by reducing internal covariate shift", ICML 2015

Justin Johnson Lecture 7 - 95 January 31, 2022


Batch Normalization
- Makes deep networks much easier to train!
- Allows higher learning rates, faster convergence
- Networks become more robust to initialization
- Acts as regularization during training
- Zero overhead at test-time: can be fused with conv!
- Not well-understood theoretically (yet)
- Behaves differently during training and testing: this is a very common source of bugs!

Ioffe and Szegedy, "Batch normalization: Accelerating deep network training by reducing internal covariate shift", ICML 2015

Justin Johnson Lecture 7 - 96 January 31, 2022


Layer Normalization

Batch Normalization for fully-connected networks:
x: N × D
μ, σ: 1 × D
γ, β: 1 × D
y = γ (x − μ) / σ + β

Layer Normalization for fully-connected networks:
Same behavior at train and test! Used in RNNs, Transformers
x: N × D
μ, σ: N × 1
γ, β: 1 × D
y = γ (x − μ) / σ + β
Justin Johnson Lecture 7 - 97 January 31, 2022
Instance Normalization

Batch Normalization for convolutional networks:
x: N × C × H × W
μ, σ: 1 × C × 1 × 1
γ, β: 1 × C × 1 × 1
y = γ (x − μ) / σ + β

Instance Normalization for convolutional networks:
x: N × C × H × W
μ, σ: N × C × 1 × 1
γ, β: 1 × C × 1 × 1
y = γ (x − μ) / σ + β
Justin Johnson Lecture 7 - 98 January 31, 2022
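A sketch (assuming PyTorch; sizes are my own example values) contrasting these normalization layers by the statistics shapes described above:

import torch
import torch.nn as nn

x_fc = torch.randn(8, 64)            # N x D
x_conv = torch.randn(8, 32, 14, 14)  # N x C x H x W

ln = nn.LayerNorm(64)                       # per-sample stats over D; same at train and test
inorm = nn.InstanceNorm2d(32, affine=True)  # per-sample, per-channel stats over H, W

print(ln(x_fc).shape)       # (8, 64)
print(inorm(x_conv).shape)  # (8, 32, 14, 14)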
Comparison of Normalization Layers

Wu and He, “Group Normalization”, ECCV 2018

Justin Johnson Lecture 7 - 99 January 31, 2022


Group Normalization

Wu and He, “Group Normalization”, ECCV 2018

Justin Johnson Lecture 7 - 100 January 31, 2022


Components of a Convolutional Network
Convolution Layers Pooling Layers Fully-Connected Layers

x h s

Activation Function Normalization


x̂ᵢⱼ = (xᵢⱼ − μⱼ) / √(σⱼ² + ε)

Justin Johnson Lecture 7 - 101 January 31, 2022


Components of a Convolutional Network
Convolution Layers Pooling Layers Fully-Connected Layers

x h s
Most computationally expensive!

Activation Function Normalization


x̂ᵢⱼ = (xᵢⱼ − μⱼ) / √(σⱼ² + ε)

Justin Johnson Lecture 7 - 102 January 31, 2022


Summary: Components of a Convolutional Network
Convolution Layers Pooling Layers Fully-Connected Layers

x h s

Activation Function Normalization


x̂ᵢⱼ = (xᵢⱼ − μⱼ) / √(σⱼ² + ε)

Justin Johnson Lecture 7 - 103 January 31, 2022


Summary: Components of a Convolutional Network

Problem: What is the right way to combine all these components?

Justin Johnson Lecture 7 - 104 January 31, 2022


Next time:
CNN Architectures

Justin Johnson Lecture 7 - 105 January 31, 2022
