TensorFlow CNN

The document provides an overview of Convolutional Neural Networks (CNNs), which are effective for processing 2D data, particularly in image classification. It explains the concept of local receptive fields and the two main types of layers in CNNs: convolution and pooling. Additionally, it describes how images are represented as matrices and the application of convolution using sliding window functions.

Convolutional Neural Networks in TensorFlow

Overview

Convolutional NNs are one kind of NN architecture that works well with 2D data

Modeled on the visual cortex, they are amazing at image classification
How Do We See?

Viewing an Image

All neurons in the eye don't see the entire image

Each neuron has its own local receptive field

It reacts only to visual stimuli located in its receptive field

Some neurons react to more complex patterns that are combinations of lower-level patterns
Neural Networks

[Diagram: Layer 1 -> Layer 2 -> ... -> Layer N]

Sounds like a classic neural network problem
Two Kinds of Layers in CNNs

Convolution: local receptive field
Pooling: subsampling of inputs

Convolution

In this context, a sliding window function applied to a matrix

e.g. a matrix of pixels representing an image

Often called a kernel or filter

The kernel is applied element-wise in sliding-window fashion
Representing Images as Matrices

A 28 x 28 image = 784 pixels

A 6 x 6 image = 36 pixels:

0    0    0    0    0    0
0.2  0.8  0    0.3  0.6  0
0.2  0.9  0    0.3  0.8  0
0.3  0.8  0.7  0.8  0.9  0
0    0    0    0.2  0.8  0
0    0    0    0.2  0.2  0
Representing Images

Matrix (6 x 6):                    Kernel (3 x 3):

0    0    0    0    0    0         1 0 1
0.2  0.8  0    0.3  0.6  0         0 1 0
0.2  0.9  0    0.3  0.8  0         1 0 1
0.3  0.8  0.7  0.8  0.9  0
0    0    0    0.2  0.8  0
0    0    0    0.2  0.2  0

Convolution

The 3 x 3 kernel is laid over the 6 x 6 matrix and slid across it; at each position the overlapping elements are multiplied and summed, producing a 4 x 4 result:

1    1.2  1.1  0.9
1.9  2.7  2.5  1.9
1.0  2.1  2.4  1.4
1.0  1.8  2.0  1.8


[The original slides step the kernel across the matrix one position at a time, filling in the 4 x 4 result element by element. For example, the top-left element is (0·1 + 0·0 + 0·1) + (0.2·0 + 0.8·1 + 0·0) + (0.2·1 + 0.9·0 + 0·1) = 1.]
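The same computation can be reproduced with TensorFlow's convolution op. A minimal sketch (the reshapes to TensorFlow's [batch, height, width, channels] layout are implementation details, not from the slides):

```python
import tensorflow as tf

# The 6 x 6 image and 3 x 3 kernel from the slides
image = tf.constant([
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    [0.2, 0.8, 0.0, 0.3, 0.6, 0.0],
    [0.2, 0.9, 0.0, 0.3, 0.8, 0.0],
    [0.3, 0.8, 0.7, 0.8, 0.9, 0.0],
    [0.0, 0.0, 0.0, 0.2, 0.8, 0.0],
    [0.0, 0.0, 0.0, 0.2, 0.2, 0.0],
])
kernel = tf.constant([
    [1.0, 0.0, 1.0],
    [0.0, 1.0, 0.0],
    [1.0, 0.0, 1.0],
])

result = tf.nn.conv2d(
    tf.reshape(image, [1, 6, 6, 1]),    # one image, one channel
    tf.reshape(kernel, [3, 3, 1, 1]),   # one input channel, one filter
    strides=1,
    padding="VALID",                    # no zero padding
)
print(tf.reshape(result, [4, 4]))       # the 4 x 4 result shown above
```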


Choice of Kernel Function

Averaging neighbouring pixels ~ Blurring

Subtracting neighbouring pixels ~ Edge detection

Positive middle, negative neighbours ~ Sharpen

Negative corners, zero elsewhere ~ Edge enhance

More complex patterns ~ Emboss

Choice of Kernel Function

http://aishack.in/tutorials/image-convolution-examples/

[Examples from the link: blur, line detection, horizontal lines, edge detection; two such kernels are sketched below]
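For instance, a minimal sketch of two commonly used kernels matching the descriptions above (the exact values are illustrative, not taken from the slides):

```python
import tensorflow as tf

blur = tf.constant([[1., 1., 1.],
                    [1., 1., 1.],
                    [1., 1., 1.]]) / 9.0      # average neighbouring pixels ~ blur

sharpen = tf.constant([[ 0., -1.,  0.],
                       [-1.,  5., -1.],
                       [ 0., -1.,  0.]])      # positive middle, negative neighbours ~ sharpen

def apply_kernel(image_2d, kernel_3x3):
    """Convolve a single-channel 2D image with a 3x3 kernel."""
    x = tf.reshape(image_2d, [1, *image_2d.shape, 1])
    k = tf.reshape(kernel_3x3, [3, 3, 1, 1])
    return tf.nn.conv2d(x, k, strides=1, padding="SAME")[0, :, :, 0]
```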
Zero-padding, Stride Size
Narrow vs. Wide Convolution

[Diagram: the input matrix (i.e. the image) and the convolution result]

Narrow Convolution: little zero padding; the output is narrower than the input
Wide Convolution: lots of zero padding; the output is wider than the input
Without Zero Padding

The 6 x 6 matrix convolved with the 3 x 3 kernel (no padding) gives a 4 x 4 result.


Zero Padding

Padding the 6 x 6 matrix with two rings of zeros gives a 10 x 10 matrix; convolving with the 3 x 3 kernel gives an 8 x 8 result.

Padding with three rings of zeros gives a 12 x 12 matrix and a 10 x 10 result.

Zero Padding

With zero-padding, every element of the matrix will be passed into the filter

Can decide the number of zero columns to pad with

Use to get output larger than the input (see the sketch below)
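A minimal sketch of the padding options in TensorFlow (the tensor shapes are assumptions chosen to match the 6 x 6 example):

```python
import tensorflow as tf

x = tf.random.uniform([1, 6, 6, 1])   # a 6 x 6 single-channel image
k = tf.random.uniform([3, 3, 1, 1])   # a 3 x 3 kernel

narrow = tf.nn.conv2d(x, k, strides=1, padding="VALID")  # no padding      -> 4 x 4
same   = tf.nn.conv2d(x, k, strides=1, padding="SAME")   # keep input size -> 6 x 6
wide   = tf.nn.conv2d(x, k, strides=1,
                      padding=[[0, 0], [3, 3], [3, 3], [0, 0]])  # 3 rings -> 10 x 10

print(narrow.shape, same.shape, wide.shape)
```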
Stride Size

[The original slides step the kernel across the matrix: moving the window one column to the right is a horizontal stride of 1; moving it one row down is a vertical stride of 1.]

Stride size is an important hyperparameter in CNNs
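A minimal sketch of how stride affects the output size (shapes are assumptions, continuing the 6 x 6 example):

```python
import tensorflow as tf

x = tf.random.uniform([1, 6, 6, 1])
k = tf.random.uniform([3, 3, 1, 1])

stride_1 = tf.nn.conv2d(x, k, strides=1, padding="VALID")  # -> 4 x 4
stride_2 = tf.nn.conv2d(x, k, strides=2, padding="VALID")  # -> 2 x 2
print(stride_1.shape, stride_2.shape)
```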
Convolutional Neural Networks

Neural Networks for Image Classification

[Diagram: Corpus of Images -> Layer 1 -> Layer 2 -> ... -> Layer N -> ML-based Classifier]

Pixels go in and processed groups of pixels come out; each layer consists of individual, interconnected neurons
Parameter Explosion

Consider a 100 x 100 pixel image (10,000 pixels)

If first layer = 10,000 neurons

Interconnections ~ O(10,000 * 10,000)

100 million parameters to train the neural network!


Parameter Explosion

Dense, fully connected neural networks can’t cope

Convolutional neural networks to the rescue
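To see the scale of the problem, a minimal sketch comparing a fully connected first layer with a convolutional one on the 100 x 100 image from the previous slide (the single 3 x 3 filter is an illustrative choice):

```python
import tensorflow as tf

dense = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(100, 100, 1)),
    tf.keras.layers.Dense(10_000),              # 10,000 fully connected neurons
])

conv = tf.keras.Sequential([
    tf.keras.layers.Conv2D(1, kernel_size=3, input_shape=(100, 100, 1)),
])

print(dense.count_params())   # 100,010,000 -- roughly the 100 million above
print(conv.count_params())    # 10 -- one 3 x 3 kernel plus a bias, shared everywhere
```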


CNNs Introduced

Eye perceives visual stimulus in 2D visual field

Eye sends 2D image to visual cortex

Visual cortex adds depth perception

Individual neurons in cortex focus on small field

“Local receptive field”


CNNs Introduced

CNNs perform spectacularly well at many tasks

Particularly at image recognition

Dramatically fewer parameters than a DNN with similar performance
Inspirations for CNNs

Two Dimensions: data comes in expressed in 2D
Local Receptive Fields: neurons focus on narrow portions of the input
CNN Layers

Convolution layers - zoom in on specific bits of input

Successive layers aggregate inputs into higher level features

Pixels >> Lines >> Contours/Edges >> Object


Convolutional Layers

Feature Maps

[Diagram: image pixels feed the neurons of a convolutional layer, which produce a feature map]

Each neuron i in the convolutional layer is connected only to the pixels in its local receptive field

The number of neurons in the receptive field = the kernel size
Kernel Size

The convolutional kernel size is usually expressed in terms of the width and height of the receptive area

Use small convolutional kernels; they are more efficient

Stacking two 3x3 kernels is preferable to one 9x9 kernel (the parameter-count sketch below illustrates why)
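A minimal sketch comparing the parameter counts (the 28 x 28 x 32 input stands in for an intermediate feature map; the filter count of 32 is an illustrative choice):

```python
import tensorflow as tf

stacked = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, kernel_size=3, activation="relu",
                           input_shape=(28, 28, 32)),
    tf.keras.layers.Conv2D(32, kernel_size=3, activation="relu"),
])

single = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, kernel_size=9, activation="relu",
                           input_shape=(28, 28, 32)),
])

print(stacked.count_params())  # 2 * (3*3*32*32 + 32) = 18,496
print(single.count_params())   # 9*9*32*32 + 32       = 82,976
```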


Feature Maps

Stride: the distance between successive receptive fields

[Diagram: horizontal and vertical strides of the receptive fields between the pixel layer and the convolutional layer]

Zero padding may be needed at the edges
Feature Maps

All neurons in a feature map have the same weights and biases

Two big advantages over DNNs

- Dramatically fewer parameters to train

- CNN can recognise feature patterns independent of location


Feature Maps

The parameters of all neurons in a feature map are collectively called the filter

Why filter?

Because the weights highlight (filter) specific patterns from the input pixels
Filters

Horizontal Filter: the neuron will detect horizontal lines in the input
Vertical Filter: the neuron will detect vertical lines in the input
Feature Maps

Notice also that neurons are not connected to all pixels

CNNs are sparse neural networks


Convolutional Layer

Each convolutional layer consists of several feature maps of equal size

The different feature maps have different parameters


Convolutional Layer

Each neuron's receptive field includes the feature maps of all previous layers

This is how aggregated features are picked up

The CNN as a whole consists of multiple convolutional (and pooling) layers

More on pooling layers in a bit


CNNs

[Diagram: feature maps make up a convolutional layer; convolutional layers make up the CNN]

RGB Channels

[Diagram: each channel of an RGB input feeds the feature maps of the convolutional layer]
Output of a Convolution Layer Neuron

[Diagram: Input Image -> Layer 1 -> Layer 2 -> ... -> Layer L, with a neuron at map m, column c, row r]

Neuron output depends on corresponding* neurons from each preceding layer
(*corresponding: same receptive field and feature maps, different layers)
Pooling Layers
Two Kinds of Layers in CNNs

Convolutional: local receptive field
Pooling: subsampling of inputs

Convolution (recap): the 6 x 6 matrix convolved with the 3 x 3 kernel gives the 4 x 4 result shown earlier

Pooling Layers

Neurons in a pooling layer have no weights or biases

A pooling neuron simply applies some aggregation function to all of its inputs

Max, sum, average…


Max Pooling

Matrix (4 x 4):

0.2  0.8  0.3  0.6
0.2  0.9  0.3  0.8
0.3  0.8  0.8  0.9
0    0    0.2  0.8

Max, 2x2 filter, stride = 2

Pooling Result (2 x 2):

0.9  0.8
0.8  0.9
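A minimal sketch reproducing this result with TensorFlow's pooling op (the reshape to [batch, height, width, channels] is an implementation detail):

```python
import tensorflow as tf

x = tf.constant([
    [0.2, 0.8, 0.3, 0.6],
    [0.2, 0.9, 0.3, 0.8],
    [0.3, 0.8, 0.8, 0.9],
    [0.0, 0.0, 0.2, 0.8],
])

pooled = tf.nn.max_pool2d(
    tf.reshape(x, [1, 4, 4, 1]),
    ksize=2, strides=2, padding="VALID",   # 2x2 window, stride 2
)
print(tf.reshape(pooled, [2, 2]))          # [[0.9, 0.8], [0.8, 0.9]]
```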


Pooling Layers

Why use them?

- greatly reduce memory usage during training

- mitigate overfitting (via subsampling)

- make the NN recognise features independent of location (location invariance)
Pooling Layers

Pooling layers typically act on each channel independently

So, usually, output area < input area, but output depth = input depth


CNNs for Classification
Typical CNN Architecture

[Diagram: Convolutional -> ReLU -> Pooling -> Convolutional -> ReLU -> ...]

Alternating groups of convolutional and pooling layers

Each group of convolutional layers is usually followed by a ReLU layer

The output of each layer is also an image

However, successive outputs are smaller and smaller (due to the pooling layers)

As well as deeper and deeper (due to the feature maps in the convolutional layers)

This entire set of layers is then fed into a regular, feed-forward NN
Typical CNN Architecture

[Diagram: CNN Layers -> Feed-forward Layers (Fully Connected + ReLU, Fully Connected + ReLU) -> SoftMax -> Prediction]

This feed-forward part has a few fully connected layers with ReLU activation

Finally, a SoftMax prediction layer
Logistic Regression with One Neuron

[Diagram: inputs X1..Xn with weights W1..Wn -> affine transformation W1x + b1 -> (W2, b2) -> Softmax function -> P(Y = True), P(Y = False)]
SoftMax for Digit Classification

[Softmax function -> P(Y = 0), P(Y = 1), ..., P(Y = 9)]

SoftMax for Image Classification

[Softmax function -> P(Y = "cat"), P(Y = "bird"), ..., P(Y = "car")]
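A minimal sketch of what the softmax layer does, assuming some made-up raw scores (logits) for the 10 digits:

```python
import tensorflow as tf

logits = tf.constant([2.0, 0.5, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1])
probs = tf.nn.softmax(logits)        # P(Y = 0), ..., P(Y = 9), summing to 1
print(probs.numpy().round(3))
```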
Typical CNN Architecture

[Diagram: CNN Layers -> Feed-forward Layers (Fully Connected + ReLU) -> SoftMax -> Prediction: P(Y = 0), P(Y = 1), ..., P(Y = 9)]

This is the output layer, emitting probabilities


Typical CNN Architectures

[Diagram: alternating Convolutional and Pooling groups]

Alternating groups of convolutional and pooling layers

Each group of convolutional layers is usually followed by a ReLU layer

The image gets smaller and smaller (due to pooling)

Also deeper and deeper (due to convolution)
Typical CNN Architectures

[Diagram: Convolutional Layers -> Dense Feed-forward Layers]

At the output end of the CNN, a regular feedforward NN is stacked on:

- a few fully connected layers

- the inputs into these are small images

- ReLU activations

- finally, a Softmax prediction layer (a minimal end-to-end sketch follows below)
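Putting the pieces together, a minimal end-to-end sketch of this architecture in Keras, assuming 28 x 28 grayscale digit images; the filter counts and layer sizes are illustrative choices, not taken from the slides:

```python
import tensorflow as tf

model = tf.keras.Sequential([
    # Alternating convolutional (+ ReLU) and pooling groups
    tf.keras.layers.Conv2D(32, 3, activation="relu", padding="same",
                           input_shape=(28, 28, 1)),
    tf.keras.layers.MaxPooling2D(2),     # image gets smaller...
    tf.keras.layers.Conv2D(64, 3, activation="relu", padding="same"),
    tf.keras.layers.MaxPooling2D(2),     # ...and deeper (more feature maps)

    # Regular feed-forward part: fully connected layers with ReLU
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation="relu"),

    # SoftMax prediction layer: P(Y = 0) ... P(Y = 9)
    tf.keras.layers.Dense(10, activation="softmax"),
])

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```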
Typical CNN Architectures

[Diagram: image -> CNN -> P(Y = 0), ..., P(Y = 9)]

The input is an image

The outputs are probabilities