465-Lecture 5-6
Lecture 5 & 6
CNN – Convolutional Neural Network
CNN in a nutshell
Images/videos are just matrices of numbers
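A minimal sketch of this idea, assuming NumPy; the 4x4 grayscale "image" below is made-up data, not from the lecture. Each pixel is just a number in a 2-D array.

import numpy as np

# A tiny hypothetical grayscale image: rows x columns of intensity values (0-255)
image = np.array([
    [  0,  50, 100, 150],
    [ 25,  75, 125, 175],
    [ 50, 100, 150, 200],
    [ 75, 125, 175, 225],
], dtype=np.uint8)

print(image.shape)   # (4, 4) -> height x width
print(image[1, 2])   # 125 -> pixel intensity at row 1, column 2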
What do we want to do?
How to categorize images?
• Use features
• Occlusion (the object is partially blocked)
• Different illumination (the amount of light/brightness changes)
• Scale variation/deformation
• Viewpoint variation
How can we use a machine to learn features?
Implementation so far
• The number of pixels the filter slides over the image is called the stride
• For example, to slide the convolution filter one pixel at a time, the stride value is 1
• To jump two pixels at a time, the stride value is 2
• Strides of 3 or more are rare in practice
• Jumping pixels produces spatially smaller output volumes (see the sketch after this list)
• A stride of 1 keeps the output image roughly the same height and width as the input
image, while a stride of 2 makes the output image roughly half of the input
image size
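A small sketch of the output-size arithmetic behind these bullets, using the standard formula floor((W - F + 2P) / S) + 1; the function name and the 32x32 example are illustrative, not from the lecture.

def conv_output_size(input_size, filter_size, padding=0, stride=1):
    """Spatial size of a convolution output: floor((W - F + 2P) / S) + 1."""
    return (input_size - filter_size + 2 * padding) // stride + 1

# 32x32 input, 3x3 filter, padding 1:
print(conv_output_size(32, 3, padding=1, stride=1))  # 32 -> same size as the input
print(conv_output_size(32, 3, padding=1, stride=2))  # 16 -> roughly half the input size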
Pooling
• The goal of the pooling layer is to down-sample the feature maps produced by the
convolutional layer, reducing the number of values passed to later layers and thus the
computational complexity
• Pooling filters have no weights or learned values
• All they do is slide over the feature map created by the previous convolutional layer and select
one value (e.g., the maximum, for max pooling) to pass along to the next layer, ignoring the
remaining values (see the sketch after this list)
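A minimal max-pooling sketch, assuming NumPy; the 4x4 feature map is made-up data, and the 2x2 window with stride 2 is simply the most common configuration.

import numpy as np

def max_pool_2x2(feature_map):
    """2x2 max pooling with stride 2: keep the largest value in each 2x2 block."""
    h, w = feature_map.shape
    pooled = np.zeros((h // 2, w // 2), dtype=feature_map.dtype)
    for i in range(0, h - 1, 2):
        for j in range(0, w - 1, 2):
            pooled[i // 2, j // 2] = feature_map[i:i + 2, j:j + 2].max()
    return pooled

fmap = np.array([
    [1, 3, 2, 0],
    [4, 6, 1, 2],
    [7, 2, 9, 5],
    [0, 1, 3, 4],
])
print(max_pool_2x2(fmap))
# [[6 2]
#  [7 9]]  -> half the height and width, no learned weights involved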
Pooling
• Parameters
• Values inside the filters
• W matrix
• b vector
• Number of operations
• Number of multiplications for the CNN operations (see the counting sketch after this list)
• How do these counts change for different values of padding or stride?
• Number of additions for the CNN operations
• Number of filters
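A hedged counting sketch for one convolutional layer; the helper name and the 3-channel 32x32 example are hypothetical, and the exact counting convention may differ from the lecture's.

def conv_layer_counts(in_channels, out_channels, filter_size, out_height, out_width):
    """Parameter and multiplication counts for one convolutional layer."""
    # W: one filter_size x filter_size x in_channels kernel per output filter
    weights = out_channels * filter_size * filter_size * in_channels
    # b: one bias per filter
    biases = out_channels
    # Each output value needs filter_size*filter_size*in_channels multiplications
    # (and roughly as many additions).
    multiplications = out_height * out_width * out_channels * filter_size * filter_size * in_channels
    return weights + biases, multiplications

# Example: 3-channel 32x32 input, 16 filters of size 3x3, padding 1, stride 1 -> 32x32 output
params, mults = conv_layer_counts(3, 16, 3, 32, 32)
print(params)  # 448 = 16*3*3*3 + 16
print(mults)   # 442368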
1×1 Convolution
Why 1×1 Convolution?
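One common motivation for a 1×1 convolution is to mix and reduce the channel dimension without changing the spatial dimensions; a minimal NumPy sketch of that idea, with made-up sizes (64 input channels reduced to 16):

import numpy as np

def conv_1x1(feature_maps, weights):
    """1x1 convolution: a per-pixel linear mix of channels.
    feature_maps: (C_in, H, W); weights: (C_out, C_in)."""
    c_in, h, w = feature_maps.shape
    c_out = weights.shape[0]
    return (weights @ feature_maps.reshape(c_in, h * w)).reshape(c_out, h, w)

x = np.random.rand(64, 28, 28)   # 64 feature maps of size 28x28
w = np.random.rand(16, 64)       # 16 filters, each of size 1x1x64
y = conv_1x1(x, w)
print(y.shape)                   # (16, 28, 28): channel depth reduced, spatial size unchanged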
[Figure: a typical CNN pipeline. Convolution and max-pooling layers act as the feature extractor; the resulting feature maps are flattened and fed into a fully connected feedforward network that outputs class scores (cat, dog, ...).]
A CNN compresses a fully connected network
• Learn weights for convolutional filters and fully connected layers using
backpropagation and the log loss (cross-entropy loss) function
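A minimal training sketch of this idea, assuming PyTorch, a made-up batch of eight 32x32 RGB images, and 10 classes; the layer sizes are illustrative, not the lecture's architecture.

import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),  # convolutional filters (learned weights)
    nn.ReLU(),
    nn.MaxPool2d(2),                              # pooling: no learnable weights
    nn.Flatten(),
    nn.Linear(16 * 16 * 16, 10),                  # fully connected layer (learned weights)
)

criterion = nn.CrossEntropyLoss()                 # log loss over the class scores
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

images = torch.randn(8, 3, 32, 32)                # hypothetical input batch
labels = torch.randint(0, 10, (8,))               # hypothetical class labels

optimizer.zero_grad()
logits = model(images)
loss = criterion(logits, labels)
loss.backward()                                   # backpropagation computes the gradients
optimizer.step()                                  # update conv and fully connected weights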
CNN Application: Self-driving cars/drones