Week 11 - Convolutional

This document discusses Convolutional Neural Networks (CNNs) and their applications in image classification, object detection, and image segmentation. It covers key concepts such as invariance, equivariance, convolutional layers, and the differences between fully connected networks and convolutional networks. The document also highlights the architecture and performance of various CNN models, including AlexNet and VGG, along with techniques like data augmentation and transfer learning.

Uploaded by sawerayaseen654

Week 11

Convolutional Neural Networks

Dr. Muhammad Wasim


Slides adapted from Prof. Simon Prince (Understanding Deep Learning book)
Convolutional networks
• Networks for images
• Invariance and equivariance
• 1D convolution
• Convolutional layers
• Channels
• Convolutional network for MNIST 1D
Image classification

• Multiclass classification problem (discrete classes, >2 possible classes)


• Convolutional network
Object detection
Image segmentation

• Multivariate binary classification problem (many outputs, two discrete classes)


• Convolutional encoder-decoder network
Networks for images
• Problems with fully-connected networks
1. Size
• 224x224 RGB image = 150,528 dimensions
• Hidden layers generally larger than inputs
• One hidden layer = 150,528 x 150,528 weights, about 22.7 billion
2. Nearby pixels statistically related
• But an FC network could be retrained on permuted pixels and get the same results, so it ignores this structure
3. Should be stable under transformations
• Don’t want to re-learn appearance at different parts of image
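The size argument above can be checked with a few lines of arithmetic. This is a minimal sketch: the hidden layer is assumed to be exactly as large as the input (the slide only says "generally larger"), and the 3x3 kernel used for contrast is an illustrative choice, not something the slide specifies.

```python
# Weight count for one fully connected layer on a 224x224 RGB image.
height, width, channels = 224, 224, 3
input_dim = height * width * channels      # 150,528 input dimensions
hidden_dim = input_dim                     # assumption: hidden layer as large as the input
fc_weights = input_dim * hidden_dim        # one weight per (input, hidden unit) pair

# For contrast (illustrative): one 3x3 kernel shared across the whole image,
# covering the three colour channels, needs only 3 * 3 * 3 weights.
conv_weights = 3 * 3 * channels

print(f"{input_dim:,} inputs -> {fc_weights:,} FC weights vs {conv_weights} conv weights")
```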
Convolutional networks
• Parameters only look at local image patches
• Share parameters across image
Invariance
• A function f[x] is invariant to a transformation t[] if:
$f[t[x]] = f[x]$
i.e., the function output is the same even after the transformation is applied.
Invariance example
e.g., Image classification
• Image has been translated, but we want our classifier to give the same result
Equivariance
• A function f[x] is equivariant to a transformation t[] if:
$f[t[x]] = t[f[x]]$
i.e., the output is transformed in the same way as the input


Equivariance example
e.g., Image segmentation
• Image has been translated and we want segmentation to translate with it
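The two definitions can be tried on toy functions. In this sketch the transformation t is a circular shift of a list, the sum stands in for an invariant function, and an elementwise doubling stands in for an equivariant one. All three are illustrative choices, not functions taken from the slides.

```python
def shift(x, k=1):
    """The transformation t: circularly translate a list by k positions."""
    k %= len(x)
    return x[-k:] + x[:-k]

def total(x):
    """Invariant example: the sum ignores where the values are located."""
    return sum(x)

def double(x):
    """Equivariant example: an elementwise map moves along with its input."""
    return [2 * v for v in x]

x = [0, 3, 1, 0, 0]
is_invariant = total(shift(x)) == total(x)             # f[t[x]] == f[x]
is_equivariant = double(shift(x)) == shift(double(x))  # f[t[x]] == t[f[x]]
print(is_invariant, is_equivariant)
```

Note that the doubling function is not invariant: its output changes when the input is shifted, it just changes in the same way.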
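Convolution* in 1D
• Input vector x:
$x = [x_1, x_2, \ldots, x_I]$
• Output is weighted sum of neighbors:
$z_i = \omega_1 x_{i-1} + \omega_2 x_i + \omega_3 x_{i+1}$
• Convolutional kernel or filter:
$\omega = [\omega_1, \omega_2, \omega_3]^T$
Kernel size = 3

* Strictly speaking this is cross-correlation, not convolution, because the kernel is not flipped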


Convolution with kernel size 3

Equivariant to translation of input

Zero padding

Treat positions that are beyond end of the input as zero.


“Valid” convolutions

Only process positions where kernel falls in image (smaller output).
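A minimal sketch of the two boundary treatments for a kernel of size 3. The kernel weights and the input are made-up values chosen so the arithmetic is easy to follow.

```python
def conv1d_same(x, w):
    """Kernel-size-3 convolution with zero padding: one output per input."""
    padded = [0.0] + list(x) + [0.0]   # positions beyond the ends count as zero
    return [sum(w[j] * padded[i + j] for j in range(3)) for i in range(len(x))]

def conv1d_valid(x, w):
    """Kernel-size-3 'valid' convolution: only positions where the kernel fits."""
    return [sum(w[j] * x[i + j] for j in range(3)) for i in range(len(x) - 2)]

x = [1.0, 2.0, 3.0, 4.0, 5.0]
w = [0.5, 1.0, -0.5]                   # hypothetical kernel weights
same = conv1d_same(x, w)               # 5 outputs
valid = conv1d_valid(x, w)             # 3 outputs
print(same, valid)
```

The valid outputs are the interior of the zero-padded outputs. The two versions differ only at the first and last positions, where zero padding invents inputs.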


Stride, kernel size, and dilation
• Stride = shift by k positions for each output
• Decreases size of output relative to input
• Kernel size = weight a different number of inputs for each output
• Combine information from a larger area
• But kernel size 5 uses 5 parameters
• Dilated or atrous convolutions = intersperse kernel values with zeros
• Combine information from a larger area
• Fewer parameters
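The three settings can be sketched in one function. This version uses valid boundaries only, and the all-ones kernel and toy input are illustrative values.

```python
def conv1d(x, w, stride=1, dilation=1):
    """Valid 1D convolution with a given stride and dilation."""
    span = dilation * (len(w) - 1) + 1   # input positions covered by the kernel
    out = []
    for start in range(0, len(x) - span + 1, stride):
        out.append(sum(w[j] * x[start + j * dilation] for j in range(len(w))))
    return out

x = list(range(10))                      # toy input 0..9
ones3 = [1, 1, 1]
plain = conv1d(x, ones3)                 # stride 1: 8 outputs
strided = conv1d(x, ones3, stride=2)     # stride 2: 4 outputs
dilated = conv1d(x, ones3, dilation=2)   # 3 weights, but each output spans 5 inputs
print(len(plain), len(strided), len(dilated))
```

With dilation 2 the kernel covers the same area as a size-5 kernel while still using three parameters, which is the trade-off the last bullet describes.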
Convolutional networks
• Networks for images
• Invariance and equivariance
• 1D convolution
• Convolutional layers
• Channels
• Receptive fields
• Convolutional network for MNIST 1D
Convolutional layer
Special case of fully-connected network
Convolutional network:
$h_i = a[\beta + \omega_1 x_{i-1} + \omega_2 x_i + \omega_3 x_{i+1}]$
3 weights, 1 bias

Fully connected network:
$h_i = a[\beta_i + \sum_{j=1}^{D} \omega_{ij} x_j]$
$D^2$ weights, D biases

[Figure: weight matrices for a fully connected network; a convolution with kernel size 3, stride 1, dilation 1, zero padding; and a convolution with kernel size 3, stride 2, dilation 1, zero padding]
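One way to see the "special case" claim is to write a kernel-size-3, stride-1, zero-padded convolution as a D x D weight matrix in which every row reuses the same three weights and every other entry is zero. The sketch below assumes made-up kernel weights and leaves out the bias and activation.

```python
def conv_as_matrix(w, D):
    """D x D matrix equal to a kernel-size-3, stride-1, zero-padded convolution."""
    W = [[0.0] * D for _ in range(D)]
    for i in range(D):
        for j, offset in enumerate((-1, 0, 1)):
            if 0 <= i + offset < D:
                W[i][i + offset] = w[j]   # the same three weights on every row
    return W

def matvec(W, x):
    """Plain fully connected pre-activation: one dot product per row."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in W]

D = 6
w = [0.5, 1.0, -0.5]                      # hypothetical kernel weights
W = conv_as_matrix(w, D)
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
pre_activations = matvec(W, x)            # bias and activation omitted
nonzero = sum(v != 0.0 for row in W for v in row)
print(pre_activations, nonzero, D * D)
```

Of the 36 entries a general fully connected layer could learn, only 16 are nonzero here, and they take just three distinct values.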
Question 1

• Kernel size?
• Stride?
• Dilation?
• Zero padding / valid?
Question 2

• Kernel size?
• Stride?
• Dilation?
• Zero padding / valid?
Question 3

• Kernel size?
• Stride?
• Dilation?
• Zero padding / valid?
Channels
• The convolutional operation averages together the inputs
• Plus passes through ReLU function
• Has to lose information
• Solution:
• apply several convolutions and stack them in channels
• Sometimes also called feature maps
Two output channels, one input channel
Two input channels, one output channel
How many parameters?
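• If there are $C_i$ input channels and kernel size K: $\Omega \in \mathbb{R}^{C_i \times K}$ weights and one bias $\beta \in \mathbb{R}$

• If there are $C_i$ input channels and $C_o$ output channels: $\Omega \in \mathbb{R}^{C_i \times C_o \times K}$ weights and $\beta \in \mathbb{R}^{C_o}$ biases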
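The count can be written as a small helper. This sketch assumes the usual convention of one weight per (input channel, output channel, kernel position) and one bias per output channel. The function name and the example layer sizes are illustrative.

```python
def conv1d_param_count(c_in, c_out, kernel_size):
    """Weights plus biases for a 1D convolutional layer."""
    weights = c_in * c_out * kernel_size
    biases = c_out                        # one bias per output channel
    return weights + biases

print(conv1d_param_count(1, 1, 3))        # single channel, kernel size 3
print(conv1d_param_count(1, 15, 3))       # one input channel, 15 output channels
print(conv1d_param_count(15, 15, 3))      # 15 input and 15 output channels
```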


MNIST 1D Dataset
MNIST-1D results for fully-connected network
Convolutional network
• Four hidden layers
• Three convolutional layers
• One fully-connected layer
• Softmax at end
• Total parameters = 2050
• Trained for 100,000 steps with SGD, LR = 0.01, batch size 100
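The total of 2050 parameters can be reproduced under one plausible set of layer sizes. Everything in the sketch below beyond "three convolutional layers and one fully connected layer" is an assumption not stated on the slide: 40 inputs, 15 channels per layer, kernel size 3, stride 2, valid boundaries, and 10 output classes.

```python
def conv_params(c_in, c_out, k):
    """Weights plus one bias per output channel for a 1D convolutional layer."""
    return c_in * c_out * k + c_out

def valid_output_length(n, k, stride):
    """Number of positions where a size-k kernel fits when moved by `stride`."""
    return (n - k) // stride + 1

# Assumed sizes (not given on the slide)
length, channels, k, stride, classes = 40, 15, 3, 2, 10

total = 0
c_in = 1
lengths = []
for _ in range(3):                        # three convolutional layers
    total += conv_params(c_in, channels, k)
    length = valid_output_length(length, k, stride)
    lengths.append(length)
    c_in = channels

total += length * channels * classes + classes   # fully connected layer to the classes
print(lengths, total)
```

Under these assumptions the layer lengths are 19, 9, and 4 positions, and the total matches the figure on the slide.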
MNIST-1D convolutional network
Fully connected network
• Exactly the same number of layers and hidden units
• All fully-connected layers
• Total parameters = 150,185
Performance
MNIST 1D Dataset
Why?
• Better inductive bias
• Forces the network to process each location similarly
• Shares information across locations
• Searches through a smaller family of input/output mappings, all of which are plausible
Convolution #2
• 2D Convolution
• Downsampling and upsampling, 1x1 convolution
• Image classification
• Object detection
• Semantic segmentation
2D Convolution
• Convolution in 2D
• Weighted sum over a K x K region
• K x K weights
• Build into a convolutional layer by adding bias and passing through
activation function
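A minimal single-channel sketch of the layer described above: a weighted sum over each K x K region, plus a bias, passed through a ReLU. Valid boundaries are assumed, and the image and kernel are made-up values.

```python
def conv2d_valid(image, kernel, bias=0.0):
    """Valid 2D convolution plus bias, followed by ReLU (single channel)."""
    K = len(kernel)
    rows, cols = len(image), len(image[0])
    out = []
    for i in range(rows - K + 1):
        row = []
        for j in range(cols - K + 1):
            s = bias + sum(kernel[m][n] * image[i + m][j + n]
                           for m in range(K) for n in range(K))
            row.append(max(0.0, s))       # ReLU activation
        out.append(row)
    return out

image = [[1, 2, 3, 0],
         [4, 5, 6, 0],
         [7, 8, 9, 0],
         [0, 0, 0, 0]]
kernel = [[0, 0, 0],
          [0, 1, 0],
          [0, 0, 0]]                      # hypothetical kernel that copies the centre pixel
hidden = conv2d_valid(image, kernel)
print(hidden)
```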
Channels in 2D convolution

Kernel size, stride, and dilation all work as you would expect
How many parameters?
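• If there are $C_i$ input channels and kernel size K x K: $\omega \in \mathbb{R}^{C_i \times K \times K}$ weights and one bias $\beta \in \mathbb{R}$

• If there are $C_i$ input channels and $C_o$ output channels: $\Omega \in \mathbb{R}^{C_i \times C_o \times K \times K}$ weights and $\beta \in \mathbb{R}^{C_o}$ biases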
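The 2D count follows the same pattern as the 1D case, with K x K weights per pair of input and output channels. This sketch assumes one bias per output channel. The function name and the example layer sizes are illustrative.

```python
def conv2d_param_count(c_in, c_out, kernel_size):
    """Weights plus biases for a 2D convolutional layer with a square kernel."""
    weights = c_in * c_out * kernel_size * kernel_size
    biases = c_out                        # one bias per output channel
    return weights + biases

print(conv2d_param_count(1, 1, 3))        # single channel, 3x3 kernel
print(conv2d_param_count(3, 64, 3))       # e.g. RGB input to 64 channels, 3x3 kernel
```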


Downsampling
• Sample every other position (equivalent to stride two)
• Max pooling (partial invariance to translation)
• Mean pooling
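The three downsampling options can be compared on a small example. This sketch assumes non-overlapping 2x2 blocks and an input with even height and width. The values are made up.

```python
def blocks(x):
    """Return the non-overlapping 2x2 blocks of a 2D list, row by row."""
    return [[(x[i][j], x[i][j + 1], x[i + 1][j], x[i + 1][j + 1])
             for j in range(0, len(x[0]), 2)]
            for i in range(0, len(x), 2)]

def subsample(x):
    return [[b[0] for b in row] for row in blocks(x)]    # keep top-left of each block

def max_pool(x):
    return [[max(b) for b in row] for row in blocks(x)]

def mean_pool(x):
    return [[sum(b) / 4 for b in row] for row in blocks(x)]

x = [[1, 2, 5, 6],
     [3, 4, 7, 8],
     [9, 1, 0, 2],
     [3, 5, 4, 6]]
print(subsample(x), max_pool(x), mean_pool(x))
```

Max pooling gives the same output if the largest value moves to another position within its block, which is the partial invariance to translation mentioned above.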
Upsampling
• Duplicate
• Max-upsampling
• Bilinear interpolation
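Of the three options, duplication is the simplest to sketch: every value is copied into a 2x2 block, doubling both dimensions. The function name and input are illustrative. The other two methods are not shown here.

```python
def upsample_duplicate(x):
    """Double the height and width by repeating every value in a 2x2 block."""
    out = []
    for row in x:
        wide = [v for v in row for _ in range(2)]   # repeat each value horizontally
        out.append(wide)
        out.append(list(wide))                      # repeat the row vertically
    return out

x = [[1, 2],
     [3, 4]]
up = upsample_duplicate(x)
for row in up:
    print(row)
```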


ImageNet database

• 224 x 224 images


• 1,281,167 training images, 50,000 validation images, and 100,000 test images
• 1000 classes
AlexNet (2012)

Almost all of the 60 million parameters are in the fully connected layers
Data augmentation

• Data augmented by a factor of 2048 using (i) spatial transformations and (ii) modifications of the input intensities.
Dropout

• Dropout was applied in the fully connected layers


Details
• At test time, results are averaged over five different cropped and mirrored versions of the image
• SGD with a momentum coefficient of 0.9 and batch size of 128
• L2 (weight decay) regularizer used
• This system achieved a 16.4% top-5 error rate and a 38.1% top-1 error rate
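For reference, a top-k error rate counts an example as wrong only if the true class is not among the k highest-scoring classes. The sketch below uses made-up scores for three classes, with k = 2 standing in for the top-5 setting used on ImageNet.

```python
def top_k_error(scores, labels, k):
    """Fraction of examples whose true label is not among the k highest scores."""
    misses = 0
    for s, y in zip(scores, labels):
        top_k = sorted(range(len(s)), key=lambda c: s[c], reverse=True)[:k]
        misses += y not in top_k
    return misses / len(labels)

scores = [[0.1, 0.7, 0.2],               # hypothetical class scores, one row per example
          [0.5, 0.3, 0.2],
          [0.2, 0.3, 0.5],
          [0.6, 0.3, 0.1]]
labels = [1, 1, 0, 2]
top1 = top_k_error(scores, labels, 1)
top2 = top_k_error(scores, labels, 2)
print(top1, top2)
```

The top-k error can never exceed the top-1 error, which is why the top-5 figures quoted for these networks are much lower than the top-1 figures.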
VGG (2015)
Details
• 19 hidden layers
• 144 million parameters
• 6.8% top-5 error rate, 23.7% top-1 error rate
Convolution #2
• 2D Convolution
• Downsampling and upsampling, 1x1 convolution
• Image classification
• Object detection
• Semantic segmentation
• Residual networks
• U-Nets and hourglass networks
You Only Look Once (YOLO)

• Network similar to VGG (448x448 input)


• 7×7 grid of locations
• Predict class at each location
• Predict 2 bounding boxes at each location
• Five parameters per box: x, y, height, width, and confidence
• Momentum, weight decay, dropout, and data augmentation
• Heuristic at the end to threshold and decide final boxes
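The bullets above fix the size of the network output once the number of classes is known. The slide does not give a class count, so the value of 20 below is an assumption used only to make the arithmetic concrete.

```python
grid = 7                    # 7x7 grid of locations
boxes_per_cell = 2          # two bounding boxes predicted at each location
params_per_box = 5          # x, y, height, width, confidence
num_classes = 20            # assumption: not given on the slide

# Each location predicts its boxes plus one set of class scores.
per_cell = boxes_per_cell * params_per_box + num_classes
output_size = grid * grid * per_cell
print(per_cell, output_size)
```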
Object detection (YOLO)
Transfer learning

Transfer learning from ImageNet classification


Results
Semantic Segmentation (2015)

Encoder Decoder
Semantic segmentation results
