Convolutional Neural Networks

The document provides an overview of Convolutional Neural Networks (CNNs), highlighting their advantages over fully-connected networks for image data, including sparse connectivity and shared weights. It details the architecture of CNNs, which typically includes input, convolution, pooling, and fully connected layers, along with essential components like activation functions and downsampling operations. Additionally, it discusses techniques such as batch normalization, dropout, and data augmentation to enhance training and model robustness.


Convolutional Neural Networks
Why CNN?

Problems of fully-connected neural networks in handling image data:

● The number of input values is generally quite large
● The number of weights grows substantially with the size of the input images
● Distant pixels are less correlated

CNN:
● Sparse connectivity (local connectivity): a hidden unit is only connected to a local patch
(the weights connected to the patch are called a filter or kernel)
● Growing receptive fields: units in the deeper layers may indirectly interact with a larger
portion of the input
● Shared weights at different spatial locations: hidden nodes at different locations share the
same weights → reduces the number of parameters (see the sketch below)
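To make the parameter savings concrete, here is a rough comparison (a minimal sketch; the layer sizes are illustrative assumptions, not taken from the slides):

# Rough parameter-count comparison (illustrative sizes).
# Fully-connected layer: a 224x224x3 image flattened, feeding 100 hidden units.
fc_params = (224 * 224 * 3) * 100        # ≈ 15 million weights

# Convolutional layer: 100 filters of size 3x3 over the same 3-channel image.
conv_params = 100 * (3 * 3 * 3)          # 2,700 weights, shared at every location

print(fc_params, conv_params)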
Architecture of CNN
A typical CNN has four types of layers:
● Input layer
● Convolution layer
● Pooling layer
● Fully connected layer
Building blocks of convolutional neural networks
Essential components of a CNN:
● the convolutional layers for feature extraction
● the activations to support learning of non-linear interactions
● the downsampling operations (pooling or striding)
● the fully connected layers and a Softmax layer to transform the extracted features into class scores
Optional components: batch normalization to speed up training and dropout to prevent overfitting
Convolution layer
A convolution matrix (filter) is used in image processing for tasks such as edge detection,
blurring, sharpening, etc. → producing feature maps
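For instance, a classic 3 by 3 edge-detection kernel can be applied with TensorFlow (a minimal sketch; the kernel values are a standard Laplacian-style example, not from the slides):

import numpy as np
import tensorflow as tf

# A classic 3x3 edge-detection (Laplacian-style) kernel.
kernel = np.array([[-1, -1, -1],
                   [-1,  8, -1],
                   [-1, -1, -1]], dtype=np.float32)

# tf.nn.conv2d expects the image as NHWC and the kernel as HWIO.
image = tf.random.uniform((1, 28, 28, 1))     # one 28x28 grayscale image
kernel = kernel.reshape(3, 3, 1, 1)           # height, width, in-channels, out-channels

feature_map = tf.nn.conv2d(image, kernel, strides=1, padding='VALID')
print(feature_map.shape)                      # (1, 26, 26, 1)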
Convolution operator parameters
● Filter size
● Padding
● Stride
● Dilation
● Activation function
Filter size

● Filter size can be 5 by 5, 3 by 3, and so on
● Larger filter sizes should be avoided, as the learning algorithm needs to learn the filter values (weights)
● Odd-sized filters are preferred to even-sized ones: they have the nice geometric property
that all input pixels are centered around the output pixel
Padding
The image shrinks after each convolution operation → after many layers → a very small output.
Pixels on the corners or edges are used much less than pixels in the middle → information
from the edges is lost.
→ Pad the image with additional border(s), setting the border pixel values to 0.
Types of padding:
● Valid padding: no padding
● Same padding: add ‘p’ padding layers such that the output has the same dimensions as the input
● Padding with ‘p’ layers: add ‘p’ padding layers
3 by 3 filter with padding of 1
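A quick way to see the effect of the two main padding modes (a minimal sketch using Keras layers):

import tensorflow as tf

x = tf.random.uniform((1, 32, 32, 3))  # one 32x32 RGB image

valid = tf.keras.layers.Conv2D(8, kernel_size=3, padding='valid')(x)
same  = tf.keras.layers.Conv2D(8, kernel_size=3, padding='same')(x)

print(valid.shape)  # (1, 30, 30, 8) — shrinks by filter size minus 1
print(same.shape)   # (1, 32, 32, 8) — spatial dimensions preserved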
Stride
Stride controls how far the filter shifts at each step → increase the stride if we want receptive fields
to overlap less and if we want smaller output dimensions → downsampling
3 by 3 filter with stride of 2
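Combining filter size f, padding p, and stride s, the output size for an n by n input is given by the standard formula (stated here for reference, not from the slides):

o = ⌊(n + 2p − f) / s⌋ + 1

For example, a 7 by 7 input with a 3 by 3 filter, padding 0, and stride 2 gives o = ⌊(7 + 0 − 3) / 2⌋ + 1 = 3.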
Dilation (Dilated Convolution)

● Dilation: to have a larger receptive field (the portion of the image affecting the filter’s output)
● If dilation is set to 2, instead of a contiguous 3 by 3 subset of the image, every other pixel
of a 5 by 5 subset of the image affects the output
3 by 3 filter with dilation of 2
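In Keras this corresponds to the dilation_rate argument of Conv2D (a minimal sketch):

import tensorflow as tf

x = tf.random.uniform((1, 32, 32, 3))

# A 3x3 filter with dilation 2 covers a 5x5 region, skipping every other pixel.
dilated = tf.keras.layers.Conv2D(8, kernel_size=3, dilation_rate=2)(x)
print(dilated.shape)  # (1, 28, 28, 8) — effective filter size is 5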
Activation function
After the filter is applied to the whole image, apply an activation function to the output to introduce non-linearity.
The preferred activation function in CNNs is ReLU.
ReLU activation function
ReLU leaves outputs with positive values as is and replaces negative values with 0.
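Equivalently, ReLU(x) = max(0, x); a one-line sketch in NumPy:

import numpy as np

def relu(x):
    # Keep positive values, replace negatives with 0.
    return np.maximum(0, x)

print(relu(np.array([-2.0, -0.5, 0.0, 1.5, 3.0])))  # [0.  0.  0.  1.5 3. ]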
2D Convolution Summary
Multiple input channels
● Have a kernel for each channel → sum the results over channels (see the sketch below)
Convolutions Over Channels
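A minimal NumPy sketch of how one output pixel combines multiple input channels (the shapes are illustrative assumptions):

import numpy as np

# One 3x3 patch of a 3-channel input and one 3-channel kernel.
patch  = np.random.rand(3, 3, 3)   # height, width, channels
kernel = np.random.rand(3, 3, 3)   # one kernel slice per input channel

# Multiply elementwise per channel, then sum over all positions and channels:
# this yields a single scalar in the output feature map.
out_pixel = np.sum(patch * kernel)
print(out_pixel)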
Convolution layer
Pooling
● The pooling layer is used to reduce the spatial size of the representation
● A pooling layer is usually attached after a convolutional layer
● It helps to reduce the number of parameters and speed up the computation
● Types:
- Max pooling (most popular)
- Average pooling
- L2 norm of a rectangular neighborhood
● It has hyperparameters but no parameters to learn (see the sketch below)
Max Pooling
Average Pooling
Pooling layer
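A minimal sketch of both common pooling types in Keras:

import tensorflow as tf

x = tf.random.uniform((1, 4, 4, 1))  # one 4x4 single-channel feature map

max_pooled = tf.keras.layers.MaxPool2D(pool_size=2, strides=2)(x)
avg_pooled = tf.keras.layers.AveragePooling2D(pool_size=2, strides=2)(x)

print(max_pooled.shape)  # (1, 2, 2, 1) — each 2x2 window reduced to its max
print(avg_pooled.shape)  # (1, 2, 2, 1) — each 2x2 window reduced to its mean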
Fully-connected layer
● The last layer in a CNN
● Connects all nodes from the previous layer to this fully connected layer,
○ which is responsible for the classification of the image
Batch Normalization
The feature vector of length C at each pixel location of the P × Q × C feature map is treated as
a sample when calculating the sample mean and sample standard deviation for normalization
→ makes training faster and more stable
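In Keras this is a single layer, typically placed between a convolution and its activation (a minimal sketch):

import tensorflow as tf

block = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, kernel_size=3, padding='same'),
    tf.keras.layers.BatchNormalization(),  # normalize per channel over the batch
    tf.keras.layers.ReLU(),
])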
Dropout
The dropout layer acts as a mask, eliminating some neurons’ contributions to the subsequent layer
while leaving all other neurons unchanged → reduces overfitting
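In Keras (a minimal sketch; the rate of 0.5 is a common default, not from the slides):

import tensorflow as tf

classifier = tf.keras.Sequential([
    tf.keras.layers.Dense(1024, activation='relu'),
    tf.keras.layers.Dropout(0.5),  # randomly zero 50% of activations during training
    tf.keras.layers.Dense(10, activation='softmax'),
])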
Data Augmentation
Helps improve model robustness and reduce overfitting.
Methods: Horizontal flips, random crops/scales, translation, color jitter, rotation,…
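Several of these transforms are available as Keras preprocessing layers (a minimal sketch; the parameter values are illustrative assumptions):

import tensorflow as tf

augment = tf.keras.Sequential([
    tf.keras.layers.RandomFlip('horizontal'),       # horizontal flips
    tf.keras.layers.RandomRotation(0.1),            # small random rotations
    tf.keras.layers.RandomTranslation(0.1, 0.1),    # random shifts
    tf.keras.layers.RandomZoom(0.1),                # random crops/scales
])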
Example
import tensorflow as tf

def generate_model():
    model = tf.keras.Sequential([
        # first convolutional layer
        tf.keras.layers.Conv2D(32, kernel_size=3, activation='relu'),
        tf.keras.layers.MaxPool2D(pool_size=2, strides=2),

        # second convolutional layer
        tf.keras.layers.Conv2D(64, kernel_size=3, activation='relu'),
        tf.keras.layers.MaxPool2D(pool_size=2, strides=2),

        # fully connected classifier
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(1024, activation='relu'),
        tf.keras.layers.Dense(10, activation='softmax')  # 10 outputs
    ])
    return model
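Usage (a minimal sketch; the 28 × 28 × 1 input shape is an illustrative assumption, not from the slides):

model = generate_model()
model.build(input_shape=(None, 28, 28, 1))
model.summary()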
