ch4_CNN

The document provides an overview of Convolutional Neural Networks (CNNs) and their applications in deep learning, including object detection, image segmentation, and facial recognition. It discusses the architecture of CNNs, including convolution layers, pooling layers, and hyperparameters like stride and padding, as well as techniques such as transfer learning and fine-tuning. Additionally, it highlights notable CNN architectures like LeNet, AlexNet, VGG-Net, ResNet, and Inception models.


CH4-CNN

DEEP LEARNING
CNN USE CASES

 Object detection
 Image segmentation
 Facial recognition
INTRODUCTION TO IMAGES

RGB IMAGE

Shape : (n * m * 3)
n * m : image resolution
3 : number of channels (Red, Green, Blue)

GRAYSCALE IMAGE

Pixel value in [0, 255]
Shape : (n * m * 1), or simply (n * m)
1 : one color channel
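
For illustration, a minimal NumPy sketch of these shapes (the 4x6 resolution and the random values are arbitrary assumptions):

    import numpy as np

    # Hypothetical 4x6 images with pixel values in [0, 255]
    rgb_image = np.random.randint(0, 256, size=(4, 6, 3), dtype=np.uint8)   # n * m * 3
    gray_image = np.random.randint(0, 256, size=(4, 6), dtype=np.uint8)     # n * m (one channel)

    print(rgb_image.shape)   # (4, 6, 3) -> three channels: Red, Green, Blue
    print(gray_image.shape)  # (4, 6)    -> a single color channel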
INTRODUCTION TO CNN

 The first convolutional neural network (CNN) was introduced in 1990 (LeNet).

 Why CNN?
 Images contain a very large number of pixels!
 A fully-connected neural network would have far too many parameters!
 Images exhibit translational invariance (the same pattern can appear anywhere in the image).

Yann LeCun
WHAT IS A CNN?

 A CNN is a type of artificial neural network designed to extract features from high-dimensional data (such as images) and to classify it.
 Basic Architecture :
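
As an illustration of this conv/pool/fully-connected layout, a minimal Keras sketch (the 28x28 grayscale input, the layer sizes and the 10-class output are assumptions for the example, not taken from the slides):

    from tensorflow import keras
    from tensorflow.keras import layers

    # Feature extraction (convolution + pooling) followed by classification (dense)
    model = keras.Sequential([
        layers.Input(shape=(28, 28, 1)),             # e.g. 28x28 grayscale images
        layers.Conv2D(32, (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(64, (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Flatten(),
        layers.Dense(10, activation="softmax"),      # assumed 10-class problem
    ])
    model.summary()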

CONVOLUTION LAYER : 1 CHANNEL

 Convolving a 5x5x1 image with a 3x3x1 kernel (or filter) gives a 3x3x1 convolved feature.
 By convention, the filter size f is usually odd in computer vision.
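
A minimal NumPy sketch of this operation, in the cross-correlation form used by CNNs (the input and kernel values are arbitrary):

    import numpy as np

    image = np.arange(25).reshape(5, 5)       # arbitrary 5x5 single-channel input
    kernel = np.array([[1, 0, -1],
                       [1, 0, -1],
                       [1, 0, -1]])           # 3x3 vertical-edge-style filter

    out = np.zeros((3, 3))
    for i in range(3):
        for j in range(3):
            # Slide the filter over the image, multiply element-wise and sum
            out[i, j] = np.sum(image[i:i+3, j:j+3] * kernel)
    print(out.shape)                          # (3, 3) convolved feature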

CONVOLUTION LAYER : 3 CHANNELS

 In the case of images with multiple channels (e.g. RGB), the kernel has the same depth as the input image.

IMAGE KERNELS

 https://setosa.io/ev/image-kernels/

MULTIPLE CONVOLUTION LAYERS

CONVOLUTIONAL LAYER - HYPERPARAMETERS : STRIDE

 The stride is the step size by which the filter moves horizontally and vertically over the pixels of the input image during convolution.
 Choosing the stride value :
 To capture fine-grained features : use a small stride.
 If only macro-level features are of interest : use a larger stride.

 Input image size : n x n
 Filter size : f x f
 Stride : s
 Output image size : ((n - f) / s + 1) x ((n - f) / s + 1)

Stride = 2
CONVOLUTIONAL LAYER - HYPERPARAMETERS : PADDING

 Downsides of convolution :
 After each convolution operation the image shrinks : we lose a lot of information.
 During convolution, the pixels in the corners and at the edges are used far less often than central pixels : a lot of information near the edge of the image is thrown away.
 Solution : 'pad' the image.
 Padding : p
 Output image size : ((n + 2p - f) / s + 1) x ((n + 2p - f) / s + 1) (see the numeric check below)

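A small sketch to check both formulas numerically (the values n = 6, n = 7 and f = 3 are only illustrative):

    def conv_output_size(n, f, s=1, p=0):
        """Spatial size of a convolution output: (n + 2p - f) / s + 1 (p = 0 is the unpadded case)."""
        return (n + 2 * p - f) // s + 1

    print(conv_output_size(6, 3, s=1, p=0))  # 4 : the image shrinks without padding
    print(conv_output_size(6, 3, s=1, p=1))  # 6 : 'same' padding keeps the size
    print(conv_output_size(7, 3, s=2, p=0))  # 3 : a larger stride gives a smaller output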
POOLING LAYERS

 A pooling layer is added after the convolutional layer(s).
 Pooling is a form of sub-sampling.
 The pooling filter size is usually 2x2 (more generally N x N).
 Pooling usually reduces each side of the feature map by a factor of N (e.g. N = 2 for a 2x2 filter).
POOLING LAYERS

 Pooling has the advantage of making the representation more compact by reducing the spatial size of the feature
maps, thereby reducing the number of parameters to be learnt.
 The pooling layer has ‘NO PARAMETERS’ i.e. ‘ZERO TRAINABLE PARAMETERS’.
 Illustrations :

EXAMPLE MAXPOOLING

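As a stand-in for the illustration, a minimal NumPy sketch of 2x2 max pooling with stride 2 on a 4x4 feature map (the input values are arbitrary):

    import numpy as np

    x = np.array([[1, 3, 2, 1],
                  [4, 6, 5, 2],
                  [7, 2, 9, 0],
                  [1, 8, 3, 4]])

    # 2x2 max pooling with stride 2: keep the maximum of each non-overlapping 2x2 block
    pooled = x.reshape(2, 2, 2, 2).max(axis=(1, 3))
    print(pooled)
    # [[6 5]
    #  [8 9]]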
TRANSFER LEARNING

TRANSFER LEARNING

 A technique where knowledge acquired from solving one task is reused to enhance performance on a
related task.
 Key points about transfer learning:
 Transfer learning has been studied since the 1970s.
 Transfer learning finds applications in various domains, including cancer subtype discovery, text
classification, medical imaging, and spam filtering.
 By reusing information from previously learned tasks, transfer learning significantly improves
learning efficiency.

BENEFITS OF USING TRANSFER LEARNING

 Reduces the amount of training time required for a new task.
 The knowledge learned on the pre-training dataset often generalizes to related tasks in the same domain.
 Small datasets are prone to overfitting; starting from already-learned features helps mitigate this issue.
 Building a model from scratch is computationally expensive; transfer learning helps reduce the training time.
IMPLEMENTING TRANSFER LEARNING

1. Get the pre-trained model: obtain a pre-trained model suited to the problem.
2. Create a base model: instantiate the base model using one of the known architectures.
3. Freeze layers so they don't change during training: base_model.trainable = False
4. Add new trainable layers on top.
5. Train the new layers on the dataset.
6. Enhance the model with fine-tuning (a sketch of steps 1-5 follows below).

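A minimal Keras sketch of steps 1-5, assuming an ImageNet-pretrained VGG16 base and a hypothetical 5-class target task (both are illustrative choices, not requirements):

    from tensorflow import keras
    from tensorflow.keras import layers

    # 1-2. Get a pre-trained model and use it as the base (here: VGG16 without its classifier head)
    base_model = keras.applications.VGG16(weights="imagenet", include_top=False,
                                          input_shape=(224, 224, 3))

    # 3. Freeze the base so its weights don't change during training
    base_model.trainable = False

    # 4. Add new trainable layers on top
    inputs = keras.Input(shape=(224, 224, 3))
    x = base_model(inputs, training=False)
    x = layers.GlobalAveragePooling2D()(x)
    outputs = layers.Dense(5, activation="softmax")(x)   # assumed 5-class task
    model = keras.Model(inputs, outputs)

    # 5. Train only the new layers on the new dataset
    model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
    # model.fit(train_ds, validation_data=val_ds, epochs=5)   # train_ds / val_ds : your datasets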
FINE TUNING

 Fine-tuning refers to taking a pre-trained model and further training it on a new dataset.
 Fine-tuning involves training the entire model, including the initial layers.
 It is performed by unfreezing the base model (or part of it) and retraining the whole model on the new dataset at a very low learning rate.
 Optionally, the later layers can use a higher learning rate than the earlier ones so that they adapt more strongly to the new dataset.

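Continuing the transfer learning sketch above (reusing base_model, model and the keras import from there), fine-tuning unfreezes the base and recompiles with a much lower learning rate; the value 1e-5 is an assumption for illustration:

    # Unfreeze the base model (or only its last few layers) for fine-tuning
    base_model.trainable = True

    # Recompile with a very low learning rate so the pre-trained weights change only slightly
    model.compile(optimizer=keras.optimizers.Adam(learning_rate=1e-5),
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    # model.fit(train_ds, validation_data=val_ds, epochs=5)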
FINE TUNING

WAYS TO FINE TUNE THE MODEL

 Feature extraction : remove the output layer and use the entire network as a fixed feature extractor for the new dataset.
 Use the architecture of the pre-trained model : keep the architecture, initialize all the weights randomly, and train the model on the new dataset.
 Train some layers while freezing others : keep the weights of the initial layers frozen while retraining only the higher layers.
WAYS TO FINE TUNE THE MODEL

The choice depends on the size of the new dataset and its similarity to the pre-training data:

 Large dataset, low similarity : it is best to train the neural network from scratch on the new data.
 Large dataset, high similarity : retain the architecture and the initial weights of the model, then retrain the whole model starting from the pre-trained weights.
 Small dataset, low similarity : freeze the initial k layers of the pre-trained model and train just the remaining (n-k) layers again; the top layers are then customized to the new dataset.
 Small dataset, high similarity : customize and modify the output layers according to the problem statement and use the pre-trained model as a feature extractor.


COMMON ARCHITECTURES

 LeNet-5: 1998
 AlexNet: 2012
 VGG-Net : 2014
 Inception-v1 to v3
 ResNet: 2015

LENET-5

 Proposed by Yann LeCun and others in 1998.
 A multi-layer convolutional neural network for image classification.
 Used for recognizing handwritten and machine-printed characters.
ALEXNET

▪ Alex Krizhevsky and colleagues released AlexNet.
▪ It won the ImageNet Large Scale Visual Recognition Challenge in 2012.
▪ AlexNet is a deeper and much wider version of LeNet.
▪ The use of ReLU as the activation function accelerated training by almost six times.
▪ Dropout layers prevented the model from overfitting.
▪ The use of padding prevents the feature maps from shrinking drastically.
▪ The model was trained on the ImageNet dataset: about 1.2 million training images across a thousand classes (the full ImageNet collection contains over 14 million images).
ALEXNET

VGG-NET

 VGG-Net is one of the most popular pre-trained models for image classification.
 Introduced at the ILSVRC 2014 competition.
 Developed by the Visual Geometry Group at the University of Oxford.
 VGG-16 surpassed AlexNet and was quickly adopted by researchers and industry for image classification tasks.

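For reference, the ImageNet-pretrained VGG-16 can be loaded directly from Keras; a minimal sketch (224x224x3 is the standard VGG-16 input size):

    from tensorflow.keras.applications import VGG16

    # Full 1000-class ImageNet classifier (about 138 million parameters)
    model = VGG16(weights="imagenet")
    model.summary()

    # Convolutional base only (no classifier head), as typically used for transfer learning
    base = VGG16(weights="imagenet", include_top=False, input_shape=(224, 224, 3))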
RESNET : RESIDUAL BLOCKS

 The VGG-Net approach works with a relatively small number of convolutional layers.
 Subsequent research discovered that increasing the number of layers could significantly improve CNN performance.
 The ResNet architecture introduces the simple concept of adding a block's input directly to the output of its series of convolution layers (a skip connection, sketched below).
 This technique smooths out the gradient flow during backpropagation, enabling the network to scale to 50, 100, or even 150 layers.

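A minimal Keras sketch of one such residual block; the 64 filters and 3x3 kernels are illustrative, and a real ResNet also uses 1x1 convolutions (and batch normalization) to handle shape changes:

    from tensorflow.keras import layers

    def residual_block(x, filters=64):
        """Two convolutions whose output is added back to the block's input (the skip connection)."""
        shortcut = x
        y = layers.Conv2D(filters, (3, 3), padding="same", activation="relu")(x)
        y = layers.Conv2D(filters, (3, 3), padding="same")(y)
        y = layers.Add()([shortcut, y])      # add the block's input to the convolution output
        return layers.Activation("relu")(y)

    # Usage (x must already have `filters` channels for the addition to be valid):
    # x = residual_block(x, filters=64)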
RESNET

INCEPTION-1 :

 Problem :
 Salient information can appear at very different scales and locations in an image, so choosing the right kernel size for the convolution operation is tough.
 Very deep networks are prone to overfitting, and it is also hard to pass gradient updates through the entire network.
 Naively stacking large convolution operations is computationally expensive.
 Solution : use filters of multiple sizes that operate at the same level (see the sketch below).

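A minimal Keras sketch of this idea: parallel branches with different kernel sizes applied to the same input, with their outputs concatenated. The filter counts are arbitrary, and the real Inception module additionally uses 1x1 convolutions to reduce computation plus a pooling branch:

    from tensorflow.keras import layers

    def naive_inception_module(x, filters=32):
        """Apply 1x1, 3x3 and 5x5 convolutions to the same input and concatenate the results."""
        b1 = layers.Conv2D(filters, (1, 1), padding="same", activation="relu")(x)
        b3 = layers.Conv2D(filters, (3, 3), padding="same", activation="relu")(x)
        b5 = layers.Conv2D(filters, (5, 5), padding="same", activation="relu")(x)
        return layers.Concatenate()([b1, b3, b5])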
INCEPTION-1 :

 Inception is a deep convolutional neural network architecture that was introduced in 2014.
 It won the ImageNet Large-Scale Visual Recognition Challenge (ILSVRC14).
 It was mostly developed by Google researchers.

INCEPTION-1 : GOOGLENET

 GoogLeNet has 9 inception modules stacked linearly.


 It is 22 layers deep (27, including the pooling layers).
 It uses global average pooling at the end of the last inception module.

INCEPTION-3

▪ Inception-v3 incorporated the following upgrades :

1. RMSProp optimizer.
2. Factorized 5x5 and 7x7 convolutions.
3. BatchNorm in the auxiliary classifiers.
4. …
