
Advanced Convolutional Neural Networks

Nguyen Quang Uy

1
Outline
1. AlexNet

2. VGGNet

3. GoogLeNet

4. ResNet

5. MobileNet

6. EfficientNet
2
Legends

3
Layers

4
Activation functions

5
Modules/Blocks

6
Repeated layers

7
AlexNet

8
Overview
• Paper: ImageNet Classification with Deep Convolutional Neural Networks
• Published in: NeurIPS 2012.
• Considered one of the most impactful works in computer vision.

9
Novelties
• Use Rectified Linear Units (ReLUs) as activation functions.
• Use Dropout layer.
• Use data augmentation.

10
Architecture
• AlexNet has 8 layers — 5 convolutional and 3 fully-connected.
• AlexNet has 60M parameters.

11
Results
• Top-1 error rate is 37.5%.
• Top-5 error rate is 17.0%.

12
VGG

13
Overview
• VGG: Visual Geometry Group
• Paper: Very Deep Convolutional Networks for Large-Scale Image
Recognition
• Published in arXiv 2014

14
Novelties
• Designed deeper networks (roughly twice as deep as AlexNet) by stacking
uniform convolutions.
• Uses only 3x3 kernels, as opposed to AlexNet's 11x11. This design decreases
the number of parameters.
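The saving can be checked with quick arithmetic: two stacked 3x3 convolutions cover the same 5x5 receptive field as one 5x5 convolution, with fewer weights. A minimal sketch (the channel count C is an illustrative assumption, biases ignored):

```python
def conv_params(k, c_in, c_out):
    # one k x k kernel per (input channel, output channel) pair, biases ignored
    return k * k * c_in * c_out

C = 256  # illustrative channel count, not from the slides
one_5x5 = conv_params(5, C, C)       # 25 * C^2 weights
two_3x3 = 2 * conv_params(3, C, C)   # 18 * C^2 weights, same 5x5 receptive field
print(one_5x5, two_3x3)
```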

15
Architecture
• VGG has 13 convolutional and 3 fully-connected layers.
• This network stacks more layers onto AlexNet.
• It consists of 138M parameters.

16
VGG result
• Top-1 accuracy is 71.5%.
• Top-5 accuracy is 90.1%.

17
GoogLeNet

18
Overview
• Also known as Inception-v1
• Paper: Going Deeper with Convolutions
• Published in: 2015 IEEE Conference on Computer Vision and Pattern
Recognition (CVPR).
• Achieves results competitive with human performance.

19
Novelties
• Building networks using modules/blocks, instead of stacking convolutional layers.
• 1×1 convolutions are used for dimensionality reduction to remove computational bottlenecks.
• Have parallel convolutions with filters at 1×1, 3×3 and 5×5, followed by concatenation.
• Use two auxiliary classifiers to encourage discrimination in the lower stages.
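The effect of the 1x1 bottleneck can be illustrated by counting multiplications for one branch. The sizes below are an illustrative assumption, not from the slides:

```python
def conv_cost(d_f, k, m, n):
    # multiplications: d_f*d_f output positions, k*k*m multiplies each, n filters
    return d_f * d_f * k * k * m * n

# hypothetical branch: 28x28x192 input, 32 output maps from 5x5 filters
direct = conv_cost(28, 5, 192, 32)
# same branch with a 1x1 convolution first reducing 192 -> 16 channels
reduced = conv_cost(28, 1, 192, 16) + conv_cost(28, 5, 16, 32)
print(direct, reduced)  # the bottleneck cuts the multiply count by roughly 10x
```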

20
Architecture

21
Architecture
• Stem and Inception module.

22
Results
• Top-1 accuracy is 78.2%
• Top-5 accuracy is 94.1%
• Human top-5 error is 5%-8%.

23
ResNet

24
Overview
• Paper: Deep Residual Learning for Image Recognition.
• Published in: 2016 IEEE Conference on Computer Vision and Pattern
Recognition (CVPR).
• The first network to achieve better results than humans.

25
Novelties
• Popularise skip connections (they weren’t the first to use skip connections).
• Design even deeper CNNs (up to 152 layers) without compromising model’s
generalisation power
• Among the first to use batch normalisation.
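The skip connection computes y = F(x) + x. A toy sketch (plain lists standing in for feature maps) shows why extra depth becomes safer: a block whose residual branch learns zero is exactly the identity, so stacking it cannot hurt.

```python
def residual_block(x, residual_fn):
    # y = F(x) + x: the shortcut adds the input back onto the residual branch
    return [f + xi for f, xi in zip(residual_fn(x), x)]

x = [1.0, 2.0, 3.0]
# a branch that has learned nothing passes the input through unchanged
out = residual_block(x, lambda v: [0.0] * len(v))
print(out)  # [1.0, 2.0, 3.0]
```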

26
Architecture
• Conv block and Identity module.

27
Architecture
• Conv block and Identity module.

28
ResNet results
• Top-1 accuracy is 87.0%.
• Top-5 accuracy is 96.3%.
• Top-5 human accuracy: 95.0%

29
MobileNet

30
Overview
• Paper: MobileNets: Efficient Convolutional Neural Networks for Mobile Vision
Applications
• Published as an arXiv preprint in 2017.
• Specifically designed for use on mobile devices.

31
Novelties
• MobileNet uses depthwise separable convolutions. It significantly reduces the number of
parameters.
• It introduces two shrinking hyperparameters that efficiently trade off between latency and
accuracy
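The two hyperparameters are the width multiplier (thins every layer's channels) and the resolution multiplier (shrinks the feature maps). A sketch of their effect on one depthwise separable layer's multiply count; the layer sizes are illustrative assumptions:

```python
def sep_conv_cost(dk, df, m, n, alpha=1.0, rho=1.0):
    # width multiplier alpha thins channels, resolution multiplier rho shrinks maps
    m, n = int(alpha * m), int(alpha * n)
    df = int(rho * df)
    depthwise = dk * dk * m * df * df
    pointwise = m * n * df * df
    return depthwise + pointwise

full = sep_conv_cost(3, 14, 256, 256)
slim = sep_conv_cost(3, 14, 256, 256, alpha=0.5, rho=0.5)
print(full, slim)  # the slim variant is far cheaper, at some accuracy cost
```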

32
Architecture

33
Architecture
• Depthwise separable convolution.

34
Architecture
• Depthwise convolution:
• In a normal convolution, all channels of a kernel are used to produce a feature map.
• In a depthwise convolution, each channel of a kernel is used to produce a feature map.
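The per-channel behaviour can be shown at a single spatial position with 1x1 kernels (a toy sketch; the values are arbitrary):

```python
x = [1.0, 2.0, 3.0]   # one spatial position, M = 3 input channels
w = [0.5, 0.5, 0.5]   # one weight per input channel (1x1 kernels for brevity)

# standard convolution: the kernel spans ALL channels -> one output value
standard_out = sum(wc * xc for wc, xc in zip(w, x))

# depthwise convolution: each channel has its own filter -> M output maps
depthwise_out = [wc * xc for wc, xc in zip(w, x)]

print(standard_out, depthwise_out)  # 3.0 vs [0.5, 1.0, 1.5]
```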

35
Architecture
• Pointwise convolution:
• In a normal convolution, we would use 256 filters of size 5x5x3.
• In a pointwise convolution, we only need 256 filters of size 1x1x3.

36
Computation cost
• Standard convolution

• The computational cost can be calculated as

  D_K × D_K × M × N × D_F × D_F

• where D_F is the spatial dimension of the feature map, D_K is the size of the
convolution kernel, and M and N are the numbers of input and output channels,
respectively.
37
Computation cost
• Depthwise convolution

• The computational cost can be calculated as

  D_K × D_K × M × D_F × D_F

38
Computation cost
• Pointwise convolution

• The computational cost can be calculated as

  M × N × D_F × D_F

39
Computation cost
• The total computational cost of depthwise separable convolutions can be
calculated as

  D_K × D_K × M × D_F × D_F + M × N × D_F × D_F

• Dividing this by the cost of a standard convolution,
D_K × D_K × M × N × D_F × D_F, gives the reduction in computation:

  1/N + 1/D_K²
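The reduction ratio can be verified numerically against a direct count (the layer sizes chosen are an illustrative assumption):

```python
DK, DF, M, N = 3, 14, 128, 256   # kernel size, map size, in/out channels (illustrative)

standard  = DK * DK * M * N * DF * DF
separable = DK * DK * M * DF * DF + M * N * DF * DF

ratio = separable / standard
print(ratio, 1 / N + 1 / DK**2)  # the direct count reproduces 1/N + 1/DK^2
```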

40
Results
• MobileNet outperforms GoogLeNet and VGG with a much lower number of
operations and parameters.

41
EfficientNet

42
Overview
• Paper: EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks
• Published in: International Conference on Machine Learning, 2019.
• It is still considered among the state of the art.

43
Novelties
• Compound Scaling from B0 to B7.
• The EfficientNet Architecture (developed using Neural Architecture Search)

44
Architecture

45
Compound scaling
• The most common way to scale up ConvNets was along a single dimension: depth
(number of layers), width (number of channels) or image resolution (image size).
• EfficientNets perform compound scaling: all three dimensions are scaled
together while maintaining a balance between them.

46
Compound scaling
• This idea of compound scaling makes sense because if the input image is
bigger then the network needs more layers (depth) and more channels
(width) to capture more fine-grained patterns.

47
Neural Architecture Search
• This is a reinforcement learning based approach used to develop
EfficientNet-B0 by leveraging a multi-objective search that optimizes for both
accuracy and FLOPS.

48
Neural Architecture Search
• The objective function can formally be defined as

  ACC(m) × [FLOPS(m) / T]^w

• where m is a candidate model, T is the target FLOPS and w controls the
accuracy/FLOPS trade-off.
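A sketch of that reward; the candidate accuracies and FLOPS below are hypothetical, while w = -0.07 is the value used in the paper:

```python
def nas_reward(acc, flops, target_flops, w=-0.07):
    # ACC(m) * (FLOPS(m)/T)^w -- exceeding the FLOPS target T is penalised
    return acc * (flops / target_flops) ** w

on_target   = nas_reward(0.76, 400e6, 400e6)  # on budget: reward equals accuracy
over_budget = nas_reward(0.76, 800e6, 400e6)  # 2x FLOPS: mildly penalised
print(on_target, over_budget)
```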

49
Mobile inverted bottleneck convolution (MBConv)
• MBConv without squeeze and excitation operation

50
Mobile inverted bottleneck convolution (MBConv)
• MBConv with squeeze and excitation operation

51
Squeeze and excitation operation
• Access to global information
• Modelling channel interdependencies
• Which can be regarded as a self-attention function on channels
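A bare-bones sketch of the operation. The real block uses two small fully connected layers with a reduction ratio; here the excitation is collapsed to one gating weight per channel for brevity:

```python
import math

def squeeze_excite(feature_maps, gate_weights):
    # squeeze: global average pooling -> one descriptor per channel
    z = [sum(fm) / len(fm) for fm in feature_maps]
    # excitation: a sigmoid gate in (0, 1) per channel (simplified to one layer)
    s = [1 / (1 + math.exp(-w * zi)) for w, zi in zip(gate_weights, z)]
    # scale: reweight each channel's map -> models channel interdependencies
    return [[si * v for v in fm] for si, fm in zip(s, feature_maps)]

maps = [[1.0, 3.0], [2.0, 2.0]]          # 2 channels, spatial dims flattened
out = squeeze_excite(maps, [5.0, -5.0])  # first channel kept, second suppressed
print(out)
```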

52
Scaling EfficientNet-B0 to get B1-B7
• Let the network depth (d), width (w) and input image resolution (r) be:

  d = α^φ,  w = β^φ,  r = γ^φ
  subject to α · β² · γ² ≈ 2 and α ≥ 1, β ≥ 1, γ ≥ 1

• We then fix α, β, γ as constants and scale up the baseline network with
different φ (Equation 3 in the paper) to obtain EfficientNet-B1 to B7.
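With the paper's grid-searched constants α = 1.2, β = 1.1, γ = 1.15, the multipliers for a given φ can be computed directly (a minimal sketch):

```python
alpha, beta, gamma = 1.2, 1.1, 1.15  # constants reported in the EfficientNet paper

def compound_scale(phi):
    # depth, width and resolution multipliers for compound coefficient phi
    return alpha ** phi, beta ** phi, gamma ** phi

# the constraint alpha * beta^2 * gamma^2 ~ 2 means raising phi by 1
# roughly doubles the network's FLOPS
print(alpha * beta**2 * gamma**2)
for phi in (1, 2, 3):  # scaling steps toward B1, B2, B3
    d, w, r = compound_scale(phi)
    print(phi, round(d, 2), round(w, 2), round(r, 2))
```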
53
Results

54
Q&A
Thank you!

55
