CNN Architectures - Transfer Learning
Day 3
CNN Architecture Decisions
➢ Number of Layers
➢ Number of filters
➢ Filter or Kernel Size
➢ Pooling
➢ Stride
➢ Fully Connected Layers
➢ Regularizers e.g. Batch Norm, Dropout
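Each of these decisions shows up directly when a network is defined in code. A minimal sketch, assuming PyTorch (the layer sizes are purely illustrative):

```python
import torch
import torch.nn as nn

# Illustrative only: each architecture decision from the list above maps to a layer choice.
model = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=3, stride=1, padding=1),  # number of filters, kernel size, stride
    nn.BatchNorm2d(32),                                     # regularizer: Batch Norm
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=2, stride=2),                  # pooling
    nn.Conv2d(32, 64, kernel_size=3, stride=1, padding=1),  # number of layers: add more Conv blocks
    nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Dropout(0.5),                                        # regularizer: Dropout
    nn.Linear(64 * 56 * 56, 256),                           # fully connected layers
    nn.ReLU(),
    nn.Linear(256, 10),
)

x = torch.randn(1, 3, 224, 224)
print(model(x).shape)  # torch.Size([1, 10])
```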
What are the Best Practices?
ImageNet (image-net.org)
- ~22K categories
- Human labeled
- Models compared by Top-5 error rate
AlexNet (CNN based)
[Figure: AlexNet architecture]
- Input: 227x227x3
- Conv 1: 96 filters, 11x11, Stride = 4
- Max Pool: 3x3, S:2 (Overlapping)
- Conv 2: 256 filters, 5x5, S:1, Padding = 2
- Max Pool: 3x3, S:2
- Conv 3: 384 filters, 3x3, S:1, P:1
- Conv 4: 384 filters, 3x3, S:1, P:1
- Conv 5: 256 filters, 3x3, S:1, P:1
- Max Pool: 3x3, S:2
- FC 4096
- FC 4096
- FC 1000
- SoftMax
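A sketch of the stack above in PyTorch, assuming the figure's dimensions; it approximates AlexNet rather than reproducing the original two-GPU implementation:

```python
import torch
import torch.nn as nn

# AlexNet-style stack following the figure: 227x227x3 input,
# overlapping 3x3/stride-2 max pools, three FC layers (SoftMax applied in the loss).
alexnet = nn.Sequential(
    nn.Conv2d(3, 96, kernel_size=11, stride=4), nn.ReLU(),              # Conv 1: 96 filters, 11x11, S=4
    nn.MaxPool2d(kernel_size=3, stride=2),                               # overlapping max pool
    nn.Conv2d(96, 256, kernel_size=5, stride=1, padding=2), nn.ReLU(),  # Conv 2
    nn.MaxPool2d(3, 2),
    nn.Conv2d(256, 384, kernel_size=3, stride=1, padding=1), nn.ReLU(), # Conv 3
    nn.Conv2d(384, 384, kernel_size=3, stride=1, padding=1), nn.ReLU(), # Conv 4
    nn.Conv2d(384, 256, kernel_size=3, stride=1, padding=1), nn.ReLU(), # Conv 5
    nn.MaxPool2d(3, 2),
    nn.Flatten(),
    nn.Linear(256 * 6 * 6, 4096), nn.ReLU(), nn.Dropout(0.5),           # FC 4096
    nn.Linear(4096, 4096), nn.ReLU(), nn.Dropout(0.5),                  # FC 4096
    nn.Linear(4096, 1000),                                              # FC 1000
)

print(alexnet(torch.randn(1, 3, 227, 227)).shape)  # torch.Size([1, 1000])
```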
AlexNet - Overlapping Max Pool
Example: 5x5 input, 3x3 filter, stride 2
1 4 5 2 7
5 3 6 3 6
7 2 1 1 4
3 9 4 6 7
4 2 5 1 2
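As a quick check of the arithmetic, a sketch (assuming PyTorch) applying the 3x3, stride-2 overlapping max pool to the 5x5 grid above:

```python
import torch
import torch.nn.functional as F

# The 5x5 example above, pooled with a 3x3 window and stride 2 (overlapping max pool).
x = torch.tensor([[1, 4, 5, 2, 7],
                  [5, 3, 6, 3, 6],
                  [7, 2, 1, 1, 4],
                  [3, 9, 4, 6, 7],
                  [4, 2, 5, 1, 2]], dtype=torch.float32)

out = F.max_pool2d(x.view(1, 1, 5, 5), kernel_size=3, stride=2)
print(out.view(2, 2))
# tensor([[7., 7.],
#         [9., 7.]])
```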
ReLU vs tanh: AlexNet used ReLU activations, which train considerably faster than tanh.
Data Augmentation
- Horizontal Flip
- Random Crop
- Inference (test-time) Augmentation
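A sketch of these augmentations with torchvision (assumed here; crop sizes are illustrative):

```python
from torchvision import transforms

# Typical training-time augmentation matching the list above.
train_transforms = transforms.Compose([
    transforms.RandomResizedCrop(227),       # random crop, resized to the network input
    transforms.RandomHorizontalFlip(p=0.5),  # horizontal flip
    transforms.ToTensor(),
])

# At inference time, augmentation is usually limited to deterministic resize/crop,
# or several augmented copies are averaged (test-time augmentation).
eval_transforms = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(227),
    transforms.ToTensor(),
])
```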
CNN based (deeper variant)
- FC 4096
- FC 4096
- Conv 1024, 3x3, S:1 (smaller filter size but more filters)
- Conv 512, 3x3, S:1
- Pool 3x3, S:2
- Input 224x224x3
- Training: GTX 580, 11-12 days
Building Deeper Networks
VGG (2014)
researchgate.net
VGG stack (top to bottom):
- SoftMax
- FC 1000
- FC 4096
- FC 4096
- Pool 3x3
- Conv 3x3, 512
- ...
- Conv 3x3, 128
- Conv 3x3, 128
- Pool
- Conv 3x3, 64
- Conv 3x3, 64
- Input
All Conv filters: 3x3, stride 1, pad 1
What should be the Filter Size?
➢ Smaller size filter or smaller receptive field?
➢ Pooling
[Figure: receptive fields - one 5x5 filter with ReLU vs. two stacked 3x3 filters, each with ReLU, covering the same 5x5 region of the input]
Input 30x30x64 in both cases:
- Two stacked Conv layers, 64 filters of 3x3, S=1, ReLU: 3x3x64x64 + 3x3x64x64 = 18x64x64 parameters
- One Conv layer, 64 filters of 5x5, S=1, ReLU: 5x5x64x64 = 25x64x64 parameters
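The same parameter counts reproduced in PyTorch (assumed library; bias terms disabled so the counts match the arithmetic above):

```python
import torch.nn as nn

def n_params(m):
    return sum(p.numel() for p in m.parameters())

# Two stacked 3x3 convs vs. one 5x5 conv, both 64 -> 64 channels.
two_3x3 = nn.Sequential(
    nn.Conv2d(64, 64, 3, stride=1, padding=1, bias=False), nn.ReLU(),
    nn.Conv2d(64, 64, 3, stride=1, padding=1, bias=False), nn.ReLU(),
)
one_5x5 = nn.Conv2d(64, 64, 5, stride=1, padding=2, bias=False)

print(n_params(two_3x3))  # 73728  = 18 x 64 x 64
print(n_params(one_5x5))  # 102400 = 25 x 64 x 64
```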
Increasing Filters with Depth
researchgate.net
Ensembles
Model #1, Model #2, ..., Model #n → Average of multiple predictions
Ensembles in VGG
- VGG16
- VGG19
Reduces overfitting, improves accuracy
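A minimal sketch of ensembling, assuming PyTorch; the model list is a placeholder for separately trained networks (e.g. VGG16 and VGG19 runs), assumed to already be in eval() mode:

```python
import torch

@torch.no_grad()
def ensemble_predict(models, x):
    # Average of multiple predictions: softmax each model's output, then mean over models.
    probs = [torch.softmax(m(x), dim=1) for m in models]
    return torch.stack(probs).mean(dim=0)

# usage (hypothetical): preds = ensemble_predict([model1, model2, model3], batch)
```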
Summary - VGG (2014)
GoogLeNet
- 9 stacked Inception modules
- Error rate of 6.7%
[Figure: GoogLeNet - stem (Input → Conv → Pool → Conv → Conv → Pool) followed by Inception modules 1-9]
Convolution OR Pooling?
Naive Inception module (Convolution and Pooling in parallel):
- Input (Previous Layer): 28x28x256
- Branches: 128 1x1 Conv (S:1, P:0) | 192 3x3 Conv (S:1, P:1) | 96 5x5 Conv (S:1, P:2) | 3x3 MaxPool (S:1, P:1)
- Concatenation: 28x28x(128+192+96+256) = 28x28x672
- 5x5 Conv branch alone: 28x28x96x5x5x256 multiplies
- Total: 854M multiplies
- Computationally very complex
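The 854M figure can be reproduced branch by branch (the max-pool branch adds no multiplies):

```python
# Multiply count per branch = output_h * output_w * n_filters * kernel_h * kernel_w * input_depth
branch_1x1 = 28 * 28 * 128 * 1 * 1 * 256   #  ~25.7M
branch_3x3 = 28 * 28 * 192 * 3 * 3 * 256   # ~346.8M
branch_5x5 = 28 * 28 * 96 * 5 * 5 * 256    # ~481.7M

print((branch_1x1 + branch_3x3 + branch_5x5) / 1e6)  # ~854.2M multiplies
```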
Power of 1x1 Convolution
- 28x28x256 → 28x28x32 (32 filters of 1x1)
- Reduces depth
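A sketch in PyTorch (assumed) of a 1x1 convolution with 32 filters reducing the depth of the 28x28x256 volume:

```python
import torch
import torch.nn as nn

# A 1x1 convolution with 32 filters collapses the depth from 256 to 32
# while keeping the 28x28 spatial size.
reduce = nn.Conv2d(in_channels=256, out_channels=32, kernel_size=1)

x = torch.randn(1, 256, 28, 28)
print(reduce(x).shape)  # torch.Size([1, 32, 28, 28])
```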
Efficient Inception Module
[Figure: Inception module on a 28x28x256 input (Previous Layer) with 1x1 Conv bottlenecks before the 3x3 and 5x5 Convs and after the MaxPool; branch outputs concatenated]
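A sketch of such a module in PyTorch; the branch widths below are illustrative, not GoogLeNet's exact values:

```python
import torch
import torch.nn as nn

class InceptionModule(nn.Module):
    """Inception module with 1x1 bottlenecks before the larger convolutions."""
    def __init__(self, in_ch, c1, c3_red, c3, c5_red, c5, pool_proj):
        super().__init__()
        self.b1 = nn.Conv2d(in_ch, c1, 1)                                  # 1x1 branch
        self.b2 = nn.Sequential(nn.Conv2d(in_ch, c3_red, 1), nn.ReLU(),    # 1x1 reduce, then 3x3
                                nn.Conv2d(c3_red, c3, 3, padding=1))
        self.b3 = nn.Sequential(nn.Conv2d(in_ch, c5_red, 1), nn.ReLU(),    # 1x1 reduce, then 5x5
                                nn.Conv2d(c5_red, c5, 5, padding=2))
        self.b4 = nn.Sequential(nn.MaxPool2d(3, stride=1, padding=1),      # pool, then 1x1 projection
                                nn.Conv2d(in_ch, pool_proj, 1))

    def forward(self, x):
        # Concatenation along the channel dimension.
        return torch.cat([self.b1(x), self.b2(x), self.b3(x), self.b4(x)], dim=1)

m = InceptionModule(256, 128, 64, 192, 32, 96, 64)
print(m(torch.randn(1, 256, 28, 28)).shape)  # torch.Size([1, 480, 28, 28])
```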
GoogLeNet Architecture
Auxiliary Loss
Earlier network approaches vs. the GoogLeNet approach:
- Earlier: Conv output 7x7x1024 → FC Layer → 1024
- GoogLeNet: Conv output 7x7x1024 → Global Average Pooling → 1x1x1024
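A quick parameter comparison of the two heads (PyTorch assumed):

```python
import torch.nn as nn

def n_params(m):
    return sum(p.numel() for p in m.parameters())

# Earlier approach: flatten 7x7x1024 and use a fully connected layer to 1024 outputs.
fc_head = nn.Sequential(nn.Flatten(), nn.Linear(7 * 7 * 1024, 1024))

# GoogLeNet approach: global average pooling collapses 7x7x1024 to 1x1x1024 with no weights.
gap_head = nn.AdaptiveAvgPool2d(1)

print(n_params(fc_head))   # 51,381,248 parameters
print(n_params(gap_head))  # 0 parameters
```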
Deeper Networks?
ResNet (2015)
- Ultra deep: 152 Layers
[Figure: ResNet architecture - Input → Conv 64, 7x7 → Pool → stacked 3x3 Conv blocks (64, 128, ...) → Pool]
Residual Block
[Figure: plain stacked Conv/ReLU layers vs. a residual block - X → Conv → ReLU → Conv gives F(X); a skip connection adds X, then ReLU]
H(X) = F(X) + X
F(X) = H(X) - X → the residual is a smaller value, easier to optimize
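A minimal residual block sketch in PyTorch (BatchNorm added as in standard ResNet blocks, though the figure shows only Conv and ReLU):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ResidualBlock(nn.Module):
    """Minimal residual block: H(x) = F(x) + x (same channel count, stride 1)."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))   # Conv -> ReLU
        out = self.bn2(self.conv2(out))         # Conv: this is F(x)
        return F.relu(out + x)                  # skip connection: F(x) + x, then ReLU

block = ResidualBlock(64)
print(block(torch.randn(1, 64, 56, 56)).shape)  # torch.Size([1, 64, 56, 56])
```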
Summary - ResNet (2015)
[Figure: VGGNet stack shown for comparison - Input → 2x Conv 3x3, 64 → Pool → 2x Conv 3x3, 128 → Pool → 2x Conv 3x3, 256 → Pool → Conv 3x3, 512 blocks → Pool 3x3 → FC 4096 → FC 4096 → FC 1000 → SoftMax]
Identifying Flowers
- Daisy
- Roses
- Dandelion
- Tulips
- Sunflowers
Applying Transfer Learning
[Figure: ResNet 200 (Frozen Layers) → Flatten → Fully Connected → Fully Connected (5, SoftMax) → Daisy / Roses / Dandelion / Tulips / Sunflowers]
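A sketch of this setup with torchvision (assumed, version >= 0.13 for the weights argument); ResNet50 stands in for the figure's ResNet 200, which torchvision does not provide:

```python
import torch.nn as nn
from torchvision import models

# Load an ImageNet-pretrained backbone.
backbone = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)

for p in backbone.parameters():          # freeze all pretrained layers
    p.requires_grad = False

backbone.fc = nn.Sequential(             # new head, trained from scratch
    nn.Linear(backbone.fc.in_features, 256),
    nn.ReLU(),
    nn.Linear(256, 5),                   # 5 flower classes; SoftMax applied in the loss
)
# Only the new head's parameters are updated during training.
```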
Do we keep all Layers Frozen?
More Options