V05 SS24 DL CNNs Lecture2

Deep Learning For Computer Vision

Vorlesung SS 2024
Prof. Dr.-Ing. Rainer Stiefelhagen, Dr. Saquib Sarfraz, Dr. Simon Reiß
Maschinensehen für MMI, Institut für Anthropomatik & Robotik
Zentrum für digitale Barrierefreiheit und Assistive Technologien (ACCESS@KIT)
Institut für Anthropomatik und Robotik, Fakultät für Informatik

KIT – Universität des Landes Baden-Württemberg und


nationales Forschungszentrum in der Helmholtz-Gemeinschaft www.kit.edu
Today

VGG (from last week)

CNNs as feature extractors

Newer / better architectures
- GoogleNet – Inception modules
- ResNet

Very recent adaptations
- Wide ResNet
- ResNeXt
- MobileNet

3 Deep Learning in CV - CNNs Maschinensehen für MMI (Prof. Stiefelhagen)


Institut für Anthropomatik und Robotik
Last Week

Basics of Convolutional Neural Networks (CNNs)
- convolutional layers
- pooling
- normalization (batch normalization)
- non-linearity: sigmoid, tanh, ReLU

AlexNet (2012) – 8 layers

Modern CNN revolution

ImageNet Large Scale Visual Recognition Challenge (ILSVRC) winners

Deeper Networks

Figure copyright: Kaiming He, 2016

Going deeper
- 16-19 weight layers
- simple filters, small receptive fields: 3x3
- small filters reduce the number of weights
- top-5 error rate: 7.1% (compare against 12-14%)

Architecture (16 weight layers):
image
conv3-64, conv3-64, maxpool
conv3-128, conv3-128, maxpool
conv3-256, conv3-256, conv1-256, maxpool        (convolutional layers)
conv3-512, conv3-512, conv1-512, maxpool
conv3-512, conv3-512, conv1-512, maxpool
FC-4096, FC-4096, FC-1000                       (fully connected layers)
softmax

K. Simonyan, A. Zisserman, Very Deep Convolutional Networks for Large-Scale Image Recognition, ICLR 2015
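The weight savings from small filters can be checked with a quick back-of-the-envelope calculation. This is a sketch (biases ignored, `conv_weights` is an illustrative helper, not from the slides): three stacked 3x3 layers cover the same 7x7 receptive field as one 7x7 layer, with far fewer weights.

```python
# Weight count of one k x k conv layer with c input channels and
# c output channels, ignoring biases: k * k * c * c.
def conv_weights(k, c):
    return k * k * c * c

c = 512  # channel width of VGG's deepest stages

# Three stacked 3x3 layers see a 7x7 receptive field,
# but need roughly half the weights of a single 7x7 layer:
stacked = 3 * conv_weights(3, c)   # 3 * (3*3*512*512) = 7,077,888
single = conv_weights(7, c)        # 7*7*512*512       = 12,845,056
print(stacked, single)
```

The same comparison holds at any channel width, since both counts scale with c².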

Some results (1)

Some results (2)

Neuron visualization

Filter visualization for Conv 1


Other layers: mean image of the top 100 images with the largest response
Object blobs

B. Zhou, A. Lapedriza, J. Xiao, A. Torralba, A. Oliva


Learning Deep Features for Scene Recognition using Places Database, NIPS 2014.

Using CNNs as Feature Extractors: DeCAF

Donahue J., et al.


DeCAF: A Deep Convolutional Activation Feature for Generic Visual
Recognition. ICML 2014

Deep networks as feature extractors

Describe the image with features; don't just classify what is in it

Deep networks automatically learn good features
- a hierarchy of filters going from simple edges, to object parts, to objects

The last layer of a CNN is typically the softmax
- use the output of the layer before it as the feature

Donahue J., et al., DeCAF: A Deep Convolutional Activation Feature for Generic Visual Recognition. ICML 2014
DeCAF

Train the network end-to-end on image classification (e.g., ImageNet)

Use the pre-trained network
- for classification on other tasks (e.g., scene recognition): switch the last layer for the new task, re-run training for a few epochs
- as a feature extractor: remove the last layer, use the hidden unit values as the feature

DeCAF7 features (4096-dim)

Classifier: softmax layer
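The "remove the last layer" idea can be illustrated with a toy stand-in for a network. This is a minimal sketch: the lambda "layers" are hypothetical placeholders for conv/FC layers, not DeCAF's actual ones.

```python
# A "network" modeled as an ordered list of layer functions.
def forward(layers, x):
    for layer in layers:
        x = layer(x)
    return x

layers = [
    lambda x: [2.0 * v for v in x],   # stand-in hidden layer
    lambda x: [v + 1.0 for v in x],   # stand-in hidden layer ("DeCAF7"-style features)
    lambda x: [v * v for v in x],     # stand-in classifier / softmax head
]

prediction = forward(layers, [1.0, 2.0])      # full network: classify
features = forward(layers[:-1], [1.0, 2.0])   # drop the head: feature extractor
print(features)  # [3.0, 5.0] -- hidden unit values used as the feature
```

Swapping the head for a new task (fine-tuning) is the same idea: keep `layers[:-1]`, attach a fresh last layer.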

More data sets and problems

Fine-grained recognition: CUB-200 Birds data set

Image classification: MIT-67 Indoor Scenes

Object instance retrieval: 5 data sets!

It's all about the features!
- SIFT / HOG were similar breakthroughs

A. Razavian, et al., CNN Features off-the-shelf: An Astounding Baseline for Recognition, DeepVision Workshop @ CVPR 2014
DeepFace

Y. Taigman, M. Yang, M.-A. Ranzato, L. Wolf


DeepFace: Closing the Gap to Human-Level Performance in Face
Verification. CVPR 2014

Deep networks as face feature extractors

3D-aligned face image: 152x152 pixels
- Convolutional layer C1: 32 filters, 11x11x3 (3 RGB channels)
- Max-pooling layer M2: 3x3, stride 2
- Convolutional layer C3: 16 filters, 9x9x32
- Locally connected layers L4, L5, L6: 4096-dim representation
- Fully connected F7, F8: feature representation
- No further max pooling, since the images are already aligned and contain only faces
Y. Taigman et al., DeepFace: Closing the Gap to Human-Level Performance in Face Verification. CVPR 2014
Performance

Almost as good as humans!

LFW
- face image verification, restricted setting
- accuracy: 97.0% (single)
- non-deep: 96.3%

YTF
- video face verification
- accuracy: 91.4%
- non-deep: 79.7%

Face Verification
Image verification: Labeled Faces in the Wild (LFW)

Same pair Different pair

Video verification: YouTube Faces (YTF)

Same pair Different pair

Modern CNN revolution

ImageNet Large Scale Visual Recognition Challenge (ILSVRC) winners

Deeper Networks

Figure copyright: Kaiming He, 2016

VGGNet (heavy memory, more parameters)

GoogleNet
[Szegedy et al. 2014] (ImageNet Challenge 2014)
C. Szegedy et al., Going Deeper with Convolutions, CVPR 2015

Deeper networks, with computational efficiency
- 22 layers
- efficient "Inception" module
- no FC layers
- only 5 million parameters! (12x fewer than AlexNet)
- ILSVRC'14 classification winner (6.7% top-5 error)

GoogleNet
[Szegedy et al. 2014]

"Inception module": design a good local network topology (a network within a network), then stack these modules on top of each other

Modules inspired by multi-scale processing

Name inspired by an internet meme

GoogleNet
[Szegedy et al. 2014]

Apply parallel filter operations on the input from the previous layer:
- multiple receptive field sizes for convolution (1x1, 3x3, 5x5)
- a pooling operation (3x3)

Concatenate all filter outputs together depth-wise

Naive Inception module

GoogleNet
[Szegedy et al. 2014]

Example Q1: What is the output size of the 1x1 conv with 128 filters?

128 192 96

Input:
28x28x256

Naive Inception module

GoogleNet
[Szegedy et al. 2014]

Example Q1: What is the output size of the 1x1 conv with 128 filters?

28x28x128

128 192 96

Input:
28x28x256

Naive Inception module

GoogleNet
[Szegedy et al. 2014]

Example Q2: What are the output sizes of all filters?

28x28x128 ?x?x192 ?x?x96 ?x?x256

128 192 96

Input:
28x28x256

Naive Inception module

GoogleNet
[Szegedy et al. 2014]

Example Q2: What are the output sizes of all filters?

28x28x128 28x28x192 28x28x96 28x28x256

128 192 96

Input:
28x28x256

Naive Inception module

GoogleNet
[Szegedy et al. 2014]

Example Q3: What is the output size after filter concatenation?

28x28x128 28x28x192 28x28x96 28x28x256

128 192 96

Input:
28x28x256

Naive Inception module

GoogleNet
[Szegedy et al. 2014]

Example: 28x28x(128+192+96+256) = 28x28x672

28x28x128 28x28x192 28x28x96 28x28x256

128 192 96

Input:
28x28x256
Problem: computational complexity

Naive Inception module
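The concatenated output size follows directly from the branch depths; a quick check of the slide's numbers:

```python
# Depth-wise concatenation of the four parallel branch outputs:
# spatial dims are preserved, depths add up.
h, w = 28, 28
branch_depths = [128, 192, 96, 256]  # 1x1 conv, 3x3 conv, 5x5 conv, 3x3 pool
out_shape = (h, w, sum(branch_depths))
print(out_shape)  # (28, 28, 672)
```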

GoogleNet
[Szegedy et al. 2014]

Example: 28x28x(128+192+96+256) = 28x28x672

Conv ops:
[1x1 conv, 128]: 28x28x128x1x1x256
[3x3 conv, 192]: 28x28x192x3x3x256
[5x5 conv, 96]:  28x28x96x5x5x256
Total: 854M ops

Input: 28x28x256

Very expensive compute. The pooling layer also preserves feature depth, so the total depth after concatenation can only grow at every layer!

Naive Inception module
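The 854M figure can be reproduced by counting one multiplication per kernel weight per output position. A sketch (multiplications only, stride 1, 'same' padding):

```python
# Multiplications for a k x k convolution producing an h x w x c_out
# output from a c_in-deep input.
def conv_mults(h, w, c_out, k, c_in):
    return h * w * c_out * k * k * c_in

ops = (conv_mults(28, 28, 128, 1, 256)   # 1x1 conv, 128 filters
     + conv_mults(28, 28, 192, 3, 256)   # 3x3 conv, 192 filters
     + conv_mults(28, 28, 96, 5, 256))   # 5x5 conv, 96 filters
print(ops)  # 854,196,224 -> the ~854M ops on the slide
```

The 5x5 branch alone accounts for more than half of the total, which is why the bottleneck layers on the following slides target the inputs of the 3x3 and 5x5 convolutions.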


GoogleNet
[Szegedy et al. 2014]

Example: 28x28x(128+192+96+256) = 28x28x672

Input: 28x28x256

Solution: "bottleneck" layers that use 1x1 convolutions to reduce the feature depth

Naive Inception module

1 x 1 Conv layer

1x1 CONV with 32 filters

Each filter is 1x1x64 and performs a 64-dimensional dot product

1 x 1 Conv layer

1x1 CONV with 32 filters
- preserves spatial dimensions, reduces depth!
- projects the depth to a lower dimension (a combination of feature maps)
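At a single spatial position, a 1x1 convolution is exactly a dot product over the depth dimension. A minimal sketch in plain Python (`conv1x1_at` is an illustrative helper):

```python
# 1x1 convolution at one (x, y) location: each of the c_out filters
# takes a dot product with the c_in-dimensional depth column.
def conv1x1_at(depth_column, filters):
    return [sum(w * a for w, a in zip(f, depth_column))
            for f in filters]

column = [1.0] * 64                        # one depth column of a 64-deep input
filters = [[0.5] * 64 for _ in range(32)]  # 32 filters, each 1x1x64
out = conv1x1_at(column, filters)
print(len(out))  # 32 -- depth reduced from 64 to 32, spatial dims untouched
```

Applying this at every (x, y) position independently is what makes the layer cheap: no spatial neighborhood is touched.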

GoogleNet

1x1 conv "bottleneck" layers

Naive Inception module vs. Inception with dimension reduction

1x1 convolutions are included before the expensive 3x3 and 5x5 convolutions

GoogleNet
Using the same parallel layers as the naive example, and adding "1x1 conv, 64 filter" bottlenecks:

Conv ops:
[1x1 conv, 64]:  28x28x64x1x1x256
[1x1 conv, 64]:  28x28x64x1x1x256
[1x1 conv, 128]: 28x28x128x1x1x256
[3x3 conv, 192]: 28x28x192x3x3x64
[5x5 conv, 96]:  28x28x96x5x5x64
[1x1 conv, 64]:  28x28x64x1x1x256
Total: 358M ops (compared to 854M ops for the naive version)

A bottleneck can also reduce the depth after the pooling layer

GoogleNet

Stack Inception modules with dimension reduction on top of each other

GoogleNet
Full Architecture

Stem Network:
conv - Pool - 2x conv- Pool

GoogleNet
Full Architecture

Stacked Inception Modules

GoogleNet
Full Architecture

Classifier output:
- no FC layers
- instead: average pooling + 1 linear layer

Global Average Pooling

AlexNet / VGG use two FC layers towards the end of the network
- an FC layer on the 7x7x1024 feature volume alone needs ~51M params (7x7x1024x1024)

Now: global average pooling
- average each of the last feature maps down to 1x1 (0 params)
- then 1 linear layer and softmax for classification

Far fewer parameters

Better performance than FC layers (+0.6%)
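The parameter counts above are easy to verify, and global average pooling itself is just a mean per feature map. A sketch (`global_avg_pool` is an illustrative helper working on nested lists):

```python
# Global average pooling: collapse each h x w feature map to its mean.
# No weights are involved, so the layer contributes 0 parameters.
def global_avg_pool(feature_maps):
    return [sum(sum(row) for row in fmap) / (len(fmap) * len(fmap[0]))
            for fmap in feature_maps]

# The FC layer it replaces, mapping a flattened 7x7x1024 volume to 1024 units:
fc_params = 7 * 7 * 1024 * 1024        # 51,380,224 -> the ~51M on the slide
gap_params = 0

pooled = global_avg_pool([[[1.0] * 7 for _ in range(7)]] * 1024)
print(fc_params, len(pooled))  # 51380224 1024
```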

GoogleNet
Full Architecture

Two auxiliary loss layers inject additional gradient at lower layers
(AvgPool - 1x1 Conv - FC - FC - Softmax)

GoogleNet
Full Architecture

22 total layers with weights (including each parallel layer in an Inception module)

GoogleNet Summary

Deeper, with computational efficiency
- 22 layers
- efficient "Inception" module
- no FC layers
- 12x fewer params than AlexNet
- ILSVRC'14 classification winner (6.7% top-5 error)

Modern CNN revolution

ImageNet Large Scale Visual Recognition Challenge (ILSVRC) winners

Depth Revolution

Figure copyright: Kaiming He, 2016

ResNet
[He et al. 2015]; K. He et al., Deep Residual Learning for Image Recognition, CVPR 2016

Very deep networks using residual connections
- 152-layer model for ImageNet
- ILSVRC'15 classification winner (3.57% top-5 error)
- swept all classification and detection competitions in ILSVRC'15 and COCO'15

ResNet

Directly stacking more layers on a plain CNN

What's strange? The deeper model performs worse, but it's not due to overfitting
ResNet

Directly stacking more layers on a plain CNN

The deeper model performs worse, but it's not due to overfitting: it is worse on the train error as well as the test error (overfitting would show low train error with high test error)

Deeper models are very hard to optimize (an optimization problem)

ResNet

Solution: use network layers to fit a residual mapping instead of the direct underlying mapping

F(x) + x: element-wise addition

"... learning residual functions with reference to the layer inputs, instead of learning unreferenced functions."
ResNet

Solution: use network layers to fit a residual mapping instead of the direct underlying mapping

H(x) = F(x) + x

Use the layers to fit the residual F(x) = H(x) - x instead of fitting H(x) directly
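The skip connection itself can be written in two lines. A toy sketch, where `F` stands in for the learned stack of layers:

```python
# H(x) = F(x) + x: element-wise addition of the residual and the input.
def residual_block(x, F):
    return [f + xi for f, xi in zip(F(x), x)]

# If the optimal mapping is the identity, F only needs to learn zeros:
out = residual_block([1.0, 2.0, 3.0], lambda x: [0.0] * len(x))
print(out)  # [1.0, 2.0, 3.0] -- the input passes through unchanged
```

This is the point of the formulation: pushing F toward zero is easy for a stack of layers, whereas learning an exact identity mapping from scratch is hard.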

Motivation / Related Work

Hypothesis: if identity mappings were optimal (at some late stage in a deep network), "it would be easier to push the residual to zero, than to fit an identity mapping by a stack of non-linear layers"
- residual blocks should also help if the optimal function is close to an identity mapping

Modeling residuals, e.g. with respect to some codebook, has been quite successful in computer vision
- e.g. Fisher vectors, VLAD

Shortcut connections have been studied for a long time
- they reduce the vanishing/exploding gradient problem when using many layers
- see also LSTMs, later in this lecture ...
- they are also known in biological systems (i.e. in brains)
- but most important: it works ☺
ResNet

Full ResNet architecture
- stack residual blocks
- every residual block has two 3x3 conv layers

ResNet

Full ResNet architecture
- stack residual blocks
- every residual block has two 3x3 conv layers
- periodically, double the number of filters and downsample spatially by 2 (e.g. from "3x3 conv, 64 filters" to "3x3 conv, 128 filters, /2" with stride 2)

ResNet

Full ResNet architecture
- stack residual blocks
- every residual block has two 3x3 conv layers
- periodically, double the number of filters and downsample spatially by 2 (e.g. from "3x3 conv, 64 filters" to "3x3 conv, 128 filters, /2" with stride 2)
- additional conv layer at the beginning

ResNet

Full ResNet architecture
- stack residual blocks
- every residual block has two 3x3 conv layers
- periodically, double the number of filters and downsample spatially by 2 (e.g. from "3x3 conv, 64 filters" to "3x3 conv, 128 filters, /2" with stride 2)
- additional conv layer at the beginning
- no extra FC layers at the end
- global average pooling after the last conv layer
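The "double the filters, halve the resolution" rule keeps the compute per stage roughly balanced. As a shape sketch (the stage sizes are illustrative, chosen to match a 56x56x64 tensor after the stem):

```python
# One stride-2 stage transition: halve the spatial dims, double the channels.
def downsample_stage(h, w, c):
    return (h // 2, w // 2, 2 * c)

shape = (56, 56, 64)      # example tensor after the initial conv + pooling
for _ in range(3):        # three downsampling transitions
    shape = downsample_stage(*shape)
print(shape)  # (7, 7, 512) -- global average pooling then reduces this to 512
```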

ResNet

Total depths tested: 34, 50, 101 and 152 layers for ImageNet

ResNet
For deeper nets (50+ layers), use a bottleneck layer to improve efficiency (similar to GoogleNet)

ResNet
ResNet training
- batch normalization after every conv layer
- Xavier initialization
- SGD + momentum (0.9)
- learning rate: 0.1, divided by 10 when the validation error plateaus
- no dropout layer

ILSVRC 2015 winner
- top-5 error 3.6%
- better than human performance

Complexity Comparison

Alfredo Canziani, Adam Paszke, Eugenio Culurciello, An Analysis of Deep Neural Network Models for Practical Applications, arXiv 2017

Complexity Comparison
Inception-v4: ResNet + Inception


Complexity Comparison
VGG: highest memory and most ops


Complexity Comparison
GoogleNet: most efficient


Complexity Comparison
AlexNet: smaller compute, but memory-heavy and low accuracy


Complexity Comparison
ResNet: moderate compute & memory, highest accuracy


More Architectures: Current Improvements

Improving ResNet

Identity Mappings in Deep Residual Networks [He et al. 2016]

➢ Improved ResNet block design from the creators of ResNet

➢ Creates a more direct path for propagating information through the network (moves the activation onto the residual mapping pathway)

➢ Gives better performance

Improving ResNet

Aggregated Residual Transformations for Deep Neural Networks (ResNeXt) [Xie et al. 2016]

➢ Also from the creators of ResNet

➢ Increases the width of the residual block through multiple parallel pathways

➢ Parallel pathways similar in spirit to the Inception module
Beyond ResNets

Densely Connected CNN: DenseNet [Huang et al. 2017]

➢ Dense blocks where each layer is connected to every other layer in a feedforward fashion

➢ Alleviates vanishing gradients, strengthens feature propagation, encourages feature reuse

MobileNetv1

From Google
- useful for mobile and embedded vision applications
- smaller model size (fewer params)
- smaller complexity (fewer multiply-additions)

Main idea: depthwise separable convolution

Howard et al., MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications. arXiv 2017

MobileNet - Depthwise Separable Convolution

Separable convolution: factor the conv kernel into two operations
- a depthwise conv and a pointwise conv

MobileNet - Depthwise Separable Convolution

Compute for a normal convolution producing an 8x8x256 output:
- we need 256 kernels of size 5x5x3
- total compute: 256x5x5x3x8x8 = 1,228,800 multiplications

Compute for a depthwise separable convolution producing an 8x8x256 output:
- depthwise conv with 3 kernels of size 5x5x1; compute: 3x5x5x8x8 = 4,800
- pointwise conv with 256 kernels of size 1x1x3; compute: 256x1x1x3x8x8 = 49,152
- total: 53,952 multiplications
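These multiplication counts can be reproduced with the same counting rule used for a standard convolution. A sketch (multiplications only):

```python
# Multiplications for a convolution producing an h x w x c_out output
# with k x k x c_in kernels.
def conv_mults(h, w, c_out, k, c_in):
    return h * w * c_out * k * k * c_in

standard = conv_mults(8, 8, 256, 5, 3)    # 256 kernels of 5x5x3 -> 1,228,800
depthwise = conv_mults(8, 8, 3, 5, 1)     # one 5x5x1 kernel per channel -> 4,800
pointwise = conv_mults(8, 8, 256, 1, 3)   # 256 kernels of 1x1x3 -> 49,152
separable = depthwise + pointwise         # 53,952
print(standard, separable)  # roughly a 23x reduction in multiplications
```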

MobileNet Performance

Take Home Messages : CNN Architectures

VGG, GoogLeNet, ResNet are all in wide use and available in the major deep learning platforms

ResNet and its variants are the current best default (as of ~2017)

Significant research centers around the design of layer / skip connections and improving gradient flow

A more recent trend examines the necessity of depth vs. width and residual connections

Neural Architecture Search (NAS-Net): search for the best building blocks for a particular application/dataset [see some reading resources here: https://github.com/anonymone/Neural-Architecture-Search ]

References

Key papers:
- GoogleNet: C. Szegedy et al., Going Deeper with Convolutions, CVPR 2015 (arXiv 2014, ImageNet Challenge 2014)
- ResNet: K. He et al., Deep Residual Learning for Image Recognition, CVPR 2016 (arXiv 2015)

Additional:
- see the previous slides ...

