GoogLeNet and ResNet v4 with NiN and Bias
Zhang, A., Lipton, Z. C., Li, M., & Smola, A. J. (2021). Dive into deep learning. arXiv preprint arXiv:2106.11342.
ImageNet Large Scale Visual Recognition Challenge
(ILSVRC) winners
Slide taken from Fei-Fei Li & Justin Johnson & Serena Yeung, Lecture 9.
• Global Average Pooling is a pooling operation designed to replace fully connected layers in classical CNNs. The idea is to generate one feature map for each category of the classification task in the last mlpconv layer. Instead of adding fully connected layers on top of the feature maps, we take the average of each feature map, and the resulting vector is fed directly into the softmax layer (see the sketch below).
• One advantage of global average pooling over fully connected layers is that it is more native to the convolution structure, enforcing correspondences between feature maps and categories. The feature maps can thus be easily interpreted as category confidence maps. Another advantage is that there are no parameters to optimize in global average pooling, so overfitting is avoided at this layer. Furthermore, global average pooling sums out the spatial information, so it is more robust to spatial translations of the input.
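A minimal sketch of this idea, assuming PyTorch (not prescribed by the slides) and a hypothetical 128-channel feature map with 10 categories: the last conv layer emits one feature map per class, global average pooling collapses each map to a single value, and the resulting vector feeds the softmax directly.

```python
import torch
import torch.nn as nn

num_classes = 10   # hypothetical number of categories

head = nn.Sequential(
    nn.Conv2d(128, num_classes, kernel_size=1),  # last conv: one feature map per category
    nn.AdaptiveAvgPool2d(1),                     # global average pooling: each map -> one value
    nn.Flatten(),                                # (N, num_classes, 1, 1) -> (N, num_classes)
)

x = torch.randn(4, 128, 7, 7)                    # dummy batch of 128-channel feature maps
probs = torch.softmax(head(x), dim=1)            # fed directly into the softmax
print(probs.shape)                               # torch.Size([4, 10])
```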
An example: a decision tree that is likely to overfit the data

Underfitting and Overfitting
Figure: model fit vs. model complexity (complexity of a decision tree := number of nodes), ranging from underfitting to overfitting.

How Overfitting affects Prediction
Figure: predictive error vs. model complexity; underfitting at low complexity, overfitting at high complexity, with an ideal range for model complexity in between.
Bias and Variance
• In statistics and machine learning, the bias–variance tradeoff (or dilemma) is the problem of simultaneously minimizing two sources of error that prevent supervised learning algorithms from generalizing beyond their training set:
• The bias is error from erroneous assumptions in the learning algorithm. High bias can cause an algorithm to miss the relevant relations between features and target outputs (underfitting), i.e., the model class does not contain the solution.
• The variance is error from sensitivity to small fluctuations in the training set. High variance can cause overfitting: modeling the random noise in the training data rather than the intended outputs, i.e., the model class is too general and the model also learns the noise (see the sketch below).
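As a rough illustration (a NumPy-only sketch; the data and polynomial degrees are made up for this example), fitting polynomials of increasing degree to noisy samples of a cubic shows the tradeoff: a degree-1 fit underfits (high bias), while a high-degree fit typically drives training error down but lets the held-out error grow (high variance).

```python
import numpy as np

rng = np.random.default_rng(0)
true_fn = lambda x: x**3 - x                      # underlying signal
x_train = np.linspace(-1, 1, 20)
y_train = true_fn(x_train) + rng.normal(scale=0.1, size=x_train.shape)
x_test = np.linspace(-1, 1, 200)                  # held-out points (noise-free target)
y_test = true_fn(x_test)

for degree in (1, 3, 15):
    coeffs = np.polyfit(x_train, y_train, degree)            # fit a polynomial of this degree
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(f"degree {degree:2d}: train MSE {train_mse:.4f}, test MSE {test_mse:.4f}")
```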
Bias and Variance
• Ensemble methods
• Combine learners to reduce variance
From Elder, John. From Trees to Forests and Rule Sets - A Unified Overview of Ensemble Methods. 2007.
He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep
residual learning for image recognition. In Proceedings of
the IEEE conference on computer vision and pattern
recognition (pp. 770-778).
If the identity mapping f(x)=x is the desired underlying mapping, the residual mapping amounts to g(x)=0 and is thus easier to learn: we only need to push the weights and biases of the upper weight layers in the residual branch (e.g., a fully connected or convolutional layer) to zero (see the code sketch below).
Residual Blocks
Figure: a residual block in which activation a^[l] passes through two weight layers to a^[l+1] and a^[l+2], with a skip connection carrying a^[l] around the two layers and adding it back before the final activation.
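A minimal sketch, assuming PyTorch, of a residual block in this spirit: two 3x3 convolutions learn the residual g(x), and the skip connection adds the input back, so learning the identity only requires pushing the branch's weights towards zero. The batch norm placement follows common practice and is an assumption, not something fixed by the slides.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ResidualBlock(nn.Module):
    """Two 3x3 convs learn the residual g(x); the skip connection adds x back."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        g = F.relu(self.bn1(self.conv1(x)))   # residual branch
        g = self.bn2(self.conv2(g))
        return F.relu(g + x)                  # output = g(x) + x

x = torch.randn(1, 64, 56, 56)
print(ResidualBlock(64)(x).shape)             # torch.Size([1, 64, 56, 56])
```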
Figure: the full ResNet architecture, reading from the input upward: a beginning conv layer (7x7 conv, 64, /2) followed by pooling, then a deep stack of residual blocks, each a pair of 3x3 conv layers (64 channels in the portion shown) with a skip connection, and finally a pool, FC 1000, and softmax.
- Beginning conv layer: 7x7 conv, 64, stride 2, applied to the input
- Residual blocks: pairs of 3x3 conv layers repeated throughout the network
- No FC layers at the end (only FC 1000 to output classes); a code sketch of this layout follows
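A minimal sketch of this overall layout, assuming PyTorch and reusing the hypothetical ResidualBlock class from the sketch above; the channel and block counts are illustrative, not the full ResNet-34/50 schedule.

```python
import torch
import torch.nn as nn

# assumes the ResidualBlock class sketched above is in scope
tiny_resnet = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3),   # beginning conv layer: 7x7, 64, /2
    nn.MaxPool2d(kernel_size=3, stride=2, padding=1),        # pool
    ResidualBlock(64),                                       # stack of residual blocks
    ResidualBlock(64),
    ResidualBlock(64),
    nn.AdaptiveAvgPool2d(1),                                 # pool (global average pooling)
    nn.Flatten(),
    nn.Linear(64, 1000),                                     # only FC 1000 to output classes
)

x = torch.randn(2, 3, 224, 224)
print(tiny_resnet(x).shape)                                  # torch.Size([2, 1000])
```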
ResNet Architecture
Figure: a basic residual block whose input and output have the same shape: 28x28x256 in, 28x28x256 out.
Zhang, A., Lipton, Z. C., Li, M., & Smola, A. J. (2021). Dive
into deep learning. arXiv preprint arXiv:2106.11342.
For deeper networks (ResNet-50+), use a “bottleneck” layer to improve efficiency (similar to GoogLeNet): for the 28x28x256 input, a 1x1 conv first projects down to 64 feature maps, the 3x3 conv then operates over only 64 feature maps, and a 1x1 conv with 256 filters projects back to 256 feature maps, giving the 28x28x256 output (sketched in code below).
Fei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 7 - 27 Jan 2016
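A minimal sketch, assuming PyTorch, of such a bottleneck block with the illustrative sizes from the figure (256 channels in and out, 64 in the middle).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Bottleneck(nn.Module):
    """1x1 reduce -> 3x3 over few maps -> 1x1 expand, plus the skip connection."""
    def __init__(self, channels=256, mid=64):
        super().__init__()
        self.reduce = nn.Conv2d(channels, mid, kernel_size=1)          # project 256 -> 64 maps
        self.conv3x3 = nn.Conv2d(mid, mid, kernel_size=3, padding=1)   # 3x3 over only 64 maps
        self.expand = nn.Conv2d(mid, channels, kernel_size=1)          # 1x1, 256 filters: back to 256

    def forward(self, x):
        g = F.relu(self.reduce(x))
        g = F.relu(self.conv3x3(g))
        g = self.expand(g)
        return F.relu(g + x)        # 28x28x256 in, 28x28x256 out

x = torch.randn(1, 256, 28, 28)
print(Bottleneck()(x).shape)        # torch.Size([1, 256, 28, 28])
```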
Case Study: ResNet [He et al., 2015]
ILSVRC 2015 winner (3.6% top-5 error). At runtime it is faster than a VGGNet, even though it has 8x more layers.
Fei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 7 - 27 Jan 2016
Case Study: ResNet [He et al., 2015]
For a 224x224x3 input, the spatial dimension is only 56x56 after the initial 7x7 conv (stride 2) and pooling.
Fei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 7 - 27 Jan 2016
Comparing complexity...
• Inception-v4: ResNet + Inception!
• VGG: highest memory, most operations
• GoogLeNet: most efficient
• AlexNet: smaller compute, still memory heavy, lower accuracy
• ResNet: moderate efficiency depending on model, highest accuracy
We can take some inspiration from the Inception block of Fig. 8.4.1 which has information flowing through the
block in separate groups. Applying the idea of multiple independent groups to the ResNet block of Fig.
8.6.3 led to the design of ResNeXt (Xie et al., 2017). Different from the smorgasbord of transformations in
Inception, ResNeXt adopts the same transformation in all branches, thus minimizing the need for manual
tuning of each branch.
Zhang, A., Lipton, Z. C., Li, M., & Smola, A. J. (2021). Dive
into deep learning. arXiv preprint arXiv:2106.11342.
• Breaking up a convolution from $c_i$ to $c_o$ channels into $g$ groups of size $c_i/g$, generating $g$ outputs of size $c_o/g$, is called, quite fittingly, a grouped convolution. The computational cost (proportionally) is reduced from $O(c_i \cdot c_o)$ to $O(g \cdot (c_i/g) \cdot (c_o/g)) = O(c_i \cdot c_o / g)$, i.e., it is $g$ times faster. Even better, the number of parameters needed to generate the output is also reduced from a $c_i \times c_o$ matrix to $g$ smaller matrices of size $(c_i/g) \times (c_o/g)$, again a $g$-fold reduction. In what follows we assume that both $c_i$ and $c_o$ are divisible by $g$ (see the first sketch after this list).
• The only challenge in this design is that no information is exchanged between the $g$ groups. The ResNeXt block amends this in two ways: the grouped convolution with a $3 \times 3$ kernel is sandwiched in between two $1 \times 1$ convolutions, and the second of these serves double duty in changing the number of channels back. The benefit is that we only pay the $O(c \cdot b)$ cost for the $1 \times 1$ kernels and can make do with an $O(b^2/g)$ cost for the $3 \times 3$ kernels, where $b$ is the bottleneck width of the block. Similar to the residual block implementation, the residual connection is replaced (and thus generalized) by a $1 \times 1$ convolution (see the second sketch after this list).
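A minimal sketch, assuming PyTorch and illustrative sizes ($c_i = c_o = 256$, $g = 32$), of the parameter reduction from grouped convolution: the dense 3x3 conv has exactly $g$ times as many weights as the grouped one.

```python
import torch.nn as nn

c_i, c_o, g = 256, 256, 32

dense = nn.Conv2d(c_i, c_o, kernel_size=3, padding=1, bias=False)              # ordinary conv
grouped = nn.Conv2d(c_i, c_o, kernel_size=3, padding=1, groups=g, bias=False)  # grouped conv

n_dense = sum(p.numel() for p in dense.parameters())     # c_i * c_o * 3 * 3
n_grouped = sum(p.numel() for p in grouped.parameters()) # g * (c_i/g) * (c_o/g) * 3 * 3

print(n_dense, n_grouped, n_dense // n_grouped)          # 589824 18432 32  -> a g-fold reduction
```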
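And a minimal sketch, again assuming PyTorch with illustrative channel counts (256 in and out, bottleneck width b = 128, g = 32 groups), of a ResNeXt-style block: the grouped 3x3 convolution sits between two 1x1 convolutions, which are what mix information across the groups; the identity skip here stands in for the 1x1 convolution used when shapes change.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ResNeXtBlock(nn.Module):
    """1x1 conv -> grouped 3x3 conv -> 1x1 conv, with a residual connection."""
    def __init__(self, channels=256, b=128, g=32):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, b, kernel_size=1)                # project to bottleneck width b
        self.conv2 = nn.Conv2d(b, b, kernel_size=3, padding=1, groups=g)  # grouped 3x3 convolution
        self.conv3 = nn.Conv2d(b, channels, kernel_size=1)                # change the channel count back

    def forward(self, x):
        y = F.relu(self.conv1(x))
        y = F.relu(self.conv2(y))    # no information flows between the g groups here
        y = self.conv3(y)            # the 1x1 convs mix information across all channels/groups
        return F.relu(y + x)         # residual connection (identity; a 1x1 conv if shapes differ)

x = torch.randn(1, 256, 14, 14)
print(ResNeXtBlock()(x).shape)       # torch.Size([1, 256, 14, 14])
```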
“You need a lot of data if you want to train/use CNNs”
Fei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 7 - 27 Jan 2016