Presented by ChanHyuk Lee
2021/06/13
Computer Graphics @ Korea University
EfficientDet
Mingxing Tan et al.
CVPR 2020
517 citations
CONTENTS
Introduction
01
Related work
02
Proposed method
03
Experiments
04
Ablation study
05
Conclusion
06
Background
Detection architecture
00
[Figure: detection pipeline. Backbone network → Feature Pyramid Network (FPN) → prediction network, which outputs box predictions (regression) and class predictions (classification).]
Introduction
Motivation
01
• Recent detectors trade off accuracy against efficiency.
• Most previous work targets only a specific or narrow range of resource requirements.
• These limitations make it hard to apply recent detection models in industry.
• "Is it possible to build a scalable detection architecture with both higher accuracy and better efficiency across a wide spectrum of resource constraints?"
Introduction
Challenge 1 : Efficient multi-scale feature fusion
01
• Feature fusion : the method for combining feature maps
→ Conventional feature fusion methods do not account for differences in feature resolution.
Challenge 2 : Model scaling
02
• Model scaling : the method for scaling up the model architecture
→ Scaling up along a single factor (input-image resolution or network size) is limited.
Related work
Multi-scale feature representation
01
• Used to detect objects at multiple scales
[Figure: feature pyramid. The backbone's Conv stages produce features p1 to p4; a top-down path applies a 1x1 Conv at each level and up-scales the coarser level to form p1_out to p4_out; each output feeds a prediction layer.]
Related work
Model scaling
02
• EfficientNet (EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks, Mingxing Tan et al., ICML 2019)
• Jointly scales up depth, width, and resolution (compound scaling)
[Figure: scaling a network f along depth and input resolution]
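The compound scaling idea can be sketched in a few lines of Python. The constants below (1.2, 1.1, 1.15) are the values reported in the EfficientNet paper, not on this slide, and the function name and the 224-pixel base resolution are illustrative assumptions. The outputs are raw formula values; released EfficientNet variants use hand-adjusted settings.

```python
# Compound scaling sketch: depth, width and resolution grow together
# under one coefficient phi instead of scaling a single factor.
# ALPHA, BETA, GAMMA are taken from the EfficientNet paper (assumption:
# they do not appear on the slide).
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15


def compound_scale(phi, base_resolution=224):
    """Return (depth multiplier, width multiplier, input resolution)."""
    depth_mult = ALPHA ** phi
    width_mult = BETA ** phi
    resolution = round(base_resolution * GAMMA ** phi)
    return depth_mult, width_mult, resolution


def flops_factor():
    """Approximate FLOPs growth per unit of phi: alpha * beta^2 * gamma^2."""
    return ALPHA * BETA ** 2 * GAMMA ** 2


if __name__ == "__main__":
    for phi in range(4):
        d, w, r = compound_scale(phi)
        print(f"phi={phi}: depth x{d:.2f}, width x{w:.2f}, resolution {r}")
    print(f"FLOPs grow by about {flops_factor():.2f}x per step of phi")
```

The FLOPs factor stays close to 2, which is why each increment of phi roughly doubles the compute budget.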
Q&A
Proposed method
01 RetinaNet architecture
02 EfficientDet architecture
BiFPN : Efficient bidirectional cross-scale connections and weighted feature fusion
Problem formulation
01
• Remove the two nodes that have only one input edge (compared to PANet)
• Add a skip connection from the original input to the output node at the same level
• Weighted feature fusion : each fusion input gets a learnable weight w
• Repeat the BiFPN layer multiple times
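The wiring described above can be sketched as follows. This is a simplified illustration, not the authors' implementation: each feature map is stood in for by a single float, resizing and convolution are omitted, and fusion is an unweighted mean so that only the connection pattern is visible. The names `fuse`, `bifpn_layer`, and `bifpn` are hypothetical.

```python
def fuse(*features):
    """Stand-in for feature fusion: an unweighted mean of the inputs."""
    return sum(features) / len(features)


def bifpn_layer(inputs):
    """One simplified BiFPN layer; inputs are ordered fine to coarse."""
    n = len(inputs)
    if n < 2:
        raise ValueError("need at least two feature levels")
    # Top-down path. The finest and coarsest levels have no intermediate
    # node (these are the nodes removed relative to PANet).
    td = list(inputs)
    for i in range(n - 2, 0, -1):
        td[i] = fuse(inputs[i], td[i + 1])
    # Bottom-up path. Middle levels also receive their original input,
    # which is the added skip connection.
    out = [fuse(inputs[0], td[1])]
    for i in range(1, n - 1):
        out.append(fuse(inputs[i], td[i], out[i - 1]))
    out.append(fuse(inputs[n - 1], out[n - 2]))
    return out


def bifpn(inputs, num_layers):
    """Repeat the layer, feeding each layer's outputs to the next."""
    features = list(inputs)
    for _ in range(num_layers):
        features = bifpn_layer(features)
    return features


if __name__ == "__main__":
    print(bifpn([1.0, 2.0, 3.0, 4.0, 5.0], num_layers=3))
```

In a real network, `fuse` would resize its inputs to a common resolution, apply learned weights, and pass the result through a convolution.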
BiFPN
Weighted Feature Fusion
02
• Inputs differ in resolution → they contribute to the output to different degrees
• Each input feature is given a learnable weight that captures its contribution
$O = \sum_i w_i \cdot I_i$ ($O$ : output feature, $w_i$ : weight, $I_i$ : input feature)
Softmax-based fusion : $O = \sum_i \frac{e^{w_i}}{\sum_j e^{w_j}} \cdot I_i$
Fast normalized fusion : $O = \sum_i \frac{w_i}{\epsilon + \sum_j w_j} \cdot I_i$
(about 30% speed gain on GPU)
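A minimal sketch of the two fusion rules, using plain floats in place of feature maps. The function names and the epsilon value of 1e-4 are assumptions made for illustration; in a trained model the weights would be learned parameters.

```python
import math

EPS = 1e-4  # assumed small constant that avoids division by zero


def softmax_fusion(inputs, weights):
    """Weight each input by softmax(weights); needs one exp per weight."""
    exps = [math.exp(w) for w in weights]
    total = sum(exps)
    return sum(e / total * x for e, x in zip(exps, inputs))


def fast_normalized_fusion(inputs, weights, eps=EPS):
    """Clip weights at zero (ReLU), then normalize by their sum."""
    clipped = [max(w, 0.0) for w in weights]
    total = eps + sum(clipped)
    return sum(w / total * x for w, x in zip(clipped, inputs))


if __name__ == "__main__":
    features = [1.0, 3.0]
    print(softmax_fusion(features, [0.0, 0.0]))
    print(fast_normalized_fusion(features, [1.0, 1.0]))
```

Both rules keep the normalized weights between 0 and 1; the fast variant avoids the exponentials, which is the likely source of the speed gain noted above.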
EfficientDet
EfficientDet Architecture
01
• Uses EfficientNet pretrained on ImageNet as the backbone
• The prediction network's weights are shared across all feature levels
EfficientDet
Compound scaling
02
• Previous work mostly scales up the baseline network, uses larger input images, or stacks more FPN layers
• The new compound scaling method jointly scales up all dimensions of the backbone network, BiFPN network, prediction network, and input resolution.
Backbone network
02-1
• Reuses the same width/depth scaling coefficients as EfficientNet-B0 to B6
BiFPN network
02-2
• A grid search over the values {1.2, 1.25, 1.3, 1.35, 1.4, 1.45} selects the best width scaling factor
Number of channels : $W_{bifpn} = 64 \cdot (1.35^{\phi})$
Number of layers : $D_{bifpn} = 3 + \phi$
EfficientDet
Prediction network
02-3
• The width of the prediction network is the same as the BiFPN network's width
Number of layers : $D_{box} = D_{class} = 3 + \lfloor \phi / 3 \rfloor$
Input image resolution
02-4
Overall scaling output
02-5
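The scaling rules can be collected into one helper to show how a single coefficient phi sets every dimension. The formulas are the ones given in the EfficientDet paper (the slide showed them as images); the function name and dictionary keys are illustrative assumptions. The printed widths are raw formula values, and the paper's configuration table rounds or adjusts some entries.

```python
def efficientdet_config(phi):
    """Raw compound-scaling values for a given coefficient phi."""
    return {
        # BiFPN width grows geometrically with the grid-searched base 1.35.
        "bifpn_width_raw": 64 * 1.35 ** phi,
        # BiFPN depth grows linearly.
        "bifpn_depth": 3 + phi,
        # Box and class heads share the same depth.
        "head_depth": 3 + phi // 3,
        # Input resolution grows in steps of 128 pixels.
        "input_resolution": 512 + 128 * phi,
    }


if __name__ == "__main__":
    for phi in range(5):
        cfg = efficientdet_config(phi)
        print(
            f"phi={phi}: width~{cfg['bifpn_width_raw']:.1f}, "
            f"BiFPN layers={cfg['bifpn_depth']}, "
            f"head layers={cfg['head_depth']}, "
            f"resolution={cfg['input_resolution']}"
        )
```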
Q&A
Experiments
Experiment configuration
01
• Dataset : COCO 2017 detection dataset with 118K training images
• Optimizer : SGD with momentum 0.9 and weight decay 4e-5
• Learning rate : increased linearly from 0 to 0.16 during the first epoch, then annealed down with a cosine decay rule (range : 0 to 0.16)
• Batch normalization is used after every convolution layer
• Every convolution layer is a depth-wise convolution layer
• Activation function : Swish, $x \cdot \mathrm{sigmoid}(\beta x)$
• Augmentation : multi-resolution cropping / scaling / flipping
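The learning-rate schedule listed above can be sketched as a function of the training step. This is a minimal sketch under stated assumptions: the warm-up is linear, the cosine decay ends at zero on the final step, and the function and parameter names are hypothetical.

```python
import math


def learning_rate(step, steps_per_epoch, total_epochs, peak_lr=0.16):
    """Linear warm-up over the first epoch, then cosine decay to zero."""
    total_steps = steps_per_epoch * total_epochs
    if step < steps_per_epoch:
        # Warm-up: ramp from 0 to peak_lr across the first epoch.
        return peak_lr * step / steps_per_epoch
    # Cosine decay: progress runs from 0 (end of warm-up) to 1 (last step).
    decay_steps = max(1, total_steps - steps_per_epoch)
    progress = (step - steps_per_epoch) / decay_steps
    return 0.5 * peak_lr * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    for step in (0, 50, 100, 550, 1000):
        print(step, round(learning_rate(step, 100, 10), 4))
```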
Experiments
Loss function
02
• Uses focal loss for detection : $FL(p_t) = -(1 - p_t)^{\gamma} \log(p_t)$, where $p_t$ is the predicted probability of the true class
• The class imbalance problem is driven mostly by easy negative samples
• Training focuses on hard samples
• If $p_t$ is almost 1 (easy sample) → $-(1 - 0.999)^{\gamma} \log(p_t) \approx 0$
• Else (hard sample, $p_t$ near 0) → $-(1 - 0.001)^{\gamma} \log(p_t)$ stays large (it grows without bound as $p_t \to 0$)
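A short numeric check makes the down-weighting of easy samples concrete. This is a sketch only: the gamma value of 1.5 is an illustrative default chosen here, the alpha class-balancing term is omitted, and the function names are hypothetical.

```python
import math


def cross_entropy(p_t):
    """Standard cross-entropy for the true-class probability p_t."""
    return -math.log(p_t)


def focal_loss(p_t, gamma=1.5):
    """Cross-entropy scaled by (1 - p_t)^gamma; gamma=1.5 is an assumption."""
    return ((1.0 - p_t) ** gamma) * cross_entropy(p_t)


if __name__ == "__main__":
    for p_t in (0.999, 0.5, 0.001):
        print(
            f"p_t={p_t}: cross-entropy={cross_entropy(p_t):.6f}, "
            f"focal={focal_loss(p_t):.6f}"
        )
```

For an easy sample the modulating factor shrinks the loss by several orders of magnitude, while a hard sample keeps almost all of its cross-entropy loss.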
Experiments
Performance on COCO
03
• Latency is inference latency with batch size 1
• AA denotes auto-augmentation
Experiments
Model size and inference latency comparison
04
• Latency is compared on a GPU (Titan V) and a CPU (Xeon)
Experiments
EfficientDet for Semantic Segmentation
05
• Uses the P2 level of the BiFPN for semantic segmentation with an EfficientDet-D4 based model
• Compared with DeepLabv3
Ablation study
Disentangling Backbone and BiFPN
01
• Both the backbone network and the multi-scale feature network (BiFPN) of EfficientDet achieve higher AP and better efficiency than prior networks
Ablation study
BiFPN Cross Scale Connection
02
• For a fair comparison, FPN and PANet are repeated multiple times and their convolutions are replaced with depthwise separable convolutions
• BiFPN achieves the best accuracy with fewer parameters and FLOPs
Ablation study
Softmax vs Fast Normalized fusion
03
• Fast normalized fusion achieves accuracy similar to the softmax-based method
• Figure 5 illustrates the learned weights for three feature fusion nodes
Ablation study
Compound Scaling
04
• EfficientDet jointly scales up the network's backbone, BiFPN, prediction network, and input resolution
• The proposed method achieves better accuracy than other scaling methods
Conclusion
Proposes a weighted bidirectional feature network (BiFPN) and a customized compound scaling method to improve accuracy and efficiency
01
EfficientDet achieves better accuracy and efficiency than the prior art across a wide spectrum of resource constraints
02
EfficientDet achieves SOTA accuracy with far fewer parameters and FLOPs in object detection and semantic segmentation
03
THANK
YOU