Amaia Salvador
amaia.salvador@upc.edu
PhD Candidate
Universitat Politècnica de Catalunya
Object Detection
Day 2 Lecture 5
Object Detection
CAT, DOG, DUCK
The task of assigning a label and a bounding box to all objects in the image
Object Detection: Datasets
Dataset                       Categories   Training images   Validation / test images
PASCAL VOC                    20           6k                6k validation, 10k test
ImageNet ILSVRC (detection)   200          456k              60k validation + test
Microsoft COCO                80           200k              60k validation + test
Object Detection as Classification
Classes = [cat, dog, duck]
Cat ? NO
Dog ? NO
Duck? NO
Classes = [cat, dog, duck]
Cat ? NO
Dog ? NO
Duck? NO
Object Detection as Classification
Classes = [cat, dog, duck]
Cat ? YES
Dog ? NO
Duck? NO
Object Detection as Classification
Classes = [cat, dog, duck]
Cat ? NO
Dog ? NO
Duck? NO
Object Detection as Classification
Problem:
Too many positions & scales to test
Solution: If your classifier is fast enough, go for it
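To get a feel for how many positions and scales an exhaustive search has to test, the sketch below counts the windows of a sliding-window detector. The image size, window sizes and stride are illustrative assumptions, not values from the lecture, and `count_windows` is a hypothetical helper.

```python
def count_windows(img_w, img_h, window_sizes, stride):
    """Count the square windows tested by an exhaustive sliding-window search."""
    total = 0
    for size in window_sizes:
        if size > img_w or size > img_h:
            continue  # this window size does not fit in the image
        n_x = (img_w - size) // stride + 1  # horizontal positions
        n_y = (img_h - size) // stride + 1  # vertical positions
        total += n_x * n_y
    return total


# Assumed setup: 800 x 600 image, three window sizes, stride of 8 pixels.
n_windows = count_windows(800, 600, window_sizes=[64, 128, 256], stride=8)
print(n_windows)
```

Each window costs one classifier evaluation, so the count grows quickly as more scales, aspect ratios or a finer stride are added.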
Object Detection as Classification
HOG: Histogram of Oriented Gradients
Dalal and Triggs. Histograms of Oriented Gradients for Human Detection. CVPR 2005
Deformable Part Model
Felzenszwalb et al, Object Detection with Discriminatively Trained Part Based Models, PAMI 2010
Object Detection with ConvNets?
Convnets are computationally demanding. We can’t test all positions & scales!
Solution: Look at a tiny subset of positions. Choose them wisely :)
Region Proposals
● Find “blobby” image regions that are likely to contain objects
● “Class-agnostic” object detector
● Look for “blob-like” regions
Slide Credit: CS231n
Region Proposals
Selective Search (SS)
Multiscale Combinatorial Grouping (MCG)
[SS] Uijlings et al. Selective search for object recognition. IJCV 2013
[MCG] Arbeláez, Pont-Tuset et al. Multiscale combinatorial grouping. CVPR 2014
Object Detection with Convnets: R-CNN
Girshick et al. Rich feature hierarchies for accurate object detection and semantic segmentation. CVPR 2014
R-CNN
Girshick et al. Rich feature hierarchies for accurate object detection and semantic segmentation. CVPR 2014
1. Train network on proposals
2. Post-hoc training of SVMs & Box regressors on fc7 features
R-CNN
We expect: We get:
R-CNN
Girshick et al. Rich feature hierarchies for accurate object detection and semantic segmentation. CVPR 2014
1. Train network on proposals
2. Post-hoc training of SVMs & Box regressors on fc7 features
3. Non Maximum Suppression + score threshold
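Step 3 can be sketched as greedy non-maximum suppression preceded by a score threshold. This is a minimal NumPy sketch, not the R-CNN authors' code; the function names and both threshold values are assumptions chosen for illustration.

```python
import numpy as np


def iou_one_to_many(box, boxes):
    """IoU between one box and an array of boxes, all as (x1, y1, x2, y2)."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)


def nms(boxes, scores, iou_thresh=0.5, score_thresh=0.05):
    """Greedy NMS: keep the best box, drop boxes that overlap it, repeat."""
    boxes = np.asarray(boxes, dtype=float)
    scores = np.asarray(scores, dtype=float)
    # Score threshold first, then process boxes from highest to lowest score.
    candidates = [int(i) for i in np.argsort(-scores) if scores[i] >= score_thresh]
    keep = []
    while candidates:
        best = candidates.pop(0)
        keep.append(best)
        if not candidates:
            break
        rest = np.array(candidates)
        overlaps = iou_one_to_many(boxes[best], boxes[rest])
        candidates = [int(i) for i, o in zip(rest, overlaps) if o <= iou_thresh]
    return keep
```

In a multi-class detector this would typically be run once per class, on the boxes scored for that class.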
R-CNN
Girshick et al. Rich feature hierarchies for accurate object detection and semantic segmentation. CVPR 2014
R-CNN: Problems
1. Slow at test-time: need to run full forward pass of
CNN for each region proposal
2. SVMs and regressors are post-hoc: CNN features
not updated in response to SVMs and regressors
3. Complex multistage training pipeline
Slide Credit: CS231n
Fast R-CNN
Girshick. Fast R-CNN. ICCV 2015
R-CNN Problem #1: Slow at test-time: need to run a full forward pass of the CNN for each region proposal
Solution: Share the computation of the convolutional layers between the region proposals of an image
Fast R-CNN: Sharing features
Hi-res input image: 3 x 800 x 600, with region proposal
Convolution and pooling
Hi-res conv features: C x H x W, with region proposal
Max-pool within each grid cell
RoI conv features: C x h x w, for region proposal
Fully-connected layers expect low-res conv features: C x h x w
Slide Credit: CS231n
Girshick. Fast R-CNN. ICCV 2015
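The "max-pool within each grid cell" step can be sketched as follows: a region of the shared C x H x W feature map is split into an h x w grid and each cell is max-pooled, so every proposal yields a fixed-size C x h x w tensor. This is a simplified NumPy sketch; the function name, the integer RoI coordinates and the default 7 x 7 output size are assumptions.

```python
import numpy as np


def roi_max_pool(features, roi, out_h=7, out_w=7):
    """Max-pool a region of a C x H x W feature map to C x out_h x out_w.

    roi = (x1, y1, x2, y2) in feature-map coordinates, x2 / y2 exclusive.
    """
    x1, y1, x2, y2 = roi
    region = features[:, y1:y2, x1:x2]
    c, h, w = region.shape
    # Integer grid boundaries; cells may differ in size by one element.
    ys = [(i * h) // out_h for i in range(out_h + 1)]
    xs = [(j * w) // out_w for j in range(out_w + 1)]
    out = np.zeros((c, out_h, out_w), dtype=features.dtype)
    for i in range(out_h):
        for j in range(out_w):
            # Guarantee at least one element per cell for small regions.
            y_end = max(ys[i + 1], ys[i] + 1)
            x_end = max(xs[j + 1], xs[j] + 1)
            out[:, i, j] = region[:, ys[i]:y_end, xs[j]:x_end].max(axis=(1, 2))
    return out


# The conv features are computed once per image and reused for every proposal.
features = np.random.rand(512, 38, 50)
pooled = roi_max_pool(features, roi=(5, 4, 30, 25))
print(pooled.shape)  # (512, 7, 7)
```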
Fast R-CNN
R-CNN Problems #2 & #3: SVMs and regressors are post-hoc. Complex training.
Solution: Train it all together end-to-end (E2E)
Girshick. Fast R-CNN. ICCV 2015
Fast R-CNN: End-to-end training
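Girshick. Fast R-CNN. ICCV 2015
$$L(p, u, t^u, v) = L_{cls}(p, u) + \lambda [u \geq 1] L_{loc}(t^u, v)$$
p = predicted class scores
u = true class label
t^u = predicted box coordinates
v = true box coordinates
L_cls = log loss
L_loc = smooth L1 loss
[u ≥ 1] = box term only for positive boxes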
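A minimal sketch of this two-term loss for a single region: log loss on the class, plus smooth L1 on the box coordinates for positive boxes only. The function names, the convention that class index 0 is background, and the default weight `lam=1.0` are assumptions made for the example.

```python
import numpy as np


def smooth_l1(x):
    """Smooth L1: quadratic for |x| < 1, linear otherwise."""
    x = np.abs(x)
    return np.where(x < 1.0, 0.5 * x ** 2, x - 0.5)


def detection_loss(class_probs, true_class, pred_box, true_box, lam=1.0):
    """Log loss on the class plus smooth L1 on the box, for one region.

    class_probs: softmax output over classes (index 0 = background, assumed).
    The box term is only added for positive (non-background) regions.
    """
    cls_loss = -np.log(class_probs[true_class])
    if true_class == 0:
        return float(cls_loss)
    diff = np.asarray(pred_box, dtype=float) - np.asarray(true_box, dtype=float)
    box_loss = smooth_l1(diff).sum()
    return float(cls_loss + lam * box_loss)
```

Compared with an L2 loss, the linear tail of smooth L1 keeps gradients bounded when a predicted box is far from its target.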
Fast R-CNN: Positive / Negative Samples
Girshick. Fast R-CNN. ICCV 2015
Positive samples are defined as those whose IoU overlap with a
ground-truth bounding box is > 0.5.
Negative examples are sampled from those that have a maximum
IoU overlap with ground truth in the interval [0.1, 0.5).
25%/75% ratio for positive/negative samples in a minibatch.
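The sampling rule above can be sketched with a plain IoU function and the two thresholds from the slide. The function names and the "ignored" label for proposals that fall in neither interval are assumptions for illustration.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def label_proposal(proposal, gt_boxes):
    """Return 'positive', 'negative' or 'ignored' for one region proposal."""
    best = max((iou(proposal, gt) for gt in gt_boxes), default=0.0)
    if best > 0.5:
        return "positive"   # IoU with some ground-truth box above 0.5
    if 0.1 <= best < 0.5:
        return "negative"   # maximum IoU in [0.1, 0.5)
    return "ignored"        # not used as a training sample in this sketch
```

A minibatch would then be drawn from the two pools at the 25%/75% ratio.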
Fast R-CNN
Slide Credit: CS231n
                       R-CNN        Fast R-CNN
Training time          84 hours     9.5 hours      Faster!
(Speedup)              1x           8.8x
Test time per image    47 seconds   0.32 seconds   FASTER!
(Speedup)              1x           146x
mAP (VOC 2007)         66.0         66.9           Better!
Using VGG-16 CNN on Pascal VOC 2007 dataset
Fast R-CNN: Problem
Slide Credit: CS231n
                                             R-CNN        Fast R-CNN
Test time per image                          47 seconds   0.32 seconds
(Speedup)                                    1x           146x
Test time per image with Selective Search    50 seconds   2 seconds
(Speedup)                                    1x           25x
Test-time speeds don’t include region proposals
Faster R-CNN
[Diagram: Conv layers → Conv5_3 → Region Proposal Network → RPN Proposals; Conv5_3 + RPN Proposals → RoI Pooling → FC6 → FC7 → FC8 → Class probabilities]
Ren et al. Faster R-CNN: Towards real-time object detection with region proposal networks. NIPS 2015
Learn proposals end-to-end, sharing parameters with the classification network
Faster R-CNN
[Diagram: same architecture, with the detection branch (RoI Pooling, FC6, FC7, FC8, Class probabilities) labeled Fast R-CNN]
Region Proposal Network
At each location, for each of k anchor boxes:
● Objectness scores (object / no object)
● Bounding box regression
In practice, k = 9 (3 different scales and 3 aspect ratios)
Ren et al. Faster R-CNN: Towards real-time object detection with region proposal networks. NIPS 2015
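The k = 9 anchors at one location can be generated as below. The scale and aspect-ratio values are the ones commonly quoted for Faster R-CNN and should be read as assumptions here, as should the function name and the convention ratio = height / width.

```python
import numpy as np


def make_anchors(cx, cy, scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Return the k = len(scales) * len(ratios) anchors centred at (cx, cy).

    Each anchor is (x1, y1, x2, y2). ratio = height / width, and all
    anchors of one scale share the same area (scale ** 2).
    """
    anchors = []
    for s in scales:
        for r in ratios:
            w = s / np.sqrt(r)
            h = s * np.sqrt(r)
            anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return np.array(anchors)


anchors = make_anchors(cx=300, cy=200)
print(anchors.shape)  # (9, 4)
```

The same set of anchors is replicated at every location of the conv feature map, which is what makes the proposals translation-invariant.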
Region Proposal Network: Loss function
$$L(\{p_i\}, \{t_i\}) = \frac{1}{N_{cls}} \sum_i L_{cls}(p_i, p_i^*) + \lambda \frac{1}{N_{reg}} \sum_i p_i^* L_{reg}(t_i, t_i^*)$$
i = anchor index in minibatch
p_i = predicted probability of being an object for anchor i
p_i^* = ground truth objectness label
t_i = coordinates of the predicted bounding box for anchor i
t_i^* = true box coordinates
L_cls = log loss
L_reg = smooth L1 loss
N_cls = number of anchors in minibatch (~256)
N_reg = number of anchor locations (~2400)
In practice λ = 10, so that both terms are roughly equally balanced
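The "roughly equally balanced" remark can be checked with a few lines of arithmetic: the classification term is weighted by one over the number of anchors in the minibatch, and the regression term by the balancing weight over the number of anchor locations. The variable names are chosen for this example; the three numbers come from the slide.

```python
n_cls = 256    # anchors per minibatch (from the slide)
n_reg = 2400   # anchor locations (from the slide)
lam = 10       # balancing weight (from the slide)

cls_weight = 1 / n_cls    # weight on each classification term
reg_weight = lam / n_reg  # weight on each regression term

print(f"{cls_weight:.5f} vs {reg_weight:.5f}")  # 0.00391 vs 0.00417
```

With a weight of 1 instead of 10, the regression term would be weighted roughly ten times less than the classification term.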
Region Proposal Network: Positive / Negative Samples
An anchor is labeled as positive if either:
(a) the anchor is the one with the highest IoU overlap with a ground-truth box, or
(b) the anchor has an IoU overlap with a ground-truth box higher than 0.7
Negative labels are assigned to anchors with IoU lower than 0.3 for all ground-truth boxes.
50%/50% ratio of positive/negative anchors in a minibatch.
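These labeling rules can be sketched on a precomputed IoU matrix. The function name, the -1 label for anchors that are neither positive nor negative, and the assumption of at least one ground-truth box are choices made for this example.

```python
import numpy as np


def label_anchors(iou, pos_thresh=0.7, neg_thresh=0.3):
    """Label anchors from an IoU matrix of shape (num_anchors, num_gt).

    Returns 1 (positive), 0 (negative) or -1 (not used for training).
    """
    iou = np.asarray(iou, dtype=float)
    labels = np.full(iou.shape[0], -1, dtype=int)
    max_per_anchor = iou.max(axis=1)
    labels[max_per_anchor < neg_thresh] = 0   # low IoU with every ground-truth box
    labels[max_per_anchor > pos_thresh] = 1   # rule (b)
    labels[iou.argmax(axis=0)] = 1            # rule (a): best anchor for each box
    return labels
```

Rule (a) is applied last so that every ground-truth box gets at least one positive anchor, even when no anchor reaches an IoU of 0.7 with it.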
Faster R-CNN: Training
[Diagram: same Faster R-CNN architecture: Conv layers, Conv5_3, Region Proposal Network, RPN Proposals, RoI Pooling, FC6, FC7, FC8, Class probabilities]
Ren et al. Faster R-CNN: Towards real-time object detection with region proposal networks. NIPS 2015
RoI Pooling is not differentiable w.r.t. box coordinates. Solutions:
● Alternate training
● Ignore the gradient of the classification branch w.r.t. proposal coordinates
● Make the pooling function differentiable
Faster R-CNN
Ren et al. Faster R-CNN: Towards real-time object detection with region proposal networks. NIPS 2015
                                        R-CNN        Fast R-CNN   Faster R-CNN
Test time per image (with proposals)    50 seconds   2 seconds    0.2 seconds
(Speedup)                               1x           25x          250x
mAP (VOC 2007)                          66.0         66.9         66.9
Slide Credit: CS231n
Faster R-CNN
● Faster R-CNN is the basis of the winners of the COCO and ILSVRC 2015 object detection competitions.
He et al. Deep residual learning for image recognition. CVPR 2016
YOLO: You Only Look Once
Redmon et al. You Only Look Once: Unified, Real-Time Object Detection, CVPR 2016
Proposal-free object detection pipeline
YOLO: You Only Look Once
Redmon et al. You Only Look Once: Unified, Real-Time Object Detection, CVPR 2016
YOLO: You Only Look Once
Each cell predicts:
- For each bounding box:
- 4 coordinates (x, y, w, h)
- 1 confidence value
- Some number of class
probabilities
For Pascal VOC:
- 7x7 grid
- 2 bounding boxes / cell
- 20 classes
7 x 7 x (2 x 5 + 20) = 7 x 7 x 30 tensor = 1470 outputs
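The output size follows directly from the grid size S, the number of boxes per cell B and the number of classes C. A small sketch that reproduces the arithmetic; the function name is hypothetical.

```python
def yolo_output_shape(grid=7, boxes_per_cell=2, num_classes=20):
    """Shape of the YOLO prediction tensor: S x S x (B * 5 + C)."""
    per_cell = boxes_per_cell * 5 + num_classes  # 4 coordinates + 1 confidence per box
    return (grid, grid, per_cell)


shape = yolo_output_shape()
n_outputs = shape[0] * shape[1] * shape[2]
print(shape, n_outputs)  # (7, 7, 30) 1470
```

Note that the class probabilities are predicted once per cell, not once per box, which is why C is not multiplied by B.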
YOLO: Training
Slide credit: YOLO Presentation @ CVPR 2016
For training, each ground truth bounding box is matched into the right cell
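In the YOLO paper, the "right" cell is the grid cell that contains the centre of the ground truth box. A minimal sketch of that assignment follows; the function name, the pixel-coordinate box format and the image size in the example are assumptions.

```python
def responsible_cell(box, img_w, img_h, grid=7):
    """Return (row, col) of the grid cell that contains the box centre.

    box = (x1, y1, x2, y2) in pixels.
    """
    cx = (box[0] + box[2]) / 2
    cy = (box[1] + box[3]) / 2
    # Clamp so that centres lying exactly on the right or bottom image
    # border still map to the last cell.
    col = min(int(cx * grid / img_w), grid - 1)
    row = min(int(cy * grid / img_h), grid - 1)
    return row, col


print(responsible_cell((200, 310, 248, 350), img_w=448, img_h=448))  # (5, 3)
```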
YOLO: Training
Optimize class prediction in that cell:
dog: 1, cat: 0, bike: 0, ...
YOLO: Training
Predicted boxes for this cell
YOLO: Training
Find the best one w.r.t. the ground truth bounding box and optimize it (i.e. adjust its coordinates to be closer to the ground truth’s coordinates)
YOLO: Training
Increase the matched box’s confidence, decrease the non-matched boxes’ confidence
YOLO: Training
For cells with no ground truth detections:
● Confidences of all predicted boxes are decreased
● Class probabilities are not adjusted
YOLO: Training, formally
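Slide credit: YOLO Presentation @ CVPR 2016
The loss is the sum of three groups of terms, over cells i and boxes j:
Bounding box coordinate regression:
$$\lambda_{coord} \sum_i \sum_j \mathbb{1}_{ij}^{obj} \left[ (x_i - \hat{x}_i)^2 + (y_i - \hat{y}_i)^2 \right] + \lambda_{coord} \sum_i \sum_j \mathbb{1}_{ij}^{obj} \left[ (\sqrt{w_i} - \sqrt{\hat{w}_i})^2 + (\sqrt{h_i} - \sqrt{\hat{h}_i})^2 \right]$$
Bounding box score prediction:
$$\sum_i \sum_j \mathbb{1}_{ij}^{obj} (C_i - \hat{C}_i)^2 + \lambda_{noobj} \sum_i \sum_j \mathbb{1}_{ij}^{noobj} (C_i - \hat{C}_i)^2$$
Class score prediction:
$$\sum_i \mathbb{1}_{i}^{obj} \sum_c (p_i(c) - \hat{p}_i(c))^2$$
1_ij^obj = 1 if box j and cell i are matched together, 0 otherwise
1_ij^noobj = 1 if box j and cell i are NOT matched together, 0 otherwise
1_i^obj = 1 if cell i has an object present, 0 otherwise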
YOLO: You Only Look Once
Redmon et al. You Only Look Once: Unified, Real-Time Object Detection, CVPR 2016
[Figure: grid cells labeled Dog, Bicycle, Car, Dining Table]
Predict class probabilities for each cell, conditioned on an object being present, e.g. P(car | object)
YOLO: You Only Look Once
Redmon et al. You Only Look Once: Unified, Real-Time Object Detection, CVPR 2016
+ NMS
+ Score threshold
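At test time, the per-box confidence and the per-cell conditional class probabilities are multiplied to obtain class-specific scores, which are then thresholded and passed to NMS. A minimal NumPy sketch of the scoring and thresholding part; the function name and the threshold value are assumptions.

```python
import numpy as np


def class_specific_scores(box_conf, class_probs, score_thresh=0.2):
    """Combine per-box confidence with per-cell class probabilities.

    box_conf:    (S, S, B)  confidence of each predicted box
    class_probs: (S, S, C)  P(class | object) for each cell
    Returns scores of shape (S, S, B, C); entries below score_thresh are zeroed.
    """
    scores = box_conf[..., :, None] * class_probs[..., None, :]
    scores[scores < score_thresh] = 0.0
    return scores


# Pascal VOC setting: 7 x 7 grid, 2 boxes per cell, 20 classes.
scores = class_specific_scores(np.random.rand(7, 7, 2), np.random.rand(7, 7, 20))
print(scores.shape)  # (7, 7, 2, 20)
```

The boxes with a non-zero score would then go through NMS, typically one class at a time.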
SSD: Single Shot MultiBox Detector
Liu et al. SSD: Single Shot MultiBox Detector, ECCV 2016
Same idea as YOLO, + several predictors at different stages in the network
SSD: Single Shot MultiBox Detector
Liu et al. SSD: Single Shot MultiBox Detector, ECCV 2016
Similarly to Faster R-CNN, it uses box anchors to predict box coordinates as displacements
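Predicting coordinates "as displacements" means the network outputs offsets relative to an anchor box rather than absolute coordinates. The sketch below uses the Faster R-CNN style parameterization (centre shift scaled by the anchor size, log-space width and height); actual implementations may add extra scaling constants, and the function name is an assumption.

```python
import math


def decode_box(anchor, deltas):
    """Turn predicted displacements into a box (x1, y1, x2, y2).

    anchor: (cx, cy, w, h) of the anchor box.
    deltas: (tx, ty, tw, th) predicted by the network.
    """
    acx, acy, aw, ah = anchor
    tx, ty, tw, th = deltas
    cx = acx + tx * aw         # centre shift, scaled by the anchor size
    cy = acy + ty * ah
    w = aw * math.exp(tw)      # log-space scaling of width and height
    h = ah * math.exp(th)
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)
```

All-zero displacements return the anchor itself, so the network only has to learn small corrections around each anchor.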
YOLOv2
Redmon & Farhadi. YOLO9000: Better, Faster, Stronger. CVPR 2017
YOLOv2
Results on Pascal VOC 2007
YOLOv2
Results on COCO test-dev 2015
Summary
Proposal-based methods
● R-CNN
● Fast R-CNN
● Faster R-CNN
● SPPnet
● R-FCN
Proposal-free methods
● YOLO, YOLOv2
● SSD
Questions ?
A note on NMS
Tradeoff between recall/precision
[Figure: boxes with objectness scores 0.8, 0.6 and 0.7]
Soft NMS
Bodla, Singh et al. Improving Object Detection With One Line of Code. arXiv Apr 2017
Decay the detection scores of neighbouring, overlapping boxes instead of setting them to 0
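A minimal sketch of the idea, using a Gaussian decay of the score as a function of the overlap with the box just selected. The function names, the `sigma` value and the final score threshold are assumptions for this example.

```python
import math


def box_iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def soft_nms(boxes, scores, sigma=0.5, score_thresh=0.001):
    """Gaussian Soft-NMS: decay the scores of overlapping boxes instead of dropping them.

    Returns a list of (index, final_score) in selection order.
    """
    scores = [float(s) for s in scores]
    remaining = list(range(len(boxes)))
    selected = []
    while remaining:
        best = max(remaining, key=lambda i: scores[i])
        remaining.remove(best)
        if scores[best] < score_thresh:
            break  # every box still remaining scores even lower
        selected.append((best, scores[best]))
        for i in remaining:
            overlap = box_iou(boxes[best], boxes[i])
            scores[i] *= math.exp(-(overlap ** 2) / sigma)
    return selected
```

A heavily overlapped box keeps a reduced score rather than disappearing, which helps when two real objects overlap each other.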
Avoid NMS: Sequential box prediction
Stewart et al. End-To-End People Detection in Crowded Scenes. CVPR 2016
Predict boxes one after the other and learn when to stop
Ad

Recommended

Deep Learning for Computer Vision: Object Detection (UPC 2016)
Deep Learning for Computer Vision: Object Detection (UPC 2016)
Universitat Politècnica de Catalunya
 
Auro tripathy - Localizing with CNNs
Auro tripathy - Localizing with CNNs
Auro Tripathy
 
D1L5 Visualization (D1L2 Insight@DCU Machine Learning Workshop 2017)
D1L5 Visualization (D1L2 Insight@DCU Machine Learning Workshop 2017)
Universitat Politècnica de Catalunya
 
150807 Fast R-CNN
150807 Fast R-CNN
Junho Cho
 
Object Detection - Míriam Bellver - UPC Barcelona 2018
Object Detection - Míriam Bellver - UPC Barcelona 2018
Universitat Politècnica de Catalunya
 
Object Detection Methods using Deep Learning
Object Detection Methods using Deep Learning
Sungjoon Choi
 
Recent Progress on Object Detection_20170331
Recent Progress on Object Detection_20170331
Jihong Kang
 
#6 PyData Warsaw: Deep learning for image segmentation
#6 PyData Warsaw: Deep learning for image segmentation
Matthew Opala
 
150424 Scalable Object Detection using Deep Neural Networks
150424 Scalable Object Detection using Deep Neural Networks
Junho Cho
 
Video Object Segmentation - Laura Leal-Taixé - UPC Barcelona 2018
Video Object Segmentation - Laura Leal-Taixé - UPC Barcelona 2018
Universitat Politècnica de Catalunya
 
CNN vs SIFT-based Visual Localization - Laura Leal-Taixé - UPC Barcelona 2018
CNN vs SIFT-based Visual Localization - Laura Leal-Taixé - UPC Barcelona 2018
Universitat Politècnica de Catalunya
 
SSD: Single Shot MultiBox Detector (UPC Reading Group)
SSD: Single Shot MultiBox Detector (UPC Reading Group)
Universitat Politècnica de Catalunya
 
Object Detection Using R-CNN Deep Learning Framework
Object Detection Using R-CNN Deep Learning Framework
Nader Karimi
 
Object Segmentation (D2L7 Insight@DCU Machine Learning Workshop 2017)
Object Segmentation (D2L7 Insight@DCU Machine Learning Workshop 2017)
Universitat Politècnica de Catalunya
 
Deep learning for object detection
Deep learning for object detection
Wenjing Chen
 
Interpretability of Convolutional Neural Networks - Eva Mohedano - UPC Barcel...
Interpretability of Convolutional Neural Networks - Eva Mohedano - UPC Barcel...
Universitat Politècnica de Catalunya
 
Adaptive object detection using adjacency and zoom prediction
Adaptive object detection using adjacency and zoom prediction
Universitat Politècnica de Catalunya
 
Towards Set Learning and Prediction - Laura Leal-Taixe - UPC Barcelona 2018
Towards Set Learning and Prediction - Laura Leal-Taixe - UPC Barcelona 2018
Universitat Politècnica de Catalunya
 
Multiple Object Tracking - Laura Leal-Taixe - UPC Barcelona 2018
Multiple Object Tracking - Laura Leal-Taixe - UPC Barcelona 2018
Universitat Politècnica de Catalunya
 
Advanced deep learning based object detection methods
Advanced deep learning based object detection methods
Brodmann17
 
Detection
Detection
simplyinsimple
 
Recurrent Instance Segmentation (UPC Reading Group)
Recurrent Instance Segmentation (UPC Reading Group)
Universitat Politècnica de Catalunya
 
160205 NeuralArt - Understanding Neural Representation
160205 NeuralArt - Understanding Neural Representation
Junho Cho
 
Codetecon #KRK 3 - Object detection with Deep Learning
Codetecon #KRK 3 - Object detection with Deep Learning
Matthew Opala
 
Mask-RCNN for Instance Segmentation
Mask-RCNN for Instance Segmentation
Dat Nguyen
 
Object detection - RCNNs vs Retinanet
Object detection - RCNNs vs Retinanet
Rishabh Indoria
 
#10 pydata warsaw object detection with dn ns
#10 pydata warsaw object detection with dn ns
Andrew Brozek
 
Unsupervised Deep Learning (D2L1 Insight@DCU Machine Learning Workshop 2017)
Unsupervised Deep Learning (D2L1 Insight@DCU Machine Learning Workshop 2017)
Universitat Politècnica de Catalunya
 
D3L4-objects.pdf
D3L4-objects.pdf
ssusere945ae
 
SimCLR: A Simple Framework for Contrastive Learning of Visual Representations
SimCLR: A Simple Framework for Contrastive Learning of Visual Representations
ynxm25hpxp
 

More Related Content

What's hot (20)

150424 Scalable Object Detection using Deep Neural Networks
150424 Scalable Object Detection using Deep Neural Networks
Junho Cho
 
Video Object Segmentation - Laura Leal-Taixé - UPC Barcelona 2018
Video Object Segmentation - Laura Leal-Taixé - UPC Barcelona 2018
Universitat Politècnica de Catalunya
 
CNN vs SIFT-based Visual Localization - Laura Leal-Taixé - UPC Barcelona 2018
CNN vs SIFT-based Visual Localization - Laura Leal-Taixé - UPC Barcelona 2018
Universitat Politècnica de Catalunya
 
SSD: Single Shot MultiBox Detector (UPC Reading Group)
SSD: Single Shot MultiBox Detector (UPC Reading Group)
Universitat Politècnica de Catalunya
 
Object Detection Using R-CNN Deep Learning Framework
Object Detection Using R-CNN Deep Learning Framework
Nader Karimi
 
Object Segmentation (D2L7 Insight@DCU Machine Learning Workshop 2017)
Object Segmentation (D2L7 Insight@DCU Machine Learning Workshop 2017)
Universitat Politècnica de Catalunya
 
Deep learning for object detection
Deep learning for object detection
Wenjing Chen
 
Interpretability of Convolutional Neural Networks - Eva Mohedano - UPC Barcel...
Interpretability of Convolutional Neural Networks - Eva Mohedano - UPC Barcel...
Universitat Politècnica de Catalunya
 
Adaptive object detection using adjacency and zoom prediction
Adaptive object detection using adjacency and zoom prediction
Universitat Politècnica de Catalunya
 
Towards Set Learning and Prediction - Laura Leal-Taixe - UPC Barcelona 2018
Towards Set Learning and Prediction - Laura Leal-Taixe - UPC Barcelona 2018
Universitat Politècnica de Catalunya
 
Multiple Object Tracking - Laura Leal-Taixe - UPC Barcelona 2018
Multiple Object Tracking - Laura Leal-Taixe - UPC Barcelona 2018
Universitat Politècnica de Catalunya
 
Advanced deep learning based object detection methods
Advanced deep learning based object detection methods
Brodmann17
 
Detection
Detection
simplyinsimple
 
Recurrent Instance Segmentation (UPC Reading Group)
Recurrent Instance Segmentation (UPC Reading Group)
Universitat Politècnica de Catalunya
 
160205 NeuralArt - Understanding Neural Representation
160205 NeuralArt - Understanding Neural Representation
Junho Cho
 
Codetecon #KRK 3 - Object detection with Deep Learning
Codetecon #KRK 3 - Object detection with Deep Learning
Matthew Opala
 
Mask-RCNN for Instance Segmentation
Mask-RCNN for Instance Segmentation
Dat Nguyen
 
Object detection - RCNNs vs Retinanet
Object detection - RCNNs vs Retinanet
Rishabh Indoria
 
#10 pydata warsaw object detection with dn ns
#10 pydata warsaw object detection with dn ns
Andrew Brozek
 
Unsupervised Deep Learning (D2L1 Insight@DCU Machine Learning Workshop 2017)
Unsupervised Deep Learning (D2L1 Insight@DCU Machine Learning Workshop 2017)
Universitat Politècnica de Catalunya
 
150424 Scalable Object Detection using Deep Neural Networks
150424 Scalable Object Detection using Deep Neural Networks
Junho Cho
 
CNN vs SIFT-based Visual Localization - Laura Leal-Taixé - UPC Barcelona 2018
CNN vs SIFT-based Visual Localization - Laura Leal-Taixé - UPC Barcelona 2018
Universitat Politècnica de Catalunya
 
Object Detection Using R-CNN Deep Learning Framework
Object Detection Using R-CNN Deep Learning Framework
Nader Karimi
 
Object Segmentation (D2L7 Insight@DCU Machine Learning Workshop 2017)
Object Segmentation (D2L7 Insight@DCU Machine Learning Workshop 2017)
Universitat Politècnica de Catalunya
 
Deep learning for object detection
Deep learning for object detection
Wenjing Chen
 
Interpretability of Convolutional Neural Networks - Eva Mohedano - UPC Barcel...
Interpretability of Convolutional Neural Networks - Eva Mohedano - UPC Barcel...
Universitat Politècnica de Catalunya
 
Towards Set Learning and Prediction - Laura Leal-Taixe - UPC Barcelona 2018
Towards Set Learning and Prediction - Laura Leal-Taixe - UPC Barcelona 2018
Universitat Politècnica de Catalunya
 
Advanced deep learning based object detection methods
Advanced deep learning based object detection methods
Brodmann17
 
160205 NeuralArt - Understanding Neural Representation
160205 NeuralArt - Understanding Neural Representation
Junho Cho
 
Codetecon #KRK 3 - Object detection with Deep Learning
Codetecon #KRK 3 - Object detection with Deep Learning
Matthew Opala
 
Mask-RCNN for Instance Segmentation
Mask-RCNN for Instance Segmentation
Dat Nguyen
 
Object detection - RCNNs vs Retinanet
Object detection - RCNNs vs Retinanet
Rishabh Indoria
 
#10 pydata warsaw object detection with dn ns
#10 pydata warsaw object detection with dn ns
Andrew Brozek
 
Unsupervised Deep Learning (D2L1 Insight@DCU Machine Learning Workshop 2017)
Unsupervised Deep Learning (D2L1 Insight@DCU Machine Learning Workshop 2017)
Universitat Politècnica de Catalunya
 

Similar to Object Detection (D2L5 Insight@DCU Machine Learning Workshop 2017) (20)

D3L4-objects.pdf
D3L4-objects.pdf
ssusere945ae
 
SimCLR: A Simple Framework for Contrastive Learning of Visual Representations
SimCLR: A Simple Framework for Contrastive Learning of Visual Representations
ynxm25hpxp
 
“Understanding, Selecting and Optimizing Object Detectors for Edge Applicatio...
“Understanding, Selecting and Optimizing Object Detectors for Edge Applicatio...
Edge AI and Vision Alliance
 
Faster R-CNN: Towards real-time object detection with region proposal network...
Faster R-CNN: Towards real-time object detection with region proposal network...
Universitat Politècnica de Catalunya
 
object detection paper review
object detection paper review
Yoonho Na
 
Fast methods for deep learning based object detection
Fast methods for deep learning based object detection
Brodmann17
 
IRJET- Object Detection and Recognition using Single Shot Multi-Box Detector
IRJET- Object Detection and Recognition using Single Shot Multi-Box Detector
IRJET Journal
 
Cvpr 2017 Summary Meetup
Cvpr 2017 Summary Meetup
Amir Alush
 
Brodmann17 CVPR 2017 review - meetup slides
Brodmann17 CVPR 2017 review - meetup slides
Brodmann17
 
Improving region based CNN object detector using bayesian optimization
Improving region based CNN object detector using bayesian optimization
Amgad Muhammad
 
Tutorial on Object Detection (Faster R-CNN)
Tutorial on Object Detection (Faster R-CNN)
Hwa Pyung Kim
 
Anchor free object detection by deep learning
Anchor free object detection by deep learning
Yu Huang
 
A Brief History of Object Detection / Tommi Kerola
A Brief History of Object Detection / Tommi Kerola
Preferred Networks
 
“Understanding DNN-Based Object Detectors,” a Presentation from Au-Zone Techn...
“Understanding DNN-Based Object Detectors,” a Presentation from Au-Zone Techn...
Edge AI and Vision Alliance
 
Object Detection An Overview
Object Detection An Overview
ijtsrd
 
YOLO_review.pptxThis is a test document that is used to satisfy the requireme...
YOLO_review.pptxThis is a test document that is used to satisfy the requireme...
gaojinming318
 
Object Detection is a very powerful field.pptx
Object Detection is a very powerful field.pptx
usmanyaseen16
 
IISc Internship Report
IISc Internship Report
HarshilJain26
 
Faster R-CNN - PR012
Faster R-CNN - PR012
Jinwon Lee
 
IRJET- Weakly Supervised Object Detection by using Fast R-CNN
IRJET- Weakly Supervised Object Detection by using Fast R-CNN
IRJET Journal
 
SimCLR: A Simple Framework for Contrastive Learning of Visual Representations
SimCLR: A Simple Framework for Contrastive Learning of Visual Representations
ynxm25hpxp
 
“Understanding, Selecting and Optimizing Object Detectors for Edge Applicatio...
“Understanding, Selecting and Optimizing Object Detectors for Edge Applicatio...
Edge AI and Vision Alliance
 
Faster R-CNN: Towards real-time object detection with region proposal network...
Faster R-CNN: Towards real-time object detection with region proposal network...
Universitat Politècnica de Catalunya
 
object detection paper review
object detection paper review
Yoonho Na
 
Fast methods for deep learning based object detection
Fast methods for deep learning based object detection
Brodmann17
 
IRJET- Object Detection and Recognition using Single Shot Multi-Box Detector
IRJET- Object Detection and Recognition using Single Shot Multi-Box Detector
IRJET Journal
 
Cvpr 2017 Summary Meetup
Cvpr 2017 Summary Meetup
Amir Alush
 
Brodmann17 CVPR 2017 review - meetup slides
Brodmann17 CVPR 2017 review - meetup slides
Brodmann17
 
Improving region based CNN object detector using bayesian optimization
Improving region based CNN object detector using bayesian optimization
Amgad Muhammad
 
Tutorial on Object Detection (Faster R-CNN)
Tutorial on Object Detection (Faster R-CNN)
Hwa Pyung Kim
 
Anchor free object detection by deep learning
Anchor free object detection by deep learning
Yu Huang
 
A Brief History of Object Detection / Tommi Kerola
A Brief History of Object Detection / Tommi Kerola
Preferred Networks
 
“Understanding DNN-Based Object Detectors,” a Presentation from Au-Zone Techn...
“Understanding DNN-Based Object Detectors,” a Presentation from Au-Zone Techn...
Edge AI and Vision Alliance
 
Object Detection An Overview
Object Detection An Overview
ijtsrd
 
YOLO_review.pptxThis is a test document that is used to satisfy the requireme...
YOLO_review.pptxThis is a test document that is used to satisfy the requireme...
gaojinming318
 
Object Detection is a very powerful field.pptx
Object Detection is a very powerful field.pptx
usmanyaseen16
 
IISc Internship Report
IISc Internship Report
HarshilJain26
 
Faster R-CNN - PR012
Faster R-CNN - PR012
Jinwon Lee
 
IRJET- Weakly Supervised Object Detection by using Fast R-CNN
IRJET- Weakly Supervised Object Detection by using Fast R-CNN
IRJET Journal
 
Ad

More from Universitat Politècnica de Catalunya (20)

Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Universitat Politècnica de Catalunya
 
Deep Generative Learning for All
Deep Generative Learning for All
Universitat Politècnica de Catalunya
 
The Transformer in Vision | Xavier Giro | Master in Computer Vision Barcelona...
The Transformer in Vision | Xavier Giro | Master in Computer Vision Barcelona...
Universitat Politècnica de Catalunya
 
Towards Sign Language Translation & Production | Xavier Giro-i-Nieto
Towards Sign Language Translation & Production | Xavier Giro-i-Nieto
Universitat Politècnica de Catalunya
 
The Transformer - Xavier Giró - UPC Barcelona 2021
The Transformer - Xavier Giró - UPC Barcelona 2021
Universitat Politècnica de Catalunya
 
Learning Representations for Sign Language Videos - Xavier Giro - NIST TRECVI...
Learning Representations for Sign Language Videos - Xavier Giro - NIST TRECVI...
Universitat Politècnica de Catalunya
 
Open challenges in sign language translation and production
Open challenges in sign language translation and production
Universitat Politècnica de Catalunya
 
Generation of Synthetic Referring Expressions for Object Segmentation in Videos
Generation of Synthetic Referring Expressions for Object Segmentation in Videos
Universitat Politècnica de Catalunya
 
Discovery and Learning of Navigation Goals from Pixels in Minecraft
Discovery and Learning of Navigation Goals from Pixels in Minecraft
Universitat Politècnica de Catalunya
 
Learn2Sign : Sign language recognition and translation using human keypoint e...
Learn2Sign : Sign language recognition and translation using human keypoint e...
Universitat Politècnica de Catalunya
 
Intepretability / Explainable AI for Deep Neural Networks
Intepretability / Explainable AI for Deep Neural Networks
Universitat Politècnica de Catalunya
 
Convolutional Neural Networks - Xavier Giro - UPC TelecomBCN Barcelona 2020
Convolutional Neural Networks - Xavier Giro - UPC TelecomBCN Barcelona 2020
Universitat Politècnica de Catalunya
 
Self-Supervised Audio-Visual Learning - Xavier Giro - UPC TelecomBCN Barcelon...
Self-Supervised Audio-Visual Learning - Xavier Giro - UPC TelecomBCN Barcelon...
Universitat Politècnica de Catalunya
 
Attention for Deep Learning - Xavier Giro - UPC TelecomBCN Barcelona 2020
Attention for Deep Learning - Xavier Giro - UPC TelecomBCN Barcelona 2020
Universitat Politècnica de Catalunya
 
Generative Adversarial Networks GAN - Xavier Giro - UPC TelecomBCN Barcelona ...
Generative Adversarial Networks GAN - Xavier Giro - UPC TelecomBCN Barcelona ...
Universitat Politècnica de Catalunya
 
Q-Learning with a Neural Network - Xavier Giró - UPC Barcelona 2020
Q-Learning with a Neural Network - Xavier Giró - UPC Barcelona 2020
Universitat Politècnica de Catalunya
 
Language and Vision with Deep Learning - Xavier Giró - ACM ICMR 2020 (Tutorial)
Language and Vision with Deep Learning - Xavier Giró - ACM ICMR 2020 (Tutorial)
Universitat Politècnica de Catalunya
 
Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...
Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...
Universitat Politècnica de Catalunya
 
Curriculum Learning for Recurrent Video Object Segmentation
Curriculum Learning for Recurrent Video Object Segmentation
Universitat Politècnica de Catalunya
 
Deep Self-supervised Learning for All - Xavier Giro - X-Europe 2020
Deep Self-supervised Learning for All - Xavier Giro - X-Europe 2020
Universitat Politècnica de Catalunya
 
The Transformer in Vision | Xavier Giro | Master in Computer Vision Barcelona...
The Transformer in Vision | Xavier Giro | Master in Computer Vision Barcelona...
Universitat Politècnica de Catalunya
 
Towards Sign Language Translation & Production | Xavier Giro-i-Nieto
Towards Sign Language Translation & Production | Xavier Giro-i-Nieto
Universitat Politècnica de Catalunya
 
Learning Representations for Sign Language Videos - Xavier Giro - NIST TRECVI...
Learning Representations for Sign Language Videos - Xavier Giro - NIST TRECVI...
Universitat Politècnica de Catalunya
 
Generation of Synthetic Referring Expressions for Object Segmentation in Videos
Generation of Synthetic Referring Expressions for Object Segmentation in Videos
Universitat Politècnica de Catalunya
 
Learn2Sign : Sign language recognition and translation using human keypoint e...
Learn2Sign : Sign language recognition and translation using human keypoint e...
Universitat Politècnica de Catalunya
 
Convolutional Neural Networks - Xavier Giro - UPC TelecomBCN Barcelona 2020
Convolutional Neural Networks - Xavier Giro - UPC TelecomBCN Barcelona 2020
Universitat Politècnica de Catalunya
 
Self-Supervised Audio-Visual Learning - Xavier Giro - UPC TelecomBCN Barcelon...
Self-Supervised Audio-Visual Learning - Xavier Giro - UPC TelecomBCN Barcelon...
Universitat Politècnica de Catalunya
 
Attention for Deep Learning - Xavier Giro - UPC TelecomBCN Barcelona 2020
Attention for Deep Learning - Xavier Giro - UPC TelecomBCN Barcelona 2020
Universitat Politècnica de Catalunya
 
Generative Adversarial Networks GAN - Xavier Giro - UPC TelecomBCN Barcelona ...
Generative Adversarial Networks GAN - Xavier Giro - UPC TelecomBCN Barcelona ...
Universitat Politècnica de Catalunya
 
Q-Learning with a Neural Network - Xavier Giró - UPC Barcelona 2020
Q-Learning with a Neural Network - Xavier Giró - UPC Barcelona 2020
Universitat Politècnica de Catalunya
 
Language and Vision with Deep Learning - Xavier Giró - ACM ICMR 2020 (Tutorial)
Language and Vision with Deep Learning - Xavier Giró - ACM ICMR 2020 (Tutorial)
Universitat Politècnica de Catalunya
 
Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...
Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...
Universitat Politècnica de Catalunya
 
Ad


Object Detection (D2L5 Insight@DCU Machine Learning Workshop 2017)

  • 1. Amaia Salvador amaia.salvador@upc.edu PhD Candidate Universitat Politècnica de Catalunya Object Detection Day 2 Lecture 5
  • 2. Object Detection. CAT, DOG, DUCK. The task of assigning a label and a bounding box to all objects in the image.
  • 3. Object Detection: Datasets. PASCAL VOC: 20 categories, 6k training images, 6k validation images, 10k test images. ILSVRC: 200 categories, 456k training images, 60k validation + test images. COCO: 80 categories, 200k training images, 60k val + test images.
  • 4. Object Detection as Classification. Classes = [cat, dog, duck]. Cat? NO. Dog? NO. Duck? NO.
  • 5. Object Detection as Classification. Classes = [cat, dog, duck]. Cat? NO. Dog? NO. Duck? NO.
  • 6. Object Detection as Classification. Classes = [cat, dog, duck]. Cat? YES. Dog? NO. Duck? NO.
  • 7. Object Detection as Classification. Classes = [cat, dog, duck]. Cat? NO. Dog? NO. Duck? NO.
  • 8. Object Detection as Classification. Problem: Too many positions & scales to test. Solution: If your classifier is fast enough, go for it.
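To get a feel for "too many positions & scales", the sketch below counts how many windows a sliding-window classifier would have to score over an image pyramid. The 64 x 128 window, stride of 8 pixels and pyramid step of 1.2 are illustrative assumptions, not values taken from the slides.

```python
def count_windows(img_w, img_h, win_w, win_h, stride):
    """Number of window positions at a single scale."""
    if img_w < win_w or img_h < win_h:
        return 0
    nx = (img_w - win_w) // stride + 1
    ny = (img_h - win_h) // stride + 1
    return nx * ny


def count_pyramid_windows(img_w, img_h, win_w=64, win_h=128,
                          stride=8, scale_step=1.2):
    """Total windows over a pyramid that shrinks the image by scale_step per level.

    All default values are assumptions chosen for illustration.
    """
    total = 0
    scale = 1.0
    while True:
        w, h = int(img_w / scale), int(img_h / scale)
        n = count_windows(w, h, win_w, win_h, stride)
        if n == 0:  # the window no longer fits: stop building the pyramid
            break
        total += n
        scale *= scale_step
    return total


if __name__ == "__main__":
    print(count_windows(800, 600, 64, 128, 8))
    print(count_pyramid_windows(800, 600))
```

Each counted window costs one classifier evaluation, which is why this approach only pays off with a very cheap classifier.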
  • 9. HOG: Histogram of Oriented Gradients. Dalal and Triggs. Histograms of Oriented Gradients for Human Detection. CVPR 2005
  • 10. Deformable Part Model. Felzenszwalb et al. Object Detection with Discriminatively Trained Part Based Models. PAMI 2010
  • 11. Object Detection with ConvNets? ConvNets are computationally demanding. We can’t test all positions & scales! Solution: Look at a tiny subset of positions. Choose them wisely :)
  • 12. Region Proposals. ● Find “blobby” image regions that are likely to contain objects ● “Class-agnostic” object detector ● Look for “blob-like” regions. Slide Credit: CS231n
  • 13. Region Proposals. Selective Search (SS), Multiscale Combinatorial Grouping (MCG). [SS] Uijlings et al. Selective search for object recognition. IJCV 2013. [MCG] Arbeláez, Pont-Tuset et al. Multiscale combinatorial grouping. CVPR 2014
  • 14. Object Detection with ConvNets: R-CNN. Girshick et al. Rich feature hierarchies for accurate object detection and semantic segmentation. CVPR 2014
  • 15. R-CNN. 1. Train network on proposals 2. Post-hoc training of SVMs & box regressors on fc7 features. Girshick et al. Rich feature hierarchies for accurate object detection and semantic segmentation. CVPR 2014
  • 17. R-CNN. 1. Train network on proposals 2. Post-hoc training of SVMs & box regressors on fc7 features 3. Non-Maximum Suppression + score threshold. Girshick et al. Rich feature hierarchies for accurate object detection and semantic segmentation. CVPR 2014
  • 18. R-CNN. Girshick et al. Rich feature hierarchies for accurate object detection and semantic segmentation. CVPR 2014
  • 19. R-CNN: Problems. 1. Slow at test time: need to run full forward pass of CNN for each region proposal 2. SVMs and regressors are post-hoc: CNN features not updated in response to SVMs and regressors 3. Complex multistage training pipeline. Slide Credit: CS231n
  • 20. Fast R-CNN. R-CNN Problem #1: Slow at test time: need to run full forward pass of CNN for each region proposal. Solution: Share computation of convolutional layers between region proposals for an image. Girshick. Fast R-CNN. ICCV 2015
  • 21. Fast R-CNN: Sharing features. Hi-res input image: 3 x 800 x 600 with region proposal → Convolution and Pooling → Hi-res conv features: C x H x W with region proposal → Max-pool within each grid cell → RoI conv features: C x h x w for region proposal → Fully-connected layers. Fully-connected layers expect low-res conv features: C x h x w. Slide Credit: CS231n. Girshick. Fast R-CNN. ICCV 2015
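The "max-pool within each grid cell" step can be sketched for a single channel as follows. This is a minimal illustration under stated assumptions, not Girshick's implementation: the function name `roi_max_pool`, the inclusive RoI convention, and the floor/ceil rule for cell boundaries are all choices made here.

```python
def roi_max_pool(feature, roi, out_h, out_w):
    """Max-pool one RoI of a single-channel feature map to a fixed out_h x out_w grid.

    feature: 2D list (H x W) of numbers.
    roi: (x0, y0, x1, y1) in feature-map cells, inclusive (assumed convention).
    """
    x0, y0, x1, y1 = roi
    roi_h, roi_w = y1 - y0 + 1, x1 - x0 + 1
    pooled = []
    for i in range(out_h):
        # Grid cell i covers rows [ys, ye): floor for the start, ceil for the end,
        # so every cell contains at least one row.
        ys = y0 + (i * roi_h) // out_h
        ye = y0 + ((i + 1) * roi_h + out_h - 1) // out_h
        row = []
        for j in range(out_w):
            xs = x0 + (j * roi_w) // out_w
            xe = x0 + ((j + 1) * roi_w + out_w - 1) // out_w
            row.append(max(feature[y][x]
                           for y in range(ys, ye)
                           for x in range(xs, xe)))
        pooled.append(row)
    return pooled


if __name__ == "__main__":
    fmap = [[1, 2, 3, 4],
            [5, 6, 7, 8],
            [9, 10, 11, 12],
            [13, 14, 15, 16]]
    print(roi_max_pool(fmap, (0, 0, 3, 3), 2, 2))
```

Whatever the RoI size, the output is always out_h x out_w, which is what lets the fully-connected layers consume proposals of arbitrary shape.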
  • 22. Fast R-CNN. R-CNN Problems #2 & #3: SVMs and regressors are post-hoc. Complex training. Solution: Train it all together end-to-end (E2E). Girshick. Fast R-CNN. ICCV 2015
  • 23. Fast R-CNN: End-to-end training. Log loss between predicted class scores and true class scores. Smooth L1 loss between predicted box coordinates and true box coordinates, only for positive boxes. Girshick. Fast R-CNN. ICCV 2015
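A minimal sketch of the two-term loss named on this slide, for a single region: log loss on the class, plus smooth L1 on the box coordinates only when the region is a positive (non-background) box. The function names, the background index 0 and the weight `lam = 1.0` are assumptions for illustration.

```python
import math


def smooth_l1(x):
    """Smooth L1: quadratic near zero, linear for |x| >= 1."""
    ax = abs(x)
    return 0.5 * x * x if ax < 1.0 else ax - 0.5


def detection_loss(class_probs, true_class, pred_box, true_box,
                   background=0, lam=1.0):
    """Log loss on the class plus smooth L1 on the box, the latter only for positives.

    class_probs: predicted probabilities, one per class (index `background` = no object).
    pred_box, true_box: 4 box regression values each.
    """
    cls_loss = -math.log(class_probs[true_class])
    if true_class == background:
        # Background regions have no ground-truth box, so no regression term.
        return cls_loss
    loc_loss = sum(smooth_l1(p - t) for p, t in zip(pred_box, true_box))
    return cls_loss + lam * loc_loss


if __name__ == "__main__":
    print(detection_loss([0.1, 0.9], 1, [0.0, 0.0, 0.0, 0.0], [0.5, 0.0, 0.0, 2.0]))
```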
  • 24. Fast R-CNN: Positive / Negative Samples. Positive samples are defined as those whose IoU overlap with a ground-truth bounding box is > 0.5. Negative examples are sampled from those that have a maximum IoU overlap with ground truth in the interval [0.1, 0.5). 25%/75% ratio for positive/negative samples in a minibatch. Girshick. Fast R-CNN. ICCV 2015
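The IoU thresholds above can be sketched as a small labelling function. Boxes are assumed to be `(x0, y0, x1, y1)` corner tuples, and the label strings are hypothetical names chosen here; proposals that match neither rule are simply left out of sampling.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x0, y0, x1, y1)."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def label_proposal(proposal, gt_boxes):
    """Label a proposal using the thresholds stated on the slide."""
    best = max((iou(proposal, g) for g in gt_boxes), default=0.0)
    if best > 0.5:
        return "positive"
    if 0.1 <= best < 0.5:
        return "negative"
    return "ignored"  # below 0.1, or exactly 0.5 under the slide's strict inequality


if __name__ == "__main__":
    print(label_proposal((0, 0, 10, 10), [(0, 0, 10, 8)]))
```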
  • 25. Fast R-CNN. Using VGG-16 CNN on Pascal VOC 2007 dataset. Faster! FASTER! Better! Slide Credit: CS231n
                               R-CNN         Fast R-CNN
      Training time            84 hours      9.5 hours
      (Speedup)                1x            8.8x
      Test time per image      47 seconds    0.32 seconds
      (Speedup)                1x            146x
      mAP (VOC 2007)           66.0          66.9
  • 26. Fast R-CNN: Problem. Test-time speeds don’t include region proposals. Slide Credit: CS231n
                                                    R-CNN         Fast R-CNN
      Test time per image                           47 seconds    0.32 seconds
      (Speedup)                                     1x            146x
      Test time per image with Selective Search     50 seconds    2 seconds
      (Speedup)                                     1x            25x
  • 27. Faster R-CNN. Learn proposals end-to-end, sharing parameters with the classification network. [Diagram: Conv layers → Conv5_3 → Region Proposal Network → RPN Proposals → RoI Pooling → FC6 → FC7 → FC8 → Class probabilities] Ren et al. Faster R-CNN: Towards real-time object detection with region proposal networks. NIPS 2015
  • 28. Faster R-CNN. Learn proposals end-to-end, sharing parameters with the classification network. [Same diagram, with the RoI Pooling and FC6 to FC8 stages marked as Fast R-CNN] Ren et al. Faster R-CNN: Towards real-time object detection with region proposal networks. NIPS 2015
  • 29. Region Proposal Network. For each of k anchors per location: objectness scores (object/no object) and bounding box regression. In practice, k = 9 (3 different scales and 3 aspect ratios). Ren et al. Faster R-CNN: Towards real-time object detection with region proposal networks. NIPS 2015
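As a rough illustration of where k = 9 comes from, the sketch below builds one anchor per (scale, aspect ratio) pair around a single location. The concrete scales and ratios are assumptions for illustration; the slide only states that there are three of each.

```python
import math


def make_anchors(cx, cy, scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Return k = len(scales) * len(ratios) anchors centred on (cx, cy).

    Each anchor is (x0, y0, x1, y1). A ratio r is height / width, and every
    anchor of scale s keeps an area of roughly s * s. Default values are assumed.
    """
    anchors = []
    for s in scales:
        for r in ratios:
            w = s / math.sqrt(r)
            h = s * math.sqrt(r)
            anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors


if __name__ == "__main__":
    for box in make_anchors(400, 300):
        print([round(v, 1) for v in box])
```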
  • 30. Region Proposal Network: Loss function. $i$ = anchor index in minibatch. $p_i$ = predicted probability of being an object for anchor $i$. $p_i^*$ = ground truth objectness label. $t_i$ = coordinates of the predicted bounding box for anchor $i$. $t_i^*$ = true box coordinates. $N_{cls}$ = number of anchors in minibatch (~256). $N_{reg}$ = number of anchor locations (~2400). $L_{cls}$ is a log loss and $L_{reg}$ a smooth L1 loss. In practice $\lambda = 10$, so that both terms are roughly equally balanced.
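The equation image on this slide did not survive extraction. Written out, the loss these annotations describe has the following form, as published in Ren et al. (2015); the symbol names ($p_i$, $p_i^*$, $t_i$, $t_i^*$, $\lambda$) follow that paper and are assumed to match the slide.

```latex
L(\{p_i\}, \{t_i\}) =
  \frac{1}{N_{cls}} \sum_i L_{cls}(p_i, p_i^*)
  + \lambda \, \frac{1}{N_{reg}} \sum_i p_i^* \, L_{reg}(t_i, t_i^*)
```

The factor $p_i^*$ switches the regression term off for negative anchors, and with $N_{cls} \approx 256$ and $N_{reg} \approx 2400$, $\lambda = 10$ brings the two normalised terms to a similar magnitude.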
  • 31. Region Proposal Network: Positive / Negative Samples. An anchor is labeled as positive if: (a) the anchor is the one with highest IoU overlap with a ground-truth box, or (b) the anchor has an IoU overlap with a ground-truth box higher than 0.7. Negative labels are assigned to anchors with IoU lower than 0.3 for all ground-truth boxes. 50%/50% ratio of positive/negative anchors in a minibatch.
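The anchor labelling rules can be sketched on top of a precomputed IoU matrix. The function name, the matrix layout and the label encoding (1 positive, 0 negative, -1 ignored) are assumptions made for this example.

```python
def label_anchors(iou_matrix, hi=0.7, lo=0.3):
    """Label anchors from iou_matrix[a][g] = IoU(anchor a, ground-truth box g).

    Returns one label per anchor: 1 = positive, 0 = negative, -1 = ignored.
    """
    n_anchors = len(iou_matrix)
    n_gt = len(iou_matrix[0]) if n_anchors else 0
    labels = [-1] * n_anchors
    for a, row in enumerate(iou_matrix):
        best = max(row, default=0.0)
        if best > hi:        # rule (b): IoU above 0.7 with some ground-truth box
            labels[a] = 1
        elif best < lo:      # IoU below 0.3 for all ground-truth boxes
            labels[a] = 0
    # Rule (a): the best-overlapping anchor of each ground-truth box is positive,
    # even when its IoU is below the 0.7 threshold.
    for g in range(n_gt):
        best_a = max(range(n_anchors), key=lambda a: iou_matrix[a][g])
        if iou_matrix[best_a][g] > 0:
            labels[best_a] = 1
    return labels


if __name__ == "__main__":
    ious = [[0.8, 0.0], [0.5, 0.1], [0.2, 0.25], [0.0, 0.0]]
    print(label_anchors(ious))
```

Rule (a) guarantees that every ground-truth box gets at least one positive anchor, which matters for objects that no anchor overlaps by more than 0.7.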
  • 32. Faster R-CNN: Training. RoI Pooling is not differentiable w.r.t. box coordinates. Solutions: ● Alternate training ● Ignore gradient of classification branch w.r.t. proposal coordinates ● Make pooling function differentiable. Ren et al. Faster R-CNN: Towards real-time object detection with region proposal networks. NIPS 2015
  • 33. Faster R-CNN. Slide Credit: CS231n. Ren et al. Faster R-CNN: Towards real-time object detection with region proposal networks. NIPS 2015
                                               R-CNN         Fast R-CNN    Faster R-CNN
      Test time per image (with proposals)     50 seconds    2 seconds     0.2 seconds
      (Speedup)                                1x            25x           250x
      mAP (VOC 2007)                           66.0          66.9          66.9
  • 34. Faster R-CNN. ● Faster R-CNN is the basis of the winners of COCO and ILSVRC 2015 object detection competitions. He et al. Deep residual learning for image recognition. CVPR 2016
  • 35. YOLO: You Only Look Once. Proposal-free object detection pipeline. Redmon et al. You Only Look Once: Unified, Real-Time Object Detection. CVPR 2016
  • 36. YOLO: You Only Look Once. Redmon et al. You Only Look Once: Unified, Real-Time Object Detection. CVPR 2016
  • 37. YOLO: You Only Look Once. Each cell predicts: for each bounding box, 4 coordinates (x, y, w, h) and 1 confidence value; plus some number of class probabilities. For Pascal VOC: 7x7 grid, 2 bounding boxes / cell, 20 classes. 7 x 7 x (2 x 5 + 20) = 7 x 7 x 30 tensor = 1470 outputs.
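The output-size arithmetic on this slide generalises to any grid size S, number of boxes per cell B and number of classes C. A small sketch, where the function name is made up for illustration:

```python
def yolo_output_shape(S=7, B=2, C=20):
    """Shape and size of the YOLO output tensor.

    Each cell predicts B boxes with 5 values each (x, y, w, h, confidence)
    plus C class probabilities shared by the cell.
    """
    depth = B * 5 + C
    return (S, S, depth), S * S * depth


if __name__ == "__main__":
    shape, n_outputs = yolo_output_shape()  # Pascal VOC settings from the slide
    print(shape, n_outputs)
```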
  • 38. YOLO: Training. For training, each ground truth bounding box is matched into the right cell. Slide credit: YOLO Presentation @ CVPR 2016
  • 39. YOLO: Training. For training, each ground truth bounding box is matched into the right cell. Slide credit: YOLO Presentation @ CVPR 2016
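One way to read "matched into the right cell" is that a ground-truth box belongs to the grid cell containing its centre. The sketch below assumes that rule, corner-format boxes in pixels, and a hypothetical function name.

```python
def responsible_cell(box, img_w, img_h, S=7):
    """Return (row, col) of the S x S grid cell containing the centre of `box`.

    box: (x0, y0, x1, y1) in pixels. The centre-based rule is an assumption
    about what the slide means by "the right cell".
    """
    x0, y0, x1, y1 = box
    cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
    # Clamp so a centre lying exactly on the right or bottom edge stays in the grid.
    col = min(int(cx / img_w * S), S - 1)
    row = min(int(cy / img_h * S), S - 1)
    return row, col


if __name__ == "__main__":
    print(responsible_cell((100, 200, 300, 400), 448, 448))
```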
  • 40. YOLO: Training. Optimize class prediction in that cell: dog: 1, cat: 0, bike: 0, ... Slide credit: YOLO Presentation @ CVPR 2016
  • 41. YOLO: Training. Predicted boxes for this cell. Slide credit: YOLO Presentation @ CVPR 2016
  • 42. YOLO: Training. Find the best one w.r.t. the ground truth bounding box and optimize it (i.e. adjust its coordinates to be closer to the ground truth’s coordinates). Slide credit: YOLO Presentation @ CVPR 2016
  • 43. YOLO: Training. Increase the matched box’s confidence, decrease the non-matched boxes’ confidence. Slide credit: YOLO Presentation @ CVPR 2016
  • 44. YOLO: Training. Increase the matched box’s confidence, decrease the non-matched boxes’ confidence. Slide credit: YOLO Presentation @ CVPR 2016
  • 45. YOLO: Training. For cells with no ground truth detections, confidences of all predicted boxes are decreased. Slide credit: YOLO Presentation @ CVPR 2016
  • 46. YOLO: Training. For cells with no ground truth detections: ● Confidences of all predicted boxes are decreased ● Class probabilities are not adjusted. Slide credit: YOLO Presentation @ CVPR 2016
  • 47. YOLO: Training, formally. The loss has three parts: bounding box coordinate regression, bounding box score prediction, and class score prediction. $\mathbb{1}_{ij}^{obj}$ = 1 if box j and cell i are matched together, 0 otherwise. $\mathbb{1}_{ij}^{noobj}$ = 1 if box j and cell i are NOT matched together. $\mathbb{1}_{i}^{obj}$ = 1 if cell i has an object present. Slide credit: YOLO Presentation @ CVPR 2016
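The equation image itself was lost in extraction. For reference, the loss as published in Redmon et al. (CVPR 2016) is reproduced below, on the assumption that the slide showed the same formula; $S^2$ is the number of grid cells, $B$ the number of boxes per cell, hatted symbols are the targets, and $\lambda_{coord}$, $\lambda_{noobj}$ are the paper's weighting constants.

```latex
\begin{aligned}
\mathcal{L} ={}& \lambda_{coord} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{obj}
      \left[ (x_i - \hat{x}_i)^2 + (y_i - \hat{y}_i)^2 \right] \\
 &+ \lambda_{coord} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{obj}
      \left[ \left(\sqrt{w_i} - \sqrt{\hat{w}_i}\right)^2
           + \left(\sqrt{h_i} - \sqrt{\hat{h}_i}\right)^2 \right] \\
 &+ \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{obj} \left( C_i - \hat{C}_i \right)^2
  + \lambda_{noobj} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{noobj}
      \left( C_i - \hat{C}_i \right)^2 \\
 &+ \sum_{i=0}^{S^2} \mathbb{1}_{i}^{obj} \sum_{c \in \text{classes}}
      \left( p_i(c) - \hat{p}_i(c) \right)^2
\end{aligned}
```

The first two lines are the coordinate regression, the third line is the box score (confidence) prediction for matched and non-matched boxes, and the last line is the class score prediction.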
  • 48. YOLO: You Only Look Once. Predict class probability for each cell, conditioned on an object being present, e.g. P(car | object). Classes shown: Dog, Bicycle, Car, Dining Table. Redmon et al. You Only Look Once: Unified, Real-Time Object Detection. CVPR 2016
  • 49. YOLO: You Only Look Once. + NMS + Score threshold. Redmon et al. You Only Look Once: Unified, Real-Time Object Detection. CVPR 2016
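A sketch of how the per-cell conditional class probabilities and per-box confidences might be combined before the score threshold: each box's class score is taken as P(class | object) times the box confidence. The function name, data layout and the 0.2 threshold are assumptions for illustration.

```python
def cell_detections(class_probs, boxes, score_threshold=0.2):
    """Turn one cell's predictions into scored detections.

    class_probs: dict mapping class name -> P(class | object) for this cell.
    boxes: list of (box, confidence) pairs predicted by this cell.
    Returns (class name, score, box) tuples, highest score first.
    """
    detections = []
    for box, confidence in boxes:
        for name, p in class_probs.items():
            score = p * confidence  # class-specific confidence for this box
            if score >= score_threshold:
                detections.append((name, score, box))
    return sorted(detections, key=lambda d: d[1], reverse=True)


if __name__ == "__main__":
    probs = {"dog": 0.9, "car": 0.1}
    preds = [((0, 0, 1, 1), 0.8), ((0, 0, 2, 2), 0.1)]
    print(cell_detections(probs, preds))
```

The surviving detections from all cells would then go through NMS.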
  • 50. SSD: Single Shot MultiBox Detector. Same idea as YOLO, plus several predictors at different stages in the network. Liu et al. SSD: Single Shot MultiBox Detector. ECCV 2016
  • 51. SSD: Single Shot MultiBox Detector. Similarly to Faster R-CNN, it uses box anchors to predict box coordinates as displacements. Liu et al. SSD: Single Shot MultiBox Detector. ECCV 2016
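Predicting "box coordinates as displacements" can be sketched with the anchor parameterisation used in Faster R-CNN: centre offsets scaled by the anchor size, and log-space width and height factors. Whether a given SSD implementation uses exactly this form (some add extra scaling constants) is an assumption here.

```python
import math


def decode_box(anchor, deltas):
    """Apply predicted displacements to an anchor.

    anchor: (cx, cy, w, h) of the anchor box.
    deltas: (tx, ty, tw, th) predicted by the network.
    Returns the decoded box as (cx, cy, w, h).
    """
    cx_a, cy_a, w_a, h_a = anchor
    tx, ty, tw, th = deltas
    return (cx_a + tx * w_a,       # shift the centre by a fraction of the anchor size
            cy_a + ty * h_a,
            w_a * math.exp(tw),    # scale width and height multiplicatively
            h_a * math.exp(th))


if __name__ == "__main__":
    print(decode_box((50, 50, 20, 40), (0.5, -0.25, 0.0, 0.0)))
```

All-zero displacements return the anchor itself, so the network only has to learn small corrections.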
  • 52. YOLOv2. Redmon & Farhadi. YOLO9000: Better, Faster, Stronger. CVPR 2017
  • 54. YOLOv2. Results on COCO test-dev 2015
  • 55. Summary. Proposal-based methods: ● R-CNN ● Fast R-CNN ● Faster R-CNN ● SPPnet ● R-FCN. Proposal-free methods: ● YOLO, YOLOv2 ● SSD
  • 57. A note on NMS. Tradeoff between recall/precision. [Figure: overlapping boxes with objectness scores 0.8, 0.6, 0.7]
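A minimal sketch of greedy non-maximum suppression makes the tradeoff concrete: a low IoU threshold removes more duplicates (better precision) but can also delete a genuinely different, heavily overlapping object (worse recall). Box format and the default threshold are assumptions.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x0, y0, x1, y1)."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def nms(detections, iou_threshold=0.5):
    """Greedy NMS over (box, score) pairs; returns kept pairs, best first."""
    remaining = sorted(detections, key=lambda d: d[1], reverse=True)
    kept = []
    while remaining:
        best = remaining.pop(0)
        kept.append(best)
        # Drop every remaining box that overlaps the kept one too much.
        remaining = [d for d in remaining
                     if iou(best[0], d[0]) <= iou_threshold]
    return kept


if __name__ == "__main__":
    dets = [((0, 0, 10, 10), 0.8), ((1, 1, 11, 11), 0.7), ((20, 20, 30, 30), 0.6)]
    print(nms(dets, iou_threshold=0.5))
    print(nms(dets, iou_threshold=0.7))
```

In the usage example the second box overlaps the first with an IoU of about 0.68, so it is suppressed at a threshold of 0.5 but survives at 0.7.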
  • 58. Soft NMS. Decay detection scores of contiguous objects instead of setting them to 0. Bodla, Singh et al. Improving Object Detection With One Line of Code. arXiv Apr 2017
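The decay idea can be sketched with a Gaussian penalty, one of the variants described by Bodla et al.: instead of deleting a box that overlaps the current best one, its score is multiplied by exp(-IoU² / sigma). The values `sigma = 0.5` and the pruning threshold `0.001` are assumed defaults for this example.

```python
import math


def iou(a, b):
    """Intersection over union of two boxes given as (x0, y0, x1, y1)."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def soft_nms(detections, sigma=0.5, score_threshold=0.001):
    """Gaussian Soft-NMS over (box, score) pairs; returns pairs with decayed scores."""
    remaining = list(detections)
    kept = []
    while remaining:
        # Pick the current highest-scoring box.
        i = max(range(len(remaining)), key=lambda k: remaining[k][1])
        best = remaining.pop(i)
        kept.append(best)
        # Decay the scores of overlapping boxes instead of removing them.
        remaining = [(b, s * math.exp(-iou(best[0], b) ** 2 / sigma))
                     for b, s in remaining]
        # Prune boxes whose score has become negligible.
        remaining = [(b, s) for b, s in remaining if s >= score_threshold]
    return kept


if __name__ == "__main__":
    dets = [((0, 0, 10, 10), 0.8), ((1, 1, 11, 11), 0.7), ((20, 20, 30, 30), 0.6)]
    for box, score in soft_nms(dets):
        print(box, round(score, 3))
```

The overlapping second box is kept with a reduced score rather than dropped, so a later score threshold decides its fate.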
  • 59. Avoid NMS: Sequential box prediction. Predict boxes one after the other and learn when to stop. Stewart et al. End-To-End People Detection in Crowded Scenes. CVPR 2016