Dlcv2017d2l4objectdetection 170622143747
Day 2 Lecture 4
Object Detection
Amaia Salvador
PhD Candidate
Universitat Politècnica de Catalunya
[course site]
Object Detection: Datasets
Object Detection as Classification
Classify the window at each position in turn:
Cat? NO, Dog? NO, Duck? NO
Cat? NO, Dog? NO, Duck? NO
Cat? YES, Dog? NO, Duck? NO
Cat? NO, Dog? NO, Duck? NO
Object Detection as Classification
Problem: too many positions & scales to test.
Convnets are computationally demanding, so we can't test all positions & scales!
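To get a feel for the numbers, here is a minimal sketch that counts how many crops a plain sliding window would send through the convnet. The 800 x 600 image size matches the example used later in these slides; the window sizes, the stride and the helper name `count_windows` are assumptions chosen only for illustration.

```python
def count_windows(image_w, image_h, window_sizes, stride):
    """Count square sliding-window crops over all positions and sizes."""
    total = 0
    for size in window_sizes:
        if size > image_w or size > image_h:
            continue  # this window does not fit in the image at all
        n_x = (image_w - size) // stride + 1  # horizontal positions
        n_y = (image_h - size) // stride + 1  # vertical positions
        total += n_x * n_y
    return total


# Hypothetical setting: three square window sizes, stride of 8 pixels.
n_crops = count_windows(800, 600, window_sizes=[64, 128, 256], stride=8)
print(n_crops)  # each crop would need its own forward pass
```

Even this coarse setting (three scales, no aspect ratios) gives on the order of ten thousand crops per image, which is why an exhaustive search is not practical with a convnet classifier.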
[SS] Uijlings et al. Selective search for object recognition. IJCV 2013
Girshick et al. Rich feature hierarchies for accurate object detection and semantic segmentation. CVPR 2014
R-CNN
We expect: We get:
R-CNN: Problems
R-CNN Problem #1: Slow at test time: a full forward pass of the CNN is needed for each region proposal
Solution: Share the computation of the convolutional layers between the region proposals of an image
Hi-res input image: 3 x 800 x 600, with region proposal
-> Convolution and Pooling ->
Hi-res conv features: C x H x W, with region proposal
-> Max-pool within each grid cell ->
RoI conv features: C x h x w, for region proposal
-> Fully-connected layers
(Fully-connected layers expect low-res conv features: C x h x w)
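The "max-pool within each grid cell" step can be sketched in NumPy as follows. This is a simplified illustration, not the Fast R-CNN implementation: the function name `roi_max_pool`, the end-exclusive `(x0, y0, x1, y1)` convention and the way bin edges are rounded are all assumptions, and the region is assumed to be already expressed in feature-map cells.

```python
import numpy as np


def roi_max_pool(features, roi, out_h, out_w):
    """Pool a region of a (C, H, W) feature map to a fixed (C, out_h, out_w).

    roi is (x0, y0, x1, y1) in feature-map cells, end-exclusive, and must
    cover at least one cell.
    """
    x0, y0, x1, y1 = roi
    region = features[:, y0:y1, x0:x1]
    n_channels, region_h, region_w = region.shape
    # Integer bin edges that split the region into an out_h x out_w grid.
    ys = np.linspace(0, region_h, out_h + 1).astype(int)
    xs = np.linspace(0, region_w, out_w + 1).astype(int)
    out = np.empty((n_channels, out_h, out_w), dtype=features.dtype)
    for i in range(out_h):
        for j in range(out_w):
            # Force every bin to contain at least one cell.
            y_lo, y_hi = ys[i], max(ys[i + 1], ys[i] + 1)
            x_lo, x_hi = xs[j], max(xs[j + 1], xs[j] + 1)
            out[:, i, j] = region[:, y_lo:y_hi, x_lo:x_hi].max(axis=(1, 2))
    return out


# Toy feature map with C=2, H=6, W=8 and one region covering 4 x 4 cells.
features = np.arange(2 * 6 * 8, dtype=float).reshape(2, 6, 8)
pooled = roi_max_pool(features, roi=(2, 1, 6, 5), out_h=2, out_w=2)
print(pooled.shape)
```

Whatever the size of the region, the output has the fixed C x h x w shape that the fully-connected layers expect, so the convolutional features are computed once per image and reused for every proposal.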
R-CNN Problems #2 and #3: The SVMs and box regressors are trained post hoc (the CNN is not updated in response to them), and the training pipeline is complex.
[Table: speedup comparison]
Speedup: 1x vs. 146x
Speedup: 1x vs. 25x
[Figure: Conv layers -> Conv5_3 -> Region Proposal Network -> RPN Proposals; Conv5_3 + RPN Proposals -> RoI Pooling -> FC6 -> FC7 -> FC8 -> Class probabilities]
Ren et al. Faster R-CNN: Towards real-time object detection with region proposal networks. NIPS 2015
Faster R-CNN
Learn proposals end-to-end sharing parameters with the classification network
[Figure: Conv layers -> Conv5_3 -> Region Proposal Network -> RPN Proposals; Conv5_3 + RPN Proposals -> RoI Pooling -> FC6 -> FC7 -> FC8 -> Class probabilities, with a "Fast R-CNN" label]
Region Proposal Network
[Figure: Conv5_3 -> Region Proposal Network -> RPN Proposals; Conv5_3 + RPN Proposals -> RoI Pooling -> FC6 -> FC7 -> FC8 -> Class probabilities]
Faster R-CNN
Slide Credit: CS231n
YOLO: You Only Look Once
Redmon et al. You Only Look Once: Unified, Real-Time Object Detection. CVPR 2016
YOLO: You Only Look Once
- 7x7 grid
- 2 bounding boxes / cell
- 20 classes
Each cell predicts 2 boxes x 5 values (x, y, w, h, confidence) + 20 class probabilities:
7 x 7 x (2 x 5 + 20) = 7 x 7 x 30 tensor = 1470 outputs
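The output size above can be checked with a short NumPy sketch that reshapes a flat output vector into per-cell boxes and class scores. The grid size, boxes per cell and number of classes come from the slide; the memory layout (box values first, then class scores) and the helper name `split_yolo_output` are assumptions made for illustration and may not match the ordering used by the official darknet code.

```python
import numpy as np

S, B, C = 7, 2, 20      # grid size, boxes per cell, classes (from the slide)
DEPTH = B * 5 + C       # 5 values per box: x, y, w, h, confidence


def split_yolo_output(flat):
    """Reshape S*S*DEPTH outputs into per-cell boxes and class scores."""
    tensor = np.asarray(flat).reshape(S, S, DEPTH)
    # Assumed layout: the B*5 box values come first in each cell...
    boxes = tensor[..., :B * 5].reshape(S, S, B, 5)
    # ...followed by the C class scores shared by the boxes of that cell.
    class_probs = tensor[..., B * 5:]
    return boxes, class_probs


flat = np.zeros(S * S * DEPTH)  # stand-in for a network output
boxes, class_probs = split_yolo_output(flat)
print(flat.size, boxes.shape, class_probs.shape)
```

Note that the class scores are predicted once per cell rather than once per box, which is why the depth is 2 x 5 + 20 and not 2 x (5 + 20).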
YOLO: You Only Look Once
Predict class probabilities for each cell
[Figure: class map over the grid, with cells labeled Bicycle, Car, Dog, Dining Table]
YOLO: You Only Look Once
+ NMS (non-maximum suppression)
+ Score threshold
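The two post-processing steps can be sketched as below. This is a generic greedy NMS in NumPy, not code from the YOLO release: the function names, the `(x0, y0, x1, y1)` box format and both threshold values are assumptions. The sketch applies the score threshold first and NMS second, and it ignores class labels, whereas detectors usually run NMS per class.

```python
import numpy as np


def iou(box, boxes):
    """IoU between one box and an (N, 4) array, all as (x0, y0, x1, y1)."""
    x0 = np.maximum(box[0], boxes[:, 0])
    y0 = np.maximum(box[1], boxes[:, 1])
    x1 = np.minimum(box[2], boxes[:, 2])
    y1 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x1 - x0, 0, None) * np.clip(y1 - y0, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)  # assumes boxes with positive area


def filter_detections(boxes, scores, score_thresh=0.5, iou_thresh=0.5):
    """Return indices of the boxes kept after score threshold + greedy NMS."""
    boxes = np.asarray(boxes, dtype=float)
    scores = np.asarray(scores, dtype=float)
    candidates = np.flatnonzero(scores >= score_thresh)
    # Visit the surviving boxes from highest to lowest score.
    order = candidates[np.argsort(-scores[candidates])]
    kept = []
    while order.size > 0:
        best, rest = order[0], order[1:]
        kept.append(int(best))
        if rest.size == 0:
            break
        # Drop every remaining box that overlaps the chosen one too much.
        order = rest[iou(boxes[best], boxes[rest]) <= iou_thresh]
    return kept


example_boxes = [[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30], [0, 0, 10, 10]]
example_scores = [0.9, 0.8, 0.7, 0.2]
print(filter_detections(example_boxes, example_scores))
```

In the example, the second box overlaps the first with an IoU of about 0.68 and is suppressed, the third box is kept because it does not overlap, and the fourth is dropped by the score threshold.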
SSD: Single Shot MultiBox Detector
Same idea as YOLO, plus several predictors attached at different stages of the network
Proposal-based methods
● R-CNN
● Fast R-CNN
● Faster R-CNN
● SPPnet
● R-FCN
Proposal-free methods
● YOLO, YOLOv2
● SSD
Resources
● Official implementations:
○ Faster R-CNN [caffe]
○ YOLOv2 [darknet]
○ SSD [caffe]
○ R-FCN [caffe][MxNet]
[Figure: YOLO training loss with annotated terms]
Bounding box coordinate regression
Class score prediction
Indicator = 1 if cell i has an object present
Slide credit: YOLO Presentation @ CVPR 2016
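For reference, the annotations above label terms of the sum-of-squares loss given in Redmon et al. (CVPR 2016), written out below. The notation follows the paper and is not recoverable from the slide text itself: S x S is the grid, B is the number of boxes per cell, C_i is the box confidence, p_i(c) is the class probability, and the paper sets lambda_coord = 5 and lambda_noobj = 0.5.

```latex
\begin{aligned}
\mathcal{L} ={}& \lambda_{\text{coord}} \sum_{i=0}^{S^2} \sum_{j=0}^{B}
    \mathbb{1}_{ij}^{\text{obj}}
    \left[ (x_i - \hat{x}_i)^2 + (y_i - \hat{y}_i)^2 \right] \\
&+ \lambda_{\text{coord}} \sum_{i=0}^{S^2} \sum_{j=0}^{B}
    \mathbb{1}_{ij}^{\text{obj}}
    \left[ \left(\sqrt{w_i} - \sqrt{\hat{w}_i}\right)^2
         + \left(\sqrt{h_i} - \sqrt{\hat{h}_i}\right)^2 \right] \\
&+ \sum_{i=0}^{S^2} \sum_{j=0}^{B}
    \mathbb{1}_{ij}^{\text{obj}} \left( C_i - \hat{C}_i \right)^2 \\
&+ \lambda_{\text{noobj}} \sum_{i=0}^{S^2} \sum_{j=0}^{B}
    \mathbb{1}_{ij}^{\text{noobj}} \left( C_i - \hat{C}_i \right)^2 \\
&+ \sum_{i=0}^{S^2} \mathbb{1}_{i}^{\text{obj}}
    \sum_{c \,\in\, \text{classes}} \left( p_i(c) - \hat{p}_i(c) \right)^2
\end{aligned}
```

The first two lines are the bounding box coordinate regression, the last line is the class score prediction, and the indicator with a single index i equals 1 if cell i has an object present. The indicator with indices ij selects the j-th box predictor in cell i that is responsible for that object.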