Dlcv2017d2l4objectdetection 170622143747

This document summarizes an object detection lecture. It discusses object detection tasks like assigning labels and bounding boxes to objects in images. Popular object detection datasets with varying numbers of categories and images are presented. Object detection is described as classifying objects at different positions and scales, which is computationally expensive using convolutional networks. Region proposal methods are introduced to select potential object regions efficiently before classification. The R-CNN, Fast R-CNN, and Faster R-CNN models are summarized, which improved object detection speed and performance by sharing convolutional features between proposals and learning region proposals end-to-end.

Uploaded by

SriramGudimella

#DLUPC

Day 2 Lecture 4
Object Detection

Amaia Salvador

PhD Candidate
Universitat Politècnica de Catalunya

[course site]
Object Detection

The task of assigning a label and a bounding box to all objects in the image

CAT, DOG, DUCK

Object Detection: Datasets

Categories           20      80                  200
Training images      6k      200k                456k
Validation images    6k      60k (val + test)    60k (validation + test)
Test images          10k     (see above)         (see above)

Object Detection as Classification

Classes = [cat, dog, duck]

Cat ? NO

Dog ? NO

Duck? NO

Object Detection as Classification

Classes = [cat, dog, duck]

Cat ? YES

Dog ? NO

Duck? NO

Object Detection as Classification

Problem:
Too many positions & scales to test

Solution: If your classifier is fast enough, go for it
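
To get a feel for "too many positions & scales", the sketch below counts the crops a sliding-window classifier would have to score. The image size, window sizes, and stride are hypothetical values chosen for illustration, and `count_windows` is not from the lecture.

```python
def count_windows(img_w, img_h, win_sizes, stride):
    """Count the sliding-window crops a classifier would have to score."""
    total = 0
    for win_w, win_h in win_sizes:
        if win_w > img_w or win_h > img_h:
            continue  # this window size does not fit in the image
        n_x = (img_w - win_w) // stride + 1  # horizontal positions
        n_y = (img_h - win_h) // stride + 1  # vertical positions
        total += n_x * n_y
    return total


# Hypothetical setup: 800x600 image, square windows at 4 scales, stride of 8 px.
sizes = [(64, 64), (128, 128), (256, 256), (512, 512)]
n = count_windows(800, 600, sizes, stride=8)
print(n, "windows, each scored against every class")
```

Even this coarse grid gives thousands of crops per image, and adding aspect ratios or a finer stride multiplies the count further.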


Object Detection with ConvNets?

ConvNets are computationally demanding. We can’t test all positions & scales!

Solution: Look at a tiny subset of positions. Choose them wisely :)

Region Proposals

● Find “blobby” image regions that are likely to contain objects
● “Class-agnostic” object detector
● Look for “blob-like” regions

Slide Credit: CS231n


Region Proposals

Selective Search (SS) Multiscale Combinatorial Grouping (MCG)

[SS] Uijlings et al. Selective search for object recognition. IJCV 2013

[MCG] Arbeláez, Pont-Tuset et al. Multiscale combinatorial grouping. CVPR 2014


Object Detection with Convnets: R-CNN

Girshick et al. Rich feature hierarchies for accurate object detection and semantic segmentation. CVPR 2014

R-CNN

1. Train network on proposals

2. Post-hoc training of SVMs & Box regressors on fc7 features

R-CNN

[Figure: two panels, “We expect:” and “We get:”]

R-CNN

1. Train network on proposals

2. Post-hoc training of SVMs & Box regressors on fc7 features

3. Non Maximum Suppression + score threshold
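
Step 3 can be sketched as follows. This is a minimal greedy non-maximum suppression in plain Python, not the R-CNN authors' code. The box format `(x1, y1, x2, y2)`, the function names, and both threshold values are assumptions made for illustration. In practice the procedure is usually run separately for each class.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, score_thresh=0.5, iou_thresh=0.5):
    """Return indices of kept boxes, highest score first."""
    # Score threshold: drop low-confidence detections up front.
    order = sorted(
        (i for i, s in enumerate(scores) if s >= score_thresh),
        key=lambda i: scores[i],
        reverse=True,
    )
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with a better one.
        if all(iou(boxes[i], boxes[k]) <= iou_thresh for k in keep):
            keep.append(i)
    return keep


boxes = [(10, 10, 110, 110), (12, 12, 112, 112), (200, 200, 300, 300), (0, 0, 50, 50)]
scores = [0.9, 0.8, 0.7, 0.2]
kept = nms(boxes, scores)
print(kept)
```

Here the second box is suppressed because it overlaps heavily with the higher-scoring first box, and the last box falls below the score threshold.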

R-CNN: Problems

1. Slow at test-time: need to run full forward pass of CNN for each region proposal

2. SVMs and regressors are post-hoc: CNN features not updated in response to SVMs and regressors

3. Complex multistage training pipeline

Slide Credit: CS231n


Fast R-CNN

R-CNN Problem #1: Slow at test-time: need to run full forward pass of CNN for each region proposal

Solution: Share computation of convolutional layers between region proposals for an image

Girshick. Fast R-CNN. ICCV 2015


Fast R-CNN: Sharing features

1. Hi-res input image: 3 x 800 x 600, with region proposal
2. Convolution and Pooling
3. Hi-res conv features: C x H x W, with region proposal
4. Max-pool within each grid cell
5. RoI conv features: C x h x w, for region proposal
6. Fully-connected layers (expect low-res conv features: C x h x w)

Slide Credit: CS231n
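
The "max-pool within each grid cell" step can be sketched for a single channel as below. This is an illustrative plain-Python version, not the Fast R-CNN implementation. The function name, the RoI convention (feature-map coordinates, end-exclusive), and the integer binning are assumptions.

```python
def roi_max_pool(feature, roi, out_h, out_w):
    """Max-pool one channel of a feature map inside roi into an out_h x out_w grid.

    feature: 2D list (H x W).
    roi: (x1, y1, x2, y2) in feature-map coordinates, end-exclusive.
    """
    x1, y1, x2, y2 = roi
    roi_h, roi_w = y2 - y1, x2 - x1
    pooled = []
    for i in range(out_h):
        # Row span of grid cell i (always at least one row).
        r0 = y1 + (i * roi_h) // out_h
        r1 = max(r0 + 1, y1 + ((i + 1) * roi_h) // out_h)
        row = []
        for j in range(out_w):
            # Column span of grid cell j (always at least one column).
            c0 = x1 + (j * roi_w) // out_w
            c1 = max(c0 + 1, x1 + ((j + 1) * roi_w) // out_w)
            row.append(max(feature[r][c]
                           for r in range(r0, r1)
                           for c in range(c0, c1)))
        pooled.append(row)
    return pooled


# Toy 4 x 6 feature map whose value encodes its position.
feature = [[r * 6 + c for c in range(6)] for r in range(4)]
pooled = roi_max_pool(feature, roi=(1, 0, 5, 4), out_h=2, out_w=2)
print(pooled)
```

Whatever the size of the region proposal, the output is always `out_h x out_w`, which is what lets proposals of different sizes feed the same fully-connected layers.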


Fast R-CNN

R-CNN Problems #2 & #3: SVMs and regressors are post-hoc. Complex training.

Solution: Train it all together, end-to-end (E2E)



Fast R-CNN

                        R-CNN         Fast R-CNN
Training time           84 hours      9.5 hours       Faster!
  (Speedup)             1x            8.8x
Test time per image     47 seconds    0.32 seconds    FASTER!
  (Speedup)             1x            146x
mAP (VOC 2007)          66.0          66.9            Better!

Using VGG-16 CNN on Pascal VOC 2007 dataset

Slide Credit: CS231n


Fast R-CNN: Problem

Test-time speeds don’t include region proposals

                                             R-CNN         Fast R-CNN
Test time per image                          47 seconds    0.32 seconds
  (Speedup)                                  1x            146x
Test time per image with Selective Search    50 seconds    2 seconds
  (Speedup)                                  1x            25x

Slide Credit: CS231n


Faster R-CNN
Learn proposals end-to-end sharing parameters with the classification network

[Figure: Conv layers (up to Conv5_3) -> Region Proposal Network -> RPN proposals -> RoI Pooling -> FC6 -> FC7 -> FC8 -> class probabilities. The stage that consumes the RPN proposals (RoI Pooling and the fully-connected layers) is labeled Fast R-CNN.]

Ren et al. Faster R-CNN: Towards real-time object detection with region proposal networks. NIPS 2015
Region Proposal Network

Bounding Box Regression


Objectness scores
(object/no object)

In practice, k = 9 (3 different scales and 3 aspect ratios)
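
A minimal sketch of how k = 9 anchors could be generated at one position. The scale and ratio values (128, 256, 512 and 1:2, 1:1, 2:1) are taken from the Faster R-CNN paper rather than from this slide, and `make_anchors` is a hypothetical helper, not the authors' code.

```python
import math


def make_anchors(cx, cy, scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Return len(scales) * len(ratios) boxes (x1, y1, x2, y2) centred at (cx, cy).

    Each anchor has area scale**2 and height / width equal to ratio.
    """
    anchors = []
    for s in scales:
        for r in ratios:
            w = s / math.sqrt(r)
            h = s * math.sqrt(r)
            anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors


anchors = make_anchors(400, 300)
k = len(anchors)
# Per position, the RPN predicts 2 objectness scores and 4 box offsets per anchor.
print(k, "anchors:", 2 * k, "objectness scores and", 4 * k, "box offsets per position")
```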


Faster R-CNN: Training
RoI Pooling is not differentiable w.r.t box coordinates. Solutions:
● Alternate training
● Ignore gradient of classification branch w.r.t proposal coordinates
● Make pooling function differentiable (spoiler D3L6)

Faster R-CNN

                                        R-CNN         Fast R-CNN    Faster R-CNN
Test time per image (with proposals)    50 seconds    2 seconds     0.2 seconds
  (Speedup)                             1x            25x           250x
mAP (VOC 2007)                          66.0          66.9          66.9

Slide Credit: CS231n
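
The speedup row is the R-CNN test time divided by each method's test time. A quick check of that arithmetic, using only the per-image times quoted on the slide:

```python
# Test time per image in seconds, including region proposals (from the slide).
times = {"R-CNN": 50.0, "Fast R-CNN": 2.0, "Faster R-CNN": 0.2}

baseline = times["R-CNN"]
speedups = {name: baseline / t for name, t in times.items()}

for name, s in speedups.items():
    print(f"{name}: {s:.0f}x")
```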
Faster R-CNN

● Faster R-CNN is the basis of the winners of the COCO and ILSVRC 2015 & 2016 object detection competitions.

He et al. Deep residual learning for image recognition. CVPR 2016

YOLO: You Only Look Once

Proposal-free object detection pipeline

Redmon et al. You Only Look Once: Unified, Real-Time Object Detection, CVPR 2016
YOLO: You Only Look Once
Each cell predicts:

- For each bounding box:
  - 4 coordinates (x, y, w, h)
  - 1 confidence value
- Some number of class probabilities

For Pascal VOC:

- 7x7 grid
- 2 bounding boxes / cell
- 20 classes

7 x 7 x (2 x 5 + 20) = 7 x 7 x 30 tensor = 1470 outputs
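
The output-size arithmetic can be checked with a few lines of Python. The `split_cell` helper and the memory layout it assumes (all boxes first, then the class probabilities) are hypothetical and only illustrate how one cell's 30 values break down.

```python
S, B, C = 7, 2, 20       # grid size, boxes per cell, classes (Pascal VOC)
per_cell = B * 5 + C     # 5 values per box: x, y, w, h, confidence
total = S * S * per_cell
print(f"{S} x {S} x {per_cell} tensor = {total} outputs")


def split_cell(vec, num_boxes=B, num_classes=C):
    """Split one cell's output vector into boxes and class probabilities.

    Assumed layout (hypothetical): [box_0, ..., box_{B-1}, class probabilities].
    """
    boxes = [tuple(vec[b * 5:(b + 1) * 5]) for b in range(num_boxes)]
    class_probs = list(vec[num_boxes * 5:num_boxes * 5 + num_classes])
    return boxes, class_probs


cell_boxes, cell_classes = split_cell(list(range(per_cell)))
```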
YOLO: You Only Look Once
Predict class probability for each cell

[Figure: class map over the grid; labels: Bicycle, Car, Dog, Dining Table]
YOLO: You Only Look Once

+ NMS
+ Score threshold

SSD: Single Shot MultiBox Detector
Same idea as YOLO, + several predictors at different stages in the network

Liu et al. SSD: Single Shot MultiBox Detector, ECCV 2016
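
To illustrate what "several predictors at different stages" means for the number of predicted boxes, the sketch below sums the default boxes over the feature maps of the SSD300 configuration. The feature-map sizes and boxes per location are taken from the SSD paper, not from this slide, so treat them as assumptions here.

```python
# (feature-map side, default boxes per location) for SSD300, per the SSD paper.
ssd300_layers = [(38, 4), (19, 6), (10, 6), (5, 6), (3, 4), (1, 4)]

ssd_boxes = sum(side * side * k for side, k in ssd300_layers)

# For comparison: YOLO on Pascal VOC predicts 2 boxes in each cell of a 7 x 7 grid.
yolo_boxes = 7 * 7 * 2

print("SSD300 boxes per image:", ssd_boxes)
print("YOLO boxes per image:", yolo_boxes)
```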


YOLOv2

Redmon & Farhadi. YOLO9000: Better, Faster, Stronger. CVPR 2017

YOLOv2

Results on Pascal VOC 2007

YOLOv2

Results on COCO test-dev 2015


Summary

Proposal-based methods
● R-CNN
● Fast R-CNN
● Faster R-CNN
● SPPnet
● R-FCN
Proposal-free methods
● YOLO, YOLOv2
● SSD
Resources
● Official implementations:
○ Faster R-CNN [caffe]
○ Yolov2 [darknet]
○ SSD [caffe]
○ R-FCN [caffe][MxNet]

● Unofficial ports to other frameworks are likely to exist, e.g. type “yolo tensorflow” in your browser and pick the one you like best.
● Or use the newly released Object Detection API by Google: SSD, R-FCN & Faster R-CNN (code & pretrained models in TensorFlow)

Object detection tutorials (project ideas maybe?):

● Toy object detection (squares, circles, etc.) (keras)
● Object detection (pets dataset) (tensorflow)
Questions?
YOLO: Training

For training, each ground truth bounding box is matched into the right cell

Slide credit: YOLO Presentation @ CVPR 2016
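
A small sketch of this matching step, assuming (as in the YOLO paper) that the "right cell" is the grid cell containing the centre of the ground truth box. The function name, the pixel box format `(x1, y1, x2, y2)`, and the 448 x 448 image size in the example are illustrative choices.

```python
def responsible_cell(box, img_w, img_h, S=7):
    """Return (row, col) of the S x S grid cell containing the centre of box.

    box: (x1, y1, x2, y2) in pixels.
    """
    cx = (box[0] + box[2]) / 2
    cy = (box[1] + box[3]) / 2
    # Clamp so a centre on the right or bottom image edge stays inside the grid.
    col = min(int(cx / img_w * S), S - 1)
    row = min(int(cy / img_h * S), S - 1)
    return row, col


cell = responsible_cell((100, 200, 300, 400), img_w=448, img_h=448)
print(cell)
```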


YOLO: Training

Optimize class prediction in that cell:
dog: 1, cat: 0, bike: 0, ...

Slide credit: YOLO Presentation @ CVPR 2016


YOLO: Training

Predicted boxes for this cell

Slide credit: YOLO Presentation @ CVPR 2016


YOLO: Training

Find the best one w.r.t. the ground truth bounding box and optimize it (i.e. adjust its coordinates to be closer to the ground truth’s coordinates)

Slide credit: YOLO Presentation @ CVPR 2016
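
One way to pick "the best one" is by intersection over union (IoU) with the ground truth box, which is the criterion used in the YOLO paper. The sketch below assumes boxes in `(x1, y1, x2, y2)` format. The helper names are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def best_box(predicted, gt):
    """Return (index, IoU) of the predicted box that best overlaps gt."""
    ious = [iou(p, gt) for p in predicted]
    j = max(range(len(predicted)), key=lambda i: ious[i])
    return j, ious[j]


predicted = [(0, 0, 100, 100), (40, 40, 160, 160)]   # the cell's 2 predicted boxes
gt = (50, 50, 150, 150)                              # ground truth box
j, overlap = best_box(predicted, gt)
print(f"box {j} is matched, IoU = {overlap:.3f}")
```

Only the matched box receives the coordinate regression loss. The other boxes in the cell are treated as non-matched.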


YOLO: Training

Increase the matched box’s confidence, decrease the non-matched boxes’ confidence

Slide credit: YOLO Presentation @ CVPR 2016


YOLO: Training

For cells with no ground truth detections:
● Confidences of all predicted boxes are decreased
● Class probabilities are not adjusted

Slide credit: YOLO Presentation @ CVPR 2016


YOLO: Training, formally
Indicator terms:
● 1_ij^obj = 1 if box j and cell i are matched together, 0 otherwise
● 1_ij^noobj = 1 if box j and cell i are NOT matched together, 0 otherwise
● 1_i^obj = 1 if cell i has an object present, 0 otherwise

Loss terms:
● Bounding box coordinate regression
● Bounding box score prediction
● Class score prediction

Slide credit: YOLO Presentation @ CVPR 2016
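
The equation itself did not survive on this slide. For reference, the loss as given in the YOLO paper (Redmon et al., CVPR 2016) is reproduced below. The symbols follow the paper: S is the grid size, B the number of boxes per cell, C_i the box confidence, p_i(c) the class probabilities, and hats mark the targets. The weights λ_coord = 5 and λ_noobj = 0.5 are the values reported in the paper.

```latex
\begin{aligned}
\mathcal{L} ={}& \lambda_{\text{coord}} \sum_{i=0}^{S^2} \sum_{j=0}^{B}
    \mathbb{1}_{ij}^{\text{obj}}
    \left[ (x_i - \hat{x}_i)^2 + (y_i - \hat{y}_i)^2 \right] \\
 &+ \lambda_{\text{coord}} \sum_{i=0}^{S^2} \sum_{j=0}^{B}
    \mathbb{1}_{ij}^{\text{obj}}
    \left[ \left(\sqrt{w_i} - \sqrt{\hat{w}_i}\right)^2
         + \left(\sqrt{h_i} - \sqrt{\hat{h}_i}\right)^2 \right] \\
 &+ \sum_{i=0}^{S^2} \sum_{j=0}^{B}
    \mathbb{1}_{ij}^{\text{obj}} \left( C_i - \hat{C}_i \right)^2
  + \lambda_{\text{noobj}} \sum_{i=0}^{S^2} \sum_{j=0}^{B}
    \mathbb{1}_{ij}^{\text{noobj}} \left( C_i - \hat{C}_i \right)^2 \\
 &+ \sum_{i=0}^{S^2} \mathbb{1}_{i}^{\text{obj}}
    \sum_{c \,\in\, \text{classes}} \left( p_i(c) - \hat{p}_i(c) \right)^2
\end{aligned}
```

The first two lines are the bounding box coordinate regression, the third line is the bounding box score prediction for matched and non-matched boxes, and the last line is the class score prediction.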
