YOLO
design
loss function
training
weaknesses
Classification vs detection/recognition
Common tasks on images
https://medium.com/@nikasa1889/the-modern-history-of-object-recognition-infographic-aea18517c318
Bounding box proposal
Region of interest, region proposal, box proposal
Ground truth
5 parameters per box
x, y: box center
w, h: box width and height
confidence score: how likely the box contains an object & how accurate the box is
How good is a box: Intersection over Union (IoU) with the ground truth
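A minimal sketch of IoU, to make the measure concrete. It assumes boxes are given as corner coordinates (x_min, y_min, x_max, y_max); the helper `center_to_corners` is a hypothetical name for converting the (x, y, w, h) format listed above.

```python
def center_to_corners(x, y, w, h):
    """Convert a center-format box (x, y, w, h) to (x_min, y_min, x_max, y_max)."""
    return (x - w / 2, y - h / 2, x + w / 2, y + h / 2)


def iou(box_a, box_b):
    """Intersection over Union of two corner-format boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap rectangle (zero if the boxes are disjoint).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0
```

IoU is 1 when the proposal matches the ground truth exactly and 0 when the two boxes do not overlap.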
Outline
YOLO
design
loss function
training
weaknesses
A brief history of object detection
https://stats385.github.io
ImageNet
YOLO: You Only Look Once
Look once
Results
x, y, w, h
confidence score: contains an object & box accuracy
class score: belongs to a class
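A small sketch of how the two scores can be combined. It assumes the rule from the YOLO v1 paper, where the class-specific score of a box is its confidence multiplied by the class probability, and the paper's output size of S x S x (B * 5 + C). The function names and the class names in the usage note are hypothetical.

```python
def output_size(grid, boxes_per_cell, num_classes):
    """Number of values predicted for one image: S * S * (B * 5 + C)."""
    return grid * grid * (boxes_per_cell * 5 + num_classes)


def class_scores(confidence, class_probs):
    """Class-specific score of one box: box confidence times each class probability."""
    return [confidence * p for p in class_probs]


def best_class(confidence, class_probs, class_names):
    """Return the (name, score) pair with the highest class-specific score."""
    scores = class_scores(confidence, class_probs)
    idx = max(range(len(scores)), key=scores.__getitem__)
    return class_names[idx], scores[idx]
```

For example, `best_class(0.8, [0.1, 0.7, 0.2], ["cat", "dog", "bird"])` picks "dog", and `output_size(7, 2, 20)` gives the 7 x 7 x 30 layout used in the YOLO v1 paper.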
Solution
Oh, no math please. Let's speak human language
Problem 1:
One object is partially or fully covered by several boxes.
Problem 2:
Most boxes have no objects. Their confidence loss is down-weighted by 0.5.
Human language
Problem 3:
Multi-task training: location & class.
YOLO uses a weighted sum of the two losses; the multi-task problem itself is left untouched.
Human language
Problem 4:
Small objects need more accurate location & box size.
Solution: compare sqrt(w) and sqrt(h) instead of w and h.
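The four problems above map onto terms of the loss. The following is a rough per-box sketch, not the full YOLO loss: it assumes the weights 5 and 0.5 from the YOLO v1 paper, takes a flag `has_object` saying whether this box was picked as the one responsible for a ground-truth object (the paper picks the box with the highest IoU), and omits the per-cell class term. The dict keys and function name are hypothetical.

```python
import math

LAMBDA_COORD = 5.0  # weight of the localization terms (assumed, from the YOLO v1 paper)
LAMBDA_NOOBJ = 0.5  # weight of the confidence term for boxes without an object


def box_loss(pred, target, has_object):
    """Sum-squared loss for one predicted box.

    pred, target: dicts with keys "x", "y", "w", "h", "conf".
    has_object: True if this box is responsible for a ground-truth object.
    """
    conf_err = (pred["conf"] - target["conf"]) ** 2
    if not has_object:
        # Problem 2: empty boxes dominate, so their confidence error is down-weighted.
        return LAMBDA_NOOBJ * conf_err
    xy_err = (pred["x"] - target["x"]) ** 2 + (pred["y"] - target["y"]) ** 2
    # Problem 4: square roots make the same absolute error cost more on small boxes.
    wh_err = (math.sqrt(pred["w"]) - math.sqrt(target["w"])) ** 2 \
        + (math.sqrt(pred["h"]) - math.sqrt(target["h"])) ** 2
    # Problem 3: location and confidence are combined as a weighted sum.
    return LAMBDA_COORD * (xy_err + wh_err) + conf_err
```

With this form, a width error of 0.05 on a box of width 0.1 is penalized more than the same error on a box of width 0.8.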
Other problems
x, y can fall outside the grid cell
small objects can be localized worse than larger ones
Anchors are used in YOLO v2. Anchor-free detection is an active research topic; see https://arxiv.org/abs/1904.01355 for an example.
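As a sketch of how anchors can keep x, y inside the grid cell, the following assumes the box parameterization described in the YOLO v2 paper: a sigmoid bounds the center offset to the cell, and the width and height rescale an anchor. All argument names are hypothetical, and units are grid cells.

```python
import math


def sigmoid(t):
    """Logistic function, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-t))


def decode_box(t_x, t_y, t_w, t_h, cell_x, cell_y, anchor_w, anchor_h):
    """Turn raw network outputs into a box, in units of grid cells.

    cell_x, cell_y: top-left corner of the grid cell.
    anchor_w, anchor_h: size of the anchor (prior) box.
    """
    b_x = cell_x + sigmoid(t_x)  # center stays within the cell horizontally
    b_y = cell_y + sigmoid(t_y)  # center stays within the cell vertically
    b_w = anchor_w * math.exp(t_w)  # anchor width rescaled by a positive factor
    b_h = anchor_h * math.exp(t_h)  # anchor height rescaled by a positive factor
    return b_x, b_y, b_w, b_h
```

With all raw outputs at zero, the box sits at the cell center and has exactly the anchor's size.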
Improvements (in v2)
Multi-scale training
Resize input images randomly during training: {320, 352, ..., 608}
Pick a new size every 10 batches.
A fully convolutional network only downsamples an image by a constant factor (here 32), so it accepts any input size that is a multiple of 32.
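A minimal sketch of the size schedule for multi-scale training. It assumes sizes are drawn uniformly from the multiples of 32 between 320 and 608 and that a new size is drawn once every `period` intervals (the slide uses 10); `size_schedule` and `feature_map_size` are hypothetical helper names.

```python
import random

STRIDE = 32  # constant downsampling factor of the network
SIZES = list(range(320, 608 + 1, STRIDE))  # 320, 352, ..., 608


def feature_map_size(image_size, stride=STRIDE):
    """Side length of the final feature map for a square input image."""
    return image_size // stride


def size_schedule(num_intervals, period=10, seed=0):
    """Input size to use at each interval, redrawn every `period` intervals."""
    rng = random.Random(seed)
    schedule = []
    current = None
    for i in range(num_intervals):
        if i % period == 0:
            current = rng.choice(SIZES)
        schedule.append(current)
    return schedule
```

A 320-pixel input gives a 10 x 10 feature map and a 608-pixel input gives 19 x 19, so the same weights are trained on grids of different resolution.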
vs
Feature map
Training
ImageNet: classification dataset
COCO/PASCAL VOC: detection dataset
YOLO
Step 1: train classification backbone on ImageNet
Step 2 (transfer learning): on the detection dataset,
remove head layers
add regression as new head
fine-tune backbone & train head
Training tricks
decaying learning rate
batch normalization
data augmentation
Performance
Generalizability
arXiv: 1807.05511