
YOLO Is The State-Of-The-Art, Real Time System Built On Deep Learning For Solving Object Detection Problems

YOLO (You Only Look Once) is a state-of-the-art real-time system for object detection using deep learning. It divides an image into a grid and, for each grid cell, predicts bounding boxes, confidence scores and class probabilities. This is done in a single pass of the network, which lets it process images and detect objects very quickly. YOLO trains on full images, simultaneously predicting multiple bounding boxes and classes. As a result, its predictions are informed by global context, unlike previous methods that performed object detection as a multi-step pipeline.

Uploaded by pradeep_dhote9
Copyright © All Rights Reserved

YOLO (You Only Look Once)

YOLO is a state-of-the-art, real-time system built on deep learning for solving object detection problems.

@ml.india
YOLO Architecture:



As shown in the first image on the previous slide, the algorithm first divides the image into an S x S grid of cells. Each cell predicts a fixed number of bounding boxes with confidence scores, plus class probabilities that identify which object class the boxes belong to. All cells are handled in parallel, in a single pass of the network. Lastly, overlapping boxes are merged (using non-maximum suppression) to form the final bounding boxes around the objects.
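The final merging step is commonly done with greedy non-maximum suppression (NMS), which the YOLO paper also applies to remove duplicate detections. Below is a minimal Python sketch, not the Darknet implementation: boxes are assumed to be (x1, y1, x2, y2) corner tuples, and the 0.5 IoU threshold and the helper names `iou` and `non_max_suppression` are hypothetical choices for illustration.

```python
from typing import List, Tuple

# Assumed box format: (x1, y1, x2, y2) corner coordinates.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two corner-format boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(boxes: List[Box], scores: List[float],
                        iou_threshold: float = 0.5) -> List[int]:
    """Greedy NMS: return indices of the kept boxes, highest score first."""
    # Visit boxes from the most to the least confident.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    while order:
        best = order.pop(0)
        keep.append(best)
        # Drop every remaining box that overlaps the kept box too much.
        order = [i for i in order if iou(boxes[best], boxes[i]) < iou_threshold]
    return keep
```

For example, `non_max_suppression([(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)], [0.9, 0.8, 0.7])` keeps boxes 0 and 2, because box 1 overlaps box 0 with an IoU of about 0.68. In practice, NMS is usually run separately for each class.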



YOLO looks at the whole image at test time, so its predictions are informed by global context in the image.

Previous methods, like R-CNN and its variations, used a pipeline to perform this task in multiple steps. This can be slow to run and also hard to optimize, because each individual component must be trained separately. YOLO does it all with a single neural network.
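Because a single network does all the work, one forward pass produces one S x S x (B * 5 + C) tensor; with the paper's PASCAL VOC settings (S = 7, B = 2, C = 20) that is 7 x 7 x 30. The sketch below decodes one cell of that tensor into class-specific confidence scores (class probability times box confidence, as defined in the paper). The per-cell memory layout and the name `decode_cell` are assumptions for illustration; Darknet's actual output ordering may differ.

```python
from typing import List, Tuple

# The paper's PASCAL VOC setting: 7 x 7 grid, 2 boxes per cell, 20 classes.
S, B, C = 7, 2, 20
CELL_DEPTH = B * 5 + C  # 30 values per cell, so the output tensor is 7 x 7 x 30


def decode_cell(cell: List[float]) -> List[Tuple[int, int, float]]:
    """Return (box_index, best_class, class_specific_score) for each box in one cell.

    Hypothetical layout, for illustration only: B blocks of
    (x, y, w, h, confidence), followed by C conditional class
    probabilities Pr(class | object).
    """
    if len(cell) != CELL_DEPTH:
        raise ValueError(f"expected {CELL_DEPTH} values, got {len(cell)}")
    class_probs = cell[B * 5:]
    results = []
    for b in range(B):
        confidence = cell[b * 5 + 4]
        # Class-specific score = Pr(class | object) * box confidence.
        scores = [p * confidence for p in class_probs]
        best = max(range(C), key=lambda c: scores[c])
        results.append((b, best, scores[best]))
    return results
```

For a cell whose first box has confidence 0.8, whose second box has confidence 0.1, and which gives class 3 a probability of 0.9, both boxes pick class 3, with scores of about 0.72 and 0.09. Low-scoring boxes like the second one are what a score threshold (and then NMS) would discard.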



The training:
The authors describe the training in the following way:

First, pretrain the first 20 convolutional layers on the ImageNet 1000-class competition dataset, using an input size of 224x224.
Then, increase the input resolution to 448x448.
Train the full network for about 135 epochs using a batch size of 64, momentum of 0.9 and weight decay of 0.0005.
Learning rate schedule: for the first epochs, slowly raise the learning rate from 0.001 to 0.01. Train at 0.01 for about 75 epochs, then decrease it to 0.001 for 30 epochs and to 0.0001 for the final 30 epochs.
Use data augmentation with random scaling and translations, and randomly adjust exposure and saturation.
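The learning rate schedule can be sketched as a small Python function. The 75 / 30 / 30 epoch split after warm-up follows the paper; the warm-up length (`warmup_epochs = 5`) and the linear shape of the ramp are assumptions, since the text only says "the first epochs". `SGD_CONFIG` is a hypothetical way of collecting the other hyperparameters listed above.

```python
# Hyperparameters quoted in the text; the dict itself is a hypothetical convenience.
SGD_CONFIG = {"batch_size": 64, "momentum": 0.9, "weight_decay": 0.0005}


def learning_rate(epoch: float, warmup_epochs: float = 5.0) -> float:
    """Learning rate for a given (possibly fractional) epoch.

    warmup_epochs and the linear ramp are assumptions: the paper only says the
    rate is raised slowly "for the first epochs". After warm-up, the paper's
    75 / 30 / 30 epoch split is used.
    """
    if epoch < warmup_epochs:
        # Ramp linearly from 1e-3 up to 1e-2.
        return 1e-3 + (1e-2 - 1e-3) * epoch / warmup_epochs
    epoch -= warmup_epochs
    if epoch < 75:
        return 1e-2
    if epoch < 75 + 30:
        return 1e-3
    return 1e-4
```

Starting the ramp at 0.001 rather than jumping straight to 0.01 reflects the paper's remark that a high initial learning rate makes the model diverge because of unstable gradients.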



The loss:
YOLO predicts multiple bounding boxes per grid cell. To compute the loss for the true positive, we only want one of them to be responsible for the object. For this purpose, we select the one with the highest IoU (intersection over union) with the ground truth.
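Choosing the responsible predictor only needs an IoU function and an argmax. A minimal sketch, assuming boxes are given in YOLO's (center_x, center_y, width, height) format; the helper names `to_corners`, `iou` and `responsible_predictor` are hypothetical.

```python
from typing import List, Tuple

# Assumed box format: (center_x, center_y, width, height), e.g. in image-relative units.
Box = Tuple[float, float, float, float]


def to_corners(box: Box) -> Tuple[float, float, float, float]:
    """Convert a center-format box to (x1, y1, x2, y2)."""
    cx, cy, w, h = box
    return cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two center-format boxes."""
    ax1, ay1, ax2, ay2 = to_corners(a)
    bx1, by1, bx2, by2 = to_corners(b)
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def responsible_predictor(predictions: List[Box], truth: Box) -> int:
    """Index of the predicted box with the highest IoU with the ground truth."""
    return max(range(len(predictions)), key=lambda j: iou(predictions[j], truth))
```

For a ground truth of `(0.5, 0.5, 0.2, 0.2)`, the prediction `(0.52, 0.5, 0.2, 0.2)` (IoU of about 0.82) is chosen over `(0.3, 0.3, 0.2, 0.2)`, which only touches the ground-truth box at a corner.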

Because only the best-matching box is trained on each object, the predictors specialize: each one gets better at predicting certain sizes and aspect ratios. YOLO uses sum-squared error between the predictions and the ground truth to calculate the loss. The loss function is composed of:

the classification loss.
the localization loss (errors between the predicted bounding box and the ground truth).
the confidence loss (the objectness of the box).



The loss function:
(Shown as an equation image on the slide, with its terms labeled localization loss, confidence loss and classification loss.)
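For reference, the loss from the original paper can be written as below (worth checking against the paper before relying on it). Here S x S is the grid, B the number of boxes per cell, 1_ij^obj is 1 when predictor j in cell i is responsible for an object, 1_ij^noobj marks predictors that are not, and 1_i^obj is 1 when an object appears in cell i. The paper sets lambda_coord = 5 and lambda_noobj = 0.5 and regresses the square roots of width and height, so that a given error counts less in a large box than in a small one. The first two lines are the localization loss, the next line is the confidence loss, and the last line is the classification loss.

```latex
\begin{aligned}
\mathcal{L} ={}& \lambda_{\text{coord}} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{\text{obj}} \left[ (x_i - \hat{x}_i)^2 + (y_i - \hat{y}_i)^2 \right] \\
&+ \lambda_{\text{coord}} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{\text{obj}} \left[ \left(\sqrt{w_i} - \sqrt{\hat{w}_i}\right)^2 + \left(\sqrt{h_i} - \sqrt{\hat{h}_i}\right)^2 \right] \\
&+ \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{\text{obj}} \left(C_i - \hat{C}_i\right)^2
  + \lambda_{\text{noobj}} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{\text{noobj}} \left(C_i - \hat{C}_i\right)^2 \\
&+ \sum_{i=0}^{S^2} \mathbb{1}_{i}^{\text{obj}} \sum_{c \in \text{classes}} \left(p_i(c) - \hat{p}_i(c)\right)^2
\end{aligned}
```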

Overwhelmed? Don't be! Check this amazing blog for the mathematical details: https://medium.com/@jonathan_hui/real-time-object-detection-with-yolo-yolov2-28b1b93e2088



The original YOLO paper: You Only Look Once: Unified, Real-Time Object Detection by Joseph Redmon, Santosh Divvala, Ross Girshick and Ali Farhadi (2015).
Link: https://pjreddie.com/media/files/papers/yolo.pdf



Many improvements have been proposed since then, and they were combined in the newer YOLOv2 and YOLOv3. We will talk about them another time.

The code is available on YOLO's official site: https://pjreddie.com/darknet/yolo/

Sources and references:

https://hackernoon.com/understanding-yolo-f5a74bbc7967
https://pjreddie.com/darknet/yolo/
https://medium.com/@jonathan_hui/real-time-object-detection-with-yolo-yolov2-28b1b93e2088
https://arxiv.org/pdf/1506.02640.pdf
