YOLO (You Only Look Once)
YOLO treats object detection as a regression problem: a single pass of the network predicts the bounding boxes together with the class probabilities of the detected objects.
YOLO is important for three main reasons: speed, high accuracy, and learning capabilities.
Residual blocks: - The image is divided into an S*S grid of cells. Each cell is responsible for detecting the objects whose centers fall inside it.
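As a minimal sketch of the grid idea, the snippet below finds which cell of an S*S grid contains the center of an object. The function name and the default S = 7 (the grid size used in the YOLOv1 paper) are assumptions made for illustration, not something stated in these notes.

```python
def grid_cell_for_center(x_center, y_center, image_width, image_height, S=7):
    """Return (row, col) of the S x S grid cell that contains the point.

    Hypothetical helper: S=7 is assumed here purely for illustration.
    """
    # Scale the pixel coordinate to grid units, then truncate to a cell index.
    col = int(x_center / image_width * S)
    row = int(y_center / image_height * S)
    # Points exactly on the right or bottom edge are clamped to the last cell.
    return min(row, S - 1), min(col, S - 1)


# An object centered in the middle of a 448*448 image lands in the middle cell.
print(grid_cell_for_center(224, 224, 448, 448))  # (3, 3)
```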
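Intersection over union: - Intersection over Union (IoU) is an evaluation metric used to measure the accuracy of an object detector on a particular dataset. It is the area of overlap between the predicted box and the ground-truth box divided by the area of their union.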
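IoU is the area where a predicted box and a ground-truth box overlap, divided by the area they cover together. A minimal sketch, assuming boxes are given as (x1, y1, x2, y2) corner coordinates (the box format and function name are choices made for this example):

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes in (x1, y1, x2, y2) format."""
    # Corners of the overlapping rectangle (if any).
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])

    # Width and height are clamped at zero when the boxes do not overlap.
    intersection = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection

    return intersection / union if union > 0 else 0.0


print(iou((0, 0, 2, 2), (0, 0, 2, 2)))  # 1.0 (identical boxes)
print(iou((0, 0, 1, 1), (2, 2, 3, 3)))  # 0.0 (no overlap)
```

An IoU of 1 means the prediction matches the ground truth exactly, and 0 means the boxes do not overlap at all.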
Difference between YOLOv1 and YOLOv2 (YOLO9000)
YOLOv1: -
• YOLOv1 uses the Darknet framework to train on image datasets.
• Darknet is an open-source neural network framework used for training YOLO.
• YOLOv1 struggles to find small objects in an image, particularly when they appear in clusters, because each grid cell is restricted in how many objects it can predict.
• The architecture has difficulty generalizing to objects when the image dimensions differ from those of the training images.
• The major source of error is the localization of objects in the input image.
YOLOv2 (YOLO9000): -
Batch normalization: -
• Batch normalization reduces the shift in the distribution of unit values in the hidden layers, which improves the stability of the neural network.
• It increases precision (mAP) by about 2%. It also helps regularize the model, so overfitting is reduced overall.
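To illustrate what batch normalization does to the values of one hidden unit, here is a minimal training-time sketch in plain Python. The function name, the default gamma and beta, and the eps value are assumptions for the example. A real layer would also learn gamma and beta and keep running statistics for inference.

```python
import math


def batch_norm(values, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize one unit's activations across a mini-batch.

    Hypothetical helper: gamma (scale) and beta (shift) are learnable
    parameters in a real network; here they are fixed for illustration.
    """
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n
    # Subtract the batch mean, divide by the batch standard deviation,
    # then apply the scale and shift.
    return [gamma * (v - mean) / math.sqrt(variance + eps) + beta for v in values]


activations = [1.0, 2.0, 3.0, 4.0]
normalized = batch_norm(activations)
print(normalized)  # values centered on 0 with variance close to 1
```

Whatever the scale of the incoming activations, the next layer sees inputs with roughly zero mean and unit variance, which is the stabilizing effect described above.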
Higher resolution classifier: -
The classifier input size in YOLOv2 is increased from 224*224 to 448*448, which increases precision (mAP) by up to 4%.
Anchor boxes: -
• The most notable change in YOLOv2 is the introduction of anchor boxes.
• YOLOv2 performs classification and prediction in a single framework.
• Anchor boxes are predefined box shapes from which the bounding boxes are predicted.
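The sketch below shows how a raw network output can be turned into a bounding box using an anchor box, following the formulation given in the YOLOv2 paper: the center is offset from the grid cell through a sigmoid, and the anchor width and height are scaled by an exponential. The function and parameter names are assumptions for this example, and all quantities are in grid-cell units.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def decode_box(t_x, t_y, t_w, t_h, cell_x, cell_y, anchor_w, anchor_h):
    """Decode raw predictions (t_*) into a box (center x, center y, width, height).

    cell_x, cell_y: top-left corner of the grid cell, in grid units.
    anchor_w, anchor_h: size of the anchor box, in grid units.
    """
    # The sigmoid keeps the predicted center inside its own grid cell.
    b_x = sigmoid(t_x) + cell_x
    b_y = sigmoid(t_y) + cell_y
    # The anchor acts as a prior shape that the network stretches or shrinks.
    b_w = anchor_w * math.exp(t_w)
    b_h = anchor_h * math.exp(t_h)
    return b_x, b_y, b_w, b_h


# With all raw outputs at zero, the box sits at the cell center with the anchor's size.
print(decode_box(0.0, 0.0, 0.0, 0.0, cell_x=3, cell_y=4, anchor_w=2.0, anchor_h=1.5))
# (3.5, 4.5, 2.0, 1.5)
```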
Darknet architecture: -
• YOLOv2 uses the Darknet-19 architecture, which has 19 convolutional layers, 5 max pooling layers, and a softmax layer for classifying objects.
• YOLOv2 is better, faster, and stronger than YOLOv1.
• YOLOv2 detects smaller objects with much greater accuracy, something its previous version lacked.
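One consequence of the five max pooling layers is the size of the final feature map. The sketch below assumes each pooling layer halves the spatial resolution (stride 2), which these notes do not state explicitly. The function name is also hypothetical.

```python
def feature_map_size(input_size, num_pool_layers=5, stride=2):
    """Spatial size of the feature map after repeated max pooling.

    Assumes every pooling layer uses the same stride and that the
    convolutional layers preserve spatial size.
    """
    size = input_size
    for _ in range(num_pool_layers):
        size //= stride
    return size


# Five stride-2 poolings reduce the resolution by a factor of 2**5 = 32.
print(feature_map_size(224))  # 7
print(feature_map_size(448))  # 14
```

Under this assumption, a higher input resolution gives a finer output grid, which is one way to see why resolution matters for detecting small objects.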