HOG (Histogram of Oriented Gradients): A HOG is a feature descriptor used
for object detection. HOGs are widely known for their use in pedestrian detection. A
HOG relies on the idea that objects within an image can be characterized by the distribution of
intensity gradients or edge directions. Gradients are calculated per block of the image.
A block is a grid of pixels, and its gradients are built from the
magnitude and direction of the change in intensity at each pixel within the block.
In the current example, Gx and Gy are respectively the horizontal and vertical
components of the change in the pixel intensity. A window size of 128 x 144 is used for
face images since it matches the general aspect ratio of human faces. The descriptors are
calculated over blocks of 8 x 8 pixels (often called cells). Within each 8 x 8 block,
the gradient of every pixel is quantized into one of 9 bins, where each bin represents a
range of gradient angles and the value in that bin is the sum of the magnitudes of all
pixels whose angle falls in it. The histograms are then normalized over a 16 x 16
block, which means four 8 x 8 blocks are normalized together to reduce the effect of
lighting conditions. This normalization mitigates the drop in accuracy caused by changes in lighting. The
SVM model is trained on HOG vectors computed from multiple faces.
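As a rough illustration of the steps described above, the following pure-Python sketch computes Gx and Gy with simple centred differences, accumulates a 9-bin histogram for each 8 x 8 block, and normalizes the four histograms of a 16 x 16 block together. The function names, the border handling, the unsigned 0 to 180 degree angle range, the hard assignment of each pixel to a single bin, and the L2 norm with a small epsilon are all assumptions made for illustration; the text does not specify them, and many implementations instead split each vote between neighbouring bins.

```python
import math

def gradients(img):
    """Return per-pixel gradient magnitude and unsigned angle (degrees)."""
    h, w = len(img), len(img[0])
    mag = [[0.0] * w for _ in range(h)]
    ang = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Gx, Gy: horizontal and vertical intensity change.
            # Assumption: border pixels reuse the nearest valid neighbour.
            gx = img[y][min(x + 1, w - 1)] - img[y][max(x - 1, 0)]
            gy = img[min(y + 1, h - 1)][x] - img[max(y - 1, 0)][x]
            mag[y][x] = math.hypot(gx, gy)
            ang[y][x] = math.degrees(math.atan2(gy, gx)) % 180.0
    return mag, ang

def cell_histogram(mag, ang, y0, x0, cell=8, n_bins=9):
    """Sum gradient magnitudes into n_bins angle bins for one cell."""
    bin_width = 180.0 / n_bins
    hist = [0.0] * n_bins
    for y in range(y0, y0 + cell):
        for x in range(x0, x0 + cell):
            # Hard binning (assumption): each pixel votes for one bin only.
            b = int(ang[y][x] / bin_width) % n_bins
            hist[b] += mag[y][x]
    return hist

def block_vector(img, y0=0, x0=0, cell=8, n_bins=9, eps=1e-6):
    """Concatenate and L2-normalize the four cell histograms of a 16 x 16 block."""
    mag, ang = gradients(img)
    v = []
    for dy in (0, cell):
        for dx in (0, cell):
            v.extend(cell_histogram(mag, ang, y0 + dy, x0 + dx, cell, n_bins))
    norm = math.sqrt(sum(val * val for val in v) + eps * eps)
    return [val / norm for val in v]

# Toy 16 x 16 image whose intensity increases from left to right.
img = [[10 * x for x in range(16)] for _ in range(16)]
vec = block_vector(img)
print(len(vec))  # 36

# Doubling the brightness leaves the normalized vector almost unchanged.
brighter = [[2 * p for p in row] for row in img]
vec_bright = block_vector(brighter)
print(max(abs(a - b) for a, b in zip(vec, vec_bright)))
```

In this toy image every gradient points horizontally, so all of the weight falls into the first bin of each of the four histograms, and the brighter copy produces practically the same normalized vector.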
HOW IT WORKS
STEP 1: Preprocessing
As mentioned earlier, the HOG feature descriptor used for pedestrian detection is calculated
on a 64×128 patch of an image. Of course, an image
may be of any size. Typically, patches at multiple scales are analyzed at many image
locations. The only constraint is that the patches being analyzed have a fixed aspect ratio
(1:2 for a 64×128 patch).
To illustrate this point, I have shown a large image of size 720×475.
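The idea of analyzing fixed-aspect-ratio patches at several scales and locations can be sketched as a simple window generator. This is only an illustration: the function name, the scale factors, and the 8-pixel base stride are assumptions, not values given in the text. In practice, each patch would typically be resized to 64×128 before the descriptor is computed.

```python
def sliding_windows(img_w, img_h, base_w=64, base_h=128,
                    scales=(1.0, 1.5, 2.0), stride=8):
    """Yield (x, y, w, h) patches that all keep the base aspect ratio."""
    for s in scales:
        w, h = int(base_w * s), int(base_h * s)
        if w > img_w or h > img_h:
            continue  # this scale does not fit inside the image
        step = max(1, int(stride * s))  # assumption: stride grows with scale
        for y in range(0, img_h - h + 1, step):
            for x in range(0, img_w - w + 1, step):
                yield x, y, w, h

# Patches for a 720×475 image, the size mentioned in the text.
windows = list(sliding_windows(720, 475))
print(windows[0])                               # (0, 0, 64, 128)
print(all(h == 2 * w for _, _, w, h in windows))  # True
```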
To calculate the final feature vector for the entire image patch, the 36×1 vectors
(four 9-bin histograms per 16×16 block) are
concatenated into one giant vector. What is the size of this vector? Let us calculate:
a. How many positions of the 16×16 block do we have? There are 7 horizontal and
15 vertical positions, making a total of 7 × 15 = 105 positions.
b. Each 16×16 block is represented by a 36×1 vector. So when we concatenate them
all into one giant vector, we obtain a 36 × 105 = 3780-dimensional vector.
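The arithmetic above can be checked with a few lines of Python. The helper name and its parameters are hypothetical, and the sketch assumes the 16×16 block moves in steps of 8 pixels (one cell), which is what gives 7 and 15 positions on a 64×128 patch.

```python
def hog_descriptor_length(win_w=64, win_h=128, cell=8,
                          block_cells=2, bins=9, stride=8):
    """Return (horizontal positions, vertical positions,
    values per block, total descriptor length)."""
    block = cell * block_cells                    # 16 pixels per block side
    nx = (win_w - block) // stride + 1            # horizontal block positions
    ny = (win_h - block) // stride + 1            # vertical block positions
    per_block = block_cells * block_cells * bins  # 4 histograms of 9 bins
    return nx, ny, per_block, nx * ny * per_block

print(hog_descriptor_length())           # (7, 15, 36, 3780)
# Same calculation for a 128 x 144 window under the same assumptions.
print(hog_descriptor_length(128, 144))   # (15, 17, 36, 9180)
```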
The HOG descriptor of an image patch is usually visualized by plotting the 9×1
normalized histograms in the 8×8 cells. See image on the side. You will notice that
the dominant direction of the histogram captures the shape of the person, especially
around the torso and legs.
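To make the notion of a dominant direction concrete, here is a small sketch that picks the strongest bin of a 9×1 histogram and reports the gradient angle at the centre of that bin, together with the edge orientation, which is perpendicular to the gradient. The function name, the example histogram values, and the choice of equal-width bins covering 0 to 180 degrees are assumptions for illustration.

```python
def dominant_direction(hist):
    """Return (bin index, gradient angle, edge angle) for the strongest bin."""
    n_bins = len(hist)
    bin_width = 180.0 / n_bins                 # 20 degrees for 9 bins
    k = max(range(n_bins), key=lambda i: hist[i])
    gradient_angle = (k + 0.5) * bin_width     # centre of the winning bin
    edge_angle = (gradient_angle + 90.0) % 180.0
    return k, gradient_angle, edge_angle

# Hypothetical histogram dominated by near-horizontal gradients,
# as one might see along the vertical outline of a torso or leg.
hist = [0.5, 0.05, 0.0, 0.0, 0.1, 0.0, 0.0, 0.0, 0.02]
print(dominant_direction(hist))  # (0, 10.0, 100.0)
```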