Color Based Object Tracking With OpenCV A Survey
Volume 5 Issue 3, March-April 2021 Available Online: www.ijtsrd.com e-ISSN: 2456 – 6470
1Computer Engineering Department, L.J. Institute Engineering and Technology, Ahmedabad, Gujarat, India
2Instrumentation & Control Engineering Department, Government Polytechnic, Ahmedabad, Gujarat, India
1. INTRODUCTION
1.1. OpenCV
OpenCV (Open Source Computer Vision Library) is an open source computer vision and machine learning software library. OpenCV is built to provide a common infrastructure for computer vision applications and to accelerate the use of machine perception in commercial products. Being a BSD-licensed product, OpenCV makes it easy for businesses to utilize and modify the code.

The library comes with more than 2500 optimized algorithms, which include a comprehensive set of both classic and state-of-the-art computer vision and machine learning algorithms. These algorithms can be used to detect and recognize faces, identify objects, classify human actions in videos, track camera movements, track moving objects, extract 3D models of objects, produce 3D point clouds from stereo cameras, stitch images together to produce a high-resolution image of an entire scene, find similar images in an image database, remove red eyes from images taken using flash, follow eye movements, recognize scenery and establish markers to overlay it with augmented reality, etc.

Along with well-established companies like Google, Yahoo, Microsoft, Intel, IBM, Sony, Honda and Toyota that employ the library, there are many startups such as Applied Minds, VideoSurf and Zeitera that make extensive use of OpenCV. OpenCV's deployed uses span the range from stitching street-view images together, detecting intrusions in surveillance video in Israel, monitoring mine equipment in China, detecting swimming pool drowning accidents in Europe, running interactive art in Spain and New York, checking runways for debris in Turkey, and inspecting labels on products in factories around the world, to rapid face detection in Japan. Thus there are many uses of object detection techniques, some of which are mentioned above.

OpenCV has C++, Python, Java and MATLAB interfaces and supports Windows, Linux, Android and Mac OS. Full-featured CUDA and OpenCL interfaces are being actively developed. OpenCV is written natively in C++ and has a templated interface that works seamlessly with STL containers.

1.2. Object Recognition
Object recognition is the area of artificial intelligence (AI) concerned with the ability of AI and robot implementations to recognize various things and entities. Object recognition allows AI programs and robots to pick out and identify objects from inputs like video and still camera images. Methods used for object identification include 3D models, component identification, edge detection and analysis of appearances from different angles.

Object recognition sits at the convergence of robotics, machine vision, neural networks and AI. Google and Microsoft are among the companies working in this area; for example, Google's driverless car and Microsoft's Kinect system both use object recognition. Robots that understand their environments can perform more complex tasks better, and major advances in object recognition stand to revolutionize AI and robotics. MIT has created neural networks, based on our
understanding of how the brain works, that allow software to identify objects almost as quickly as primates do. Visual data gathered through cloud robotics can allow multiple robots to learn tasks associated with object recognition faster. Robots can also reference massive databases of known objects, and that knowledge can be shared among all connected robots.

But all of this needed to be started at some point, and that is where OpenCV comes into play. OpenCV is the tool used here to accomplish the task of object detection.

2. Mathematics of Object Detection
2.1. Histograms
Before getting into the mathematics behind object detection, we first need to understand how an image is represented in the computer. An image is nothing but an array of numbers, specifically the value of the color to be displayed for each pixel. The color can be represented in any form: RGB, HSV, Gray Scale, etc. A histogram is a graphical display of data using bars of different heights, where each bar covers a specific range of values; taller bars show that more data falls in that range. Figure-1 below shows a histogram; it displays the shape and spread of continuous sample data.

2.2.2. How Back Projection works
In order to understand how Back Projection works, let's take the example of detecting skin. Suppose we have a skin histogram (Hue-Saturation) as shown below in figure-2. This histogram is the histogram model of the skin (which we know represents a sample of skin); a mask was also applied so that the histogram captures only the skin region.

Figure 2: Skin Image Histogram Model

Now we want to test another sample of skin, like the one shown below in figure-3.
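To make the histogram model and the back-projection step concrete, here is a minimal sketch using OpenCV's Python interface. The file names, the Hue-Saturation bin counts and the mask bounds are illustrative assumptions, not values taken from the paper.

```python
import cv2
import numpy as np

# Build a Hue-Saturation histogram model from a sample image of the target
# (e.g. a skin patch). File names and bin counts are placeholder assumptions.
sample = cv2.imread("skin_sample.jpg")
hsv_sample = cv2.cvtColor(sample, cv2.COLOR_BGR2HSV)

# Optional mask so the histogram captures only reasonably bright, saturated pixels
mask = cv2.inRange(hsv_sample, np.array((0., 60., 32.)), np.array((180., 255., 255.)))

# 2-D histogram over the H (0-179) and S (0-255) channels
hist = cv2.calcHist([hsv_sample], [0, 1], mask, [30, 32], [0, 180, 0, 256])
cv2.normalize(hist, hist, 0, 255, cv2.NORM_MINMAX)

# Back-project the model onto a new image: bright pixels mark colors that
# occur frequently in the model, dark pixels mark colors that do not.
test = cv2.imread("test.jpg")
hsv_test = cv2.cvtColor(test, cv2.COLOR_BGR2HSV)
back_proj = cv2.calcBackProject([hsv_test], [0, 1], hist, [0, 180, 0, 256], scale=1)

cv2.imshow("back projection", back_proj)
cv2.waitKey(0)
```

The back-projected image is the grayscale map that the MeanShift and CamShift steps discussed next operate on.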
Again, it applies MeanShift with the newly scaled search window and the previous window location, and this process continues until the required accuracy is met. An example of CamShift is shown below in figure-7.
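In OpenCV this adaptive search is exposed as cv2.CamShift. Below is a small sketch of a single per-frame update, assuming a back-projected image back_proj, the previous window track_window and a termination criterion term_crit prepared as in the sketch above; the helper names are made up for illustration.

```python
import cv2
import numpy as np

def camshift_step(back_proj, track_window, term_crit):
    """One CamShift update on a back-projected frame (illustrative helper)."""
    # CamShift first runs MeanShift, then re-estimates the window size and
    # orientation from the pixels it now covers, repeating until convergence.
    rot_rect, track_window = cv2.CamShift(back_proj, track_window, term_crit)
    return rot_rect, track_window

def draw_rotated_box(frame, rot_rect):
    """Draw the rotated rectangle returned by CamShift onto the frame."""
    pts = cv2.boxPoints(rot_rect).astype(np.int32)   # its four corner points
    cv2.polylines(frame, [pts], isClosed=True, color=(0, 255, 0), thickness=2)
```

Unlike plain MeanShift, whose window size stays fixed, the rotated rectangle returned here grows, shrinks and rotates with the tracked object.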
inRange: This method is used to filter an image using a given range of colors. It is also a built-in method of the OpenCV library.

3. Applying the above Math to track an Object
3.1. How Object Tracking Works at the Backend
Now that we have seen the math behind object tracking, it's time to apply it to a video to track an object. The video given as input will be analyzed frame by frame, so we can control the speed of the video by controlling the frames per second (fps).

In order to detect an object, the following steps have to be performed:
1. First of all, the target object is detected in the video.
2. Then this detected object is tracked frame by frame as the video goes on.

First of all we need to detect the target object which we want to keep track of in the video, and in order to detect an object we need an image of that target object. This image is used to create the histogram model of the target object. After creating the histogram model of the target image, we feed that model to the Back Projection algorithm. As a result we get a box's starting coordinate in the 2-D space of the image, along with its width and height. This box shows the target object in the current frame of the video.

Once we have the target object's coordinates in the starting frame, we just need to keep track of it as the video continues to play frame by frame. To achieve that, we first calculate the back projection of the current video frame with the histogram model of the target object, which gives us a back-projected image of the frame. If we display the back-projected image, the whole image is black except for the target object, which is white. We then give that image as input to the MeanShift or CamShift algorithm. If we give it to MeanShift, it returns the new starting coordinate of the box that shows our target object in that video frame, along with its width and height. The width and height it returns are the same every time, equal to the ones we gave as input while detecting the target object in the first step. This process repeats as long as there are more frames in the video.

After calculating MeanShift in every iteration we need to show the target object, which can be achieved by highlighting the target object with a rectangle whose coordinates are the ones we get as output from the MeanShift or CamShift algorithm.
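As a concrete illustration of the loop described above, here is a minimal end-to-end sketch using OpenCV's Python interface. The file names target.jpg and video.mp4, the bin counts, and the initial window (taken here simply as the size of the target image placed at the top-left corner) are illustrative assumptions, not values from the paper.

```python
import cv2
import numpy as np

# Histogram model of the target object, built from a reference image of it
# ("target.jpg" is an assumed file name).
target = cv2.imread("target.jpg")
hsv_target = cv2.cvtColor(target, cv2.COLOR_BGR2HSV)
roi_hist = cv2.calcHist([hsv_target], [0, 1], None, [30, 32], [0, 180, 0, 256])
cv2.normalize(roi_hist, roi_hist, 0, 255, cv2.NORM_MINMAX)

cap = cv2.VideoCapture("video.mp4")                       # assumed input video
track_window = (0, 0, target.shape[1], target.shape[0])   # assumed initial (x, y, w, h)

# Termination criterion for MeanShift: at most 10 iterations or a shift < 1 px
term_crit = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 1)

while True:
    ok, frame = cap.read()
    if not ok:                                            # no more frames
        break

    # Back-project the model onto the current frame: target-colored pixels light up
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    back_proj = cv2.calcBackProject([hsv], [0, 1], roi_hist, [0, 180, 0, 256], 1)

    # MeanShift moves the fixed-size window to the densest nearby region
    _, track_window = cv2.meanShift(back_proj, track_window, term_crit)
    x, y, w, h = track_window

    # Highlight the tracked object with a rectangle and show the frame
    cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
    cv2.imshow("tracking", frame)
    if cv2.waitKey(30) & 0xFF == 27:                      # Esc quits; the delay sets playback fps
        break

cap.release()
cv2.destroyAllWindows()
```

Note that meanShift never changes the window size, which matches the observation above that the returned width and height always equal the initial ones; swapping in cv2.CamShift, as sketched earlier, lets the window rescale and rotate with the object.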
4. Pros and Cons of Color based Object Tracking
Pros
1. This technique is based on color and not on the features of the target object, so it requires relatively little computational power to detect the object in every frame.
2. With this method it is possible to get a pretty good frame rate while detecting the object in video.
3. This method is easy to understand and implement, so it is a good way for beginners to get started with machine learning and object tracking using OpenCV.

Cons
1. This method does not recognize the features of the target object while detecting it in the video, so it has rather low accuracy.
2. This method goes into an ambiguous state if there is more than one object in the frame with the same color as our target object.
3. If our target object goes out of the frame, the method won't understand that there is no target object in the frame; instead it will treat whichever other object's histogram best matches the target histogram model as the target object.
4. Compared to deep learning algorithms, this method cannot understand context from past experience.

5. Scope of improvement
Now, due to some requirement or just out of curiosity, if we have to use color based tracking, there are some techniques, tricks and tips that can be applied to get better results.

Other than Back Projection there are other methods to scan the pixel pattern of the target object in a video frame. We all know that if the noise is reduced in a sample, we can analyze it better. The same concept is applied here: if we remove the noise from the frame, we can analyze and detect the object better. Here the noise is low-light pixels, that is, bad image quality due to low or insufficient light; in other words, we apply a threshold to every pixel before accepting it as input.

The above concept can be implemented using a built-in method from the OpenCV library known as the inRange function. The basic idea behind this function is to filter out a range of values from the frame. As input we give the image on which we want to apply the threshold, and then the range of color that we want to accept from the image; here the range is given in HSV format. For every frame we first apply the inRange method and filter out the low-light pixels, which could potentially be recognized as noise. After this we apply the back projection with the histogram model of the target object.

By this small trick we can see a pretty good increase in the accuracy of object detection and tracking, with only a slight increase in processing needs. Suitable HSV color ranges can be found online.

Another way to increase the accuracy is quite obvious. The MeanShift/CamShift step takes a termination criterion as a required input, which defines when the algorithm stops and gives its result. In this criterion we give an epsilon value and a maximum number of iterations; by tweaking these two parameters we can improve the accuracy without much increase in processing needs.
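Both improvements can be sketched as follows, assuming a histogram model roi_hist and a window track_window prepared as in the tracking loop above. The HSV bounds and the epsilon/iteration values are illustrative assumptions, and track_step is a made-up helper name.

```python
import cv2
import numpy as np

# Assumed HSV range: keep only reasonably bright, saturated pixels so that
# dim, noisy regions do not contribute to the back projection.
LOWER = np.array((0, 60, 32), dtype=np.uint8)
UPPER = np.array((180, 255, 255), dtype=np.uint8)

# Termination criterion for meanShift/CamShift: stop after at most 15
# iterations or once the window center moves by less than 2 pixels.
term_crit = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 15, 2)

def track_step(frame, roi_hist, track_window):
    """One tracking step with a low-light pre-filter (illustrative helper)."""
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)

    # inRange builds a binary mask of accepted pixels ...
    mask = cv2.inRange(hsv, LOWER, UPPER)
    # ... which we use to suppress the rejected pixels before back projection.
    hsv_filtered = cv2.bitwise_and(hsv, hsv, mask=mask)

    back_proj = cv2.calcBackProject([hsv_filtered], [0, 1], roi_hist,
                                    [0, 180, 0, 256], 1)
    _, track_window = cv2.meanShift(back_proj, track_window, term_crit)
    return track_window
```

Raising the iteration count or shrinking epsilon lets the window settle more precisely on the target, at the cost of a little extra work per frame.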
6. Conclusion
From the above discussion we can conclude that OpenCV is an easy to understand and quite powerful tool. Here we have discussed methods for tracking an object in a video. To recap: first we make a histogram for the target object; then, for small portions of every frame in the video, we compare the histogram of the target object with the histogram of that portion. We do the same continuously for each and every frame of the video, while highlighting where the histogram of the target object matches best in the frame. This is how we keep track of an object in a video.

But, as we know, nothing is perfect, so this method comes with some flaws. As it is a color based tracking technique, it can easily confuse the target object with another object of the same color or a similar histogram. Also, if the video is playing and the target object isn't in the frame, it cannot tell whether the object was there and later went out of frame, or was never in the video from the beginning; that is, it cannot understand context from the past.
But along with all these disadvantages there is one major advantage of this method: its low need for computational power. The method can achieve pretty good frame rates on an average machine, so it can be used to analyze video footage where accuracy is a secondary factor but time is the limiting factor.

Hence, any beginner can use OpenCV with this method of tracking objects to learn the basics of object tracking without any deep mathematical understanding. Also, OpenCV is an open source library, so no licensing is needed in order to use it, which makes the learning process even more enjoyable.