Region Proposal Object Detection with OpenCV, Keras, and TensorFlow
In this tutorial, you will learn how to utilize region proposals for object detection using OpenCV, Keras,
and TensorFlow.
Today’s tutorial is part 3 in our 4-part series on deep learning and object detection:
Part 1: Turning any deep learning image classifier into an object detector with Keras and TensorFlow
Part 2: OpenCV Selective Search for Object Detection
Part 3: Region proposal for object detection with OpenCV, Keras, and TensorFlow (today’s tutorial)
Part 4: R-CNN object detection with Keras and TensorFlow (next week’s tutorial)
In last week’s tutorial, we learned how to utilize Selective Search to replace the traditional computer
vision approach of using bounding boxes and sliding windows for object detection.
But the question still remains: how do we take the region proposals (i.e., regions of an image that could contain an object of interest) and then actually classify them to obtain our final object detections?
To learn how to perform object detection with region proposals using OpenCV, Keras, and
TensorFlow, just keep reading.
In the first part of this tutorial, we’ll discuss the concept of region proposals and how they can be used for object detection. We’ll then implement region proposal object detection using OpenCV, Keras, and TensorFlow.
We’ll wrap up this tutorial by reviewing our region proposal object detection results.
What are region proposals, and how can they be used for object
detection?
Figure 1: OpenCV’s Selective Search applies hierarchical similarity measures to join regions and eventually form the final set of region proposals for where objects could be present. (image source)
We discussed the concept of region proposals and the Selective Search algorithm in last week’s tutorial
on OpenCV Selective Search for Object Detection — I suggest you give that tutorial a read before you
continue here today, but the gist is that traditional computer vision object detection algorithms relied on image pyramids and sliding windows to locate objects at varying scales and locations in images:
There are a few problems with the image pyramid and sliding window method, but the primary issues are that:
1. They are painfully slow, since the classifier must be applied to every window at every pyramid scale
2. They are sensitive to hyperparameter choices (namely pyramid scale size, ROI size, and window step size)
Region proposal algorithms seek to replace the traditional image pyramid and sliding window
approach.
These algorithms:
1. Accept an input image
2. Over-segment it by applying a superpixel clustering algorithm
3. Merge segments of the superpixels based on five components (color similarity, texture similarity, size similarity, shape similarity/compatibility, and a final meta-similarity that linearly combines the aforementioned scores)
The end results are proposals that indicate where in the image there could be an object:
Figure 2: In this tutorial, we will learn how to use Selective Search region proposals to perform
object detection with OpenCV, Keras, and TensorFlow.
Notice how I’ve italicized “could” in the sentence above the image — keep in mind that region proposal
algorithms have no idea if a given region does in fact contain an object.
“Hey, this looks like an interesting region of the input image. Let’s apply our more computationally expensive classifier to determine what’s actually in this region.”
Region proposal algorithms tend to be far more efficient than the traditional object detection techniques of image pyramids and sliding windows because fewer individual regions of the image need to be examined by the downstream classifier.
In the rest of this tutorial, you’ll learn how to implement region proposal object detection.
Configuring your development environment
Either of my environment configuration tutorials will help you configure your system with all the necessary software for this blog post in a convenient Python virtual environment.
Please note that PyImageSearch does not recommend or support Windows for CV/DL projects.
Project structure
Be sure to grab today’s files from the “Downloads” section so you can follow along with this tutorial:
As you can see, our project layout is very straightforward today, consisting of a single Python script,
aptly named region_proposal_detection.py for today’s region proposal object detection
example.
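The layout looks something like this (the image filename here is an assumption; check the files you actually downloaded):

```
$ tree
.
├── beagle.png
└── region_proposal_detection.py

0 directories, 2 files
```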
I’ve also included a picture of Jemma, my family’s beagle. We’ll use this photo for testing our OpenCV,
Keras, and TensorFlow region proposal object detection system.
Open a new file, name it region_proposal_detection.py , and insert the following code:
We begin our script with a handful of imports. In particular, we’ll be using the pre-trained ResNet50 classifier, my imutils implementation of non_max_suppression (NMS), and OpenCV. Be sure to follow the links in the “Configuring your development environment” section to ensure that all of the required packages are installed in a Python virtual environment.
Last week, we learned about Selective Search to find region proposals where an object might exist.
We’ll now take last week’s code snippet and wrap it in a convenience function named
selective_search :
Our selective_search function accepts an input image and algorithmic method (either
"fast" or "quality" ).
From there, we initialize Selective Search with our input image (Lines 14 and 15).
We then explicitly set our mode using the value contained in method (Lines 19-24), which should
either "fast" or "quality" . Generally, the faster method will be suitable; however, depending on your application, you might want to sacrifice speed to achieve better quality results.
Finally, we execute Selective Search and return the region proposals ( rects ) via Lines 27-30.
When we call the selective_search function and pass an image to it, we’ll get a list of bounding boxes that represent where an object could exist. Later, we will have code which accepts the bounding boxes, extracts the corresponding ROI from the input image, passes the ROI into a classifier, and applies NMS. The result of these steps will be a deep learning object detector based on independent Selective Search and classification. We are not building an end-to-end deep learning object detector with Selective Search embedded. Keep this distinction in mind as you follow the rest of this tutorial.
Our script handles four command line arguments:
--image : The path to our input photo we’d like to perform object detection on
--method : The Selective Search mode, either "fast" or "quality"
--conf : The minimum probability required to consider a classification (used to filter weak detections)
--filter : An optional comma delimited list of class labels to filter detections on
Now that our command line arguments are defined, let’s home in on the --filter argument:
Line 46 sets our class labelFilters directly from the --filter command line argument. From there, Lines 49 and 50 overwrite labelFilters by splitting the comma delimited string into a single Python list of class labels.
We also load our input --image and extract its dimensions (Lines 57 and 58).
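A sketch of the argument parsing and label-filter handling is below. Note that the demo argv values passed to parse_args are hypothetical so the sketch runs standalone; the real script calls parse_args() with no arguments:

```python
import argparse

# construct the argument parser and parse the arguments (demo values
# are supplied explicitly here so the sketch runs on its own)
ap = argparse.ArgumentParser()
ap.add_argument("-i", "--image", required=True,
    help="path to the input image")
ap.add_argument("-m", "--method", type=str, default="fast",
    choices=["fast", "quality"],
    help="selective search method")
ap.add_argument("-c", "--conf", type=float, default=0.9,
    help="minimum probability to consider a classification")
ap.add_argument("-f", "--filter", type=str, default=None,
    help="comma separated list of ImageNet labels to filter on")
args = vars(ap.parse_args(["--image", "beagle.png", "--filter", "beagle,pug"]))

# grab the label filters, splitting the comma delimited string into a
# list of class labels (None means no filtering is applied)
labelFilters = args["filter"]
if labelFilters is not None:
    labelFilters = labelFilters.lower().split(",")
```

From there, the full script loads ResNet50(weights="imagenet"), reads the --image with cv2.imread, and grabs its height and width.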
At this point, we’re ready to apply Selective Search to our input photo:
In the next code block, we’re going to populate two lists using our region proposals:
proposals : Initialized on Line 68, this list will hold sufficiently large pre-processed ROIs from our input --image , which we will feed into our ResNet classifier.
boxes : Initialized on Line 69, this list of bounding box coordinates corresponds to our proposals and is similar to rects with an important distinction: only sufficiently large regions are included.
We need our proposals ROIs to send through our image classi er, and we need the boxes
coordinates so that we know where in the input --image each ROI actually came from.
Now that we have an understanding of what we need to do, let’s get to it:
Looping over proposals from Selective Search ( rects ) beginning on Line 73, we proceed to:
Filter out small boxes that likely don’t contain an object (i.e., noise) via Lines 77 and 78
Extract our region proposal roi (Line 83) and preprocess it (Lines 84-89)
We have one final pre-processing step to handle before inference: converting the proposals list into a NumPy array. Line 96 handles this step.
We make predictions on our proposals by performing deep learning classification inference (Lines 102 and 103).
Given each classification, we’ll filter the results based on our labelFilters and --conf (confidence threshold). The labels dictionary (initialized on Line 107) will hold each of our class labels (keys) and lists of bounding boxes + probabilities (values). Let’s filter and organize the results now:
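To illustrate the filtering and collation logic with something runnable, here is a sketch using made-up decoded predictions; the (imagenetID, label, prob) tuples mimic the format returned by Keras’ decode_predictions(preds, top=1), and all values are hypothetical. minConf stands in for args["conf"]:

```python
# hypothetical decoded predictions, one top-1 tuple per proposal
preds = [
    [("n02088364", "beagle", 0.97)],
    [("n03047690", "clog", 0.91)],
    [("n02088364", "beagle", 0.42)],
]
# the bounding boxes corresponding to each proposal above
boxes = [(10, 10, 210, 160), (100, 120, 350, 320), (30, 40, 230, 200)]
labelFilters = ["beagle"]
minConf = 0.9

labels = {}
for (i, p) in enumerate(preds):
    # grab the prediction information for the current proposal
    (imagenetID, label, prob) = p[0]

    # drop classes outside the label filter
    if labelFilters is not None and label not in labelFilters:
        continue

    # drop weak detections below the confidence threshold, then grab
    # the associated bounding box and update the labels dictionary
    if prob >= minConf:
        L = labels.get(label, [])
        L.append((boxes[i], prob))
        labels[label] = L
```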
Looping over the predictions, we then:
Extract the prediction information, including the class label and probability (Line 112)
Ensure the particular prediction’s class label is in the label lter, dropping results we don’t wish to
consider (Lines 116 and 117)
Grab the bounding box associated with the prediction, then convert and store its (x, y)-coordinates (Lines 124 and 125)
Update the labels dictionary so that it is organized with each ImageNet class label (key)
associated with a list of tuples (value) consisting of a detection’s bounding box and prob (Lines
129-131)
Now that our results are collated in the labels dictionary, we will produce two visualizations of our
results:
By applying NMS, weak overlapping bounding boxes will be suppressed, thereby resulting in a single
object detection.
In order to demonstrate the power of NMS, first let’s generate our Before NMS result:
Looping over unique keys in our labels dictionary, we annotate our output image with bounding
boxes for that particular label (Lines 140-144) and display the Before NMS result (Line 149). Given
that our visualization will likely be very cluttered with many bounding boxes, I chose not to annotate
class labels.
Now, let’s apply NMS and display the After NMS result:
From there, we annotate each remaining bounding box and class label (Lines 160-166) and display
the After NMS result (Line 169).
Both the Before NMS and After NMS visualizations will remain on your screen until a key is pressed
(Line 170).
Make sure you use the “Downloads” section of this tutorial to download the source code and example
images.
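From there, you can execute the script (the image filename here is an assumption based on the project structure):

```
$ python region_proposal_detection.py --image beagle.png
```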
Figure 3: Left: Object detections for the “beagle” class as a result of region proposal object detection with
OpenCV, Keras, and TensorFlow. Right: After applying non-maxima suppression to eliminate overlapping
bounding boxes.
If you take a look at Figure 3, you’ll see that on the left we have the object detections for the “beagle”
class (a type of dog) and on the right we have the output after applying non-maxima suppression.
As you can see from the output, Jemma, my family’s beagle, was correctly detected!
However, as the rest of our results show, our model is also reporting that we detected a “clog” (a type
of wooden shoe):
Figure 4: One of the regions proposed by Selective Search is later predicted incorrectly to have a “clog” shoe in
it using OpenCV, Keras, and TensorFlow.
Figure 5: Another region proposed by Selective Search is then classified incorrectly to have a “quill” pen in it.
Looking at the ROIs for each of these classes, one can imagine how our CNN may have been confused when making those classifications.
The solution here is to filter for only the detections we care about.
For example, if I were building a “beagle detector” application, I would supply the --filter beagle
command line argument:
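Assuming the same example image as before, that invocation would be:

```
$ python region_proposal_detection.py --image beagle.png --filter beagle
```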
And in that case, only the “beagle” class is found (the rest are discarded).
In next week’s tutorial, I’ll show you how we can use Selective Search and region proposals to build a
complete R-CNN object detector pipeline that is far more accurate than the method we’ve covered
here today.
Summary
In this tutorial, you learned how to perform region proposal object detection with OpenCV, Keras, and
TensorFlow.
We did so by utilizing a four-step pipeline:
Step #1: Use Selective Search (a region proposal algorithm) to generate candidate regions of an input image that could contain an object of interest.
Step #2: Take these regions and pass them through a pre-trained CNN to classify the candidate areas (again, that could contain an object).
Step #3: Apply non-maxima suppression (NMS) to suppress weak, overlapping bounding boxes.
Step #4: Return the final bounding boxes to the calling function.
We implemented the above pipeline using OpenCV, Keras, and TensorFlow — all in ~150 lines of code!
However, you’ll note that we used a network that was pre-trained on the ImageNet dataset. What if we wanted to detect objects from classes that aren’t part of ImageNet? And how would that change our inference code used for object detection?
I’ll be answering those questions in next week’s tutorial.