Laboratory 5. Feature Detection and Content Descriptors for Matching Applications
“Image features are interesting points in an image, also called interest points or key points. They can be useful in multiple computer vision applications, such as image alignment or image matching.”
For a point in Cartesian coordinates (x, y), translation is defined by multiplying with the identity matrix and adding a displacement vector (a shift) $(\delta_x, \delta_y)$. Similar types of formulae define the other operations. Each formula below takes the starting point coordinates (x, y) and produces the final coordinates after the transform is applied:
$$\text{Translation: } \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} + \begin{bmatrix} \delta_x \\ \delta_y \end{bmatrix} = \begin{bmatrix} x + \delta_x \\ y + \delta_y \end{bmatrix}$$

$$\text{Scale: } \begin{bmatrix} s_x & 0 \\ 0 & s_y \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} x\,s_x \\ y\,s_y \end{bmatrix}$$
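As a quick sanity check, these two formulas can be reproduced directly in numpy (a minimal sketch; the point and the transform values are arbitrary choices):

import numpy as np

p = np.array([3.0, 4.0])                 # point (x, y), arbitrary values
T = np.array([[1.0, 0.0], [0.0, 1.0]])   # identity matrix
d = np.array([10.0, 20.0])               # displacement (delta_x, delta_y)
print(T @ p + d)                         # translation -> [13. 24.]

S = np.array([[2.0, 0.0], [0.0, 0.5]])   # scale factors sx = 2, sy = 0.5
print(S @ p)                             # scaling -> [6. 2.]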
Given two images related by an affine transform and the locations of at least 3 corresponding points in the source and destination images, we can recover the affine transform between them using the function cv2.estimateAffine2D. Details about the function parameters can be found in the OpenCV documentation.
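A minimal usage sketch (the point coordinates below are illustrative, not taken from a real image pair):

import cv2
import numpy as np

# Three corresponding points in the source and destination images
srcPts = np.float32([[0, 0], [0, 100], [100, 0]])
dstPts = np.float32([[10, 5], [18, 108], [112, 2]])

# Returns the estimated 2x3 affine matrix and a mask of inlier points
M, inliers = cv2.estimateAffine2D(srcPts, dstPts)
print(M)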
❖ Ex. 5.1 Build a test image (float32) containing a green square on a black background. Apply the following affine transforms, analyze the output images, and identify the types of operations used (rotation, translation, shear and/or scaling).
$$\begin{bmatrix} 1 & 0 & 25 \\ 0 & 1 & 25 \end{bmatrix};\qquad \begin{bmatrix} 2 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix};\qquad \begin{bmatrix} \cos(\theta) & \sin(\theta) & 0 \\ -\sin(\theta) & \cos(\theta) & 0 \end{bmatrix};\qquad \begin{bmatrix} 1 & 0.1 & 0 \\ 0 & 1 & 0 \end{bmatrix}$$
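A possible starting point for the exercise is sketched below (the image size and the square position are assumptions; the corner coordinates match those used in Ex. 5.2):

import cv2
import numpy as np

# Test image: green square on a black background (float32, values in [0, 1])
img = np.zeros((300, 300, 3), np.float32)
img[50:150, 50:150] = (0.0, 1.0, 0.0)    # green in BGR order

# First transform from the list: translation by (25, 25)
M = np.float32([[1, 0, 25],
                [0, 1, 25]])
out = cv2.warpAffine(img, M, (img.shape[1], img.shape[0]))

cv2.imshow("input", img)
cv2.imshow("output", out)
cv2.waitKey(0)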
❖ Ex. 5.2 Use the same input image of the square and consider the coordinates of 3 corners: (50, 50), (50, 149) and (149, 50). In the destination image, these points must be located at (74, 50), (83, 170) and (192, 29), respectively. Use the function estimateAffine2D to calculate the matrix and then apply the estimated transform to the original image of the square. Compare the result to the one obtained by specifying all 4 corners (the original position (149, 149) will move to (183, 135)). The true affine matrix is known to be:
$$\begin{bmatrix} 1.0832 & 0.0826 & 10 \\ -0.1910 & 1.1023 & 0 \end{bmatrix}$$
Does the estimate get closer to the true affine matrix when more than 3 points are used?
5.2. Homography
Homography (or perspective transform) is a transformation that maps the points in one image to the corresponding points in another image, provided these points lie on the same plane. An example is presented in Figure 2, where a blue square in the image on the left is mapped to another view of the same blue square, taken from a different perspective.
❖ Ex. 5.3 Read the source image casa.jpg and the destination image books.jpg. Identify the 4 corners of the book in the source image (zoom out) and build a numpy array srcPts with their coordinates. Repeat for the destination image to obtain dstPts. Calculate the perspective transform matrix that relates the 2 sets of points using findHomography, then warp the source image to the destination based on the estimated homography matrix. Display all 3 images: source, destination and output.
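A sketch of the main calls for this exercise (the corner coordinates below are placeholders and must be replaced with the values read off the two images):

import cv2
import numpy as np

imSrc = cv2.imread("casa.jpg")
imDst = cv2.imread("books.jpg")

# Placeholder corner coordinates; replace with the corners you identify
srcPts = np.float32([[100, 100], [400, 120], [420, 500], [90, 480]])
dstPts = np.float32([[50, 60], [300, 70], [310, 420], [40, 400]])

# Perspective transform relating the two sets of points
h, mask = cv2.findHomography(srcPts, dstPts)

# Warp the source image onto the destination image plane
height, width = imDst.shape[:2]
imOut = cv2.warpPerspective(imSrc, h, (width, height))

cv2.imshow("source", imSrc)
cv2.imshow("destination", imDst)
cv2.imshow("output", imOut)
cv2.waitKey(0)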
5.3. Feature Matching

1. Brute-Force matcher – This matcher is simple: it takes the descriptor of one feature in the first set and matches it against all the features in the second set using some distance calculation, and the closest one is returned. For the BF matcher, it is necessary to create the BFMatcher object using cv2.BFMatcher(). Once the matcher is created, 2 important methods can be used for matching:
- BFMatcher.match() - returns the best match, or
- BFMatcher.knnMatch() - returns the k best matches, where k is specified by the user. It is useful when additional work needs to be done on the candidates. A minimal sketch of brute-force matching is shown below.
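A minimal sketch of brute-force matching, assuming ORB keypoints and descriptors (ORB produces binary descriptors, so the Hamming distance is used):

import cv2

img1 = cv2.imread("boy1.jpg", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("boy2.jpg", cv2.IMREAD_GRAYSCALE)

orb = cv2.ORB_create()
kp1, des1 = orb.detectAndCompute(img1, None)
kp2, des2 = orb.detectAndCompute(img2, None)

# Hamming distance suits binary descriptors; crossCheck keeps only
# matches that are mutual best candidates
bf = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = bf.match(des1, des2)
matches = sorted(matches, key=lambda m: m.distance)   # best matches first

imMatches = cv2.drawMatches(img1, kp1, img2, kp2, matches[:20], None)
cv2.imshow("matches", imMatches)
cv2.waitKey(0)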
2. FLANN Matcher – FLANN stands for Fast Library for Approximate Nearest Neighbors. It contains a collection of algorithms optimized for fast nearest-neighbor search in large datasets and for high-dimensional features. It works faster than BFMatcher for large datasets.
In order to use a FLANN-based matcher, we need to specify the algorithm to be used and its related parameters. These are:
IndexParams – Specifies the algorithm to be used. For algorithms like SIFT and SURF, use a KD-tree index (FLANN_INDEX_KDTREE = 1). An example parameter set, following the OpenCV tutorials, is:
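FLANN_INDEX_KDTREE = 1
index_params = dict(algorithm=FLANN_INDEX_KDTREE, trees=5)   # 5 randomized KD-trees is a common default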
SearchParams – Specifies the number of times the trees in the index should be recursively traversed. Higher values give better precision, but also take more time. To change the value, pass search_params = dict(checks=100).
flann = cv2.FlannBasedMatcher(index_params, search_params) creates the matcher object flann. Then flann.match() returns the best match, while flann.knnMatch() returns the k best matches.
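Putting the pieces together, a sketch of FLANN-based matching with SIFT descriptors (the ratio-test threshold of 0.7 is a common choice, not prescribed by this laboratory):

import cv2

img1 = cv2.imread("boy1.jpg", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("boy2.jpg", cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(img1, None)
kp2, des2 = sift.detectAndCompute(img2, None)

FLANN_INDEX_KDTREE = 1
index_params = dict(algorithm=FLANN_INDEX_KDTREE, trees=5)
search_params = dict(checks=100)

flann = cv2.FlannBasedMatcher(index_params, search_params)
matches = flann.knnMatch(des1, des2, k=2)

# Keep a match only if it is clearly better than the second-best candidate
good = [m for m, n in matches if m.distance < 0.7 * n.distance]
print(len(good), "good matches")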
5.3.1 RANSAC
RANSAC stands for Random Sample Consensus, and it is a powerful model-fitting algorithm because it keeps working in the presence of a large number of outliers. Suppose there is a sample data set as presented in Figure 3.a) and it is necessary to fit a line through it. The samples are affected by noise, since not all the points lie exactly on the line. The classical approach is to identify the line that minimizes the sum of the squared distances of the points from the line; this method is called least squares.
Let us now suppose the data set also contains an outlier, as illustrated in Figure 3.b). Applying least squares directly, the outlier will ruin the estimate of the fitted line. RANSAC is designed to handle exactly these cases.
Figure 3. Fitting a set of data samples: a) samples affected by noise; b) samples including an outlier.
Step 1. Select 2 data points at random and fit a line through them, as illustrated in Figure 4.

Figure 4. Random selection of 2 data points to fit a line.
Step 2. Identify inliers. Find all the points that agree with the line fitted in Figure 4.b). The points located close to the fitted line are considered inliers (the consensus set) and are all marked with green dots in Figure 5; the outliers are displayed in red. In this example there are 10 inliers and 6 outliers.
Step 3. Repeat Step 1 and Step 2 for another pair of random points and fit another line through them. Suppose the selection is the one presented in Figure 6.a). The model is not good this time (judging by the count of inliers versus outliers), so the previously fitted model is kept as the best one so far, while the current one is discarded.
Figure 6. Repeating the random selection: a) a poor model; b) further repetitions; c) the model with the largest number of inliers.
Repeating this process multiple times (see Figure 6.b)) eventually leads to a model with a large number of inliers, such as the one in Figure 6.c). At each repetition, the model with the largest number of inliers so far is retained.
Step 4. Line fit using only inliers. The best model from Step 3 is then refined by fitting a new line using only the group of inliers.
Being robust to outliers, this technique is also often used in 3D computer vision applications.
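The whole procedure can be condensed into a short numpy sketch for the 2D line-fitting example (the data, the number of trials and the inlier threshold are arbitrary choices):

import numpy as np

rng = np.random.default_rng(0)

# Toy data: points near the line y = 2x + 1, plus a few gross outliers
x = np.linspace(0, 10, 20)
y = 2 * x + 1 + rng.normal(0, 0.2, x.size)
y[[3, 8, 15]] += 15

best_inliers = None
for _ in range(100):                                  # repeat Steps 1-2
    i, j = rng.choice(x.size, 2, replace=False)       # Step 1: 2 random points
    if x[i] == x[j]:
        continue                                      # vertical line, skip
    a = (y[j] - y[i]) / (x[j] - x[i])                 # candidate line y = ax + b
    b = y[i] - a * x[i]
    inliers = np.abs(y - (a * x + b)) < 1.0           # Step 2: consensus set
    if best_inliers is None or inliers.sum() > best_inliers.sum():
        best_inliers = inliers                        # Step 3: keep the best model

# Step 4: least-squares refit using only the inliers
a, b = np.polyfit(x[best_inliers], y[best_inliers], 1)
print("fitted line: y = %.2f x + %.2f" % (a, b))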
5.3.2 Image Alignment

Figure 7. Left: original form. Center: filled-out form photographed using a smartphone. Right: result of aligning the filled-out form to the original template.
Image alignment (or image registration) is the technique of warping one image so that the features in the two images line up perfectly. It is also needed in some medical applications, where multiple scans of the same tissue may be taken at slightly different times and the two images must be registered.
Step 3. Match the features in the two images; a brute-force matcher with the Hamming distance is appropriate for binary descriptors such as ORB.

# Match features.
matcher = cv2.DescriptorMatcher_create(cv2.DESCRIPTOR_MATCHER_BRUTEFORCE_HAMMING)
matches = matcher.match(descriptors1, descriptors2, None)
Step 4. Compute the homography transform using the corresponding points from the 2 images. Automatic
feature matching does not always produce 100% accurate matches and it is not uncommon for 20-30%
of the matches to be incorrect. Fortunately, the findHomography method utilizes RANSAC, which
produces the right result even in the presence of a large number of bad matches. Print the computed
homography transform.
# Find homography
h, mask = cv2.findHomography(points1, points2, cv2.RANSAC)
Step 5. Warp the filled-out form image, using the homography calculated previously. The homography
transform should be applied to all pixels in the filled-out form to map them to the template image.
Display the final registered image.
# Use homography
height, width, channels = imTemplate.shape
imRegistered = cv2.warpPerspective(imFilledForm, h, (width, height))
5.3.3 Panorama
In panorama creation, multiple computer vision techniques are involved. The basic principle is to align the images using a homography and stitch them intelligently so that the seams are not visible. The technique is called “feature-based” image alignment because a sparse set of features is detected in one image and matched with the features in the other image. A perspective transformation (homography) is calculated based on these matched features, and this transform is used to warp one image onto the other. We have already illustrated this in the last application. Once the second image is aligned with respect to the first image, we simply stitch the first image with the aligned second image to obtain a panorama.
❖ Ex. 5.5 Follow the steps to create a simple panorama from 2 images:
Step 1. Read the input color images boy1.jpg and boy2.jpg (or house1.jpg and house2.jpg). Compute the keypoints and descriptors for both images using the ORB algorithm. Display the keypoints detected in the first image.
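A sketch for this step (using boy1.jpg and boy2.jpg; the images are converted to grayscale before detection):

import cv2

im1 = cv2.imread("boy1.jpg")
im2 = cv2.imread("boy2.jpg")

im1Gray = cv2.cvtColor(im1, cv2.COLOR_BGR2GRAY)
im2Gray = cv2.cvtColor(im2, cv2.COLOR_BGR2GRAY)

orb = cv2.ORB_create()
keypoints1, descriptors1 = orb.detectAndCompute(im1Gray, None)
keypoints2, descriptors2 = orb.detectAndCompute(im2Gray, None)

# Display the keypoints detected in the first image
imKeypoints = cv2.drawKeypoints(im1, keypoints1, None, color=(0, 255, 0))
cv2.imshow("keypoints", imKeypoints)
cv2.waitKey(0)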
Step 2. Match the corresponding points using a DescriptorMatcher. Sort the matches by score and take only the top 15% as corresponding points for the next step (for DMatch objects, a lower distance means a better match). Display the matches obtained from the DescriptorMatcher.
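Continuing the sketch from Step 1 (the best matches have the smallest distances; the last two lines extract the point locations needed for the homography in the next step):

import numpy as np

matcher = cv2.DescriptorMatcher_create(cv2.DESCRIPTOR_MATCHER_BRUTEFORCE_HAMMING)
matches = list(matcher.match(descriptors1, descriptors2, None))

matches.sort(key=lambda m: m.distance)        # best (smallest distance) first
numGoodMatches = int(len(matches) * 0.15)     # keep the top 15%
matches = matches[:numGoodMatches]

imMatches = cv2.drawMatches(im1, keypoints1, im2, keypoints2, matches, None)
cv2.imshow("matches", imMatches)
cv2.waitKey(0)

# Locations of the good matches, used by findHomography below
points1 = np.float32([keypoints1[m.queryIdx].pt for m in matches])
points2 = np.float32([keypoints2[m.trainIdx].pt for m in matches])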
Step 3. Compute the homography that maps the points of the second image onto the corresponding points of the first image, using RANSAC to reject the bad matches.

# Find homography
h, mask = cv2.findHomography(points2, points1, cv2.RANSAC)
Step 4. Apply the perspective transformation h to all pixels in one image to map it to the other image. This is done using the warpPerspective function in OpenCV. When calling the warp function, specify a non-default size for the output image: for stitching the images horizontally, the width of the output image should be the sum of the widths of both images, while the height stays the same as that of either image.
# Use homography
im1Height, im1Width, channels = im1.shape
im2Height, im2Width, channels = im2.shape
# Warp the second image into the frame of the first; the output canvas
# is wide enough to hold both images side by side
im2Aligned = cv2.warpPerspective(im2, h, (im1Width + im2Width, im1Height))
Step 5. Stitch the first image to the second image aligned in Step 4. Stitching can be a simple concatenation.
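A minimal way to do the concatenation, continuing the sketch from Step 4 (the warped second image already has the combined width, so the first image is simply pasted over its left side):

import numpy as np

panorama = np.copy(im2Aligned)
panorama[0:im1Height, 0:im1Width] = im1   # paste im1 over the left part

cv2.imshow("panorama", panorama)
cv2.waitKey(0)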