
Fundamentals of Image Processing and Computer Vision – Laboratory 5

Laboratory 5. Feature detection and content descriptors for matching applications

Laboratory 5. Feature detection and content descriptors for matching applications
5.1 Affine transforms
5.2. Homography
5.3 Feature matching
5.3.1 RANSAC
5.3.2 Image alignment using homography
5.3.3 Panorama

“Image features are interesting points in an image, also called interest points or key points. They can be useful in multiple computer vision applications, such as image alignment or image matching.”

5.1 Affine transforms


An affine transform is a combination of translation, rotation, scale and shear, illustrated in Figure
1 for a simple triangle shape.

Figure 1. Examples of translation, rotation, scale and shear transforms, applied to a triangle.

For a point in Cartesian coordinates $(x, y)$, translation is defined as multiplication by the identity matrix followed by the addition of a displacement vector $(\delta_x, \delta_y)$ (a shift). Similar formulas define the other operations; each relates the starting coordinates $(x, y)$ to the final coordinates after the transform is applied:
Translation: $\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} + \begin{bmatrix} \delta_x \\ \delta_y \end{bmatrix} = \begin{bmatrix} x + \delta_x \\ y + \delta_y \end{bmatrix}$

Scale: $\begin{bmatrix} s_x & 0 \\ 0 & s_y \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} x s_x \\ y s_y \end{bmatrix}$


Rotation: $\begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} x\cos\theta - y\sin\theta \\ x\sin\theta + y\cos\theta \end{bmatrix}$

Shear: $\begin{bmatrix} 1 & k \\ 0 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} x + ky \\ y \end{bmatrix}$
When multiple transforms are applied to the starting point, for example scale, shear and translation, the transformation matrices are simply applied one after the other:
$\begin{bmatrix} 1 & k \\ 0 & 1 \end{bmatrix} \left( \begin{bmatrix} s_x & 0 \\ 0 & s_y \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} \right) + \begin{bmatrix} \delta_x \\ \delta_y \end{bmatrix} = \begin{bmatrix} x s_x + k y s_y + \delta_x \\ y s_y + \delta_y \end{bmatrix}$
An affine transform applies rotation, shear, scale and translation at the same time:

$\begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} \begin{bmatrix} 1 & k \\ 0 & 1 \end{bmatrix} \begin{bmatrix} s_x & 0 \\ 0 & s_y \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} + \begin{bmatrix} \delta_x \\ \delta_y \end{bmatrix}$

$= \begin{bmatrix} s_x \cos\theta & k s_y \cos\theta - s_y \sin\theta \\ s_x \sin\theta & k s_y \sin\theta + s_y \cos\theta \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} + \begin{bmatrix} \delta_x \\ \delta_y \end{bmatrix}$

$= \begin{bmatrix} s_x \cos\theta & k s_y \cos\theta - s_y \sin\theta & \delta_x \\ s_x \sin\theta & k s_y \sin\theta + s_y \cos\theta & \delta_y \end{bmatrix} \begin{bmatrix} x \\ y \\ 1 \end{bmatrix}$
So, an affine transform is a linear operation represented as a 2×3 matrix:

$\begin{bmatrix} a & b & \delta_x \\ c & d & \delta_y \end{bmatrix}$
Given two sets of 3 non-collinear points each, $S_1 = \{(x_{11}, y_{11}), (x_{12}, y_{12}), (x_{13}, y_{13})\}$ and $S_2 = \{(x_{21}, y_{21}), (x_{22}, y_{22}), (x_{23}, y_{23})\}$, there exists a unique affine transform that maps the 1st set of points onto the 2nd set. The inverse of that affine transform matrix maps the 2nd set of points back onto the 1st.
Under an affine transform, parallel lines in the original image remain parallel, but lines that were perpendicular to each other are in general no longer at 90° in the transformed image. Orthogonality is not preserved by an affine transform, due to the shear component.
Affine transforms do not include perspective distortion. An affine transform can change a square into a parallelogram (for example, through shear), but it cannot change it into an arbitrary quadrilateral. For that type of shape modification we need to introduce another type of transform, named homography. In OpenCV, an affine transform can be applied to an image using the function cv2.warpAffine:

dst = cv2.warpAffine(src, M, dsize[, dst[, flags[, borderMode[, borderValue]]]])

with the following parameters:

- src: input image.
- dst: output image that has the size dsize and the same type as src.
- M: 2×3 transformation matrix.
- dsize: size of the output image.
- flags: combination of interpolation methods (see InterpolationFlags) and the optional flag WARP_INVERSE_MAP, which means that M is the inverse transformation (dst→src).
- borderMode: pixel extrapolation method (see BorderTypes); when borderMode=BORDER_TRANSPARENT, the pixels in the destination image corresponding to the "outliers" in the source image are not modified by the function.
- borderValue: value used in case of a constant border; by default, it is 0.

Given two images related by an affine transform and knowing the locations of at least 3 corresponding points in the source and destination images, we can recover the affine transform between them using the function cv2.estimateAffine2D. Details about the function parameters can be found in the OpenCV documentation.

retval, inliers = cv2.estimateAffine2D(from, to[, inliers[, method[, ransacReprojThreshold[, maxIters[, confidence[, refineIters]]]]]])

❖ Ex. 5.1 Build a test image (float32) containing a green square on a black background. Apply the
following affine transforms, analyze the output images and identify the types of operations used
(rotation, translation, shear and/or scaling).
$\begin{bmatrix} 1 & 0 & 25 \\ 0 & 1 & 25 \end{bmatrix}$;  $\begin{bmatrix} 2 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix}$;  $\begin{bmatrix} \cos\theta & \sin\theta & 0 \\ -\sin\theta & \cos\theta & 0 \end{bmatrix}$;  $\begin{bmatrix} 1 & 0.1 & 0 \\ 0 & 1 & 0 \end{bmatrix}$
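A minimal sketch of this exercise, assuming a 300×300 test image with a 150×150 green square (the image size, square size and rotation angle are illustrative choices):

import cv2
import numpy as np

# Test image (float32): green square on a black background
img = np.zeros((300, 300, 3), dtype=np.float32)
img[75:225, 75:225] = (0.0, 1.0, 0.0)  # BGR green

theta = np.deg2rad(10)  # illustrative rotation angle
matrices = [
    np.float32([[1, 0, 25], [0, 1, 25]]),               # 1st matrix
    np.float32([[2, 0, 0], [0, 1, 0]]),                 # 2nd matrix
    np.float32([[np.cos(theta), np.sin(theta), 0],
                [-np.sin(theta), np.cos(theta), 0]]),   # 3rd matrix
    np.float32([[1, 0.1, 0], [0, 1, 0]]),               # 4th matrix
]

# Apply each transform and display the result for visual inspection
for i, M in enumerate(matrices):
    out = cv2.warpAffine(img, M, (img.shape[1], img.shape[0]))
    cv2.imshow(f"transform {i + 1}", out)
cv2.waitKey(0)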
❖ Ex. 5.2 Use the same input image of the square and consider the coordinates of 3 corners: (50,50), (50,
149) and (149, 50). In the destination image, these points must be located at (74, 50), (83,170), (192,
29) respectively. Use the function estimateAffine2D to calculate the matrix and then apply the
estimated transform to the original image of the square. Compare the result to the one obtained by
specifying all 4 corners (original position (149, 149) will move to (183, 135)). The true affine matrix is
known to be:
$\begin{bmatrix} 1.0832 & 0.0826 & 10 \\ -0.1910 & 1.1023 & 0 \end{bmatrix}$

Does the estimate get closer to the true affine matrix when more than 3 points are used?
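A minimal sketch of the estimation step, reusing the square image img from Ex. 5.1 (the coordinates are the ones given above):

import cv2
import numpy as np

# 3 corresponding corners (source -> destination)
srcPts = np.float32([[50, 50], [50, 149], [149, 50]])
dstPts = np.float32([[74, 50], [83, 170], [192, 29]])

# Estimate the affine transform from the point correspondences
M, inliers = cv2.estimateAffine2D(srcPts, dstPts)
print(M)  # compare with the true affine matrix above

# Apply the estimated transform to the original square image
warped = cv2.warpAffine(img, M, (img.shape[1], img.shape[0]))

For the 4-point variant, append (149, 149) to srcPts and (183, 135) to dstPts and re-run the estimation.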

5.2. Homography
Homography (or perspective transform) is a transformation that maps the points in one image to the corresponding points in another image, provided these points lie on the same plane. An example is presented in Figure 2, where a blue square in the image on the left is mapped to another view of the same blue square, taken from a different perspective.


Figure 2. A homography transform.

A homography can be described by a 3×3 matrix:

$H = \begin{bmatrix} h_{11} & h_{12} & h_{13} \\ h_{21} & h_{22} & h_{23} \\ h_{31} & h_{32} & h_{33} \end{bmatrix}$

It maps the point with coordinates $(x, y)$ in the left image to the point with coordinates $(u, v)$ in the right image according to the formula (in homogeneous coordinates, up to a scale factor):

$\begin{bmatrix} u \\ v \\ 1 \end{bmatrix} \sim H \begin{bmatrix} x \\ y \\ 1 \end{bmatrix} = \begin{bmatrix} h_{11} & h_{12} & h_{13} \\ h_{21} & h_{22} & h_{23} \\ h_{31} & h_{32} & h_{33} \end{bmatrix} \begin{bmatrix} x \\ y \\ 1 \end{bmatrix}$
Given a photograph of a plane taken from 2 different locations, all the points on that plane in one image are related to the corresponding points in the other image through a homography. Homography is used in camera calibration, 3D reconstruction techniques, document alignment and in building panoramas. Compared to an affine transform, a homography has more degrees of freedom (8 instead of 6). Accordingly, to find the best affine transform that relates 2 images we need a minimum of 3 pairs of corresponding points between those images, while to identify the best homography we need at least 4 pairs of corresponding points. A homography preserves only straight lines, while an affine transform also preserves parallelism.
In OpenCV we can calculate a homography between different images using the function
findHomography, and then we can use the function cv2.warpPerspective to align the 2 images using
the estimated homography. At least 4 corresponding points in the source and destination images are
necessary to find a homography matrix h. The parameter size represents the size (width, height) of
im_dst.

h, status = cv2.findHomography(pts_src, pts_dst)


im_dst = cv2.warpPerspective(im_src, h, size)

❖ Ex. 5.3 Read the source image casa.jpg and the destination image books.jpg. Identify the 4 corners of the book in the source image (zoom in if needed to read their coordinates) and build a numpy array srcPts with their coordinates. Repeat for the destination image and obtain dstPts. Calculate the perspective transform matrix that relates
the 2 sets of points using findHomography, then warp the source image to the destination based on
the estimated homography matrix. Display all 3 images, source, destination and output.
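A minimal sketch of this exercise (the corner coordinates below are placeholders to be replaced with the values read from the images):

import cv2
import numpy as np

im_src = cv2.imread("casa.jpg")
im_dst = cv2.imread("books.jpg")

# Corner coordinates (placeholders; use the real values read from the images)
srcPts = np.float32([[0, 0], [640, 0], [640, 480], [0, 480]])
dstPts = np.float32([[100, 50], [500, 80], [520, 400], [90, 430]])

# Perspective transform relating the two sets of points
h, status = cv2.findHomography(srcPts, dstPts)

# Warp the source image onto the destination image plane
size = (im_dst.shape[1], im_dst.shape[0])  # (width, height) of im_dst
output = cv2.warpPerspective(im_src, h, size)

cv2.imshow("source", im_src)
cv2.imshow("destination", im_dst)
cv2.imshow("output", output)
cv2.waitKey(0)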

5.3 Feature matching


In the previous laboratory we found the best keypoints (or features) in an image using SIFT or ORB. Those features can be matched across different images, and there are 3 types of feature matching methods available in OpenCV:

1. Brute-Force matcher – This matcher is simple. It takes the descriptor of one feature in the first set and matches it with all the features in the second set, using some distance calculation; the closest one is returned. For the BF matcher, it is necessary to create the BFMatcher object using cv2.BFMatcher(). Once the matcher is created, 2 important methods can be used for matching (a short sketch follows below):
- BFMatcher.match() - returns the best match, or
- BFMatcher.knnMatch() - returns the k best matches, where k is specified by the user. It is useful when additional work needs to be done on the matches (for example, a ratio test).
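A minimal sketch of brute-force matching for ORB descriptors, assuming descriptors des1 and des2 have already been computed (NORM_HAMMING suits ORB's binary descriptors):

import cv2

# Brute-force matcher for binary descriptors; crossCheck keeps only
# mutually-best matches
bf = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = bf.match(des1, des2)

# Best matches first (lower distance is better)
matches = sorted(matches, key=lambda m: m.distance)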

2. Descriptor Matcher – It is similar in functionality to the BFMatcher class. cv2.DescriptorMatcher_create can be used to create the matcher object; more details can be found in the OpenCV documentation. The Descriptor Matcher object (let it be denoted DMatcher) contains the method match:
matches = DMatcher.match(queryDescriptor, trainDescriptor[, mask])
The method DMatcher.match() finds the best match for each descriptor from the query set. An optional mask (or masks) can be passed to specify which query and training descriptors can be matched.
The output matches is a list of match objects, each with the following attributes:
- distance - distance between descriptors; lower means a better match.
- trainIdx - index of the descriptor in the train descriptors (corresponds to points in image 2).
- queryIdx - index of the descriptor in the query descriptors (corresponds to points in image 1).
- imgIdx - index of the train image.

3. FLANN Matcher – FLANN stands for Fast Library for Approximate Nearest Neighbors. It
contains a collection of algorithms optimized for fast nearest neighbor search in large datasets and for high
dimensional features. It works faster than BFMatcher for large datasets.
In order to use a FLANN-based matcher, we need to specify the algorithm to be used and its related parameters. The parameters are:


IndexParams – Specifies the algorithm to be used. For algorithms like SIFT and SURF, use a KD-tree index (FLANN_INDEX_KDTREE = 1). An example parameter set is:

index_params = dict(algorithm = FLANN_INDEX_KDTREE, trees = 5)

For ORB, use FLANN_INDEX_LSH = 6.

index_params = dict(algorithm = FLANN_INDEX_LSH, table_number = 6, key_size = 12, multi_probe_level = 1)

SearchParams – Specifies the number of times the trees in the index should be recursively traversed. Higher values give better precision but also take more time. To change the value, pass
search_params = dict(checks=100)
flann = cv2.FlannBasedMatcher(index_params, search_params) can be used to create the matcher object flann.
flann.match() returns the best match, while flann.knnMatch() returns the k best matches. A short sketch follows below.
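A minimal sketch of FLANN-based matching for ORB descriptors, again assuming des1 and des2 have already been computed (the LSH parameters are the ones suggested above; the 0.7 ratio is an illustrative choice):

import cv2

FLANN_INDEX_LSH = 6  # LSH index, suitable for binary descriptors (ORB)
index_params = dict(algorithm = FLANN_INDEX_LSH, table_number = 6,
                    key_size = 12, multi_probe_level = 1)
search_params = dict(checks = 100)

flann = cv2.FlannBasedMatcher(index_params, search_params)

# k best matches per query descriptor, filtered with a ratio test
pairs = flann.knnMatch(des1, des2, k=2)
good = []
for pair in pairs:
    # with LSH, some queries may return fewer than k matches
    if len(pair) == 2 and pair[0].distance < 0.7 * pair[1].distance:
        good.append(pair[0])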

5.3.1 RANSAC
RANSAC stands for Random Sample Consensus, and it is a powerful model-fitting algorithm because it works in the presence of a large number of outliers. Suppose there is a sample data set, as presented in Figure 3.a), and it is necessary to fit a line through it. The samples are affected by noise, since not all the points lie on the line. One idea is to identify the line that minimizes the sum of the (squared) distances of the points from the line; this method is called least squares.
Let us now suppose the data set also contains an outlier, as illustrated in Figure 3.b). Applied directly, least squares lets the outlier ruin the estimate of the fitted line. RANSAC is designed to handle such cases.

Figure 3. Fitting a set of data samples: a) noisy samples; b) noisy samples with an outlier.

RANSAC assumes the following steps:


Step 1. Selection of data points. Randomly select the minimum number of data points needed to fit the model. For a line, the minimum number of data points is 2. Suppose the random points are those presented in green in Figure 4.a), while the fitted line is displayed in Figure 4.b).


Figure 4. Random selection of 2 data points to fit a line: a) the selected points; b) the fitted line.

Step 2. Identify inliers. Find all the points that agree with the fitted line from Figure 4.b). The
points located close to the fitted line are considered inliers or consensus set and are all marked with green
dots in Figure 5. The outliers are displayed in red. In this example there are 10 inliers and 6 outliers.

Figure 5. Inliers and outliers for a fitted model

Step 3. Repeat Step 1 and Step 2 for another pair of random points and fit another line through them. Suppose the selection is the one presented in Figure 6.a). The model is not good this time (it has fewer inliers than the previous one), so the previous fitted model is kept as the best one so far, while the current one is discarded.

Figure 6. a) A poor random selection; b) further repetitions; c) the final model with a large number of inliers.

Repeating this process multiple times (see Figure 6.b)) eventually leads to a model that has a large number of inliers, like the one in Figure 6.c). Across the repetitions, the model with the largest number of inliers is kept.


Step 4. Line fit using only inliers. The best model from Step 3 is then refined by fitting a new line using only the group of inliers. A sketch of the whole procedure follows below.

Being robust to outliers, this technique is also often used in 3D computer vision applications.
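A minimal NumPy sketch of the four steps for line fitting (the threshold and iteration count are illustrative; this is not OpenCV's internal implementation):

import numpy as np

def ransac_line(points, n_iters=100, threshold=1.0, seed=0):
    # Fit a line y = a*x + b to an (N, 2) array of points, robust to outliers
    rng = np.random.default_rng(seed)
    best_inliers = np.zeros(len(points), dtype=bool)
    for _ in range(n_iters):
        # Step 1: randomly select the minimum number of points (2 for a line)
        i, j = rng.choice(len(points), size=2, replace=False)
        (x1, y1), (x2, y2) = points[i], points[j]
        if x1 == x2:
            continue  # skip vertical candidate lines in this simple sketch
        a = (y2 - y1) / (x2 - x1)
        b = y1 - a * x1
        # Step 2: the inliers (consensus set) are the points close to the line
        dist = np.abs(a * points[:, 0] - points[:, 1] + b) / np.sqrt(a * a + 1)
        inliers = dist < threshold
        # Step 3: keep the model with the largest number of inliers
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    # Step 4: refit with least squares using only the inliers
    a, b = np.polyfit(points[best_inliers, 0], points[best_inliers, 1], 1)
    return a, b, best_inliers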

5.3.2 Image alignment using homography


In some applications, 2 images of the same document need to be aligned (a feature in image 1 is located at different coordinates in image 2), as is the case with the documents presented in Figure 7.

Figure 7. Left: original form. Center: filled-out form photographed using a smartphone. Right: result of aligning the filled-out form to the original template.

Image alignment (or image registration) is the technique of warping one image so that the features in the two images line up perfectly. It is also necessary in some medical applications, where multiple scans of a tissue may be taken at slightly different times and the two images need to be registered.

❖ Ex. 5.4 Follow the steps to complete an image registration task.


Step 1. Read the input images containing the template form (adev.jpg) and the filled-out form (adev_scan.jpg). Convert both images to grayscale and display them.
Step 2. Detect ORB features in the two images and also compute the corresponding descriptors. In theory we need at least 4 features to compute the homography, but in practice hundreds of features are detected in the two images. Set the number of features to 500 using the parameter MAX_FEATURES in the Python code; a sketch of Steps 1 and 2 follows below.
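A minimal sketch of Steps 1 and 2 (image 1 is the filled-out form, so that the homography computed in Step 4 maps it onto the template):

import cv2

MAX_FEATURES = 500

# Step 1: read both images and convert them to grayscale
imFilledForm = cv2.imread("adev_scan.jpg")
imTemplate = cv2.imread("adev.jpg")
gray1 = cv2.cvtColor(imFilledForm, cv2.COLOR_BGR2GRAY)
gray2 = cv2.cvtColor(imTemplate, cv2.COLOR_BGR2GRAY)

# Step 2: detect ORB keypoints and compute their descriptors
orb = cv2.ORB_create(MAX_FEATURES)
keypoints1, descriptors1 = orb.detectAndCompute(gray1, None)
keypoints2, descriptors2 = orb.detectAndCompute(gray2, None)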
Step 3. Match the features in the 2 images using a DescriptorMatcher. Sort them by goodness of match and keep only a small percentage of the original matches (set GOOD_MATCH_PERCENT to 0.15). Use the Hamming distance as the measure of similarity between two feature descriptors computed with ORB. Display the matched features using cv2.drawMatches. Are all matches correct?

# Match features.
matcher = cv2.DescriptorMatcher_create(cv2.DESCRIPTOR_MATCHER_BRUTEFORCE_HAMMING)
matches = matcher.match(descriptors1, descriptors2, None)

# Sort matches by score (recent OpenCV versions return a tuple, so use sorted())
matches = sorted(matches, key=lambda x: x.distance)

# Remove not-so-good matches
numGoodMatches = int(len(matches) * GOOD_MATCH_PERCENT)
matches = matches[:numGoodMatches]

Step 4. Compute the homography transform using the corresponding points from the 2 images. Automatic feature matching does not always produce 100% accurate matches; it is not uncommon for 20-30% of the matches to be incorrect. Fortunately, the findHomography method can use RANSAC (pass cv2.RANSAC as the method), which produces the right result even in the presence of a large number of bad matches. Print the computed homography transform.

# Extract location of good matches


points1 = np.zeros((len(matches), 2), dtype=np.float32)
points2 = np.zeros((len(matches), 2), dtype=np.float32)

for i, match in enumerate(matches):


points1[i, :] = keypoints1[match.queryIdx].pt
points2[i, :] = keypoints2[match.trainIdx].pt

# Find homography
h, mask = cv2.findHomography(points1, points2, cv2.RANSAC)

Step 5. Warp the filled-out form image, using the homography calculated previously. The homography
transform should be applied to all pixels in the filled-out form to map them to the template image.
Display the final registered image.

# Use homography
height, width, channels = imTemplate.shape
imRegistered = cv2.warpPerspective(imFilledForm, h, (width, height))

5.3.3 Panorama
In panorama creation there are multiple computer vision techniques involved. The basic principle is to align the images using a homography and to stitch them intelligently so that the seams are not visible. The technique is called “feature based” image alignment because a sparse set of features is detected in one image and matched with the features in the other image. A perspective transformation (homography) is calculated based on these matched features, and this transform is used to warp one image onto the other. We have already illustrated this in the previous application. Once the second image is aligned with respect to the first image, we simply stitch the first image with the aligned second image to obtain a panorama.


❖ Ex. 5.5 Follow the steps to create a simple panorama from 2 images:
Step 1. Read the input color images boy1.jpg and boy2.jpg (or house1.jpg and house2.jpg). Compute the
keypoints and descriptors for both images using the algorithm ORB. Display the keypoints located in
the first image.
Step 2. Match the corresponding points using a DescriptorMatcher. Sort the matches by distance (ascending, so that the best matches come first) and take only the top 15% as corresponding points for the next step. Display the matches obtained from the DescriptorMatcher; a sketch of Steps 1 and 2 follows after Figure 8.

Figure 8. Best matchings between images
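A minimal sketch of Steps 1 and 2, assuming boy1.jpg and boy2.jpg (the 500-feature limit and the 15% threshold mirror the previous exercise):

import cv2

# Step 1: read the images and compute ORB keypoints and descriptors
im1 = cv2.imread("boy1.jpg")
im2 = cv2.imread("boy2.jpg")
gray1 = cv2.cvtColor(im1, cv2.COLOR_BGR2GRAY)
gray2 = cv2.cvtColor(im2, cv2.COLOR_BGR2GRAY)

orb = cv2.ORB_create(500)
keypoints1, descriptors1 = orb.detectAndCompute(gray1, None)
keypoints2, descriptors2 = orb.detectAndCompute(gray2, None)

# Step 2: match, sort by distance (best first), keep the top 15%
matcher = cv2.DescriptorMatcher_create(cv2.DESCRIPTOR_MATCHER_BRUTEFORCE_HAMMING)
matches = sorted(matcher.match(descriptors1, descriptors2, None),
                 key=lambda m: m.distance)
matches = matches[:int(len(matches) * 0.15)]

imMatches = cv2.drawMatches(im1, keypoints1, im2, keypoints2, matches, None)
cv2.imshow("matches", imMatches)
cv2.waitKey(0)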

Step 3. Compute the homography using findHomography and RANSAC.

points1 = np.zeros((len(matches), 2), dtype=np.float32)


points2 = np.zeros((len(matches), 2), dtype=np.float32)

for i, match in enumerate(matches):


points1[i, :] = keypoints1[match.queryIdx].pt
points2[i, :] = keypoints2[match.trainIdx].pt

# Find homography
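# (points2 -> points1, so h maps image 2 onto image 1's frame)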
h, mask = cv2.findHomography(points2, points1, cv2.RANSAC)

Step 4. Apply the perspective transformation h to all pixels in one image to map it to the other image. This is done using the warpPerspective function in OpenCV. When using the warp function, specify a different size for the output image (not the default size). For stitching the images horizontally, the width of the output image should be the sum of the widths of both images; keep the height the same as either of them.

# Use homography
im1Height, im1Width, channels = im1.shape
im2Height, im2Width, channels = im2.shape

im2Aligned = cv2.warpPerspective(im2, h,(im2Width + im1Width, im2Height))


Display the second image aligned to the first image.

Step 5. Stitch the first image to the second image aligned in Step 4. Stitching can be a simple concatenation.

# Stitch Image 1 with aligned image 2


stitchedImage = np.copy(im2Aligned)
stitchedImage[0:im1Height,0:im1Width] = im1

Display the output image.

Figure 9. The 2 images combined into a panorama

