CV Assignment 2 – Object Recognition & AR
In this assignment, we will practice object recognition and robust model fitting with RANSAC for homography estimation. The goal is to locate a given reference object in another image of a scene that contains it (possibly among many other objects). Applications include augmented reality and robotics, where we want a camera-equipped robot to recognize and locate a given object in space in order to grasp it. We will start with planar and very salient objects, such as a painting. For instance, you will try to locate Van Gogh's painting “Nuit étoilée” (the reference image, shown in the leftmost picture) in other publicly available views of the MoMA museum (first row). We will then fit a homography transformation between the reference view of the painting and its locations in the other views, and replace it with another picture, such as the ESIREM logo (second row of images), as follows:
To do this automatically with the concepts seen in class, we will first detect keypoints, extract features, and then match the points between the different images. From these candidate point correspondences, you will use RANSAC (and implement it yourself) to estimate a transformation between the reference painting view (leftmost image) and the different target views that is robust to wrong matches. Many of the steps of this assignment can be performed with tensors in PyTorch (for you to practice). Please note that you are NOT allowed to use any high-level OpenCV implementation during this exercise (unless stated otherwise). However, you may use them for debugging purposes (verification of results), as well as the tips from the OpenCV tutorials on image matching (https://docs.opencv.org/4.x/dc/dc3/tutorial_py_matcher.html) and homography estimation (https://docs.opencv.org/3.4/d1/de0/tutorial_py_feature_homography.html).
We will mostly use PyTorch tensors and NumPy arrays during this exercise, so do not hesitate to check the definitions in the PyTorch and NumPy tutorials. There are also some helpful “cheat sheets” of basic commands for NumPy and SciPy (linear algebra in Python). One operation we will often need, in particular for visualization, is converting between NumPy arrays and PyTorch tensors (and vice versa), as well as moving tensors between the CPU and the GPU (if available):
import numpy as np
import cv2 as cv
import matplotlib.pyplot as plt
import torch

# Determine device to run on (GPU vs CPU)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print("Running tensors in", device)

# numpy array to torch tensor
def to_torch_tensor(x, device="cpu", dtype=torch.float32, requires_grad=False):
    # return torch.from_numpy(x).to(device)
    return torch.tensor(x, dtype=dtype, requires_grad=requires_grad).to(device)

# torch tensor to numpy array
def to_numpy_array(x):
    return x.detach().cpu().numpy()
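As a quick sanity check, the helpers above can be used as follows (a minimal example with arbitrary values):

pts_np = np.array([[10.0, 20.0], [30.0, 40.0]])  # dummy 2D points as a NumPy array
pts_t = to_torch_tensor(pts_np, device=device)   # NumPy -> torch tensor (possibly on GPU)
pts_back = to_numpy_array(pts_t)                 # torch tensor -> NumPy array
print(pts_t.shape, pts_back.dtype)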
We provide the images in the folder data/, where the reference painting view is named img_ref.jpg.
1. Write a function extract_features that receives an image and returns the list of detected keypoints and the features extracted with SIFT on that image, using SIFT's default parameters (a sketch is given after this list).
2. Display the detected keypoints on the images. Hint: you can use the function cv.drawKeypoints with flags=cv.DRAW_MATCHES_FLAGS_DRAW_RICH_KEYPOINTS.
4. Is there a way of selecting the “best” keypoints? (hint: please look at the fields of cv.KeyPoint). Modify your algorithm to return the N best points and then display the best 1000 points for each image.
5. Now change the SIFT parameters to contrastThreshold=0.02, nfeatures=1000. Are there differences in the number of detected keypoints compared to the default parameters?
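As a starting point, here is a minimal sketch of what extract_features could look like, assuming OpenCV's SIFT interface (cv.SIFT_create) and selecting the N best keypoints by their response field; the optional arguments n_best, contrastThreshold and nfeatures are illustrative choices, not required names:

def extract_features(img, n_best=None, contrastThreshold=None, nfeatures=0):
    # SIFT works on a single-channel image
    gray = cv.cvtColor(img, cv.COLOR_BGR2GRAY)

    # Default parameters unless explicitly overridden (question 5)
    if contrastThreshold is not None:
        sift = cv.SIFT_create(nfeatures=nfeatures, contrastThreshold=contrastThreshold)
    else:
        sift = cv.SIFT_create()

    keypoints, descriptors = sift.detectAndCompute(gray, None)

    # Optionally keep only the N keypoints with the strongest response (question 4)
    if n_best is not None:
        order = np.argsort([-kp.response for kp in keypoints])[:n_best]
        keypoints = [keypoints[i] for i in order]
        descriptors = descriptors[order]

    return keypoints, descriptors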
Once we have detected the keypoints and extracted the features associated with them, we will perform the matching to find corresponding visual features. For each detector/descriptor:
1. Write a function find_matches that receives a pair of images, their corresponding keypoints and descriptors, and returns the found matches. Find the correspondences between the features using brute-force matching (one to all). For SIFT features, the distance between descriptors should be the Euclidean distance (a sketch is given after this list);
2. Implement the 1NN/2NN ratio test yourself inside the function find_matches to remove ambiguous correspondences (check the slides of the course, and the OpenCV matching tutorial linked above for possible implementations);
3. Display the found correspondences between each pair of images inside the function find_matches. Hint: for displaying the matches you can use cv.drawMatchesKnn with flags=cv.DrawMatchesFlags_NOT_DRAW_SINGLE_POINTS. Your result should look similar to the following visualization between img_ref and img_2:
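A possible skeleton for find_matches is sketched below, computing the brute-force Euclidean distances with PyTorch and applying the 1NN/2NN ratio test; the threshold value of 0.75 is an assumption (the value used in the OpenCV tutorial), and the display step is only indicated as a comment:

def find_matches(img1, img2, kps1, desc1, kps2, desc2, ratio=0.75):
    # Brute-force matching: Euclidean distance between every pair of SIFT descriptors
    d1 = to_torch_tensor(desc1, device=device)  # (N1, 128)
    d2 = to_torch_tensor(desc2, device=device)  # (N2, 128)
    dists = torch.cdist(d1, d2)                 # (N1, N2) pairwise distances

    # For each descriptor of img1, take its two nearest neighbours in img2
    two_nn = torch.topk(dists, k=2, dim=1, largest=False)

    matches = []
    for i in range(d1.shape[0]):
        d_first, d_second = two_nn.values[i]
        j = int(two_nn.indices[i, 0])
        # 1NN/2NN ratio test: keep the match only if the best neighbour is clearly better
        if d_first < ratio * d_second:
            matches.append(cv.DMatch(i, j, float(d_first)))

    # ... display the correspondences here (e.g. with cv.drawMatchesKnn, see the hint above)
    return matches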
1. Check in the course slides the model-fitting constraints for fitting a 2D homography transformation model. How many corresponding 2D points are required?
2. The homography model fitting estimates a 3x3 matrix that relates corresponding planar points, such as points from two images (2D–2D). We will again adopt the linear DLT algorithm (similar to the one used for camera calibration). Implement the DLT algorithm following the recipe shown in Figure 1 (a sketch is given after this list). What is the minimal number of correspondences needed to find the homography transformation?
3. Implement your own RANSAC algorithm for estimating the homography matrix, following the RANSAC recipe shown in Figure 2 (see also the sketch after this list).
4. Discuss and indicate the parameters you need to select in order to compute one iteration of RANSAC fitting your descriptor matches. How many iterations would you select to find the model with 95% or 99% confidence?
5. Replace the painting in each target image with the ESIREM logo, using the homographies estimated in the previous step. Hint: for warping the image with the homography and then blending the images you can use/adjust the following function (a full usage example is sketched after the list):
# Function to warp and replace a texture in a new image given a homography transformation
def render_warped_texture(H, img_ref, img_tgt, patch_texture):
    ## warp the patch texture that will be placed in the selected region with
    ## the estimated homography transformation H
    h, w, _ = img_ref.shape
    patch_texture_res = cv.resize(patch_texture, (w, h))
    warped_patch = cv.warpPerspective(patch_texture_res, H, (img_tgt.shape[1], img_tgt.shape[0]))

    ## remove the pixels of the foreground that will be replaced (we keep only the background)
    mask = (255 * np.ones((h, w))).astype(np.uint8)
    mask_warped = cv.warpPerspective(mask, H, (img_tgt.shape[1], img_tgt.shape[0]), flags=cv.INTER_NEAREST)
    mask_background = (1 - mask_warped / 255).astype(np.uint8)
    mask_background = cv.merge((mask_background, mask_background, mask_background))

    ## compose the final image: target background plus the warped patch
    img_out = img_tgt * mask_background + warped_patch
    return img_out
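For question 2, the exact recipe to follow is the one shown in Figure 1 (not reproduced here). As an illustration of the underlying linear system, here is a minimal, unnormalized DLT sketch, assuming the corresponding points are given as (N, 2) tensors (the function name and the use of float64 are illustrative choices):

def fit_homography_dlt(pts_src, pts_dst):
    # pts_src, pts_dst: (N, 2) corresponding 2D points, with N >= 4
    rows = []
    for (x, y), (u, v) in zip(pts_src.tolist(), pts_dst.tolist()):
        # two equations per correspondence, obtained from x' x (H x) = 0
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
        rows.append([x, y, 1, 0, 0, 0, -u * x, -u * y, -u])
    A = torch.tensor(rows, dtype=torch.float64)
    # the solution h is the right singular vector of A with the smallest singular value
    _, _, Vh = torch.linalg.svd(A)
    H = Vh[-1].reshape(3, 3)
    return H / H[2, 2]  # H is defined up to scale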
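Similarly, for questions 3 and 4, a minimal RANSAC skeleton reusing the DLT sketch above; the inlier threshold (in pixels) and the number of iterations are placeholders, and the standard iteration-count formula n = log(1 - p) / log(1 - w^s) (confidence p, inlier ratio w, minimal sample size s = 4) is recalled in the final comment:

def ransac_homography(pts_src, pts_dst, n_iters=1000, inlier_thresh=3.0):
    # pts_src, pts_dst: (N, 2) CPU tensors of matched points (reference -> target)
    N = pts_src.shape[0]
    ones = torch.ones((N, 1), dtype=torch.float64)
    src_h = torch.cat([pts_src.double(), ones], dim=1)  # homogeneous source points

    best_H, best_inliers = None, torch.zeros(N, dtype=torch.bool)
    for _ in range(n_iters):
        # 1. sample a minimal set of 4 correspondences and fit a candidate model
        idx = torch.randperm(N)[:4]
        H = fit_homography_dlt(pts_src[idx], pts_dst[idx])

        # 2. project all source points and measure the reprojection error
        proj = (H @ src_h.T).T
        proj = proj[:, :2] / proj[:, 2:3]
        err = torch.linalg.norm(proj - pts_dst.double(), dim=1)

        # 3. keep the model with the largest consensus set
        inliers = err < inlier_thresh
        if inliers.sum() > best_inliers.sum():
            best_H, best_inliers = H, inliers

    # 4. refit the model on all the inliers of the best candidate
    if best_inliers.sum() >= 4:
        best_H = fit_homography_dlt(pts_src[best_inliers], pts_dst[best_inliers])
    return best_H, best_inliers

# Iterations for confidence p with inlier ratio w and sample size s = 4:
#   n = log(1 - p) / log(1 - w**4)
# e.g. with w = 0.5: p = 0.95 -> about 47 iterations, p = 0.99 -> about 72 iterations.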
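Finally, a hypothetical end-to-end usage for question 5, chaining the sketches above; the target and logo file names (img_2.jpg, logo_esirem.png) are placeholders to adapt to the actual files in data/:

img_ref = cv.imread("data/img_ref.jpg")
img_tgt = cv.imread("data/img_2.jpg")        # one of the target views (placeholder name)
logo = cv.imread("data/logo_esirem.png")     # ESIREM logo (placeholder name)

kps_ref, desc_ref = extract_features(img_ref)
kps_tgt, desc_tgt = extract_features(img_tgt)
matches = find_matches(img_ref, img_tgt, kps_ref, desc_ref, kps_tgt, desc_tgt)

pts_ref = to_torch_tensor(np.float32([kps_ref[m.queryIdx].pt for m in matches]))
pts_tgt = to_torch_tensor(np.float32([kps_tgt[m.trainIdx].pt for m in matches]))
H, inliers = ransac_homography(pts_ref, pts_tgt)

result = render_warped_texture(to_numpy_array(H), img_ref, img_tgt, logo)
plt.imshow(cv.cvtColor(result, cv.COLOR_BGR2RGB)); plt.show()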
Submission
Please return a PDF report explaining your reasoning and equation developments. You should also submit your (commented) Python notebook with the corresponding implementations, together inside a [name]_assign2_recognition.zip file (replacing [name] with your name ;-)). The submission should be done via the Teams channel of the course, using the Teams assignment.
Deadline: 08/02/2023 at 23:59.
Note: This assignment is individual. Plagiarism, such as copying the work from another source (student, internet, etc.), will be awarded a 0 mark. If there are multiple submissions with the same work (in full or in part), each one will receive a 0.