CV Assignment 2 RecognitionAR

This document describes an assignment on object recognition and augmented reality using homographies. Students locate a reference object in other images and estimate homography transformations to map between views. The steps include detecting keypoints, extracting SIFT features, matching features between images, estimating homographies with RANSAC, and using the homographies to replace the object with a new texture. OpenCV is used to extract features and find initial matches, but students must implement homography estimation with DLT and RANSAC themselves, as well as the warping of textures with the estimated homographies.


Computer Vision – Visual Perception

MSCV VIBOT & ESIREM Robotique


Renato Martins - Université de Bourgogne, January 2023

Assignment II – Object Recognition &


Augmented Reality with Homographies

In this assignment, we will practice object recognition and robust model fitting with RANSAC
for homography estimation. We want to locate a given reference object in another image of a scene
that contains it (possibly among many other objects). Applications include augmented reality, as
well as robotics contexts where we want a robot equipped with a camera to recognize and locate a
given object in space in order to grasp it. We will start with planar, highly salient objects, such
as a painting. For instance, you will try to locate the painting “Nuit étoilée” by Van Gogh (the
reference image, shown in the leftmost picture) in other publicly available image views taken in
the MoMA museum (first row). We will then locate and fit a homography transformation between
the reference view of the painting and its new locations in the other views, and then replace it
with another picture... such as the logo of ESIREM (second row of images), as follows:

To do this automatically with the concepts seen in class, we will first detect keypoints, extract
features and then match the points between the different images. Then, from these candidate
point correspondences, you will use RANSAC (and implement it) to estimate a transformation
between the reference painting view (leftmost image) and the different target views that is robust
to wrong matches. Many of the steps of this assignment can be performed with tensors in PyTorch
(for you to practice). Please note that you are NOT allowed to use any high-level OpenCV
implementation during this exercise (unless stated otherwise). However, you may use them
for debugging purposes (verification of results), as well as the tips from the OpenCV tutorials on
image matching [1] and homography estimation [2].
[1] https://docs.opencv.org/4.x/dc/dc3/tutorial_py_matcher.html
[2] https://docs.opencv.org/3.4/d1/de0/tutorial_py_feature_homography.html


Part I - Environment Setup & Useful Commands


Please create a notebook on Colab, or follow these steps to configure your conda environment if you
are working on your local machine:

>> conda create -n cv_recognition
>> conda activate cv_recognition
>> conda install jupyter
# Go to your workspace directory of the course and run jupyter
>> jupyter notebook

We will mostly use PyTorch tensors and NumPy arrays during this exercise, so do not hesitate
to check the definitions in the tutorials for PyTorch [3] and NumPy. There are also some
helpful “Cheat Sheets” of basic commands for NumPy and SciPy (linear algebra in Python) [4]. One
operation we will often need, notably for visualization, is converting between NumPy arrays and
PyTorch tensors (and vice versa), as well as moving tensors between the CPU and the GPU (if available):
import numpy as np
import cv2 as cv
import matplotlib.pyplot as plt
import torch

# Determine device to run on (GPU vs CPU)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print("Running tensors on", device)

# numpy array to torch tensor
def to_torch_tensor(x, device="cpu", dtype=torch.float32, requires_grad=False):
    # return torch.from_numpy(x).to(device)
    return torch.tensor(x, dtype=dtype, requires_grad=requires_grad).to(device)

# torch tensor to numpy
def to_numpy_array(x):
    return x.detach().cpu().numpy()

We are providing the images in the folder data/, where the reference painting view is named
img_ref.jpg.

Part II - Extracting Features and Matching using OpenCV


To establish correspondences between the objects in the images, we will detect and match keypoints.
Keypoints are points in the image that are expected to be reliably matched, such as corner points
or blobs (as seen in class). In this first part of the exercise, you ARE ALLOWED to use OpenCV
functions (detectors and descriptors). There are many different keypoint detectors implemented in
OpenCV; we will test the SIFT detector and descriptor:

1. Write a function extract_features that receives an image and returns the list of keypoints
detected with SIFT and the corresponding SIFT descriptors, using the default SIFT parameters.

2. Display the detected keypoints on the images. Hint: you can use the function
cv.drawKeypoints with flags=cv.DRAW_MATCHES_FLAGS_DRAW_RICH_KEYPOINTS.

3. How many points do you get in each image?


[3] https://pytorch.org/tutorials/
[4] SciPy: http://datacamp-community-prod.s3.amazonaws.com/dfdb6d58-e044-4b38-bab3-5de0b825909b
    NumPy: http://datacamp-community-prod.s3.amazonaws.com/ba1fe95a-8b70-4d2f-95b0-bc954e9071b0


4. Is there a way of selecting the “best” keypoints? (Hint: look at the fields of cv.KeyPoint.)
Modify your algorithm to return the N best points, and then display the best 1000 points for
each image.
5. Now change the SIFT parameters to contrastThreshold=0.02 and nfeatures=1000.
Are there differences in the number of detected keypoints compared with the default
parameters?

Once we have detected keypoints and extracted the features associated with them, we will perform
the matching to find corresponding visual features. For each detector/descriptor:
1. Write a function find_matches that receives a pair of images with their corresponding
keypoints and descriptors and returns the found matches. Find the correspondences between
the features using brute-force matching (one to all); the distance between SIFT descriptors
should be the Euclidean distance;
2. Implement the 1NN/2NN ratio test yourself inside find_matches to remove ambiguous
correspondences (check the slides of the course and the OpenCV matching tutorial [5] for
possible implementations);
3. Display the found correspondences between each pair of images inside the function
find_matches. Hint: for displaying matches you can use cv.drawMatchesKnn with
flags=cv.DRAW_MATCHES_FLAGS_NOT_DRAW_SINGLE_POINTS. Your result should look
similar to the following visualization between img_ref and img_2:
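The core of steps 1–2 (brute-force Euclidean matching plus the 1NN/2NN ratio test) can be sketched with plain NumPy, operating directly on descriptor arrays; the function name and the 0.75 ratio below are illustrative assumptions, not required values:

```python
import numpy as np

def ratio_test_matches(desc1, desc2, ratio=0.75):
    """Brute-force matching with the 1NN/2NN ratio test.
    desc1: (n1, d) and desc2: (n2, d) descriptor arrays.
    Returns a list of (i, j) index pairs of accepted matches."""
    # all pairwise Euclidean distances, shape (n1, n2)
    dists = np.linalg.norm(desc1[:, None, :] - desc2[None, :, :], axis=2)
    matches = []
    for i, row in enumerate(dists):
        nn1, nn2 = np.argsort(row)[:2]  # indices of the two nearest neighbors
        # accept only if the best match is clearly better than the second best
        if row[nn1] < ratio * row[nn2]:
            matches.append((i, int(nn1)))
    return matches
```

The full (n1, n2) distance matrix is fine for a few thousand SIFT descriptors; for larger sets you would process desc1 in chunks to bound memory.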

Part III - Robust 2D Homography Model Fitting


We will now estimate a transformation between each pair of images based on the matches found in
the previous step. Since we are observing a planar object, this transformation can be represented
by a homography. From the corresponding features between the image of the object and the image
of the scene, we will then be able to find the object's position in the different views.

1. Check in the course slides the model-fitting constraints for fitting a 2D homography
transformation model. How many corresponding 2D points are required?

[5] https://docs.opencv.org/4.x/dc/dc3/tutorial_py_matcher.html


2. Homography model fitting estimates a 3x3 matrix relating correspondences between planar
points, such as points from two images (2D–2D). We will again adopt the linear DLT algorithm
(similar to the one used for camera calibration). Implement the DLT algorithm following the
recipe shown in Figure 1. What is the minimal number of correspondences needed to find the
homography transformation?

Figure 1: Homography estimation using DLT recipe.
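As a sketch of the recipe in Figure 1, a normalized DLT can be written with NumPy as follows; the normalization used here (scaling points to an average distance of √2 from their centroid, i.e. Hartley normalization) is one standard choice, and the function name is illustrative:

```python
import numpy as np

def dlt_homography(src, dst):
    """Normalized DLT: estimate the 3x3 matrix H mapping src -> dst from
    n >= 4 point pairs given as (n, 2) arrays."""
    def normalize(pts):
        c = pts.mean(axis=0)
        s = np.sqrt(2.0) / np.mean(np.linalg.norm(pts - c, axis=1))
        T = np.array([[s, 0.0, -s * c[0]],
                      [0.0, s, -s * c[1]],
                      [0.0, 0.0, 1.0]])
        pts_h = np.hstack([pts, np.ones((len(pts), 1))]) @ T.T
        return pts_h, T

    src_n, T_src = normalize(np.asarray(src, dtype=float))
    dst_n, T_dst = normalize(np.asarray(dst, dtype=float))

    # each correspondence contributes two rows to the linear system A h = 0
    A = []
    for (x, y, _), (u, v, _) in zip(src_n, dst_n):
        A.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        A.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])

    # h is the right singular vector associated with the smallest singular value
    _, _, Vt = np.linalg.svd(np.asarray(A))
    H = Vt[-1].reshape(3, 3)
    H = np.linalg.inv(T_dst) @ H @ T_src  # undo the normalization
    return H / H[2, 2]
```

With exact correspondences this recovers the homography up to numerical precision; with noisy matches it gives the least-squares solution of the linear system.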

3. Implement your own RANSAC algorithm for estimating the homography matrix, following
the RANSAC recipe shown in Figure 2.

4. Discuss and indicate the parameters you need to select in order to run RANSAC on your
descriptor matches. How many iterations would you select to find the model with 95% and
with 99% confidence?
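A sketch of both the RANSAC loop of Figure 2 and the standard iteration count N = log(1 − p) / log(1 − wˢ), where p is the desired confidence, w the inlier ratio and s = 4 the minimal sample size for a homography. The simplified unnormalized DLT inside, and all names and thresholds, are illustrative assumptions:

```python
import numpy as np

def ransac_iterations(p, w, s=4):
    # iterations needed to draw at least one all-inlier sample with
    # confidence p, given inlier ratio w and minimal sample size s
    return int(np.ceil(np.log(1.0 - p) / np.log(1.0 - w**s)))

def dlt4(src, dst):
    # simplified (unnormalized) DLT from >= 4 correspondences
    A = []
    for (x, y), (u, v) in zip(src, dst):
        A.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        A.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    _, _, Vt = np.linalg.svd(np.asarray(A, dtype=float))
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]

def ransac_homography(src, dst, n_iters=500, thresh=3.0, seed=0):
    """src, dst: (n, 2) arrays of matched points. Returns (H, inlier mask)."""
    rng = np.random.default_rng(seed)
    n = len(src)
    src_h = np.hstack([src, np.ones((n, 1))])
    best_inliers = np.zeros(n, dtype=bool)
    for _ in range(n_iters):
        sample = rng.choice(n, size=4, replace=False)  # minimal sample
        H = dlt4(src[sample], dst[sample])
        proj = src_h @ H.T
        proj = proj[:, :2] / proj[:, 2:3]
        # inliers: points whose reprojection error is below the threshold
        inliers = np.linalg.norm(proj - dst, axis=1) < thresh
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    # final model: refit on all inliers of the best sample
    return dlt4(src[best_inliers], dst[best_inliers]), best_inliers
```

For example, at a 50% inlier ratio, ransac_iterations(0.99, 0.5) gives 72 iterations and ransac_iterations(0.95, 0.5) gives 47, which you can compare with your own parameter discussion.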

5. Replace the painting in each target image with the ESIREM logo using the homographies
estimated in the previous step. Hint: for warping the image with the homography and then
blending the images, you can use/adapt the following function:
# Function to warp and replace a texture in a new image, given a homography
# transformation
def render_warped_texture(H, img_ref, img_tgt, patch_texture):
    ## warp the patch texture that will be placed in the selected region
    ## with the estimated homography transformation H
    h, w, _ = img_ref.shape
    patch_texture_res = cv.resize(patch_texture, (w, h))
    warped_patch = cv.warpPerspective(patch_texture_res, H,
                                      (img_tgt.shape[1], img_tgt.shape[0]))
    ## remove the pixels of the foreground that will be replaced
    ## (we keep only the background)
    mask = (255 * np.ones((h, w))).astype(np.uint8)
    mask_warped = cv.warpPerspective(mask, H, (img_tgt.shape[1], img_tgt.shape[0]),
                                     flags=cv.INTER_NEAREST)
    mask_background = (1 - mask_warped / 255).astype(np.uint8)
    mask_background = cv.merge((mask_background, mask_background, mask_background))
    background = cv.multiply(mask_background, img_tgt)

    # blend images into a single frame and display
    blend_img = cv.add(background, warped_patch)
    plt.figure(figsize=(20, 10))
    plt.imshow(cv.cvtColor(blend_img, cv.COLOR_BGR2RGB))
    plt.axis('off')
    plt.show()

Figure 2: RANSAC algorithm recipe.

6. Compare your homography solution with the OpenCV implementation cv.findHomography
(with the RANSAC flag deactivated), and also when using your own RANSAC routine in the
estimation.
7. (EXTRA POINTS) Repeat the previous items with the FAST detector and the ORB descriptor.
What do you observe about the performance of SIFT with respect to ORB features on these images?

Submission
Please return a PDF report explaining your reasoning and equation developments. You should
also submit your (commented) Python notebook with the corresponding implementations, together
inside a [name]_assign2_recognition.zip file (replacing [name] with your name ;-)). The submission
should be done via the Teams channel of the course, using the Teams assignment.
Deadline: 08/02/2023 at 23:59.
Note: This assignment is individual. Plagiarism, such as copying work from another source
(student, internet, etc.), will be awarded a 0 mark. If there are multiple submissions with the
same work (fully or partially), each one will receive a 0.

