
VSLAM / Monocular SLAM

Siddharth Tourani
K Madhava Krishna
Acknowledgements
CAIR
Siddharth
Marc Pollefeys' slides
MVG book of Hartley and Zisserman
Oxford Univ slides of AZ's group
Niko Sünderhauf's thesis
Slides of Shankar Sastry and Jana Kosecka
Frank Dellaert's papers
The SLAM Taxonomy
SLAM as a Graphical Model
SLAM as a Least Squares Problem
SLAM as a Least Squares

The above is a convenient form to solve by nonlinear least squares (NLS) methods such as Gradient Descent, Gauss–Newton, or Levenberg–Marquardt (LM).
One can also introduce loop closure as a constraint in the above, as in pose-graph formalisms.
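The omitted objective is presumably the standard least-squares SLAM form (cf. pose-graph SLAM, e.g., Sünderhauf's thesis):

\[ \mathbf{x}^{*} = \arg\min_{\mathbf{x}} \sum_{ij} \mathbf{e}_{ij}(\mathbf{x})^{\top} \Omega_{ij}\, \mathbf{e}_{ij}(\mathbf{x}), \]

where e_ij is the error between the predicted and observed measurement between nodes i and j, and Ω_ij is the corresponding information matrix.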
SLAM as a Least Squares
However, one can proceed to linearize it and get it into a least-squares form,
which can be further reduced to:
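A sketch of the standard reduction (assuming the usual Gauss–Newton treatment of the objective above): linearizing each error term

\[ \mathbf{e}_{ij}(\mathbf{x}_0 + \Delta\mathbf{x}) \approx \mathbf{e}_{ij}(\mathbf{x}_0) + J_{ij}\,\Delta\mathbf{x} \]

turns the objective into a quadratic in Δx, minimized by solving the normal equations

\[ H\,\Delta\mathbf{x} = -\mathbf{b}, \qquad H = \sum_{ij} J_{ij}^{\top} \Omega_{ij} J_{ij}, \qquad \mathbf{b} = \sum_{ij} J_{ij}^{\top} \Omega_{ij}\,\mathbf{e}_{ij}. \]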


SLAM as Least Squares
VSLAM vs SLAM
VSLAM is also essentially solved as a nonlinear least squares problem, popularly termed Bundle Adjustment. Then what is the difference?

Motion models and measurement models are quite straightforward in regular SLAM (range-based SLAM).
Initialization is likewise straightforward in range-based SLAM.
VSLAM motion and measurement models are NON-TRIVIAL.
Since VSLAM projects the 3D world onto a 2D image, it encounters a variety of paradigms, formalisms and degeneracies, most critically the formalism based on Projective Geometry.
Projective Geometry: The Pinhole Camera

Pinhole camera geometry. C is the camera centre and p the principal point.
The camera centre is here placed at the coordinate origin.
Note the image plane is placed in front of the camera centre.
The Pinhole Camera

Note however:
(λX, λY, λZ)ᵀ also projects to (fλX/λZ, fλY/λZ)ᵀ = (fX/Z, fY/Z)ᵀ.

Thus the pinhole camera maps every world point X lying along the ray through the camera centre and the image point x to the same image point x.
The Pinhole Camera
The central projection in homogeneous coordinates:

Again note any λX will project to the same image coordinate x.
The Pinhole Camera

The above represents the central projection formulation.
Again note any λX will project to the same image coordinate x.
K represents the internal camera calibration matrix
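The omitted central projection formula is presumably the standard one (following Hartley & Zisserman):

\[ \mathbf{x} = K[\,I \mid \mathbf{0}\,]\,\mathbf{X}_{\mathrm{cam}}, \qquad K = \begin{bmatrix} f & & p_x \\ & f & p_y \\ & & 1 \end{bmatrix}. \]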
The Pinhole Camera
When the camera frame is rotated and translated w.r.t. the world frame:
X̃ is an inhomogeneous point vector in the world frame;
C̃ is the camera location measured in the world frame;
then X̃cam = R(X̃ − C̃) represents the point in the camera frame,
where R is the rotation of the world frame w.r.t. the camera.
The Pinhole Camera
When camera frame is rotated and translated wrt world frame
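As a minimal sketch of this projection pipeline in code (all numeric values are made up for illustration, not taken from the slides):

    import numpy as np

    # Hypothetical intrinsics: focal length 700 px, principal point (350, 250)
    K = np.array([[700.0,   0.0, 350.0],
                  [  0.0, 700.0, 250.0],
                  [  0.0,   0.0,   1.0]])

    R = np.eye(3)                    # rotation of the world frame w.r.t. the camera
    C = np.array([0.0, 0.0, -5.0])   # camera centre in the world frame

    def project(X_world):
        """Project an inhomogeneous 3D world point to pixel coordinates."""
        X_cam = R @ (X_world - C)    # express the point in the camera frame
        x_hom = K @ X_cam            # central projection in homogeneous coordinates
        return x_hom[:2] / x_hom[2]  # divide out depth; any scaling of X maps to the same pixel

    print(project(np.array([1.0, 0.5, 2.0])))   # -> [450. 300.]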
Model of the Projective Plane
Epipolar Geometry

Point correspondence geometry.
(a) The two cameras are indicated by their centres C and C′ and image planes. The camera centres, 3-space point X, and its images x and x′ lie in a common plane π.
(b) An image point x back-projects to a ray in 3-space defined by the first camera centre, C, and x. This ray is imaged as a line l′ in the second view. The 3-space point X which projects to x must lie on this ray, so the image of X in the second view must lie on l′.
Epipolar Geometry
Epipolar Geometry
The Fundamental Matrix

The above equation represents the solution of the back-projection problem PX = x.

The first term on the RHS represents the particular solution.
The second term on the RHS is the span of the null space of P, which is one-dimensional.
The null space is spanned by the homogeneous camera centre coordinates, since PC equals the zero vector.
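The standard form of this back-projection solution (following Hartley & Zisserman) is

\[ \mathbf{X}(\lambda) = P^{+}\mathbf{x} + \lambda\,\mathbf{C}, \]

where P⁺ = Pᵀ(PPᵀ)⁻¹ is the pseudo-inverse of P and C is the camera centre, with PC = 0.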
The Fundamental Matrix
Two points imaged by the second camera:
The line that joins the above two points in the second image:

The first point, P′C, represents the epipole e′: the image of the first camera centre in the second view.
Then l′ = Fx, where F is the Fundamental Matrix.
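A sketch of the omitted derivation, in the standard Hartley & Zisserman form: the two imaged points are P′P⁺x and P′C, and the epipolar line through them is

\[ \mathbf{l}' = (P'\mathbf{C}) \times (P'P^{+}\mathbf{x}) = [\mathbf{e}']_{\times} P'P^{+}\,\mathbf{x} = F\mathbf{x}, \qquad F = [\mathbf{e}']_{\times} P'P^{+}. \]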


The Fundamental Matrix
F then satisfies x′ᵀF x = 0 for every pair of corresponding points x ↔ x′.
The above expression can be considered one of the landmark expressions in MVG, used ubiquitously in:
SLAM
Motion Segmentation
Feature Matching for Geometric Validation
Correspondence search
Computation of F
For a pair of corresponding points (x, y, 1) and (x′, y′, 1) in two images we now have x′ᵀF x = 0, which expands to

x′x f11 + x′y f12 + x′f13 + y′x f21 + y′y f22 + y′f23 + x f31 + y f32 + f33 = 0.

Represent this as an inner product a · f = 0, where f is the 9-vector of the entries of F.
Computation of F

From a set of n such corresponding points we obtain A f = 0.

The above is a set of homogeneous equations, and A can have rank at most 8 when the correspondences are precise.
f can therefore be determined only up to scale.
Computation of F
The non-normalized algorithm:

\[
\begin{bmatrix}
x_1 x_1' & y_1 x_1' & x_1' & x_1 y_1' & y_1 y_1' & y_1' & x_1 & y_1 & 1 \\
x_2 x_2' & y_2 x_2' & x_2' & x_2 y_2' & y_2 y_2' & y_2' & x_2 & y_2 & 1 \\
\vdots & & & & \vdots & & & & \vdots \\
x_n x_n' & y_n x_n' & x_n' & x_n y_n' & y_n y_n' & y_n' & x_n & y_n & 1
\end{bmatrix}
\begin{bmatrix} f_{11} \\ f_{12} \\ f_{13} \\ f_{21} \\ f_{22} \\ f_{23} \\ f_{31} \\ f_{32} \\ f_{33} \end{bmatrix} = \mathbf{0}
\]

The columns scale roughly as ~10000, ~10000, ~100, ~10000, ~10000, ~100, ~100, ~100, 1: orders of magnitude difference between the columns of the data matrix, so plain least squares yields poor results.
Computation of F
Transform the image coordinates to approximately [−1, 1] × [−1, 1]: for a 700 × 500 image, the corners (0, 0), (700, 0), (0, 500), (700, 500) map to (−1, −1), (1, −1), (−1, 1), (1, 1) via

\[
T = \begin{bmatrix} \tfrac{2}{700} & 0 & -1 \\ 0 & \tfrac{2}{500} & -1 \\ 0 & 0 & 1 \end{bmatrix}
\]

Normalized least squares then yields good results (Hartley, PAMI '97).
Computation of F
F has a non-trivial null space: Fe = 0 and e′ᵀF = 0, where e, e′ ≠ 0.
Computation of F
To tackle the null space (rank) constraint:

\[ \mathbf{e}'^{\top} F = 0, \qquad F\mathbf{e} = 0, \qquad \det F = 0, \qquad \operatorname{rank} F = 2 \]

SVD of the linearly computed F matrix (rank 3):

\[ F = U \begin{bmatrix} \sigma_1 & & \\ & \sigma_2 & \\ & & \sigma_3 \end{bmatrix} V^{\top} = U_1\sigma_1 V_1^{\top} + U_2\sigma_2 V_2^{\top} + U_3\sigma_3 V_3^{\top} \]

Compute the closest rank-2 approximation, minimizing \( \lVert F - F' \rVert_F \):

\[ F' = U \begin{bmatrix} \sigma_1 & & \\ & \sigma_2 & \\ & & 0 \end{bmatrix} V^{\top} = U_1\sigma_1 V_1^{\top} + U_2\sigma_2 V_2^{\top} \]
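A minimal numpy sketch of this normalized 8-point pipeline with the rank-2 constraint, assuming pts1 and pts2 are N×2 arrays (N ≥ 8) of matched pixel coordinates in image 1 and image 2 (hypothetical inputs, not from the slides):

    import numpy as np

    def normalize(pts):
        """Hartley normalization: zero mean, average distance sqrt(2)."""
        mean = pts.mean(axis=0)
        scale = np.sqrt(2.0) / np.mean(np.linalg.norm(pts - mean, axis=1))
        T = np.array([[scale, 0.0, -scale * mean[0]],
                      [0.0, scale, -scale * mean[1]],
                      [0.0, 0.0, 1.0]])
        pts_h = np.column_stack([pts, np.ones(len(pts))])
        return (T @ pts_h.T).T, T

    def eight_point(pts1, pts2):
        """Estimate F such that x'^T F x = 0 (x in image 1, x' in image 2)."""
        p1, T1 = normalize(pts1)
        p2, T2 = normalize(pts2)
        x, y = p1[:, 0], p1[:, 1]
        xp, yp = p2[:, 0], p2[:, 1]
        # One row per correspondence, same column order as the data matrix above
        A = np.column_stack([x*xp, y*xp, xp, x*yp, y*yp, yp, x, y, np.ones(len(x))])
        _, _, Vt = np.linalg.svd(A)
        F = Vt[-1].reshape(3, 3)       # f = right singular vector of the smallest singular value
        U, S, Vt = np.linalg.svd(F)    # enforce the rank-2 (singularity) constraint
        F = U @ np.diag([S[0], S[1], 0.0]) @ Vt
        F = T2.T @ F @ T1              # undo the normalization
        return F / F[2, 2]

Hartley's isotropic normalization used here (zero mean, average distance √2) is a common alternative to the fixed [−1, 1] corner mapping shown above; both cure the column-scaling problem.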

Computation of F
Further refinement of F is done over the n correspondences by iterating RANSAC and estimating F from the best set of inliers.

Other methods of refinement include using the normalized 8-point algorithm as an initialization of F and refining it by minimizing a cost function involving the reprojection error with the LM algorithm.
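In practice this estimate-and-refine loop is available off the shelf; a usage sketch with OpenCV (same hypothetical pts1, pts2 arrays as above):

    import cv2
    import numpy as np

    # F and an inlier mask from RANSAC over the correspondences
    F, inlier_mask = cv2.findFundamentalMat(np.float64(pts1), np.float64(pts2),
                                            cv2.FM_RANSAC, 1.0, 0.99)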
Computation of F

From a pair of such images, obtain correspondences, typically with SIFT.
Obtain F by the 8-point algorithm with the singularity constraint, or by the 7-point algorithm.
Refine with RANSAC.
The Essential Matrix

When the camera calibration K is known, we can remove its effect by multiplying the image points by inv(K), obtaining the image in normalized coordinates.
The Essential Matrix
Epipolar Constraint: Calibrated Case

Essential Matrix
(Longuet-Higgins, 1981)
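The omitted constraint is presumably the classical calibrated epipolar constraint: for normalized image points p = K⁻¹x and p′ = K⁻¹x′ (in the point ordering used on the next slide),

\[ \mathbf{p}^{\top} E\, \mathbf{p}' = 0, \qquad E = [\mathbf{t}]_{\times} R. \]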
Properties of the Essential Matrix

Ep′ is the epipolar line associated with p′.
Eᵀp is the epipolar line associated with p.
Ee′ = 0 and Eᵀe = 0.
E is singular.
E has two equal non-zero singular values (Huang and Faugeras, 1989).
Another way of finding E is from F, as E = KᵀF K when both views share the calibration matrix K.
Pose Recovery from E

The four possible solutions for calibrated reconstruction from E. Between the left and right sides there is a baseline reversal. Between the top and bottom rows, camera B rotates 180° about the baseline. Note that only in (a) is the reconstructed point in front of both cameras.
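A sketch of the standard SVD-based decomposition (cf. Hartley & Zisserman) that yields these four candidates; in practice one triangulates a test point and keeps the (R, t) that places it in front of both cameras, the cheirality check that cv2.recoverPose automates:

    import numpy as np

    def pose_candidates(E):
        """Return the four (R, t) candidates encoded by an essential matrix."""
        U, _, Vt = np.linalg.svd(E)
        if np.linalg.det(U) < 0:       # ensure proper rotations (det = +1)
            U = -U
        if np.linalg.det(Vt) < 0:
            Vt = -Vt
        W = np.array([[0.0, -1.0, 0.0],
                      [1.0,  0.0, 0.0],
                      [0.0,  0.0, 1.0]])
        R1 = U @ W @ Vt
        R2 = U @ W.T @ Vt
        t = U[:, 2]                    # translation known only up to sign and scale
        return [(R1, t), (R1, -t), (R2, t), (R2, -t)]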
Camera Recovery from F
Cameras can be recovered only up to a projective transform.
In other words, given F alone we cannot recover cameras whose relative locations are known up to a scale or similarity transform.

One cannot extract motion, structure and calibration from one fundamental matrix (two views).
F allows reconstruction up to a projective transformation (as we will see soon).
F encodes all the geometric information between two views when no additional information is available.
Camera Recovery from F
The general formula for a pair of canonical camera matrices corresponding to a fundamental matrix F is given by:
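The standard form of this canonical pair (Hartley & Zisserman) is

\[ P = [\,I \mid \mathbf{0}\,], \qquad P' = [\,[\mathbf{e}']_{\times} F \mid \mathbf{e}'\,]. \]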
F vs E or Calibrated vs Uncalibrated
F vs E
3D Reconstruction from Cameras
3D Reconstruction from Cameras
3D Reconstruction from Cameras
3D Reconstruction from Cameras
3D Reconstruction from Cameras
3D Reconstruction from Cameras
3D Reconstruction from Cameras
Projective reconstruction ambiguity from a pair of cameras.
The cup in the center is the true cup.
3D Reconstruction from Cameras
Up-to-scale reconstruction with E, when K is known.
The translation magnitude between cameras is unknown; however, its direction is known.
3D Reconstruction from Cameras
When both K and the magnitude of T are known,
metric reconstruction is obtained by triangulation,
similar to stereo.
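A minimal linear (DLT) triangulation sketch, assuming 3×4 projection matrices P1, P2 and a pixel correspondence x1, x2 (hypothetical inputs; cv2.triangulatePoints offers the same in batch form):

    import numpy as np

    def triangulate(P1, P2, x1, x2):
        """DLT triangulation of one correspondence; returns the 3D point."""
        # Each view contributes two rows of the homogeneous system A X = 0
        A = np.array([x1[0] * P1[2] - P1[0],
                      x1[1] * P1[2] - P1[1],
                      x2[0] * P2[2] - P2[0],
                      x2[1] * P2[2] - P2[1]])
        _, _, Vt = np.linalg.svd(A)
        X = Vt[-1]
        return X[:3] / X[3]            # dehomogenize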
3D Reconstruction from Cameras
Left and right cameras, with parallel principal axes, lens centres separated by the baseline b, and focal length f, view an object point P(x, y, z); the left camera centre is the reference point.

By similar triangles w.r.t. the left and right camera lens centres:

\[ \frac{x}{z} = \frac{x'_l}{f} \ \ \text{(left camera)}, \qquad \frac{x-b}{z} = \frac{x'_r}{f} \ \ \text{(right camera)} \]

Eliminating x:

\[ z = \frac{f\,b}{x'_l - x'_r}, \]

where x′l − x′r is the horizontal disparity.

One major problem is to locate x′l and x′r: the correspondence problem.
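A quick numeric check of this depth-from-disparity relation (all values made up):

    f, b = 700.0, 0.12        # focal length in pixels, baseline in metres (hypothetical)
    disparity = 42.0          # x'_l - x'_r in pixels
    z = f * b / disparity     # depth: 700 * 0.12 / 42 = 2.0 metres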
Camera Resectioning
Camera Resectioning
Camera Resectioning
The Bundle Adjustment: BA
In the routine projection equation, the scale constant f0 is typically set to unity.

The camera calibration matrix K is assumed to be known; it is typically obtained by a separate calibration procedure.
The BA
One can rewrite the central projection equation as

Suppose we take M images of N points (Xα, Yα, Zα), α = 1, ..., N, in the scene.
Let (xακ, yακ) be the projection of the αth point onto the κth image.
Let Pκ be the projection matrix of the κth image.
The BA
We define the reprojection error as the sum of squared differences between the predicted projections of the world points and their observed locations, summed over the N points and across the M images,

where Iακ is the visibility index, taking the value 1 if the αth point is visible in the κth image and 0 otherwise.
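A standard form of this error, consistent with the definitions above, is

\[ E = \sum_{\alpha=1}^{N} \sum_{\kappa=1}^{M} I_{\alpha\kappa} \left[ (\hat{x}_{\alpha\kappa} - x_{\alpha\kappa})^{2} + (\hat{y}_{\alpha\kappa} - y_{\alpha\kappa})^{2} \right], \]

where (x̂ακ, ŷακ) denotes the predicted projection of the current estimate of the αth point by the current estimate of Pκ.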
The BA
Define

BA refines the estimates of the projection matrices and the 3D points so that the reprojection error goes to zero or converges to a very low value.
The BA
The BA
Expressing the correction of Rκ needs care.

The orthogonality relationship RκRκᵀ = I imposes six constraints on the nine elements of Rκ, so Rκ has three degrees of freedom.

Differentiating RκRκᵀ = I shows that the correction is governed by a skew-symmetric matrix, as sketched below.
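In standard notation, the omitted step is: differentiating the orthogonality constraint gives

\[ \delta R_{\kappa} R_{\kappa}^{\top} + R_{\kappa}\,\delta R_{\kappa}^{\top} = 0, \]

so δRκRκᵀ is skew-symmetric, and the correction can be written as δRκ = [ωκ]× Rκ for a small rotation vector ωκ with three degrees of freedom.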


The BA
And hence:

From (6) we can obtain (7) below:
The BA
The BA
The Jacobians then take the form:

In LM, the Hessian is approximated as the transpose of the Jacobian times the Jacobian.
The BA
And hence the Hessian is approximated as H ≈ JᵀJ.

Thus both J and JᵀJ contain only first-derivative terms.
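For reference, the resulting LM update (a standard form, with λ the damping factor and e the stacked reprojection-error vector) solves

\[ \left( J^{\top}J + \lambda\,\operatorname{diag}(J^{\top}J) \right) \Delta = -J^{\top}\mathbf{e}. \]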


The BA
Derivatives of 3D positions
The BA
Similarly, the derivatives with respect to the translation components, principal point, focal length, and rotation components are all evaluated.

Finally, the LM algorithm is used to compute the update rule for the state vector.
The BA
The BA
The BA
The BA
The BA with Constraints [IROS 2015]
The BA with Constraints [IROS 2015]
The BA with Constraints [IROS 2015]
The BA with Constraints [IROS 2015]
The BA with Constraints [IROS 2015]
The BA with Constraints [IROS 2015]
Results
