Lec 17
These lecture summaries are designed to be a review of the lecture. Though I do my best to include all main topics from the
lecture, the lectures themselves contain more elaborate explanations than these notes.
This lecture covers the following photogrammetry problems:
• Absolute Orientation 3D ←→ 3D
• Relative Orientation 2D ←→ 2D
• Exterior Orientation 2D ←→ 3D
• Intrinsic Orientation 3D ←→ 2D
Below we discuss each of these problems at a high level. We will be discussing these problems in greater depth later in this and
following lectures.
1. Given one 3D sensor, such as a lidar sensor, and two objects (note that this could be two distinct objects at a single point
in time, or a single object at two distinct points in time), our goal is to find the transformation, or pose, between the
two objects.
2. Given two cameras, our goal is to find the (relative) transformation, or pose, between the two cameras.
1.1.3 Exterior Orientation
This photogrammetry problem aims to go from 2D −→ 3D. One common example in robotics (and other fields related to machine
vision) is localization, in which a robotic agent must find its location/orientation on a map given 2D information from a
camera (as well as, possibly, 3D laser scan measurements).
More generally, with localization, our goal is to find where we are and how we are oriented in space given a 2D image
and a 3D model of the world.
Figure 1: The problem of binocular stereo. Having two 2D sensors enables us to recover 3D structure (specifically, depth) from
the scene we image. A few key terms/technicalities to note here: (i) The origin is set to be halfway between the two cameras,
(ii) The distance between the cameras is called the baseline, and (iii) binocular disparity refers to the phenomenon of having
each camera generate a different image. Finally, note that in practice, it is almost impossible to line up these cameras exactly.
Goal: Calculate X and Z. This would not be possible with only monocular stereo (monocular ≈ one sensor). Using similar
triangles, we have:

$$\frac{X - \frac{b}{2}}{Z} = \frac{x_l}{f}, \qquad \frac{X + \frac{b}{2}}{Z} = \frac{x_r}{f}$$

Subtracting the first equation from the second:

$$\frac{x_r - x_l}{f} = \frac{b}{Z} \implies Z = \frac{bf}{x_r - x_l}$$

Where the binocular disparity is given by $(x_r - x_l)$. Solving this system of equations gives:

$$X = \frac{b}{2}\,\frac{x_r + x_l}{x_r - x_l}, \qquad Z = \frac{bf}{x_r - x_l}$$
Note that, more generally, we can calculate Y in the same way as well! From these equations, we can see that:
• Increasing the baseline b increases X and Z (all else being equal).
• Increasing the focal length f increases Z.
But it is also important to be mindful of system constraints that determine the range, or exact values, these variables can
take. For instance, if we have a self-driving car, we cannot simply make the baseline distance between our two cameras 100
meters, because this would require mounting the cameras 100 meters apart.
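To make these formulas concrete, here is a minimal sketch in Python (the function `triangulate` and the numeric example are illustrative assumptions, not from the lecture):

```python
def triangulate(b, f, x_l, x_r):
    """Recover (X, Z) from binocular disparity.

    b: baseline between the two cameras
    f: focal length
    x_l, x_r: image x-coordinates of the same point in the left/right images
    """
    disparity = x_r - x_l              # binocular disparity
    Z = b * f / disparity              # Z = bf / (x_r - x_l)
    X = b * (x_r + x_l) / (2 * disparity)
    return X, Z

# Example: with b = 0.5 m and f = 0.01 m, a point at X = 1 m, Z = 5 m projects
# to x_l = f(X - b/2)/Z = 0.0015 and x_r = f(X + b/2)/Z = 0.0025:
print(triangulate(0.5, 0.01, 0.0015, 0.0025))  # -> (1.0, 5.0)
```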
Figure 2: General case of absolute orientation: Given the coordinate systems (x_l, y_l, z_l) ∈ R^{3×3} and (x_r, y_r, z_r) ∈ R^{3×3}, our goal
is to find the transformation, or pose, between them using points p_i measured in each frame of reference.
Here, we also note the following, as they are important for establishing that we are not limited just to finding the
transformation between two camera sensors:
• “Two cameras” does not just mean having two distinct cameras (known as “two-view geometry”); this could also refer to
having a single camera with images taken at two distinct points in time (known as “Visual Odometry” or “VO”).
• Note also that we could have the same scenario described above when introducing this problem: either one camera and
multiple objects, or multiple cameras and one object. Hence, there is a sense of duality here in solving these problems:
1. One camera, two objects?
2. Two cameras, one object (Two-View Geometry)?
3. Camera moving (VO)?
4. Object moving?
To better understand this problem, it is first important to precisely define the transformation or pose (translation + rotation)
between the two cameras.
What form does our transformation take? Since we consider these transformations to be determined by a translation and a
rotation, we can write our transformation as:

$$r_r = R(r_l) + r_0$$
Where:
• R describes the rotation. Note that this is not necessarily always an orthonormal rotation matrix, and we have more
generally parameterized it as a function.
• The translation vector and the rotation function comprise our unknowns.
When R is described by an orthonormal rotation matrix R, then we require this matrix to have the following properties:
1. R is orthonormal: $R^T R = R R^T = I_3$ (and $\det(R) = +1$ for a proper rotation).
2. These constraints leave R with only 3 unknowns (degrees of freedom) instead of 9. This matches the 3 independent entries of a
skew-symmetric matrix, which takes the form:

$$\Omega = \begin{bmatrix} 0 & b & c \\ -b & 0 & e \\ -c & -e & 0 \end{bmatrix} \in \mathbb{R}^{3 \times 3}$$
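As a quick numerical sanity check of these constraints (a sketch in NumPy; the example rotation about the z-axis is an arbitrary choice, not from the lecture):

```python
import numpy as np

theta = 0.3  # an arbitrary example rotation angle about the z-axis
R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta),  np.cos(theta), 0.0],
              [0.0,            0.0,           1.0]])

# Orthonormality: R^T R = I_3, and a proper rotation has det(R) = +1,
# so only 3 of the 9 entries are free.
assert np.allclose(R.T @ R, np.eye(3))
assert np.isclose(np.linalg.det(R), 1.0)
```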
Suppose we have clusters of points from the left- and righthand sides, and we want to align these clusters as closely as
possible. With a mechanical spring analog, we have the force (given by Hooke's law), and consequently the energy:

$$F = ke \tag{1}$$

$$E = \int F \, de = \frac{1}{2} k e^2 \tag{2}$$
Where e (the “error”) is the distance between corresponding measured points in the two frames of reference. Therefore, the solution to
this problem involves “minimizing the energy” of the system. Interestingly, energy minimization is analogous to least squares
regression.
Using the definition of our transformation, our error for the ith point is given by:

$$e_i = (R(r_{l,i}) + r_0) - r_{r,i}$$

Then our objective for energy minimization and least squares regression is given by:

$$\min_{R, r_0} \sum_{i=1}^{N} ||e_i||_2^2$$
This is another instance of solving what is known as the “inverse” problem. The “forward” problem takes the transformation (R, r_0) and a point r_l and predicts the measurement r_r; the “inverse” problem takes measured correspondences {(r_{l,i}, r_{r,i})} and recovers the transformation (R, r_0).
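As a sketch of the forward direction and the least-squares objective (in NumPy; `objective` and its argument names are hypothetical, not from the lecture):

```python
import numpy as np

def objective(R, r0, P_l, P_r):
    """Sum of squared residuals sum_i ||(R r_{l,i} + r0) - r_{r,i}||^2.

    R: (3, 3) rotation matrix, r0: (3,) translation,
    P_l, P_r: (N, 3) arrays of corresponding points in the two frames.
    """
    residuals = P_l @ R.T + r0 - P_r   # forward-predict, then compare
    return float(np.sum(residuals ** 2))
```

The inverse problem then searches over (R, r_0) to drive this objective to its minimum (zero, in the noiseless case).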
Now that we have framed our optimization problem, can we decouple the optimization over translation and rotation? It turns
out we can by setting an initial reference point. For this, we consider two methods.
1.2.4 Procedure - “Method 1”
1. Pick a measured point from one set of measured points as the origin.
2. Take a second point, look at the vector between it and the first point, and compute the unit vector. Take this unit vector
as one axis of the coordinate system:

$$x_l = r_{l,2} - r_{l,1} \longrightarrow \hat{x}_l = \frac{x_l}{||x_l||_2}$$

3. Take a third point and compute the vector from point 1 to point 3; remove the component of this vector along $\hat{x}_l$ (the
direction from point 1 to point 2). Points 1, 2, and 3 form the (x, y) plane, and the remaining (perpendicular) component
from point 3 forms the y-direction of the coordinate system:

$$y_l = (r_{l,3} - r_{l,1}) - \left((r_{l,3} - r_{l,1}) \cdot \hat{x}_l\right)\hat{x}_l$$

4. Normalize this vector to obtain the second axis: $\hat{y}_l = y_l / ||y_l||_2$.
5. Take the cross product to obtain the third axis: $\hat{z}_l = \hat{x}_l \times \hat{y}_l$. (Then we also have that $\hat{z}_l \cdot \hat{y}_l = 0$ and $\hat{z}_l \cdot \hat{x}_l = 0$.) This then
defines a coordinate system $(\hat{x}_l, \hat{y}_l, \hat{z}_l)$ for the left camera/frame of reference. Note that this only requires 3 points!
6. To calculate this for the righthand frame of reference, we can repeat steps 1-5 for the righthand side to obtain the coordinate
system $(\hat{x}_r, \hat{y}_r, \hat{z}_r)$.
From here, all we need to do is find the transformation (rotation, since we have artificially set the origin) between the coordinate
systems $(\hat{x}_r, \hat{y}_r, \hat{z}_r)$ and $(\hat{x}_l, \hat{y}_l, \hat{z}_l)$. Mathematically, we have the following equations:

$$\hat{x}_r = R(\hat{x}_l), \qquad \hat{y}_r = R(\hat{y}_l), \qquad \hat{z}_r = R(\hat{z}_l)$$

We can condense these equations into a matrix equation, and subsequently a matrix inversion problem:

$$\begin{bmatrix} \hat{x}_r & \hat{y}_r & \hat{z}_r \end{bmatrix} = R \begin{bmatrix} \hat{x}_l & \hat{y}_l & \hat{z}_l \end{bmatrix}$$
• Is this system invertible? Yes, because by construction these vectors are orthogonal to one another (the matrix of axes is
orthonormal, so its inverse is simply its transpose).
• It turns out this method “favors” some points over others (the points chosen to define the axes dominate the result), which
is usually sub-optimal.
• This is a somewhat ad hoc method - it could be useful for deriving a fast approximation or initial estimate, but it is
definitely suboptimal.
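Here is a minimal sketch of Method 1 in NumPy (the helpers `triad` and `method1_rotation`, and the choice of using the first three correspondences, are illustrative assumptions, not from the lecture):

```python
import numpy as np

def triad(p1, p2, p3):
    """Construct an orthonormal coordinate system from three points (steps 1-5)."""
    x = p2 - p1
    x_hat = x / np.linalg.norm(x)
    y = (p3 - p1) - ((p3 - p1) @ x_hat) * x_hat   # remove the component along x_hat
    y_hat = y / np.linalg.norm(y)
    z_hat = np.cross(x_hat, y_hat)                # completes a right-handed triad
    return np.column_stack([x_hat, y_hat, z_hat])

def method1_rotation(left_pts, right_pts):
    """Solve [x_r y_r z_r] = R [x_l y_l z_l]; since each triad is orthonormal,
    the matrix "inversion" is just a transpose."""
    M_l = triad(left_pts[0], left_pts[1], left_pts[2])
    M_r = triad(right_pts[0], right_pts[1], right_pts[2])
    return M_r @ M_l.T
```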
1.2.5 Procedure - “Method 2”
For this algorithm, recall our prior work with blob analysis. We can look at axes of inertia (see the figure below), where the
inertia of the 2D “blob” is given by:

$$I = \iint_O r^2 \, dm$$

How do we generalize this from 2D to 3D? How do we figure out these axes of inertia (see the example below)?

$$I = \iiint_O r^2 \, dm$$
Figure 4: Computing the axes of inertia for a 3D blob - we can generalize the notion of inertia from 2D to 3D.
One trick we can use here is using the centroid as the origin. Let $\hat{\omega}$ be a unit vector along a candidate axis, and let $r$ be
the position of a mass element relative to the centroid. The component of $r$ along the axis is:

$$r' = (r \cdot \hat{\omega})\hat{\omega}$$

Then the perpendicular component, and the squared distance from the axis, are:

$$r - r' = r - (r \cdot \hat{\omega})\hat{\omega}$$

$$\begin{aligned}
r_\perp^2 &= (r - r') \cdot (r - r') \\
&= r \cdot r - 2(r \cdot \hat{\omega})^2 + (r \cdot \hat{\omega})^2 (\hat{\omega} \cdot \hat{\omega}) \\
&= r \cdot r - (r \cdot \hat{\omega})^2
\end{aligned}$$

Using $r \cdot r = (r \cdot r)(\hat{\omega} \cdot \hat{\omega}) = (r \cdot r)\,\hat{\omega}^T I_3 \hat{\omega}$ and $(r \cdot \hat{\omega})^2 = \hat{\omega}^T r r^T \hat{\omega}$, this becomes:

$$r_\perp^2 = \hat{\omega}^T \left((r \cdot r) I_3 - r r^T\right) \hat{\omega}$$
From this expression, we want to find the extrema. We can solve for the extrema by solving for the minimum and maximum
of this objective, subject to the constraint $\hat{\omega} \cdot \hat{\omega} = 1$:
1. Minimum: $\min_{\hat{\omega}} \hat{\omega}^T A \hat{\omega}$
2. Maximum: $\max_{\hat{\omega}} \hat{\omega}^T A \hat{\omega}$

Where $A \triangleq \iiint_O \left((r \cdot r) I_3 - r r^T\right) dm$. This matrix is known as the “inertia matrix”.
How can we solve this problem? We can do so by looking for the eigenvectors of the inertia matrix:
• For minimization, the eigenvector corresponding to the smallest eigenvalue of the inertia matrix corresponds to our
solution.
• For maximization, the eigenvector corresponding to the largest eigenvalue of the inertia matrix corresponds to our
solution.
• For finding the saddle point, our solution will be the eigenvector corresponding to the middle eigenvalue.
Since the characteristic polynomial of this eigenvalue problem has degree 3, we have a closed-form solution! These three
eigenvectors will form a coordinate system for the lefthand system.
Taking a step back, let us look at what we have done so far. We have taken the cloud of points from the left frame of
reference/coordinate system and have estimated a coordinate system for it by finding an eigenbasis from solving these
optimization problems over the objective $\hat{\omega}^T A \hat{\omega}$. With this, we can then repeat the same process for the righthand system.
• This approach fails under symmetry - i.e. if the inertia is the same in all directions (for shapes such as spheres, regular
polyhedra, octahedra, cubes, etc.).
• “Double-Edged Sword” - avoiding correspondences (as this method does) is convenient, but runs into issues with symmetry;
using correspondences can provide you with more information.
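A sketch of Method 2 in NumPy, under the assumption that the measured point cloud is treated as a set of unit point masses (so the lecture's integral becomes a sum; `inertia_matrix` is a hypothetical helper):

```python
import numpy as np

def inertia_matrix(P):
    """A = sum_i ((r_i . r_i) I_3 - r_i r_i^T) for a point cloud P of shape
    (N, 3), with each point treated as a unit mass."""
    P = P - P.mean(axis=0)                        # the centroid-as-origin trick
    A = np.zeros((3, 3))
    for r in P:
        A += (r @ r) * np.eye(3) - np.outer(r, r)
    return A

# The eigenvectors of A are the axes of inertia: the eigenvector of the
# smallest eigenvalue minimizes w^T A w over unit vectors, the largest
# maximizes it, and the middle one is the saddle point. np.linalg.eigh
# returns the eigenvalues in ascending order.
eigvals, eigvecs = np.linalg.eigh(inertia_matrix(np.random.rand(100, 3)))
```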
Orthonormal rotations satisfy several useful properties:
1. Rotations preserve dot products: R(a) · R(b) = a · b.
2. Rotations preserve lengths: ||R(a)||₂ = ||a||₂.
3. Rotations preserve angles between vectors.
4. Rotations preserve triple products: [R(a) R(b) R(c)] = [a b c] (where the triple product [a b c] = a · (b × c)).
Using these properties, we are now ready to set this up as least squares, using correspondences between points measured between
two coordinate systems:

Transform: $r_r = R(r_l) + r_0$

And we can write our optimization as one that minimizes this error term:

$$R^*, r_0^* = \arg\min_{R, r_0} \sum_{i=1}^{N} ||e_i||_2^2$$
Next, we can compute the centroids of the left and right systems:

$$\bar{r}_l = \frac{1}{N}\sum_{i=1}^{N} r_{l,i}, \qquad \bar{r}_r = \frac{1}{N}\sum_{i=1}^{N} r_{r,i}$$

We can reference our points to these computed centroids so that we do not have to worry about translation; a key feature of the
new system is that its centroid is at the origin. To see this, let us “de-mean” (subtract the mean from) our coordinates in the
left and righthand coordinate systems:

$$r'_{l,i} = r_{l,i} - \bar{r}_l, \qquad r'_{r,i} = r_{r,i} - \bar{r}_r$$

Because we subtract the mean, the mean of these new points now becomes zero:

$$\frac{1}{N}\sum_{i=1}^{N} r'_{l,i} = 0 = \frac{1}{N}\sum_{i=1}^{N} r'_{r,i}$$
Substituting this back into the objective, and defining the residual translation $r'_0 \triangleq r_0 - (\bar{r}_r - R(\bar{r}_l))$, we can solve for an
optimal rotation R:

$$\begin{aligned}
R^* &= \arg\min_{R, r'_0} \sum_{i=1}^{N} ||r'_{r,i} - R(r'_{l,i}) - r'_0||_2^2 \\
&= \arg\min_{R, r'_0} \sum_{i=1}^{N} ||r'_{r,i} - R(r'_{l,i})||_2^2 - 2 r'_0 \cdot \sum_{i=1}^{N} \left(r'_{r,i} - R(r'_{l,i})\right) + N||r'_0||_2^2 \\
&= \arg\min_{R, r'_0} \sum_{i=1}^{N} ||r'_{r,i} - R(r'_{l,i})||_2^2 + N||r'_0||_2^2
\end{aligned}$$

Where the middle term vanishes because the de-meaned points sum to zero. Since only the last term depends on $r'_0$, we can set
$r'_0 = 0$ to minimize it. Moreover, we can solve for the true $r_0$ by back-solving later:

$$r_0 = \bar{r}_r - R(\bar{r}_l)$$
Intuitively, this makes sense: the translation vector between these two coordinate systems/point clouds is the difference between
the centroid of the right point cloud and the centroid of the left point cloud after it has been rotated.
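As a sketch of this decoupling (in NumPy; the helper name and the use of unweighted centroids are assumptions):

```python
import numpy as np

def translation_from_centroids(R, P_l, P_r):
    """Given a candidate rotation R, recover the optimal translation in
    closed form from the centroids: r_0 = rbar_r - R rbar_l.

    P_l, P_r: (N, 3) arrays of corresponding points in the two frames.
    """
    rbar_l = P_l.mean(axis=0)
    rbar_r = P_r.mean(axis=0)
    return rbar_r - R @ rbar_l
```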
Since we now have that $r'_0 = 0 \in \mathbb{R}^3$, we can write our error term as:

$$e_i = r'_{r,i} - R(r'_{l,i})$$

Expanding the objective:

$$\sum_{i=1}^{N} ||e_i||_2^2 = \sum_{i=1}^{N} ||r'_{r,i}||_2^2 - 2\sum_{i=1}^{N} r'_{r,i} \cdot R(r'_{l,i}) + \sum_{i=1}^{N} ||R(r'_{l,i})||_2^2 \tag{3}$$

Where the first of these terms is fixed, the second of these terms is to be maximized (since there is a negative sign in front of
this term, this thereby minimizes the objective), and the third of these terms is fixed (rotations preserve lengths, so
$||R(r'_{l,i})||_2 = ||r'_{l,i}||_2$). Therefore, our rotation problem can be simplified to:

$$\max_R \sum_{i=1}^{N} r'_{r,i} \cdot R(r'_{l,i})$$
To solve this objective, we could take the derivative $\frac{d}{dR}(\cdot)$ of the objective, but constraints make this optimization problem
difficult to solve (note that we are not optimizing over a Euclidean search space - rather, we are optimizing over the constrained
set of rotations). We will look into a different representation of R next class to find solutions to this problem!
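For completeness, one standard closed-form solution to this maximization is the SVD-based (Kabsch) method sketched below; this is not the representation the lecture develops next class, just an alternative route to the same optimal R (the helper name is an assumption):

```python
import numpy as np

def best_rotation(P_l, P_r):
    """Maximize sum_i r'_{r,i} . R r'_{l,i} over rotations R via the SVD
    (Kabsch) method. P_l, P_r: (N, 3) corresponding points, already de-meaned."""
    M = P_r.T @ P_l                                 # sum_i r'_{r,i} r'_{l,i}^T
    U, _, Vt = np.linalg.svd(M)
    D = np.diag([1.0, 1.0, np.linalg.det(U @ Vt)])  # guard against reflections
    return U @ D @ Vt
```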
MIT OpenCourseWare
https://ocw.mit.edu
For information about citing these materials or our Terms of Use, visit: https://ocw.mit.edu/terms