
6.801/6.866: Machine Vision, Lecture 17

Professor Berthold Horn, Ryan Sander, Tadayuki Yoshitake


MIT Department of Electrical Engineering and Computer Science
Fall 2020

These lecture summaries are designed to be a review of the lecture. Though I do my best to include all main topics from the
lecture, the lectures themselves contain more elaborate explanations than these notes.

1 Lecture 17: Photogrammetry, Orientation, Axes of Inertia, Symmetry, Absolute, Relative, Interior, and Exterior Orientation
This lecture marks a transition from the lower-level machine vision we have covered so far in the course to higher-level
problems, beginning with photogrammetry.

Photogrammetry: “Measurements from Imagery”, for example, map-making.

1.1 Photogrammetry Problems: An Overview


Four important problems in photogrammetry that we will cover are:
• Absolute Orientation 3D ←→ 3D

• Relative Orientation 2D ←→ 2D
• Exterior Orientation 2D ←→ 3D
• Interior Orientation 3D ←→ 2D
Below we discuss each of these problems at a high level. We will be discussing these problems in greater depth later in this and
following lectures.

1.1.1 Absolute Orientation


We will start with covering absolute orientation. This problem asks about the relationship between two or more objects
(cameras, points, other sensors) in 3D. Some examples of this problem include:
1. Given two 3D sensors, such as lidar (light detection and ranging) sensors, our goal is to find the transformation, or pose,
between these two sensors.

2. Given one 3D sensor, such as a lidar sensor, and two objects (note that this could be two distinct objects at a single point
in time, or a single object at two distinct points in time), our goal is to find the transformation, or pose, between the
two objects.

1.1.2 Relative Orientation


This problem asks how we can find the 2D ←→ 2D relationship between two objects, such as cameras, points, or other sensors.
This type of problem comes up frequently in machine vision, for instance, binocular stereo. Two high-level applications
include:
1. Given two cameras/images that these cameras take, our goal is to extract 3D information by finding the relationship
between two 2D images.

2. Given two cameras, our goal is to find the (relative) transformation, or pose, between the two cameras.

1.1.3 Exterior Orientation
This photogrammetry problem goes from 2D to 3D. One common example in robotics (and other fields related to machine
vision) is localization, in which a robotic agent must find its location/orientation on a map given 2D information from a
camera (as well as, possibly, 3D laser scan measurements).

More generally, with localization, our goal is to find where we are and how we are oriented in space given a 2D image
and a 3D model of the world.

1.1.4 Interior Orientation


This photogrammetry problem goes from 3D to 2D. The most common application of this problem is camera calibration.
Camera calibration is crucial for high-precision imaging, as well as for solving machine and computer vision problems such as
Bundle Adjustment [1]. Finding the principal point is another example of the interior orientation problem.

1.2 Absolute Orientation


We will begin our in-depth discussion and analysis of these four problems with absolute orientation. Let us start by introducing
binocular stereo.

1.2.1 Binocular Stereopsis


To motivate binocular stereo, we will start with a fun fact: Humans have ≈ 12 depth cues. One of these is binocular stereopsis,
or binocular stereo (binocular ≈ two sensors).

Binocular stereo is given by the figure below:

Figure 1: The problem of binocular stereo. Having two 2D sensors enables us to recover 3D structure (specifically, depth) from
the scene we image. A few key terms/technicalities to note here: (i) The origin is set to be halfway between the two cameras,
(ii) The distance between the cameras is called the baseline, and (iii) binocular disparity refers to the phenomenon of having
each camera generate a different image. Finally, note that in practice, it is almost impossible to line up these cameras exactly.

Goal: Calculate X and Z. This would not be possible with only monocular stereo (monocular ≈ one sensor). Using similar
triangles, we have:

$$\frac{X - \frac{b}{2}}{Z} = \frac{x_l}{f}, \qquad \frac{X + \frac{b}{2}}{Z} = \frac{x_r}{f}$$

Subtracting the first equation from the second:

$$\frac{x_r - x_l}{f} = \frac{b}{Z} \implies Z = \frac{bf}{x_r - x_l}$$

where the binocular disparity is given by $(x_r - x_l)$. Solving this system of equations yields:

$$X = \frac{b}{2}\,\frac{x_r + x_l}{x_r - x_l}, \qquad Z = bf\,\frac{1}{x_r - x_l}$$

Note that, more generally, we can calculate Y in the same way as well! From these equations, we can see that:

• Increasing the baseline increases X and Z (all else being equal).
• Increasing the focal length increases Z.
But it is important to also be mindful of system constraints that determine the range, or the exact values, these variables
can take. For instance, if we have a self-driving car, we cannot simply make the baseline distance between our two cameras 100
meters, because this would require mounting the cameras 100 meters apart.
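
As a quick numerical sketch of these relations (a minimal example of mine, not from the lecture; it assumes $x_l$ and $x_r$ are measured in the same units as $f$):

```python
def stereo_xz(x_l, x_r, b, f):
    """Recover (X, Z) from the similar-triangle relations above.
    b: baseline between the cameras, f: focal length."""
    disparity = x_r - x_l                      # binocular disparity
    Z = b * f / disparity                      # depth
    X = 0.5 * b * (x_r + x_l) / disparity      # lateral position
    return X, Z

# Example: 0.5 m baseline, 10 mm focal length.
X, Z = stereo_xz(x_l=-0.0008, x_r=0.0012, b=0.5, f=0.01)
print(X, Z)  # 0.05 m to the right of the midpoint, 2.5 m away
```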

1.2.2 General Case


All of the methods we have looked at so far assume we have already found “interesting points”, for instance, through feature
detection algorithms (also known as “descriptors”) such as SIFT, SURF, ORB, or FAST+BRIEF. In the general case, absolute
orientation involves points in space that are measured in two separate frames of reference (each could come from a camera or another
sensor), and the goal is to find the transformation that brings corresponding measurements into the closest possible agreement.
We will arbitrarily assign the axes $(x_l, y_l, z_l) \in \mathbb{R}^{3 \times 3}$ to the “left” coordinate system, and
$(x_r, y_r, z_r) \in \mathbb{R}^{3 \times 3}$ to the “right” coordinate system, where our goal is to find the transformation between these
two coordinate systems. The diagram below illustrates this:

Figure 2: General case of absolute orientation: given the coordinate systems $(x_l, y_l, z_l) \in \mathbb{R}^{3 \times 3}$ and $(x_r, y_r, z_r) \in \mathbb{R}^{3 \times 3}$, our goal
is to find the transformation, or pose, between them using points $p_i$ measured in each frame of reference.

Here, we also note the following, as they are important for establishing that we are not limited just to finding the transfor-
mation between two camera sensors:
• “Two cameras” does not just mean having two distinct cameras (known as “two-view geometry”); this could also refer to
having a single camera with images taken at two distinct points in time (known as “Visual Odometry” or “VO”).
• Note also that we could have the same scenario described above when introducing this problem, in which we have
either one camera and multiple objects, or multiple cameras and one object. Hence, there is a sense of duality in
solving these problems:
1. One camera, two objects?
2. Two cameras, one object (Two-View Geometry)?
3. Camera moving (VO)?
4. Object moving?
To better understand this problem, it is first important to precisely define the transformation or pose (translation + rotation)
between the two cameras.

1.2.3 Transformations and Poses


Recall that in 3D, we have 6 degrees of freedom (DOF), of which 3 are for translation and 3 are for rotation. In the general
case of a $d$-dimensional system, we will have $d$ translational degrees of freedom and $\frac{d(d-1)}{2}$ rotational degrees of freedom
(which, interestingly, is also equal to $\sum_{i=0}^{d-1} i$). Therefore, for a $d$-dimensional system, the total degrees of freedom from translation
and rotation is:

$$\text{DOF}_{T,R}(d) = d + \frac{d(d-1)}{2}$$
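
For instance (a trivial numerical check of this formula, my example):

```python
def dof(d):
    """Translational + rotational degrees of freedom in d dimensions."""
    return d + d * (d - 1) // 2

print([dof(d) for d in (2, 3, 4)])  # [3, 6, 10] -- d = 3 gives the familiar 6
```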

What form does our transformation take? Since we are considering transformations determined by
translation and rotation, we can write our transformation as:

$$r_r = R(r_l) + r_0$$

Where:

• R describes the rotation. Note that this is not necessarily always an orthonormal rotation matrix, and we have more
generally parameterized it as a function.

• r0 ∈ R3 describes the translation.

• The translation vector and the rotation function comprise our unknowns.

When $R$ is described by an orthonormal rotation matrix $R$, then we require this matrix to have the following properties:

1. $R^T R = R R^T = I$, i.e. $R^T = R^{-1}$

2. The orthonormality constraints leave 3 unknowns instead of 9; equivalently, $R$ can be generated from a skew-symmetric matrix $\Omega$ (for instance via the exponential map $R = e^{\Omega}$), where skew-symmetric matrices take the form:

$$\Omega = \begin{bmatrix} 0 & a & b \\ -a & 0 & c \\ -b & -c & 0 \end{bmatrix} \in \mathbb{R}^{3 \times 3}$$

3. $\det(R) = +1$ (this constraint is needed to eliminate reflections)


4. $R \in SO(3)$ (known as the Special Orthogonal group). One note about this: optimization over the Special Orthogonal
group (e.g. for estimating optimal rotation matrices) can be difficult; one common remedy is to optimize without the
constraint and then project the result back onto $SO(3)$ via the Singular Value Decomposition (SVD), as sketched below.
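
Here is a minimal sketch of that projection step (the standard orthogonal-Procrustes construction; the function name is mine, not from the lecture):

```python
import numpy as np

def project_to_SO3(M):
    """Project an arbitrary 3x3 matrix onto SO(3): take the SVD
    M = U S V^T and return U V^T, flipping a sign if needed so
    that det(R) = +1 (a rotation, not a reflection)."""
    U, _, Vt = np.linalg.svd(M)
    R = U @ Vt
    if np.linalg.det(R) < 0:      # eliminate reflections
        U[:, -1] *= -1
        R = U @ Vt
    return R

# A noisy near-rotation gets snapped back onto SO(3):
R = project_to_SO3(np.eye(3) + 0.05 * np.random.randn(3, 3))
print(np.allclose(R.T @ R, np.eye(3)), np.linalg.det(R))  # True, 1.0
```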

Suppose we have clusters of points from the left and right cameras, and we want to align the two clusters as closely as
possible. With a mechanical spring analogy, each pair of corresponding points is joined by a spring with force given by
Hooke's law, and consequently an energy:

$$F = ke$$

$$E = \int F \, de = \frac{1}{2} k e^2$$

where $e$ (the “error”) is the distance between the measured point in the two frames of reference. Therefore, the solution to
this problem involves “minimizing the energy” of the system. Interestingly, energy minimization is analogous to least-squares
regression.

Using the definition of our transformation, the error for the $i$-th point is given by:

$$e_i = (R(r_{l,i}) + r_0) - r_{r,i}$$

Then our objective for energy minimization and least-squares regression is given by:

$$\min_{R, r_0} \sum_{i=1}^{N} ||e_i||_2^2$$

This is another instance of solving what is known as an “inverse” problem. The “forward” and “inverse” problems are given by:

• Forward problem: $R, r_0 \longrightarrow \{r_{r,i}, r_{l,i}\}_{i=1}^{N}$ (find correspondences)

• Inverse problem: $\{r_{r,i}, r_{l,i}\}_{i=1}^{N} \longrightarrow R, r_0$ (find the transformation)

Now that we have framed our optimization problem, can we decouple the optimization over translation and rotation? It turns
out we can by setting an initial reference point. For this, we consider two methods.

4
1.2.4 Procedure - “Method 1”

1. Pick a measured point from one set of measured points as the origin.

2. Take a second point, look at the difference between the two points, and compute the unit vector. Take this unit vector as one axis of the
system:

$$x_l = r_{l,2} - r_{l,1} \longrightarrow \hat{x}_l = \frac{x_l}{||x_l||_2}$$

3. Take a third point, and remove from the vector running from point 1 to point 3 its component along the axis $\hat{x}_l$ found
in step 2. Points 1, 2, and 3 form the (x, y) plane, and the remaining component forms the y-axis of the coordinate
system.

4. We can compute the y vector (note that the projection onto $\hat{x}_l$ is subtracted as a vector):

$$y_l = (r_{l,3} - r_{l,1}) - \left((r_{l,3} - r_{l,1}) \cdot \hat{x}_l\right)\hat{x}_l$$

From here, we then take the unit vector $\hat{y}_l = \frac{y_l}{||y_l||_2}$. By construction, $\hat{x}_l \cdot \hat{y}_l = 0$.

5. Obtain the z-axis as the cross product:

$$\hat{z}_l = \hat{x}_l \times \hat{y}_l$$

(Then we also have $\hat{z}_l \cdot \hat{y}_l = 0$ and $\hat{z}_l \cdot \hat{x}_l = 0$.) This defines a coordinate system $(\hat{x}_l, \hat{y}_l, \hat{z}_l)$ for the left camera/frame
of reference. Note that this only requires 3 points!
6. To calculate this for the righthand frame of reference, we can repeat steps 1-5 for the righthand side to obtain the coordinate
system (x̂r , ŷr , ẑr ).
From here, all we need to do is find the transformation (rotation, since we have artificially set the origin) between the coordinate
systems $(\hat{x}_r, \hat{y}_r, \hat{z}_r)$ and $(\hat{x}_l, \hat{y}_l, \hat{z}_l)$. Mathematically, we have the following equations:

$$\hat{x}_r = R(\hat{x}_l), \quad \hat{y}_r = R(\hat{y}_l), \quad \hat{z}_r = R(\hat{z}_l)$$

We can condense these equations into a single matrix equation:

$$\begin{bmatrix} \hat{x}_r & \hat{y}_r & \hat{z}_r \end{bmatrix} = R \begin{bmatrix} \hat{x}_l & \hat{y}_l & \hat{z}_l \end{bmatrix}$$

which we can then solve as a matrix inversion problem:

$$R = \begin{bmatrix} \hat{x}_r & \hat{y}_r & \hat{z}_r \end{bmatrix} \begin{bmatrix} \hat{x}_l & \hat{y}_l & \hat{z}_l \end{bmatrix}^{-1}$$
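
A minimal sketch of this method, assuming three corresponding points per side (the function and variable names are mine; the matrix inverse reduces to a transpose because the triad matrices are orthonormal):

```python
import numpy as np

def triad(p1, p2, p3):
    """Build the orthonormal axes of steps 1-5 from three 3D points.
    Returns a 3x3 matrix whose columns are (x_hat, y_hat, z_hat)."""
    x = p2 - p1
    x_hat = x / np.linalg.norm(x)
    v = p3 - p1
    y = v - (v @ x_hat) * x_hat          # remove the component along x_hat
    y_hat = y / np.linalg.norm(y)
    z_hat = np.cross(x_hat, y_hat)       # completes the right-handed frame
    return np.column_stack((x_hat, y_hat, z_hat))

def rotation_from_triads(left, right):
    """left, right: lists of three corresponding points. Solves
    [x_r y_r z_r] = R [x_l y_l z_l] for R; inverse = transpose here."""
    return triad(*right) @ triad(*left).T
```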

A few notes on this system:

• Is this system invertible? Yes, because by construction these vectors are orthogonal to one another.

• 3 vector equalities −→ 9 scalar equalities

• Are we using all of these constraints?

– For point 1, we use all 3 constraints.


– For point 2, we use 2 constraints.
– For point 3, we use 1 constraint.

It turns out this system “favors” some points over others (usually sub-optimal).

• This is a somewhat ad hoc method - it could be useful for deriving a fast approximation or initial estimate, but it
is definitely suboptimal.

1.2.5 Procedure - “Method 2”
For this algorithm, recall our prior work with blob analysis. We can look at axes of inertia (see the figure below), where the
inertia of the “blob” about an axis is given by:

$$I = \iint_O r^2 \, dm$$

Figure 3: Computing the axes of inertia in a 2D blob.

How do we generalize this from 2D to 3D? How do we figure out the axes of inertia in that case (see the example below)?

$$I = \iiint_O r^2 \, dm$$

Figure 4: Computing the axes of inertia for a 3D blob - we can generalize the notion of inertia from 2D to 3D.

One trick we can use here is using the centroid as the origin.

Using the triangle figure below:

Figure 5: Triangle for computing our r0 vector.

Then we can compute $r_0$, the component of $r$ along the $\hat{\omega}$ direction:

$$r_0 = (r \cdot \hat{\omega})\hat{\omega}$$

Then:

$$r - r_0 = r - (r \cdot \hat{\omega})\hat{\omega}$$

And we can compute $r^2$, the squared perpendicular distance from the axis:

$$\begin{aligned} r^2 &= (r - r_0) \cdot (r - r_0) \\ &= r \cdot r - 2(r \cdot \hat{\omega})^2 + (r \cdot \hat{\omega})^2 (\hat{\omega} \cdot \hat{\omega}) \\ &= r \cdot r - 2(r \cdot \hat{\omega})^2 + (r \cdot \hat{\omega})^2 \\ &= r \cdot r - (r \cdot \hat{\omega})^2 \end{aligned}$$

Substituting this into our inertia expression:

$$I = \iiint_O \left(r \cdot r - (r \cdot \hat{\omega})^2\right) dm$$

Let us simplify some of these formulas:

1. $(r \cdot \hat{\omega})^2$:

$$(r \cdot \hat{\omega})^2 = (r \cdot \hat{\omega})(r \cdot \hat{\omega}) = (\hat{\omega} \cdot r)(r \cdot \hat{\omega}) = \hat{\omega}^T (r r^T) \hat{\omega}$$

where $r r^T \in \mathbb{R}^{3 \times 3}$ is the dyadic product.

2. $(r \cdot r)$:

$$(r \cdot r) = (r \cdot r)(\hat{\omega} \cdot \hat{\omega}) = (r \cdot r) \, \hat{\omega}^T I_3 \hat{\omega}$$

where $I_3$ is the $3 \times 3$ identity matrix.


Therefore, the inertia becomes:

$$I = \iiint_O \left(r \cdot r - (r \cdot \hat{\omega})^2\right) dm = \iiint_O \left((r \cdot r)\,\hat{\omega}^T I_3 \hat{\omega} - \hat{\omega}^T (r r^T) \hat{\omega}\right) dm = \hat{\omega}^T \left(\iiint_O \left((r \cdot r) I_3 - r r^T\right) dm\right) \hat{\omega}$$

From this expression, we want to find the extrema. We can solve for the extrema by solving for the minimum and maximum
of this objective over unit vectors ($\hat{\omega}^T \hat{\omega} = 1$):

1. Minimum: $\min_{\hat{\omega}} \hat{\omega}^T A \hat{\omega}$

2. Maximum: $\max_{\hat{\omega}} \hat{\omega}^T A \hat{\omega}$

where $A \triangleq \iiint_O \left((r \cdot r) I_3 - r r^T\right) dm$. This matrix is known as the “inertia matrix”.

How can we solve this problem? We can do so by looking for the eigenvectors of the inertia matrix:
• For minimization, the eigenvector corresponding to the smallest eigenvalue of the inertia matrix corresponds to our
solution.
• For maximization, the eigenvector corresponding to the largest eigenvalue of the inertia matrix corresponds to our
solution.
• For finding the saddle point, our solution will be the eigenvector corresponding to the middle eigenvalue.

Since this is a polynomial system of degree 3, we have a closed-form solution! These three eigenvectors will form a coordinate
system for the lefthand system.
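
For a discrete cloud of points (unit point masses standing in for $dm$), a sketch of this eigenbasis computation might look as follows (all names are mine):

```python
import numpy as np

def inertia_axes(points):
    """points: (N, 3) array. Returns eigenvalues (ascending) and the
    matrix whose columns are the corresponding axes of inertia of
    A = sum_i ((r_i . r_i) I_3 - r_i r_i^T), taken about the centroid."""
    r = points - points.mean(axis=0)           # trick: centroid as origin
    A = (r * r).sum() * np.eye(3) - r.T @ r    # discrete inertia matrix
    return np.linalg.eigh(A)                   # symmetric, so eigh applies

vals, axes = inertia_axes(np.random.randn(100, 3) * [5.0, 2.0, 1.0])
# axes[:, 0] minimizes w^T A w (long axis of the cloud); axes[:, -1] maximizes it.
```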

Taking a step back, let us look at what we have done so far. We have taken the cloud of points from the left frame of
reference/coordinate system and have estimated a coordinate system for it by finding an eigenbasis from solving these
optimization problems over the objective $\hat{\omega}^T A \hat{\omega}$. With this, we can then repeat the same process for the righthand system.

Why does this approach work better?

• We use all the data in both point clouds of 3D data.


• We weight all the points equally, unlike in the previous case.

But why is this approach not used in practice?

• This approach fails under symmetry - i.e., if the inertia is the same in all directions (as for shapes such as spheres, regular
polyhedra, octahedra, cubes, etc.), the axes of inertia are not uniquely defined.
• “Double-edged sword” - using correspondences can provide you with more information, but can run into issues with
symmetry.

1.2.6 Computing Rotations


Next, we will dive into how we can compute and find rotations. To do this, let us first go over the following properties of
rotations:
1. Rotations preserve dot products: R(a) · R(b) = a · b

2. Rotations preserve length: R(a) · R(a) = a · a

3. Rotations preserve angles: |R(a) × R(b)| = |a × b|

4. Rotations preserve triple products: [R(a) R(b) R(c)] = [a b c] (Where the triple product [a b c] = a · (b × c)).
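
These invariances are easy to sanity-check numerically with a concrete rotation (the 90° rotation below is an arbitrary example of mine):

```python
import numpy as np

# An arbitrary example rotation: 90 degrees about the z-axis.
R = np.array([[0.0, -1.0, 0.0],
              [1.0,  0.0, 0.0],
              [0.0,  0.0, 1.0]])
a, b, c = np.random.randn(3, 3)  # three random vectors

print(np.isclose((R @ a) @ (R @ b), a @ b))                    # dot products
print(np.isclose((R @ a) @ (R @ a), a @ a))                    # lengths
print(np.isclose(np.linalg.norm(np.cross(R @ a, R @ b)),
                 np.linalg.norm(np.cross(a, b))))              # cross magnitudes
print(np.isclose((R @ a) @ np.cross(R @ b, R @ c),
                 a @ np.cross(b, c)))                          # triple products
```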

Using these properties, we are now ready to set this up as a least-squares problem, using correspondences between points measured in
two coordinate systems:

Transform: $r_r = R(r_l) + r_0$

Error: $e_i = r_{r,i} - R(r_{l,i}) - r_0$

And we can write our optimization as one that minimizes this error term:

$$R^*, r_0^* = \arg\min_{R, r_0} \sum_{i=1}^{N} ||e_i||_2^2$$

Next, we can compute the centroids of the left and right systems:

$$\bar{r}_l = \frac{1}{N} \sum_{i=1}^{N} r_{l,i}, \qquad \bar{r}_r = \frac{1}{N} \sum_{i=1}^{N} r_{r,i}$$

We can reference the measured points to these centroids so that we do not have to worry about translation: a useful feature
of the resulting system is that the new centroid is at the origin. To see this, let us “de-mean” (subtract the mean from) our
coordinates in the left and righthand coordinate systems:

$$r'_{l,i} = r_{l,i} - \bar{r}_l, \qquad r'_{r,i} = r_{r,i} - \bar{r}_r$$

Because we subtract the mean, the mean of these new points is zero:

$$\frac{1}{N} \sum_{i=1}^{N} r'_{l,i} = 0 = \frac{1}{N} \sum_{i=1}^{N} r'_{r,i}$$

Substituting this back into the objective, we can solve for an optimal rotation $R$ (writing $r'_0$ for the translation expressed
in the de-meaned coordinates):

$$R^* = \arg\min_{R, r'_0} \sum_{i=1}^{N} ||r'_{r,i} - R(r'_{l,i}) - r'_0||_2^2$$

$$= \arg\min_{R, r'_0} \sum_{i=1}^{N} ||r'_{r,i} - R(r'_{l,i})||_2^2 - 2 r'_0 \cdot \sum_{i=1}^{N} \left(r'_{r,i} - R(r'_{l,i})\right) + N ||r'_0||_2^2$$

$$= \arg\min_{R, r'_0} \sum_{i=1}^{N} ||r'_{r,i} - R(r'_{l,i})||_2^2 + N ||r'_0||_2^2$$

where the middle term vanishes because the de-meaned points sum to zero. Since only the last term depends on $r'_0$, we can
set $r'_0 = 0$ to minimize it, and we can solve for the true $r_0$ by back-solving later:

$$r_0 = \bar{r}_r - R(\bar{r}_l)$$

Intuitively, this makes sense: the translation vector between these two coordinate systems/point clouds is the difference between
the centroid of the right point cloud and the centroid of the left point cloud after it has been rotated.

Since we now have that $r'_0 = 0 \in \mathbb{R}^3$, we can write our error term as:

$$e_i = r'_{r,i} - R(r'_{l,i})$$

This in turn allows us to write the objective as:

$$\sum_{i=1}^{N} ||e_i||_2^2 = \sum_{i=1}^{N} \left(r'_{r,i} - R(r'_{l,i})\right) \cdot \left(r'_{r,i} - R(r'_{l,i})\right) = \sum_{i=1}^{N} ||r'_{r,i}||_2^2 - 2 \sum_{i=1}^{N} r'_{r,i} \cdot R(r'_{l,i}) + \sum_{i=1}^{N} ||r'_{l,i}||_2^2$$

using the fact that rotations preserve length, so $||R(r'_{l,i})||_2^2 = ||r'_{l,i}||_2^2$. The first of these terms is fixed, the second is to
be maximized (since there is a negative sign in front of it, this minimizes the objective), and the third is fixed. Therefore,
our rotation problem simplifies to:

$$\max_R \sum_{i=1}^{N} r'_{r,i} \cdot R(r'_{l,i})$$

To solve this objective, we could try taking a derivative $\frac{d}{dR}(\cdot)$ of the objective, but the constraints on $R$ make this optimization problem
difficult to solve (note that we are not optimizing over a Euclidean search space; rather, we are optimizing over a constrained
set of transformations). We will look into a different representation of $R$ next class to find solutions to this problem!
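
For reference, the reduced maximization above also has a well-known closed-form solution via SVD (the orthogonal-Procrustes, or “Kabsch”, approach). This is an alternative to the quaternion representation developed next lecture, and the sketch below uses my own naming, not the lecture's:

```python
import numpy as np

def absolute_orientation(left, right):
    """left, right: (N, 3) arrays of corresponding points.
    Returns (R, r0) minimizing sum_i ||r_r,i - R r_l,i - r0||^2."""
    l_bar, r_bar = left.mean(axis=0), right.mean(axis=0)
    lp, rp = left - l_bar, right - r_bar       # de-meaned coordinates
    # max_R sum_i r'_r,i . R r'_l,i is solved by the SVD of the 3x3
    # cross-covariance matrix H = sum_i r'_l,i r'_r,i^T.
    U, _, Vt = np.linalg.svd(lp.T @ rp)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(U @ Vt))])  # det(R) = +1
    R = Vt.T @ D @ U.T
    r0 = r_bar - R @ l_bar                     # back-solve the translation
    return R, r0
```

Given matched arrays, `R, r0 = absolute_orientation(left, right)` should satisfy `right ≈ left @ R.T + r0`.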

1.3 References
1. Bundle Adjustment, https://en.wikipedia.org/wiki/Bundle_adjustment

MIT OpenCourseWare
https://ocw.mit.edu

6.801 / 6.866 Machine Vision


Fall 2020

For information about citing these materials or our Terms of Use, visit: https://ocw.mit.edu/terms
