Notes Perspective
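$$\mathbf{p} = f\,\frac{\mathbf{A} + \lambda\mathbf{D}}{A_Z + \lambda D_Z} \;\longrightarrow\; f\,\frac{\mathbf{D}}{D_Z} \quad \text{as } \lambda \to \infty$$
i.e. the image of the line terminates in a vanishing point with coordinates $(fD_X/D_Z,\ fD_Y/D_Z)$, unless the line is parallel to the image plane ($D_Z = 0$).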
Note that the vanishing point is unaffected by (invariant to) the line position, A; it depends only on the line orientation, D. Consequently, all lines parallel to D have the same vanishing point.
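As a quick numerical illustration, here is a minimal sketch in Python with NumPy; the helper names `project` and `vanishing_point` are our own, and the direction D and positions A are arbitrary example values. Points far along two different lines with the same direction D project to essentially the same image point, fD/D_Z.

```python
import numpy as np

def project(P, f=1.0):
    """Plane perspective projection (X, Y, Z) -> (fX/Z, fY/Z)."""
    X, Y, Z = P
    return np.array([f * X / Z, f * Y / Z])

def vanishing_point(D, f=1.0):
    """Vanishing point (f D_X / D_Z, f D_Y / D_Z) of lines with direction D."""
    if np.isclose(D[2], 0.0):
        raise ValueError("line is parallel to the image plane (D_Z = 0)")
    return np.array([f * D[0] / D[2], f * D[1] / D[2]])

f = 2.0
D = np.array([1.0, -0.5, 2.0])      # shared direction of two parallel lines
A1 = np.array([0.0, 1.0, 5.0])      # two different line positions
A2 = np.array([3.0, -2.0, 4.0])
lam = 1e6                           # parameter of a point far along each line

print(project(A1 + lam * D, f))
print(project(A2 + lam * D, f))
print(vanishing_point(D, f))
```

Changing A1 or A2 moves the near parts of the lines in the image but not the limit point, which is the invariance stated above.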
Under spherical perspective, a line in the scene projects to half of a great
circle. This circle is defined by the intersection of the viewing sphere with the
plane containing the line and center of projection. There are two vanishing
points here corresponding to the endpoints of the half great circle. You should
convince yourself that these are the same for a family of parallel straight lines.
1.3 Projection of planes
A plane of points in 3D can be represented as X.N = d where N is the unit
plane normal, and d the perpendicular distance of the plane from the origin.
A point X on the plane is imaged at p = fX/Z. Taking the scalar product
of both sides with N gives p.N = fX.N/Z = fd/Z. In the limit of very distant points:
$$\lim_{Z \to \infty} \mathbf{p}\cdot\mathbf{N} = 0$$
which is the equation of a plane through the origin parallel to the world plane
(i.e. which has the same normal N). The plane p.N = 0 intersects the image
4 CHAPTER 1. GEOMETRY OF IMAGE FORMATION
plane in a vanishing line
$$xN_x + yN_y + fN_z = 0$$
Note that the vanishing line is unaffected by (invariant to) the plane position, d; it depends only on the plane orientation, N. All planes with the same orientation have the same vanishing line, also called the horizon.
Consider a line on the plane. It can be shown (exercise) that the vanishing
points of all lines on the plane lie on the vanishing line of the plane. Thus, the vanishing points of any two non-parallel lines on the plane determine its vanishing line.
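One way to see this numerically, as a sketch: treat an image point (x, y) as the ray (x, y, f), so the vanishing line equation reads (x, y, f)·N = 0. Two vanishing points then give N (up to sign and scale) as the cross product of their rays. The normal N, the two in-plane directions, and the helper names below are assumed example values, not from the text.

```python
import numpy as np

def vanishing_point(D, f):
    """Vanishing point (f D_X / D_Z, f D_Y / D_Z) of lines with direction D."""
    return np.array([f * D[0] / D[2], f * D[1] / D[2]])

def vanishing_line_residual(p, N, f):
    """x N_x + y N_y + f N_z: zero when image point p lies on the vanishing line."""
    return p[0] * N[0] + p[1] * N[1] + f * N[2]

def normal_from_vanishing_points(p1, p2, f):
    """Unit normal (up to sign) of the plane through the origin that
    contains the rays (x1, y1, f) and (x2, y2, f)."""
    n = np.cross([p1[0], p1[1], f], [p2[0], p2[1], f])
    return n / np.linalg.norm(n)

f = 1.5
N = np.array([0.0, 1.0, 0.2])
N = N / np.linalg.norm(N)                 # example plane normal (assumed)
D1 = np.cross(N, [1.0, 0.0, 0.0])         # two non-parallel directions lying
D2 = np.cross(N, [1.0, 1.0, 0.0])         # in the plane (both satisfy D . N = 0)
p1, p2 = vanishing_point(D1, f), vanishing_point(D2, f)

print(vanishing_line_residual(p1, N, f), vanishing_line_residual(p2, N, f))
print(normal_from_vanishing_points(p1, p2, f), N)
```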
Under spherical perspective, the horizon of a plane is a great circle, found
by translating the plane parallel to itself until it passes through the center
of projection, and then intersecting it with the viewing sphere.
1.4 Terrestrial Perspective
Consider an observer standing on a ground plane looking straight ahead of
her. Since the ground plane has surface normal N = (0, 1, 0), the equation
of the horizon is y = 0. In this canonical case, the horizon lies in the middle
of the field of view, with the ground plane in the lower half and the sky in
the upper half.
Let us work out how objects of different heights and at different locations on the ground plane project. We will suppose that the eye, or camera, is at a height $h_c$ above the ground plane. Consider an object of height $Y$ resting on the ground plane, whose bottom is at $(X, -h_c, Z)$ and top is at $(X, Y - h_c, Z)$. The bottom projects to $(fX/Z, -fh_c/Z)$ and the top to $(fX/Z, f(Y - h_c)/Z)$.
We note the following:
1. The bottoms of nearer objects (small Z) project to points lower in the image plane; farther objects have bottoms closer to the horizon.
2. If the object has the same height as the camera ($Y = h_c$), the projection of its top lies on the horizon.
3. The ratio of the height of the object to the height of the camera, $Y/h_c$, is the ratio of the apparent vertical height of the object in the image to the vertical distance of its bottom from the horizon (Verify).
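The three observations can be checked with a few lines of plain Python. This is a sketch: the function name `image_heights` and the camera height of 1.6 are illustrative assumptions, and we take f = 1 with the horizon at y = 0.

```python
def image_heights(Z, Y_obj, h_c, f=1.0):
    """Image y-coordinates of the bottom (X, -h_c, Z) and top (X, Y_obj - h_c, Z)
    of an object of height Y_obj, seen from a camera at height h_c (horizon y = 0)."""
    y_bottom = -f * h_c / Z
    y_top = f * (Y_obj - h_c) / Z
    return y_bottom, y_top

h_c = 1.6                                  # assumed camera height
for Z in (5.0, 20.0):
    for Y_obj in (0.8, 1.6, 3.2):
        y_b, y_t = image_heights(Z, Y_obj, h_c)
        ratio = (y_t - y_b) / (-y_b)       # apparent height / distance of bottom below horizon
        print(f"Z={Z:5.1f}  Y={Y_obj:.1f}  bottom={y_b:+.3f}  top={y_t:+.3f}  ratio={ratio:.2f}")
```

The bottom at Z = 5 sits lower in the image than at Z = 20 (observation 1), the top of the object with Y = 1.6 lands on y = 0 (observation 2), and the ratio column equals Y/h_c regardless of Z (observation 3).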
1.5 Orthographic Projection
If the object is relatively shallow compared with its distance from the camera,
we can approximate perspective projection by scaled orthographic projection.
The idea is as follows: if the depth Z of points on the object varies within some range $Z_0 \pm \Delta Z$, with $\Delta Z \ll Z_0$, then the perspective scaling factor $f/Z$ can be approximated by a constant $s = f/Z_0$. The equations for projection from the scene coordinates (X, Y, Z) to the image plane become $x = sX$ and $y = sY$. Note that scaled orthographic projection is an approximation that is
valid only for those parts of the scene with not much internal depth variation;
it should not be used to study properties in the large. For instance, under
orthographic projection, parallel lines stay parallel instead of converging to
a vanishing point!
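To quantify "not much internal depth variation", a short sketch with NumPy. The object size, mean depth Z_0 = 100 and half-range ΔZ = 2 are assumed example values. For each point, the relative difference between perspective and scaled orthographic image coordinates is exactly |1 − Z/Z_0|, so it is bounded by ΔZ/Z_0.

```python
import numpy as np

f = 1.0
Z0, dZ = 100.0, 2.0                 # assumed mean depth and depth half-range
s = f / Z0                          # scaled orthographic factor s = f / Z0

rng = np.random.default_rng(0)
n = 50
pts = np.column_stack([rng.uniform(-5.0, 5.0, n),       # X
                       rng.uniform(-5.0, 5.0, n),       # Y
                       Z0 + rng.uniform(-dZ, dZ, n)])   # Z in [Z0 - dZ, Z0 + dZ]

persp = f * pts[:, :2] / pts[:, 2:3]   # perspective: (fX/Z, fY/Z)
ortho = s * pts[:, :2]                 # scaled orthographic: (sX, sY)

# Per-point relative error equals |1 - Z/Z0|, hence at most dZ/Z0.
rel_err = np.linalg.norm(persp - ortho, axis=1) / np.linalg.norm(persp, axis=1)
print(rel_err.max(), dZ / Z0)
```

Increasing dZ toward Z0 makes the error grow proportionally, which is why the approximation should not be used for properties in the large.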
1.6 Summary
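Plane perspective
$$(X, Y, Z) \mapsto \left(\frac{fX}{Z}, \frac{fY}{Z}, f\right) \tag{1.1}$$
Spherical perspective
$$(X, Y, Z) \mapsto \frac{(X, Y, Z)}{\sqrt{X^2 + Y^2 + Z^2}} \tag{1.2}$$
Lines → vanishing points
$$\mathbf{A} + \lambda\mathbf{D} \mapsto \left(\frac{fD_x}{D_z}, \frac{fD_y}{D_z}\right) \tag{1.3}$$
Planes → vanishing lines (horizons)
$$\mathbf{X}\cdot\mathbf{N} = d \mapsto xN_x + yN_y + fN_z = 0 \tag{1.4}$$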
1.7 Exercises
1. Show that the vanishing points of lines on a plane lie on the vanishing
line of the plane.
2. Show that, under typical conditions, the silhouette of a sphere of radius r with center (X, 0, Z) under planar perspective projection is an ellipse of eccentricity $X/\sqrt{X^2 + Z^2 - r^2}$. Are there circumstances under which the projection could be a parabola or hyperbola? What is the silhouette for spherical perspective?
3. An observer is standing on a ground plane looking straight ahead. We
want to calculate the accuracy with which she will be able to estimate
the depth Z of points on the ground plane, assuming that she can
visually discriminate angles to within 1
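$$\begin{pmatrix}\cos\theta & -\sin\theta\\ \sin\theta & \cos\theta\end{pmatrix} \quad \text{rotation, } \det = +1$$
or
$$\begin{pmatrix}\cos\theta & \sin\theta\\ \sin\theta & -\cos\theta\end{pmatrix} \quad \text{reflection, } \det = -1$$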
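Under a rotation by angle θ,
$$\begin{pmatrix}1\\ 0\end{pmatrix} \mapsto \begin{pmatrix}\cos\theta\\ \sin\theta\end{pmatrix} \quad \text{and} \quad \begin{pmatrix}0\\ 1\end{pmatrix} \mapsto \begin{pmatrix}-\sin\theta\\ \cos\theta\end{pmatrix}$$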
The reflection matrix above corresponds to reflection about the line at angle θ/2 (verify). Note that two rotations applied one after the other give another rotation, and two reflections applied one after the other also give a rotation, not a reflection.
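These claims are easy to check numerically. The sketch below uses NumPy; the function names `rotation` and `reflection` are our own, and `reflection(theta)` is the matrix above with determinant −1.

```python
import numpy as np

def rotation(theta):
    """2-D rotation by theta (det = +1)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

def reflection(theta):
    """The reflection matrix above (det = -1); mirrors about the line at angle theta/2."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, s], [s, -c]])

theta = 0.8
M = reflection(theta)
on_line = np.array([np.cos(theta / 2), np.sin(theta / 2)])    # direction of the mirror line
normal = np.array([-np.sin(theta / 2), np.cos(theta / 2)])    # perpendicular to the mirror line

print(np.linalg.det(rotation(theta)), np.linalg.det(M))
print(np.allclose(M @ on_line, on_line), np.allclose(M @ normal, -normal))
# Two rotations compose to a rotation; so do two reflections.
print(np.allclose(rotation(0.3) @ rotation(0.5), rotation(0.8)))
print(np.allclose(reflection(1.2) @ reflection(0.4), rotation(0.8)))
```

Here `reflection(0.4)` mirrors about the line at 0.2 rad and `reflection(1.2)` about the line at 0.6 rad; applying the first and then the second gives a rotation by 2(0.6 − 0.2) = 0.8 rad.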
Let us now construct some examples in 3D. Just as in 2D, rotations
are characterized by orthogonal matrices with det = +1. For orthogonal
matrices, each column vector has length 1, and the dot product of any two
different columns is 0. This gives rise to six constraints (3 pairwise dot
product constraints, and 3 length constraints), so for a 3 dimensional rotation
matrix
$$A = \begin{pmatrix} a_{11} & a_{12} & a_{13}\\ a_{21} & a_{22} & a_{23}\\ a_{31} & a_{32} & a_{33} \end{pmatrix} \tag{2.14}$$
with 9 total parameters, there are really only three free parameters. There are several methods by which these parameters can be specified, as we will study later. Here are a few example rotation matrices.
Rotation about z-axis by θ:
$$R = \begin{pmatrix} \cos\theta & -\sin\theta & 0\\ \sin\theta & \cos\theta & 0\\ 0 & 0 & 1 \end{pmatrix} \tag{2.15}$$
10 CHAPTER 2. POSE, SHAPE AND GEOMETRIC TRANSFORMATIONS
Rotation about x-axis by θ:
$$R = \begin{pmatrix} 1 & 0 & 0\\ 0 & \cos\theta & -\sin\theta\\ 0 & \sin\theta & \cos\theta \end{pmatrix} \tag{2.16}$$
2.1.2 Group structure of isometries
Theorem 2.1 Any isometry can be expressed as the combination of an orthogonal transformation followed by a translation as follows:
$$\Phi(\mathbf{a}) = A\mathbf{a} + \mathbf{t} \tag{2.17}$$
where A represents the orthogonal matrix and t is the translation vector.
The set of rigid body motions constitutes a group¹. In our notation, $\Phi_1 \circ \Phi_2$, $\Phi_1$ composed with $\Phi_2$, denotes that we apply $\Phi_2$ first and then $\Phi_1$.
We will show first that isometries are closed under composition. Consider two rigid body motions, $\Phi_1$ and $\Phi_2$:
$$\Phi_1(\mathbf{a}) = A_1\mathbf{a} + \mathbf{t}_1, \qquad \Phi_2(\mathbf{a}) = A_2\mathbf{a} + \mathbf{t}_2. \tag{2.18}$$
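Then we have
$$\begin{aligned}
(\Phi_1 \circ \Phi_2)(\mathbf{a}) &= A_1(A_2\mathbf{a} + \mathbf{t}_2) + \mathbf{t}_1 && (2.19)\\
&= A_1A_2\mathbf{a} + A_1\mathbf{t}_2 + \mathbf{t}_1 && (2.20)\\
&= (A_1A_2)\mathbf{a} + (A_1\mathbf{t}_2 + \mathbf{t}_1) && (2.21)\\
&= A_3\mathbf{a} + \mathbf{t}_3 && (2.22)
\end{aligned}$$
where $A_3 = A_1A_2$ and $\mathbf{t}_3 = A_1\mathbf{t}_2 + \mathbf{t}_1$. Thus $\Phi_1 \circ \Phi_2 = \Phi_3$ is also a rigid body motion, given that the product of two orthogonal matrices is orthogonal (Verify!)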
Note that translations and rotations are closed under composition, but reflections are not.
We can verify the remaining axioms for showing that isometries constitute a group:
Identity: A = I, t = 0.
Inverse: We need $A_1A_2 = I$ and $\mathbf{t}_3 = A_1\mathbf{t}_2 + \mathbf{t}_1 = \mathbf{0}$. This means that for $\Phi_1$ to be the inverse of $\Phi_2$, $A_1 = A_2^T$ and $\mathbf{t}_2 = -A_1^{-1}\mathbf{t}_1$ (equivalently, $\mathbf{t}_1 = -A_2^T\mathbf{t}_2$).
Associativity: left as an exercise for the reader.
¹ A group (G, ∘) is a set G with a binary operation ∘ that satisfies the following four axioms. Closure: for all a, b in G, the result of a ∘ b is also in G. Associativity: for all a, b and c in G, (a ∘ b) ∘ c = a ∘ (b ∘ c). Identity element: there exists an element e in G such that for all a in G, e ∘ a = a ∘ e = a. Inverse element: for each a in G, there exists an element b in G such that a ∘ b = b ∘ a = e, where e is an identity element.
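The closure and inverse computations above can be spot-checked numerically. In this sketch a rigid motion is stored as a pair (A, t); the helper names are our own, and `np.linalg.qr` of a random Gaussian matrix is used only as a convenient way to produce orthogonal test matrices (with determinant +1 or −1, both of which are isometries).

```python
import numpy as np

def apply(m, a):
    """Rigid motion m = (A, t) acting on a point: a -> A a + t."""
    A, t = m
    return A @ a + t

def compose(m1, m2):
    """m1 o m2 (apply m2 first, then m1) = (A1 A2, A1 t2 + t1), Eqs. (2.19)-(2.22)."""
    A1, t1 = m1
    A2, t2 = m2
    return A1 @ A2, A1 @ t2 + t1

def inverse(m):
    """Inverse of (A, t) for orthogonal A: (A^T, -A^T t)."""
    A, t = m
    return A.T, -A.T @ t

def random_orthogonal(rng):
    """A random 3x3 orthogonal matrix from the QR decomposition of a Gaussian matrix."""
    Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
    return Q

rng = np.random.default_rng(1)
m1 = (random_orthogonal(rng), rng.normal(size=3))
m2 = (random_orthogonal(rng), rng.normal(size=3))
a = rng.normal(size=3)

m3 = compose(m1, m2)
print(np.allclose(apply(m3, a), apply(m1, apply(m2, a))))   # closure: same map
print(np.allclose(m3[0].T @ m3[0], np.eye(3)))              # A1 A2 is orthogonal
A_e, t_e = compose(inverse(m2), m2)
print(np.allclose(A_e, np.eye(3)), np.allclose(t_e, 0.0))   # identity recovered
```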
2.2 Parametrizing Rotations in 3D
Recall that rotation matrices have the property that each column vector has length 1 and the dot product of any 2 different columns is 0. These 6 constraints leave only 3 degrees of freedom. Here are some alternative notations used to represent rotation matrices in 3-D:
• Euler angles, which specify rotations about 3 axes
• Axis plus amount of rotation
• Quaternions, which generalize complex numbers: just as a unit complex number can represent a rotation in 2-D, a unit quaternion can represent a rotation in 3-D
We will use the axis and amount of rotation as the preferred representation of a rotation matrix: (s, θ), where s is the unit vector along the axis of rotation and θ is the amount of rotation.
Definition 4 A matrix S is skew-symmetric if $S = -S^T$.
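Skew-symmetric matrices can be used to represent cross products or vector products. Recall:
$$\begin{pmatrix}a_1\\ a_2\\ a_3\end{pmatrix} \times \begin{pmatrix}b_1\\ b_2\\ b_3\end{pmatrix} = \begin{pmatrix}a_2b_3 - a_3b_2\\ a_3b_1 - a_1b_3\\ a_1b_2 - a_2b_1\end{pmatrix}$$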
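We define $\hat{\mathbf{a}}$ as:
$$\hat{\mathbf{a}} \stackrel{\text{def}}{=} \begin{pmatrix} 0 & -a_3 & a_2\\ a_3 & 0 & -a_1\\ -a_2 & a_1 & 0 \end{pmatrix}, \qquad \hat{\mathbf{a}}\begin{pmatrix}b_1\\ b_2\\ b_3\end{pmatrix} = \begin{pmatrix}-a_3b_2 + a_2b_3\\ a_3b_1 - a_1b_3\\ -a_2b_1 + a_1b_2\end{pmatrix} = \mathbf{a}\times\mathbf{b}$$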
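In code, the hat operator is a one-liner. A minimal NumPy sketch (the name `hat` is our own choice), checked against `np.cross`:

```python
import numpy as np

def hat(a):
    """Skew-symmetric matrix of a, arranged so that hat(a) @ b equals a x b."""
    a1, a2, a3 = a
    return np.array([[0.0, -a3, a2],
                     [a3, 0.0, -a1],
                     [-a2, a1, 0.0]])

a = np.array([1.0, -2.0, 0.5])
b = np.array([0.3, 4.0, -1.0])
print(hat(a) @ b)
print(np.cross(a, b))                   # the same vector
print(np.allclose(hat(a), -hat(a).T))   # True: hat(a) is skew-symmetric
```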
Consider now the equation of motion of a point q on a rotating body:
$$\dot{\mathbf{q}}(t) = \boldsymbol{\omega} \times \mathbf{q}(t)$$
where the direction of ω specifies the axis of rotation and |ω| specifies the angular speed. Rewriting with $\hat{\boldsymbol{\omega}}$:
$$\dot{\mathbf{q}}(t) = \hat{\boldsymbol{\omega}}\,\mathbf{q}(t)$$
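The solution of this differential equation involves the exponential of a matrix. (In MATLAB, this is the function expm.)
$$\mathbf{q}(t) = e^{\hat{\boldsymbol{\omega}}t}\,\mathbf{q}(0)$$
where
$$e^{\hat{\boldsymbol{\omega}}t} = I + \hat{\boldsymbol{\omega}}t + \frac{(\hat{\boldsymbol{\omega}}t)^2}{2!} + \frac{(\hat{\boldsymbol{\omega}}t)^3}{3!} + \cdots$$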
Collecting the odd and even terms in the above equation, we arrive at Rodrigues' formula for a rotation matrix R:
$$R = e^{\theta\hat{\mathbf{s}}} = I + \sin\theta\,\hat{\mathbf{s}} + (1 - \cos\theta)\,\hat{\mathbf{s}}^2$$
Here s is a unit vector along ω and θ = |ω|t is the total amount of rotation. Given an axis of rotation s and an amount of rotation θ, we can construct $\hat{\mathbf{s}}$ and plug it in.
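A sketch comparing Rodrigues' formula with the power series for the matrix exponential. To stay NumPy-only we sum a truncated series instead of calling MATLAB's expm (SciPy users could use `scipy.linalg.expm`); the helper names, the axis and the angle are our own example choices.

```python
import numpy as np

def hat(a):
    """Skew-symmetric matrix with hat(a) @ b == np.cross(a, b)."""
    a1, a2, a3 = a
    return np.array([[0.0, -a3, a2], [a3, 0.0, -a1], [-a2, a1, 0.0]])

def rodrigues(s, theta):
    """R = I + sin(theta) s_hat + (1 - cos(theta)) s_hat^2, for the unit axis along s."""
    s = np.asarray(s, dtype=float)
    S = hat(s / np.linalg.norm(s))
    return np.eye(3) + np.sin(theta) * S + (1.0 - np.cos(theta)) * (S @ S)

def exp_series(M, terms=30):
    """Truncated series I + M + M^2/2! + ... (adequate for moderate ||M||)."""
    result = np.eye(M.shape[0])
    term = np.eye(M.shape[0])
    for k in range(1, terms):
        term = term @ M / k          # term now holds M^k / k!
        result = result + term
    return result

s = np.array([1.0, 2.0, 2.0]) / 3.0   # unit axis (example)
theta = 1.3                           # amount of rotation in radians (example)
R = rodrigues(s, theta)

print(np.allclose(R, exp_series(theta * hat(s))))            # closed form matches the series
print(np.allclose(R @ s, s))                                  # the axis is left fixed
print(np.isclose(np.cos(theta), (np.trace(R) - 1.0) / 2.0))   # trace relation
```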
2.3 Affine transformations
Thus far we have focused on Euclidean transformations, Φ(a) = Aa + t, where A is an orthogonal matrix. If we allow A to be any non-singular matrix (i.e., det A ≠ 0), then we get the set of affine transformations. Note that the Euclidean transformations are a subset of the affine transformations.
2.3.1 Degrees of freedom
Let us count the degrees of freedom in the parameters that specify a transformation. For $\Phi: \mathbb{R}^2 \to \mathbb{R}^2$, Euclidean transformations have 3 free parameters (1 rotation, 2 translation), whereas affine transformations have 6 (4 in A and 2 in t). For $\Phi: \mathbb{R}^3 \to \mathbb{R}^3$, Euclidean transformations have 6 free parameters (3 rotation, 3 translation), whereas affine transformations have 12 (9 in A and 3 in t).
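One concrete consequence of the count of 6 parameters for planar affine maps: three non-collinear point correspondences give 3 × 2 = 6 scalar equations, exactly enough to determine A and t. A sketch with NumPy; `fit_affine_2d` is a hypothetical helper and the "true" transformation is an arbitrary example.

```python
import numpy as np

def fit_affine_2d(src, dst):
    """Solve dst_i = A src_i + t for the 2x2 matrix A and the vector t,
    from three non-collinear correspondences (6 equations, 6 unknowns)."""
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    M = np.zeros((6, 6))                  # unknowns: [a11, a12, a21, a22, t1, t2]
    b = np.zeros(6)
    for i, ((x, y), (u, v)) in enumerate(zip(src, dst)):
        M[2 * i] = [x, y, 0.0, 0.0, 1.0, 0.0]       # u = a11 x + a12 y + t1
        M[2 * i + 1] = [0.0, 0.0, x, y, 0.0, 1.0]   # v = a21 x + a22 y + t2
        b[2 * i], b[2 * i + 1] = u, v
    p = np.linalg.solve(M, b)
    return p[:4].reshape(2, 2), p[4:]

A_true = np.array([[1.5, 0.3], [-0.2, 0.8]])
t_true = np.array([2.0, -1.0])
src = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
dst = src @ A_true.T + t_true
A_est, t_est = fit_affine_2d(src, dst)
print(A_est, t_est)
```

With only two correspondences the 6 × 6 system would be underdetermined, matching the parameter count above.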
2.4 Exercises
1. Show that in $\mathbb{R}^2$, reflection about the θ = α line followed by reflection about the θ = β line is equivalent to a rotation of 2(β − α).
2. Verify Rodrigues' formula by considering the powers of the skew-symmetric matrix associated with the cross product with a vector.
3. Write a Matlab function for computing the orthogonal matrix R corresponding to rotation about the axis vector s. Find the eigenvalues and eigenvectors of the orthogonal matrices and study any relationship to the axis vector. Verify the formula $\cos\theta = \frac{1}{2}\left(\operatorname{trace}(R) - 1\right)$. Show some points before and after the rotation has been applied.
4. Write a Matlab function for the converse of that in the previous problem, i.e. given an orthogonal matrix R, compute the axis of rotation s and the angle θ. Hint: Show that $R - R^T = (2\sin\theta)\,\hat{\mathbf{s}}$.
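For checking answers to Exercises 3 and 4, here is one possible sketch in Python/NumPy rather than Matlab. It is not the only approach: it reads θ from the trace relation and s from the hint, and the helper names are our own. It breaks down near θ = 0 or θ = π, where sin θ vanishes.

```python
import numpy as np

def hat(a):
    """Skew-symmetric matrix with hat(a) @ b == np.cross(a, b)."""
    a1, a2, a3 = a
    return np.array([[0.0, -a3, a2], [a3, 0.0, -a1], [-a2, a1, 0.0]])

def rodrigues(s, theta):
    """Rotation by theta about the unit axis along s (Rodrigues' formula)."""
    S = hat(np.asarray(s, dtype=float) / np.linalg.norm(s))
    return np.eye(3) + np.sin(theta) * S + (1.0 - np.cos(theta)) * (S @ S)

def axis_angle(R):
    """Recover (s, theta) with 0 < theta < pi, using cos(theta) = (trace(R) - 1)/2
    and R - R^T = 2 sin(theta) s_hat. Not valid near theta = 0 or theta = pi."""
    theta = np.arccos(np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0))
    W = (R - R.T) / (2.0 * np.sin(theta))
    s = np.array([W[2, 1], W[0, 2], W[1, 0]])   # read the components off s_hat
    return s, theta

s_true = np.array([2.0, -1.0, 2.0]) / 3.0
theta_true = 2.0
R = rodrigues(s_true, theta_true)
s_est, theta_est = axis_angle(R)
print(s_est, theta_est)
print(np.allclose(R @ s_est, s_est))   # the axis is an eigenvector with eigenvalue 1
```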
Chapter 3
Dynamic Perspective
3.1 Optical Flow
Motion in the 3D world, either of objects or of the camera, projects to motion
in the image. We call this optical ow. At every point (x, y) in the image
we get a 2D vector, corresponding to the motion of the feature located at
that point. Thus optical ow is a 2D vector eld. As rst pointed out by
Gibson, the optical ow eld of a moving observer contains information to
infer the 3D structure of the scene, as well as the movement of the observer,
so-called egomotion. An example ow eld is shown in Figure 3.1.
Figure 3.1: The optical flow field of a pilot just before takeoff
3.2 From 3-D Motion to 2-D Optical Flow
$\mathbf{X} = (X, Y, Z)$: 3-D coordinates in the world
$(x, y)$: 2-D coordinates in the image
$\mathbf{t} = (t_x, t_y, t_z)$: translational component of motion
$\boldsymbol{\Omega} = (\Omega_x, \Omega_y, \Omega_z)$: rotational component of motion
16 CHAPTER 3. DYNAMIC PERSPECTIVE
$(u, v) = (\dot{x}, \dot{y})$: optical flow field
Let us start by deriving the equations relating motion in the 3-D world to the resulting optical flow field on the 2-D image plane. For simplicity we will focus on a single point in the scene X = (X, Y, Z).
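Assume that the camera moves with translational velocity $\mathbf{t} = (t_x, t_y, t_z)$ and angular velocity $\boldsymbol{\Omega} = (\Omega_x, \Omega_y, \Omega_z)$. Eq.(3.1) characterizes the movement of X relative to the camera,
$$\dot{\mathbf{X}} = -\mathbf{t} - \boldsymbol{\Omega} \times \mathbf{X}, \tag{3.1}$$
which can be written out in coordinates as Eq.(3.2):
$$\begin{pmatrix}\dot X\\ \dot Y\\ \dot Z\end{pmatrix} = -\begin{pmatrix}t_x\\ t_y\\ t_z\end{pmatrix} - \begin{pmatrix}\Omega_y Z - \Omega_z Y\\ \Omega_z X - \Omega_x Z\\ \Omega_x Y - \Omega_y X\end{pmatrix}. \tag{3.2}$$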
Assume the image plane lies at f = 1; then $x = X/Z$ and $y = Y/Z$. Taking the derivative, we have
$$\dot x = \frac{\dot X Z - \dot Z X}{Z^2}, \qquad \dot y = \frac{\dot Y Z - \dot Z Y}{Z^2}. \tag{3.3}$$
Substituting $\dot X, \dot Y, \dot Z$ in Eq.(3.3) using Eq.(3.2), plugging in $x = X/Z$, $y = Y/Z$, and simplifying, we get
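$$\begin{pmatrix}u\\ v\end{pmatrix} = \begin{pmatrix}\dot x\\ \dot y\end{pmatrix} = \frac{1}{Z}\begin{pmatrix}-1 & 0 & x\\ 0 & -1 & y\end{pmatrix}\begin{pmatrix}t_x\\ t_y\\ t_z\end{pmatrix} + \begin{pmatrix}xy & -(1 + x^2) & y\\ 1 + y^2 & -xy & -x\end{pmatrix}\begin{pmatrix}\Omega_x\\ \Omega_y\\ \Omega_z\end{pmatrix} \tag{3.4}$$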
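Eq.(3.4) is straightforward to implement, and a finite-difference check against Eq.(3.1) guards against sign errors. A minimal NumPy sketch; the function names and the example point and motion are our own assumptions, with f = 1 as above.

```python
import numpy as np

def optical_flow(x, y, Z, t, omega):
    """Flow (u, v) of Eq. (3.4), f = 1: image point (x, y), depth Z,
    camera translation t = (tx, ty, tz), rotation omega = (Ox, Oy, Oz)."""
    tx, ty, tz = t
    ox, oy, oz = omega
    u = (-tx + x * tz) / Z + x * y * ox - (1.0 + x**2) * oy + y * oz
    v = (-ty + y * tz) / Z + (1.0 + y**2) * ox - x * y * oy - x * oz
    return np.array([u, v])

def flow_by_differencing(P, t, omega, dt=1e-5):
    """Central difference of the projection (X/Z, Y/Z) as the point P moves
    with dX/dt = -t - omega x X, Eq. (3.1)."""
    P = np.asarray(P, dtype=float)
    P_dot = -np.asarray(t, dtype=float) - np.cross(omega, P)

    def proj(Q):
        return Q[:2] / Q[2]

    return (proj(P + dt * P_dot) - proj(P - dt * P_dot)) / (2.0 * dt)

P = np.array([0.4, -0.3, 2.0])           # assumed scene point
t = np.array([0.2, -0.1, 1.0])           # assumed camera translation
omega = np.array([0.05, -0.02, 0.1])     # assumed camera rotation
analytic = optical_flow(P[0] / P[2], P[1] / P[2], P[2], t, omega)
numeric = flow_by_differencing(P, t, omega)
print(analytic, numeric)
```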
We can use these equations to solve the forward (graphics) problem of determining the movement in the image given the movement in the world. Assuming that the parameters t, Ω are the same for all points is equivalent to a rigidity assumption. This obviously holds if only the camera moves. If there are independently moving objects, we have to consider each object separately.
Can all the unknowns be recovered, given enough points at which the optical flow is known? There is a scaling ambiguity about which we can do nothing. Consider a surface $S_2$ that is a dilation of the surface $S_1$ by a factor of k, i.e. suppose that the corresponding point of surface $S_2$ is at depth kZ(x, y). Furthermore, suppose that the translational motion is k times faster. The optical flow is then exactly the same for the two surfaces: in Eq.(3.4), t enters only through t/Z, which is unchanged, and the rotational term does not depend on Z at all. Intuitively, farther objects moving faster generate the same optical flow as nearby objects moving slower. This is very convenient for generating special effects in Hollywood movies!
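The scaling ambiguity can be demonstrated directly from Eq.(3.4). In this sketch the random image points, depths, motion parameters and k = 3 are all assumed example values; scaling both the depths and the translation by k leaves every flow vector unchanged, while scaling the depths alone does not.

```python
import numpy as np

def optical_flow(x, y, Z, t, omega):
    """Flow (u, v) from Eq. (3.4) with f = 1; works elementwise on arrays."""
    tx, ty, tz = t
    ox, oy, oz = omega
    u = (-tx + x * tz) / Z + x * y * ox - (1.0 + x**2) * oy + y * oz
    v = (-ty + y * tz) / Z + (1.0 + y**2) * ox - x * y * oy - x * oz
    return u, v

rng = np.random.default_rng(3)
x, y = rng.uniform(-0.5, 0.5, size=(2, 100))   # image points
Z = rng.uniform(2.0, 10.0, size=100)           # depths of surface S1
t = np.array([0.3, -0.1, 1.0])                 # assumed translation
omega = np.array([0.01, 0.02, -0.03])          # assumed rotation
k = 3.0

flow_S1 = optical_flow(x, y, Z, t, omega)
flow_S2 = optical_flow(x, y, k * Z, k * t, omega)   # dilated surface, k times faster
print(np.allclose(flow_S1, flow_S2))                # True: the two cases are indistinguishable
```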
3.3 Pure translation
If the motion of the camera is purely translational, the terms due to rotation in Eq.(3.4) can be dropped and the flow field becomes
$$u(x, y) = \frac{-t_x + xt_z}{Z(x, y)}, \qquad v(x, y) = \frac{-t_y + yt_z}{Z(x, y)}. \tag{3.5}$$
We can gain intuition by considering the even more special case of translation along the optical axis, i.e. $t_z \neq 0$, $t_x = 0$, $t_y = 0$; the flow field in Eq.(3.5) becomes
$$u(x, y) = \frac{xt_z}{Z(x, y)}, \qquad v(x, y) = \frac{yt_z}{Z(x, y)}; \tag{3.6}$$
or equivalently
$$[u, v]^T(x, y) = \frac{t_z}{Z}\,[x, y]^T \tag{3.7}$$
This flow field has a very simple structure, as shown in Figure 3.2. It is zero at the origin, and at any other point the optical flow vector points radially outward from the origin (for $t_z > 0$, i.e. moving forward). We say that the origin is the Focus of Expansion of the flow field. The proportionality factor $t_z/Z$ is significant because it is the reciprocal of the time to collision $Z/t_z$. There is considerable evidence that this variable is used by flies, birds, humans, etc. as a cue for controlling locomotion. Note that while we are unable to estimate either the true speed ($t_z$) or the distance to the obstacle (Z), we are able to estimate what truly matters for controlling locomotion. Sometimes nature is kind!
Figure 3.2: Optical flow field of an observer moving along the z-axis towards a frontoparallel wall
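A small sketch of how the time to collision can be read from image measurements alone. The wall distance Z = 10 m and speed t_z = 2 m/s are assumed example values used only to synthesize the flow of Eq.(3.6); the estimate itself uses only image positions and flow vectors.

```python
import numpy as np

Z, tz = 10.0, 2.0                        # assumed wall distance (m) and approach speed (m/s)
x = np.linspace(-0.5, 0.5, 11)           # image points (f = 1), away from the origin
y = np.full_like(x, 0.2)
u, v = x * tz / Z, y * tz / Z            # flow from Eq. (3.6)

# Time to collision from image quantities only: |(x, y)| / |(u, v)| = Z / tz.
tau = np.hypot(x, y) / np.hypot(u, v)
print(tau)                               # all entries equal Z / tz = 5 s, up to rounding

# Twice as far and twice as fast gives the same flow, hence the same tau.
u2, v2 = x * (2 * tz) / (2 * Z), y * (2 * tz) / (2 * Z)
print(np.allclose(u, u2) and np.allclose(v, v2))
```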
The case of general translation is essentially the same. We define the Focus of Expansion (FOE) of the optical flow field to be the point where the optical flow is zero. Setting (u, v) = (0, 0) in Eq.(3.5), we can solve for the coordinates of the FOE,
$$(x_{FOE}, y_{FOE}) = \left(\frac{t_x}{t_z}, \frac{t_y}{t_z}\right). \tag{3.8}$$
Note that the coordinates of the FOE tell us the direction of motion (we
can't hope to know the speed, anyway!). It is also worth remarking that the
FOE is just the vanishing point of the direction of translation.
Suppose we change the origin to the FOE by applying the following coordinate change to Eq.(3.5),
$$x' = x - \frac{t_x}{t_z}, \qquad y' = y - \frac{t_y}{t_z}; \tag{3.9}$$
then the optical flow field becomes
$$[u, v]^T(x', y') = \frac{t_z}{Z}\,[x', y']^T, \tag{3.10}$$
which should look very familiar. Thus the general case too corresponds to optical flow vectors pointing outwards from the FOE, justifying the choice of the term. Figure 3.3 shows such an optical flow vector field.
Figure 3.3: Optical flow vector field for general translational motion
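Because every translational flow vector at (x, y) is parallel to (x − x_FOE, y − y_FOE), each vector gives one linear equation v·x_FOE − u·y_FOE = v·x − u·y, whatever the unknown depth. Stacking these equations and solving by least squares is one simple way to estimate the FOE. The sketch below uses NumPy's `np.linalg.lstsq`; `estimate_foe` is our own name, and the translation and depths used to synthesize the flow with Eq.(3.5) are assumed example values.

```python
import numpy as np

def estimate_foe(x, y, u, v):
    """Least-squares FOE for purely translational flow: each flow vector (u, v)
    is parallel to (x - x0, y - y0), i.e. v*x0 - u*y0 = v*x - u*y."""
    M = np.column_stack([v, -u])
    b = v * x - u * y
    (x0, y0), *_ = np.linalg.lstsq(M, b, rcond=None)
    return x0, y0

rng = np.random.default_rng(2)
t = np.array([0.4, -0.2, 1.0])           # assumed translation
x = rng.uniform(-1.0, 1.0, 200)
y = rng.uniform(-1.0, 1.0, 200)
Z = rng.uniform(2.0, 20.0, 200)          # arbitrary, unknown depths
u = (-t[0] + x * t[2]) / Z               # Eq. (3.5)
v = (-t[1] + y * t[2]) / Z

print(estimate_foe(x, y, u, v), (t[0] / t[2], t[1] / t[2]))
```

The depths cancel out of each equation, which is why only the direction of translation (the FOE) is recovered, not the speed.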
We can also detect depth discontinuities from the optical flow field. If there is a sharp change in the lengths of the flow vectors of two neighboring points, that indicates a discontinuity in depth. The ratio of their lengths tells us the ratio of their depths: by Eq.(3.10) the flow magnitude is $|t_z|\,r/Z$, where r is the distance from the FOE, so for neighboring points (nearly equal r) the ratio of flow lengths is $Z_2/Z_1$. However, we can't deduce the absolute depths ($Z_1$, $Z_2$), as illustrated in Figure 3.4.
Thus optical ow is one of the most important cues to image segmentation
(video segmentation, actually!). Even camouflaged animals (and snipers)
must learn to stay very still to avoid detection.
3.4 General Motion
We begin by studying pure rotation. The most important thing to note is that the rotational component, obtained by setting t to zero, has no dependence on Z. Therefore it conveys no information about the scene depth, only about the rotation of the observer. For moving animals in a stationary scene, this commonly arises due to eye movements, which correspond to a rotation about the center of projection.
Figure 3.4: Depth discontinuity in optical flow field
Thus the optical flow field corresponding to a general motion can be thought of as having a translational component, very useful for inferring time to collision, depth boundaries in the scene, etc., and a rotational component, which carries no information about the external 3D world. In the context of a moving animal where the rotational component is due to eye movements, some part of the animal brain has access to the rotational signal, since the eye movement was commanded by the brain itself. Hence the so-called efference copy carries information that can be used to subtract the rotational component. The residual is a purely translational flow field, which can be analyzed more straightforwardly. Amazingly, this is actually the case in humans (and probably in other animals with eye movements).
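A sketch of this "derotation" idea using the two parts of Eq.(3.4). The eye rotation Ω is assumed known exactly, standing in for the efference copy; the scene depths, translation and rotation are assumed example values, and the function names are our own. After subtracting the rotational part, every residual flow vector is radial about the FOE, which is not true of the raw flow.

```python
import numpy as np

def translational_flow(x, y, Z, t):
    """Translational part of Eq. (3.4): depends on depth Z."""
    tx, ty, tz = t
    return (-tx + x * tz) / Z, (-ty + y * tz) / Z

def rotational_flow(x, y, omega):
    """Rotational part of Eq. (3.4): independent of depth Z."""
    ox, oy, oz = omega
    return (x * y * ox - (1.0 + x**2) * oy + y * oz,
            (1.0 + y**2) * ox - x * y * oy - x * oz)

rng = np.random.default_rng(4)
x, y = rng.uniform(-0.5, 0.5, size=(2, 200))
Z = rng.uniform(2.0, 20.0, size=200)
t = np.array([0.3, 0.1, 1.0])             # assumed translation
omega = np.array([0.05, -0.02, 0.1])      # assumed eye rotation, known via the efference copy

ut, vt = translational_flow(x, y, Z, t)
ur, vr = rotational_flow(x, y, omega)
u, v = ut + ur, vt + vr                   # observed flow

u_res, v_res = u - ur, v - vr             # subtract the known rotational component

# Radial about the FOE means the 2-D cross product with (x - x_foe, y - y_foe) vanishes.
x_foe, y_foe = t[0] / t[2], t[1] / t[2]
radial_res = u_res * (y - y_foe) - v_res * (x - x_foe)
radial_raw = u * (y - y_foe) - v * (x - x_foe)
print(np.abs(radial_res).max(), np.abs(radial_raw).max())
```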
3.5 Summary
• Optical flow is the motion of the 3-D world projected onto the 2-D image. It can be used to derive cues about the structure of the 3-D scene as well as egomotion.
• The optical flow field for pure translation enables us to infer
  ◦ The direction of movement, but not the absolute speed
  ◦ The time to collision
  ◦ Locations of depth discontinuities
3.6 Exercises
1. Implement the equations which relate the pointwise optical flow to
the six parameters of rigid body translation and rotation, and depth.
Construct displays for some interesting cases.
2. As a test for the code that you have written in the previous exercise,
suppose that I am driving my car along a straight stretch of freeway at
a speed of 25 m/s. My eye height above the surface of the road is 1.25
m. What is the ow vector (in degrees/s)
(a) At a point on the ground 25 m straight ahead.
(b) At a point on the ground to my left at a distance of 25 m.
(c) At points on the rear end of a 2 m wide car at a height of 1.25 m
above the ground. This car has a headway of 25 m in front of me
and is travelling at a speed of 20 m/s.