Epipolar Geometry
1 Introduction
Previously, we have seen how to compute the intrinsic and extrinsic parameters of a camera from one or more views, using a typical camera calibration procedure or single view metrology. This process culminated in deriving properties about the 3D world from one image. However, in general, it is not possible to recover the entire structure of the 3D world from just one image. This is due to the intrinsic ambiguity of the 3D-to-2D mapping: some information is simply lost.
The focus of these lecture notes is to show how knowledge of the geometry involved when multiple cameras are present can be extremely helpful. Specifically, we will first define the geometry of two viewpoints and then present how this geometry can aid in further understanding the world around us.
2 Epipolar Geometry
Figure 2: The general setup of epipolar geometry. The gray region is the
epipolar plane. The orange line is the baseline, while the two blue lines are
the epipolar lines.
Figure 3: An example of epipolar lines and their corresponding points drawn
on an image pair.
The points where the baseline intersects the two image planes are known as the epipoles e and e′. Finally, the lines defined by the intersection of the epipolar plane and the two image planes are known as the epipolar lines. The epipolar lines have the property that they intersect the baseline at the respective epipoles in the image plane.
Figure 4: When the two image planes are parallel, then the epipoles e and e′ are located at infinity. Notice that the epipolar lines are parallel to the u axis of each image plane.
This case is especially useful and will be covered in greater detail in the subsequent section on image rectification.
In real-world situations, however, we are not given the exact 3D location of the point P, but we can determine its projection p in one of the image planes. We should also know the cameras' locations, orientations, and camera matrices. What can we do with this knowledge? With the knowledge of the camera locations O1, O2 and the image point p, we can define the epipolar plane. With this epipolar plane, we can then determine the epipolar lines¹. By definition, P's projection into the second image p′ must be located on the epipolar line of the second image. Thus, a basic understanding of epipolar geometry allows us to create a strong constraint between image pairs without knowing the 3D structure of the scene.
Figure 5: The setup for determining the essential and fundamental matrices,
which help map points and epipolar lines across views.
We will now try to develop seamless ways to map points and epipolar lines across views. If we take the setup given in the original epipolar geometry framework (Figure 5), then we further define M and M′ to be the camera projection matrices that map 3D points into their respective 2D image plane locations. Let us assume that the world reference system is associated with the first camera, with the second camera offset first by a rotation R and then by a translation T. This specifies the camera projection matrices to be:

\[
M = K \begin{bmatrix} I & 0 \end{bmatrix}, \qquad M' = K' \begin{bmatrix} R & T \end{bmatrix} \tag{1}
\]

¹This means that the epipolar lines can be determined by knowing just the camera centers O1, O2 and a point p in one of the images.
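To make Equation 1 concrete, here is a minimal NumPy sketch that builds M and M′ and projects a 3D point into both images. The intrinsics, pose, and point are arbitrary illustrative values, not quantities from these notes.

```python
import numpy as np

# Example intrinsics and relative pose (illustrative values only).
K = np.diag([500.0, 500.0, 1.0])          # first camera intrinsics
K_prime = np.diag([520.0, 520.0, 1.0])    # second camera intrinsics
R = np.eye(3)                             # relative rotation
T = np.array([0.5, 0.0, 0.0])             # relative translation

# Equation 1: M = K[I | 0],  M' = K'[R | T]
M = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
M_prime = K_prime @ np.hstack([R, T.reshape(3, 1)])

# Project a 3D point P (homogeneous) into both image planes.
P = np.array([1.0, 2.0, 10.0, 1.0])
p = M @ P
p_prime = M_prime @ P
p, p_prime = p / p[2], p_prime / p_prime[2]   # normalize to pixel coordinates
print(p, p_prime)
```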
3 The Essential Matrix
In the simplest case, let us assume that we have canonical cameras, in which K = K′ = I. This reduces Equation 1 to

\[
M = \begin{bmatrix} I & 0 \end{bmatrix}, \qquad M' = \begin{bmatrix} R & T \end{bmatrix} \tag{2}
\]
Furthermore, this means that the location of p′ in the first camera's reference system is R^T p′ − R^T T. Since the vectors R^T p′ − R^T T and R^T T lie in the epipolar plane, taking their cross product R^T T × (R^T p′ − R^T T) = R^T T × R^T p′ = R^T (T × p′) gives a vector normal to the epipolar plane. This also means that p, which lies in the epipolar plane, is normal to R^T (T × p′), giving us the constraint that their dot product is zero:

\[
(R^T(T \times p'))^T p = 0, \qquad (T \times p')^T R p = 0 \tag{3}
\]

Recall that a cross product can be written as a matrix-vector product, a × b = [a×]b, where

\[
[a_\times] = \begin{bmatrix} 0 & -a_z & a_y \\ a_z & 0 & -a_x \\ -a_y & a_x & 0 \end{bmatrix} \tag{4}
\]

Combining this with Equation 3 gives

\[
([T_\times] p')^T R p = 0 \quad\Longrightarrow\quad p'^T [T_\times]^T R p = 0 \quad\Longrightarrow\quad p'^T [T_\times] R p = 0 \tag{5}
\]

where the last step uses the fact that [T×] is skew-symmetric, so the transpose only flips the sign of the constraint. The matrix E = [T×]R is known as the Essential matrix, giving the compact constraint

\[
p'^T E p = 0 \tag{6}
\]
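As a sanity check on Equation 6, the following sketch constructs E = [T×]R for an assumed rotation and translation, projects a synthetic 3D point through the two canonical cameras, and verifies that p′^T E p vanishes up to floating-point error.

```python
import numpy as np

def skew(t):
    """Cross-product matrix [t_x] such that skew(t) @ v == np.cross(t, v)."""
    return np.array([[0, -t[2], t[1]],
                     [t[2], 0, -t[0]],
                     [-t[1], t[0], 0]])

# Arbitrary relative pose between two canonical cameras (illustrative values).
theta = 0.1
R = np.array([[np.cos(theta), -np.sin(theta), 0],
              [np.sin(theta),  np.cos(theta), 0],
              [0, 0, 1]])
T = np.array([0.5, 0.1, 0.0])

E = skew(T) @ R                   # Essential matrix, E = [T_x]R

# Project a 3D point into both canonical cameras and check p'^T E p = 0.
P = np.array([1.0, 2.0, 10.0])
p = P / P[2]                      # projection under M = [I | 0]
P2 = R @ P + T                    # the point in the second camera's frame
p_prime = P2 / P2[2]              # projection under M' = [R | T]
print(p_prime @ E @ p)            # ~ 0 up to floating-point error
```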
The Essential matrix is useful for computing the epipolar lines associated with p and p′: ℓ′ = Ep is the epipolar line in the second image, and ℓ = E^T p′ is the line in the first. In addition, the products of the Essential matrix with the epipoles equate to zero: Ee = E^T e′ = 0. This is because for any point x (other than e) in the image of camera 1, the corresponding epipolar line in the image of camera 2, ℓ′ = Ex, contains the epipole e′. Thus e′ satisfies e′^T(Ex) = (e′^T E)x = 0 for all x, so E^T e′ = 0. Similarly, Ee = 0.
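The null-space characterization Ee = E^T e′ = 0 suggests a direct way to recover the epipoles numerically: take the singular vectors of E associated with its smallest singular value. A small sketch, with a helper name of our own choosing:

```python
import numpy as np

def epipoles_from_E(E):
    """Recover the epipoles from Ee = 0 and E^T e' = 0 (hypothetical helper).

    The right-singular vector of E with the smallest singular value spans
    the (numerical) null space of E, giving e; the corresponding
    left-singular vector gives e'. For parallel image planes the epipoles
    are at infinity (third coordinate ~ 0), so rescaling is skipped then.
    """
    U, _, Vt = np.linalg.svd(E)
    e, e_prime = Vt[-1], U[:, -1]
    if abs(e[2]) > 1e-12:
        e = e / e[2]
    if abs(e_prime[2]) > 1e-12:
        e_prime = e_prime / e_prime[2]
    return e, e_prime
```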
4 The Fundamental Matrix

The derivation of the Essential matrix required the assumption of canonical cameras. We now relax this assumption and consider the general camera matrices:

\[
M = K \begin{bmatrix} I & 0 \end{bmatrix}, \qquad M' = K' \begin{bmatrix} R & T \end{bmatrix} \tag{7}
\]
First, we define pc = K^{-1} p and p′c = K′^{-1} p′ to be the projections of P in the corresponding camera images if the cameras were canonical. Recall that in the canonical case:

\[
p_c'^T [T_\times] R \, p_c = 0 \tag{8}
\]

Substituting pc = K^{-1} p and p′c = K′^{-1} p′ gives

\[
p'^T K'^{-T} [T_\times] R K^{-1} p = 0 \tag{9}
\]

The matrix F = K′^{-T} [T×] R K^{-1} is known as the Fundamental matrix. It satisfies the same type of constraint as the Essential matrix, but operates directly on points in pixel coordinates:

\[
p'^T F p = 0 \tag{10}
\]
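Putting Equations 7 through 10 together, here is a short sketch that composes F from assumed values of K, K′, R, and T and verifies the constraint on a synthetic correspondence:

```python
import numpy as np

def skew(t):
    return np.array([[0, -t[2], t[1]], [t[2], 0, -t[0]], [-t[1], t[0], 0]])

# Illustrative pose and intrinsics (assumed values, as in earlier snippets).
R, T = np.eye(3), np.array([0.5, 0.1, 0.0])
K, K_prime = np.diag([500.0, 500.0, 1.0]), np.diag([520.0, 520.0, 1.0])

# Equation 10's definition: F = K'^{-T} [T_x] R K^{-1}
F = np.linalg.inv(K_prime).T @ skew(T) @ R @ np.linalg.inv(K)

# Verify p'^T F p = 0 on a synthetic correspondence in pixel coordinates.
P = np.array([1.0, 2.0, 10.0])
p = K @ P / P[2]                      # projection in image 1 (M = K[I | 0])
p_prime = K_prime @ (R @ P + T)
p_prime = p_prime / p_prime[2]        # projection in image 2 (M' = K'[R | T])
print(p_prime @ F @ p)                # ~ 0
```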
Figure 6: Corresponding points are drawn in the same color on each of the
respective images.
A key advantage of the Fundamental matrix is that it can be estimated directly from point correspondences in pixel coordinates, without knowing the camera matrices. Each correspondence pi = (ui, vi, 1), p′i = (u′i, v′i, 1) gives one homogeneous linear constraint on the nine entries of F; since F is defined only up to scale, we require eight of these constraints to determine the Fundamental matrix:

\[
\begin{bmatrix}
u_1 u'_1 & v_1 u'_1 & u'_1 & u_1 v'_1 & v_1 v'_1 & v'_1 & u_1 & v_1 & 1 \\
u_2 u'_2 & v_2 u'_2 & u'_2 & u_2 v'_2 & v_2 v'_2 & v'_2 & u_2 & v_2 & 1 \\
u_3 u'_3 & v_3 u'_3 & u'_3 & u_3 v'_3 & v_3 v'_3 & v'_3 & u_3 & v_3 & 1 \\
u_4 u'_4 & v_4 u'_4 & u'_4 & u_4 v'_4 & v_4 v'_4 & v'_4 & u_4 & v_4 & 1 \\
u_5 u'_5 & v_5 u'_5 & u'_5 & u_5 v'_5 & v_5 v'_5 & v'_5 & u_5 & v_5 & 1 \\
u_6 u'_6 & v_6 u'_6 & u'_6 & u_6 v'_6 & v_6 v'_6 & v'_6 & u_6 & v_6 & 1 \\
u_7 u'_7 & v_7 u'_7 & u'_7 & u_7 v'_7 & v_7 v'_7 & v'_7 & u_7 & v_7 & 1 \\
u_8 u'_8 & v_8 u'_8 & u'_8 & u_8 v'_8 & v_8 v'_8 & v'_8 & u_8 & v_8 & 1
\end{bmatrix}
\begin{bmatrix}
F_{11} \\ F_{12} \\ F_{13} \\ F_{21} \\ F_{22} \\ F_{23} \\ F_{31} \\ F_{32} \\ F_{33}
\end{bmatrix} = 0 \tag{11}
\]
This can be compactly written as

\[
W f = 0 \tag{12}
\]

where f is the vector containing the nine entries of F. The solution is the right-singular vector of W associated with its smallest singular value; with more than eight correspondences, the same SVD gives the least-squares solution. However, the resulting estimate F̂ will in general have full rank, while a true Fundamental matrix has rank 2. To enforce this, we replace F̂ with the closest singular matrix by solving

\[
\begin{aligned}
\underset{F}{\text{minimize}} \quad & \|F - \hat{F}\|_F \\
\text{subject to} \quad & \det F = 0
\end{aligned} \tag{13}
\]

which is achieved by taking the SVD of F̂ and zeroing its smallest singular value.
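Equations 11 through 13 translate almost line-for-line into code. Below is a sketch of the (unnormalized) Eight-Point Algorithm; the function name and array layout are our own conventions.

```python
import numpy as np

def eight_point(pts1, pts2):
    """Estimate F from n >= 8 correspondences (Equations 11-13).

    pts1, pts2: (n, 2) arrays of pixel coordinates (u, v) and (u', v').
    """
    u, v = pts1[:, 0], pts1[:, 1]
    up, vp = pts2[:, 0], pts2[:, 1]
    ones = np.ones(len(u))
    # Each row encodes p'^T F p = 0 for one correspondence (Equation 11).
    W = np.stack([u*up, v*up, up, u*vp, v*vp, vp, u, v, ones], axis=1)
    # f is the right-singular vector for the smallest singular value (Wf = 0).
    _, _, Vt = np.linalg.svd(W)
    F_hat = Vt[-1].reshape(3, 3)
    # Enforce det F = 0 (Equation 13) by zeroing the smallest singular value.
    U, S, Vt = np.linalg.svd(F_hat)
    S[2] = 0
    return U @ np.diag(S) @ Vt
```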
In practice, the standard Eight-Point Algorithm often performs poorly, with average epipolar-line distance errors of 10+ pixels. To reduce this error, we can consider a modified version called the Normalized Eight-Point Algorithm.
The main problem of the standard Eight-Point Algorithm stems from the fact that W is ill-conditioned for SVD. For SVD to work properly, W should have one singular value equal to (or near) zero, with the other singular values being nonzero. However, the correspondences pi = (ui, vi, 1) will often have extremely large values in the first and second coordinates due to the pixel range of a modern camera (e.g. pi = (1832, 1023, 1)). If the image points used to construct W are in a relatively small region of the image, then the vectors pi and p′i will generally be very similar. Consequently, the constructed W matrix will have one very large singular value, with the rest relatively small.
To solve this problem, we normalize the points in the image before constructing W. This means we pre-condition W by applying both a translation and a scaling to the image coordinates such that two requirements are satisfied. First, the origin of the new coordinate system should be located at the centroid of the image points (translation). Second, the mean square distance of the transformed image points from the origin should be 2 pixels (scaling). We can compactly represent this process by transformation matrices T, T′, one for each image, that translate by the centroid and then scale by the factor \(\sqrt{2/\text{(mean square distance)}}\).
Afterwards, we normalize the coordinates:

\[
q_i = T p_i, \qquad q'_i = T' p'_i \tag{14}
\]

where the normalized correspondences qi, q′i satisfy the constraint

\[
q_i'^T F_q q_i = 0 \tag{15}
\]
Using the new, normalized coordinates, we can compute Fq with the regular least-squares Eight-Point Algorithm. However, Fq is the fundamental matrix for the normalized coordinates. For it to be usable in the regular coordinate space, we need to de-normalize it, giving

\[
F = T'^T F_q T \tag{16}
\]
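Here is a sketch of the full Normalized Eight-Point Algorithm, reusing eight_point() from the previous snippet; normalize_points() is a helper of our own devising that builds the conditioning transform described above.

```python
import numpy as np

def normalize_points(pts):
    """Build the conditioning transform T: translate the centroid to the
    origin, then scale so the mean square distance from the origin is 2."""
    centroid = pts.mean(axis=0)
    msd = np.mean(np.sum((pts - centroid) ** 2, axis=1))
    s = np.sqrt(2.0 / msd)
    T = np.array([[s, 0, -s * centroid[0]],
                  [0, s, -s * centroid[1]],
                  [0, 0, 1.0]])
    pts_h = np.column_stack([pts, np.ones(len(pts))])
    return (T @ pts_h.T).T, T

def normalized_eight_point(pts1, pts2):
    """Normalized Eight-Point Algorithm (Equations 14-16)."""
    q1, T1 = normalize_points(pts1)
    q2, T2 = normalize_points(pts2)
    Fq = eight_point(q1[:, :2], q2[:, :2])   # estimate in normalized space
    return T2.T @ Fq @ T1                    # de-normalize: F = T'^T Fq T
```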
5 Image Rectification
Recall that an interesting case for epipolar geometry occurs when the two image planes are parallel to each other. Let us first compute the Essential matrix E in the case of parallel image planes. We can assume that the two cameras have the same K and that there is no relative rotation between the cameras (R = I).
In this case, let us assume that there is only a translation along the x axis, giving T = (Tx, 0, 0). This gives

\[
E = [T_\times] R = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & -T_x \\ 0 & T_x & 0 \end{bmatrix} \tag{17}
\]
Once E is known, we can find the directions of the epipolar lines associated with points in the image planes. Let us compute the direction of the epipolar line ℓ in the first image associated with the point p′ = (u′, v′, 1):

\[
\ell = E^T p' = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & T_x \\ 0 & -T_x & 0 \end{bmatrix} \begin{bmatrix} u' \\ v' \\ 1 \end{bmatrix} = \begin{bmatrix} 0 \\ T_x \\ -T_x v' \end{bmatrix} \tag{18}
\]

Because the first coordinate of ℓ is zero, the line consists of the points satisfying Tx v − Tx v′ = 0, i.e. v = v′: the epipolar lines are horizontal, and corresponding points share the same v coordinate.
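The following few lines reproduce Equations 17 and 18 numerically, for an assumed Tx and an arbitrary point p′, confirming that the resulting line is v = v′:

```python
import numpy as np

# Parallel image planes: R = I and T = (Tx, 0, 0), so E = [T_x]R (Equation 17).
Tx = 0.5                                   # assumed baseline length
E = np.array([[0, 0,   0],
              [0, 0, -Tx],
              [0, Tx,  0]])

# Epipolar line in the first image for p' = (u', v', 1) (Equation 18).
p_prime = np.array([120.0, 75.0, 1.0])     # arbitrary point in image 2
l = E.T @ p_prime
print(l)                                   # [0, Tx, -Tx * v']: the line v = v'
```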
Figure 8: The rectification problem setup: we compute two homographies
that we can apply to the image planes to make the resulting planes parallel.
Rectifying a pair of images does not require knowledge of the two camera matrices K, K′ or the relative transformation R, T between them. Instead, we can use the Fundamental matrix estimated by the Normalized Eight-Point Algorithm. Once we have the Fundamental matrix, we can compute the epipolar lines ℓi and ℓ′i for each correspondence pi and p′i.
From the set of epipolar lines, we can then estimate the epipoles e and e′ of each image, since the epipole lies at the intersection of all the epipolar lines. In the real world, due to noisy measurements, the epipolar lines will not all intersect in a single point. Therefore, the epipole can be found by minimizing the least-squares error of fitting a point to all the epipolar lines. Recall that each epipolar line can be represented as a vector ℓ such that all points on the line (represented in homogeneous coordinates) are in the set {x | ℓ^T x = 0}. If we define each epipolar line as ℓi = (ℓi,1, ℓi,2, ℓi,3)^T, then we can formulate a linear system of equations and solve it using SVD to find the epipole e:

\[
\begin{bmatrix} \ell_1^T \\ \vdots \\ \ell_n^T \end{bmatrix} e = 0 \tag{19}
\]
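In code, solving Equation 19 is a single SVD of the stacked line vectors; the helper below (our naming) returns the singular vector with the smallest singular value, rescaled when the epipole is not at infinity.

```python
import numpy as np

def estimate_epipole(lines):
    """Least-squares epipole from epipolar lines (Equation 19).

    lines: (n, 3) array whose rows are the line vectors l_i. The epipole is
    the right-singular vector associated with the smallest singular value.
    """
    _, _, Vt = np.linalg.svd(lines)
    e = Vt[-1]
    return e / e[2] if abs(e[2]) > 1e-12 else e   # may be a point at infinity
```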
After finding the epipoles e and e′, we will most likely notice that they are not points at infinity along the horizontal axis. If they were, then, by definition, the images would already be parallel. Thus, we gain some insight into how to make the images parallel: can we find a homography that maps an epipole to infinity along the horizontal axis? Specifically, we want to find a pair of homographies H1, H2 that we can apply to the images to map the epipoles to infinity. Let us start by finding a homography H2 that maps the second epipole e′ to a point on the horizontal axis at infinity, (f, 0, 0). Since there are many possible choices for this homography, we should try to choose something reasonable. One condition that leads to good results in practice is to insist that the homography acts like a transformation that applies a translation and rotation on points near the center of the image.

The first step in achieving such a transformation is to translate the second image so that its center is at (0, 0, 1) in homogeneous coordinates. We can do so by applying the translation matrix

\[
T = \begin{bmatrix} 1 & 0 & -\frac{\text{width}}{2} \\ 0 & 1 & -\frac{\text{height}}{2} \\ 0 & 0 & 1 \end{bmatrix} \tag{20}
\]
After translating, we apply a rotation

\[
R = \begin{bmatrix} \cos\theta & \sin\theta & 0 \\ -\sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{bmatrix} \tag{21}
\]

where the angle θ is chosen so that the translated epipole lands on the positive horizontal axis at some point (f, 0, 1). Finally, the transformation

\[
G = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ -\frac{1}{f} & 0 & 1 \end{bmatrix} \tag{22}
\]

maps (f, 0, 1) to the desired point at infinity (f, 0, 0). Composing these steps, and translating back to the original image center, the full homography is

\[
H_2 = T^{-1} G R T \tag{23}
\]
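Here is a sketch assembling H2 from Equations 20 through 23, given an epipole e′ = (e1, e2, 1) in homogeneous coordinates and the image dimensions; it assumes e′ is not already at infinity.

```python
import numpy as np

def rectifying_homography(e_prime, width, height):
    """Build H2 = T^{-1} G R T (Equations 20-23) for epipole e' = (e1, e2, 1)."""
    T = np.array([[1, 0, -width / 2.0],
                  [0, 1, -height / 2.0],
                  [0, 0, 1.0]])
    # Rotate the translated epipole onto the positive horizontal axis.
    ex, ey, _ = T @ e_prime
    f = np.hypot(ex, ey)
    cos, sin = ex / f, ey / f
    R = np.array([[cos, sin, 0],
                  [-sin, cos, 0],
                  [0, 0, 1.0]])
    # G sends (f, 0, 1) to the point at infinity (f, 0, 0).
    G = np.array([[1, 0, 0],
                  [0, 1, 0],
                  [-1.0 / f, 0, 1.0]])
    return np.linalg.inv(T) @ G @ R @ T
```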
To find the matching homography H1 for the first image, we set up an optimization problem: we look for the H1 that minimizes the sum of squared distances between the rectified correspondences,

\[
\underset{H_1, H_2}{\arg\min} \sum_i \|H_1 p_i - H_2 p'_i\|^2 \tag{24}
\]

Although the derivation² is outside the scope of this class, it can be shown that the minimizing H1 is of the form:

\[
H_1 = H_A H_2 M \tag{25}
\]

where HA is an affine transformation

\[
H_A = \begin{bmatrix} a_1 & a_2 & a_3 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\]

with unknown parameters a1, a2, a3, and M is a matrix compatible with the Fundamental matrix, one valid choice being

\[
M = [e]_\times F \tag{28}
\]

Notice that if any scalar multiple of e is added to the columns of M, then F = [e]× M still holds up to scale. Therefore, the more general form of M is

\[
M = [e]_\times F + e v^T \tag{29}
\]

for some vector v. In practice, defining M with v^T = (1, 1, 1) works very well.
To finally solve for H1, we need to compute the values a1, a2, a3 of HA. Recall that we want to find H1, H2 to minimize the problem posed in Equation 24. Since we already know the values of H2 and M, we can substitute p̂i = H2 M pi and p̂′i = H2 p′i, and the minimization problem becomes

\[
\underset{H_A}{\arg\min} \sum_i \|H_A \hat{p}_i - \hat{p}'_i\|^2 \tag{30}
\]
In particular, if we let p̂i = (x̂i, ŷi, 1) and p̂′i = (x̂′i, ŷ′i, 1), then the minimization problem can be replaced by:

\[
\underset{a}{\arg\min} \sum_i (a_1 \hat{x}_i + a_2 \hat{y}_i + a_3 - \hat{x}'_i)^2 + (\hat{y}_i - \hat{y}'_i)^2 \tag{31}
\]
²If you are interested in the details, please see Chapter 11 of Hartley & Zisserman's textbook Multiple View Geometry.
Since ŷi − ŷ′i is a constant value that does not depend on a, the minimization problem further reduces to

\[
\underset{a}{\arg\min} \sum_i (a_1 \hat{x}_i + a_2 \hat{y}_i + a_3 - \hat{x}'_i)^2 \tag{32}
\]

which is a standard linear least-squares problem in a = (a1, a2, a3). After solving for a, we can assemble HA and obtain H1 = HA H2 M, completing the rectifying pair of homographies.
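Wrapping up, here is a sketch that carries out Equations 29 through 32: form M with v = (1, 1, 1), map the correspondences through H2 M and H2, solve the linear least-squares problem for a, and assemble H1. The function signature and intermediate names are our own conventions.

```python
import numpy as np

def compute_H1(F, e, H2, pts1, pts2):
    """Solve Equation 32 for a = (a1, a2, a3) and assemble H1 = HA H2 M.

    F: Fundamental matrix; e: epipole (homogeneous 3-vector);
    pts1, pts2: (n, 2) arrays of corresponding pixel coordinates.
    """
    skew_e = np.array([[0, -e[2], e[1]],
                       [e[2], 0, -e[0]],
                       [-e[1], e[0], 0]])
    M = skew_e @ F + np.outer(e, np.ones(3))     # Equation 29, v = (1, 1, 1)

    def to_h(pts):
        return np.column_stack([pts, np.ones(len(pts))])

    # p_hat = H2 M p and p_hat' = H2 p', normalized to (x, y, 1).
    p_hat = (H2 @ M @ to_h(pts1).T).T
    p_hat = p_hat / p_hat[:, 2:3]
    p_hat_p = (H2 @ to_h(pts2).T).T
    p_hat_p = p_hat_p / p_hat_p[:, 2:3]

    # Equation 32: minimize sum (a1 x + a2 y + a3 - x')^2 over a.
    A = p_hat                      # rows are (x_hat, y_hat, 1)
    b = p_hat_p[:, 0]              # target x_hat'
    a, *_ = np.linalg.lstsq(A, b, rcond=None)

    HA = np.array([[a[0], a[1], a[2]],
                   [0, 1, 0],
                   [0, 0, 1.0]])
    return HA @ H2 @ M             # Equation 25
```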