Single View Metrology
Figure 1: Basic geometry: The plane's vanishing line $\mathbf{l}$ is the intersection of the image plane with a plane parallel to the reference plane and passing through the camera centre. The vanishing point $\mathbf{v}$ is the intersection of the image plane with a line parallel to the reference direction through the camera centre.
Figure 2: Cross ratio: The point $\mathbf{b}$ on the plane $\pi$ corresponds to the point $\mathbf{t}$ on the plane $\pi'$. They are aligned with the vanishing point $\mathbf{v}$. The four points $\mathbf{v}$, $\mathbf{t}$, $\mathbf{b}$ and the intersection $\mathbf{i}$ of the line joining them with the vanishing line define a cross-ratio. The value of the cross-ratio determines a ratio of distances between planes in the world, see text.

Effects such as radial distortion (often arising in slightly wide-angle lenses typically used in security cameras) which corrupt the central projection model can generally be removed [6], and are therefore not detrimental to our methods (see, for example, figure 9).

Although the schematic figures show the camera centre at a finite location, the results we derive apply also to the case of a camera centre at infinity, i.e. where the images are obtained by parallel projection. The basic geometry of the plane's vanishing line and the vanishing point is illustrated in figure 1. The vanishing line $\mathbf{l}$ of the reference plane is the projection of the line at infinity of the reference plane into the image. The vanishing point $\mathbf{v}$ is the image of the point at infinity in the reference direction. Note that the reference direction need not be vertical, although for clarity we will often refer to the vanishing point as the "vertical" vanishing point. The vanishing point is then the image of the vertical "footprint" of the camera centre on the reference plane.

It can be seen (for example, by inspection of figure 1) that the vanishing line partitions all points in scene space. Any scene point which projects onto the vanishing line is at the same distance from the plane as the camera centre; if it lies "above" the line it is further from the plane, and if "below" the vanishing line, then it is closer to the plane than the camera centre.

Two points on separate planes (parallel to the reference plane) correspond if the line joining them is parallel to the reference direction; hence the image of each point and the vanishing point are collinear. For example, if the direction is vertical, then the top of an upright person's head and the sole of his/her foot correspond.

2.1. Measurements between parallel planes

We wish to measure the distance between two parallel planes, specified by the image points $\mathbf{t}$ and $\mathbf{b}$, in the reference direction. Figure 2 shows the geometry, with points $\mathbf{t}$ and $\mathbf{b}$ in correspondence. The four points marked on the figure define a cross-ratio. The vanishing point is the image of a point at infinity in the scene [15]. In the image the value of the cross-ratio provides an affine length ratio. In fact we obtain the ratio of the distance between the planes containing $\mathbf{t}$ and $\mathbf{b}$, to the camera's distance from the plane $\pi$ (or $\pi'$ depending on the ordering of the cross-ratio). The absolute distance can be obtained from this distance ratio once the camera's distance from $\pi$ is specified. However it is usually more practical to determine the distance via a second measurement in the image, that of a known reference length.

Furthermore, since the vanishing line is the imaged axis of the pencil of planes parallel to the reference plane, the knowledge of the distance between any pair of the planes is sufficient to determine the absolute distance between another two of the planes.

Example. Figure 3 shows that a person's height may be computed from an image given a vertical reference height elsewhere in the scene. The formula used to compute this result is given in section 3.1.

2.2. Measurements on parallel planes

If the reference plane is affine calibrated (we know its vanishing line) then from image measurements we can compute: (i) ratios of lengths of parallel line segments on the plane; (ii) ratios of areas on the plane. Moreover the vanishing line is shared by the pencil of planes parallel to the reference plane, hence affine measurements may be obtained for any other plane in the pencil. However, although affine measurements, such as an area ratio, may be made on a particular plane, the areas of regions lying on two parallel planes cannot be compared directly. If the region is parallel projected in the scene from one plane onto the other, affine measurements can then be made from the image since both regions are now on the same plane, and parallel projection between parallel planes does not alter affine properties.
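As a hedged illustration of the cross-ratio measurement of section 2.1, the sketch below evaluates the cross-ratio of the four collinear image points and converts it into the ratio between the inter-plane distance and the camera's distance from the reference plane. The function names, the ordering chosen inside the cross-ratio and the use of plain tuples are assumptions made for this example; the formula used by the authors is the one referred to in section 3.1 and is not reproduced here. The sketch further assumes that the base point lies on the reference plane and that both the vanishing point and the intersection with the vanishing line are finite image points.

```python
def cross(a, b):
    """Cross product of homogeneous 3-vectors (joins points, intersects lines)."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def distance_ratio(b, t, v, l):
    """Ratio (distance between the planes of b and t) / (camera distance from
    the reference plane), computed from image measurements only.

    b, t : image points (x, y) of the base and top points
    v    : image point (x, y) of the vanishing point (assumed finite)
    l    : vanishing line as homogeneous coefficients (l1, l2, l3)
    """
    # i is where the image line through b and t meets the vanishing line.
    i_h = cross(cross((b[0], b[1], 1.0), (t[0], t[1], 1.0)), l)
    i = (i_h[0] / i_h[2], i_h[1] / i_h[2])

    # Signed 1D coordinate of a point along the line, measured from b towards v.
    d = (v[0] - b[0], v[1] - b[1])

    def s(p):
        return (p[0] - b[0]) * d[0] + (p[1] - b[1]) * d[1]

    # Cross-ratio {b, t; i, v}. In the scene the same four points lie at
    # distances 0, Z, Zc and infinity from the reference plane, so the
    # cross-ratio equals Zc / (Zc - Z).
    cr = ((s(b) - s(i)) * (s(t) - s(v))) / ((s(t) - s(i)) * (s(b) - s(v)))
    return 1.0 - 1.0 / cr


def metric_distance(ratio, ref_ratio, ref_length):
    """Upgrade an affine ratio to a length via a reference of known length."""
    return ref_length * ratio / ref_ratio
```

Calling `distance_ratio` once on a reference object of known length and once on the object of interest, then passing both ratios to `metric_distance`, mirrors the "second measurement in the image" described above without requiring the camera's distance from the plane.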
The projection matrix $P$ from the world to the image is defined above with respect to a coordinate frame on the reference plane. In this section we determine the projection matrix $P'$ referred to the parallel plane $\pi'$ and we show how the homology between the two planes can be derived directly from the two projection matrices.

Suppose the world coordinate system is translated from the plane $\pi$ onto the plane $\pi'$ along the reference direction, then it is easy to show that we can parametrize the new projection matrix $P'$ as:

\[ P' = \begin{bmatrix} \mathbf{p}_1 & \mathbf{p}_2 & \alpha\mathbf{v} & \alpha Z \mathbf{v} + \hat{\mathbf{l}} \end{bmatrix} \]

where $Z$ is the distance between the planes. Note that if $Z = 0$ then $P' = P$ correctly.

The plane to image homographies can be extracted from the projection matrices ignoring the third column, to give:

\[ H = \begin{bmatrix} \mathbf{p}_1 & \mathbf{p}_2 & \hat{\mathbf{l}} \end{bmatrix}, \qquad H' = \begin{bmatrix} \mathbf{p}_1 & \mathbf{p}_2 & \alpha Z \mathbf{v} + \hat{\mathbf{l}} \end{bmatrix} \]

Then $\tilde{H} = H' H^{-1}$ maps image points on the plane $\pi$ onto points on the plane $\pi'$ and so defines the homology. A short computation gives the homology matrix $\tilde{H}$ as:

\[ \tilde{H} = I + \alpha Z \, \mathbf{v} \hat{\mathbf{l}}^\top \tag{5} \]

Given the homology between two planes in the pencil we can transfer all points from one plane to the other and make affine measurements in the plane (see fig 5 and fig 7).

The solution to this set of equations is given (using Cramer's rule) by

\[ X_c = -\det\begin{bmatrix} \mathbf{p}_2 & \mathbf{v} & \hat{\mathbf{l}} \end{bmatrix}, \quad Y_c = \det\begin{bmatrix} \mathbf{p}_1 & \mathbf{v} & \hat{\mathbf{l}} \end{bmatrix}, \quad \alpha Z_c = -\det\begin{bmatrix} \mathbf{p}_1 & \mathbf{p}_2 & \hat{\mathbf{l}} \end{bmatrix}, \quad W_c = \det\begin{bmatrix} \mathbf{p}_1 & \mathbf{p}_2 & \mathbf{v} \end{bmatrix} \tag{7} \]

Note that once again we obtain structure off the plane up to the affine scale factor $\alpha$. As before, we may upgrade the distance to metric with knowledge of $\alpha$, or use knowledge of camera height to compute $\alpha$ and upgrade the affine structure.

Note that affine viewing conditions (where the camera centre is at infinity) present no problem to the expressions in (7), since in this case we have $\hat{\mathbf{l}} = [\,0 \;\; 0 \;\; \ast\,]^\top$ and $\mathbf{v} = [\,\ast \;\; \ast \;\; 0\,]^\top$. Hence $W_c = 0$ so we obtain a camera centre on the plane at infinity, as we would expect. This point on $\pi_\infty$ represents the viewing direction for the parallel projection.

If the viewpoint is finite (i.e. not affine viewing conditions) then the formula for $\alpha Z_c$ may be developed further by taking the scalar product of both sides of (6) with the vanishing line $\hat{\mathbf{l}}$. The result is: $\alpha Z_c = -(\hat{\mathbf{l}} \cdot \mathbf{v})^{-1}$.

4. Uncertainty Analysis

Feature detection and extraction, whether manual or automatic (e.g. using an edge detector), can only be achieved
to a finite accuracy. Any features extracted from an image, therefore, are subject to errors. In this section we consider how these errors propagate through the measurement formulae in order to quantify the uncertainty on the final measurements.

When making measurements between planes, uncertainty arises from the uncertainty in $P$, and from the uncertain image locations of the top and base points $\mathbf{t}$ and $\mathbf{b}$. The uncertainty in $P$ depends on the location of the vanishing line, the location of the vanishing point, and on $\alpha$, the affine scale factor. Since only the final two columns contribute, we model the uncertainty in $P$ as a $6 \times 6$ homogeneous covariance matrix, $\Lambda_P$. Since the two columns have only five degrees of freedom (two for $\mathbf{v}$, two for $\mathbf{l}$ and one for $\alpha$), the covariance matrix is singular, with rank five. Details of its computation are given in [4] and are omitted here for brevity.

Likewise, the uncertainty in the top and base points (resulting largely from the finite accuracy with which these features may be located in the image) is modelled by covariance matrices $\Lambda_b$ and $\Lambda_t$. Since in the error-free case, these points must be aligned with the vertical vanishing point we can determine maximum likelihood estimates of their true locations ($\hat{\mathbf{t}}$ and $\hat{\mathbf{b}}$) by minimising the objective

\[ (\mathbf{b}_2 - \hat{\mathbf{b}}_2)^\top \Lambda_{b_2}^{-1} (\mathbf{b}_2 - \hat{\mathbf{b}}_2) + (\mathbf{t}_2 - \hat{\mathbf{t}}_2)^\top \Lambda_{t_2}^{-1} (\mathbf{t}_2 - \hat{\mathbf{t}}_2) \]

(which is the sum of the Mahalanobis distances between the input points and the ML estimates; the subscript 2 indicates inhomogeneous 2-vectors) subject to the alignment constraint $\mathbf{v} \cdot (\hat{\mathbf{t}} \times \hat{\mathbf{b}}) = 0$. Using standard techniques [7] we obtain a first order approximation to the $4 \times 4$, rank three covariance of the parameters $\hat{\mathbf{z}} = (\hat{\mathbf{t}}_2^\top \;\; \hat{\mathbf{b}}_2^\top)^\top$. Figure 8 illustrates the idea.

Figure 8: Maximum likelihood estimation of the top and base points (closeup of fig. 9): (left) The top and base uncertainty ellipses, respectively $\Lambda_t$ and $\Lambda_b$, are shown. These ellipses are specified by the user, and indicate a confidence region for localizing the points. (right) MLE top and base points $\hat{\mathbf{t}}$ and $\hat{\mathbf{b}}$ are aligned with the vertical vanishing point (outside the image).

Now, assuming the statistical independence of $\hat{\mathbf{z}}$ and $P$ we obtain a first order approximation for the variance of the distance measurement:

\[ \sigma_h^2 = \nabla h \begin{pmatrix} \Lambda_{\hat{z}} & 0 \\ 0 & \Lambda_P \end{pmatrix} \nabla h^\top \tag{8} \]

where $\nabla h$ is the $1 \times 10$ Jacobian matrix of the function which maps the projection matrix and top and base points to a distance between them (4). The validity of all approximations has been tested by Monte Carlo simulations and by a number of measurements on real images where the ground truth was known.

Example. An image obtained from a poor quality security camera is shown in figure 9. It has been corrected for radial distortion using the method described in [6], and the floor taken as the reference plane. Vertical and horizontal lines are used to compute the $P$ matrix of the scene. One reference height is used to obtain the affine scale factor $\alpha$ from (4), so other measurements in the same direction are metric.

The computed height of the man and an associated 3-standard deviation uncertainty are displayed in the figure. The height obtained differs by only 6mm from the known true value. As the number of reference distances is increased, so the uncertainty on $P$ (in fact just on $\alpha$) decreases, resulting in a decrease in uncertainty of the measured height, as theoretically expected.

Figure 9: [...] estimated (measurements are in cm). (left) The height of the man and the associated uncertainty are computed as 190.6cm (c.f. ground truth value 190cm). The vanishing line for the ground plane is shown in white at the top of the image. When one reference height is used the uncertainty (3-sigma) is 4.1cm, while (right) it reduces to 2.9cm as two more reference heights are introduced (the filing cabinet and the table on the left).

5. Applications

5.1. Forensic science

A common requirement in surveillance images is to obtain measurements from the scene, such as the height of a felon. Although the felon has usually departed the scene, reference lengths can be measured from fixtures such as tables and windows.
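For a forensic measurement the error bound matters as much as the estimate itself. The following is a minimal sketch of the first-order propagation in (8), under stated assumptions: the Jacobian is approximated by central differences rather than derived analytically, the two covariance matrices are combined block-diagonally to reflect the independence assumption, and all function names are hypothetical. The measurement function is left generic; formula (4) is not reproduced in this section, so it is not implemented here.

```python
def numerical_jacobian(f, x, eps=1e-6):
    """Central-difference gradient of a scalar function f at the point x (a list)."""
    grad = []
    for k in range(len(x)):
        hi = list(x)
        lo = list(x)
        hi[k] += eps
        lo[k] -= eps
        grad.append((f(hi) - f(lo)) / (2.0 * eps))
    return grad


def block_diag(a, b):
    """Block-diagonal matrix [[a, 0], [0, b]] from two square matrices (lists of rows)."""
    na, nb = len(a), len(b)
    top = [list(row) + [0.0] * nb for row in a]
    bottom = [[0.0] * na + list(row) for row in b]
    return top + bottom


def first_order_variance(f, x, cov, eps=1e-6):
    """Approximate the variance of f(x) as J cov J^T, with J the 1 x n Jacobian of f."""
    j = numerical_jacobian(f, x, eps)
    n = len(x)
    return sum(j[r] * cov[r][c] * j[c] for r in range(n) for c in range(n))
```

With a 4 x 4 covariance for the point parameters and a 6 x 6 covariance for the projection matrix, `block_diag` yields the 10 x 10 matrix that appears in (8), and a 3-sigma bound is three times the square root of the returned variance.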
Figure 10: Measuring the height of a person in an outdoor scene: The
ground plane is the reference plane, and its vanishing line is computed
from the slabs on the floor. The vertical vanishing point is computed from
the edges of the phone box, whose height is known and used as reference.
The veridical height is 187cm, but note that the person is leaning slightly
on his right foot.
In figure 10 the edges of the paving stones on the floor are used to compute the vanishing line of the ground plane; the edges of the phonebox to compute the vertical vanishing point; and the height of the phone box provides the metric calibration in the vertical direction. The height of the person is then computed using (4).

Figure 11 shows an example where the homology is used to project points between planes so that a vertical distance may be measured given the distance between a plane and the reference plane.

5.2. Virtual modelling

In figure 12 we show an example of complete 3D reconstruction of a scene. Two sets of horizontal edges are used to compute the vanishing line for the ground plane, and vertical edges used to compute the vertical vanishing point. Four points with known Euclidean coordinates determine the metric calibration of the ground plane and thus for the pencil of horizontal planes which share the vanishing line. The distance of the top of the window to the ground, and the height of one of the pillars are used as reference lengths. The position of the camera centre is also estimated and is shown in the figure.

5.3. Modelling paintings

Figure 13 shows a masterpiece of Italian Renaissance painting, "La Flagellazione di Cristo" by Piero della Francesca (1416 - 1492). The painting faithfully follows the geometric rules of perspective, and therefore we can apply the methods developed here to obtain a correct 3D reconstruction of the scene.

Unlike other techniques [8] whose main aim is to create convincing new views of the painting regardless of the correctness of the 3D geometry, here we reconstruct a geometrically correct 3D model of the viewed scene.

In the painting analyzed here, the ground plane is chosen as reference and its vanishing line can be computed from the several parallel lines on it. The vertical vanishing point follows from the vertical lines and consequently the relative heights of people and columns can be computed. Furthermore the ground plane can be rectified from the square floor patterns and therefore the position on the ground of each vertical object estimated [5, 10]. The measurements, up to an overall scale factor, are used to compute a three dimensional VRML model of the scene. Two different views of the model are shown in figure 13.
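The applications above all start from vanishing points found from imaged parallel edges, and section 5.2 additionally reports a camera centre. The sketch below shows one way these steps could be chained, under several assumptions: each vanishing point is taken as the intersection of just two image lines (the text does not say how the authors estimate them, and a practical system would fit many lines), the first two columns of $P$ are taken to be the two ground-plane vanishing points, and (6) is assumed to be the null-space condition on the camera centre. All function names and sample coordinates are hypothetical.

```python
import math


def cross(a, b):
    """Cross product of 3-vectors: joins two points or intersects two lines."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def det3(c1, c2, c3):
    """Determinant of the 3x3 matrix whose columns are c1, c2, c3."""
    return dot(c1, cross(c2, c3))


def line_through(p, q):
    """Homogeneous line through two image points given as (x, y)."""
    return cross((p[0], p[1], 1.0), (q[0], q[1], 1.0))


def vanishing_point(edge_a, edge_b):
    """Homogeneous intersection of the images of two parallel scene edges."""
    return cross(line_through(*edge_a), line_through(*edge_b))


def unit(x):
    """Scale a 3-vector to unit norm."""
    n = math.sqrt(dot(x, x))
    return (x[0] / n, x[1] / n, x[2] / n)


def camera_centre(p1, p2, v, l_hat):
    """Evaluate (7); returns (Xc, Yc, alpha * Zc, Wc), defined up to scale."""
    return (-det3(p2, v, l_hat), det3(p1, v, l_hat),
            -det3(p1, p2, l_hat), det3(p1, p2, v))


if __name__ == "__main__":
    # Hypothetical edges, each a pair of endpoints (x, y) in pixels.
    p1 = vanishing_point(((0, 0), (500, 100)), ((0, 100), (500, 150)))
    p2 = vanishing_point(((0, 0), (-400, 90)), ((0, 100), (-400, 140)))
    v = vanishing_point(((0, 0), (10, 500)), ((200, 0), (190, 500)))
    l_hat = unit(cross(p1, p2))  # vanishing line joins the two ground-plane points
    print(camera_centre(p1, p2, v, l_hat))
```

Dividing the third component by the fourth gives $\alpha Z_c$ for a finite camera; recovering $Z_c$ itself still requires $\alpha$, for instance from a reference height.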
References
[1] L. B. Alberti. De Pictura. Laterza, 1980.
[2] M. Berger. Geometry II. Springer-Verlag, 1987.
[3] R. T. Collins and R. S. Weiss. Vanishing point calculation as a statistical inference on the unit sphere. In Proc. ICCV, pages 400–403, Dec 1990.
[4] A. Criminisi, I. Reid, and A. Zisserman. Computing 3D euclidean distance from a single view. Technical Report OUEL 2158/98, Dept. Eng. Science, University of Oxford, 1998.
[5] A. Criminisi, I. Reid, and A. Zisserman. A plane measuring device. Image and Vision Computing, 17(8):625–634, 1999.
[6] F. Devernay and O. Faugeras. Automatic calibration and removal of distortion from scenes of structured environments. In SPIE, volume 2567, San Diego, CA, Jul 1995.
[7] O. Faugeras. Three-Dimensional Computer Vision: a Geometric Viewpoint. MIT Press, 1993.
[8] Y. Horry, K. Anjyo, and K. Arai. Tour into the picture: Using a spidery mesh interface to make animation from a single image. In Proc. ACM SIGGRAPH, pages 225–232, 1997.
[9] T. Kim, Y. Seo, and K. Hong. Physics-based 3D position analysis of a soccer ball from monocular image sequences. In Proc. ICCV, pages 721–726, 1998.
[10] D. Liebowitz, A. Criminisi, and A. Zisserman. Creating architectural models from images. In Proc. EuroGraphics, Sep 1999.
[11] D. Liebowitz and A. Zisserman. Metric rectification for perspective images of planes. In Proc. CVPR, pages 482–488, Jun 1998.
[12] G. F. McLean and D. Kotturi. Vanishing point detection by line clustering. IEEE T-PAMI, 17(11):1090–1095, 1995.
[13] M. Proesmans, T. Tuytelaars, and L. Van Gool. Monocular image measurements. Technical Report Improofs-M12T21/1/P, K.U.Leuven, 1998.
[14] I. Reid and A. Zisserman. Goal-directed video metrology. In R. Cipolla and B. Buxton, editors, Proc. ECCV, volume II, pages 647–658. Springer, Apr 1996.
[15] C. E. Springer. Geometry and Analysis of Projective Spaces. Freeman, 1964.
[16] L. Van Gool, M. Proesmans, and A. Zisserman. Planar homologies as a basis for grouping and recognition. Image and Vision Computing, 16:21–26, Jan 1998.

Figure 13: Complete 3D reconstruction of a Renaissance painting: (top) La Flagellazione di Cristo, (1460, Urbino, Galleria Nazionale delle Marche). (middle) A view of the reconstructed 3D model. The patterned floor has been reconstructed in areas where it is occluded by taking advantage of the symmetry of its pattern. (bottom) Another view of the model with the roof removed to show the relative positions of people and architectural elements in the scene. Note the repeated geometric pattern on the floor in the area delimited by the columns (barely visible in the painting). Note that the people are represented simply as flat silhouettes since it is not possible to recover their volume from one image; they have been cut out manually from the original image. The columns have been approximated with cylinders.