


Computing Box Dimensions from Single Perspective Images in Real Time

Leandro A. F. Fernandes¹, Manuel M. Oliveira¹, Roberto da Silva¹, Gustavo J. Crespo²

¹ UFRGS – Instituto de Informática    ² Stony Brook University
{laffernandes, oliveira, rdasilva}@inf.ufrgs.br

Abstract

We present a new method for computing the dimensions of boxes from single perspective projection images in real time. Given a picture of a box, acquired with a camera whose intrinsic parameters are known, the dimensions of the box are computed from the extracted box silhouette and the projection of two parallel laser beams on one of its visible faces. We also present a statistical model for background removal that works with a moving camera, and an efficient voting scheme for identifying approximately collinear segments in the context of a Hough Transform. We demonstrate the proposed approach and algorithms by building a prototype of a scanner for computing box dimensions and using it to automatically compute the dimensions of real boxes. The paper also presents some statistics over measurements obtained with our scanner prototype.

1 Introduction

The ability to measure the dimensions of three-dimensional objects directly from images has many practical applications, including quality control, surveillance, analysis of forensic records, storage management and cost estimation. Unfortunately, unless some information relating distances measured in image space to distances measured in 3D is available, the problem of making measurements directly on images is not well defined. This is a result of the inherent ambiguity of perspective projection caused by the loss of depth information.

This paper presents a method for computing box dimensions from single perspective projection images in a completely automatic way. This can be an invaluable tool for companies that handle boxes in their day-to-day operations, such as couriers, airlines and warehouses. The approach uses information extracted from the silhouette of the target boxes and can be applied when at least two of their faces are visible, even when the target box is partially occluded by other objects in the scene (Figure 1). We eliminate the inherent ambiguity associated with perspective images by projecting two parallel laser beams, apart from each other by a known distance and perpendicular to the camera's image plane, onto one of the visible faces of the box. We demonstrate this technique by building a scanner prototype for computing box dimensions and using it to compute the dimensions of boxes in real time (Figure 1).

Figure 1. Scanner prototype: (left) its operation. (right) Camera's view with recovered edges and vertices.

The main contributions of this paper include:

• An algorithm for computing the dimensions of boxes in a completely automatic way in real time (Section 3);

• An algorithm for extracting box silhouettes in the presence of partial occlusion of the box edges (Section 3);

• A statistical model for detecting background pixels under different lighting conditions, for use with a moving camera (Section 4);

• An efficient voting scheme for identifying approximately collinear line segments using a Hough Transform (Section 5).

2 Related Work

Many optical devices have been created for making measurements in the real world. Those based on active techniques project some kind of energy onto the surfaces of the target objects and analyze the reflected energy. Examples of active techniques include optical triangulation [1] and laser range finding [18] to capture the shapes of objects at proper scale [13], and ultrasound to measure distances [6]. In contrast, passive techniques rely only on the use of cameras for extracting the three-dimensional structure of a scene and are primarily based on the use of stereo [14]. In order to achieve metric reconstruction [9], both optical triangulation and stereo-based systems require careful calibration. For optical triangulation, several images of the target object with a superimposed moving pattern are usually required for more accurate reconstruction.
Labeling schemes for trihedral junctions [3, 11] have been used to estimate the spatial orientation of polyhedral objects from images. These techniques tend to be computationally expensive when too many junctions are identified. Additional information from the shading of the objects can be used to improve the labeling process [21]. Silhouettes have been used in both computer vision and computer graphics for object shape extraction [12, 17]. These techniques require precise camera calibration and use silhouettes obtained from multiple images to define a set of cones whose intersections approximate the shapes of the objects.

Criminisi et al. [4] presented a technique for making 3D affine measurements from a single perspective image. They show how to compute distances between planes parallel to a reference one. If some distance from a scene element to the reference plane is known, it is possible to compute the distances between scene points and the reference plane. The technique requires user interaction and cannot be used for computing dimensions automatically. Photogrammetrists have also made measurements based on single images. However, these techniques can only be applied to planar objects and require user intervention.

In a work closely related to ours, Lu [16] described a method for finding the dimensions of boxes from single gray-scale images. In order to simplify the task, Lu assumes that the images are acquired using parallel orthographic projection and that three faces of the box are visible simultaneously. The computed dimensions are approximately correct up to a scaling factor. Also, special care is required to distinguish the actual box edges from lines in the box texture, causing the method not to perform in real time.

Our approach computes the dimensions of boxes from single perspective projection images, producing metric reconstructions in real time and in a completely automatic way. The method can be applied to boxes with arbitrary textures and can be used when only two faces of the box are visible, even when the edges of the target box are partially occluded by other objects in the scene.

3 Computing Box Dimensions

We model boxes as parallelepipeds, although real boxes can present many imperfections (e.g., bent edges and corners, asymmetries, etc.). The dimensions of a parallelepiped can be computed from the 3D coordinates of four of its non-coplanar vertices. Conceptually, the 3D coordinates of the vertices of a box can be obtained by intersecting rays, defined by the camera's center and the projections of the box vertices on the camera's image plane, with the planes containing the actual faces of the box in 3D. Thus, before we can compute the dimensions of a given box (Section 3.3), we need to find the projections of the vertices on the image (Section 3.1), and then find the equations of the planes containing the box faces in 3D (Section 3.2).

In the following derivations, we assume that the origin of the image coordinate system is at the center of the image, with the X-axis growing to the right and the Y-axis growing down, and that the imaged boxes have three visible faces. The case involving only two visible faces is similar. We also assume that the images used for computing the dimensions were obtained through linear projection (i.e., using a pinhole camera). Although images obtained with real cameras contain some amount of radial distortion, such distortions can be compensated for with the use of simple warping procedures [9].

3.1 Finding the Projections of the Vertices

The projections of the vertices can be obtained as the corners of the box silhouette. Although edge detection techniques [2] could be used to find the box silhouette, these algorithms tend to be sensitive to the presence of other high-frequency content in the image. In order to minimize the occurrence of spurious edges and support the use of boxes with arbitrary textures, we perform silhouette detection using a model for the background pixels. Since the images are acquired using a handheld camera, proper modeling of the background pixels is required and will be discussed in detail in Section 4.

However, as shown in Figure 2 (a, b and c), a naive approach that just models the background and applies simple image processing operations, like background removal and high-pass filtering, does not properly identify the silhouette pixels of the target box (selected by the user by pointing the laser beams onto one of its faces). This is because the scene may contain other objects whose silhouettes possibly overlap with the one of the target box. Also, the occurrence of some misclassified pixels (see Figure 2, c) may lead to the detection of spurious edges. Thus, a suitable method was developed to deal with these problems. The steps of our algorithm are shown in Figure 2 (a, d, e, f and g).
Figure 2. Identifying the target box silhouette. Naive approach: (b) Background segmentation, followed by (c) High-pass filter (note the spurious "edge" pixels). Proposed approach: (d) Contouring of the foreground region, (e) Contour segmentation, (f) Grouping candidate segments for the target box silhouette, and (g) Recovery of supporting lines for silhouette edges and vertices.

The target box silhouette is obtained by starting from one of the laser dots, finding a silhouette pixel and using a contour-following procedure [8]. The seed silhouette pixel for the contour-following procedure is found by stepping from the laser dot within the target foreground region and checking whether the current pixel matches the background model. In order to be a valid silhouette, both laser dots need to fall inside the contouring region. Notice that this procedure produces a much cleaner set of border pixels (Figure 2, d) compared to the results shown in Figure 2 (c). But the resulting silhouette may include overlapping objects, and we need to identify which border pixels belong to the target box. To facilitate the handling of the border pixels, the contour is subdivided into its most perceptually significant straight line segments [15] (Figure 2, e). Then, the segments resulting from the clipping of a foreground object against the limits of the frame (e.g., segments h and i in Figure 2, e) are discarded. Since a box silhouette defines a convex polygon, the remaining segments whose two endpoints are not visible from both laser dots can also be discarded. This test is performed using a 2D BSP-tree [7]. In the example of Figure 2, only segments c, d, e, o, p and q pass this test.

Still, there is no guarantee that all the remaining segments belong to the target box silhouette. In order to restrict the number of possible combinations, the remaining chains of segments defining convex fragments are grouped (e.g., groups A and B in Figure 2, f). We then try to find the largest combination of groups into valid portions of the silhouette. In order to be considered a valid combination, the groups must satisfy the following validation rules: (i) they must characterize a convex polygon; (ii) the silhouette must have six edges (the silhouette of a parallelepiped with at least two visible faces); (iii) the laser dots must be on the same box face; and (iv) the computed lengths for pairs of parallel edges in 3D must be approximately the same. If more than one combination of groups passes the validation tests, the system discards this ambiguous data and starts processing a new frame (our system is capable of processing frames at a rate of about 34 fps).

Once the box silhouette is known, the projections of the six vertices are obtained by intersecting pairs of adjacent supporting lines for the silhouette edges (Figure 2, g). Section 5 discusses how to obtain those supporting lines.
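To make the grouping step concrete, the sketch below checks the first two validation rules on a candidate silhouette assembled from segment groups. It is only an illustration: the Point type and the candidate polygon are assumed, and it is not the authors' implementation. Rules (iii) and (iv) depend on the laser dots and on the recovered 3D edge lengths, so they are only indicated as comments.

// Sketch of validation rules (i) and (ii) from Section 3.1 applied to a
// candidate silhouette. The Point type and the test polygon are assumptions.
#include <cstddef>
#include <vector>

struct Point { double x, y; };

bool isValidSilhouette(const std::vector<Point>& poly) {
    if (poly.size() != 6) return false;                 // rule (ii): six edges
    int sign = 0;
    for (std::size_t i = 0; i < poly.size(); ++i) {     // rule (i): convexity via cross products
        const Point& a = poly[i];
        const Point& b = poly[(i + 1) % poly.size()];
        const Point& c = poly[(i + 2) % poly.size()];
        double z = (b.x - a.x) * (c.y - b.y) - (b.y - a.y) * (c.x - b.x);
        if (z != 0.0) {
            int s = z > 0.0 ? 1 : -1;
            if (sign != 0 && s != sign) return false;   // turn direction changed: not convex
            sign = s;
        }
    }
    // Rules (iii) and (iv) -- both laser dots on the same face, and matching
    // 3D lengths for parallel edges -- are checked later in the pipeline.
    return true;
}

int main() {
    std::vector<Point> candidate = { {0, 0}, {4, 0}, {6, 2}, {6, 5}, {2, 6}, {0, 4} };
    bool ok = isValidSilhouette(candidate);
    (void)ok;
    return 0;
}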
3.2 Computing the Plane Equations

All parallel lines in 3D sharing the same direction intersect at a point at infinity, whose image under perspective projection is called a vanishing point ω. The line defined by all vanishing points from all sets of parallel lines on a plane Π is called the vanishing line λ of Π. The normal vector to Π in a given camera's coordinate system can be obtained by multiplying the transpose of the camera's intrinsic-parameter matrix by the coefficients of λ [9]. Since the resulting vector is not necessarily a unit vector, it needs to be normalized. Equations (1) and (2) show the relationship among the vanishing points ωi, the vanishing lines λi and the supporting lines ej for the edges that coincide with the imaged silhouette of a parallelepiped with three visible faces. The supporting lines are ordered clockwise.

ωi = ei × ei+3    (1)
λi = ωi × ω(i+1) mod 3    (2)

where 0 ≤ i ≤ 2, 0 ≤ j ≤ 5, λi = (aλi, bλi, cλi)ᵀ and × is the cross-product operator. The normal NΠi to plane Πi is then given by

NΠi = R Kᵀ λi / ‖R Kᵀ λi‖    (3)

where NΠi = (AΠi, BΠi, CΠi), 0 ≤ i ≤ 2, K is the matrix that models the intrinsic camera parameters [9], and R is a reflection matrix (Equation 4) used to make the Y-axis of the image coordinate system grow in the up direction.

K = | f/sx   γ    ox |        R = | 1   0   0 |
    |  0    f/sy  oy |            | 0  −1   0 |    (4)
    |  0     0    1  |            | 0   0   1 |

In Equation (4), f is the focal length, and sx and sy are the dimensions of the pixel in centimeters. γ, ox and oy represent the skew and the coordinates of the principal point, respectively.
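The computation in Equations (1)–(3) reduces to a handful of cross products on homogeneous coordinates. The following C++ sketch illustrates it; the Vec3 type, the example supporting lines and the intrinsic values (f/sx = f/sy ≈ 16 mm / 7.4 µm, as in the prototype described in Section 7) are assumptions made for the sketch, not the paper's code.

// Sketch of Equations (1)-(3): vanishing points, vanishing lines and face
// normals from the six supporting lines of the silhouette (ordered clockwise).
#include <array>
#include <cmath>

struct Vec3 { double x, y, z; };

Vec3 cross(const Vec3& a, const Vec3& b) {
    return { a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x };
}

Vec3 normalize(const Vec3& v) {
    double n = std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z);
    return { v.x / n, v.y / n, v.z / n };
}

// Equation (3): N = R K^T lambda, normalized. K is the upper-triangular matrix
// of Equation (4) and R = diag(1, -1, 1) flips the Y axis.
Vec3 faceNormal(const Vec3& lambda, double fsx, double fsy, double skew,
                double ox, double oy) {
    Vec3 kt = { fsx * lambda.x,
                skew * lambda.x + fsy * lambda.y,
                ox * lambda.x + oy * lambda.y + lambda.z };
    kt.y = -kt.y;                                        // reflection R
    return normalize(kt);
}

int main() {
    // Supporting lines e0..e5 in homogeneous coordinates (a x + b y + c = 0),
    // ordered clockwise around the silhouette -- assumed example values.
    std::array<Vec3, 6> e = {{ {1.0, 0.05, -320.0}, {0.02, 1.0, -240.0},
                               {1.0, 1.10, -820.0}, {1.0, -0.05, 300.0},
                               {-0.02, 1.0, 220.0}, {1.0, 0.90, 760.0} }};
    const double fsx = 2162.0, fsy = 2162.0;             // f/sx, f/sy in pixels (assumed)

    std::array<Vec3, 3> omega, lambda, normal;
    for (int i = 0; i < 3; ++i)
        omega[i] = cross(e[i], e[i + 3]);                       // Equation (1)
    for (int i = 0; i < 3; ++i) {
        lambda[i] = cross(omega[i], omega[(i + 1) % 3]);        // Equation (2)
        normal[i] = faceNormal(lambda[i], fsx, fsy, 0.0, 0.0, 0.0);
    }
    return 0;
}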
Once we have NΠi, finding DΠi, the fourth coefficient of the plane equation, is equivalent to solving the projective ambiguity and requires the introduction of one more constraint. Thus, consider the situation depicted in 2D in Figure 3 (right), where two laser beams, parallel to each other and to the camera's XZ plane, are projected onto one of the faces of the box. Let the 3D coordinates of the laser dots defined with respect to the camera coordinate system be P0 = (XP0, YP0, ZP0)ᵀ and P1 = (XP1, YP1, ZP1)ᵀ, respectively (Figure 3, left). Since P0 and P1 are on the same plane Π, one can write

AΠ XP0 + BΠ YP0 + CΠ ZP0 = AΠ XP1 + BΠ YP1 + CΠ ZP1    (5)

Using the linear projection model and given pi = (xpi, ypi, 1)ᵀ, the homogeneous coordinates of the pixel associated with the projection of point Pi, one can reproject pi on the plane Z = 1 (in 3D) using

p′i = R K⁻¹ pi    (6)

and express the 3D coordinates of the laser dots on the face of the box as

XPi = x′pi ZPi,   YPi = y′pi ZPi   and   ZPi    (7)

Substituting the expressions for XP0, YP0, XP1 and YP1 (Equation 7) into Equation (5) and solving for ZP0, we obtain

ZP0 = k ZP1    (8)

where

k = (AΠ x′p1 + BΠ y′p1 + CΠ) / (AΠ x′p0 + BΠ y′p0 + CΠ)    (9)

Figure 3. Top view of a scene. Two laser beams apart in 3D by dlb project onto one box face at points P0 and P1, whose distance in 3D is dld. α is the angle between −L and NXZ.

Now, let dlb and dld be the distances, in 3D, between the two parallel laser beams and between the two laser dots projected onto one of the faces of the box, respectively (Figure 3). Section 6 discusses how to find the laser dots on the image. dld can be directly computed from NΠ, the normal vector of the face onto which the dots project, and the known distance dlb:

dld = dlb / cos(α) = dlb / (−(NXZ · L))    (10)

where α is the angle between −L and NXZ, the normalized projection of NΠ onto the plane defined by the two laser beams. By construction, such a plane is parallel to the camera's XZ plane. Therefore, NXZ is obtained by dropping the Y coordinate of NΠ and normalizing the resulting vector. L = (0, 0, 1)ᵀ is the vector representing the laser beam direction. dld can also be expressed as the Euclidean distance between the two laser dots in 3D:

dld² = (XP1 − XP0)² + (YP1 − YP0)² + (ZP1 − ZP0)²    (11)

Substituting Equations (7), (8) and (10) into (11) and solving for ZP1, one gets

ZP1 = √(dld² / (a k² − 2 b k + c))    (12)

where a = (x′p0)² + (y′p0)² + 1, b = x′p0 x′p1 + y′p0 y′p1 + 1 and c = (x′p1)² + (y′p1)² + 1. Given ZP1, the 3D coordinates of P1 can be computed as

P1 = (XP1, YP1, ZP1) = (x′p1 ZP1, y′p1 ZP1, ZP1)    (13)

The projective ambiguity can finally be removed by computing the DΠ coefficient for the plane equation of the face containing the two dots:

DΠ = −(AΠ XP1 + BΠ YP1 + CΠ ZP1)    (14)
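The chain of Equations (6)–(14) can be followed numerically with a short sketch. The plane normal, the laser-dot pixels and the intrinsics below are invented example values, and the code assumes γ = ox = oy = 0 (the same simplification adopted in the experiments of Section 7); it only illustrates the algebra, not the actual implementation.

// Sketch of Equations (6)-(14): removing the projective ambiguity using the
// two laser dots. All numeric inputs are assumptions for illustration only.
#include <cmath>

struct Vec3 { double x, y, z; };

int main() {
    Vec3 N = { 0.12, -0.05, -0.99 };       // unit normal of the laser-dot face (Eq. 3)
    double d_lb = 15.8;                    // distance between the laser beams, in cm

    // Pixel coordinates of the two laser dots (image origin at the center).
    double fsx = 2162.0, fsy = 2162.0;     // f/sx, f/sy in pixels (assumed)
    double u0 = -80.0, v0 = -12.0, u1 = 95.0, v1 = -10.0;

    // Equation (6): reproject the pixels onto the plane Z = 1 (R flips Y).
    double x0 = u0 / fsx, y0 = -v0 / fsy;
    double x1 = u1 / fsx, y1 = -v1 / fsy;

    // Equation (9): k relates the depths of the two dots (Z_P0 = k Z_P1).
    double k = (N.x * x1 + N.y * y1 + N.z) / (N.x * x0 + N.y * y0 + N.z);

    // Equation (10): distance between the dots in 3D. N_XZ is N with its Y
    // component dropped and renormalized; L = (0, 0, 1) is the beam direction.
    double nxz = std::sqrt(N.x * N.x + N.z * N.z);
    double d_ld = d_lb / (-(N.z / nxz));

    // Equation (12): depth of dot P1, with a, b, c as defined in the text.
    double a = x0 * x0 + y0 * y0 + 1.0;
    double b = x0 * x1 + y0 * y1 + 1.0;
    double c = x1 * x1 + y1 * y1 + 1.0;
    double Z1 = std::sqrt(d_ld * d_ld / (a * k * k - 2.0 * b * k + c));

    // Equations (13) and (14): 3D position of P1 and the plane's D coefficient.
    Vec3 P1 = { x1 * Z1, y1 * Z1, Z1 };
    double D = -(N.x * P1.x + N.y * P1.y + N.z * P1.z);
    (void)D;
    return 0;
}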
3.3 Computing the Box Dimensions

Having computed the plane equation, one can recover the 3D coordinates of the vertices of that face. For each such vertex v on the image, we compute v′ using Equation (6). We then compute its corresponding ZV coordinate by substituting Equation (7) into the plane equation for the face. Given ZV, both the XV and YV coordinates are computed using Equation (7). Since all visible faces of the box share some vertices with each other, the D coefficients for the other faces of the box can also be obtained, allowing the recovery of the 3D coordinates of all vertices on the box silhouette, from which the dimensions are computed.

Although not required for computing the dimensions of the box, the 3D coordinates of the inner vertex (see Figure 1, right) can also be computed. Its 2D coordinates can be obtained as the intersection of three lines (Figure 1, right). Each such line is defined by a vanishing point and the silhouette vertex falling in between the two box edges used to compute that vanishing point. Since it is unlikely that these three lines will intersect exactly at one point, we approximate the intersection using least squares. Given the inner vertex 2D coordinates, its corresponding 3D coordinates can be computed using the same algorithm used to compute the 3D coordinates of the other vertices.
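A minimal sketch of this back-projection step is given below; the plane coefficients, the vertex pixels and the intrinsics are assumed example values rather than measured ones, and the sketch again takes γ = ox = oy = 0.

// Sketch of Section 3.3: back-projecting an image vertex onto a face plane
// A X + B Y + C Z + D = 0 to recover its 3D coordinates.
#include <cmath>
#include <cstdio>

struct Vec3 { double x, y, z; };

// Recover the 3D point whose projection is pixel (u, v), constrained to the plane.
Vec3 backProject(double u, double v, double fsx, double fsy,
                 double A, double B, double C, double D) {
    double x = u / fsx, y = -v / fsy;         // Equation (6), with gamma = ox = oy = 0
    double Z = -D / (A * x + B * y + C);      // substitute Equation (7) into the plane equation
    return { x * Z, y * Z, Z };               // Equation (7)
}

int main() {
    // Assumed plane of one visible face and two of its silhouette vertices.
    double A = 0.12, B = -0.05, C = -0.99, D = 210.0;
    Vec3 v0 = backProject(-60.0, 45.0, 2162.0, 2162.0, A, B, C, D);
    Vec3 v1 = backProject( 85.0, 40.0, 2162.0, 2162.0, A, B, C, D);

    // One box dimension is the distance between two adjacent 3D vertices.
    double dx = v1.x - v0.x, dy = v1.y - v0.y, dz = v1.z - v0.z;
    std::printf("edge length: %.2f cm\n", std::sqrt(dx * dx + dy * dy + dz * dz));
    return 0;
}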

4 A Model for Background Pixels

One of the most popular techniques for object segmentation is chroma keying [20]. Unfortunately, standard chroma keying techniques do not produce satisfactory results for our application. Shading variations in the background and shadows cast by the boxes usually lead to misclassification of background pixels. Horprasert et al. [10] describe a statistical method that computes a per-pixel model of the background from a set of static background images. While this technique is fast and produces very good segmentation results for scenes acquired from a static camera, it is not appropriate for use with moving cameras.

In order to support a moving camera, we developed an approach that works under different lighting conditions using a background with a known color. The approach computes a statistical model of the background considering multiple possible shades of the background color and proved to be robust, leading to very satisfactory results.

The algorithm takes as input a set of n images Ii of the background acquired under different lighting conditions. In the first step, we compute E, the average color of all pixels in the images Ii, and the eigenvalues and eigenvectors associated with the colors of those pixels. E and the eigenvector associated with the highest eigenvalue define an axis in the RGB color space, called the chromaticity axis. The chromaticity distortion d of a given color C can be computed as the distance from C to the chromaticity axis.

After discarding the pixels whose projections on the chromaticity axis have at least one saturated channel (they lead to misclassification of bright foreground pixels), we divide the chromaticity axis into m slices (Figure 4). For each slice, we compute d̄j and σd̄j, the mean and the standard deviation, respectively, of the chromaticity distortion of the pixels in the slice. Then, we compute a threshold dTj for the maximum acceptable slice chromaticity distortion considering a confidence level of 99% as dTj = d̄j + 2.33 σd̄j.

Finally, the coefficients of the polynomial background model are computed by fitting a curve through the dT values at the centers of the slices. Once the coefficients have been computed, the dT values are discarded and the tests are performed against the polynomial. Figure 4 illustrates the case of a color Ck being tested against the background color model. C′k is the projection of Ck on the chromaticity axis. In this example, as the distance between Ck and the chromaticity axis is bigger than the threshold defined by the polynomial, Ck will be classified as foreground.

Figure 4. Chromaticity axis. The curve is the polynomial fit to the chromaticity distortions.

Changing the background color only requires obtaining samples of the new background and computing the new values for the chromaticity axis and the coefficients of the polynomial. According to our experiments, 100 slices and a polynomial of degree 3 produce very satisfactory results.
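The classification test of this section can be sketched as follows; the chromaticity axis, the fitted polynomial coefficients and the test color are assumed values used only to illustrate the projection and the threshold comparison, not parameters of the actual system.

// Sketch of the background test of Section 4: project a color onto the
// chromaticity axis and compare its distortion with the polynomial threshold.
#include <cmath>

struct Rgb { double r, g, b; };

int main() {
    // Chromaticity axis: mean background color E plus the principal
    // eigenvector u of the background color distribution (assumed values).
    Rgb E = { 60.0, 140.0, 70.0 };
    Rgb u = { 0.28, 0.87, 0.40 };                  // unit vector

    // Polynomial threshold dT(t) = c0 + c1 t + c2 t^2 + c3 t^3 fitted to the
    // per-slice thresholds dT_j = mean_j + 2.33 * stddev_j (assumed fit).
    double c[4] = { 6.0, 0.015, -4e-5, 3e-8 };

    Rgb Ck = { 90.0, 150.0, 95.0 };                // color being classified

    // Projection of Ck on the axis (parameter t) and chromaticity distortion d.
    Rgb v = { Ck.r - E.r, Ck.g - E.g, Ck.b - E.b };
    double t = v.r * u.r + v.g * u.g + v.b * u.b;
    Rgb p = { E.r + t * u.r, E.g + t * u.g, E.b + t * u.b };   // C'k
    double d = std::sqrt((Ck.r - p.r) * (Ck.r - p.r) +
                         (Ck.g - p.g) * (Ck.g - p.g) +
                         (Ck.b - p.b) * (Ck.b - p.b));

    double dT = c[0] + c[1] * t + c[2] * t * t + c[3] * t * t * t;
    bool foreground = d > dT;      // distortion above the threshold: not background
    (void)foreground;
    return 0;
}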
5 Identifying Almost Collinear Segments

To compute the image coordinates of the box vertices, we first need to obtain the supporting lines for the silhouette edges. We do this using a Hough Transform procedure [5]. However, the conventional voting process and the detection of the most significant lines had proven to be a bottleneck in our system. To reduce the amount of computation, an alternative to the conventional voting process was developed.

Although the silhouette pixels are organized into their most perceptually significant straight line segments, we do not know whether two or more of these segments are pieces of the same box edge. The new voting scheme consists in casting votes directly for the segments, instead of for individual pixels. Thus, for each perceptually significant segment, the parameters of the line are computed using the average position of the set of pixels represented by the segment and the eigenvector associated with the highest eigenvalue of the pixel distribution. The use of the eigenvector allows handling lines with arbitrary orientations in a consistent way.

We distribute the votes in the parameter space by means of a Gaussian kernel, with votes weighted by the segment length. The use of a Gaussian kernel distributes the votes around a neighborhood, allowing the identification of approximately collinear segments. This is a very important feature, allowing the system to better handle discretization errors and boxes with slightly bent edges. The size of the Gaussian kernel was experimentally defined as 11 × 11 pixels. Special care must be taken when the θ parameter is close to 0° or 180°. In this situation, the voting process continues in the diagonally opposing quadrant, at the −ρ position (see peaks labeled as d and p in Figure 5).

Figure 5. Hough Transform parameter space: conventional (top) and obtained with the new voting scheme (bottom). Labeled dots represent the peaks.

Using the new approach, the voting process and the peak detection are improved because the number of cells that receive votes is greatly reduced. Figure 5 shows the parameter space after the traditional (top) and the new (bottom) voting processes have been applied to the segments shown in Figure 2 (f). The parameter space was discretized using 360 angular values in the range θ = [0, 180] as well as 1600 ρ values in the range [−400, 400].
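The segment-based voting can be sketched as below. The Segment structure, the kernel spread and the example segment are assumptions; only the 360 × 1600 accumulator, the 11 × 11 kernel, the length weighting and the wrap-around at θ near 0°/180° follow the description above, and this is not the authors' implementation.

// Sketch of the segment-based voting of Section 5: each perceptually
// significant segment casts Gaussian-weighted votes (weighted by its length)
// around the (theta, rho) cell of the line fitted to it.
#include <cmath>
#include <vector>

struct Segment {
    double cx, cy;     // average position of the segment's pixels
    double dx, dy;     // principal direction (eigenvector of the largest eigenvalue)
    double length;     // segment length, used as the vote weight
};

int main() {
    const double PI = 3.141592653589793;
    const int NT = 360, NR = 1600;                       // theta x rho cells
    const double rhoMin = -400.0, rhoStep = 800.0 / NR, thetaStep = 180.0 / NT;
    const double sigma = 2.0;                            // kernel spread in cells (assumed)
    std::vector<double> acc(NT * NR, 0.0);

    std::vector<Segment> segments = { {120.0, 80.0, 0.94, 0.34, 75.0} };  // assumed input

    for (const Segment& s : segments) {
        // Line through the centroid with normal (-dy, dx): theta in [0, 180).
        double theta = std::atan2(s.dx, -s.dy) * 180.0 / PI;
        if (theta < 0.0) theta += 180.0;
        double rad = theta * PI / 180.0;
        double rho = s.cx * std::cos(rad) + s.cy * std::sin(rad);

        int t0 = static_cast<int>(theta / thetaStep);
        int r0 = static_cast<int>((rho - rhoMin) / rhoStep);
        for (int dt = -5; dt <= 5; ++dt)                 // 11 x 11 Gaussian kernel
            for (int dr = -5; dr <= 5; ++dr) {
                double w = s.length * std::exp(-(dt * dt + dr * dr) / (2.0 * sigma * sigma));
                int t = t0 + dt, r = r0 + dr;
                if (t < 0)   { t += NT; r = NR - 1 - r; }  // theta wrapped past 0 or 180:
                if (t >= NT) { t -= NT; r = NR - 1 - r; }  // keep voting at the -rho position
                if (r >= 0 && r < NR) acc[t * NR + r] += w;
            }
    }
    return 0;
}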
6 Finding the Laser Dots

The ability to find the proper positions of the laser dots in the image can be affected by several factors such as the camera's shutter speed, the box materials and textures, and the ambient illumination. Although we are using a red laser (650 nm, class II), we cannot rely simply on the red channel of the image to identify the positions of the dots. Such a procedure would not distinguish between the laser dots and red texture elements on the box. Since the pixels corresponding to the laser dots present very high luminance, we identify them by thresholding the luminance image. However, simple thresholding may not work for white boxes or boxes containing white regions, which tend to have large areas with saturated pixels. We solved this problem by setting the camera's shutter speed so that the laser dots are the only elements in the image with high luminance.

Since the image of a laser spot is composed of several pixels, we approximate the actual position of the dot by the centroid of its pixels. According to our experiments, a variation of one pixel in the coordinates of the estimated center of the laser spot produces a variation of a few millimeters in the computed dimensions. These numbers were obtained assuming a camera standing about two meters from the box.

Before the positions of the laser dots can be used for computing dimensions, one needs to identify the face onto which the dots fall. This is done after computing the position of the inner vertex (Section 3.3) and checking whether both dots fall inside one of the three quadrilaterals defined by the edges of the box (Figure 1, right).

The system may fail to properly detect the laser dots if they project onto some black region or if the surface contains specularities that lead to peaks in the luminance image. This, however, can be avoided by aiming the beams at other portions of the box. Due to the construction of the scanner prototype and some epipolar constraints [9], we only need to search for the laser dots inside a small window in the image.
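A simple sketch of the dot localization follows; the image layout, the search window and the threshold are illustrative assumptions, not the prototype's actual parameters.

// Sketch of Section 6: a laser dot is located as the centroid of the pixels
// whose luminance exceeds a threshold inside a small search window.
#include <cstdint>
#include <vector>

struct Dot { double x, y; bool found; };

// gray: 8-bit luminance image, width * height, row-major.
Dot findLaserDot(const std::vector<std::uint8_t>& gray, int width,
                 int x0, int y0, int x1, int y1, std::uint8_t threshold) {
    double sx = 0.0, sy = 0.0;
    int count = 0;
    for (int y = y0; y < y1; ++y)
        for (int x = x0; x < x1; ++x)
            if (gray[y * width + x] > threshold) {   // dot pixels have very high luminance
                sx += x; sy += y; ++count;
            }
    if (count == 0) return { 0.0, 0.0, false };
    return { sx / count, sy / count, true };         // centroid approximates the dot center
}

int main() {
    std::vector<std::uint8_t> gray(640 * 480, 10);
    gray[200 * 640 + 320] = 255;                     // fake bright spot for the example
    Dot d = findLaserDot(gray, 640, 300, 180, 360, 220, 240);
    (void)d;
    return 0;
}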
7 Results

We have built a prototype of a scanner for computing box dimensions and implemented the techniques described in the paper using C++. The system was tested on several real boxes. For a typical scene, such as the one shown in Figure 2, it can process the video and compute box dimensions at about 34 fps. If we replace the line-based voting scheme with the traditional pixel-based Hough Transform voting scheme (Figure 5, top), the rate drops to only 9 fps. This illustrates the effectiveness of the proposed voting solution. These measurements were made on a 1.91 GHz PC with 768 MB of memory. A video illustrating the use of our scanner can be found at http://www.inf.ufrgs.br/~laffernandes/boxdimensions.

Figure 1 (left) shows the scanner prototype, whose hardware is comprised of a FireWire color camera (Point Grey Research DragonFly with 640×480 pixels and sx = sy = 7.4 µm [19]), a 16 mm lens (Computar M1614, with manual focus, no iris and a 30.9-degree horizontal field of view) and two laser pointers. The camera is mounted on a plastic box and the laser pointers were aligned and glued to the sides of this box. In such an assembly, the laser beams are 15.8 cm apart. For our experiments, we set the camera's shutter to 0.01375 seconds and acquired pictures of boxes at distances varying from 1.7 to 3.0 meters from the camera. The background was created using a piece of green cloth and its statistical model was computed from a set of 23 images. Figure 6 shows some examples of boxes used to test our system. The boxes in the bottom row are particularly challenging: the one on the left is very bright and has a reflective plastic finish; the one on the right is mostly covered with red texture. The dimensions of these boxes vary from 13.9 to 48.3 cm. In our experiments, we assumed that the acquired images have no skew (i.e., γ = 0) and that the principal point is at the center of the image (i.e., ox = oy = 0). Due to the small field of view, we also assumed the images contain no significant radial distortion.

Figure 6. Examples of boxes used for testing.

The geometry of a real box is somewhat different from that of a parallelepiped because of imperfections introduced during the construction process and handling. For instance, bent edges, different sizes for two parallel edges of the same face, lack of parallelism between faces expected to be parallel, and warped corners are not unlikely to be found in practice. Such inconsistencies lead to errors in the orientation of the silhouette edges, which are cascaded into the computation of the box dimensions.

In order to estimate the inherent inaccuracies of the proposed algorithm, we implemented a simulator that performs the exact same computations, but on images generated with a pinhole camera model using computer graphics techniques. In this case, the boxes are exact parallelepipeds. The positions of the laser dots on a box face are determined by intersecting two parallel rays with one box face. As in the case of the physical device, the camera can move freely in the 3D scene. Using images generated by the simulator, our system can recover the dimensions of the box with an average relative error of 1.07%. Next, we analyze some of the results obtained on real boxes.

7.1 Statistical Analysis on Real Boxes

In order to evaluate the proposed approach, we carried out a few statistical experiments. Due to space limitations, we describe only one of these experiments in detail and present some data about the other results. We selected a well-constructed box (shown in Figure 6, top right) and manually measured the dimensions of all its edges with a ruler. Each edge was measured twice, once per shared face of the edge. The eight measurements of each box dimension were averaged to produce a single value per dimension. All measurements were made in centimeters. The average values for this box are 29.45 cm, 22.56 cm and 15.61 cm, respectively. We then used our system to collect a total of 30 measurements of each dimension of the same box. For each collected sample, we projected the laser beams on different parts of the box. We used this data to compute the mean, standard deviation and confidence intervals for each of the computed dimensions. The confidence intervals were computed as CI = [x̄ − tγ σ/√n, x̄ + tγ σ/√n], where x̄ is the mean, σ is the standard deviation, n is the size of the sample and tγ is a t-Student variable with n − 1 degrees of freedom, such that the probability of a measurement x belonging to CI is γ. The tighter the CI, the more precise the computed values.

Table 1 shows the computed confidence intervals for values of γ = 80%, 90%, 95% and 99%. Note that the values of the actual dimensions fall inside these confidence intervals, indicating accurate measurements.

Table 1. Confidence intervals for the measurements for the box in Figure 6 (top right)

CI(γ)       dim 1            dim 2            dim 3
CI(80%)     [27.99, 29.60]   [21.83, 22.60]   [15.12, 15.82]
CI(90%)     [27.76, 29.83]   [21.72, 22.72]   [15.02, 15.92]
CI(95%)     [27.55, 30.03]   [21.62, 22.81]   [14.93, 16.01]
CI(99%)     [27.39, 30.20]   [21.54, 22.89]   [14.86, 16.08]
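The interval computation itself is a routine application of the formula above; the sketch below shows it for one dimension. The 30 sample values are placeholders (only their count and the formula follow the text), and the quoted t value is the standard two-sided Student-t quantile for 29 degrees of freedom and γ = 95%.

// Sketch of the confidence-interval computation of Section 7.1 for one box
// dimension. Sample values are placeholders, not the paper's data.
#include <cmath>
#include <cstdio>
#include <vector>

int main() {
    std::vector<double> samples(30, 29.4);           // placeholder measurements (cm)
    const double t95 = 2.045;                        // t-Student, n - 1 = 29 d.o.f., gamma = 95%

    double n = static_cast<double>(samples.size());
    double mean = 0.0;
    for (double s : samples) mean += s;
    mean /= n;

    double var = 0.0;
    for (double s : samples) var += (s - mean) * (s - mean);
    double sigma = std::sqrt(var / (n - 1.0));       // sample standard deviation

    double half = t95 * sigma / std::sqrt(n);        // t_gamma * sigma / sqrt(n)
    std::printf("CI(95%%) = [%.2f, %.2f]\n", mean - half, mean + half);
    return 0;
}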
Another estimate of the error can be expressed as the relative error ε = σ/x̄. Table 2 shows data about the dimensions of five other boxes and the relative errors in the measurements obtained with our scanner prototype. The errors were computed with respect to the average of the computed dimensions over a set of 20 measurements. In these experiments, the operator tried to keep the device still while the samples were collected. The relative errors varied from 0.20% up to 11.20%, which is in accordance with the accuracy predicted by the experiment summarized in Table 1.

Table 2. Relative errors for five real boxes.

Real Size (cm)                |  n  | Relative Error (%)
dim 1   dim 2   dim 3         |     | dim 1   dim 2   dim 3
35.5    32.0    26.9          | 68  | 1.44    2.68    1.40
28.6    24.1    16.8          | 61  | 6.81    0.80    11.20
36.0    25.2    13.8          | 50  | 5.19    10.15   9.43
29.9    29.1    22.9          | 69  | 0.20    5.47    10.22
48.2    45.5    28.3          | 20  | 3.98    4.55    0.91

8 Conclusions and Future Work

We have presented a completely automatic approach for computing the dimensions of boxes from single perspective projection images in real time. The approach uses information extracted from the silhouette of the target box and removes the projective ambiguity with the use of two parallel laser beams. We demonstrated the effectiveness of the proposed techniques by building a prototype of a scanner and using it to compute the dimensions of several real boxes, even when the edges of the target box are partially occluded by other objects and under different lighting conditions. We also presented a statistical discussion of the measurements made with our scanner prototype.

We have also introduced an algorithm for extracting box silhouettes in the presence of partially occluded edges, an efficient voting scheme for grouping approximately collinear segments using a Hough Transform, and a statistical model for the background that works with a moving camera. Our algorithm for computing box dimensions can still be used in applications requiring heterogeneous backgrounds. In these situations, background detection can be performed using a technique like the one described in [10]. In this case, the camera should remain static while the boxes are moved on some conveyor belt.

We believe that these ideas may lead to optimizations of several procedures that are currently based on manual measurements of box dimensions. We are currently exploring ways of using arbitrary backgrounds, the use of a single laser beam, and analyzing the error propagation through the various stages of the algorithm.

References

[1] P. Besl. Active optical range imaging sensors. In J. Sanz, editor, Advances in Machine Vision, pages 1–63. Springer-Verlag, 1988.
[2] J. Canny. A computational approach to edge detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 8(6):679–698, November 1986.
[3] M. B. Clowes. On seeing things. Artificial Intelligence, 2:79–116, 1971.
[4] A. Criminisi, I. Reid, and A. Zisserman. Single view metrology. In Proceedings of the 7th IEEE International Conference on Computer Vision (ICCV-99), volume I, pages 434–441, Kerkyra, Greece, September 20–27, 1999. IEEE.
[5] R. O. Duda and P. E. Hart. Use of the Hough transformation to detect lines and curves in pictures. Communications of the ACM, 15(1), 1972.
[6] F. Figueroa and A. Mahajan. A robust method to determine the coordinates of a wave source for 3-D position sensing. ASME Journal of Dynamic Systems, Measurements and Control, 116:505–511, September 1994.
[7] H. Fuchs et al. On visible surface generation by a priori tree structures. In Proc. of SIGGRAPH 1980, pages 124–133.
[8] J. Gauch. KUIM, image processing system. http://www.ittc.ku.edu/~jgauch/research, 2003.
[9] R. I. Hartley and A. Zisserman. Multiple View Geometry in Computer Vision. Cambridge University Press, 2000.
[10] T. Horprasert, D. Harwood, and L. S. Davis. A statistical approach for real-time robust background subtraction and shadow detection. In Proc. of the 7th IEEE ICCV-99, FRAME-RATE Workshop, 1999.
[11] D. A. Huffman. Impossible objects as nonsense sentences. In Machine Intelligence 6, pages 295–324. Edinburgh University Press, 1971.
[12] A. Laurentini. The visual hull concept for silhouette-based image understanding. IEEE Trans. on PAMI, 16(2):150–162, February 1994.
[13] M. Levoy et al. The digital Michelangelo project: 3D scanning of large statues. In Proc. of SIGGRAPH 2000, pages 131–144, 2000.
[14] H. C. Longuet-Higgins. A computer algorithm for reconstructing a scene from two projections. Nature, 293:133–135, September 1981.
[15] D. G. Lowe. Three-dimensional object recognition from single two-dimensional images. Artificial Intelligence, 31:355–395, March 1987.
[16] K. Lu. Box dimension finding from a single gray-scale image. Master's thesis, SUNY Stony Brook, New York, 2000.
[17] W. Matusik, C. Buehler, R. Raskar, S. J. Gortler, and L. McMillan. Image-based visual hulls. In Proc. of SIGGRAPH 2000, pages 369–374, 2000.
[18] L. Nyland et al. The impact of dense range data on computer graphics. In Proc. of Multi-View Modeling and Analysis Workshop (MVIEW99), part of IEEE CVPR99, 1999.
[19] Point Grey Research Inc. Dragonfly IEEE-1394 Digital Camera System, 2.0.1.10 edition, 2002.
[20] P. Vlahos. Composite color photography. U.S. Patent 3,158,477, 1964.
[21] D. L. Waltz. Generating semantic descriptions from drawings of scenes with shadows. Technical Report MAC-TR-271, Cambridge, MA, USA, 1972.
