
Shape Models and Object Recognition

Jean Ponce, Martha Cepeda, Sung-il Pae, and Steve Sullivan

Dept. of Computer Science and Beckman Institute


University of Illinois, Urbana, IL 61801, USA

1 Introduction
This paper discusses some problems that should be addressed by future object
recognition systems.
In particular, there are things that we know how to do today, for example:
1. Computing the pose of a free-form three-dimensional object from its outline
(e.g. [106]).
2. Identifying a polyhedral object from point and line features found in an
image (e.g., [46, 89]).
3. Recognizing a solid of revolution from its outline (e.g., [59]).
4. Identifying a face with a fixed pose in a photograph (e.g., [10, 111]).
There are, however, things that we do not know how to do today, for example:
1. Assembling local evidence into global image descriptions (grouping) and using those to separate objects of interest from the background (segmentation).
2. Recognizing objects at the category level: instead of simply identifying Barney in a photograph, recognize that he is a dinosaur.
This is of course a bit of an exaggeration: there is a rich body of work on
grouping and segmentation, ranging from classical models of these processes in
human vision (e.g., [65, 116]) to the ever growing number of computer vision
approaches to edge detection and linking, region merging and splitting, etc.
(see any recent textbook, e.g., [38, 73] for surveys). Likewise, almost twenty
years ago, ACRONYM recognized parameterized models of planes in overhead
images of airports [17], and the recent system described in [33] can retrieve
pictures that contain horses from a large image database. Still, segmentation
algorithms capable of supporting reliable recognition in the presence of clutter
are not available today, and there is no consensus as to what constitutes a good
representation/recognition scheme for object categories.
This paper examines some of these issues, concentrating on the role of shape representation in recognition. We first illustrate some of the capabilities of current approaches, then lament their limitations, and finally discuss current work aimed at overcoming (or at least better understanding) some of these limitations.

This work was partially supported by the National Science Foundation under grant
IRI-9634312 and by the Beckman Institute at the University of Illinois at Urbana-
Champaign. M. Cepeda is now with Qualcomm, Inc. and S. Sullivan is now with
Industrial Light and Magic.

D.A. Forsyth et al. (Eds.): Shape, Contour ..., LNCS 1681, pp. 31–57, 1999.

© Springer-Verlag Berlin Heidelberg 1999

2 The State of the Art and Its Limitations

Let us start with an example drawn from our own work to illustrate the capabilities and limitations of today's recognition technology. While it can certainly be argued that more powerful approaches already exist (and we will indeed discuss alternatives in a little while), this will help us articulate some of the issues mentioned in the introduction.

2.1 An Example of What Can Be Done Today

Here we demonstrate that the pose of a free-form surface can be reliably estimated from its outline in a photograph. Two obvious challenges in this task are (1) constructing a model of the free-form surface that is appropriate for pose estimation and (2) computing the six pose parameters of the object despite the absence of any three-dimensional information.
We have developed a method for constructing polynomial spline models of solid shapes with unknown topology from the silhouette information contained in a few registered photographs [106]. Our approach does not require special-purpose hardware. Instead, the modeled object is set in front of a calibration chart and photographed from various viewpoints. The pictures are registered using classical calibration methods [110], and the intersection of the visual cones associated with the photographs [9] is used to construct a G1-continuous triangular spline [22, 30, 60, 100] that captures the topology and rough shape of the modeled object. This approximation is then refined by deforming the spline to minimize the true distance to the rays bounding the visual cones [108]. Figure 1(a)-(b) illustrates this process with an example.
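The "minimize the distance to the rays" step has a simple geometric core. As an illustrative sketch (not the authors' spline machinery, whose control points and function names are not given here), the snippet below fits a single 3D point to a bundle of visual rays by least squares; deforming spline control points against the visual-cone rays generalizes the same idea.

```python
import numpy as np

def point_to_ray_distance(p, origin, direction):
    """Distance from point p to the ray origin + t * direction."""
    d = direction / np.linalg.norm(direction)
    v = p - origin
    # remove the component of v along the ray; what remains is orthogonal
    return np.linalg.norm(v - np.dot(v, d) * d)

def fit_point_to_rays(origins, directions):
    """Least-squares point minimizing the summed squared distance to a
    set of rays: solve (sum_i P_i) p = sum_i P_i o_i, where P_i is the
    projector onto the plane orthogonal to ray direction d_i."""
    A = np.zeros((3, 3))
    b = np.zeros(3)
    for o, d in zip(origins, directions):
        d = d / np.linalg.norm(d)
        P = np.eye(3) - np.outer(d, d)   # projector orthogonal to d
        A += P
        b += P @ o
    return np.linalg.solve(A, b)
```

With exact, intersecting rays the recovered point is their common intersection; with noisy rays it is the least-squares compromise, the same criterion that drives the spline refinement against the visual cones.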
The same optimization procedure allows us to estimate the pose of a modeled object from its silhouette extracted from a single image. This time, the shape parameters are held constant, while the camera position and orientation are modified until the average distance between the visual rays associated with the image silhouette and the spline model is minimized. In fact, the residual distance at convergence can be used to discriminate between competing object models in recognition tasks. Figure 1(c)-(d) shows some examples. But...

2.2 Is This Really Recognition?

Of course not: we have relied on an oracle to tell us which pieces of contours belong to which object, since the spline representation does not provide any support for top-down grouping (as shown by Fig. 1(d), occlusion is not the problem). Although it is possible that some bottom-up process (e.g., edge detection and linking, maybe followed by the elimination of small gaps and short contour pieces and the detection of corners and T-junctions) would yield appropriate contour segments, this is not very likely in the presence of textured surfaces and background clutter. Indeed, the contours used as input to the pose estimation algorithm in Fig. 1 were selected manually [107].
