Rodney A. Brooks
AI Laboratory, Stanford University, Palo Alto, CA 94305, U.S.A.
ABSTRACT
We describe model-based vision systems in terms of four components: models, prediction of image
features, description of image features, and interpretation which relates image features to models. We
describe details of modelling, prediction and interpretation in an implemented model-based vision
system. Both generic object classes and specific objects are represented by volume models which are
independent of viewpoint. We model complex real world object classes. Variations of size, structure
and spatial relations within object classes can be modelled. New spatial reasoning techniques are
described which are useful both for prediction within a vision system, and for planning within a
manipulation system. We introduce new approaches to prediction and interpretation based on the
propagation of symbolic constraints. Predictions are two-pronged. First, prediction graphs provide a
coarse filter for hypothesizing matches of objects to image features. Second, they contain instructions on
how to use measurements of image features to deduce three dimensional information about tentative
object interpretations. Interpretation proceeds by merging local hypothesized matches, subject to
consistent derived implications about the size, structure and spatial configuration of the hypothesized
objects. Prediction, description and interpretation proceed concurrently from coarse object subpart and
class interpretations of images, to fine distinctions among object subclasses and more precise three
dimensional quantification of objects. We distinguish our implementations from the fundamental
geometric operations required by our general image understanding scheme. We suggest directions for
future research for improved algorithms and representations.
1. Introduction
We present both a general philosophy of model-based vision and a specific
implementation of many of those ideas in the ACRONYM model-based vision
system. An earlier version of ACRONYM was described in [18]. Here we describe
a new version of ACRONYM which is almost a completely new implementation. It
includes new methods for modelling generic classes of objects, new techniques
for geometric reasoning, and a method for using noisy measurements from
images to gain three dimensional understandings about objects.
ACRONYM is a domain independent model-based vision system. The user
describes to ACRONYM classes of three dimensional objects, and their relation-
ships in the world. The system tries to interpret images by locating instances of
modelled objects. The same models may be used for other purposes, such as
planning manipulator assemblies.
1.2. An overview of ACRONYM
¹We have not yet tried to incorporate model acquisition from images. Techniques of seg-
mentation and description were developed by Nevatia and Binford [39] to build tree structured
generalized cone models of objects detected using a laser range finder. Winston [50] has shown how
to infer object classes over variations in both size and structure from examples and non-examples
of objects. Together these techniques seem to provide a strong basis for future work on teaching
object class descriptions to ACRONYM, by showing it examples whose component parts it would first
instantiate to specializations of a library of qualitatively different generalized cone models,
including both single cones and joined cones.
The ACRONYM system has been used for a number of tasks other than image
understanding. D. Michael Overmeyer implemented a set of rules in the rule
language used for the predictor and interpreter, useful for planning manipula-
tor tasks. The system, GRASP [10], was given ACRONYM models of simple objects,
from which it automatically deduced positions and orientations which could be
grasped by a manipulator arm, and which would provide a firm stable grip on
the object. Soroka [44] has built SIMULATOR on top of ACRONYM. SIMULATOR is a
system for off-line debugging of manipulator programs. It uses the ACRONYM
modelling system to model manipulator arms and their environment. The
graphics system is used to provide stereo pairs of images of the scenes, so that
the user perceives a three dimensional model. Currently the system can be
driven by the output of AL [22], which is normally used to drive manipulator
arms directly. Instead SIMULATOR drives models of manipulators in real time, by
specializing the spatial relations between manipulator links. Work is underway
to extend SIMULATOR to interface to other manipulator languages.
2. Model Representation
The world is described to ACRONYM as volume elements and their spatial
relationships, and as classes of objects and their subclass relationships.
A single simple mechanism is used within the geometric models to represent
variations in size, structure, and spatial relationships. Sets of constraints on
such variations specify classes of three dimensional objects. Adding constraints
specializes classes to subclasses and eventually to specific instances.
The model representation scheme used in a vision system must be able to
represent the classes of objects which the system is required to recognize.
When the representation is in world terms rather than image terms, it is
necessary that observables be computable from the representation.
Previous model-based vision systems have not made a distinction between
models of objects in world terms and models of objects in terms of directly
observable image features. The models themselves have been descriptions of
observable two dimensional image features and relations among them. MSYS [6]
models objects as usually homogeneous image regions. ISIS [21] includes
brightness, hue and saturation of image regions in its object models, which are
constrained to meet viewpoint-dependent spatial relations. Ohta et al. [40] also
model objects as image regions, but they include shape descriptions in two
dimensions. Again viewpoint-dependent spatial relations are used.
For the general vision problem, where exact contexts are unknown and often
even approximate orientations are unknown, viewpoint-dependent image
models require multiple models or descriptions of a given object or
object class. Instead, viewpoint-independent models should be given to the
system. The resolution of the problem of multiple appearances from multiple
viewpoints then becomes the responsibility of the vision system itself. For a
model to be completely viewpoint-independent yet still provide shape in-
formation, it must embody the three dimensional structure of the object being
modelled. Volume descriptions are useful for other applications too. Planning
how to manipulate objects while avoiding collisions requires volume descrip-
tions (e.g. [31, 44]). Objects can be recognized from range data, given volume
descriptions (e.g. [39, 47]). For individual applications additional information
might be included; e.g. surface properties for image understanding and density
of subparts for manipulation planning. Volume descriptions provide a common
representational basis for various distinct but possibly interacting processes,
each of which needs models of the world.
Consider the situation where the vision system is one component of a much
larger system which deals with models or representations of objects which will
appear in the images to be examined. For example in a highly automated
production system we might wish to use the CAD (computer aided design)
model of some industrial part as the only description necessary for a vision
system. It would be able to recognize, locate and orient instances of the part
when they later appear on a conveyor belt leading to a coordinated vision and
manipulation assembly station, with no further description than the CAD
model. It should not be necessary to have a human in the control path, whose
task is to understand the CAD model and then to translate it into a description
of observable features for the vision system. CAD systems for industrial parts
deal in models which are viewpoint independent and which embody a three
dimensional description of the volume occupied by the part (e.g. both the PADL
system [45] and that of Braid [16] meet these requirements; see also the survey
[4]). The representation scheme should also facilitate automatic computation of
observable features from models. Lieberman's system [28] provides for
automatic computation of silhouettes of objects as they will appear in
binary images. In general, more comprehensive descriptions of observable
features provide for robust vision in situations which are not completely con-
trolled.
ACRONYM is by no means the first model-based vision system to use volu-
metric models. Baumgart [8] and Lieberman [28] both used polyhedral
representations of objects. Nevatia and Binford [39] used generalized cones.
However ACRONYM goes beyond these systems. It has the capability to
represent generic classes of objects as well as individual specific objects, and
situations which are only partially specified and constrained, as well as specific
situations.
We do not claim that ACRONYM's class mechanism is adequate for all image
interpretation tasks. In fact some of the examples below may seem to have
been carried out successfully in spite of the representation mechanism. Other
vision and modelling systems, however, do not have even that capability.
The following description of our model representation centers around the
types of things which must be represented about objects, for a variety of image
interpretation tasks. We first describe a volumetric representation for objects.
A method for describing variations in such models by describing allowed
variations in place holders for object parameters is given. This method allows
representation of variations in size, structure, position and orientation of
objects. A class mechanism, based on specialization of variations, is built
orthogonally to the volumetric representations.
FIG. 2.1. A selection of generalized cones used by ACRONYM as primitive volume elements.
Generalized cones have gained popularity as a representation
language for descriptive processes working from range data [2, 39, 43] and for
modelling systems for vision [25, 32, 34, 37].
Generalized cones [9] provide a compact, viewpoint-independent represen-
tation of volume elements. A generalized cone is defined by a planar cross
section, a space curve spine, and a sweeping rule. It represents the volume
swept out by the cross section as it is translated along the spine, held at some
constant angle to the spine, and transformed according to the sweeping rule.
Each generalized cone has its own local coordinate system. We use a right
handed system such that the initial end of the spine is at the origin, the initial
cross section lies in the y-z plane, and the x component of the directional
tangent to the spine at the origin is positive. Thus for cones where the cross
section is normal to a straight spine, the spine lies along the positive x-axis.
Fig. 2.1 gives examples of generalized cones used as the primitive volume
elements in ACRONYM's representation. They include straight and circular
spines, circles and simple polygons for cross sections, and sweeping rules which
can be constant, linear contractions, or more generally contractions linear in
two orthogonal directions. Cross sections may be held at any constant angle to
noncircular spines.
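As a concrete illustration of the sweeping semantics just described, the following Python sketch (ours, not ACRONYM code; the function name and the linear contraction parameter are assumptions) produces a point on the surface swept by a circular cross section moved along a straight spine.

# A minimal sketch of the sweeping semantics described above (assumed names;
# not ACRONYM code): a straight spine along +x, a circular cross section in
# the y-z plane, and a sweeping rule that contracts the cross section linearly.
import math

def swept_surface_point(length, radius, contraction, s, theta):
    """Point on the swept surface at fraction s in [0, 1] along the spine and
    angle theta around the circular cross section."""
    scale = 1.0 - contraction * s          # linear sweeping rule
    x = s * length                         # spine lies along the positive x-axis
    y = scale * radius * math.cos(theta)   # cross section lies in the y-z plane
    z = scale * radius * math.sin(theta)
    return (x, y, z)

# contraction = 0 gives a right circular cylinder; contraction = 1 gives a
# cone tapering to a point at the far end of the spine.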
The internal representation of all ACRONYM data structures is frame-like in
that each data object is an instance of a unit. Units have a set of associated slots
whose fillers define their values [14]. Fig. 2.2 shows the unit representation of a
generalized cone representing the body of a particular electric motor. Its cross
section, spine and sweeping rule units are also shown. It is a simple right
Node: ELECTRIC_MOTOR_CONE
 CLASS: SIMPLE_CONE
 SPINE: Z0014
 SWEEPING_RULE: CONSTANT_SWEEPING_RULE
 CROSS_SECTION: Z0013

Node: Z0014
 CLASS: SPINE
 TYPE: STRAIGHT
 LENGTH: 8.0

Node: CONSTANT_SWEEPING_RULE
 CLASS: SWEEPING_RULE
 TYPE: CONSTANT

Node: Z0013
 CLASS: CROSS_SECTION
 TYPE: CIRCLE
 RADIUS: 2.5
circular cylinder of length 8.0 and radius 2.5 (our system currently does not
enforce any particular units of measurement).
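The unit-and-slot structure shown above can be mirrored roughly as follows; this is a hedged Python sketch of the idea, not ACRONYM's actual unit package, and the field names simply echo the slots in the figure.

# Sketch of the frame-like unit representation shown above (assumed field
# names mirroring the slots; not the actual unit package).
from dataclasses import dataclass
from typing import Union

Filler = Union[float, str]   # a slot holds a number or a quantifier name

@dataclass
class Spine:
    type: str
    length: Filler

@dataclass
class SweepingRule:
    type: str

@dataclass
class CrossSection:
    type: str
    radius: Filler

@dataclass
class SimpleCone:
    spine: Spine
    sweeping_rule: SweepingRule
    cross_section: CrossSection

electric_motor_cone = SimpleCone(
    spine=Spine(type="STRAIGHT", length=8.0),
    sweeping_rule=SweepingRule(type="CONSTANT"),
    cross_section=CrossSection(type="CIRCLE", radius=2.5))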
ACRONYM's volumetric representation is built around units of class object (a
unit's class is given by its class slot; this corresponds roughly to the self slot of
KRL units [14]). Objects are the nodes of the object graph. The arcs are units of
class subpart and class affixment. Objects have slots for an optional cone-descriptor
(which is filled with a pointer to a unit representing a generalized cone),
subparts and affixments which are filled with a list of pointers to instances of
the appropriate classes of units, and a few more which we will not discuss here.
Subpart and affixment arcs are directional, pointing from the object whose unit
references them to the object referenced in their object slot.
The object graph has two natural subgraphs defined by the two classes of
directional arcs. Connected components of the subpart subgraph are required
to be trees. It is intended that each such tree be arranged in a coarse to fine
hierarchy. Cutting the tree off at different depths gives models with different
levels of detail. For example the subpart tree for the electric motors illustrated
in Fig. 2.4 has a root-node whose cone descriptor is the large cylindrical body
of the motor. At the next lower level of the tree are the smaller flanges and
spindle. The coarse to fine representation has obvious utility in image under-
standing tasks. Unless ACRONYM has already hypothesized an interpretation of
some image features as an instance of an object with its own generalized cone
descriptor, it does not search for subparts of the object in the image.
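A Python sketch of the object graph idea (assumed structure and names, not ACRONYM's unit definitions): truncating the subpart tree at increasing depths yields models at increasing levels of detail.

# Sketch (assumed structure) of object-graph nodes and coarse-to-fine use of
# the subpart tree: cutting the tree at a given depth gives a coarser model.
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class ObjectNode:
    name: str
    cone_descriptor: Optional[object] = None     # e.g. a SimpleCone
    subparts: List["ObjectNode"] = field(default_factory=list)
    affixments: List["ObjectNode"] = field(default_factory=list)

def parts_at_level(obj: ObjectNode, depth: int) -> List[str]:
    """Names of subparts included when the subpart tree is cut at `depth`."""
    if depth == 0:
        return [obj.name]
    return [obj.name] + [name for sub in obj.subparts
                         for name in parts_at_level(sub, depth - 1)]

motor = ObjectNode("motor-body",
                   subparts=[ObjectNode("flange"), ObjectNode("spindle")])
assert parts_at_level(motor, 0) == ["motor-body"]
assert parts_at_level(motor, 1) == ["motor-body", "flange", "spindle"]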
Currently the user inputs the subpart trees directly; there is no enforcement
of coarse to fine levels of representation. It is certainly within the capabilities of
ACRONYM'S geometric reasoning system (see Section 4) to detect when the
condition is violated. It is eminently reasonable that in such cases the system
should build its own internal coarse to fine structure, while maintaining the
user's hierarchical decomposition for future interaction. We have not diverted
effort to this.
Node: Z0014
 CLASS: SPINE
 TYPE: STRAIGHT
 LENGTH: MOTOR-LENGTH

Node: CONSTANT_SWEEPING_RULE
 CLASS: SWEEPING_RULE
 TYPE: CONSTANT

Node: Z0013
 CLASS: CROSS_SECTION
 TYPE: CIRCLE
 RADIUS: MOTOR-RADIUS
FIG. 2.4. Three specializations of the generic class of small electric motors.
Notice however that ACRONYM models need not be restricted to use only such
simple plus-minus tolerances as are models in the PADL system. Tolerances can be
specified using arbitrary algebraic expressions.
2.2.2. Variations in structure
The fact that ACRONYM's subpart and affixment arcs are units with quantity slots
allows a limited form of structural variation to be included in model classes.
Filling the quantity slot of a subpart arc with 1 or 0 can be used to indicate the
presence or absence of a subpart. The slot can alternately be filled with a
quantifier, constrained to be 1 or 0, to model the possibility that the subpart
may or may not be present. Similarly a variable number of identical subparts of
an object can be indicated; e.g. the number of flanges on the electric motors in
Fig. 2.4 or the number of engines on an aircraft wing.
Fig. 2.4 shows the generic model of an electric motor under three different
sets of constraints which each fully determine values for the quantifiers
BASE-QUANTITY and FLANGE-QUANTITY which fill the obvious quantity
slots.
Given such a mechanism for representing structural variations, we must
consider what class of structure varying models we can describe. Suppose we
wish to specify that an electric motor has either a base or flanges but not both.
Furthermore if there are flanges, then there are between 3 and 6 of them. This
could be expressed with the following constraint.
((3 ≤ FLANGE-QUANTITY ≤ 6) ∧ (0 = BASE-QUANTITY))
 ∨ ((0 = FLANGE-QUANTITY) ∧ (1 = BASE-QUANTITY)).
Such a constraint is beyond the currently implemented capabilities of
ACRONYM. Constraints must be algebraic inequalities, with an implicit con-
junction over sets of such constraints. The explicit inclusion of logical dis-
junction requires a more comprehensive reasoning system for prediction and
interpretation than our current system (see Section 5).
Since our algebraic constraints can be nonlinear it is possible to represent
many disjunctions without overtaxing our theorem prover. In fact the above
constraint is equivalent to the following set of linear constraints:
0 ≤ BASE-QUANTITY ≤ 1,
0 ≤ FLANGE-QUANTITY ≤ 6,
FLANGE-QUANTITY + 6 × BASE-QUANTITY ≤ 6,
3 ≤ FLANGE-QUANTITY + 3 × BASE-QUANTITY.
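The equivalence is easy to confirm by enumerating the admissible integer quantity values; the following quick check is ours, not part of ACRONYM.

# Verify (over integer quantity values) that the four linear inequalities admit
# exactly the same (FLANGE-QUANTITY, BASE-QUANTITY) pairs as the disjunction.
def disjunction(flange, base):
    return (3 <= flange <= 6 and base == 0) or (flange == 0 and base == 1)

def linear(flange, base):
    return (0 <= base <= 1 and 0 <= flange <= 6
            and flange + 6 * base <= 6
            and 3 <= flange + 3 * base)

assert all(disjunction(f, b) == linear(f, b)
           for f in range(0, 7) for b in range(0, 2))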
In an ideal situation the modelling language should provide easy and natural
means for the user to specify objects and classes in as much detail as is wished.
The system should then sift out just enough detail of constraint for its own
purposes. We have not tackled these problems.
2.2.3. Variations in spatial relationships
An affixment specifies the spatial relationship between two objects by provi-
ding a product of coordinate transforms which relate the local coordinate
systems of the objects. Each coordinate transform consists of a rotation and a
translation vector. The slots in the units representing these can naturally be
filled with quantifiers or even expressions on quantifiers. Thus variable spatial
relationships can be represented.
Suppose that members of the class of electric motors with bases are going to
be placed at a work station, upright but with arbitrary orientation about the
vertical, and at a constrained but inexact position. The coordinate system of the
motor has its x-axis running along the center of the spindle, and its z-axis
vertical. The work station coordinates have a vertical Z-axis also. The position
and orientation of the motor relative to the work station could then be
represented by the transform
((ẑ, ORI), (X-POS, Y-POS, BASE-THICKNESS + MOTOR-RADIUS))

where ẑ as usual denotes a unit vector in the z direction. Typically X-POS and
Y-POS might be constrained by

0 ≤ X-POS ≤ 24,
18 ≤ Y-POS ≤ 42,

and ORI would be left free. The geometric reasoning system described in
Section 3 manipulates such underconstrained transforms.
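Such an underconstrained transform might be sketched as follows (a Python illustration with assumed names and structure; ACRONYM's actual units differ).

# Sketch of an affixment whose slots hold quantifiers or expressions rather
# than numbers (assumed names; not ACRONYM's unit syntax).
from dataclasses import dataclass
from typing import Tuple, Union

Slot = Union[float, str]                 # a number, a quantifier, or an expression

@dataclass
class Transform:
    axis: Tuple[float, float, float]     # rotation axis
    magnitude: Slot                      # rotation magnitude
    translation: Tuple[Slot, Slot, Slot]

motor_at_work_station = Transform(
    axis=(0.0, 0.0, 1.0),                # rotation about the vertical z-axis
    magnitude="ORI",                     # orientation left free
    translation=("X-POS", "Y-POS", "BASE-THICKNESS + MOTOR-RADIUS"))

constraints = ["0 <= X-POS <= 24", "18 <= Y-POS <= 42"]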
There is an inadequacy in such specifications of spatial relations. It is
possible to represent that aircraft can be found on runways or on taxiways, for
instance, by affixing the generic model of an aircraft to both, using similar
coordinate transforms to those described above. It is not possible however to
specify that the one and only motor cover which will appear in an image will be
located on either the left or right parts feeder. The only way within ACRONYM'S
representational mechanism to allow for such a possibility is to express some
connected area between both parts feeders where it might be found. Such
inexactness might lead to much greater searches in locating the motor cover in
an image, given that the feeders have already been located. Again, the
reasoning systems described in the rest of this paper need major additions to
handle a more concise specification language. Furthermore the current inter-
pretation algorithm treats affixments as defining a necessary condition on
where objects are located. A more flexible scheme would allow the user to give
FIG. 2.5. Variable affixments are used to model articulated objects such as this piston assembly.
advice to look first in one location for a particular object and then in another
higher cost location if that fails.
Variable affixments can also be used to model articulated objects. Fig. 2.5
shows two views of a piston model with different values assigned to a quantifier
filling the rotation magnitude slot of the coordinate transform between the
piston and the con-rod. Constraints on the quantifier express the range of
travel of the con-rod.
T h e representation of articulated objects may be important if manipulator
arms are present in images, and it is desired to visually calibrate or servo them.
Soroka's [44] simulator is based on these representations.
Variable camera geometry can also be represented by filling the slots of the
transforms affixing the camera to world coordinates with quantifiers. Section 4
gives two examples of variable camera geometries. If the characteristics of the
imaging camera are not known exactly, the focal-ratio slot can be filled with a
quantifier rather than a number. Any image interpretation will provide in-
formation which can be used to constrain this quantifier (see Section 5 for an
example of how this comes about).
constraints. That set may be empty. We call this set the satisfying set of the
restriction node. Set inclusion on the satisfying sets provides a natural partial
order on restriction nodes, defining a distributive lattice on them. The lattice
meet operation (∧) is used during image interpretation (see Section 5). Arcs of
the restriction graph must be directed from a less restrictive node (a larger
satisfying set) to a more restrictive node (a smaller satisfying set). Restriction-
nodes keep track of the arc relations in which they participate via suprema and
infima slots which are filled with lists of sources and destinations of incoming
and outgoing arcs respectively. It is permissible that comparable restriction
nodes do not have an explicit arc indicating that fact. In fact the restriction
graph is just that part of the restriction lattice which has been computed.
A restriction graph always includes a base-restriction node, which has an
empty set of constraints, and is thus the least restrictive node in the graph.
Every other node in the graph must be an explicitly indicated infimum of
another restriction node.
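A Python sketch of restriction nodes and the meet operation (assumed structure; the constraints themselves are kept abstract): adding constraints shrinks the satisfying set, so the meet of two nodes carries the union of their constraint sets.

# Sketch (assumed structure) of restriction nodes: each node carries a set of
# constraints; the meet of two nodes has the union of their constraint sets,
# hence the intersection of their satisfying sets.
from dataclasses import dataclass, field
from typing import FrozenSet, List

@dataclass
class RestrictionNode:
    constraints: FrozenSet[str] = frozenset()
    suprema: List["RestrictionNode"] = field(default_factory=list)   # less restrictive
    infima: List["RestrictionNode"] = field(default_factory=list)    # more restrictive

BASE_RESTRICTION = RestrictionNode()     # empty constraint set, least restrictive

def meet(a: RestrictionNode, b: RestrictionNode) -> RestrictionNode:
    node = RestrictionNode(constraints=a.constraints | b.constraints,
                           suprema=[a, b])
    a.infima.append(node)
    b.infima.append(node)
    return node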
The user specifies part of the restriction graph to the system. Other parts are
added by ACRONYM while carrying out image understanding tasks. By contrast
the object graph is completely specified by the user, perhaps from a CAD
data-base, and remains static during image interpretation. Eventually we plan
to build models from examples, using techniques of Nevatia and Binford [39].
Restriction nodes have type and specialization-of slots. In nodes specified by
the user the type slot is filled with the atom model-specialization and the
specialization-of slot with an object node from the object graph.
A restriction node specified by the user represents an object class; those
objects which have the volumetric structure modelled by the object in the
specialization-of slot, subject to the constraints associated with the restriction
node.
Thus the arcs of the subgraph defined by the user specify object class
specialization. The arcs added later by ACRONYM also indicate specialization,
but of a slightly different nature. They can specialize a model for case analysis
during prediction (see Section 5.1), or they can indicate specialization implied
for a particular instance of the model by a hypothesized match with an image
feature or features (see Section 5.2).
Fig. 2.6 is a typical example of the portion of the restriction graph which the
user might specify. The constraints associated with the node generic-electric-
motor would be those described in the previous sections. The motor-with-base
node includes the additional constraints
BASE-QUANTITY = 1,
FLANGE-QUANTITY = 0.
[Fig. 2.6 diagram: BASE RESTRICTION at the top, GENERIC ELECTRIC MOTOR below it, and further specializations including CARBONATOR MOTOR and GAS PUMP MOTOR.]
FIG. 2.6. Part of the restriction graph: a model class hierarchy defined by the user.
3. Constraint Manipulation
In this paper we propose a number of refined or new techniques to be used in
understanding what in a three dimensional world produces a given image.
These include volumetric representation of generic classes of three dimensional
objects, concise representation of generic spatial relationships, geometric
reasoning about uncertain situations, generic prediction of appearance of
objects, and use of information from matches of predicted and discovered,
from goal-directed search, image features to gain three dimensional knowledge
of what is in the world. In ACRONYM we tie all these pieces together by using
systems of symbolic constraints. We do not solve such systems, but rather
propagate their implications both downward during prediction and upward
during interpretation. In our current implementation those constraints are
algebraic inequalities over a set of variables (quantifiers). Our methods pro-
pagate these nonlinear constraints and handle them algebraically, rather than
resorting to numerical approximations and traditional numerical methods for
solution.
In this section we describe some implementation-independent requirements
for a 'constraint manipulation system' (a CMS) in an ACRONYM-like system, and
then the particular system which we have implemented and use.
Systems of algebraic constraints have arisen in a number of domains of
artificial intelligence research.
Bobrow [13] used algebraic problems stated in English as a domain for an
early natural language understanding system. The proof of understanding was
to find a correct solution to the algebraic constraints implied by the English
sentences. The domain was restricted to those sentences which could be
represented as single linear equations and for which constraint problems could
always be solved by simple linear algebra.
Fikes [20] developed a heuristic problem-solving program, where problems
were described to the system in a nondeterministic programming language. The
constraints were a mixture of algebraic relations and set inclusion statements
over finite sets of integers. Fikes could thus solve constraint systems by
backtracking, although he included a number of heuristic constraint pro-
pagation techniques to prune the search space.
A common source of algebraic constraint systems is in systems for computer
aided design of electronic circuits. Stallman and Sussman [46] and de Kleer and
Sussman [27] describe systems for analysis and synthesis of circuits respectively.
In each case, systems of constraints are solved by using domain knowledge to
order the examination of constraints, and propagate maximal information from
one to the next. An algebraic simplifier is able to reduce these guided
constraints to simple sets of (perhaps nonlinear) constraints which can be
solved by well-known numeric methods. The original constraints usually form
too large a system to be solved in this way.
The simplification

1/min(A, B) = max(1/A, 1/B)
cannot be made when A and B are of different signs. If their signs cannot be
determined in advance therefore, the simplification should not be made. When
invoked by the CMS, however, our simplifier will not return an expression
having the form of the left of the above equation; doing so can lead to
non-monotonicity of the system as described in point (M2) of Section 3.1.
Instead it returns expressions which may not be equal to the supplied expres-
sion. For instance, given the expression on the left above, the simplifier is
guaranteed to return an expression smaller or equal. Since such an expression
can only arise as a lower bound on some quantity (see Section 3.3.2), this
'simplification' results in at worst a weaker bound. For instance, given the
expression
1/min(A, B, C, D),
the simplifier interacts with the CMS to try to determine the sign of the
expressions A, B, C and D using a method described in Section 3.3.1. Suppose
it determines that A and C are strictly negative, D is strictly positive, and the
CMS cannot determine the sign of expression B. Then the simplifier will return
the possibly smaller expression
Since such an expression could only arise as an upper bound, the result is
merely a weaker bound.
Finally we note that every term in a simplified expression is invariant when
simplified by the simplifier.
determined by the variable which appears alone on one side of the inequality.
Constraints with single variables on both sides appear twice, once in each
subset; for example, a ≤ b is associated both with variable a and variable b.
A new constraint is split into one or more inequalities. Constraints involving
an equality are split into two inequalities: A = B becomes A ≤ B and A ≥ B.
Thus for instance the constraint x = y eventually becomes four inequalities:
y ≤ x and x ≤ y which are associated with x, and x ≤ y and y ≤ x which are
associated with y. A constraint such as A ∈ [B, C], where A, B and C are
expressions, can similarly be broken into two inequalities. The operators max
and min are removed, and equivalent constraints derived where possible (if not
possible, then the constraint is discarded, and if externally generated, the user
is warned; there should never be such constraints generated internally). Thus
for instance max(A, B) ≤ min(C, D) becomes the four constraints A ≤ C,
A ≤ D, B ≤ C and B ≤ D.
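The splitting step can be sketched as follows in Python (a simplified stand-in with an assumed tuple representation of constraints; the real normalization also isolates each variable and checks parities).

# Sketch of splitting a constraint into simple inequalities (assumed tuple
# representation; not the full normalization procedure).
def split(constraint):
    op, lhs, rhs = constraint
    if op == "=":                                    # A = B  ->  A <= B and B <= A
        return split(("<=", lhs, rhs)) + split(("<=", rhs, lhs))
    lefts = list(lhs[1:]) if isinstance(lhs, tuple) and lhs[0] == "max" else [lhs]
    rights = list(rhs[1:]) if isinstance(rhs, tuple) and rhs[0] == "min" else [rhs]
    return [("<=", l, r) for l in lefts for r in rights]

# max(A, B) <= min(C, D) splits into A<=C, A<=D, B<=C, B<=D:
assert split(("<=", ("max", "A", "B"), ("min", "C", "D"))) == [
    ("<=", "A", "C"), ("<=", "A", "D"), ("<=", "B", "C"), ("<=", "B", "D")]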
Next the constraints are 'solved' for each variable which occurs in them; i.e.
each variable is isolated on one side of the inequality. Since inequalities are
involved, the signs of variables and expressions are important for these
solutions. Sometimes the signs cannot be determined but often they can be
deduced simply from explicit numeric bounds on variables given in earlier
constraints (see the discussion of parity below). Finally inequalities using '≥'
are converted to use '≤'.
For example, given prior constraints of y ≤ -1 and x ≥ 0, the addition of the
constraint x/y ≤ min(-100, 200 - z) generates the following set of constraints:

0 ≤ x, x ≤ ∞,
-100y ≤ x,
200y - yz ≤ x,
-∞ ≤ y, y ≤ -1,
-x/100 ≤ y,
1/(200/x - z/x) ≤ y,
-∞ ≤ z, z ≤ ∞,
z ≤ 200 - x/y.
Constraint sets generated by the CMS always contain single numeric upper
and lower bounds on each variable, defaulted to ∞ or -∞ if nothing more
definite is known. If a new numeric bound is added for some variable, it is
compared to the current bound (since they are both numeric, or ∞, or -∞, they
are comparable), and the tighter bound is used.
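A Python sketch of that bookkeeping (assumed names; HIVAL and LOVAL are described next).

# Sketch of the per-variable numeric bounds kept in normal form: defaulted to
# plus or minus infinity, and only ever tightened by new numeric bounds.
import math

numeric_bounds = {}                     # variable -> [lo, hi]

def add_numeric_bound(v, lo=-math.inf, hi=math.inf):
    cur = numeric_bounds.setdefault(v, [-math.inf, math.inf])
    cur[0] = max(cur[0], lo)            # keep the tighter lower bound
    cur[1] = min(cur[1], hi)            # keep the tighter upper bound

def loval(v):
    return numeric_bounds.get(v, [-math.inf, math.inf])[0]

def hival(v):
    return numeric_bounds.get(v, [-math.inf, math.inf])[1]

add_numeric_bound("x", lo=0)            # from the example set E above
assert loval("x") == 0 and hival("x") == math.inf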
Constraint sets are accessed by two pairs of functions. Given a set of
constraints S and a variable v, HIVALs(v) and LOVALs(v) return the
numeric upper and lower bounds respectively that are represented explicitly in
S. For instance, given the example set E of constraints above, HIVALE(x)
returns ∞ and LOVALE(x) returns 0. (The CMS, using INF defined below, is
able to determine that 100 is the smallest value x can have and still satisfy all the
constraints in E.)
More generally, the constraint sets are accessed via the functions UPPER
and LOWER which return the symbolic upper and lower bounds on a variable,
represented explicitly in the constraint set. UPPERs(v) constructs an expres-
sion which applies min to the set of upper bounds on v appearing explicitly in
S. The algebraic simplifier SIMP is applied and the simplified expression
returned. Similarly LOWERs(v) returns the symbolic max of the explicit lower
bounds. Thus, for instance LOWERE(x) returns max(0, -100y, 200y - yz),
while UPPERE(z) constructs min(∞, 200 - x/y) which gets simplified to 200 -
x/y. These definitions of UPPER and LOWER closely follow those used by
Bledsoe [11] and Shostak [42]. They did not use HIVAL and LOVAL.
We digress briefly to explain an important use of HIVAL and LOVAL. They
are used by the algebraic simplifier to try to determine whether an expression
is always nonnegative (we will loosely refer to this as positive) or always
negative. We call this information the parity of an expression. If LOVALs(v)
and HIVALs(v) have the same sign for a variable v, then v has a parity
determined by the sign. If the lower and upper numeric bounds on v have
different signs, then we say v has unknown parity. A few simple rules are used
to try to determine the parity of more complex expressions. For instance the
sum or product of two terms with the same known parity shares that parity.
The inverse of a term with known parity has that same parity. More complex
rules are possible; we have not used them.
We return now to producing a normal form for constraint sets. As symbolic
bounds are added, an attempt is made to compare them to existing bounds.
This is done by symbolically subtracting the new bound from each of the old,
simplifying the resulting expressions and applying the parity determining
function. Whenever a parity for the difference can be found, the bounds are
comparable over the ranges of variables given by HIVAL and LOVAL, and
the stronger bound can be determined from that parity.
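In outline (a Python sketch; simplify and parity are assumed stand-ins for SIMP and the parity rules above):

# Sketch of comparing a new symbolic upper bound against an existing one by
# subtracting, simplifying and testing parity (simplify and parity are assumed
# stand-ins for SIMP and the parity test described above).
def stronger_upper_bound(old, new, simplify, parity):
    difference = simplify(("-", new, old))
    p = parity(difference)
    if p == "nonnegative":
        return old           # new - old >= 0 everywhere, so old is tighter
    if p == "negative":
        return new           # new is tighter
    return None              # parity unknown: bounds not comparable, keep both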
These techniques can be used to meet requirement (A3) of Section 3.1. In
fact they also meet the ideal requirement (I3), but they do more than merely
form the union of constraint sets. Instead an equivalent set of constraints is
produced which allows for efficient operation of the bounding algorithms
described in the next section.
3.3.2. Bounding algorithms
In this section we describe algorithms used to estimate upper and lower bounds
on expressions over satisfying sets of constraint sets. They satisfy the require-
ments of (A2) of Section 3.1. They are monotonic also. Our partial decision
procedure is based on these algorithms (see Section 3.3.3).
The major algorithms SUP, SUPP and SUPPP are described in Figs. 3.1, 3.2
and 3.3 respectively. There are three similarly defined algorithms INF, INFF
Algorithm SUPs(J, H)

  1.     J is a number:                                      return J
  2.     J is a variable:
  2.1      J ∈ H:                                            return J
  2.2      SUPs(J, H) is already on the stack:               return HIVALs(J)
  2.3      J ∉ H:    let A ← UPPERs(J),
                         B ← SUPs(A, H ∪ {J});               return SUPPs(J, SIMP(B), H)
  4.     J = "rv + A" where r is a number, v a variable:
                     let B ← SUPs(A, H ∪ {v})
  4.1      v occurs in B:          let C ← SIMP("rv + B");   return SUPs(C, H)
  4.2      v does not occur in B:  let C ← SUPs("rv", H);    return "C + B"
  9.     J = "1/A":
  9.1      A has known parity:     let B ← INFs(A, H);       return "1/B"
  9.2      A has unknown parity:   let b ← INFs(A, ∅),
                                       c ← SUPs(A, ∅)
  9.2.1      b > c:                                          return -∞
  9.2.2      bc > 0:                                         return "1/b"
  9.2.3      bc < 0:                                         return ∞
  14.                                                        return SUPPPs(J, H)

INF is defined exactly symmetrically to SUP above, with the following textual substitutions: SUP →
INF, INF → SUP, SUPP → INFF, HIVAL → LOVAL, UPPER → LOWER, min → max, max → min,
∞ → -∞ and -∞ → ∞, except in the action columns of 12.1, 12.2 and 12.3, SUP and INF are
not changed, while the inequalities in those if columns are reversed.

FIG. 3.1. Definition of algorithm SUP and lexical changes needed to define algorithm INF.
and INFFF whose definitions can be derived from the others by simple textual
substitutions. The necessary substitutions for each algorithm are described in
the captions of the appropriate figures.
The double quote marks around expressions in the figures mean that the
values of variables within their range should be substituted into the expression,
but no evaluation should occur. Thus, for instance, if the value of variable A is
Algorithm SUPPs(x, Y, H)

  2.     x = Y:                                              return ∞
  5.                                                         return SUPPPs(Y, H)

INFF is defined exactly symmetrically to SUPP above, with the following textual substitutions:
SUPP → INFF, SUPPP → INFFF, min → max, ∞ → -∞ and -∞ → ∞. Also the inequalities in
4.3.2 and 4.3.3 are reversed.

FIG. 3.2. Definition of algorithm SUPP and lexical changes needed to define algorithm INFF.
Algorithm SUPPPs(Y, H)

  1.     Y is a number:                                      return Y
  2.     Y is a variable:
  2.1      Y ∈ H:                                            return Y
  2.2      Y ∉ H:                                            return HIVALs(Y)
  7.                                                         return ∞

FIG. 3.3. Definition of algorithm SUPPP and lexical changes needed to define algorithm INFFF.
invocation can change the running time of the algorithms by a factor of four.
This gives some indication of just how recursion-intensive these algorithms are.
Algorithms SUP, INF, SUPP and INFF are extensions to algorithms of the
same names given by Shostak [42]. Algorithms SUPPP and INFFF are new (as
is algorithm TRIG). The first five steps of our SUP and INF, minus step 2.2,
comprise Shostak's SUP and INF. Our additional steps (6-13) handle non-
linearities. Our algorithms SUPP and INFF are identical to those of Shostak,
with the addition of a final step which invokes SUPPP or INFFF in the
respective cases. For a set of linear constraints and a linear expression to
bound, our algorithms behave identically to those of Shostak.
Given a set of constraints S and an expression E, SUPs(E, ∅) produces an
upper bound on the values achieved by E over the satisfying set of S, and
INFs(E, ∅) a lower bound.
The following descriptions give an intuitive feel for what each of algorithms
SUP, SUPP and SUPPP compute. Dual statements hold for INF, INFF and
INFFF, respectively. S is always a set of constraints and H a set of variables (i.e.
quantifiers) which occur in S.
SUPs(J, H): where J is a simplified (by SIMP) expression in variables constrained by S, returns an
expression E in variables in H. In particular if H = ∅, then SUP returns a number. In general, if
numerical values are assigned to variables in H and E is evaluated for those assignments, then its
value is an upper bound on the value achievable by expression J over the assignments in the
satisfying set of S which have the same assignments as fixed for the variables in H.
SUPPs(x, Y, H): where x is a variable, x is not in H, and Y is a simplified expression in
variables in H ∪ {x}, returns an upper bound for x, which is an expression in variables in H and is
computed by 'solving' x ≤ Y; e.g. solving x ≤ 9 - 2x yields an upper bound of 3 for x.
SUPPPs(Y, H): where Y is a simplified expression, returns an upper bound on Y, as does SUP,
but in general the bounds are weaker than those of SUP. Essentially SUP uses SUPPP when it
hasn't got specific methods to handle Y.
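The 'solving' step of SUPP, in its simplest linear form quoted above, amounts to the following (our Python sketch, not Shostak's full algorithm).

# Sketch of SUPP's 'solving' step for the simple linear case x <= r*x + c
# (our illustration, valid for r < 1; not the full algorithm).
def supp_linear(r, c):
    # x <= r*x + c  =>  (1 - r)*x <= c  =>  x <= c / (1 - r)
    return c / (1.0 - r)

assert supp_linear(-2.0, 9.0) == 3.0    # x <= 9 - 2x yields x <= 3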
Algorithm TRIG is called from both SUP and INF. It is invoked with three
arguments, the first an expression A, the second the symbol 'sin' or 'cos' and the
third is the symbol SUP or INF. Implicitly it has a fourth argument S which is
the constraint set. It takes lower and upper bounds on A using INFs(A, ∅) and
SUPs(A, ∅) and then finds the indicated bound on the indicated trigonometric
function over that interval.
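For example, the SUP case for 'sin' could look like the following sketch (ours, under the assumption that the argument's numeric interval [lo, hi] has already been obtained from INF and SUP).

# Sketch of the SUP case of TRIG for sin: the maximum of sin over [lo, hi],
# given numeric bounds lo <= hi already obtained from INF and SUP.
import math

def sup_sin(lo, hi):
    if hi - lo >= 2 * math.pi:
        return 1.0
    # does a maximiser pi/2 + 2k*pi of sin lie inside [lo, hi]?
    k = math.ceil((lo - math.pi / 2) / (2 * math.pi))
    if math.pi / 2 + 2 * math.pi * k <= hi:
        return 1.0
    return max(math.sin(lo), math.sin(hi))   # otherwise the max is at an endpoint

assert sup_sin(math.pi / 12, math.pi / 6) == math.sin(math.pi / 6)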
Consider the example of Fig. 3.4. The given constraints are a ≥ 2, b ≥ 1 and
ab ≤ 4. These are normalized by the procedure described in Section 3.3.1. Then
a trace of SUPs(a, ∅) is shown. It eventually returns 4 as an upper bound for a
over the satisfying set Cs of constraint set S. In fact 4 is the maximum value
which a can achieve on Cs.
Fig. 3.5 demonstrates finding an upper bound for a²b, by invoking SUPs(a²b,
∅) which returns 16. Again 16 is also the maximum value which can be achieved
by a²b over the satisfying set of S. In general, SUP will not return the
maximum value for an expression, merely an upper bound. Shostak [42] gives
an example of a linear constraint set and a linear expression to bound where it
fails to return the maximum.
Bledsoe [11] and Shostak [42] proved a number of properties of the al-
gorithms SUP and INF for sets of linear constraints and linear expressions to
be bound. The properties of interest to us are:
(P1) The algorithms terminate.
(P2) The algorithms return upper and lower bounds on expressions.
(P3) When the expression is a variable and the auxiliary set (H in our
notation) is empty, the algorithms return a maximum and minimum (including
±∞ when appropriate).
We can extend the proofs of (P1) and (P2) (due to Bledsoe [11]) to our
extended algorithms.
First note that algorithms SUPPP and INFFF terminate, since all recursive
calls reduce the number of symbols in their first argument and they exit simply
Given constraints a ≥ 2, b ≥ 1 and ab ≤ 4 the normalization procedure produces as set S the
constraints:

  2 ≤ a,  a ≤ 4,  a ≤ 4 × 1/b,
  1 ≤ b,  b ≤ 2,  b ≤ 4 × 1/a.

SUPs(a, ∅)
  = SUPPs(a, SIMP(SUPs(UPPERs(a), {a})), ∅)                            Step 2.3
  = SUPPs(a, SIMP(SUPs(min(4, 4 × 1/b), {a})), ∅)
  = SUPPs(a, SIMP(min(SUPs(4, {a}), SUPs(4 × 1/b, {a}))), ∅)           Step 5
        SUPs(4, {a}) = 4                                               Step 1
        SUPs(4 × 1/b, {a})
          = 4 × SUPs(1/b, {a})                                         Step 3.2
          = 4 × 1/INFs(b, {a})                                         Step 9.1
          = 4 × 1/INFFs(b, SIMP(INFs(LOWERs(b), {a, b})), {a})         Step 2.3
          = 4 × 1/INFFs(b, SIMP(INFs(1, {a, b})), {a})
          = 4 × 1/INFFs(b, 1, {a})                                     Step 1
          = 4 × 1/1                                                    Step 1 of INFF
  = SUPPs(a, SIMP(min(4, 4 × 1/1)), ∅)
  = SUPPs(a, 4, ∅)
  = 4                                                                  Step 1 of SUPP

FIG. 3.4. Example of algorithm SUP bounding a variable over the satisfying set of a set of constraints.
SUPs(a²b, ∅)
  Let B = SUPs(b, {a})                                                 Step 10.1
        = SUPPs(b, SIMP(SUPs(UPPERs(b), {b, a})), {a})                 Step 2.3
        = SUPPs(b, SIMP(SUPs(min(2, 4 × 1/a), {b, a})), {a})
        = SUPPs(b, SIMP(min(SUPs(2, {b, a}),
                            SUPs(4 × 1/a, {b, a}))), {a})              Step 5
              SUPs(2, {b, a}) = 2                                      Step 1
              SUPs(4 × 1/a, {b, a})
                = 4 × SUPs(1/a, {b, a})                                Step 3.2
                = 4 × 1/INFs(a, {b, a})                                Step 9.1
                = 4 × 1/a                                              Step 2.1
        = SUPPs(b, SIMP(min(2, 4 × 1/a)), {a})
        = SUPPs(b, min(2, 4 × 1/a), {a})
        = min(2, 4 × 1/a)                                              Step 1 of SUPP
  = SUPs(SIMP(a² min(2, 4 × 1/a)), ∅)                                  Step 10.n.1
  = SUPs(min(2a², 4a), ∅)
  = min(SUPs(2a², ∅), SUPs(4a, ∅))                                     Step 5
        SUPs(2a², ∅)
          = 2 × SUPs(a², ∅)                                            Step 3.2
              Let B = SUPs(1, {a})                                     Step 10.1
                    = 1                                                Step 1
          = 2 × (SUPs(a, ∅))²                                          Step 10.n.2
          = 2 × 4²                                                     as in Fig. 3.4
        SUPs(4a, ∅)
          = 4 × SUPs(a, ∅)                                             Step 3.2
          = 4 × 4                                                      as in Fig. 3.4
  = min(2 × 4², 4 × 4)
  = 16

FIG. 3.5. Example of algorithm SUP bounding a nonlinear expression subject to a set of nonlinear
constraints.
an identical call is made further down the computation tree. But all of these
steps make recursive calls with first arguments containing fewer symbols. The
only place the number of symbols can grow is step 2.3, and there the first
argument is a single variable. Since there are only a finite number of pairs
consisting of a variable and a subset of the variables, any infinite recursion
must include an infinite recursion on some form SUPs(v, H), and similarly for
INF. But step 2.2 explicitly checks for duplication of such calls on the execution
stack, so step 2.3 will not be reached. (Note that we cannot check only for
duplications of calls of the form SUPs(v, ∅), because steps 4 and 10, besides
step 2.3, can also increase the size of the set H.) That SUP and INF bound their
first argument is a straightforward extension of the proof of Bledsoe [11].
Finally, we note that many of the recursive calls to SUP and INF are of the
form SUPs(v, ∅) for some variable v. Each such evaluation generates a large
computation tree. Therefore we have modified the algorithms to check for this
case explicitly. The first time such a call is made for a given set S, the result is
compared to the numeric bound on the variable v amongst the constraints in S
(as indexed with function HIVAL; recall the normal form for constraint sets).
If the calculated bound is better, then S is changed to reflect this. Subsequent
invocations of SUPs(v, ∅) on an unchanged S simply use HIVAL to retrieve the
previously calculated result. This is similar to the notion of a memo function as
described by Michie [36].
π/12 ≤ TILT ≤ π/6,
-π/12 ≤ PAN ≤ π/12,
-∞ ≤ SH-ORI ≤ ∞,

then INFs(E, {SH-X, SH-Y}) produces a lower bound of

-4.637 - 0.5 SH-X - 0.129 SH-Y,

and SUPs(E, {SH-X, SH-Y}) gives

27.188 - 0.25 SH-X + 0.129 SH-Y
as an upper bound. The only alterations we have made to the expressions
actually used and generated by the system for this example, are to reintroduce
the '-' sign, reorder the terms in the sums, and round the numeric constants,
all to increase readability.
4. Geometric Reasoning
Geometric reasoning is making deductions about spatial relationships of
objects in three dimensions, given some description of their positions, orien-
tations and shapes. There are many straightforward, and some not straight-
forward, ways to calculate properties of spatial relationships numerically when
situations are completely specified. Given the generic classes of objects which
we model in ACRONYM and generic positions and orientations which our
representation admits, purely numerical techniques are obviously inadequate.
A number of other workers have faced similar problems in the area of planning
manipulation tasks. We briefly compare a few of their solutions to these
problems below. They can be characterized as applying analytic algebraic tools
to geometry. That is the general approach that we take. We deal with more
general situations, however. There are other approaches to these problems;
most rely on simplifying the descriptive terms to coarse predicates. Deductive
results must necessarily be similarly unrefined in nature.
Ambler and Popplestone [3] assume they are given a description of a goal
state of spatial relationships between a set of objects, such as 'against' and 'fits',
and describe a system for determining the relative positions and orientations of
objects which satisfy these relations. The method assumes that there are at
least two distinct expressions for relative positions and orientations derivable
from the constraints. These are equated to give a geometric equation. They
then use a simplifier for geometric expressions which can handle a subset of
that of our system described below in Section 4.1. Finally they use special
purpose techniques to solve the small class of simplified equations that can be
produced from the problem which can be handled by the system. The solution
may retain degrees of freedom.
Lozano-Pérez [30] attacks a similar problem, but with more restrictions on
the relationships specifiable. He is therefore able to use simpler methods to
solve cases where there are no variations allowed in parameters. He describes a
method for extending these to cases where parameters can vary over an
interval by propagating those intervals through the constraints. He relies on
strong restrictions on the allowed class of geometric situations for this to work.
Taylor [48] tackles a problem similar to ours. He has positions and locations
of objects represented as parameterized coordinate transforms, and looks for
bounds on the position coordinates of objects, given constraints on the
parameters. An incomplete set of rules is used to simplify transform expres-
sions as much as possible, based on the constraints. Then, if only one rotational
degree of freedom remains, the transform is expanded into explicit coordinate
FIG. 4.1. Two views of the electric screw-driver in its holder. The left (a) is from a camera a little
above the table, with variable pan and tilt. The right (b) is from a camera directly above the table,
with variable pitch and roll.
spatial relations described in the previous paragraph and due to the simple
algebraic relation in axis-magnitude representation between a rotation and its
inverse. Second, as we show in Section 4.2, we are able to make better use of
expressions describing spatial relations as combinations of simple geometric
transforms than we could make use of a single rotation and translation
expression, where the axis and magnitude of the rotation are both complex
trigonometric forms.
4.1.1. Products of rotations
Rotations of three space form a group under composition. The group is
associative so it is permissible to simplify a symbolic product of rotations by
collapsing adjacent ones with algebraically equal axis expressions (recall that we
represent rotations as magnitudes about an axis) by adding their magnitudes.
The group is not commutative, however. It is not possible to merely collect all
rotations with common axis expressions. There is a slightly weaker condition
on the elements of the group which allows partial use of this idea. Let a1 and a2
be vectors, and m1 and m2 be scalars. Then the following two identities are true
(the proof is simple but tedious and omitted here):

(a1, m1) * (a2, m2) = (a2, m2) * (((a2, -m2) ⊗ a1), m1),
(a1, m1) * (a2, m2) = (((a1, m1) ⊗ a2), m2) * (a1, m1),

where ⊗ denotes the application of a rotation to a vector.
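These identities are easy to confirm numerically; the following Python sketch (ours, not from the paper) uses scipy's rotation class to stand in for the axis-magnitude representation, with rotation-applied-to-vector playing the role of ⊗.

# Numeric check of the two shifting identities above (our sketch).
import numpy as np
from scipy.spatial.transform import Rotation as R

def rot(axis, magnitude):
    axis = np.asarray(axis, dtype=float)
    return R.from_rotvec(magnitude * axis / np.linalg.norm(axis))

a1, m1 = np.array([0.0, 0.0, 1.0]), 0.7
a2, m2 = np.array([0.0, 1.0, 0.0]), 1.2

lhs = rot(a1, m1) * rot(a2, m2)
rhs_shift_right = rot(a2, m2) * rot(rot(a2, -m2).apply(a1), m1)
rhs_shift_left = rot(rot(a1, m1).apply(a2), m2) * rot(a1, m1)

assert np.allclose(lhs.as_matrix(), rhs_shift_right.as_matrix())
assert np.allclose(lhs.as_matrix(), rhs_shift_left.as_matrix())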
The geometric reasoning system of Ambler and Popplestone [3] collapses
adjacent rotations sharing common axis expressions, and uses the special case
of the above identities where a1 = ẑ, a2 = ŷ and m2 = π to simplify geometric
expressions.
We use a more general special case here (and the general case in parts of the
system--see Section 5) to 'shift' rotations to the left and right in the product
expression. However, as a rotation is shifted it leaves rotations with complex
axis expressions in its wake. There is a subgroup of rotations for which these
axis expressions are no more complex than the originals. This is the group of
24 rotations which permute the positive and negative x-, y- and z-axes among
themselves. When they are used with the above identities, the new axis
expression is a permutation of the original axis components, perhaps with some
sign changes.
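That there are exactly 24 such rotations is easy to confirm: they are the signed permutation matrices with determinant +1. The quick Python check below is ours, not from the paper.

# Count the rotations that permute the signed coordinate axes: signed
# permutation matrices with determinant +1.
import itertools
import numpy as np

count = 0
for perm in itertools.permutations(range(3)):
    for signs in itertools.product((1, -1), repeat=3):
        m = np.zeros((3, 3))
        for row, (col, sign) in enumerate(zip(perm, signs)):
            m[row, col] = sign
        if round(np.linalg.det(m)) == 1:
            count += 1
assert count == 24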
Notice that these rotations are precisely the ones which relate two coordinate
systems with two (or three) parallel pairs of axes; they are very common in
models of human-made objects. We are particularly interested in a generating
subset of this rotational subgroup. It consists of the identity rotation i, and
rotations about the three coordinate axes whose magnitudes are multiples of
~r/2. We write them xl, x2, x3, yl, )'2, )/3, Zl, 22 and z3. The subscript indicates
the magnitude of the rotation as a multiple of ~r/2. We call these ten rotations
elementary. The fifteen other axis preserving rotations cannot be expressed as
rotations about a coordinate axis, but they can be expressed as a product of at
(SR3), since at each step the application of (SR2) may produce an x-axis
elementary rotation left of that which was previously left-most. Observe
however that if ze and xe are elementary z-axis and x-axis rotations, respec-
tively, and w * ze = ze * xe, then w must be an elementary y-axis rotation. Using
this, termination follows from showing that the number of elementary rotations
in the expression, apart from a left-most z-axis elementary rotation, is reduced
by one at each phase of (SR3).
The following expression is the 'raw' product of rotations expressing the
orientation of the screwdriver tool (the only cylinder in the left hand illus-
tration of Fig. 4.1) relative to the camera. It was obtained by inverting the
rotation expression for the camera relative to world coordinates and composing
that with the expression for the orientation of the tool in world coordinates,
found by tracing down the affixment tree.
(x̂, TILT) * (ŷ, -PAN) * z3 * y3 * i * (ẑ, SH-ORI)
        * i * y3 * y1 * y1 * i * i * i * i.                      (4.2)
When we apply the five rules (SR1)-(SR5) we obtain the much simpler
expression
z3 * (ŷ, TILT) * (x̂, PAN - SH-ORI).                              (4.3)
(In this case (SR3) had no effect.)
The appearance of a given object may be invariant with respect to certain
changes in relative orientation of object and camera. The standard form for the
rotation expressions was chosen to make it easy to further simplify the
expression by making use of such invariants. Section 4.3.2 gives an example of
this. The standard form for rotation expressions also happens to be very
convenient for the simplification of the translational component of a coordinate
transform.
4.1.2. Simplification of translation expressions
Simplification of translation expressions is quite straightforward and relies on
the rules given below. Rule (SR6) is applicable to a product of rotations in the
standard form described in the previous section. Rules (SR7)-(SR11) are
applicable to a sum of terms, each of which is a product of rotations applied to
a vector.
(SR6) Shift elementary z-axis rotations to the right end of products of
rotations.
(SR7) For each term in the sum, apply rule (SR6) to the rotation product,
then apply the elementary rotations at the right to the vector by permuting its
components and changing their signs appropriately.
(SR8) Remove terms in the sum where the vector is zero.
(SR9) Collect terms with symbolically identical rotation expressions by
symbolically summing the components of the vectors to which they are applied,
then apply rule (SR8).
(SR10) In each term remove a right-most rotation from the rotation product
if its axis vector is collinear with the vector to which the product is being
applied.
(SR11) While there is a term whose right-most rotation has an axis which is
neither collinear with, nor normal to, the vector to which the product is
applied, split the vector into collinear and normal component vectors, replace
the single term with the two new ones formed in this way, and apply rule
(SR10).
In the process of determining the translation component of a transform
expression by using (4.1) the geometric simplification system simplifies all the
rotation products in the terms of the sum. To simplify the final translation
expression, rules (SR7), (SR9) and (SR11) are applied in order. The following is
the expression for the position of the screwdriver tool in camera coordinates in
the situation shown in Fig. 4.1a.
(x̂, TILT) ⊗ (0, -21.875, 0)
  + (x̂, TILT) * (ŷ, SH-ORI - PAN) ⊗ (0, 0, 1)
  + (x̂, TILT) * (ŷ, -PAN) ⊗ (SH-Y - 30, 0, SH-X - 83.5).        (4.4)
The original unsimplified form is far too large to warrant inclusion here. The
simplified form is both tractable and useful as we will see in the next section.
Finally we note that it is simple to subtract one translation expression from
another. In the translation to be subtracted, simply negate each component of
the vector in each of its terms, symbolically add the two translations by
appending the lists of terms, and then simplify as above.
SUP of Section 3.3. The bounds are then compared to the extreme visible
image coordinates (-0.5 and 0.5 by convention in ACRONYM), and whatever
deductions possible are made (one of 'definitely invisible', 'definitely in field of
view').
For example, the expression E of Section 3.3.4 is the y camera coordinate cy
of the origin of the coordinate frame of the screwdriver tool in the camera
geometry in Fig. 4.1a. The z camera coordinate is of similar complexity. For a
focal ratio r of 2.42, algorithms INF and SUP provide bounds of -2.658 and
3.326, respectively, for the y image coordinate of the screwdriver tool. Thus
ACRONYM can deduce that the screwdriver tool may indeed be visible. For other
constraints on the position quantifiers (SH-X and SH-Y) it can be deduced
that the screwdriver tool is invisible even though its position and orientation
and the pan and tilt of the camera are all uncertain.
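The final comparison against the visible image range is simple; the Python sketch below is ours (assumed helper, with the three outcomes implied by the discussion above).

# Sketch of the visibility deduction: compare symbolic bounds on an image
# coordinate against the visible range [-0.5, 0.5].
def visibility(lower_bound, upper_bound, lo=-0.5, hi=0.5):
    if upper_bound < lo or lower_bound > hi:
        return "definitely invisible"
    if lo <= lower_bound and upper_bound <= hi:
        return "definitely in field of view"
    return "possibly visible"

# Bounds of -2.658 and 3.326 from the screwdriver-tool example above:
assert visibility(-2.658, 3.326) == "possibly visible"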
Similar techniques can be used to decide whether an object might occlude
another; whether over the whole range of variations in their sizes, structures
and spatial relations, over some ranges, or never. In this case it is better to
examine the translation between the origins of their coordinate frames. This
can be calculated by symbolically differencing their coordinates in the camera
frame and simplifying as in the previous section. Various heuristics (im-
plemented as rules in ACRONYM's predictor) can then be used to decide about
occlusion possibilities.
For example, consider the camera overhead geometry which gives rise to the
illustration in Fig. 4.1b. The expression for the position of the screwdriver
holder base minus the position of the screwdriver tool is

(x̂, -ROLL) * (ŷ, -PITCH) ⊗ (0, 0, -2.625)
  + (x̂, -ROLL) * (ŷ, -PITCH) * (ẑ, SH-ORI) ⊗ (-1, 0, 0).
Expanding this out and applying algorithms INF and SUP gives bounds of
-1.679 and 1.679 on the x component, -1.746 and 1.746 on y, and -3.143 and
-1.932 on the z component. Thus ACRONYM can conclude that the screwdriver
holder base is always further from the camera than the screwdriver tool. One
heuristic rule concludes that since the x and y components are comparable in
size to the z component, it is possible that the screwdriver tool will appear in
front of the screwdriver holder base in images. Another rule, however, says
that since the view of the tool that can be seen (see Section 4.3 for the
deduction of the view to be seen) is small compared to the view that will be
seen of the holder base, it will not interfere significantly with observation of the
latter. (Actually in this case it is also concluded that the screwdriver tool is
occluded always by the screwdriver motor above it. Also other subparts of both
the screwdriver holder and the screwdriver itself interfere more with obser-
vation of the screwdriver holder base.)
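The flavour of these heuristics can be caricatured as follows; the rule, its threshold and the function name are invented for illustration and are not ACRONYM's actual rules.

    def nearer_may_occlude(dx, dy, dz):
        # dx, dy, dz: (inf, sup) bounds on the components of (position of
        # object A) minus (position of object B), in camera coordinates.
        # The camera looks along -z, so dz[1] < 0 means A is always farther
        # from the camera than B.
        a_always_farther = dz[1] < 0.0
        lateral = max(abs(v) for v in dx + dy)
        depth = max(abs(dz[0]), abs(dz[1]))
        # B may appear in front of A when the lateral offset can be
        # comparable in size to the depth offset (0.5 is an invented threshold).
        return a_always_farther and lateral > 0.5 * depth

    # A = screwdriver holder base, B = screwdriver tool, bounds as above:
    print(nearer_may_occlude((-1.679, 1.679), (-1.746, 1.746), (-3.143, -1.932)))  # True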
Before leaving the subject of visibility consider the following. If an object is
visible, then its image coordinates must be within the bounds of the visible part
of the image plane. Thus the expressions for the image coordinates, as
calculated above, can be bounded above and below by 0.5 and - 0 . 5 respec-
tively, and those constraints can be merged into the constraint set. If the object is
visible it must satisfy those constraints anyway. Having the constraints expli-
citly stated may help prune some incorrect hypotheses as we will see in Section
5. Note that if the decision procedure as described in Section 3 was actually a
complete decision procedure, then we could simply merge the constraints and
test the constraint set for satisfiability to decide whether the object was visible.
However, since the decision procedure we use is only partial and cannot always
detect unsatisfiable sets of constraints, we use the less direct procedure as
described above. Even with the new constraints merged into the constraint set,
algorithms SUP and INF may not produce image coordinate bounds of 0.5 and
-0.5. This is because the bound on the expressions must be reconstructed from
the normal form of the constraints, rather than referring directly to the newly
added constraints. As we have seen in Section 3.3, SUP and INF produce only
upper and lower bounds on expressions, not suprema and infima. Furthermore,
to keep the number of symbols in the constraint set at a reasonable level we do
not use the image coordinate expressions directly in the bounds, but rather use
simplified bounding expressions as demonstrated in Section 3.3.4.
FIG. 4.2. Top figure (a) shows the low level input to ACRONYM. Bottom figure (b) shows the ribbon
descriptions returned by the descriptive process, when directed by predictions to look for shapes
generated by the fuselage and wings.
cones is the angle between the spines of the two ribbons in the image
corresponding to their swept surfaces. We do not have a quantitative theory of
these correspondences.
Ellipses are a good way of describing the shapes generated by the ends of
generalized cones. The perspective projections of ends of cones with circular
cross-sections are exactly ellipses. For other cross-sections, ellipses can some-
times provide better descriptions of the ends. For example, over a given class
of orientations of a cone relative to the camera any axis of symmetry of the
cross-section is strongly skewed. Thus the axis of symmetry might be the
obvious choice for the spine of a ribbon in a geometrically simpler situation. In
a more complex situation an ellipse can provide a more tolerant prediction and
an easier descriptive hypothesis.
The descriptive module consists of two algorithms [17]: first an edge linking
algorithm based on best-first search, and second an algorithm to fit ribbons and
ellipses to sets of linked edges. The descriptive module returns a graph
structure, the observation graph. The nodes are ribbon (ellipse) descriptions.
The arcs are observed image relations between ribbons; currently we use only
image connectivity. The module produces ribbons which have straight spines
and sweeping-rules which describe linear scalings. The module provides in-
formation regarding orientation and position of the spine in image coordinates.
Fig. 4.2 demonstrates the action of the descriptive module. In Fig. 4.2a are the
692 edges found by Nevatia and Babu's [38] line finder in an 8 bit aerial image
taken from above San Francisco airport. In Fig. 4.2b are 39 ribbons found by
ACRONYM'S descriptive module when searching for shapes generated by the
fuselage and wings. There are 161 connectivity arcs in the observation graph so
produced.
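A rough sketch of the observation graph as a data structure (the field names are assumptions; ACRONYM's actual representation carries more information):

    from dataclasses import dataclass, field

    @dataclass
    class Ribbon:
        # A ribbon with a straight spine and a linearly scaling sweeping
        # rule, recorded in image coordinates.
        spine_position: tuple      # (x, y) of the spine midpoint
        spine_orientation: float   # angle of the spine in the image plane
        length: float
        width: float

    @dataclass
    class ObservationGraph:
        ribbons: list = field(default_factory=list)       # nodes (ribbons or ellipses)
        connectivity: list = field(default_factory=list)  # arcs: (i, j) index pairs

    g = ObservationGraph()
    g.ribbons.append(Ribbon((0.10, -0.05), 0.3, 0.07, 0.01))
    g.ribbons.append(Ribbon((0.13, -0.04), 1.8, 0.03, 0.01))
    g.connectivity.append((0, 1))   # the two ribbons are observed to connect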
4.3.2. Invariant shapes
The most important factor in predicting shape is the orientation of the object
relative to the camera. It is therefore worth considering under what variations
in the orientation of an object relative to the camera its perceived shape
remains invariant. Such invariants are very useful for reducing the complexity
of the object orientation expressions derived by the methods described in
Section 4.1.1 to manageable levels, where shape can be predicted directly.
Note first that for a generalized cone which is small compared to its distance
from the camera, perspective effects are small. There may still be strong
perspective effects between such objects, however. (For instance cones with
parallel spines defining a plane which is not parallel to the image plane will still
have a vanishing point.) It is therefore reasonable, in predicting the
apparent shape of such generalized cones, to approximate the perspective
projection with a slightly simpler projection. In ACRONYM we carry out shape
prediction using a 'perspective-normal' projection. For a generalized cone
whose closest point to the camera has z coordinate z', the projection of a point
(x, y, z) in three space into the image plane of a camera with focal ratio r is
(rx/(-z'), ry/(-z')). Intuitively we think of this as a normal projection into a
plane which is parallel to the image plane, intersects the generalized cone, and
is the closest such plane to the camera, followed by a perspective projection of
the image into the camera. It is also equivalent to a normal projection scaled
according to the distance of the object from the camera. We will see examples
of why this is so useful in Section 4.3.3. We further simplify the perspective-
normal projection in ACRONYM by using the z camera coordinate of the origin
of the cone coordinate frame, rather than z' as defined above.
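A sketch of the perspective-normal projection; z_ref plays the role of z' (or, as actually used in ACRONYM, the z camera coordinate of the cone's origin), and the numbers are invented:

    def perspective_normal(point, z_ref, focal_ratio=2.42):
        # Normal projection onto a plane parallel to the image plane at
        # depth z_ref, followed by perspective projection of that plane;
        # equivalently, a normal projection scaled by the object's distance.
        x, y, _ = point
        return (focal_ratio * x / (-z_ref), focal_ratio * y / (-z_ref))

    # A point on a cone whose frame origin lies 10 units in front of the
    # camera (the camera looks along -z, so z_ref is negative):
    print(perspective_normal((0.5, 0.25, -10.2), z_ref=-10.0))   # (0.121, 0.0605)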
We return now to the problem of simplifying orientation expressions while
keeping the implied shape invariant. The normal form described in Section
4.1.1 was designed with such problems in mind. First note that a rotation about
the z-axis at the left of a rotation product corresponds to a rotation in the
image plane (recall the definition of camera geometry in Section 2.1). Our two
dimensional shape descriptions are invariant with respect to orientation. So all
shape prediction is unaffected by ignoring such rotations. Thus, for instance,
(4.3) for the orientation of the screwdriver tool in Fig. 4.1a, is equivalent to
(ŷ, TILT) * (x̂, PAN - SH-ORI)        (4.5)
for the purpose of predicting the shape of the image of the screwdriver tool. In
general our standard form for rotation expressions has all elementary z-axis
rotations moved to the left--ready to be ignored.
The screwdriver tool is a cylinder, with its spine (an axis of radial symmetry)
along the x-axis of its coordinate system. Thus the appearance of the tool is
invariant with respect to a rotation about its x̂-axis. The right rotation of (4.5)
can thus be ignored for the purpose of shape prediction, leaving
(ŷ, TILT)        (4.6)
to be analyzed. In physical terms this says that the camera tilt is the only
variable of the case in Fig. 4.1a that is important for shape prediction.
Expression (4.6) is simple enough that special case rules are applicable. One
says that the cylinder will appear as a ribbon generated by its swept surface and
an ellipse generated by its initial cross-section. Furthermore they will be
connected in the image. (If the descriptive process which found ellipses was
able to accurately determine their major axis, then another useful rule could
come into play. From (4.6) it would deduce that in the image the major axis of
the ellipse will be normal to the spine of the ribbon.) Later in the prediction it
is decided that the ellipse corresponding to the top of the screwdriver tool will
actually be occluded (as described in Section 4.2), but that need not concern us
here.
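The two pruning steps just used, dropping z-axis rotations from the left and rotations about an axis of symmetry from the right, can be sketched as simple list surgery on a rotation product (represented here, purely for illustration, as a list of (axis, angle) pairs with the leftmost rotation first; the angle name THETA is invented):

    def prune_for_shape(rotations, symmetry_axes):
        # rotations: e.g. [('z', 'THETA'), ('y', 'TILT'), ('x', 'PAN - SH-ORI')]
        # symmetry_axes: axes of the cone's own frame about which its
        # appearance is invariant, e.g. {'x'} for a cylinder with spine along x.
        rots = list(rotations)
        while rots and rots[0][0] == 'z':              # image-plane rotations
            rots.pop(0)
        while rots and rots[-1][0] in symmetry_axes:   # rotations about a symmetry axis
            rots.pop()
        return rots

    # Expression (4.3) with its left z rotation shown explicitly:
    expr = [('z', 'THETA'), ('y', 'TILT'), ('x', 'PAN - SH-ORI')]
    print(prune_for_shape(expr, {'x'}))   # [('y', 'TILT')], i.e. expression (4.6)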
The screwdriver modelled in Fig. 4.1 is actually a particular screwdriver with
specific dimensions. To make this example more general, suppose that the
screwdriver tool has variable size, with its length represented by the quantifier
TOOL-LENGTH and its radius by TOOL-RADIUS. Using the perspective-
normal projection approximation, the length to width ratio of the ribbon
corresponding to the swept surface of the screwdriver tool can be predicted to
be
TOOL-LENGTH × cos(TILT) / TOOL-RADIUS.
Consider the ellipse corresponding to the top of the screwdriver tool. The ratio
of its minor axis to its major axis is simply
sin(TILT).
Thus the range of shapes that can be generated have been comprehensively
predicted. The actual form in which these predictions are used is not just to
establish a predicate against which hypothesized shape matches will be tested.
They are used in a more powerful way, described in Section 5, to actually
extract three dimensional information about the viewed scene.
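Under the perspective-normal approximation the two shape predictions just derived are easy to state concretely; a small sketch with an arbitrary tilt chosen for illustration:

    import math

    def ribbon_length_to_width(tool_length, tool_radius, tilt):
        # Length to width ratio of the ribbon from the cylinder's swept surface.
        return tool_length * math.cos(tilt) / tool_radius

    def ellipse_axis_ratio(tilt):
        # Ratio of minor to major axis of the ellipse from the cylinder's end.
        return math.sin(tilt)

    tilt = math.radians(40.0)
    print(ribbon_length_to_width(1.0, 0.1, tilt))   # about 7.66
    print(ellipse_axis_ratio(tilt))                 # about 0.64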
Beside shape, bounds on the dimensions of objects can also be predicted.
This too is used to extract three dimensional information as described in
Section 5. Size bounds have a more immediate application, however. They are
used to direct the low level descriptive processes [17], which search the image
for candidate shapes to be matched to predictions. Given that the focal ratio is
2.42 and the length of the screwdriver tool is 1, and using the expanded z
component of (4.4), the algebraically simplified prediction of the length of the
ribbon in the image is
-2.42 cos(TILT)/(30 cos(TILT) sin(-PAN) + SH-X × cos(-PAN)
+ cos(TILT) cos(SH-ORI - PAN) - 21.875 sin(TILT)
- 83.5 cos(TILT) cos(-PAN) - SH-Y × cos(TILT) sin(-PAN)).
Algorithms INF and SUP are used to determine that this quantity is bounded
by 0.0190 and 0.0701, information which can be used by the descriptive
processes to limit the search for candidate ribbons in the image. These are not
particularly accurate estimates on the infimum and supremum of the above
expression because it contains sines and cosines which have coupled arguments,
but our constraint manipulation system treats them independently and makes
the most pessimistic bounds. However, they are still exceedingly useful for
limiting search.
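The pessimism referred to above can be seen in a minimal interval-arithmetic sketch: when two trigonometric factors share an argument but are bounded independently, the product bound is wider than the true range. The helper functions below assume the interval lies where the relevant trigonometric function is monotone; all names and numbers are invented for illustration.

    import math

    def cos_bounds(lo, hi):
        # Valid here because [lo, hi] lies inside [0, pi], where cos decreases.
        return (math.cos(hi), math.cos(lo))

    def sin_bounds(lo, hi):
        # Valid here because [lo, hi] lies inside [-pi/2, pi/2], where sin increases.
        return (math.sin(lo), math.sin(hi))

    def prod_bounds(a, b):
        # Interval product treating the two factors as independent.
        ps = [x * y for x in a for y in b]
        return (min(ps), max(ps))

    # cos(t)*sin(t) for t in [0, pi/3]: the true range is [0, 0.5], but
    # independent bounding gives the looser interval (0.0, about 0.87).
    t_lo, t_hi = 0.0, math.pi / 3
    print(prod_bounds(cos_bounds(t_lo, t_hi), sin_bounds(t_lo, t_hi)))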
In general, at the right of the standard form for a product of rotations is one
of six elementary rotations: the identity (implicitly only), y1, y3, x1, x2 and x3. If there are
no other rotations in the expression these correspond to the six views of a
generalized cone from along the positive and negative coordinate axes. Rota-
tions y1 and y3 correspond to viewing the initial and final views of the swept
surface of the generalized cone. In (4.3) the right-most elementary rotation is
cases. Later we may include rules which can carry out analyses dynamically
using differential approximations to nonlinear expressions.
The preceding is a particular case of a more general phenomenon, involving
local maxima, minima, or points of inflexion of expressions. Often some
prediction is very nearly invariant over the modelled range of variations.
Where an invariant is found by ignoring a small effect of some term we call it a
quasi-invariant. The most common instances of quasi-invariants arise from
ignoring cosine terms with small arguments.
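A one-line numerical illustration of the most common case (the 15 degree figure is ours, not a bound used by ACRONYM):

    import math
    # If an angle is known to stay within 15 degrees, treating cos(angle)
    # as 1 introduces at most about 3.4% error in a prediction.
    print(1.0 - math.cos(math.radians(15.0)))   # about 0.034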
5.1. Prediction
Prediction is used to build the prediction graph which provides a description of
features and their relations which should be matched by features in an image.
Prediction has two other major uses, however. The first is to provide direction
to the low level descriptive processes. This was described in Section 4.3.1. The second is to specify the constraints on the model implied by hypothesizing a match of a predicted feature with an image feature, the back constraints used below. Consider, for example, the ribbon predicted for the
screwdriver tool as it appears in the camera geometry illustrated in Fig. 4.1a. Given
the constraints on the location of the screwdriver holder (in terms of table
coordinates SH-X and SH-Y) we have already seen that the length of the
image ribbon corresponding to the screwdriver tool will lie between 0.0186 and
0.0701, which was obtained by bounding the formula rl/(-z') where r is the
camera focal ratio (2.42), l the normal projection length of the tool cylinder
(cos(TILT)), and z' the distance of the origin of the cylinder coordinates to the
image plane (an expression in PAN, TILT, SH-ORI, SH-X and SH-Y). We
know that if the object is visible, then it must be that z' ≤ 0. This information is
not derivable by the algebraic simplifier in this case because z' is a complex
expression. The prediction rules can safely assume it, however, and so they
specify that when a ribbon with length estimate [m_l, m_u] is hypothesized as the
image of the screwdriver tool in the context of restriction node S, then
constraints obtained by evaluating the expressions within the two
inequalities
m_l × INF_S(z', H) ≤ SUP_S(2.42 cos(TILT), H),
m_u × SUP_S(z', H) ≥ INF_S(2.42 cos(TILT), H)
should be added to the constraints already in S, where H = {PAN, TILT,
SH-ORI}. (The exact mechanism for selection of node S is given in the next
section.) In this case the constraints will further constrain SH-X and SH-Y, the
table position coordinates of the screwdriver holder. In general, addition of
such constraints may constrain positions, orientations, model size, or camera
parameters.
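A hedged sketch of how such back constraints might be assembled, with INF_S and SUP_S left as notation to be handled by the constraint manipulation system; the function names, the textual encoding of the constraints and the error model are assumptions made for illustration.

    def measurement_interval(nominal, fractional_error):
        # Descriptive processes return a nominal value and a fractional
        # error; e.g. 0.05 with 10% error gives (0.045, 0.055).
        return (nominal * (1.0 - fractional_error), nominal * (1.0 + fractional_error))

    def back_constraints(m_interval):
        # The two inequalities specified by the prediction rules, written
        # out as text for the constraint system; zprime stands for z'.
        m_lo, m_hi = m_interval
        return [
            f"{m_lo} * INF_S(zprime, H) <= SUP_S(2.42*cos(TILT), H)",
            f"{m_hi} * SUP_S(zprime, H) >= INF_S(2.42*cos(TILT), H)",
        ]

    print(back_constraints(measurement_interval(0.05, 0.10)))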
We demonstrate the effect of these additional constraints. We add them to
the initial modelled set of constraints only. Recall that in that case we have
0 ≤ SH-X ≤ 24 and 18 ≤ SH-Y ≤ 42.
Suppose the interpretation processes described below hypothesize a match of
the swept surfaces of the screwdriver tool with a ribbon in the image. The
descriptive processes return image measurements as nominal values with
fractional error estimates. In the example at hand, suppose that the ribbon is
measured to have length of 0.05 units, with plus or minus 10% error. Then the
additional constraints generated ensure that
4.762 ≤ SH-X ≤ 24 and 18 ≤ SH-Y ≤ 42.
If the length is measured as 0.07 with an error bound of 10%, then the
constraints imply that
20.127 ≤ SH-X ≤ 24 and 27.035 ≤ SH-Y ≤ 42.
Even with a 40% error estimate, a measurement of 0.07 contributes three
dimensional information. This is to be expected as it is very close to an extreme
of the predicted range of measurement.
Note that the constraints added actually contain more information than is
reflected in examining the resulting ranges on individual parameters. The
constraints added actually chip off (in general nonlinear) portions of the
original rectangle of satisfying values achievable for SH-X and SH-Y. The
actual constraints added in the first example above were
3.017 - 0.0435 × SH-X - 0.0113 × SH-Y ≤ 2.338,
5.503 - 0.0460 × SH-X + 0.0138 × SH-Y ≥ 2.096
which are much stronger than the simple interval bounds on SH-X derived above from these
two constraints.
There are other image measurements even from a single ribbon which can be
used to constrain three dimensional parameters. Obviously ribbon width and
taper can be used analogously to ribbon length. Position of the ribbon within
the image can also be used. In the above example it will tend to constrain
camera parameters such as PAN, TILT, and also SH-Y. Prediction rules set up
the appropriate instructions for building constraints based on these measure-
ments.
5.1.3. Multiple back constraints
The previous example deals only with constraints derivable from hypothesizing
a match with a single ribbon. In identifying instances of an object whose
description is more complex than a single generalized cone, there will be more
than one primitive shape feature matched. Each provides a number of such
back constraints which combine to further constrain the individual parameters.
Suppose an object is modelled with a well-determined size, position and
orientation. When constraints from hypothesized matches for many objects are
combined, that particular object will be extremely useful for determining
parameters of the camera and other objects. If there are many such tightly
constrained modelled objects, then they are even more useful. Thus a mobile
robot can use known reference objects to visually determine its absolute
location and orientation, and the absolute location and orientation of other
movable objects.
In a bin picking task the camera parameters and location of the bin are
probably well determined (although ACRONYM would not be at a loss if this were
not the case). The problem is to distinguish instances of an object and
determine its orientation so that a manipulator can be commanded to pick it
from the bin. There will be many instances of each predicted image feature as
there will be many instances of each object. The back constraints provide a
mechanism for the interpretation algorithm to find mutually consistent fea-
tures, and thus identify object instances. Furthermore the back constraints
provide information on the position and orientation of the object instance.
In aerial photographs the back constraints tend to relate scale factors to
camera height and focal ratio. In aerial photographs an identifiable landmark
can provide one tight relationship between these parameters. Derived back
constraints from other objects interact to give relatively tight bounds on all
unknowns.
5.2. Interpretation
Interpretation proceeds by combining local matches of shapes to individual
generalized cones into more global matches for more complete objects
(ACRONYM currently relies on shapes only). The global interpretations must be
consistent in two respects. First, they must conform to the requirements
specified by the arcs of the prediction graph. Second, the constraints that each
local match implies on the three dimensional model must be globally con-
sistent; i.e. the total set of constraints must be satisfiable.
At a given time the interpreter looks for matches for a set of generalized
cones, called the search set. They are cones determined by the predictor to
have approximately equal importance in determining an interpretation for an
image. Smaller generalized cones, corresponding to finer image features, are
searched for later. Feature predictions include both an estimated range for
feature parameters (e.g. ribbon length) and constraints on the model implied by
hypothesizing a match with an image feature. The descriptive processes are
invoked with direction from the first aspect of the predictions. The observation
graph of features and observed relations between features is the result. Since
the search set in general contains more than one generalized cone, not all the
described features will match all, or even any, generalized cones in the search
set. A comparison of all the image feature parameters with their range
predictions is carried out to determine possible matches for each generalized
cone in the search set (e.g. a ribbon's length and width must both fall in the
predicted ranges to be considered further).
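The coarse filter can be sketched as follows; the cone names, parameter ranges and measurements are invented for illustration.

    def candidate_matches(search_set, observed_ribbons):
        # search_set: {cone_name: {'length': (lo, hi), 'width': (lo, hi)}}
        # observed_ribbons: list of (ribbon_name, {'length': v, 'width': v})
        # A ribbon is a candidate for a cone only if every measured
        # parameter falls inside its predicted range.
        matches = {}
        for cone, ranges in search_set.items():
            matches[cone] = [name for name, params in observed_ribbons
                             if all(ranges[p][0] <= params[p] <= ranges[p][1]
                                    for p in ranges)]
        return matches

    predictions = {'fuselage': {'length': (0.20, 0.45), 'width': (0.02, 0.06)},
                   'wing':     {'length': (0.08, 0.20), 'width': (0.01, 0.05)}}
    observed = [('ribbon-7',  {'length': 0.31, 'width': 0.04}),
                ('ribbon-12', {'length': 0.12, 'width': 0.02})]
    print(candidate_matches(predictions, observed))
    # {'fuselage': ['ribbon-7'], 'wing': ['ribbon-12']}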
There is a question of partial matches for predicted features. The current
descriptive processes used [17] partially take care of this problem in a fairly
undirected manner. If edges associated with the two ends of a ribbon are
observed by the line finder [38], then the edge linking algorithm will probably
hypothesize a ribbon, despite possible interference in the middle sections. (The
strategy which works successfully is to make as many plausible hypotheses as
possible at the lowest levels, so that the likelihood of missing a correct
hypothesis which may be locally weak is low, and use the higher level know-
ledge of the system to prune away the excess later.) Sometimes, also, the
predictor will predict specific obscurations and adjust its feature prediction
accordingly. In general, however, an additional mechanism which hypothesizes
image features as partial matches for larger predictions may be very useful.
Thus a ribbon might be hypothesized as being only one end of a larger ribbon
by not requiring that it fits the length prediction. It is also necessary in this case
to increase the error estimate in the length measurement for the next stage of
current scheme it would be forced to carry around the variables for, say, the
lengths of the engine pods of each aircraft. At best this is aesthetically
unpleasing. Worse, the increasing complexity of constraint sets overwhelms the
constraint manipulation system. In our current scheme, the individual inter-
pretations for the aircraft contribute knowledge derived about the rest of the
world from their local hypothesis, but then can be treated simply as atomic
aircraft instances--a higher level abstraction.
At this stage of interpretation we now have hypothesized connected com-
ponents of the interpretation graph. These may be complete components, in
that they have instances of all predicted arcs and nodes, or they may only be
partial (e.g. an interpretation may correspond to an aircraft except that no
feature was found corresponding to the port wing). With each component is associated a
restriction node which describes the constraints on the three dimensional world
implied by accepting that hypothesis. A combinatorial search is now carried out
to find consistent connected components. Essentially this is done by deciding
whether the restriction node produced as the conglomeration of the component
restriction nodes is satisfiable. Conglomeration can also add constraints (equal-
ities) on quantifiers used to describe variable numbers of subparts (e.g. the
variable numbers of flanges on the electric motor in Section 2). These con-
straints too, of course, must be consistent with all the conglomerated restriction
nodes.
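The consistency test at the heart of this search can be sketched by modelling a restriction node, purely for illustration, as a set of interval bounds on quantifiers; conglomeration then intersects intervals and fails when an intersection is empty (ACRONYM's constraint sets and its partial decision procedure are of course more general).

    def conglomerate(restriction_nodes):
        # Each node is modelled as {quantifier: (lo, hi)}.  Returns the
        # merged node, or None if the combined constraints are unsatisfiable.
        merged = {}
        for node in restriction_nodes:
            for q, (lo, hi) in node.items():
                cur_lo, cur_hi = merged.get(q, (float('-inf'), float('inf')))
                new_lo, new_hi = max(cur_lo, lo), min(cur_hi, hi)
                if new_lo > new_hi:
                    return None
                merged[q] = (new_lo, new_hi)
        return merged

    a = {'SH-X': (4.762, 24.0), 'SH-Y': (18.0, 42.0)}
    b = {'SH-X': (20.127, 24.0), 'SH-Y': (27.035, 42.0)}
    print(conglomerate([a, b]))   # consistent: the intersected ranges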
Eventually then, a number of interpretation graphs may be hypothesized. In
general, some will be large and mostly correct interpretation graphs and the
others will be small, consisting of individual incorrect interpretations of parts of
the image. The large graphs will be very similar in gross aspects but may differ
locally where they have accepted slightly different local interpretations. A
single interpretation can be synthesized from the gross similarities. Our
experience with our earlier interpretation algorithms suggests that the number
of large interpretation graphs will typically be fewer than five, and
most likely only one or two. A large correct interpretation graph has associated
with it a restriction node which specializes both object models and their spatial
relations to the three dimensional understanding of the world derived from the
hypothesized matches of feature predictions in the interpretation graph. Other
restriction nodes associated with components of the total interpretation may
contain extra three dimensional information pertinent to the appropriate local
interpretation.
A final aspect of this scheme for interpretation is the ease with which
subclass identification can be carried out once class identification has been
achieved. Suppose we have an interpretation of a set of image features as an
electric motor (see Section 2 for the subclass definitions of this example).
Associated with that interpretation is a restriction node. We can immediately
check whether the interpretation is consistent with the object being an instance
of some subclass of electric motors, e.g. carbonator motor, by taking the lattice meet of its restriction node with the restriction node defining the subclass and testing the resulting constraint set for satisfiability.
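A sketch of this subclass check under the same simplified view of restriction nodes as interval bounds on quantifiers; the quantifier name and the numbers are invented.

    def consistent_with_subclass(interpretation, subclass):
        # Both restriction nodes are modelled as {quantifier: (lo, hi)}.
        # The interpretation can be an instance of the subclass exactly when
        # every pair of ranges overlaps, i.e. the merged node is satisfiable.
        for q, (lo, hi) in subclass.items():
            i_lo, i_hi = interpretation.get(q, (float('-inf'), float('inf')))
            if max(i_lo, lo) > min(i_hi, hi):
                return False
        return True

    motor = {'MOTOR-LENGTH': (5.0, 9.0)}          # from the interpretation
    carbonator = {'MOTOR-LENGTH': (8.0, 9.5)}     # hypothetical subclass bound
    print(consistent_with_subclass(motor, carbonator))   # True: ranges overlap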
6. Conclusion
We have concentrated on the predictive aspects of vision in this paper and
indeed in the ACRONYM system as a whole. This is not to say that descriptive
processes are not vitally important for robust and accurate vision. Rather, we
are investigating the question of how to use models independently of particular
descriptive processes which may eventually be available.
In investigating the use of models for vision we have found that many of the
requirements for the modelling and spatial understanding system are exactly
those needed in other areas of motor-sensory functions. The same models and
geometric reasoning capabilities are extremely useful for robot mobility and
manipulation. We have derived techniques to automatically deduce three
dimensional information from descriptions of monocular images in a general
way.
The particular class representation is not universal. We have shown,
however, how to use classes of models for understanding images. A more
general representation of classes, e.g. inclusion of disjunctions in constraints,
would require an upgrade of the various computing engines described (e.g. the
constraint manipulation system, and the geometric reasoning system).
However, the interaction of these parts of the system could still operate in
much the same manner.
Finally notice that there is no notion of assigning probabilities to local or
global interpretations, nor is there any underlying statistical model. ACRONYM
only 'labels' parts of an image for which it can find a globally consistent
interpretation.
ACKNOWLEDGMENT
Much of this work, especially that of Sections 2 and 5 has been carried out under close advice
from my thesis advisor Thomas Binford.
REFERENCES
1. Abraham, R.G., Csakvary, T., Korpela, J., Shum, L., Stewart, R.J.S. and Taleff, A., Program-
mable assembly research technology: Transfer to industry, 4th Bi-Monthly Report, NSF Grant
ISP 76-2416,1, Westinghouse R&D Center, Pittsburgh, June 1977.
2. Agin, G. J., Representation and description of curved objects, Memo AIM 173, Stanford
University AI Lab (1972).
3. Ambler, A.P. and Popplestone, R.J., Inferring the positions of bodies from specified spatial
relationships, Artificial Intelligence 6 (1975) 175-208.
4. Baer, A., Eastman, C. and Henrion, M., A survey of geometric modeling, CMU Institute of
Physical Planning, Research Rept. No. 66 (1977).
5. Baker, H.H., Edge based stereo correlation, Proceedings ARPA Image Understanding Work-
shop, College Park, MD (1980) 168-175.
6. Barrow, H.G. and Tenenbaum, J.M., MSYS: A system for reasoning about scenes, SRI AI
Center, Tech. Note 121 (1976).
7. Barrow, H.G. and Tenenbaum, J.M., Recovering intrinsic scene characteristics from images,
in: A. Hanson and E. Riseman, Eds., Computer Vision Systems (Academic Press, New York,
1978).
8. Baumgart, B.G., Geometric modeling for computer vision, Memo AIM 249, Stanford Uni-
versity AI Lab (1974).
9. Binford, T.O., Visual perception by computer, invited paper at IEEE Systems Science and
Cybernetics Conference, Miami, Dec. 1971.
10. Binford, T.O., Computer integrated assembly systems, Proceedings NSF Grantees Conference
on Industrial Automation, Cornell Univ., Ithaca, Sep. 1979.
11. Bledsoe, W.W., The sup-inf method in Presburger arithmetic, Memo ATP 18, Dept. of Math.
and Comp. Sci., University of Texas at Austin, Austin, Texas (1974).
12. Bledsoe, W.W., A new method for proving certain Presburger formulas, Proceedings of IJCAI
4, Tbilisi, Georgia, U.S.S.R. (1975) 15-21.
13. Bobrow, D.G., Natural language input for a computer problem solving system, in: M.L.
Minsky, Ed., Semantic Information Processing (MIT Press, Cambridge, MA, 1968).
14. Bobrow, D.G. and Winograd, T., An overview of KRL, a knowledge representation language,
Cognitive Sci. 1 (1977) 3-46.
15. Borning, A., THINGLAB: A constraint-oriented simulation laboratory, Stanford CS Report,
STAN-CS-79-746 (July 1979).
16. Braid, I.C., Designing With Volumes (Cantab Press, Cambridge, England, 1973).
17. Brooks, R.A., Goal-directed edge linking and ribbon finding, Proceedings ARPA Image
Understanding Workshop, Menlo Park, CA (1979) 72-76.
18. Brooks, R.A., Greiner, R. and Binford, T.O., The ACRONYM model-based vision system,
Proceedings IJCAI 6, Tokyo (1979) 105-113.
19. Brooks, R.A. and Binford, T.O., Representing and reasoning about partially specified scenes,
Proceedings ARPA Image Understanding Workshop, College Park, MD (1980) 95-103.
20. Fikes, R.E., Ref-ARF: A system for solving problems stated as procedures, Artificial In-
telligence 1 (1970) 27-120.
21. Garvey, T.D., Perceptual strategies for purposive vision, SRI AI Center, Tech. Note 117
(1976).
22. Goldman, R., Recent work with the AL system, Proceedings IJCAI 5, Cambridge (1977)
733-735.
23. Grimson, W.E.L., Aspects of a computational theory of human stereo vision, Proceedings
ARPA Image Understanding Workshop, College Park, MD (1980) 128-149.
24. Grossman, D.D., Monte Carlo simulation of tolerancing in discrete parts manufacturing and
assembly, Memo AIM 280, Stanford University AI Lab (1976).
25. Hollerbach, J., Hierarchical shape description of objects by selection and modification of
prototypes, Tech. Rept. AI-TR-346, MIT, Cambridge (1975).
26. Horn, B.K.P., Obtaining shape from shading information, in: P.H. Winston, Ed., The Psy-
chology of Computer Vision (McGraw-Hill, New York, 1975).
27. de Kleer, J. and Sussman, G.J., Propagation of constraints applied to circuit synthesis, Memo
AIM 485, MIT, Cambridge (1978).
28. Lieberman, L., Model-driven vision for industrial automation, in: P. Stucki, Ed., Advances in
Digital Image Processing: Theory, Application, Implementation (Plenum Press, New York,
1979).
29. Lowe, D., Solving for the parameters of object models from image descriptions, Proceedings
ARPA Image Understanding Workshop, College Park, MD (1980) 121-127.
30. Lozano-Pérez, T., The design of a mechanical assembly system, Tech. Rept. AI-TR-397, MIT,
Cambridge (1976).
31. Lozano-Pérez, T. and Wesley, M.A., An algorithm for planning collision-free paths among
polyhedral obstacles, Comm. ACM 22 (1979) 560-570.
32. Marr, D., Visual information processing: The structure and creation of visual representations,
Proceedings IJCAI 6, Tokyo (1979) 1108-1126.
33. Marr, D. and Hildreth, E., Theory of edge detection, Memo AIM 518, MIT, Cambridge (1979).
34. Marr, D. and Nishihara, H.K., Representation and recognition of the spatial organization of
three-dimensional shapes, Memo AIM 377, MIT, Cambridge (1976).
35. McDermott, D., A theory of metric spatial inference, Proceedings of the First Annual National
Conference on Artificial Intelligence, Stanford (1980) 246-248.
36. Michie, D., Memo functions: A language feature with rote-learning properties, Proceedings
IFIP, 1968.
37. Miyamoto, E. and Binford, T.O., Display generated by a generalized cone representation.
IEEE Conference on Computer Graphics and Image Processing, May 1975.
38. Nevatia, R. and Ramesh Babu, K., Linear feature extraction and description, Comput.
Graphics and Image Processing 13 (1980) 257-269.
39. Nevatia, R. and Binford, T.O., Description and recognition of curved objects, Artificial
Intelligence 8 (1977) 77-98.
40. Ohta, Y., Kanade, T. and Sakai, T., A production system for region analysis, Proceedings
IJCAI 6, Tokyo (1979) 684-686.
41. Shapiro, L.G., Moriarty, J.D., Mulgaonkar, P.G. and Haralick, R.M., Sticks, plates, and blobs:
A three-dimensional object representation for scene analysis, Proceedings of the First Annual
National Conference on Artificial Intelligence, Stanford (1980) 28-30.
42. Shostak, R.E., On the sup-inf method for proving Presburger formulas, J. Assoc. Comput. Mach.
24 (1977) 529-543.
43. Soroka, B.I., Understanding objects from slices: Extracting generalised cylinder descriptions
from serial sections, Tech. Rept. TR-79-1, Dept. of Computer Science, University of Kansas,
Lawrence (1979).
44. Soroka, B.I., Debugging manipulator programs with a simulator, to be presented at CAD/CAM
8, Anaheim, Nov. 1980.
45. Staff, An introduction to PADL: characteristics, status, and rationale, Production Automation
Project, Tech. Memo TM-22, University of Rochester, Rochester (1974).
46. Stallman, R. and Sussman, G.J., Forward reasoning and dependency-directed backtracking
in a system for computer-aided circuit analysis, Artificial Intelligence 9 (1977) 135-196.
47. Sugihara, K., Automatic construction of junction dictionaries and their exploitation for the
analysis of range data, Proceedings of IJCAI 6, Tokyo (1979) 859-864.
48. Taylor, R.H., A synthesis of manipulator control programs from task-level specifications,
Memo AIM 282, Stanford University AI Lab (1976).
49. Waltz, D., Understanding line drawings of scenes with shadows, in: P.H. Winston, Ed., The
Psychology of Computer Vision (McGraw-Hill, New York, 1975).
50. Winston, P.H., Learning structural descriptions from examples, in: P.H. Winston, Ed., The
Psychology of Computer Vision, (McGraw-Hill, New York, 1975).
51. Woodham, R.J., Relating properties of surface curvature to image intensity, Proceedings of
IJCAI 6, Tokyo (1979) 971-977.
Received November 1980