Minsky (1974)_Frame System
[Figure residue: views of a cube with faces labeled A, B, C, E;
region labels "left-vertical parallelogram," "right-vertical
parallelogram," "square" (in space), and "invisible."
Annotations: "But since we know we moved to the right, we can
save 'B'" and "If later we move back to the left,"]
around the object. This would lead to a more comprehensive
frame system, in which each frame represents a different
"perspective" of a cube. In figure 4 there are three frames
corresponding to 45-degree MOVE-RIGHT and MOVE-LEFT
actions. If we pursue this analysis, the resulting system can
become very large; more complex objects need even more
different projections. It is not obvious either that all of them
are normally necessary or that just one of each variety is
adequate. It all depends.

[Figure 4. Spatial frames and pictorial frames, linked by RIGHT
and LEFT transformations. Relation markers in common-terminal
structure can represent more invariant (e.g., three-dimensional)
properties.]
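A minimal sketch of a frame system for the cube may make the idea concrete. It assumes three views ("left," "center," "right") and a hypothetical assignment of faces E, A, B, C to them; the class and function names are illustrative only and are not taken from the paper. The point it shows is that frames linked by MOVE-RIGHT and MOVE-LEFT can share terminals, so a description attached to face B in one view is still available after moving.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Terminal:
    """A slot shared by several frames; holds one view-independent description."""
    name: str
    value: Optional[str] = None


@dataclass
class Frame:
    """One perspective of the cube: roles in this view mapped to shared terminals."""
    name: str
    visible: Dict[str, Terminal]
    transitions: Dict[str, str] = field(default_factory=dict)  # action -> frame name


def build_cube_system():
    # Hypothetical layout: which face plays which role in each view is assumed.
    faces = {n: Terminal(n) for n in "EABC"}
    frames = {
        "left": Frame("left",
                      {"left_face": faces["E"], "right_face": faces["A"]},
                      {"MOVE-RIGHT": "center"}),
        "center": Frame("center",
                        {"left_face": faces["A"], "right_face": faces["B"]},
                        {"MOVE-LEFT": "left", "MOVE-RIGHT": "right"}),
        "right": Frame("right",
                       {"left_face": faces["B"], "right_face": faces["C"]},
                       {"MOVE-LEFT": "center"}),
    }
    return faces, frames


def move(frames, current, action):
    """Transfer control to the frame reached by the given action."""
    return frames[frames[current].transitions[action]]
```

Because the "center" and "right" frames point at the same Terminal object for face B, nothing has to be copied when the viewpoint changes; only the role of the terminal within the view differs.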
It is not proposed that this kind of complicated
structure is recreated every time one examines an object. It
is imagined instead that a great collection of frame systems is
stored in permanent memory, and one of them is evoked
when evidence and expectation make it plausible that the
scene in view will fit it. How are they acquired? We
propose that if a chosen frame does not fit well enough, and
if no better one is easily found, and if the matter is important
enough, then an adaptation of the best one so far discovered
will be constructed and remembered for future use.

Each frame has terminals for attaching pointers to
substructures. Different frames can share the same terminal,
which can thus correspond to the same physical feature as
seen in different views. This permits us to represent, in a
single place, view-independent information gathered at
different times and places. This is important also in
non-visual applications.

The matching process which decides whether a
proposed frame is suitable is controlled partly by one's
current goals and partly by information attached to the frame;
the frames carry terminal markers and other constraints,
while the goals are used to decide which of these constraints
are currently relevant. Generally, the matching process could
have these components:

(1) A frame, once evoked on the basis of partial evidence
    or expectation, would first direct a test to confirm
    its own appropriateness, using knowledge about
    recently noticed features, loci, relations, and
    plausible sub-frames. The current goal list is used
    to decide which terminals and conditions must be
    made to match reality.

(2) Next it would request information needed to assign
    values to those terminals that cannot retain their
    default assignments. For example, it might request
    a description of face "C," if this terminal is
    currently unassigned, but only if it is not marked
    "invisible." Such assignments must agree with the
    current markers at the terminal. Thus, face "C"
    might already have markers for such constraints or
    expectations as:

    * Right-middle visual field.
    * Must be assigned.
    * Should be visible; if not, consider
      moving right.
    * Should be a cube-face sub-frame.
    * Share left vertical boundary
      terminal with face "B."
    * If failure, consider box-lying-on-side
      frame.
    * Same background color as face "B."

(3) Finally, if informed about a transformation (e.g., an
    impending motion) it would transfer control to the
    appropriate other frame of that system.

Within the details of the control scheme are opportunities to
embed many kinds of knowledge. When a terminal-assigning
attempt fails, the resulting error message can be used to
propose a second-guess alternative. Later it is shown how
memory can be organized into a "Similarity Network" as
proposed in Winston's thesis (TR-231).
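The three components of the matching process can be sketched roughly as follows. This is only an illustration under stated assumptions: markers are modeled as predicates on a description string, the goal list is a list of terminal names, and the names `Terminal`, `Frame`, `match`, and `transform` are hypothetical rather than the paper's.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Set, Tuple


@dataclass
class Terminal:
    name: str
    default: Optional[str] = None
    markers: List[Callable[[str], bool]] = field(default_factory=list)
    invisible: bool = False
    on_failure: Optional[str] = None  # frame to consider if assignment fails
    value: Optional[str] = None


@dataclass
class Frame:
    name: str
    required_features: Set[str]
    terminals: Dict[str, Terminal]
    transitions: Dict[str, str] = field(default_factory=dict)

    def match(self, features, descriptions, goals) -> Tuple[str, Optional[str]]:
        # (1) Confirm the frame's own appropriateness from noticed features.
        if not self.required_features <= set(features):
            return ("reject", None)
        # (2) Assign the terminals the current goals care about.
        for name in goals:
            t = self.terminals[name]
            if t.invisible:
                continue  # never request a description of an invisible face
            candidate = descriptions.get(name, t.default)
            if candidate is None or not all(m(candidate) for m in t.markers):
                # The failure itself proposes a second-guess alternative.
                return ("failed", t.on_failure)
            t.value = candidate
        return ("ok", None)

    def transform(self, action: str) -> Optional[str]:
        # (3) Transfer control to another frame of the same system.
        return self.transitions.get(action)
```

For example, a face "C" terminal carrying the marker "should be a cube-face sub-frame" and the fallback "box-lying-on-side" would return `("failed", "box-lying-on-side")` when offered a description that is not a cube face.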
IS VISION SYMBOLIC?

Can one really believe that a person's appreciation
of three-dimensional structure can be so fragmentary and
atomic as to be representable in terms of the relations
between parts of two-dimensional views? Let us separate,
at once, the two issues: is imagery symbolic? and is it based
on two-dimensional fragments? The first problem is one of
degree; surely everyone would agree that at some level
vision is essentially symbolic. The quarrel would be between
[...]

The simplest sort of room-frame candidate is like
the inside of a box. Following our cube-model, the
room-frame might have the top-level structure shown in figure 5.

[Figure 5. Room-frame: ceiling, left wall, center wall, right
wall, floor.]
If the new room is unfamiliar, no pre-assembled
frame can supply fine details; more scene-analysis is needed.
Even so, the complexity of the work can be reduced, given
suitable subframes for constructing hypotheses about
substructures in the scene. How useful these will be
depends both on their inherent adequacy and on the quality
of the expectation process that selects which one to use
next. One can say a lot even about an unfamiliar room. Most
rooms are like boxes, and they can be categorized into types:
kitchen, hall, living room, theater, and so on. One knows
dozens of kinds of rooms and hundreds of particular rooms;
one no doubt has them structured into some sort of similarity
network for effective access. This will be discussed later.

A typical room-frame has three or four visible
walls, each perhaps of a different "kind." One knows many
kinds of walls: walls with windows, shelves, pictures, and
fireplaces. Each kind of room has its own kinds of walls. A
typical wall might have a 3 x 3 array of region-terminals
(left-center-right) x (top-middle-bottom) so that wall-objects
can be assigned qualitative locations. One would further
want to locate objects relative to geometric inter-relations in
order to represent such facts as "Y is a little above the
center of the line between X and Z."

In three dimensions, the location of a visual feature
of a subframe is ambiguous, given only eye direction. A
feature in the middle of the visual field could belong either to
a Center Front Wall object or to a High Middle Floor object;
these attach to different subframes. The decision could
depend on reasoned evidence for support, on more directly
visual distance information derived from stereo disparity or
motion-parallax, or on plausibility information derived from
other frames: a clock would be plausible only on the
wall-frame while a person is almost certainly standing on the
floor.

Given a box-shaped room, lateral motions induce
orderly changes in the quadrilateral shapes of the walls as in
figure 6. A picture-frame rectangle, lying flat against a wall,
should transform in the same way as does its wall. If a
"center-rectangle" is drawn on a left wall it will appear to
project out because one makes the default assumption that
any such quadrilateral is actually a rectangle hence must lie
in a plane that would so project. In figure 7A, both
quadrilaterals could "look like" rectangles, but the one to the
right does not match the markers for a "left rectangle"
subframe (these require, e.g., that the left side be longer
than the right side). That rectangle is therefore represented
by a center-rectangle frame, and seems to project out as
though parallel to the center wall.

Thus we must not simply assign the label
"rectangle" to a quadrilateral but to a particular frame of a
rectangle-system. When we move, we expect whatever
space-transformation is applied to the top-level system will
be applied also to its subsystems as suggested in figure 7B.

Similarly the sequence of elliptical projections of a
circle contains congruent pairs that are visually ambiguous as
shown in figure 8. But because wall objects usually lie flat,
we assume that an ellipse on a left wall is a left-ellipse,
expect it to transform the same way as the left wall, and are
surprised if the prediction is not confirmed.

[Figure: MOVE RIGHT.]

[Figure 7A: right-side.]
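As a rough illustration of the room-frame passage, the sketch below models a wall as a 3 x 3 array of region-terminals, uses a small plausibility table to decide whether an ambiguous mid-field feature attaches to a wall or to the floor, and picks a frame of a rectangle-system from the relative lengths of a quadrilateral's sides. The table entries, the function names, and the "right-rectangle" case (assumed by symmetry with the "left rectangle" markers) are illustrative assumptions, not part of the paper.

```python
from itertools import product

COLUMNS = ("left", "center", "right")
ROWS = ("top", "middle", "bottom")


def make_wall(kind):
    """A wall frame: its kind plus nine region-terminals, all initially empty."""
    return {"kind": kind,
            "regions": {cell: None for cell in product(COLUMNS, ROWS)}}


def place(wall, column, row, obj):
    """Assign a wall-object to a qualitative location."""
    if (column, row) not in wall["regions"]:
        raise KeyError("unknown region: %s-%s" % (column, row))
    wall["regions"][(column, row)] = obj


# Hypothetical plausibility knowledge borrowed from other frames.
PLAUSIBLE_SUPPORT = {"clock": "wall", "person": "floor"}


def attach(obj, default="wall"):
    """Choose the subframe an ambiguous mid-field feature belongs to."""
    return PLAUSIBLE_SUPPORT.get(obj, default)


def rectangle_frame(left_side, right_side):
    """Pick a frame of the rectangle-system from the projected side lengths."""
    if left_side > right_side:
        return "left-rectangle"
    if right_side > left_side:
        return "right-rectangle"  # assumed by symmetry
    return "center-rectangle"
```

A quadrilateral on the left wall whose sides are equal does not satisfy the "left rectangle" marker, so `rectangle_frame(2.0, 2.0)` yields "center-rectangle", matching the case described for figure 7A.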
(B) "furiously sleep ideas green colorless"

[Figure 9]

DEFAULT ASSIGNMENT

While both Seeing and Imagining result in [...]

[...] especially concerned with grammar. Since the meaning of an
utterance is "encoded" as much in the positional and
structural relations between the words as in the word
choices themselves, there must be processes concerned with
analysing those relations in the course of building the
structures that will more directly represent the meaning.
Utterance (B) does not get nearly so far because
no subframe accepts any substantial fragment. As a result no
larger frame finds anything to match its terminals, hence
finally, no top level "meaning" or "sentence" frame can
organize the utterance as either meaningful or grammatical.
By combining this "soft" theory with gradations of assignment
tolerances, one could develop systems that degrade properly
for sentences with "poor" grammar rather than none; if the
smaller fragments -- phrases and sub-clauses -- satisfy
subframes well enough, an image adequate for certain kinds
of comprehension could be constructed anyway, even though
some parts of the top level structure are not entirely
satisfied. Thus, we arrive at a qualitative theory of
"grammatical:"
if the top levels are satisfied but some lower terminals are
not we have a meaningless sentence; if the top is weak but
the bottom solid, we can have an ungrammatical but
meaningful utterance.
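The qualitative theory above can be sketched as a small classifier. This is an illustration only: it assumes that "satisfaction" can be summarized as the fraction of terminals that received an acceptable assignment, and that a single tolerance threshold separates "satisfied" from "weak"; both the scoring and the names are hypothetical.

```python
def satisfaction(assignments):
    """Fraction of terminals that were acceptably assigned (None = unassigned)."""
    if not assignments:
        return 0.0
    filled = sum(1 for value in assignments.values() if value is not None)
    return filled / len(assignments)


def classify(top_level, lower_level, tolerance=0.5):
    """Apply the qualitative theory of "grammatical" to two levels of terminals."""
    top_ok = satisfaction(top_level) >= tolerance
    bottom_ok = satisfaction(lower_level) >= tolerance
    if top_ok and bottom_ok:
        return "grammatical and meaningful"
    if top_ok:
        return "grammatical but meaningless"
    if bottom_ok:
        return "ungrammatical but meaningful"
    return "neither grammatical nor meaningful"
```

Raising or lowering `tolerance` is one way to model the "gradations of assignment tolerances" that let the system degrade gradually for sentences with poor grammar.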
Linguistic activity involves larger structures than
can be described in terms of sentential grammar, and these
larger structures further blur the distinctness of the
syntax-semantic dichotomy. Consider the following fable, as
told by W. Chafe (1972):

    There was once a Wolf who saw a Lamb
    drinking at a river and wanted an
    excuse to eat it. For that purpose,
    even though he himself was upstream,
    he accused the Lamb of stirring up the
    water and keeping him from drinking...

To understand this, one must realize that the Wolf is lying!
To understand the key conjunctive "even though" one must
realize that contamination never flows upstream. This in turn
requires us to understand (among other things) the word
"upstream" itself. Within a declarative, predicate-based
"logical" system, one might try to formalize "upstream" by
some formula like:

    [A upstream B]
    AND
    [Event T, Stream muddy at A]
    =>
    [Exists [...]

But an adequate definition would need a good deal more.
What about the fact that the order of things being
transported by water currents is not ordinarily changed? A
logician might try to deduce this from a suitably intricate set
of "local" axioms, together with appropriate "induction"
axioms. I propose instead to represent this knowledge in a
structure that automatically translocates spatial descriptions
from the terminals of one frame to those of another frame of
the same system. While this might be considered to be a
form of logic, it uses some of the same mechanisms designed
for spatial thinking.

In many instances we would handle a change over
time, or a cause-effect relation, in the same way as we deal
with a change in position. Thus, the concept river-flow could
evoke a frame-system structure something like the following,
where S1, S2, and S3 are abstract slices of the flowing river
shown in figure 9.

There are many more nuances to fill in. What is
"stirring up" and why would it keep the wolf from drinking?
One might normally assign default floating objects to the S's,
but here S3 interacts with "stirring up" to yield something
that "drink" does not find acceptable. Was it "deduced" that
stirring river-water means that S3 in the first frame should
have "mud" assigned to it; or is this simply the default
assignment for stirred water?

Almost any event, action, change, flow of material,
or even flow of information can be represented to a first
[...]

[...] needed eventually.
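A minimal sketch of the river-flow frame system follows. It assumes that S1 is the upstream slice and S3 the downstream one (the text does not fix this), that one step of flow simply moves each slice's description to the next terminal, and that fresh water enters at S1; the function names are hypothetical. Under these assumptions the sketch shows both properties discussed above: the order of transported things is preserved, and contamination never reaches an upstream slice.

```python
SLICES = ("S1", "S2", "S3")  # assumed order: S1 upstream, S3 downstream


def flow(frame, inflow="clear"):
    """Translocate each slice's description one terminal downstream."""
    return {"S1": inflow, "S2": frame["S1"], "S3": frame["S2"]}


def stir(frame, where):
    """Default assignment for stirred water: the slice becomes muddy."""
    stirred = dict(frame)
    stirred[where] = "mud"
    return stirred


def ever_muddy(frame, where, steps=5):
    """Does mud appear at the given slice now or within a few flow steps?"""
    for _ in range(steps + 1):
        if frame[where] == "mud":
            return True
        frame = flow(frame)
    return False
```

With the Lamb stirring at S3 and the Wolf drinking at S1, `ever_muddy` is false for S1, which is the sense in which the Wolf's accusation cannot be true.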
~CE~510~
[...] to represent explicitly, in the frame for a scenario
structure, pointers to a collection of the most serious problems
and questions commonly associated with it.

    Disparage the competition.
    Make B think C wants X.

These only scratch the surface. Trades usually occur within a
scenario tied together by more than a simple chain of events
each linked to the next. No single such scenario will do;
when a clue about trading appears it is essential to guess
which of the different available scenarios is most likely to be
useful.

[...] frame. This mandates attention to that assignment problem
and prepares us for a possible thematic concern. In any case,
we probably need a more active mechanism for understanding
"wondered" which can apply the information currently in the
frame to produce an expectation of what Jane will think
about.

The key words and ideas of a discourse evoke substantial [...]
EXCUSES

We can think of a frame as describing an "ideal."
If an ideal does not match reality because it is "basically"
wrong, it must be replaced.
But it is in the nature of ideals that they are really elegant
simplifications; their attractiveness derives from their
simplicity, but their real power depends upon additional
knowledge about interactions between them! Accordingly we
need not abandon an ideal because of a failure to instantiate
it, provided one can explain the discrepancy in terms of such
an interaction. Here are some examples in which such an
"excuse" can save a failing match:

OCCLUSION: A table, in a certain view, should have four legs,
but a chair might occlude one of them. One can look
for things like T-joints and shadows to support such an
excuse.

[...] support. Therefore, a strong center post, with an
adequate base plate, should be an acceptable
replacement for all the legs. Many objects are multiple
purpose and need functional rather than physical
descriptions.

BROKEN: A visually missing component could be explained as
in fact physically missing, or it could be broken.
Reality has a variety of ways to frustrate ideals.

Winston's thesis (TR-231) proposes a way to
construct a retrieval system that can represent classes but
has additional flexibility. His retrieval pointers can be made
to represent goal requirements and action effects as well as
class memberships.

What does it mean to expect a chair? Typically,
four legs, some assortment of rungs, a level seat,
an upper back. One expects also certain relations
between these "parts." The legs must be below
the seat, the back above. The legs must be
supported by the floor. The seat must be
horizontal, the back vertical, and so forth.

Now suppose that this description does not match;
the vision system finds four legs, a level plane, but
no back. The "difference" between what we
expect and what we see is "too few backs." This
suggests not a chair, but a table or a bench.

[...] may propose a better candidate frame. Winston calls the
resulting structure a Similarity Network.

Is a Similarity Network practical? At first sight,
there might seem to be a danger of unconstrained growth of
memory. If there are N frames, and K kinds of differences,
then there could be as many as K*N*N interframe pointers.
One might fear that:
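The chair example can be sketched as a lookup through difference pointers. This is a toy illustration, not Winston's system: the expected part counts, the difference labels, the treatment of the "level plane" as a seat, and all names are assumptions made for the example. The last function simply evaluates the K*N*N bound mentioned in the text.

```python
# Hypothetical expectations for one frame: part -> expected count.
EXPECTED = {"chair": {"leg": 4, "seat": 1, "back": 1}}

# Hypothetical difference pointers: (frame, difference) -> candidate frames.
DIFFERENCE_POINTERS = {("chair", "too few backs"): ["table", "bench"]}


def differences(frame, observed):
    """Name each discrepancy between what the frame expects and what is seen."""
    found = []
    for part, expected in EXPECTED[frame].items():
        seen = observed.get(part, 0)
        if seen < expected:
            found.append("too few %ss" % part)
        elif seen > expected:
            found.append("too many %ss" % part)
    return found


def propose(frame, observed):
    """Follow the difference pointers to better candidate frames."""
    candidates = []
    for difference in differences(frame, observed):
        candidates.extend(DIFFERENCE_POINTERS.get((frame, difference), []))
    return candidates


def max_pointers(n_frames, k_differences):
    """Upper bound on interframe pointers: K * N * N."""
    return k_differences * n_frames * n_frames
```

For the scene in the text, `propose("chair", {"leg": 4, "seat": 1})` follows the "too few backs" pointer and returns the table and bench frames.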
[...] structures.
Prepositional and word-order indicator
conventions.
~TCTIIR~
When replacing a frame, we do not want to start [...]

[...] the old frame.

REQUESTS TO MEMORY

We can now imagine the memory system as driven [...]

The rest of the system will try to placate these lobbyists,
but not so much in accord with "general principles" as in
[...]

(2) Find or build a frame that has properties
    [a,b,...,z]

[...] subframes left in limbo.

[...] will require application of both general and special
knowledge.
CLUSTERS, CLASSES, AND A GEOGRAPHIC ANALOGY

To make the Similarity Network act more "complete," consider
the following analogy. In a city, any person should be able to
visit any other; but we do not build a special road between
each pair of houses; we place a group of houses on a
"block." We do not connect roads between each pair of
blocks; but have them share streets. We do not connect
each town to every other; but construct main routes,
connecting the centers of larger groups. Within such an
organization, each member has direct links to some other
individuals at his own "level," mainly to nearby, highly similar
ones; but each individual has also at least a few links to
"distinguished" members of higher level groups. The result is
that there is usually a rather short sequence between any
two individuals, if one can but find it.

At each level, the aggregates usually have
distinguished loci or capitols. These serve as elements for
clustering at the next level of aggregation. There is no
non-stop airplane service between New Haven and San Jose
because it is more efficient overall to share the "trunk" route
between New York and San Francisco, which are the capitols
at that level of aggregation.

The non-random convergences and divergences of
the similarity pointers, for each difference d, thus tend to
structure our conceptual world around [...]

Suppose your car battery runs down. You believe
that there is an electricity shortage and blame the generator.

The generator can be represented as a mechanical
system: the rotor has a pulley wheel driven by a belt from
the engine. Is the belt tight enough? Is it even there? The
output, seen mechanically, is a cable to the battery or
whatever. Is it intact? Are the bolts tight? Are the brushes
pressing on the commutator?

Seen electrically, the generator is described
differently. The rotor is seen as a flux-linking coil, rather
than as a rotating device. The brushes and commutator are
seen as electrical switches. The output is current along a
pair of conductors leading from the brushes through control
circuits to the battery.

The differences between the two frames are
substantial. The entire mechanical chassis of the car plays
the simple role, in the electrical frame, of one of the battery
connections. The diagnostician has to use both
representations. A failure of current to flow often means
that an intended conductor is not acting like one. For this
case, the basic transformation between the frames depends
on the fact that electrical continuity is in general equivalent
to firm mechanical attachment. Therefore, any conduction
disparity revealed by electrical measurements should make us
look for a corresponding disparity in the mechanical frame. In
[...]
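The two descriptions of the generator can be sketched as a pair of views per part, with a transformation from an electrical finding to the mechanical questions it suggests. This is only an illustration: the grouping of the text's questions under particular parts, the part names used as keys, and the function name are assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class PartView:
    mechanical: str                      # how the part appears in the mechanical frame
    electrical: str                      # how the same part appears in the electrical frame
    mechanical_checks: List[str] = field(default_factory=list)


# Hypothetical grouping of the questions raised in the text.
GENERATOR: Dict[str, PartView] = {
    "rotor": PartView("pulley wheel driven by a belt", "flux-linking coil",
                      ["Is the belt tight enough?", "Is the belt even there?"]),
    "brushes": PartView("brushes pressing on the commutator", "electrical switches",
                        ["Are the brushes pressing on the commutator?"]),
    "output": PartView("cable to the battery", "pair of conductors",
                       ["Is the cable intact?", "Are the bolts tight?"]),
    "chassis": PartView("mechanical chassis of the car", "battery connection"),
}


def mechanical_followups(parts_with_conduction_fault):
    """Map a conduction disparity to the attachment questions it suggests."""
    questions = []
    for part in parts_with_conduction_fault:
        questions.extend(GENERATOR[part].mechanical_checks)
    return questions
```

The mapping encodes the single correspondence the text relies on: a part that fails to conduct in the electrical frame is a part whose mechanical attachment should be inspected.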
Huffman, D. A. "Impossible Objects as Nonsense Sentences."
    Machine Intelligence 6. Ed. D. Michie and B.
    Meltzer. Edinburgh: Edinburgh University Press,
    1972.

Koffka, K. Principles of Gestalt Psychology. New York:
    Harcourt, Brace and World, 1963.

Kuhn, T. The Structure of Scientific Revolutions. 2nd ed.
    Chicago: University of Chicago Press, 1970.

Pylyshyn, Z.W. "What the Mind's Eye Tells the Mind's Brain."
    Psychological Bulletin. 80 (1973), 1-24.

Sandewall, E. "Representing Natural Language Information in
    Predicate Calculus." Machine Intelligence 6. Ed. D.
    Michie and B. Meltzer. Edinburgh: Edinburgh
    University Press, 1972.

Schank, R. "Conceptual Dependency: A Theory of Natural
    Language Understanding." Cognitive Psychology
    (1972), 552-631. See also Schank, R. and K. Colby,
    Computer Models of Thought and Language. San
    Francisco: W. H. Freeman, 1973.