Direct Manipulation Interfaces
311-338
Copyright © 1985, Lawrence Erlbaum Associates, Inc.
ABSTRACT
Direct manipulation has been lauded as a good form of interface design, and
some interfaces that have this property have been well received by users. In this
article we seek a cognitive account of both the advantages and disadvantages of
direct manipulation interfaces. We identify two underlying phenomena that
give rise to the feeling of directness. One deals with the information processing
distance between the user's intentions and the facilities provided by the ma-
chine. Reduction of this distance makes the interface feel direct by reducing
the effort required of the user to accomplish goals. The second phenomenon
concerns the relation between the input and output vocabularies of the inter-
face language. In particular, direct manipulation requires that the system pro-
vide representations of objects that behave as if they are the objects themselves.
This provides the feeling of directness of manipulation.
A version of this paper also appears as a chapter in the book, User Centered System Design:
New Perspectives on Human-Computer Interaction (Norman & Draper, 1986).
Authors' present address: Edwin L. Hutchins, James D. Hollan, and Donald A. Nor-
man, Institute for Cognitive Science, University of California at San Diego, La Jolla,
CA 92093.
312 HUTCHINS, HOLLAN, NORMAN
CONTENTS
1. DIRECT MANIPULATION
1.1. Early Examples of Direct Manipulation
1.2. The Goal: A Cognitive Account of Direct Manipulation
2. TWO ASPECTS OF DIRECTNESS: DISTANCE AND ENGAGEMENT
2.1. Distance
2.2. Direct Engagement
3. TWO FORMS OF DISTANCE: SEMANTIC AND ARTICULATORY
3.1. Semantic Distance
3.2. Semantic Distance in the Gulfs of Execution and Evaluation
The Gulf of Execution
The Gulf of Evaluation
3.3. Reducing the Semantic Distance That Must Be Spanned
Higher-Level Languages
Make the Output Show Semantic Concepts Directly
Automated Behavior Does Not Reduce Semantic Distance
The User Can Adapt to the System Representation
Virtuosity and Semantic Distance
3.4. Articulatory Distance
3.5. Articulatory Distance in the Gulfs of Execution and Evaluation
4. DIRECT ENGAGEMENT
5. A SPACE OF INTERFACES
6. PROBLEMS WITH DIRECT MANIPULATION
1. DIRECT MANIPULATION
Now consider how we could partition the data. Suppose one result of our
analysis was the scatter diagram shown in Figure 2. The straight line that has
been fitted through the points is clearly inappropriate. The data fall into two
quite different clusters and it would be best to analyze each cluster separately. In
the actual data matrix, the points that form the two clusters might be scattered
randomly throughout the data set. The regularities are apparent only when we
plot them. How do we pull out the clusters? Suppose we could simply circle the
points of interest in the scatter plot and use each circled set as if it were a new
matrix of values, each of which could be analyzed in standard ways, as shown
in Figure 2B.
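The circling operation described above can be sketched in code. The following is only an illustration under stated assumptions: the data values, the circle centers and radii, and the helper names `fit_line` and `circled` are hypothetical and do not come from the system shown in the figures. Each circled set is treated as a new data source and given its own least-squares line.

```python
from math import hypot


def fit_line(points):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def circled(points, center, radius):
    """Return the points that fall inside a circle drawn on the plot."""
    cx, cy = center
    return [(x, y) for x, y in points if hypot(x - cx, y - cy) <= radius]


# Hypothetical data matrix: the two clusters are interleaved row by row,
# so the grouping is visible only once the points are plotted.
data = [(1, 1), (11, 8), (2, 2), (12, 7), (3, 3), (13, 6)]

# One line through everything hides the structure.
overall = fit_line(data)

# "Circling" each cluster yields a new set that is analyzed on its own.
lower = circled(data, center=(2, 2), radius=3)
upper = circled(data, center=(12, 7), radius=3)
lower_fit = fit_line(lower)
upper_fit = fit_line(upper)
```

In this made-up data set the two circled sets have slopes of opposite sign, which the single overall line cannot show.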
The examples of Figures 1 and 2 illustrate a powerful manipulation medium
for computation. The promise of direct manipulation is that instead of an abstract
computational medium, all the "programming" is done graphically, in a
form that matches the way one thinks about the problem. The desired opera-
tions are performed simply by moving the appropriate icons onto the screen
and connecting them together. Connecting the icons is the equivalent of writ-
ing a program or calling on a set of statistical subroutines, but with the advan-
tage of being able to directly manipulate and interact with the data and the
connections. There are no hidden operations, no syntax or command names to
learn. What you see is what you get. Some classes of syntax errors are elimi-
nated. For example, you can't point at a nonexistent object. The system re-
quires expertise in the task domain, but only minimal knowledge of the com-
puter or of computing.
The term direct manipulation was coined by Shneiderman (1974, 1982, 1983)
to refer to systems having the following properties:
Figure 2. (A) The scatter plot formed in Figure 1, along with the best fitting re-
gression line to the data. It is clear that the data really fall into two quite distinct
clusters and that it would be best to look at each independently. (B) The clusters are
analyzed by circling the desired data, then treating the group of circled data as if
they were a new matrix of values, which can be treated as a data source and ana-
lyzed in standard ways.
Can this really be true? Certainly there must be problems as well as benefits.
It turns out that the concept of direct manipulation is complex. Moreover, al-
though there are important benefits there are also costs. Like everything else,
direct manipulation systems trade off one set of virtues and vices against an-
other. It is important that we understand these trade-offs. A checklist of surface
features is unlikely to capture the real sources of power in direct manipulation
interfaces.
mark not only because of historical priority but because of the ideas that he
helped develop: He was one of the first to discuss the power of graphical inter-
faces, the conception of a display as "sheets of paper," the use of pointing de-
vices, the virtues of constraint representations, and the importance of de-
picting abstractions graphically.
Sutherland's ideas took 20 years to have widespread impact. The lag is per-
haps due more to hardware limitations than anything else. Highly interactive,
graphical programming requires the ready availability of considerable
computational power, and it is only recently that machines capable of sup-
porting this type of computational environment have become inexpensive
enough to be generally available. Now we see these ideas in many of the
computer-aided design and manufacturing systems, many of which can trace
their heritage directly to Sutherland's work. Borning's ThingLab program
(1979) explored a general programming environment, building upon many of
Sutherland's ideas within the Smalltalk programming environment. More re-
cently direct manipulation systems have been appearing with reasonable fre-
quency. For example, Bill Budge's Pinball Construction Set (Budge, 1983) permits
a user to construct an infinite variety of electronic pinball games by directly
manipulating graphical objects that represent the components of the game sur-
face. Other examples exist in the area of intelligent training systems (e.g., the
Steamer system of Hollan, Hutchins, & Weitzman, 1984; Hollan, Stevens, &
Williams, 1980). Steamer makes use of similar techniques and also provides
tools for the construction of interactive graphical interfaces. Finally, spreadsheet
programs incorporate many of the essential features of direct manipulation.
In the lead article of Scientific American's special issue on computer software,
Kay (1984) claims that the development of dynamic spreadsheet systems
gives strong hints that programming styles are in the offing that will make programming
as it has been done for the past 40 years (that is, by composing text
that represents instructions) obsolete.
1.2. The Goal: A Cognitive Account of Direct Manipulation
We see promise in the notion of direct manipulation, but as yet we see no ex-
planation of it. There are systems with attractive features, and claims for the
benefits of systems that give the user a certain sort of feeling, and even lists of
properties that seem to be shared by systems that provide that feeling, but no
account of how particular properties might produce the feeling of directness.
The purpose of this article is to examine the underlying basis for direct manipulation
systems. On the one hand, what is it that provides the feeling of "directness"?
Why do direct manipulation systems feel so natural? What is so
compelling about the notion? On the other hand, why can using such systems
sometimes seem so tedious?
For us, the notion of "direct manipulation" is not a unitary concept, nor even
something that can be quantified in itself. It is an orienting notion. "Directness"
is an impression or a feeling about an interface. What we seek to do here
is to characterize the space of interfaces and see where within that picture the
range of phenomena that contribute to the feeling of directness might reside.
The goal is to give cognitive accounts of these phenomena. At the root of our
approach is the assumption that the feeling of directness results from the commitment
of fewer cognitive resources. Or, put the other way around, the need
to commit additional cognitive resources in the use of an interface leads to the
feeling of indirectness. As we shall see, some of the production of the feeling of
directness is due to adaptation by the user, so that the designer can neither
completely control the process, nor take full credit for the feeling of directness
that may be experienced by the user.
We will not attempt to set down hard and fast criteria under which an inter-
face can be classified as direct or not direct. The sensation of directness is al-
ways relative; it is often due to the interaction of a number of factors. There are
costs associated with every factor that increases the sensation of directness. At
present we know of no way to measure the trade-off values, but we will attempt
to provide a framework within which one can say what is being traded off
against what.
2. TWO ASPECTS OF DIRECTNESS: DISTANCE AND ENGAGEMENT
There are two distinct aspects of the feeling of directness. One involves a no-
tion of the distance between one's thoughts and the physical requirements of
the system under use. A short distance means that the translation is simple and
straightforward, that thoughts are readily translated into the physical actions
required by the system and that the system output is in a form readily inter-
preted in terms of the goals of interest to the user. We will use the term directness
to refer to the feeling that results from interaction with an interface. The term
distance will be used to describe factors which underlie the generation of the
feeling of directness.
The second aspect of directness concerns the qualitative feeling of engage-
ment, the feeling that one is directly manipulating the objects of interest.
There are two major metaphors for the nature of human-computer interaction,
a conversation metaphor and a model-world metaphor. In a system built on
the conversation metaphor, the interface is a language medium in which the
user and system have a conversation about an assumed, but not explicitly rep-
resented world. In this case, the interface is an implied intermediary between
the user and the world about which things are said. In a system built on the
model-world metaphor, the interface is itself a world where the user can act,
and which changes state in response to user actions. The world of interest is ex-
plicitly represented and there is no intermediary between user and world. Ap-
propriate use of the model-world metaphor can create the sensation in the user
of acting upon the objects of the task domain themselves. We call this aspect of
directness direct engagement.
2.1. Distance
We call one underlying aspect of directness distance to emphasize the fact that
directness is never a property of the interface alone, but involves a relationship
between the task the user has in mind and the way that task can be accom-
plished via the interface. Here the critical issues involve minimizing the effort
required to bridge the gulf between the user's goals and the way they must be
specified to the system.
An interface introduces distance to the extent there are gulfs between a per-
son's goals and knowledge and the level of description provided by the systems
with which the person must deal. These are referred to as the gulf of execution and
the gulf of evaluation (Figure 3). The gulf of execution is bridged by making the
commands and mechanisms of the system match the thoughts and goals of the
user. The gulf of evaluation is bridged by making the output displays present a
good conceptual model of the system that is readily perceived, interpreted, and
evaluated. The goal in both cases is to minimize cognitive effort.
We suggest that the feeling of directness is inversely proportional to the
amount of cognitive effort it takes to manipulate and evaluate a system and,
moreover, that cognitive effort is a direct result of the gulfs of execution and
evaluation. The better the interface to a system helps bridge the gulfs, the less
cognitive effort needed and the more direct the resulting feeling of interaction.
[Figure 3. Diagram labels: Execution, Physical System, Evaluation.]
Figure 4. Every expression in the interface language has a meaning and a form.
Semantic distance reflects the relationship between the user intentions and the
meaning of expressions in the interface languages both for input and output. Artic-
ulatory distance reflects the relationship between the physical form of an expres-
sion in the interaction language and its meaning, again, both for input and output.
The easier it is to go from the form or appearance of the input or output to meaning,
the smaller the articulatory distance.
[Figure 4 diagram labels: Interface Language; Form of Expression.]
ties: (1) The designer can construct higher-level and specialized languages that
move toward the user, making the semantics of the input and output languages
match that of the user. (2) The user can develop competence by building new
mental structures to bridge the gulfs. In particular, this requires the user to au-
tomate the response sequence and to learn to think in the same language as that
required by the system.
Higher-Level Languages
One way to bridge the gulf between the intentions of the user and the specifi-
cations required by the computer is well known: Provide the user with a
higher-level language, one that directly expresses frequently encountered
structures of problem decomposition. Instead of requiring the complete de-
composition of the task to low-level operations, let the task be described in the
same language used within the task domain itself. Although the computer still
requires low-level specification, the job of translating from the domain lan-
guage to the programming language can be taken over by the machine itself.
This implies that designers of higher-level languages should consider how to
develop interface languages for which it will be easy for the user to create the
mediating structure between intentions and expressions in the language. One
way to facilitate this process is to provide consistency across the interface sur-
face. That is, if the user builds a structure to make contact with some part of the
interface surface, a savings in effort can be realized if it is possible to use all or
part of that same structure to make contact with other areas.
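The contrast between low-level specification and a higher-level language can be illustrated with a small sketch. It is an illustration only, not drawn from the systems discussed here: the data are hypothetical, and Python's standard `statistics` module stands in for a language whose vocabulary matches a statistical task domain. Both versions compute the same mean and sample standard deviation, but in the first the user must decompose the intention into loops and accumulators.

```python
import statistics

scores = [4.0, 8.0, 6.0, 2.0]  # hypothetical observations

# Low-level specification: the user performs the decomposition.
total = 0.0
for s in scores:
    total += s
mean_low = total / len(scores)

squared_deviations = 0.0
for s in scores:
    squared_deviations += (s - mean_low) ** 2
sd_low = (squared_deviations / (len(scores) - 1)) ** 0.5

# Higher-level language: the task is stated in the terms of the domain,
# and the translation to low-level operations is left to the machine.
mean_high = statistics.mean(scores)
sd_high = statistics.stdev(scores)  # sample standard deviation
```

The second form is shorter because its vocabulary already contains the concepts the user has in mind, so less mediating structure has to be built by the user.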
The result of matching a language to the task domain brings both good news
and bad news. The good news is that tasks are easier to specify. Even if consid-
erable planning is still required to express a task in a high-level language, the
amount of planning and translation that can be avoided by the user and passed
off to the machine can be enormous. The bad news is that the language has lost
generality. Tasks that do not easily decompose into the terms of the language
may be difficult or impossible to represent. In the extreme case, what can be
done is easy to do, but outside that specialized domain, nothing can be done.
The power of a specialized language system derives from carefully specified
primitive operations, selected to match the predicted needs of the user, thus
capturing frequently occurring structures of problem decomposition. The
trouble is that there is a conflict between generality and matching to any spe-
cific problem domain. Some high-level languages and operating systems have
attempted to close the gap between user intention and the interaction language
while preserving freedom and ease of general expression by allowing for exten-
sibility of the language or operating system. Such systems allow the users to
move the interface closer to their conception of the task.
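Extensibility of this kind can be sketched as a vocabulary of primitive commands that users package into new, higher-level commands. The sketch below is a hypothetical illustration, not the mechanism of any particular language or operating system: the command names, the `define` helper, and the text-cleaning task are all assumptions made for the example.

```python
# Primitive operations of a hypothetical text-processing vocabulary.
primitives = {
    "strip": str.strip,                          # drop leading/trailing space
    "lower": str.lower,                          # fold to lower case
    "collapse": lambda s: " ".join(s.split()),   # squeeze runs of whitespace
}

# The working vocabulary starts as a copy of the primitives.
commands = dict(primitives)


def define(name, *steps):
    """Package existing commands into a new, higher-level command."""
    funcs = [commands[step] for step in steps]

    def composed(text):
        for func in funcs:
            text = func(text)
        return text

    commands[name] = composed


# A user moves the interface closer to the task by naming a frequent intention.
define("normalize", "strip", "collapse", "lower")
result = commands["normalize"]("  Direct   Manipulation  ")
```

After the definition, one vocabulary item expresses what previously took three, but the vocabulary itself has grown by one entry that other users would have to learn.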
The Lisp language and the UNIX operating system serve as examples of this
phenomenon. Lisp is a general-purpose language, but one that has extended it-
self to match a number of special high-level domains. As a result, Lisp can be
thought of as having numerous levels on top of the underlying language ker-
nel. There is a cost to this method. As more and more specialized domain lev-
els get added, the language system gets larger and larger, becoming more
clumsy to use, more expensive to support, and more difficult to learn. Just look
at any of the manuals for the large Lisp systems (Interlisp, Zetalisp) to get a
feel for the complexity involved. The same is true for the UNIX operating sys-
tem, which started out with a number of low-level, general primitive opera-
tions. Users were allowed (and encouraged) to add their own, more specialized
operations, or to package the primitives into higher-level operations. The re-
sults in all these cases are massive systems that are hard to learn and that re-
quire a large amount of support facilities. The documentation becomes huge,
and not even system experts know all that is present. Moreover, the difficulty
of maintaining such a large system increases the burden on everyone, and the
possibility of having standard interfaces to each specialized function has long
been given up.
The point is that as the interface approaches the user's intention end of the
gulf, functions become more complicated and more specialized in purpose.
Because of the incredible variety of human intentions, the lexicon of a lan-
guage that aspires to both generality of coverage and domain-specific functions
can grow very large. In any of the modern dialects of Lisp one sees a microcosm
be rebuilt every time they are needed if they have been remembered. Thus, a
user may remember how to do something rather than having to rederive how to
do it. It is well known that when tasks are practiced sufficiently often, they be-
come automated, requiring little or no conscious attention. As a result, over
time the use of an interface to solve a particular set of problems will feel less
difficult and more direct. Experienced users will sometimes argue that the in-
terface they use directly satisfies their intentions, even when less skilled users
complain of the complexity of the structures. To skilled users, the interface
feels direct because the invocation of mediating structure has been automated.
They have learned how to transform frequently arising intentions into action
specifications. The result is a feeling of directness as compelling as that which
results from semantic directness. As far as such users are concerned, the inten-
tion comes to mind and the action gets executed. There are no conscious
intervening stages. (For example, a user of the vi text editor expressed this as
follows: "I am an expert user of vi, and when I wish to delete a word, all I do is
think 'delete that word,' my fingers automatically type 'dw,' and the word dis-
appears from the screen. How could anything be more direct?")
The frequent use of even a poorly designed interface can sometimes result in
a feeling of directness like that produced by a semantically direct interface. A
user can compensate for the deficiencies of the interface through continual use
and practice so that the ability to use it becomes automatic, requiring little
conscious activity. While automatism is one factor which can contribute to a
feeling of directness, it is essential for an interface designer to distinguish it
from semantic distance. Automatization does not reduce the semantic distance
that must be spanned; the gulfs between a user's intentions and the interface
must still be bridged by the user. Although practice and the resulting expertise
can make the crossing less difficult, it does not reduce the magnitude of the
gulfs. Planning activity may be replaced by a single memory retrieval so that
instead of figuring out what to do, the user remembers what to do. Automati-
zation may feel like direct control, but it comes about for completely different
reasons than semantic directness. Automatization is useful, for it improves the
interaction of the user with the system, but the feeling of directness it produces
depends only on how much practice a particular user has with the system and
thus gives the system credit for the work the user has done. Although we need to
remember that this happens, that users may adjust themselves to the interface
and, with sufficient practice, may view it as directly supporting their inten-
tions, we need to distinguish between the cases in which the feeling of direct-
ness originates from a close semantic coupling between intentions and the in-
terface language and that which originates from practice. The resultant feeling
of directness might be the same in the two cases, but there are crucial differ-
ences between how the feeling is acquired and what one needs to do as an inter-
face designer to generate it.
If the task changes, then the semantic directness of the interface may also
change.
Consider a musical example: Take the task of producing a middle-C note on
two musical instruments, a piano and a violin. For this simple task, the piano
provides the more direct interface because all one need do is find the key for
middle-C and depress it, whereas on the violin, one must place the bow on the
G string, place a choice of fingers in precisely the right location on that string,
and draw the bow. A piano's keyboard is more semantically direct than the vio-
lin's strings and bow for the simple task of producing notes. The piano has a
single well-defined vocabulary item for each of the notes within its range,
while the violin has an infinity of vocabulary items, many of which do not pro-
duce proper notes at all. However, when the task is playing a musical piece
well rather than simply producing notes, the directness of the interfaces can
change. In this case, one might complain that a piano has a very indirect inter-
face because it is a machine with which the performer "throws hammers at
strings." The performer has no direct contact with the components that actu-
ally produce the sound, and so the production of desired nuances in sound is
more difficult. Here, as musical virtuosity develops, the task that is to be ac-
complished also changes from just the production of notes to concern for how to
control more subtle characteristics of the sounds like vibrato, the slight
changes in pitch used to add expressiveness. For this task the violin provides a
semantically more direct interface than the piano. Thus, as we have argued
earlier, an analysis of the nature of the task being performed is essential in
determining the semantic directness of an interface.
nique is to make the physical form of the vocabulary items structurally similar
to their meanings. In spoken language this relationship is called onomato-
poeia. Onomatopoetic words in spoken language refer to their meanings by
imitating the sound they refer to. Thus we talk about the "boom" of explosions
or the "cock-a-doodle-doo" of roosters. There is an economy here in that the
user's knowledge of the structure of the surface acoustical form has a non-
arbitrary relation to meaning. There is a directness of reference in this
imitation; an intervening level of arbitrary symbolic relations is eliminated.
Other uses of language exploit this effect partially. Thus, although the word
"long" is arbitrarily associated with its meaning, sentences like "She stayed a
looooooooooong time" exploit a structural similarity between the surface form
of "long" (whether written or spoken) and the intended meaning. The same
sorts of things can be done in the design of interface languages.
In many ways, the interface languages should have an easier time of
exploiting articulatory similarity than do natural languages because of the rich
technological base available to them. Thus, if the intent is to draw a diagram,
the interface might accept as input drawing motions. In turn, it could present
as output diagrams, graphs, and images. If one is talking about sound patterns
in the input interface language, the output could be the sounds themselves.
The computer has the potential to exploit articulatory similarities through
technological innovation in the varieties of dimensions upon which it can operate.
This potential has not been exploited, in part because of economic constraints.
The restriction to simple keyboard input limits the form and structure
of the input languages, and the restriction to simple alphanumeric terminals
with small, low-resolution screens limits the form and structure of the output
languages.
[Figure: diagram labels recoverable as Gulf of Execution, Semantic Distance, Articulatory Distance.]
4. DIRECT ENGAGEMENT
the actual requirements for producing it. Laurel (1986) discusses some of the
requirements. At a minimum, to allow a feeling of direct engagement the sys-
tem requires the following:
5. A SPACE OF INTERFACES
Figure 7. A space of interfaces. The dimensions of distance from user goals and de-
gree of engagement form a space of interfaces within which we can locate some fa-
miliar types of interfaces. Direct manipulation interfaces are those that minimize
the distances and maximize engagement. As always, the distance between user in-
tentions and the interface language depends on the nature of the task the user is
performing.
[Figure 7 diagram labels: low-level world, high-level language, direct manipulation, interface as conversation, interface as model world, Engagement.]
gramming environments. For some classes of tasks, such interfaces may be su-
perior to direct manipulation interfaces.
Finally, the most direct of the interfaces will lie where engagement is
maximized, where just the right semantic and articulatory matches are pro-
vided, and where all distances are minimized.
6. PROBLEMS WITH DIRECT MANIPULATION
Direct manipulation systems have both virtues and vices. For instance, the
immediacy of feedback and the natural translation of intentions to actions
make some tasks easy. The matching of levels of thought to the interface
language (semantic directness) increases the ease and power of performing
some activities at a potential cost of generality and flexibility. But not all
things should be done directly. For example, a repetitive operation is probably
best done via a script, that is, through a symbolic description of the tasks that
REFERENCES
Black, J. B., & Moran, T. P. (1982). Learning and remembering command names. Proceedings
of the Human Factors in Computer Systems Conference, 8-11. New York: ACM.
Black, J. B., & Sebrechts, M. M. (1981). Facilitating human-computer communication.
Applied Psycholinguistics, 2, 149-177.
Borning, A. (1979). ThingLab: A constraint-oriented simulation laboratory (Tech. Rep. No.
SSL-79-3). Palo Alto, CA: Xerox Palo Alto Research Center.
Budge, B. (1983). Pinball construction set [Computer program]. San Mateo, CA: Elec-
tronic Arts.
Buxton, W. (1986). There's more to interaction than meets the eye: Some issues in manual
input. In D. A. Norman & S. W. Draper (Eds.), User centered system design: New
perspectives on human-computer interaction. Hillsdale, NJ: Lawrence Erlbaum Associates,
Inc.
Carroll, J. M. (1985). What's in a name? An essay in the psychology of reference. New York:
Freeman.
diSessa, A. A. (1985). A principled design for an integrated computational environment.
Human-Computer Interaction, 1, 1-47.
Draper, S. W. (1986). Display managers as the basis for user-machine communication.
In D. A. Norman & S. W. Draper (Eds.), User centered system design: New perspectives on
human-computer interaction. Hillsdale, NJ: Lawrence Erlbaum Associates, Inc.
Galton, F. (1894). Arithmetic by smell. Psychological Review, 1, 61-62.
Hollan, J. D., Hutchins, E., & Weitzman, L. (1984). Steamer: An interactive
inspectable simulation-based training system. AI Magazine, 5, 15-27.
Hollan, J. D., Stevens, A., & Williams, M. D. (1980). Steamer: An advanced
computer-assisted instruction system for propulsion engineering. Proceedings of Summer
Computer Simulation Conference, 400-404. Arlington, VA: AFIPS Press.
Kay, A. (1984, September). Computer software. Scientific American, 52-59.
Laurel, B. K. (1986). Interface as mimesis. In D. A. Norman & S. W. Draper (Eds.),
User centered system design: New perspectives on human-computer interaction. Hillsdale, NJ:
Lawrence Erlbaum Associates, Inc.
Minsky, M. R. (1984, July). Manipulating simulated objects with real-world gestures
using a force and position sensitive screen. Computer Graphics, 195-203.
Norman, D. A., & Draper, S. W. (Eds.). (1986). User centered system design: New perspectives
on human-computer interaction. Hillsdale, NJ: Lawrence Erlbaum Associates, Inc.
Perlis, A. J. (1982). Epigrams on programming. SIGPLAN Notices, 17(9), 7-13.
Shneiderman, B. (1974). A computer graphics system for polynomials. The Mathematics
Teacher, 67(2), 111-113.
Shneiderman, B. (1982). The future of interactive systems and the emergence of direct
manipulation. Behaviour and Information Technology, 1, 237-256.
Shneiderman, B. (1983). Direct manipulation: A step beyond programming languages.
IEEE Computer, 16(8), 57-69.
Sutherland, I. E. (1963). Sketchpad: A man-machine graphical communication system.
Proceedings of the Spring Joint Computer Conference, 329-346. Baltimore, MD: Spartan
Books.
HCI Editorial Record. This is an invited paper based on a draft of April 1, 1985. Final
manuscript received October 3, 1985. -Editor