Andy Clark: Microcognition
MICROCOGNITION
Philosophy, Cognitive Science, and
Parallel Distributed Processing
by Andy Clark
A Bradford Book
The MIT Press
Cambridge, Massachusetts
London, England
© 1989 Massachusetts Institute of Technology
All rights reserved. No part of this book may be reproduced in any form by any electronic
or mechanical means (including photocopying, recording, or information storage and
retrieval) without permission in writing from the publisher.
This book was set in Palatino by Asco Trade Typesetting Ltd., Hong Kong, and printed
and bound in the United States of America.
Clark, Andy.
Microcognition: philosophy, cognitive science, and parallel distributed processing.
For Christine Clark, James Clark, and Lesley Benjamin
Contents
Preface xi
Acknowledgments xiii
Introduction
What the Brain’s-Eye View Tells the Mind’s-Eye View 1
1 After the Goldrush 1
2 Parallel Distributed Processing and Conventional AI 2
3 The Multiplicity of Mind 2
4 The Mind's-Eye View and the Brain's-Eye View 3
5 The Fate of the Folk 5
6 Threads to Follow 6
PART 1
The Mind’s-Eye View
Chapter 1
Classical Cognitivism 9
1 Cognitivism, Life, and Pasta 9
2 Turing, McCarthy, Newell, and Simon 9
3 The Physical-Symbol-System Hypothesis 11
4 Bringing Home the BACON 13
5 Semantically Transparent Systems 17
6 Functionalism 21
Chapter 2
Situation and Substance 25
1 Stuff and Common Sense 25
2 The Dreyfus Case 25
3 It Ain't What You Know; It's the Way You Know It 27
4 Manipulating the Formal Shadows of Mind 30
5 Showing What We're Made Of 32
6 Microfunctionalism 34
Chapter 3
Folk Psychology, Thought, and Context 37
1 A Can of Worms 37
2 A Beginner's Guide to Folk Psychology 38
3 The Trouble with Folk 39
4 Content and World 42
5 Interlude 46
6 Some Naturalistic Reflections 47
7 Ascriptive Meaning Holism 48
8 Churchland Again 50
9 Cognitive Science and Constitutive Claims 54
10 Functionalism without Folk 58
Chapter 4
Biological Constraints 61
1 Natural-Born Thinkers 61
2 Most Likely to Succeed 61
3 Thrift, the 007 Principle 63
4 Gradualistic Holism and the Historical Snowball 66
5 The Methodology of MIND 74
PART 2
The Brain’s-Eye View
Chapter 5
Parallel Distributed Processing 83
1 PDP or not PDP? 83
2 The Space between the Notes 84
3 The Jets and the Sharks 86
4 Emergent Schemata 92
5 Distributed Memory 96
6 Biology Revisited 104
Chapter 6
Informational Holism 107
1 In Praise of Indiscretion 107
2 Information Holism in a Model of Sentence
Processing 107
3 Symbolic Flexibility 111
4 Grades of Semantic Transparency 114
5 Underpinning Symbolic Flexibility 118
6 PDP and the Nature of Intelligence 121
7 An Equivalence Class of Algorithms 124
Chapter 7
The Multiplicity of Mind: A Limited Defence of Classical
Cognitivism 127
1 Of Clouds and Classical Cognitivism 127
2 Against Uniformity 128
3 Simulating a Von Neumann Architecture 131
4 A Lacuna in the Account of Real Symbol Processing 136
5 Full Simulation, Intuitive Processing, and the Conscious Rule
Interpreter 137
6 BACON, an Illustration 139
Chapter 8
Structured Thought, Part 1 143
1 Weighting for Godot? 143
2 The Systematicity Argument 144
3 Systematicity and Structured Behavior 146
4 Cognitive Architecture 150
5 Two Kinds of Cognitive Science 152
6 Grammar, Rules, and Descriptivism 154
7 Is Naive Physics in the Head? 157
8 Refusing the Syntactic Challenge 160
Chapter 9
Structured Thought, Part 2 161
1 Good News and Bad News 161
2 The Past Tense Acquisition Network 161
3 The Pinker and Prince Critique 165
4 Pathology 168
5 And the Moral of the Story Is... 169
6 The Theoretical Analysis of Mixed Models 172
Chapter 10
Reassembling the Jigsaw 177
1 The Pieces 177
2 Building a Thinker 178
3 Explaining a Thinker 180
4 Some Caveats 183
Epilogue
The Parable of the High-Level Architect 185
Appendix
Beyond Eliminativism 187
1 A Distributed Argument 187
Notes 209
Bibliography 213
Index 221
Preface
The subtitle is enough to put anyone off: Philosophy, Cognitive Science, and
Parallel Distributed Processing. It might have read: the elusive, the ill-defined,
and the uncharted. For all that, the project thrust itself upon me with an un-
usual sense of its own urgency. Parallel distributed processing (PDP) is an
exciting and provocative new movement within cognitive science. It offers
nothing less than a new kind of computational model of mind. The bad news
is that it as yet offers nothing more than a hint of the nature and power of
such models. But the hints themselves are remarkable and have the potential,
I believe, to reshape both artificial intelligence (AI) and much of the philoso-
phy of mind. In particular, they offer a new picture of the relation between
sentences ascribing thoughts and the in-the-head computational structures
subserving intelligent action. The final product of such reshaping will not
be found in this work. At best, I offer some views on the central points of
contrast between the old shape and the new, a personal view on most of
the major issues, and along the way, a reasonably detailed taxonomy of
features, distinctions, and subprojects. The taxonomy, though necessarily
individualistic, may be of some use in future discussions of what is, in
effect, a whole new topic for philosophy and AI. The conclusions are often
provisional, as befits discussions of an approach that is still in its infancy.
By the time this book sees print, there will be many new and relevant
developments. I hope the book will provide at least a framework in which
to locate them.
The reader should be warned of my peculiar circumstances. I am first of
all a philosopher, with only a secondary knowledge of AI and evolutionary
biology. I am fortunate to work in the highly interdisciplinary School of
Cognitive and Computing Sciences at Sussex University. It is only thanks
to that harsh selective environment that I have been able to avoid many
glaring misunderstandings. Those that remain are, in the time-honored
clause, entirely my own responsibility. Adaptation, even to an environ-
ment of AI researchers and cognitive scientists, falls somewhat short of an
optimizing process.
Acknowledgments
Special thanks are due to the following people (in no particular order): Neil
Tennant and Donald Campbell for showing me what a biological perspec-
tive was all about; Martin Davies, Barry Smith, and Michael Morris for
reminding me what a philosophical issue might look like; Aaron Sloman for
encouraging an ecumenical approach to conventional symbol-processing
Al; Margaret Boden for making the School of Cognitive and Computing
Sciences possible in the first place; my father, James Henderson Clark, and
my mother, Christine Clark, for making me possible in the first place;
my father and Karin Merrick for their saintly patience with the typing;
H. Stanton and A. Thwaits for help with the many mysterious details of
publishing a book; the Cognetics Society at Sussex (especially Adam Sharp
and Paul Booth) for some of the graphics; and Lesley Benjamin for invalu-
able help with the problem of meaning in all its many forms.
I would like to thank the copyright owners and authors for permission
to use the following figures. Figures 4.3 and 4.4 are from S. J. Gould and
R. Lewontin, “The Spandrels of San Marco and the Panglossian Paradigm,”
Proceedings of the Royal Society of London, Series B, 205, no. 1161 (1979):
582-583. Reproduced by permission of the authors and the Royal Society.
Table 5.1 and figures 5.1, 5.3, 5.4, 5.5, and 5.7 are from J. McClelland,
D. Rumelhart, and the PDP Research Group, Parallel Distributed Processing:
Explorations in the Microstructure of Cognition, vols. 1 and 2 (Cambridge:
MIT Press, 1986). Reproduced by permission of the authors and the MIT
Press.
Parts of chapters 3 and 4 are based on the following articles of mine that
have appeared in scholarly journals. Thanks to the editors for permission to
use this material.
“From Folk-Psychology to Cognitive Science,” Cognitive Science 11,
no. 2 (1987): 139-154.
“A Biological Metaphor,” Mind and Language 1, no. 1 (1986): 45-64.
“The Kludge in the Machine,” Mind and Language 2, no. 4 (1987):
277-300.
of which are adapted for subsymbolic processing. For many tasks, our
everyday performance may involve the cooperative activity of a variety of
such machines. Many connectionists are now sympathetic to such a vision.
Thus, Smolensky (1988) introduces a virtual machine that he calls the con-
scious rule interpreter. This is, in effect, a PDP system arranged to simulate
the computational activity of a computer running a conventional (and
semantically transparent) program.
Less straightforward but perhaps equally important is what might be
termed the multiplicity of explanation. This will be a little harder to tease
out in summary form. The general idea is that even in cases where the
underlying computational form is genuinely connectionist, there will remain
a need for higher levels of analysis of such systems. We will need to
display, for example, what it is that a number of different connectionist
networks, all of which have learned to solve a certain class of problems,
have in common. Finding and exhibiting the commonalities that underpin
important psychological generalizations is, in a sense, the whole point of
doing cognitive science. And it may be that in order to exhibit such
commonalities we shall need to advert to the kinds of analysis found in
symbolic (nonconnectionist) AI.
able to rediscover one of Kepler’s laws and Ohm’s law; planning programs
learned to mix and match old successful strategies to meet new demands;
new data structures enabled a computer to answer questions about the
unstated implications of stories; cryptarithmetic programs could far outpace
you or me. But something seemed to be missing. The programmed com-
puters lacked the smell of anything like real intelligence. They were rigid
and brittle, capable of doing only a few tasks interestingly well.
This approach (which was not universal) may have erred by placing too
much faith in the mind’s own view of the mind. The entities that found
their way into such models would be a fairly direct translation of our
standard expressions of belief and desire ultimately into some appropriate
machine code. But why suppose that the mind’s natural way of understand-
ing its own and others' mental states by using sentential attributions of
beliefs, desires, fears, and so on should provide a powerful model on which
to base a scientific theory of mind? Why suppose, that is, that the com-
putational substrate of most thought bears some formal resemblance to
ordinary talk of the mind? Such talk is an evolutionarily recent develop-
ment, geared no doubt to smoothing our daily social interactions. There is
no obvious pressure here for an accurate account of the computational
structure underlying the behavior that such talk so adequately describes
and (in a very real sense) explains.
Part 1 of the book examines the mind’s-eye approach, associating it with
a commitment to semantically transparent programming. It looks at some
standard philosophical criticisms of the approach (mistakenly identified by
many philosophers with an AI approach in general) and also raises some
further worries of an evolutionary and biological nature. In part 2 our
attention is focused on the PDP alternative, which I call the “brain’s-eye
view.” The label refers to the brainlike structure of connectionist architec-
tures. Such architectures are neurally inspired. The neural networks found
in slugs, hamsters, monkeys, and humans are likewise vast parallel net-
works of richly interconnected, but relatively slow and simple, processors.
The relative slowness of the individual processors is offset by having them
work in a cooperative parallelism on the task at hand. A standard analogy
is with the way a film of soap settles when stretched across a loop (like the
well-known children’s toy). Each soap molecule is affected only by its
immediate neighbors. At the edges their position is determined by the loop
(the system input). The effect of this input is propagated by a spreading
series of local interactions until a global order is achieved. The soap film
settles into a stable configuration across the loop. In a PDP system at this
point we say the network has relaxed into a solution to the global problem.
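The relaxation idea can be made concrete with a toy numerical sketch (mine, purely for illustration; it is not a PDP network, and all the numbers are arbitrary). The two fixed end values play the role of the loop, each interior unit repeatedly adjusts toward its neighbors, and the line of units settles into a stable global configuration:

```python
# Toy relaxation by local updates, in the spirit of the soap-film analogy.
# The two boundary values play the role of the loop (the system input); each
# interior unit repeatedly moves toward the average of its two neighbours
# until no unit changes appreciably, i.e., the "film" has settled.

def relax(boundary_left, boundary_right, n_units=8, tol=1e-6):
    state = [boundary_left] + [0.0] * n_units + [boundary_right]
    while True:
        biggest_change = 0.0
        for i in range(1, len(state) - 1):        # purely local interactions
            new = 0.5 * (state[i - 1] + state[i + 1])
            biggest_change = max(biggest_change, abs(new - state[i]))
            state[i] = new
        if biggest_change < tol:                  # global order achieved
            return state

print([round(x, 2) for x in relax(0.0, 1.0)])
# -> a smooth gradient from 0.0 to 1.0: the relaxed, globally ordered solution
```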
In computing, this kind of cooperative parallelism proves most useful when
the task involves the simultaneous satisfaction of a large number of small,
or “soft,” constraints. In such cases (biological vision and sensorimotor
6 Threads to Follow
Here are some threads to follow if your interest lies in a particular topic
among those just mentioned. Conventional, semantically transparent AI
is treated in chapter 1, sections 2 to 5; chapter 2, section 4; chapter 4,
section 5; chapter 7, section 6; chapter 8, sections 2 and 4 to 8; chapter 9,
sections 3, 5, and 6; chapter 10, section 4, and the epilogue. Parallel
distributed processing is covered in chapter 5, sections 1 to 7; chapter 6,
sections 1 to 8; chapter 7, sections 1 to 7; chapter 9, sections 1 to 7; and
chapter 10, sections 2 to 5. Mixed models (PDP and simulated conven-
tional systems) are taken up in chapter 7, sections 1 to 7; chapter 9, sections
1 to 7 (especially 9.6); and chapter 10, section 4. Folk psychology and
thought are discussed in chapter 3, sections 1 to 9; chapter 4, section 5;
chapter 7, section 6; chapter 8, sections 1 to 9; and chapter 10, section 4.
Biology, evolutionary theory, and computational models are discussed in
chapter 3, section 6; chapter 4, sections 1 to 6; and chapter 5, section 6.
The main PDP models used for discussion and criticism are the Jets and
the Sharks (chap. 5, sec. 3), emergent schemata (chap. 5, sec. 4), memory
(chap. 5, sec. 5), sentence processing (chap. 6, secs. 2 and 3), and past tense
acquisition (chap. 9, secs. 2 and 3).
I
The Mind’s-Eye View
Current systems, even the best ones, often resemble a house of cards. The
researchers are interested in the higher levels, and try to build up the minimum
of supporting props at the lower levels.... The result is an extremely fragile
structure, which may reach impressive heights, but collapses if swayed in the
slightest from the specific domain for which it was built.
—Bobrow and Winograd, “An Overview of KRL, a Knowledge Representation
Language”
Chapter 1
Classical Cognitivism
that, by leaving the nature of the computations involved so unspecified, it
asserts rather too little to be of immediate psychological interest. Newell and
Simon rather intend the physical-symbol-system hypothesis as “a specific
architectural assertion about the nature of intelligent systems” (1976, 42, my
emphasis). It is fair, if a little blunt, to render this specific architectural
assertion as follows.
The strong-physical-symbol-system (SPSS) hypothesis. A virtual machine
engaging in the von Neumann-style manipulation of standard sym-
bolic atoms has the direct and necessary and sufficient means for
general intelligent action.
It will be necessary to say a little about the terms of this hypothesis and
then to justify its ascription to Newell and Simon.
About the terms, I note the following. A virtual machine is a “machine”
that owes its existence solely to a program that runs (perhaps with other
intervening stages) on a real, physical machine and causes it to imitate the
usually more complex machine to which we address our instructions (see,
for example Sloman 1984). Such high-level programming languages as
LISP, PROLOG, and POP11 thus define virtual machines. And a universal
Turing machine, when it simulates a special-purpose Turing machine, may
be treated as a virtual version of the special-purpose machine.
“Von Neumann-style manipulation” is meant to suggest the use of
certain basic manipulatory operations easily provided in a Von Neumann
machine running a high-level language like LISP. Such operations would
include assigning symbols, binding variables, copying, reading and amend-
ing symbol strings, basic syntactic pattern-matching operations (more on
which later), and so on. Connectionist processing, as we shall see, involves
a radically different repertoire of primitive operations.
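By way of illustration only (this is a toy sketch of mine, not Newell and Simon's code), the following fragment shows the flavor of such a repertoire: symbol structures are nested lists of atoms, and matching is purely syntactic, with variables bound by substitution.

```python
# Toy syntactic pattern matcher over nested lists of symbolic atoms.
# Variables are strings beginning with "?", bound by substitution; everything
# else must match by strict syntactic identity.

def match(pattern, datum, bindings=None):
    bindings = dict(bindings or {})
    if isinstance(pattern, str) and pattern.startswith("?"):
        if pattern in bindings:                       # already bound: must agree
            return bindings if bindings[pattern] == datum else None
        bindings[pattern] = datum                     # bind the variable
        return bindings
    if isinstance(pattern, list) and isinstance(datum, list):
        if len(pattern) != len(datum):
            return None
        for p, d in zip(pattern, datum):
            bindings = match(p, d, bindings)
            if bindings is None:
                return None
        return bindings
    return bindings if pattern == datum else None     # complete syntactic identity

print(match(["loves", "?x", "mary"], ["loves", "john", "mary"]))   # {'?x': 'john'}
print(match(["loves", "?x", "?x"], ["loves", "john", "mary"]))     # None: no match
```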
The next phrase to consider is “standard symbolic atoms.” This high-
lights what kinds of entities the SPSS approach defines its computational
operations to apply to. They are to apply to symbolic expressions whose
parts (atoms) are capable of being given an exact semantic interpretation in
terms of the concepts and relations familiar to us in daily, or at any rate
public, language. These are words (atoms) such as “table,” “ball,” “loves,”
“orbit,” “electron,” and so forth. Some styles of connectionism and many
more conventional models (e.g., those of computational linguistics) involve
a radical departure from the use of standard symbolic atoms. Since this
contrast looms quite large in what follows, it will be expanded upon in
section 5 below.
Finally, the locution “direct and necessary and sufficient means for gen-
eral intelligent action” is intended to capture a claim of architectural suffi-
ciency. In effect, the claim is that a strong physical symbol system, as just
defined, will be capable of genuine intelligent action. That is, such a ma-
cessors, a series of programs that aim to simulate and explain the process of
scientific discovery (Langley 1979; Simon 1979; Simon 1987; Langley, et al.
1987).
BACON sets out to induce scientific laws from bodies of data. It takes
observations of the values of variables and searches for functions relating
the values of the different variables. Along the way it may introduce new
variables standing for the ratio of the value of the original variables. When
it finds an invariant, a constant relation of the values of different variables,
it has (in some sense) discovered a scientific law. Thus, by following simple
heuristics of the kind a person might use to seek relations among the data
(“try the simple relations first,” “treat nonconstant products of ratios be-
tween variables as new variables,” etc.), BACON was able to generate from
Kepler's data “ratios of [successive] powers of the radii of the planets’
orbits to [successive] powers of their periods of revolution, arriving at the
invariant D³/P² (Kepler's third law), after a search of a small number of
possibilities,” (Simon 1979, 1088). Similarly, BACON arrived at Ohm's law
by noticing that the product of electrical current and resistance is a constant.
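A toy sketch (mine, not Langley and Simon's program) may make this style of invariant hunting concrete. The data and tolerance are illustrative, and the cube of the radius and the square of the period are supplied by hand, a simplification that already concedes part of the point made below about pre-digested representations:

```python
# Toy invariant finder: look for a ratio or product of two data columns that
# is (nearly) constant across all observations and so counts as a candidate law.

def nearly_constant(xs, tol=0.02):
    mean = sum(xs) / len(xs)
    return max(abs(x - mean) for x in xs) <= tol * abs(mean)

def find_invariant(columns):
    names = list(columns)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            ratio = [x / y for x, y in zip(columns[a], columns[b])]
            product = [x * y for x, y in zip(columns[a], columns[b])]
            if nearly_constant(ratio):
                return f"{a}/{b}"
            if nearly_constant(product):
                return f"{a}*{b}"
    return None

# Venus, Earth, Mars: orbital radius in astronomical units, period in years.
radius = [0.723, 1.000, 1.524]
period = [0.615, 1.000, 1.881]
columns = {"D^3": [r ** 3 for r in radius], "P^2": [p ** 2 for p in period]}
print(find_invariant(columns))   # -> "D^3/P^2", Kepler's third law
```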
Now for a few comments on BACON (the point of some of these won't
be clear until subsequent chapters, but patience is a virtue). First, BACON
makes its “discoveries” by working on data presented in notational formats
(e.g., measures of resistance, periods of planetary revolution) that represent
the fruits of centuries of human labor. Manipulating these representations
could be the tip of the iceberg; creating them and understanding them may
constitute the unseen bulk. I say a little more about this in chapters 6 and
7. For now, simply note that BACON and other programs like AM and
EURISKO and MYCIN (below) help themselves to our high-level repre-
sentational formalism. In a recent work Langley et al. (1987, 326) are sensi-
tive to the problem of creating new representational formalisms. But they
insist that such problems can be tackled within the architectural paradigm
associated with the SPSS hypothesis.
Second and relatedly, the knowledge and heuristics that BACON de-
ploys are coded rather directly from the level of thought at which we con-
sciously introspect about our own thinking. This is evident from Simon’s
(1987) statement that he relies heavily on human protocols, laboratory
notebooks, etc. BACON thus simulates, in effect, the way we reason when
we are conscious of trying to solve a problem and it uses kinds of heuristics
that with some effort we might explicitly formulate and use as actual,
practical rules of thumb. In chapters 5 to 9 I shall conjecture that this kind
of thought is a recent overlay on more primitive, instantaneous processes
and that though modeling such thought may constitute a psychological
theory of such conscious reasoning, it could not serve on its own to
instantiate any understanding whatsoever. This level of modeling is com-
mon to much but not all contemporary work in AI, including work in
expert systems and qualitative reasoning (see, e.g., the section “Reasoning
about the Physical World” in Hallam and Mellish 1987). Thus the MYCIN
rule (Shortliffe 1976) for blood injections reads: If (1) the site of the culture
is blood, (2) the gram stain of the organism is gramneg, (3) the morphology
of the organism is rod, and (4) the patient is a compromised host, then there
is suggestive evidence that the identity of the organism is pseudomonas-
aeruginosa (from Feigenbaum 1977, 1014).
Likewise, BACON’s representation of data was at the level of attribute-
value pairs, with numerical values for the attributes. The general character
of the modeling is even more apparent in programs for qualitative-law
discovery: GLAUBER and STAHL. GLAUBER applies heuristic rules to
data expressed at the level of predicate-argument notation, e.g., “reacts
[inputs (HCl, NH₃) outputs (NH₄Cl)],” and STAHL deploys such heur-
istics as “identify components: If a is composed of b and c, and a is composed
of b and d, and neither c contains d nor d contains c, then identify c with d.”
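By way of illustration (the encoding is mine, not Shortliffe's or Langley et al.'s), rules of this kind are naturally written as structures whose atoms ("blood," "gramneg," "rod") carry their ordinary, public-language interpretation, which is what makes such systems semantically transparent in the sense of chapter 1, section 5:

```python
# The MYCIN rule above, encoded as a simple condition-action structure, and
# STAHL's "identify components" heuristic as a procedure over composition facts.

mycin_rule = {
    "if": [("site-of-culture", "blood"),
           ("gram-stain", "gramneg"),
           ("morphology", "rod"),
           ("patient", "compromised-host")],
    "then": ("identity-of-organism", "pseudomonas-aeruginosa", "suggestive"),
}

def stahl_identify_components(composed_of, contains):
    """If a is composed of b and c, and a is composed of b and d, and neither
    c contains d nor d contains c, then identify c with d.
    composed_of: set of (whole, part, part) triples; contains: set of (x, y)."""
    identified = set()
    for (a1, b1, c) in composed_of:
        for (a2, b2, d) in composed_of:
            if (a1 == a2 and b1 == b2 and c != d
                    and (c, d) not in contains and (d, c) not in contains):
                identified.add(frozenset((c, d)))
    return identified

# Illustrative (not historical) composition facts:
facts = {("vitriol", "acid", "iron"), ("vitriol", "acid", "iron-calx")}
print(stahl_identify_components(facts, set()))
# -> {frozenset({'iron', 'iron-calx'})}: identify the two components
```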
Third, BACON uses fairly slow serial search, applying its heuristics one
at a time and assessing the results. Insofar as BACON relies on produc-
tions, there is an element of parallelism in the search for the currently
applicable rule. But only one production fires at a time, and this is the
seriality I have in mind. Serial behavior of this kind is characteristic of slow,
conscious thought. And Hofstadter (1985, 632) reports Simon as asserting,
“Everything of interest in cognition happens above the 100 millisecond
level—the time it takes you to recognise your mother.” Hofstadter disagrees
vehemently, asserting that everything of interest in cognition takes place
below the 100-millisecond level. My position, outlined in detail in chapters
5 to 9 below, is sympathetic to Hofstadter’s (and indeed owes a great deal
to it). But I will show that the notion of a dispute over the correct level
of interest here is misplaced. There are various explanatory projects here,
all legitimate. Some require us to go below the 100-millisecond level (or
whatever) while others do not. This relates, in a way I expand on later, to
a problem area cited by Simon in a recent lecture (1987; see also Langley,
et al. 1987, 14-16). Simon notes that programs like BACON are not good
at very ill structured tasks, tasks demanding a great deal of general knowl-
edge and expectations. Thus, though BACON neatly arrives at Kepler's
third law when given the well-structured task of finding the invariants in
the data, it could not come up with the flash of insight by which Fleming
could both see that the mould on his petri dish was killing surrounding
bacteria and recognize this as an unusual and potentially interesting event.
Listening to Simon, one gets the impression that he believes the way to
solve these ill-structured problems is to expand the set of high-level data
and heuristics that a system manipulates in the normal, slow, serial way
(i.e., by creating, modifying, and comparing high-level symbol strings
according to stored rules). Thus, in a recent coauthored book he dismisses
the idea that the processes involved in the flash-of-insight type of dis-
covery might be radically different in computational kind, saying that
the speed and subconscious nature of such events “does not in any way
imply that the process is fundamentally different from other processes of
discovery—only that we must seek for other sources of evidence about its
nature (i.e., subjects’ introspections can no longer help)” (Langley et al.
1987, 329).
The position I develop holds rather that the folk-psychological term
“scientific discovery” encompasses at least two quite different kinds of
processes. One is a steady, Von Neumann-style manipulation of standard
symbolic atoms in a search for patterns of regularity. And this is well
modeled in Simon and Langley’s work. The other is the flash-of-insight
type of recognition of something unusual and interesting. And this, I shall
suggest, may require modeling by a method quite different (though still
computational).
In effect, theorists such as Langley, Simon, Bradshaw, and Zytkow are
betting that all the aspects of human thought will turn out to be depen-
dent on a single kind of computational architecture. That is, an architecture
in which data is manipulated by the copying, reorganizing, and pattern
matching capabilities deployed on list structures by a Von Neumann (serial)
processor. The basic operations made available in such a setup define the
computational architecture it is. Thus, the pattern matching operations
which such theorists are betting on are the relatively basic ones available
in such cases (i.e., test for complete syntactic identity, test for syntactic
identity following variable substitution, and so on). Other architectures (for
example, the PDP architecture discussed in part 2 of this book) provide
different basic operations. In the case of parallel distributed processing
these include a much more liberal and flexible pattern-matching capacity
able to find a best match in cases where the standard SPSS approach would
find no match at all (see especially chapters 6 and 7 below).
Langley, Simon, et al. are explicit about their belief that the symbol-
processing architecture they investigate has the resources to model and
explain all the aspects of human thought. Faced with the worry that the
approach taken by BACON, DALTON, GLAUBER, and STAHL won't
suffice to explain all the psychological processes that make up scientific
discovery, they write, “Our hypothesis is that the other processes of scien-
tific discovery, taken one by one, have [the] same character, so that pro-
grams for discovering research problems, for designing experiments, for
designing instruments and for representing problems will be describable
by means of the same kinds of elementary information processes that are
used in BACON” (1987, 114). They make similar comments concerning the
question of mental imagery (p. 336). This insistence on a single architecture
of thought may turn out to be misplaced. The alternative is to view mind
(2) “In classical models, the principles by which mental states are
transformed, or by which an input selects the corresponding output,
are defined over structural properties of mental representations. Be-
cause classical mental representations have combinatorial structure, it is
possible for classical mental operations to apply to them by reference
to their form.” This means that if you have a certain kind of structured
representations available (as demanded by point 1), it is possible to
define computational operations on those representations so that the
operations are sensitive to that structure. If the structure isn’t there
(i.e., if there is no symbolic representation), you couldn't do it, though
you might make it look as if you had by fixing on a suitable function
in extension. (Quotes are from Fodor and Pylyshyn 1988, 12-13.)
In short, a classical system is one that posits syntactically structured,
symbolic representations and that defines its computational operations to
apply to such representations in virtue of their structure.
The notion of a semantically transparent system is also meant to capture
the spirit of Smolensky’s views on the classical/connectionist divide, as
evidenced in comments like the following.
A symbolic model is a system of interacting processes, all with the
same conceptual-level semantics as the task behavior being explained.
Adopting the terminology of Haugeland (1978), this systematic explan-
ation relies on a systematic reduction of the behavior that involves no
shift of semantic domain or dimension. Thus a game-playing program
is composed of subprograms that generate possible moves, evaluate
them and so on. In the symbolic paradigm these systematic reductions
play the major role in explanation. The lowest level processes in the
systematic reduction, still with the original semantics of the task
domain, are then themselves reduced by intentional instantiation: they
are implemented exactly by other processes with different semantics
but the same form. Thus a move-generation subprogram with game
semantics is instantiated in a system of programs with list-manipulating
semantics. (Smolensky 1988, 11)
Before leaving the subject of STSs, it is worth pausing to be quite
explicit about one factor that is not intended as part of the definition of an
STS. Under the terms of the definition an STS theorist is not committed to
any view of how the system explicitly represents the rules adduced in task
analysis (level 1). Thus, in my example (1), there is no suggestion that the
rule “If (cup and saucer) then (cup)” must itself be explicitly represented by
the machine. A system could be an STS and be hard-wired so as to take
input “cup and saucer” and transform it into output “cup.” According to
STS theory, all that must be explicit is the structured description of the
objects to which the rule is defined to apply. The derivation rules may be
tacit, so long as the data structures they apply to are explicit. On this Fodor
and Pylyshyn rightly insist: “Classical machines can be rule implicit with
respect to their programs. ... What does need to be explicit in a classical
machine is not its program but the symbols that it writes on its tapes (or
stores in its registers). These, however, correspond not to the machine's
rules of state transition but to its data structures” (1988, 61). As an example
they point out that the grammar posited by a linguistic theory need not be
explicitly represented in a classical machine. But the structural descriptions of
sentences over which the grammar is defined (e.g., in terms of verb stems,
subordinate clauses, etc.) must be. Attempts to characterize the classical/
connectionist divide by reference to explicit or nonexplicit rules are thus
shown to be in error (see also Davies, forthcoming).
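A toy contrast (mine, not Fodor and Pylyshyn's) may help fix the point. Both versions below explicitly token the structured description "cup and saucer"; only the first explicitly represents the rule, yet both count as classical in the relevant sense:

```python
# Rule-explicit version: the derivation rule is itself a data structure that
# the program consults.
rules = [(("cup", "and", "saucer"), ("cup",))]

def rule_explicit(datum):
    for antecedent, consequent in rules:
        if datum == antecedent:
            return consequent
    return datum

# Rule-implicit version: the same transition is hard-wired into the procedure;
# only the structured description it applies to is explicitly tokened.
def rule_implicit(datum):
    return ("cup",) if datum == ("cup", "and", "saucer") else datum

print(rule_explicit(("cup", "and", "saucer")))   # ('cup',)
print(rule_implicit(("cup", "and", "saucer")))   # ('cup',)
```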
6 Functionalism
While I am setting the stage, let me bring on a major and more straight-
forwardly philosophical protagonist, the functionalist. The functionalist is
in many ways the natural bedfellow of the proponent of the physical-
symbol-system hypothesis. For either version of the physical-symbol-system
hypothesis claims that what is essential to intelligence and thought is a
certain capacity to manipulate symbols. This puts the essence of thought at
a level independent of the physical stuff out of which the thinking system
is constructed. Get the symbol manipulating capacities right and the stuff
does not matter. As the well-known blues number has it, “It ain’t the meat,
it’s the motion.” The philosophical doctrine of functionalism echoes this
sentiment, asserting (in a variety of forms) that mental states are to be
identified not with, say, physicochemical states of a being but with more
abstract organizational, structural, or informational properties. In Putnam's
rousing words “We could be made of swiss cheese and it wouldn’t matter”
philo-
(1975, 291). Aristotle, some would have it, may have been the first
sophical functionalist. Though there seems to be a backlash now under-
way (see, e.g., Churchland 1981 and Churchland 1986), the recent popu-
larity of the doctrine can be traced to the efforts of Hilary Putnam (1960,
1967), Jerry Fodor (1968), David Armstrong (1970) and, in a slightly dif-
ferent vein, Daniel Dennett (1981) and William Lycan (1981). I shall not
attempt to do justice to the nuances of these positions here. Instead, I shall
simply characterize the most basic and still influential form of the doctrine,
leaving the search for refinements to the next chapter. First, though, a
comment on the question to which functionalism is the putative answer.
In dealing with the issues raised in this book, it seemed to me to be
essential to distinguish the various explanatory projects for which ideas
about the mind are put forward. This should become especially clear in the
closing chapters. For now I note that one classical philosophical project has
been to formulate and assess schemas for a substantial theory of the essence
of the mental. The notion of essence here may be unpacked as the search
for the necessary and sufficient conditions for being in some mental state.
In this restricted sense a theory of mind should tell us what it is about a
being that makes it true to assert of that being that it is in a given mental
state (e.g., believing it is about to rain, feeling sad, feeling anxious, suffering
a stabbing pain in the left toe, and so forth).
For the moment let me simply assert that Newell and Simon’s intended
project (in common with a lot of workers in Al) is psychological explana-
tion. Pending a fuller account of psychological explanation, it is not obvious
that the project of psychological explanation is identical with the project of
seeking the essence of the mental in the sense just sketched. Newell and
Simon’s talk of the physical-symbol-system hypothesis as an account of the
necessary and sufficient conditions of intelligent action effectively iden-
tifies the tasks. It follows that having a full psychological explanation in
their sense would put you in a position to re-create or instantiate the
analyzed mental state in a machine (barring practical difficulties). I shall
later argue for a firm distinction between these projects of psychological
explanation and psychological instantiation.
Functionalism, then, is a sketch or schema of the kind of theory that,
when filled in, will tell us in a very deep sense what it is to be in some
mental state. The most basic form of such a theory is known as Turing-
machine functionalism. Not surprisingly, the doctrine takes its cue from
Turing's conception of the formal properties sufficient to guarantee that a
task is computable by a mechanism, regardless of the physical stuff out of
which the mechanism was made (see section 2 above).
In Putnam's hands (1960, 1967) functionalism came to suggest a theory
of mind (in the sense of a schema for a substantial theory of the essence of
the mental) that was apparently capable of avoiding many of the difficulties
that beset other such proposals. Very sketchily, the situation was some-
thing like this. Dualism (the idea that mind is a ghostly kind of nonmaterial
substance) had been discredited as nonexplanatory mysticism and was briefly
displaced by behaviorism. Behaviorism (Ryle 1949) held that mental states
were identical with sets of actual and counterfactual overt behaviors and
that inner states of the subject, though no doubt causally implicated in such
behaviors, were not theoretically important to understanding what it is to
be in certain mental states.
This dismissal of the importance of internal states (for a philosophical
theory of mind) was resisted by the first wave of identity theories which
claimed that mental states were identical with brain processes (Smart 1959).
But the identity theory, if one took the claims of its proponents rather
literally (more literally, I am inclined to think, than they ever intended) lay
least prima facie reason to doubt that such a system could have any mental
states at all. Could that overall system really constitute, say, an agent in
pain? Surely not. Surely there is nothing which it is like, either nice or nasty,
to be such a system. It has no phenomenal or subjective experience. Or
as philosophers put it, the system has no qualia (raw feels, real subjectivity).
Hence, Block dubs this argument the absent-qualia argument. I suspect that
this argument loses much of its force once the functionalist hypothesis is
firmly disassociated from both STS- and SPSS-type approaches (see chap-
ters 2 and 10). But for the moment, I leave the discussion of functionalism
on that quizzical note.
The players are on stage. We have met the physical-symbol-system
hypothesis and its methodological cousin, semantically transparent cog-
nitive modeling. We have met a computationally inspired philosophical
model of mind (functionalism) and hinted at its difficulties. The disembodied
presence of connectionism awaits future flesh. But we should first pause to
see what philosophers have made of the story so far.
Chapter 2
Situation and Substance
resolve problems concerning the correct referents of words like “it” and
“the pyramid” (there were many pyramids) by deploying its knowledge of
the domain. The output of the program was relatively impressive, but its
theoretical significance as a step on the road to modeling human under-
standing was open to doubt. Could a suitable extension of the microworld
strategy really capture the depth and richness of human understanding?
Dreyfus thought not, since he saw no end to the number of such micro
competences required to model even a child’s understanding of a real-
world activity like bargaining.
Consider the following example (due to Papert and Minsky and cited in
Dreyfus 1981, 166).
Janet: “That isn’t a very good ball you have. Give it to me and I'll
give you my lollipop.”
For the set of micro theories needed to understand Janet’s words, Minsky
and Papert suggest a lengthy list of concepts that includes: time, space,
things, people, words, thoughts, talking, social relations, playing, owning,
eating, liking, living, intention, emotions, states, properties, stories, and
places. Each of these requires filling out. For example, anger is a state
caused by an insult (and other causes) and results in noncooperation (and
other effects). And this is just one emotion. This is a daunting list and one
that, as Dreyfus points out, shows no obvious signs of completion or even
completability. Dreyfus thus challenges the micro theorist’s faith that some
finite and statable set of micro competences can turn the trick and result in
a computer that really knows about bargaining or anything else. Such faith,
he suggests, is groundless, as AI has failed “to produce even the hint of
a system with the flexibility of a six-month old child” (Dreyfus 1981,
173). He thinks that “the special purpose techniques which work in con-
text-free, gamelike micro-worlds may in no way resemble general purpose
human and animal intelligence.”
Similar criticisms have been applied to Winston's (1975) work in com-
puter vision and Minsky’s (1974) frame-based approach to representing
everyday knowledge. A frame is a data structure dealing with a stereo-
typical course of events in a given situation. It consists of a set of nodes
and relations with slots for specific details. Thus a birthday-party frame
would consist of a rundown of a typical birthday-party-going sequence
and might refer to cakes, candles, and presents along the way. Armed with
such a frame, a system would have the common sense required to assume
that the cake had candles, unless told otherwise.
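A minimal sketch (mine, not Minsky's notation) of such a frame: a stereotyped structure of slots whose defaults stand in unless observation overrides them. The slot names and defaults are purely illustrative:

```python
# A birthday-party frame: a stereotyped structure of slots with defaults.
birthday_party_frame = {
    "event": "birthday-party",
    "slots": {
        "cake":     {"default": "cake with candles"},
        "presents": {"default": "brought by guests"},
        "games":    {"default": "party games"},
        "guests":   {"default": "children"},
    },
}

def fill_frame(frame, observations):
    """Instantiate the frame: observed values override the slot defaults."""
    return {slot: observations.get(slot, spec["default"])
            for slot, spec in frame["slots"].items()}

# Told only about the presents, the system falls back on the defaults and so
# assumes, for instance, that the cake had candles.
print(fill_frame(birthday_party_frame, {"presents": "a toy train"}))
```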
But as Dreyfus is not slow to point out, the difficulties here are still
enormous. First, the frames seem unlikely ever to cover all the contin-
gencies that common sense copes with so well. (Are black candles on a
birthday cake significant?) Second, there is the problem of accessing the
right frame at the right time. Humans can easily make the transition from,
say, a birthday-party frame to an accident frame or to a marital-scene frame.
Is this done by yet more, explicit rules? If so, how does the system know
when to apply these? How does it tell what rule is relevant and when? Do
we have a regress here? Dreyfus thinks so. To take one final example,
consider the attempt to imbue an AI system with knowledge of what it is
for something to be a chair. To do so, Minsky suggests, we must choose a
set of chair-description frames. But what does this involve? Presumably not
a search for common physical features, since chairs, as Dreyfus says, come
in all shapes and sizes (swivel chairs, dentist's chairs, wheelchairs, beanbag
chairs, etc.). Minsky contemplates functional pointers, e.g., “something one
can sit on.” But that is too broad and includes mountains and toilet seats.
And what of toy chairs, and glass chairs that are works of art? Dreyfus'
suspicion is that the task of formally capturing all these inter-linked criteria
and possibilities in a set of context-free facts and rules is endless. Yet, as
Dreyfus can hardly deny, human beings accomplish the task somehow. If
Dreyfus doubts the cognitivist explanation, he surely owes us at least a
hint of a positive account.
where real intelligence is concerned, it ain't what you know, it’s the way
you know it. Of course, there could be a link with the points about bodies
and so forth even here. There may be some things we know about largely
by our own awareness of our bodily and muscular responses (Dreyfus cites
swimming as an example). Perhaps a machine lacking our kind of body but
equipped with some kind of mechanism for holistic similarity processing
could even so know less about such things than we can. Nonetheless, on
the reading of Dreyfus I am proposing (I have no idea whether he would
endorse it), such a machine would be flexible and commonsensical within
its own domains, and as such it would be at least a candidate for a genuine
knower, albeit not quite a human one. This is in contradistinction to any
system running a standard cognitivist program. I shall expand on this point
in subsequent chapters.
Of Dreyfus’s two points (the one about the social and embodied roots
of human knowing, the other about the need for flexible, commonsense
knowledge) it is only the second which I think we can expect to bear any
deep theoretical weight. But at least that point is suggestive. So let us keep
it in our back pockets as we turn to a rather different criticism of AI and
cognitive science.
then asked, “And did the man eat the hamburger?” it can answer “yes,”
because it apparently knows about restaurants. Searle believes, I think
rightly, that the computer running this program does not really know
about restaurants at all, at least if by “know” we mean anything like
“understand.” The Chinese-room example is constructed in part to dem-
onstrate this. But Searle believes his arguments against that sort of
computational model of understanding are also arguments against any
computational model of understanding.
We are asked to imagine a human agent, an English monolinguist, placed
in a large room and given a batch of papers with various symbols on it.
These symbols, which to him are just meaningless squiggles identifiable
only by shape, are in fact the ideograms of the Chinese script. A second
batch of papers arrives, again full of ideograms. Along with it there arrives
a set of instructions in English for correlating the two batches. Finally, a
third batch of papers arrives bearing still further arrangements of the same
uninterpreted formal symbols and again accompanied by some instructions
in English concerning the correlation of this batch with its predecessors.
The human agent performs the required matchings and issues the result,
which I shall call “the response.” This painstaking activity, Searle argues,
corresponds to the activity of a computer running Schank’s program. For
we may think of batch 3 as the questions, batch 2 as the story, and batch 1
as the script or background data. The response, Searle says, may be so
convincing as to be indistinguishable from that of a true Chinese speaker.
And yet, and this is the essential point, the human agent performing
the correlations understands no Chinese, just as, it would now appear,
a computer running Schank’s program understands no stories. In each case
what is going on is the mere processing of information. If the intuitions
prompted by the Chinese-room example are correct, understanding must
involve something extra. From this Searle concludes that no computer
can ever understand merely by “performing computational operations on
formally specified elements.” Nor, consequently, can the programs that
determine such computational operations tell us anything about the special
nature of mind (Searle 1980, 286).
Ramming the point home, Searle asks us to compare the understanding
we (as ordinary English speakers) have of a story in English against the
“understanding” the person manipulating the formal symbols in the Chinese
room has of Chinese. There is, Searle argues, no contest. “In the Chinese
case I have everything that Artificial Intelligence can put into me by way
of a program and I understand nothing; in the English case I under-
stand everything and there is so far no reason at all to suppose that my
understanding has anything to do with computer programs—i.e., with
computational operations on purely formally specified elements” (Searle
1980, 286). In short, no formal account can be sufficient for understanding,
since “a human will be able to follow the formal principles without under-
standing anything” (p. 287). And there is no obvious reason to think
that satisfying some formal condition is necessary either, though as Searle
admits, this could (just conceivably) yet prove to be the case. The formal
descriptions, Searle thinks (p. 299), seem to be capturing
just the shadows
of mind, shadows thrown not by abstract computational sequences but by
the actual operation of the physical stuff of the brain.
I shall argue that Searle is simply wrong thus completely to shift the
emphasis away from formal principles on the basis of a demonstration that
the operation of a certain kind of formal program is insufficient for inten-
tionality. The position to be developed below and in chapters 3 and 5 to
11 views as a necessary though perhaps insufficient condition of real
understanding the instantiation of a certain kind of formal description that
is far more microstructural than the descriptions of the SPSS hypothesis.
Undermining Searle’s strongest claims, however, is no simple matter, and
we must proceed cautiously. The best strategy is to look a little more
closely at the positive claims about the importance of the nonformal,
biological stuff.
support thought. Well, (1) may be right (see chapter 3), though not for
the reasons cited in (2). But even so, (2) is surely not that obscure a claim.
Searle cites the less puzzling case of photosynthesis. By focusing on this,
we may begin to unscramble the chaos.
Photosynthesis, Searle suggests, is a phenomenon dependent on the
actual causal properties of certain substances. Chlorophyll is an earthly
example. But perhaps other substances found elsewhere in the universe can
photosynthesize too. Similarly, Martians might have intentionality, even
though (poor souls) their brains are made of different stuff from our own.
Suppose we now take a formal chemical theory of how photosynthesis
occurs. A computer could then work through the formal description. But
would actual photosynthesis thereby take place? No, it’s the wrong stuff,
you see. The formal description is doubtless a handy thing to have. But if
it’s energy (or thought) you need, you had better go for the real stuff. In its
way, this is fair enough. A gross formal theory of photosynthesis might
consist of a single production, “If subjected to sunlight, then produce
energy.” A fine-grained formal theory might take us through a series of
microchemical descriptions in which various substances combine and cause
various effects. Gross or fine-grained, neither formalism seems to herald the
arrival of the silicon tulip. Market gardening has nothing to fear from
simulated gardening as yet.
Now, there are properties of plants that are irrelevant to their photo-
synthetic capacities, e.g., the color of blooms, the shape of leaves (within
limits), the height off the ground, and so on. The questions to ask are: What
do the chemical properties buy for the plant, and what are the properties
of the chemicals by which they buy it? The human brain is made out of
a certain physical, chemical stuff. And perhaps in conjunction with other
factors, that stuff buys us thought, just as the plant’s stuff buys it energy.
So, what are the properties of the physical chemical stuff of the brain that
buy us thought? Here is one answer (not Searle's or that of supporters
of Searle’s emphasis on stuff, e.g., Maloney [1987]): the vast structural
variability in response to incoming exogenous and endogenous stimuli that
the stuff in that arrangement provides.*
Suppose this were so. Might it not also be true that satisfying some kinds
of formal description guaranteed the requisite structural variability and that
satisfying other kinds of formal description did not? Such a state of affairs
seems not only possible but pretty well inevitable. But if so, Searle’s
argument against the formal approach is, to say the least, inconclusive. For
the only evidence against the claim that the formal properties of the brain
buy it structural variability, which in turn buys it the capacity to sustain
thought, is the Chinese-room thought experiment. But in that example
the formal description was at a very gross level, in line with the SPSS
hypothesis of chapter 1, which in this case amounts to rules for correlating
6 Microfunctionalism
The defence of a formal approach to mind mooted above can easily be
extended to a defence of a form of functionalism against the attacks
mounted by Block (see chapter 1, section 5). An unsurprising result, since
Searle’s attack on strong Al is intended to cast doubt on any purely formal
account of mind, and that attack, as we saw, bears a striking resemblance to
the charges of excessive liberalism and absent qualia raised by Block.
Functionalism, recall, identified the real essence of a mental state with
an input, internal state transition, and output profile. Any system with
the right profile, regardless of its size, nature and components, would
occupy the mental state in question. But unpromising systems (like the
population of China) could, it seemed, be so organized. Such excessive
liberalism seemed to undermine functionalism: surely the system comprising
the population of China would not itself be a proper subject of experience.
The qualia (subjective experience or feels) seem to be nowhere present.
It is now open to us to respond to this charge in the same way we just
responded to Searle. It all depends, we may say, on where you locate the
grain of the input, internal state transitions, and output. If you locate it at
the gross level of a semantically transparent system, then we may indeed
doubt that satisfying that formal description is a step on the road to being
a proper subject of experience. At that level we may expect absent qualia,
excessive liberalism, and all the rest, although this needn't preclude formal
accounts at that level being good psychological explanations in a sense to
be developed later (chapters 7 and 10). But suppose our profile is much
finer-grained and is far removed from descriptions of events in everyday
language, perhaps with internal-state transitions specified in a mathematical
formalism rather than in a directly semantically interpretable formalism.
Then it is by no means so obvious (if it ever was—see Churchland and
Churchland 1981) either that a system made up of the population of China
could instantiate such a description or that if it did, it would not be a proper
subject of the mental ascriptions at issue (other circumstances permitting—
see chapter 3). My suggestion is that we might reasonably bet on a kind
of microfunctionalism, relative to which our intuitions about excessive
liberalism and absent qualia would show up as more clearly unreliable.
Such a position owes something to Lycan’s (1981) defence of function-
alism against Block. In that defense he accuses Block of relying on a kind of
gestalt blindness (Lycan’s term) in which the functional components are
made so large (e.g., whole Chinese speakers) or unlikely (e.g., Searle's
beer cans) that we rebel at the thought of ascribing intentionality to the
giant systems they comprise. Supersmall beings might, of course, have the
same trouble with neurons. Lycan, however, then opts for what he calls
a homuncular functionalism, in which the functional subsystems are iden-
tified by whatever they may be said to do for the agent.
Microfunctionalism, by contrast, would describe at least the internal
functional profile of the system (the internal state transitions) in terms
far removed from such contentful, purposive characterizations. It would
delineate formal (probably mathematical) relations between processing units
in a way that when those mathematical relations obtain, the system will be
capable of vast, flexible structural variability and will have the attendant
emergent properties. By keeping the formal characterization (and thereby
any good semantic interpretation of the formal characterization) at this
fine-grained level we may hope to guarantee that any instantiation of such
a description provides at least potentially the right kind of substructure to
support the kind of flexible, rich behavior patterns required for true under-
standing. These ideas about the right kind of fine-grained substructures
will be fleshed out in later chapters.
Whether such an account is properly termed a species of functionalism,
as I’ve suggested, is open to some debate. I have opted for a broad notion
of functionalism that relates the real essence of thought and intentionality
to patterns of nonphysically specified internal state transitions suitable for
mediating an input-output profile in a certain general kind of way. This in
effect identifies functionalism with the claim that structure, not the stuff,
counts and hence identifies it with any formal approach to mind. On
that picture, microfunctionalism is, as its name suggests, just a form of
functionalism, one that specifies internal state transitions at a very fine-
grained level.
Some philosophers, however, might prefer to restrict the “functionalism”
label to just those accounts in which (1) we begin by formulating, for each
individual mental state, a profile of input, internal state transitions, and
output in which internal state transitions are described at the level of
beliefs, desires, and other mental states of folk psychology (see the next
chapter); (2) we then replace the folk-psychological specifications by some
formal, nonsemantic specification that preserves the boundaries of the
folk-psychological specifications. Now there is absolutely no guarantee
that such boundaries will be preserved in a microfunctionalist account
(see the next chapter). Moreover, though it may, microfunctionalism need
not aspire to give a functional specification of each type of mental state.
(How many are there anyway?) Instead, it might give an account of the
kind of substructure needed to support general, flexible behavior of a kind
that makes appropriate the ascription to the agent of a whole host of
folk-psychological states. For these reasons, it may be wise to treat “micro-
functionalism” as a term of art and the defence of functionalism as a defence
of the possible value of a fine-grained formal approach to mind. I use the
terminology I do because I believe the essential motivation of function-
alism lies in the claim that what counts is the structure, not the stuff (this is
consistent with its roots—see Putnam 1960, 1967, 1975b). But who wants
to fight over a word? Philosophical disquiet over classical cognitivism,
I conclude, has largely been well motivated but at times overambitious.
Dreyfus and Searle, for example, both raise genuine worries about the
kind of theories that seek to explain mind by detailing computational
manipulations of standard symbolic atoms. But it is by no means obvious
that criticisms that make sense relative to those kinds of computational
models are legitimately generalized to all computational models. The claim
that structure, not stuff, is what counts has life beyond its classical cogni-
tivist incarnation, as we shall see in part 2.
Chapter 3
Folk Psychology, Thought, and Context
1 A Can of Worms
Imagine a can of worms liberally spiced with red herrings. Such, I believe,
is the continuing debate over the role of folk-psychology in cognitive
science. What faces us is a convulsing mass of intertwined but ill-defined
issues like:
* Is commonsense talk of our mental lives (folk-psychology) a proto-
scientific theory of the inner wellsprings of human action?
* Should we expect a neat, boundary-preserving reduction of folk-
psychological talk to the categories of a scientific psychology?
* Failing such reduction, can cognitive science properly claim to be
studying the mind?
* Conversely, could progress in cognitive science force the abandon-
ment or revision of ordinary folk-psychological talk of beliefs and
desires? Is it likely to?
There is a sprawling literature here: Churchland 1981, Stich 1983, Fodor
1980a, Searle 1983, McGinn 1982, Millikan 1986, Pettit and McDowell
1986, and Clark 1987a. And that barely scratches the surface. My strategy
will be to divide and selectively ignore.
It may be helpful briefly to gesture at the relation of these issues to
the overall themes of my discussion. One major goal of this book is to
develop a framework in which some formal approaches to mind can be seen
as plausible, despite the kinds of objections raised in the last chapter, and
philosophically respectable. This demand of philosophical respectability
requires me to be quite canny about the precise way in which formal
or computational considerations are meant to illuminate the mind. One
stumbling block here is the thought that the very idea of mind is intimately
tied up with our ordinary talk of mental states like belief, desire, hope, and
fear. These mental states, according to some of the arguments treated
below, will necessarily (on some accounts) or probably (on others) elude
analysis in terms of the internal states that cognitive science eventually
endorses. So whatever else cognitive science does, it won't succeed (so the
argument goes) in illuminating the nature of mind. If you accept this (which
you need not do), you might either conclude: so much the worse for the
commonsense idea of mind (Churchland 1981; Stich 1983), or so much
the worse for the claims of cognitive science to be investigating the nature
of mind (see various articles in Pettit and McDowell 1986). It thus falls to
any would-be apologist for cognitive science to delve at least some way
into the vermian heap. So here goes.
There is surely something very wrong with this picture. Can we really
imagine that our ancestors sat around a campfire and just speculated that
human behavior would be usefully explained with ideas of belief and
desire? Surely not. Some such understanding, though not verbally expressed,
seems more likely to be a prerequisite of a highly organized society of
language users than a function of their speculations. Moreover, what makes
Churchland's comments criticisms of folk psychology, as opposed to obser-
vations about its nature? There seem to be all sorts of assumptions here
about the role of ordinary ascriptions of mental states in our lives. Are such
ascriptions really just a tool for explaining and predicting others' bodily
movements? And even if in some sense it is such a tool, is it really trying
states are identical with the first speaker) who likewise says “There is water
in the lake.” Earth and twin earth are qualitatively identical except that
water on earth is H₂O while water on twin earth is XYZ, a chemical
difference irrelevant to all daily macroscopic water phenomena. Do the two
speakers mean the same thing by their words? It has begun to seem that
they cannot. For many philosophers hold that the meaning of an utterance
must determine the conditions under which the utterance is true. But the
utterances on earth and twin earth are made true or false by the presence or
absence of H₂O and XYZ respectively. So if meaning determines truth
conditions, the meaning of statements involving natural kind terms (water,
gold, air, and so on) can't be fully explained simply by reference to
narrowly specifiable states of the subject. And what goes thus for natural
kind terms also goes (for similar reasons) for demonstratives (“that table,”
“the pen on the sofa,” etc.) and proper names. The lesson, as Putnam would
have it, is that “meanings just ain’t in the head.”
At this point, according to Pettit and McDowell (1986, 3), we have two
options. (1) We could adopt a composite account of meaning and belief
in which content depends on both an internal psychological component
(common to the speakers on earth and twin Earth) and an external world-
involving component (by hypothesis, not constant across the two earths).
Or (2) we could take such cases as calling into question the very idea that
the psychological is essentially inner and hence as calling into question
even the idea of a purely inner and properly psychological component of
mental states, as advocated in (1). As Pettit and McDowell (1986, 3) put it,
“No doubt what is ‘in the head’ is causally relevant to states of mind. But
must we suppose that it has any constitutive relevance to them?” Of
course, we do not have to take the twin earth cases in either of the ways
mentioned above. For one thing, they constitute an argument only if we
antecedently accept that meaning should determine truth conditions. And
even then there might be considerable room for maneuver (see, e.g., Searle
1983; Fodor 1986). In fact, I suspect that as arguments for content that
involves the world, the twin-earth cases are red herrings. As Michael
Morris has suggested in conversation, they serve more to clarify the issues
than to argue for a particular view. Nonetheless, the idea that contentful
states may essentially involve the world has much to recommend it (see
especially the discussion of demonstratives in Evans 1982).
This, however, is not the place to attempt a very elusive argument.
Instead, I propose to conduct a conditional defence of cognitive science.
Even if all content turned out to radically involve the world (option (2)
above), that in itself need not undermine the claim of cognitive science
to be an investigation that is deeply (though perhaps not constitutively)
relevant to the understanding of mind. In short, accepting option (2) (i.e.,
rejecting the idea that the psychological is essentially inner) does not
commit us to the denial of conceptual relevance, implied in the quoted
passage from Pettit and McDowell.
The notion of constitutive relevance will be amplified shortly. First,
though, a word about an argument that (if it worked—which it doesn’t)
would make any defence of cognitive science against the bogey of broad,
world-involving content look strictly unnecessary. The argument (adapted
from Hornsby 1986, 110) goes like this:
Two agents can differ in mental state only if they differ somehow in
their behavioral dispositions.
A behavioral difference (i.e., a difference in behavioral dispositions)
requires some internal physical difference.
So there can't be a difference in mental states without some cor-
responding difference of internal physical states (contrary to some
readings of the twin-earth cases).
In other words, the content of mental states has to be narrowly determined
if we are to preserve the idea that a difference in behavioral disposition
(upon which mentality is said to supervene) requires a difference of inner
constitution.
This argument, as Hornsby (1986, 110) points out, trades on a fluctuating
understanding of “behavior.” In the first premise “behavior” means “bodily
movements.” This is clear, since no state of the head can cause you to, e.g.,
throw a red ball or speak to Dr. Frankenstein or sit in that (demonstratively
identified) chair in the real absence—despite appearances, let’s presume—
of a red ball, Dr. Frankenstein, or the chair, respectively. At most, a state of
the head causes the bodily movements that might count, in the right
externally specified circumstances, as sitting in that chair, throwing the red
ball, and so on. In the second premise, however, the appropriate notion
of behavior is not so clearly the narrow one. It may be (and Putnam’s
arguments were meant to suggest that it must be) that the correct ascription
of contentful states to one another is tied up with the actual states of the
surrounding environment. If this is so, then it would be reasonable to think
that since the ascription of contentful states is meant to explain behavior,
behavior itself should be broadly construed. Thus, there could be no
behavior of picking up the red ball in the absence of a red ball (whatever
the appearances to the subject, his bodily movements, etc.). In this sense
the idea of behavior implicated in mental-state ascriptions is more demanding
than the idea of mere bodily movements. In another sense, as Hornsby also
points out (1986, 106-107), it might be less demanding, as fine-grained
differences in actual bodily movement (e.g., different ways of moving our
fingers to pick up the red ball) seem strictly irrelevant to the ascription of
there some more indirect way to achieve both synopsis and conceptual
relevance? I believe there is, but we must tread very carefully.
5 Interlude
“What a curious project!” you may be thinking. “The author proposes to
attempt a defence of the significance of cognitive scientific investigations
against a radical, intuitively unappealing, and inconclusively argued doctrine.
And he proposes to do so not by challenging the doctrine itself, but by
provisionally accepting it, and then turning the aggressor’s blade.” The
reason for this is simple. With or without the broad-content theory, it
looks extremely unlikely that the categories and classifications of folk
psychology will reduce neatly to the categories and classifications of a
scientific account of what is in the head. This is the “failing” that Church-
land and (to a lesser degree) Stich ascribe to folk psychology. I embrace the
mismatch, but not the pessimistic conclusion. For folk psychology may not
be playing the same game as scientific psychology, despite its deliberately
provocative and misleading label. So I take the following to be a very real
possibility: whenever I entertain a thought, it is completely individuated by
a state of my head, i.e., the content of the thought does not essentially
involve the world, but there will be no projectable predicates in a scientific
psychology that correspond to just that thought. By “no projectable
predicate” I mean no predicate (in the scientific description) that is project-
able onto other cases where we rightly say that the being is entertaining
the same thought. Such other cases would include myself at different times,
other humans, animals, aliens, and machines.
Regardless of broad content, I therefore join the cynics in doubting the
scientific integrity of folk psychology as a theory of states of the head. But
I demur at both the move from this observation to the conclusion that
cognitive science, as a theory of states of the head, has no philosophical
relevance to the understanding of mind (Pettit and McDowell) and the move
to the conclusion that folk psychology be eliminated in favor of a scientific
account of states of the head (Churchland).
What I try to develop, then, is more than just a conditional defence of
cognitive science in the face of allegations of broad content. It is also
a defence of cognitive science despite any mismatch between projectable
states of the head and ascriptions of specific beliefs, desires, fears, etc.
Relatedly, it is a defence of belief-desire talk against any failure to carve
nature at internally visible joints. Coping with the broad-content worry is
thus really a fringe benefit associated with a more careful accommodation
of commonsense talk of the mental into a scientific framework. So, now
that we know we are getting value for our money, let's move on.
content ought not to aspire to track neat, projectable states of the head.
For the question must then arise: whose head? According to the present
account, what we are interested in is a very particular kind of under-
standing of the bodily movements of other agents. It is an understanding
of those movements just insofar as they will bear on our own needs and
projects. And it is an understanding that seems to be available any time we
can use talk of the world (as in the “that” clause of a propositional-attitude
ascription) to help us pick out broad patterns in other agents’ behavior.
Take for example the sentence “John believes that Buffalo is in Scotland.”
This thought ascription is useful not because it would help us predict, say,
exactly how John’s feet would move were someone to tell him that his
long-lost cousin is in Buffalo or even when he would set off. Rather, it is
useful because it helps us to predict very general patterns of intended
behavior (e.g., trying to get to Scotland), and because of the nature of our
own needs and interests, these are what we want to know about. Thus,
suppose I have a forged rail ticket to Scotland and I want to sell it. I am not
interested in the fine-grained details of anyone’s neurophysiology. All
I want to know is where to find a likely sucker. If the population included
Martians whose neurophysiology was quite different from our own (per-
haps it involves different formal principles even), it wouldn’t matter a jot,
so long as they were capable of being moved to seek their long-lost
cousins. Thus construed, folk psychology is designed to be insensitive to
any differences in states of the head that do not issue in differences of quite
coarse-grained behavior. It papers over the differences between individuals
and even over differences between species. It does so because its purpose is
to provide a general framework in which gross patterns in the behavior of
many other well-adapted beings may be identified and exploited. The
failure of folk psychology to fix on, say, neurophysiologically well-defined
states of human beings is thus a virtue, not a vice.
connectionist models in part 2, we shall also see how a system can produce
semantically systematic behavior without any internal mirroring of the
semantically significant parts of the sentences we use to describe its
behavior.
8 Churchland Again
We can now return to Churchland’s specific criticisms of folk psychology.
There were three, recall.
* The explanatory power of folk psychology is limited. The exotic,
insane, and very young are left mysterious.
* It is a stagnant and sterile theory that has stayed the same for
a long period of time.
* It fails to integrate neatly with neuroscience.
The last objection is now easily fielded. The failure to find a neat, boundary-
preserving reduction of the categories and claims of folk psychology to
projectible neuroscientific descriptions need not count as a black mark for
folk theory. We may insist that the folk were not even attempting to
theorize about projectible internal states. The concern was rather to isolate
as economically as possible the salient patterns in the behavior of other
agents. Since such patterns may cut across projectible descriptions in a
scientific language dealing solely with internal states, it is to the credit of
folk psychology that it fails to fix on descriptions with neurophysiological
integrity. Nor need failure to deal with the exotic, the young, or the insane
surprise or bother us. For as Stich, for example, clearly sees, the folk-
psychological method exploits commonalities in environments and cognitive
natures to generate an economical and appropriately located grasp of the
salient patterns in the behavior of our peers. If content ascription thereby
breaks down in extreme cases, so be it. That does not impugn its value
as a tool in its intended domain of application. Likewise, stagnation need
cause no loss of sleep. As a tool for its intended purpose, folk-psychological
talk has been well shaped by the constant pressures in favor of a successful
understanding of others. Such pressures may yield the form of a good
solution more strongly than we often believe. The shape of the tool was
forged many centuries ago. Because there are salient regularities in the
environment and because the use of folk-psychological talk has a limited
goal of making the behavior of others intelligible to us to just the degree
necessary to plot their moves in relation to our needs and interests, it is not
surprising that the hard core of such understanding (i.e., ascriptions of
beliefs and desires) has remained relatively constant across temporal and
geographical dimensions.
specifiable kind (see chapters 5 to 9). Just as we might revise our description
of the behavior on receipt of new data about the being’s relation to the
world, so too we might revise it on receipt of new data concerning the
inner cause of the bodily movements involved. To give the standard
example, suppose someone discovered that his neighbor's movements had
all been caused by a giant look-up-tree program with precise descriptions
of outputs in the form “If (input), then (output).” In this case (astronomically
unlikely and maybe even physically impossible) the descriptions of the
neighbor as truly acting are in fact unwarranted, or defeasibly warranted
and now defeated. In the absence of the right, doubly broadly specified
behavior, our earlier ascriptions of psychological states and contents are
likewise defeated.
There are various complications here. For example, it may be
that although the discovery of a certain computational substructure
gives us warrant to withdraw ascriptions of mentality, there is no sub-
structure whose presence is necessary for the ascription of mentality. This
is a very real possibility, and it leaves the constitutive status of the sub-
structural stories uncertain. It seems as if there is logical space between
fully constitutive features (meeting the subtraction principle) and mere-
ly causal supports. That space is filled by, for example, cases in which
we have a set of features (or types of computational substructures) such
that
(1) being a thinker requires deployment of some member of that set,
and
(2) a conceptually visible relation obtains between each individual
member of the set and the warranted ascription of contentful thought
(e.g., the substructures can each be seen to support the flexible actual
and counterfactual behavior that warrants the use of a mentalistic
vocabulary), but
(3) there is no further formal or scientific unity to the set of structures
picked out in (2) (i.e., no metalevel formal or scientific description
capable of meeting the demands of the subtraction principle).
My own suspicion is that the boundary between the constitutive and the
causal is too sharply drawn and that genuine conceptual interest may
accrue to all kinds of cases that fail to pick out relations as strong as
constitutivity.
This issue of constitutivity raises the somewhat thorny problem of
philosophical relevance. Under what conditions would knowledge of the
in-the-head, computational substructure of thought count as a contribution
to a properly philosophical understanding of mind? In point of fact, I feel a
strong resistance to such a form of question. If one goal of philosophical
[Diagram omitted: relations between the levels discussed below, labeled "holistically ascribed on basis of" and "constitutively dependent on both".]
Note that the relation between levels (1) and (2) is holistic. We are warranted
in ascribing groups of mental states on the basis of overall behavior. The
relation between (3b) and (1) is far from a neat, boundary-respecting iso-
morphism. Such isomorphism is sabotaged by both the role of (3a) and the
holistic nature of the relation between (1) and (2).
ordinary contentful talk. For such talk is not able (nor, I would add, in-
tended) to be sensitive to the relevant internal causes, yet it is sensitive to
irrelevant external states of affairs. The contrast is between putting tokens
of ordinary contentful talk back into the head (classical cognitivism) and
seeking an account of how what is in the head enables the holistic
ascription of such contents to the subject in the setting of the external
world. In sum, the less plausible we find folk-psychological ideas as a
scientific theory of the internal causes of behaviour, the less plausible the
classical cognitivist program should seem, since it relies heavily on that
level of description. Where goes the classical cognitivist, there goes the
standard functionalist also. For standard functionalism (not the microfunc-
tionalism described in chapter 2, section 6) is committed to filling in the
following schema for each individual mental state.
Mental state p is any state of a system that takes x, y, z (environmental
effects) as inputs, gives l, m, n (bodily movements, vocalizations, etc.)
as outputs, and fulfils internal state transitions g, h, i.
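To caricature the schema in code (the state names below are hypothetical placeholders of my own, not drawn from any actual theory), standard functionalism envisages something like a transition table whose internal states are individuated in folk-psychological terms:

# A caricature, with hypothetical folk-level state names. Each entry reads:
# in this internal state, given this input, move to that internal state and
# produce that output.
transition_table = {
    ("believes it is raining", "asked to fetch the post"):
        ("desires to stay dry", "pick up umbrella"),
    ("desires to stay dry", "umbrella in hand"):
        ("believes she is prepared", "open the door"),
}

def standard_functionalist_step(mental_state, sensory_input):
    # The folk-level labels are placeholders, to be cashed out later by
    # syntactically specified scientific kinds that respect exactly these
    # boundaries -- the bet questioned in the text.
    return transition_table.get((mental_state, sensory_input),
                                (mental_state, "do nothing"))

print(standard_functionalist_step("believes it is raining",
                                  "asked to fetch the post"))

A microfunctionalist account, by contrast, would couch its state transitions in the fine-grained numerical terms of chapter 2, section 6, and so need not preserve these folk-drawn boundaries at all.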
The difficulties start in the specification of g, h, i, the internal state transi-
tions. For internal state transitions specify relations between mental states,
folk-psychologically identified. The folk-psychological specifications act as
placeholders to be filled in with appropriate, syntactically specified scien-
tific kinds in due course. But this, of course, is simply to bet on the neat,
boundary-preserving relation between folk psychological kinds and scien-
tific kinds, which I have been at pains to doubt for the last umpteen pages.
That kind of functionalism—the kind that treats folk-psychological des-
criptions as apt placeholders for scientifically well-motivated states of the
head—rightly deserves most of the scorn so freely heaped on it by the
eliminative materialist.
In conclusion, if I’m even halfway right, the folk do know their own
minds. But they do so in a way sensitive to the pressures of a certain
explanatory niche, the niche of our everyday understanding of patterns in
behavior. The pressures on a computational theory of brain activity are
quite different. Such a theory is about as likely to share the form of a
folk-psychological picture of mind as a land-bound herbivore is to share the
form of a sky-diving predator.
Chapter 4
Biological Constraints
1 Natural-Born Thinkers
Cognitive science is, in practice, a highly design-oriented investigation into
the nature of mental processes. As we saw earlier (chapter 1, section 4), a
popular methodology goes something like this. First, isolate an interesting
human achievement, story understanding, say. Second, find the best way
you can of getting a conventional Von Neumann computer to simulate
some allegedly central chunk of the input-output profile associated with
human performance of the task. Finally, hope that the program so devised
will be at least a clue to the form of a good psychological theory of how
human beings in fact manage to perform the task in question. This classical-
cognitivist strategy, I shall now argue, is as biologically implausible as it is
philosophically unsatisfactory. No serious study of mind (including philo-
sophical ones) can, I believe, be conducted in the kind of biological vacuum
to which cognitive scientists have become accustomed. In this respect at
least, Pylyshyn’s admiration of the post-Turing idea of a study of cognitive
activity “fully abstracted in principle from both biological and phenomeno-
logical foundations” (see chapter 1, section 2) strikes me as misplaced.
Constraints that apply to naturally evolved intelligent systems are relevant
to any attempt to model or understand the nature of human thought. This
chapter simply plots some of these constraints and shows how they fuel
the fires already lit underneath classical cognitivism. I treat the constraints
in ascending order of importance for cognitive science.
integrate data from many sensory modalities, achieve the task in real time,
and control any appropriate sensorimotor activity (see, e.g., Walker 1983,
188—209). This may prove to be a surprisingly strong constraint on the
kind of computational architecture required (see part 2).
In sum, the mobile organism most likely to succeed is one that is willing
and able to act quickly on messy and even inconsistent data, is able to per-
form sensory-processing tasks in real time, is preferably able to integrate
the data received through various modalities and deploy it flexibly in new
situations, and is generally an all-round biological achiever. This multifaceted
profile contrasts quite starkly with the kinds of systems often studied in
artificial intelligence. Most of these would not last five minutes in the real
world. Like overprotected children, they have been freed to develop an
impressive performance along some very limited dimension (e.g., chess
playing) by never having to cope with the most rudimentary tasks that
any natural chess-playing organism will have coped with. This strategy of
investigating intelligence in what I shall call vertically limited microworlds is,
I suspect, a major cause of the failure of AI to contribute as much to an
understanding of human psychological processes as we might have hoped.
Vertically limited microworlds take you right up to something close to
human performance in some highly limited and evolutionarily very recent
intellectual domain. Horizontally limited microworlds, by contrast, would
leave you well below the level of human performance right across the
board but would tackle many of the kinds of tasks forced on the evolving
creature at an early stage. Such horizontally limited microworlds would be,
in effect, the cognitive domains of animals much lower down the phyloge-
netic tree than us. Flexible, robust, multipurpose, but somewhat primitive
systems will (I suspect) teach us more about human psychology than inflexi-
ble, rigid simulacra of fragments of high-level human cognitive achieve-
ments. (A current example of such work is Schreter and Maurer’s [1986]
project on sensorimotor spatial learning in connectionist artificial orga-
nisms.) This point is further developed in section 4 below.
feeding was made only in the last decade. And yet, as Vogel points out:
The structure of sponges is most exquisitely adapted to take advan-
tage of such currents, with clear functions attaching to a number of
previously functionless features. Dynamic pressure on the incurrent
openings facing upstream, valves closing incurrent pores lateral and
downstream, and suction from the large distal or apical excurrent
openings combine to gain advantage from even relatively slow cur-
rents. And numerous observations suggest that sponges usually pre-
fer moving water. Why did so much time elapse before someone
made a crude model of a sponge, placed it in a current and watched a
stream of dye pass through it? (1981, 190)
Vogel's question is important. Why was such an obvious and simple adap-
tation overlooked? The reason, he suggests, is that biologists have tended
to seek narrowly biological accounts, ignoring the role of various physical
and environmental constraints and opportunities. They have, in effect,
treated the organism as if it could be understood independently of an
understanding of its immediate physical world. Vogel believes a diametri-
cally opposed strategy is required. He urges a thorough investigation of all
the simple physical and environmental factors in advance of seeking any
more narrowly biological account. He thus urges, “Do not develop expla-
nations requiring expenditure of metabolic energy (e.g. the full-pumping
hypothesis for the sponge) until simple physical effects (e.g. the use of
ambient currents) are ruled out” (Vogel 1981, 182). Vogel gives a number
of other examples involving prairie dogs, turret spiders, and mimosa trees.
It is the general lesson that should interest us here. As I see it, the lesson
is this: if evolution can economize by exploiting the structure of the
physical environment to aid an animal's processing, then it is very likely
to do so. And processing here refers as much to information processing as
to food processing. Extending Vogel's observations into the cognitive
domain, we get what I shall dub the 007 principle. Here it is.
The 007 principle. In general, evolved creatures will neither store nor
process information in costly ways when they can use the structure of
the environment and their operations upon it as a convenient stand-in
for the information-processing operations concerned. That is, know
only as much as you need to know to get the job done.
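In computational terms the principle might be caricatured as follows (a toy of my own; the names, the "world" dictionary, and the two strategies are illustrative assumptions, not part of any model discussed here): one creature copies the relevant facts into memory in advance, the other simply consults the world at the moment of need.

world = {"food_location": (12, 7), "shelter_location": (3, 4)}   # the environment

class InternalModeler:
    """Costly strategy: build and maintain a full internal copy of the facts."""
    def __init__(self, environment):
        self.memory = dict(environment)      # everything stored in the head

    def where_is(self, item):
        return self.memory[item]

class WorldExploiter:
    """The 007 strategy: store nothing; sense the world when the need arises."""
    def __init__(self, environment):
        self.environment = environment       # no internal copy at all

    def where_is(self, item):
        return self.environment[item]        # know only as much as you need

print(InternalModeler(world).where_is("food_location"))
print(WorldExploiter(world).where_is("food_location"))

Both creatures answer the question equally well; the difference shows up only in what must be stored, kept up to date, and paid for internally.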
Something like the 007 principle is recognized in some recent work in
developmental psychology. Rutkowska (1984) thus argues that a proper
understanding of the nature of infant cognition requires rejecting the solip-
sistic strategies of formulating models of mind without attending to the
way mind is embedded in an information-rich world. It is her view that
computational models of infant capacities must be broad enough to include
enough. Two heads are indeed often better than one. Since it is so phenome-
nologically immediate, the use of one's own body is easily over-
looked. Jim Nevins, a researcher into computer-controlled assembly, cites a
nice example (reported in Michie and Johnston 1984). One solution to the
problem of how to get a computer-controlled machine to assemble tight-
fitting components is to design a vast series of feedback loops that tell the
computer when it has failed to find a fit and get it to try again in a slightly
different way. The natural solution, however, is to mount the assembler
arms in a way that allows them to give along two spatial axes. Once this is
done, the parts simply slide into place “just as if millions of tiny feedback
adjustments to a rigid system were being continuously computed” (Michie
and Johnston 1984, 95).
A proponent of the ecological movement in psychology once wrote,
“Ask not what's inside your head but rather what your head’s inside of”
(Mace 1977, quoted in Michaels and Carello 1981). The positive advice, at
least, seems reasonable enough. Evolution wants cheap and efficient solu-
tions to the problems of survival in a real, richly structured external envi-
ronment. It should come as no surprise that we succeed in exploiting that
structure to our own ends. Just as the sponge augments its pumping action
by ambient currents, intelligent systems may augment their information-
processing power by devious use of external structures. Unlike the eco-
logical psychologists, we can ill afford to ignore what goes on inside the
head. But we had better not ignore what's going on outside it either.
The moral, then, is to be suspicious of the heuristic device of studying
intelligent systems independently of the complex structure of their natural
environment. To do so is to risk putting into the modeled head what nature
leaves to the world. Classical cognitivism, we shall later see, may be guilty
of just this cardinal sin.
Figure 4.1
Complex structure with no simpler forms
of a subsequent unit evolve separately and are put together at a later date.
The classic example this time is the evolution of the eukaryotic cell (Ridley
1985, 38-39; Jacob 1977, 1164). A eukaryotic cell has a nucleus and
contains such organelles as mitochondria and chloroplasts. It is much more
complex than the prokaryotic cell, which is little more than a sac of genetic
material. One account of the evolution of the complex eukaryotic cell
depicts the mitochondria and chloroplasts (which now convert food into
energy for the cell) as once-independent organisms that later formed a
mutually desirable alliance with the host cell. The terms of the alliance need
not concern us. Very roughly, the host cell is provided with energy, and
the organelles with food and raw materials. Such alliances allow for great
increases in complexity achieved very rapidly by the association of existing
systems. (Something similar may occur in human thought when we see
how to unite ideas developed in separate domains into a single overarching
structure.) Symbiosis thus meets the requirements of gradualistic holism
with a contemporaneous set of independent viable structures (thus meeting in
an unusual way the demand of holism) and with a small structural alteration
(as demanded by gradualism) that supports their amalgamation into a new
whole.
Complex biological systems, then, have evolved subject to the con-
straints of gradualistic holism. This mode of evolution suggests the possi-
bility of a snowballing effect with worrying implications for cognitive
science. The snowballing effect is summed up in an informal principle
formulated by Jacob, a cell geneticist. It is just this: “Simpler objects are
more dependent on (physical) constraints than on history. As complexity
increases, history plays the greater part” (Jacob 1977, 1163). The idea is
simple, and follows immediately from our earlier observations. Gradualism
requires that each structural step in the evolutionary process involve only
a small adjustment to the previous state. Jacob compares evolution to a
tinkerer who must use whatever is immediately at his disposal to achieve
some goal. This case contrasts with that of an engineer who, within certain
limits, decides on the appropriate materials and design, gathers them,
and then carries out the project. (Lévi-Strauss [1962] explored a similar
analogy involving the notion of bricolage. See also Draper 1986.)
The point, then, is that what the tinkerer produces is heavily dependent
on his historical situation in a way in which the engineer's product is not.
Two good engineers will often arrive independently at a similar design that
“approaches the level of perfection made possible by the technology of the
time” (Jacob 1977, 1163), whereas two tinkerers attempting to use only the
materials they happen to have to hand will be unlikely to chance on the
same solution. If evolution proceeds as a tinkerer, each step in the evolu-
tionary chain exploits a net historical opportunity whose nature is deter-
mined by whatever materials happen to be available to adapt to a new
requirement. Chance and local factors will play some role at every stage
along the way. Since every later development occurs in a space determined
by the existing solutions (and materials), it is easy to see that there will be
a snowball effect. Every idiosyncrasy and arbitrariness at stage s₁ forms the
historical setting for a tinkering solution to a new problem at s₂. As
complexity increases and s₂ gives way to s₃, the solutions will come more
and more to depend on the particular history of the species. This may be
one reason why some evolutionary theorists (e.g., Hull 1984) prefer to
regard a species as a historical individual picked out by the particular
circumstances of its birth, upbringing, and culture, rather than as an in-
stance of a general, natural kind.
This historical snowballing effect, combined with the need to achieve
some workable total system at each modification (holism), often makes
natural solutions rather opaque from a design-oriented perspective. We
have already seen one example in the evolution of a breathing device from
a swim bladder. If we set out to build a breathing device from scratch, we
might, it seems, do a better job of it.
Two further examples should bring home the lesson. These examples,
beautifully described by Dawkins (1986, 92—95), concern the human eye
and the eye of the flatfish. The human eye, it seems, incorporates a very
strange piece of design. The light sensitive photoreceptor cells face away
from the light and are connected to the optic nerve by wires that face in the
direction of the light and pass over the surface of the eye before disappear-
ing through a hole in the retina on their way to the brain. This is an odd
and seemingly clumsy piece of design. It is not shared by all the eyes that
nature has evolved. An explanation of why vertebrate eyes are wired as
they are might be that the combination of some earlier historical situation
with the need to achieve an immediate working solution to some problem
of sight forced the design choice. The wiring of the eye, then, has every
appearance of being a kludge, a solution dictated by available materials and
short-term expediency. As a piece of engineering it may be neither elegant
nor optimal. But as a piece of tinkering, it worked.
Dawkins’s second example concerns bony flatfish, e.g., plaice, sole, and
halibut. All flatfish hug the sea floor. Some flatfish, like skates and rays, are
flattened elegantly along a horizontal axis. Not so the bony flatfish. It is
flattened along a vertical axis and hugs the sea floor by lying on its side.
This rather ad hoc solution to whatever problem forced these fish to take
to the sea bed must have raised a certain difficulty. For one eye would then
be facing the bottom and would be of little use. The obvious tinkerer’s
solution is to gently twist the eye round to the other side. Overall, it is a
rather messy solution that clearly shows evolution favoring quick, cheap,
short-term solutions (to hug the sea bed, lie on your side) even though
they may give rise to subsequent difficulties (the idle eye), which then get
The list is odd in that some of the items (like emotional response and
curiosity) seem either too biological or too complex for current work to
address. Yet these, I suggest, are the building blocks of human cognition. It
seems psychologically ill-advised to seek to model, say, natural language
understanding without attending to such issues. Paradoxically, then, the
protocognitive capacities we share with lower animals, and not the distinc-
tive human achievements such as chess-playing or story understanding,
afford, I believe, the best opportunities for design-oriented insight into
human psychology. This is not to say, of course, that all design-oriented
investigation into higher-level skills is unnecessary, merely that it is insuffi-
cient. Indeed, a design-oriented approach to the higher-level achievements
may be necessary if we are to understand the nature of the task that
evolution may choose to perform in some more devious way. The general
point I have been stressing is just that understanding the natural solution
to an information-processing task may require attending at least to the
following set of biologically motivated constraints:
* High value must be placed on robustness and real-time sensory
processing (section 2).
* The environment should not be simplified to a point where it no
longer matters. Instead, it should be exploited to augment internal
processing operations (section 3).
* The general computational form of solutions to evolutionarily re-
cent information-processing demands should be sensitive to a re-
quirement of continuity with the solutions to more basic problems
(section 4).
tive approach. And some cognitive scientists do not see any such alterna-
tive (but see part 2 for some reasons for optimism). It is also worth noting,
as stressed at the very beginning, that what I am calling the classical cogni-
tivist tradition by no means exhausts the kinds of work already being done
even in what I shall later view as conventional artificial intelligence (the
intended contrast is with the PDP approach investigated in part 2). Thus,
early work on cybernetics, more recent work on low-level visual pro-
cessing, and some work in robotics can all be seen as attempting to do
some justice to the kinds of biological constraints just detailed. To take just
one recent example, Mike Brady of Oxford recently gave a talk in which
he explained his interest in work on autonomously guided vehicles as
rooted in the peculiar task demands faced by such vehicles as robot trucks
that must maneuver and survive in a real environment (see Brady et al.
1983). These included severe testing of modules by the real environment,
three-dimensional ranging and sensing, real-time sensory processing, data
fusion and integration from various sensors, and dealing with uncertain
information. Working on autonomously guided vehicles is clearly tan-
tamount to working on a kind of holistic animal microworld: such work is
forced to respect many (but not all) of the constraints that we saw would
apply to evolved biological systems.
Classical cognitivism tries to make a virtue out of ignoring such con-
straints. It concentrates on properties internal to the individual thinker,
paying at best lip service to the idea of processing that exploits the world;
it seeks neat, design-oriented, mathematically well understood solutions to
its chosen problems; and it chooses its problems by fixating on various
interesting high-level human achievements like conscious planning, story
understanding, language parsing, game playing, and so forth. Call this a
MIND methodology. MIND is a slightly forced acronym meaning: focused
on Mature (i.e., evolutionarily recent) achievements; seeking Internalist
solutions to information-processing problems (i.e., not exploiting the
world); aimed at Neat (elegant, well-understood) solutions; and studying
systems from an ahistorical, Design-oriented perspective. The methodology
of MIND thus involves looking at present human achievements, fixating
on various intuitively striking aspects of those achievements (e.g., plan-
ning, grammatical competence, creativity), then treating each such high-
level aspect as a separate domain of study in which to seek neat, internalist,
design-oriented solutions, and hoping eventually to integrate the results
into a useful understanding of human thought. This general strategy is
reflected in the plan of AI textbooks, which will typically feature individual
chapters on, e.g., vision, parsing, search, logic, memory, uncertainty, plan-
ning, and learning (this is the layout of Charniak and McDermott 1985).
Our earlier reflections (sections 2 to 4) already give us cause to doubt the
function. The shape of the chin, to use a classic example, is merely a striking
by-product, the result of an architectural relation that obtains between
anatomical features selected on quite independent grounds.
In a seminal paper Gould and Lewontin (1978) caricature the adapta-
tionist approach by applying such reasoning to two nonbiological cases.
The first concerns the spandrels of Saint Marks Cathedral in Venice. A
spandrel is a triangular space formed by the intersection of two rounded
arches. Spandrels are a necessary structural by-product caused by mounting
a dome on a number of rounded arches. The spandrels of Saint Marks,
however, have been put to particularly good use, as can be seen in figure
4.3. The spandrels are used to express the Christian themes of the dome. In
this case a man (said to represent one of the biblical rivers) is seen pouring
water from a pitcher. Overall, the effect of the designs worked into the
spandrels is so striking that we might even be tempted to view the overall
structure of pillars and dome as themselves a result of the need to have a
triangular space for the designs. But this, of course, would be precisely to
reverse the true order of explanation. As a result of the decision to rest a
dome on rounded arches, spandrels come into being as an inevitable by-
product. These were then exploited by the artist or designer.
As a second example consider the ceiling of Kings College Chapel
(Gould and Lewontin 1978, 254). The ceiling (described by Wordsworth as
“that branching roof self-poised, and scooped into ten thousand cells,
where light and shade repose”) is supported by a series of pillars that are
fan-vaulted at the top. Where the fan vaultings meet between pillars, a
star-shaped space is inevitably created. In Kings College Chapel these
star-shaped spaces have been decorated with portcullises and roses (see
figure 4.4). Once again, the architectural constraint is clearly the main
source of the design. Gould and Lewontin point out, “Anyone who tried to
argue that the structure exists because the alternation of rose and portcullis
makes so much sense in a Tudor chapel would be inviting ... ridicule”
(1978, 254).
In the cases cited, we would regard the adaptationist explanations of
spandrels and star-shaped spaces as bizarre. The lesson, then, is that we
must not simply accept our intuitions about the basic and central features
of an organism as a reliable guide to its decomposition into a set of
individual traits in need of adaptationist explanation. To do so is to commit
a fallacy of reification. In this fallacy intuitively striking but emergent
features of a complex object are reified, and direct explanations of each
feature are constructed. By trying to deal with high-level cognitive features
without first attending to their basic underpinnings, standard Al, I believe,
commits a version of this fallacy. Table 4.1 makes the parallel explicit. The
fact that the features listed to the right of human thought strike us as
suitable individual objects of computational investigation may be an effect
Figure 4.3
One of the spandrels of Saint Mark's. Reproduced by permission from Gould and Lewontin
1978, 582
Figure 4.4
The star-shaped spaces of King's College Chapel. Reproduced by permission from Gould and Lewontin 1978
Table 4.1
The parallel between architecture and cognition

Complex object          Striking features
Saint Mark's            spandrels
King's College chapel   star shapes
human thought           learning
                        induction
                        word recognition
                        reaching
                        frame-based reasoning
                        memory
                        planning
                        rule following
PART 2
The Brain's-Eye View

The rich behaviour displayed by cognitive systems has the paradoxical character
of appearing on the one hand tightly governed by complex systems of hard rules,
and on the other to be awash with variance, deviation, exception, and a degree
of flexibility and fluidity that has quite eluded our attempts at simulation. ...
The subsymbolic paradigm suggests a solution to this paradox.
—Paul Smolensky, “On the Proper Treatment of Connectionism”
Chapter 5
Parallel Distributed Processing
Classical cognitivism was indeed the pure science of the mind. Or perhaps
it was the pure science of the mind’s own idea of the mind. However you
see it, it would be hard to exaggerate its deliberate intellectual distance
from the messy substrata of biological fact. On the structure of the brain
and the phylogeny of cognitive processes, the classical cognitivist main-
tained a studied indifference. But perhaps cognitive science cannot afford
that indifference.
Parallel distributed processing (or connectionism) is an attempt to pro-
vide slightly more biologically realistic models of mind. Such models,
though hardly accurate biologically, are at least inspired by the structure of
the brain. Moreover, they are tailored, in a sense to be explained, to
evolutionarily basic problem-solving needs, like perceptual pattern com-
pletion. These models, I shall argue, offer the best current prospect for
soothing the philosophical and biological sore spots inflamed (I hope) by
the first half of this book.
As counterpoint to the enthusiasm, a word of warning. There is a certain
danger in the extreme polarization of cognitive science which this treat-
ment may seem to imply. The danger is summed up in the slogan “PDP or
not PDP?” This, I hope to show, is not the question. PDP is not a magic
wand. And a connectionist touch will not turn a Von Neumann frog into
a parallel distributed princess. Both our deeper explanatory understanding
of cognition and on some occasions our actual processing strategies may
well demand the use of higher level, symbolic, neoclassical descriptions. For
all that, connectionist models offer insights into the way nature may pro-
vide for certain properties that seem to be quite essential to what we
consider intelligent thought. Walking this tightrope between the cogni-
tivist and connectionist camps is a major task of the next five chapters.
First though, we had better get some idea of what a PDP approach actually
involves.
Figure 5.1
A local hardwired network. From McClelland, Rumelhart, and Hinton 1986, 28
Table 5.1
The Jets and the Sharks

Name     Gang    Age  Education  Marital status  Occupation
Art      Jets    40s  J.H.       sing.           pusher
Al       Jets    30s  J.H.       mar.            burglar
Sam      Jets    20s  col.       sing.           bookie
Clyde    Jets    40s  J.H.       sing.           bookie
Mike     Jets    30s  J.H.       sing.           bookie
Jim      Jets    20s  J.H.       div.            burglar
Greg     Jets    20s  H.S.       mar.            pusher
John     Jets    20s  J.H.       mar.            burglar
Doug     Jets    30s  H.S.       sing.           bookie
Lance    Jets    20s  J.H.       mar.            burglar
George   Jets    20s  J.H.       div.            burglar
Pete     Jets    20s  H.S.       sing.           bookie
Fred     Jets    20s  H.S.       sing.           pusher
Gene     Jets    20s  col.       sing.           pusher
Ralph    Jets    30s  J.H.       sing.           pusher
Phil     Sharks  30s  col.       mar.            pusher
Ike      Sharks  30s  J.H.       sing.           bookie
Nick     Sharks  30s  H.S.       sing.           pusher
Don      Sharks  30s  col.       mar.            burglar
Ned      Sharks  30s  col.       mar.            bookie
Karl     Sharks  40s  H.S.       mar.            bookie
Ken      Sharks  20s  H.S.       sing.           burglar
Earl     Sharks  40s  H.S.       mar.            burglar
Rick     Sharks  30s  H.S.       div.            burglar
Ol       Sharks  30s  col.       mar.            pusher
Neal     Sharks  30s  H.S.       sing.           bookie
Dave     Sharks  30s  H.S.       div.            pusher
Figure 5.2
Inhibitory links
tween each burglar unit and the thirties unit. If, in addition, only
burglars are in their thirties, the thirties unit would be excitatorily
linked to the units representing burglars.
* Solid black spots signify individuals and are connected by excita-
tory links to the properties that the individual has, e.g., one such unit
is linked to units representing Lance, twenties, burglar, single, Jet, and
junior-high-school education.
By storing the data in this way, the system is able to buy, at very little
computational cost, the following useful properties: content-addressable
memory, graceful degradation, default assignment, and generalization. I
shall discuss each of these in turn.
Figure 5.3
The pattern of activation for a Shark in his thirties. Hatching = input activations. Sunburst =
units to which activation spreads. The diagram is based on McClelland, Rumelhart, and
Hinton 1986, p. 28, fig. 11.
It is easy to see how this works. Suppose you want to know who satisfies
the description “is a Shark in his thirties.” The thirties and Shark units are
activated and pass positive values to the units to which they have excitatory
links. There is a chain of spreading activation in which first the individual-
signifying unit, then those other units to which it is excitatorily linked get
activated. The result is a pattern of activation involving the units for Shark,
thirties, burglar, divorced, a high-school education, and Rick. The process is
shown in figure 5.3. The important point is just this: the same final pattern
of activation (i.e., the overall pattern of units active after the spread of acti-
vation) could have been achieved by giving the system any one of a num-
ber of partial descriptions, e.g., the inputs “Shark, high-school education,”
“Rick, thirties,” and so on. Simply by using a network representation
of the data, we obtain a flexible, content-addressable memory store.
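A toy fragment may make the mechanism concrete. The sketch below is a drastic simplification of my own (a handful of individuals and a single scoring pass, instead of the graded activations, mutual inhibition, and iterative settling of the real model), but it shows the same trick: excitatory links tie each individual's unit to his property units, so any partial description activates the best-matching individual and thereby retrieves the rest of his properties.

# Toy content-addressable retrieval in a Jets-and-Sharks style network
# (my own simplification; the real model settles iteratively over graded,
# mutually inhibiting units).
people = {
    "Rick":  {"Sharks", "30s", "H.S.", "divorced", "burglar"},
    "Sam":   {"Jets", "20s", "col.", "single", "bookie"},
    "Lance": {"Jets", "20s", "J.H.", "married", "burglar"},
    "Ken":   {"Sharks", "20s", "H.S.", "single", "burglar"},
}

def complete(probe):
    # Activation spreads from the probe units to the individual units they
    # excite; the most strongly activated individual wins the competition,
    # and activation then spreads back out to all of his property units.
    scores = {name: len(props & probe) for name, props in people.items()}
    winner = max(scores, key=scores.get)
    return winner, people[winner]

print(complete({"Sharks", "30s"}))    # -> Rick, plus divorced, H.S., burglar
print(complete({"Jets", "bookie"}))   # -> Sam, plus 20s, col., single

The same store answers probes framed in any of its vocabularies; no address or key was ever decided on in advance.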
Graceful degradation
Graceful degradation, as we saw in chapter 4, comes in two, related varieties.
The first demands that a system be capable of sustaining some hardware
Thus, the S unit is stimulated three times, and the L, Ra, and A units
twice. But the various individual-representing units are themselves con-
nected in a mutually inhibitory fashion, so the strong, threefold activation
of the S unit will tend to inhibit the weaker, twofold activation of the A, L,
and Ra units. And when the activation spreads outwards from the indi-
vidual units, the S unit will pass on the most significant excitatory value.
The S unit is excitatorily linked to the name unit Sam. And the various
name units too, are competitively connected via mutually inhibitory links.
Thus “Sam” will turn out to be the network's chosen completion of the
error-involving description beginning “Jet,” “bookie,” “married,” “junior-
high-school education.” A sensible choice. The spread of activation respon-
sible is shown in figure 5.4.
Default assignment
Suppose that you don’t know that Lance was a burglar. But you do know
that most of the junior-high-school-educated Jets in their twenties are
burglars rather than bookies or pushers (see the data in table 5.1). It
Figure 5.4
The pattern of activation for a Jet bookie with a junior-high-school education. A unit
labeled with a gang member's initial stands for that individual. The input patterns are
marked with hatching. The strongly activated individual unit is marked with an x, and the
name unit it excites is marked with a sunburst. The diagram is based on McClelland,
Rumelhart, and Hinton 1986, p. 28, fig. 11.
Flexible generalization
The property of flexible generalization is closely related to those con-
sidered above. Indeed, in a number of respects we may look upon all the
properties treated in this example as involving different high-level descrip-
tions and uses of the same underlying computational strategy of pattern
completion. In this final case, the pattern-completing talent of the system is
used to generate a typical set of properties associated with some descrip-
tion, even though all the system directly knows about are individuals, none
of whom need be a perfectly typical instantiation of the description in
question. Thus, suppose we seek a sketch of the typical Jet. It turns out, as
we saw in the discussion of default assignments, that there are indeed
patterns in Jet membership, although no individual Jet is a perfect example
of all these patterns at once. Thus the majority of Jets are single, in their
twenties and educated to the junior-high-school level. No significant pat-
terns exist to specify particular completions of a pattern beginning “Jet”
along the other dimensions that the system knows about. Thus, if the
system is given “Jet” as input, the units for single, twenties, and junior-
high-school education will show significant activity, while the rest will
mutually cancel out. In this way, the system effectively generalizes to the
nature of a typical Jet, although no individual Jet in fact possesses all three
properties simultaneously.
Perhaps what is most significant here is not the capacity to generalize
per se so much as the flexibility of the generalizing capacity itself. As
McClelland and Rumelhart point out, a conventional system might explicitly
create and store various generalizations. One striking feature of the PDP
version is its capacity to generalize in a very flexible way with no need
for any explicit storage or prior decisions concerning the form of required
generalizations. The network can give you a typical completion of any
pattern you care to name if there is some pattern in the data. Thus, instead
of asking for details of a typical Jet, we could have asked for details of a
typical person in his twenties educated at the junior-high-school level or a
typical married pusher, and so on. The network's generalization capacity is
thus flexible enough to deploy available data in novel and unpredicted
ways, ways we needn't have thought of in advance. As our account pro-
gresses, this kind of unforced flexibility will be seen to constitute a major
advantage of PDP knowledge representation.
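The effect can be approximated in a few lines of counting (a rough sketch of mine; the network gets the effect by mutual excitation and inhibition rather than by explicit tallying, and the data below are a small hypothetical fragment in the style of table 5.1): given any cue, gather the matching individuals and keep, on each remaining dimension, only a value that clearly dominates.

from collections import Counter

# Counting approximation to flexible generalization (illustrative only).
people = [
    {"gang": "Jets", "age": "20s", "education": "J.H.", "marital": "divorced"},
    {"gang": "Jets", "age": "20s", "education": "J.H.", "marital": "married"},
    {"gang": "Jets", "age": "20s", "education": "H.S.", "marital": "single"},
    {"gang": "Jets", "age": "30s", "education": "J.H.", "marital": "single"},
]

def typical(cue):
    matches = [p for p in people if all(p[k] == v for k, v in cue.items())]
    sketch = {}
    for dim in people[0]:
        if dim in cue or not matches:
            continue
        value, n = Counter(p[dim] for p in matches).most_common(1)[0]
        if n > len(matches) / 2:        # keep only clearly dominant values
            sketch[dim] = value
    return sketch

print(typical({"gang": "Jets"}))        # {'age': '20s', 'education': 'J.H.'}
print(typical({"education": "J.H."}))   # any pattern at all can serve as cue

Nothing had to be decided in advance about which generalizations would ever be wanted; the cue itself selects them.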
4 Emergent Schemata
In chapter 2 (especially sections 2 and 3) we spoke of scripts and schemata.
These are special data structures that encode the stereotypic items or
events associated with some description. For example, Schank and Abelson’s
restaurant script held data on the course of events involved in a typical
where. But insofar as it exists, it acts like a stored schema. Yet there is no
need to decide in advance on a set of schemata to store. Instead, the system
learns (or is told) patterns of cooccurrence of individual items making up
the schema, and the rest (as we shall now see) comes in the back door when
required.
The particular model detailed by McClelland, Rumelhart, et al. concerns
our understanding of a typical room, e.g., a typical kitchen or a typical
office.? The general interest of the model for us is twofold. First, it moves
us in the direction of more distributed representations. Second, it shows
how high-level symbolic characterizations (e.g., our idea of a typical kitchen)
can be among the emergent properties of a network of simpler entities and
how such high-level descriptions (“believes that ...”) may be perfectly
correct without indicating the underlying computational structure of the
system so described. This recalls our account of the emergence of spandrels
from the interaction of other, more basic, architectural features, and it will
prove important later on.
Room-dwelling human beings have ideas about the likely contents of
particular types of rooms. If you are told to imagine someone in the kitchen
standing by the cooker, you may well fill in some other details of the room.
If you did, there would likely be a refrigerator and a sink, some wall cup-
boards, and so on. How does this happen? One answer would be to
propose a “mental file” marked “kitchen” and detailing all the expected
items. But that approach brings in its wake all the difficulties raised earlier.
The answer we are being asked to consider goes like this. You were
exposed to the contents of lots of rooms. You saw objects clustering on
these occasions. Generally, when you were in a room with a cooker, you
were in a room with a sink and not in a room with a bed. So suppose you
had something like a set of PDP units responding to the presence of par-
ticular household items (this is clearly an oversimplification, but it will do
for current purposes). And you fixed it so that units that tended to be on
together got linked by an excitatory connection and units that tended to
go on and off independently got linked by an inhibitory connection. You
would get to a point where activating the unit for one prominent kitchen
feature (e.g., the cooker) would activate in a kind of chain all and only the
units standing for items commonly found in kitchens. Here, then, we have
an emergent schema in its grossest form. Turn on the oven unit and after a
while you get (in the particular simulation done by McClelland, Rumelhart
et al.) oven, ceiling, walls, window, telephone, clock, coffee cup, drapes,
stove, sink, refrigerator, toaster, cupboard, and coffeepot. So far, all is
conventional. But let’s look more closely at some of the properties of this
mode of representing the data. The first interesting property is the distri-
buted nature of the system's representation of a kitchen. The concept of a
kitchen here involves a pattern of activation over many units that may
stand for (or respond to) more basic items treated in our daily language,
or even for items not visible to daily talk at all. Such features might be
functional or geometrical properties of objects. Whether or not the low-
level features (known as microfeatures) are visible to daily talk, the strategy
of building functional correlates of high-level concepts out of such minute
parts in a PDP framework brings definite advantages. The main one, which
I shall reserve for treatment in the next chapter, is the capacity to represent
fine shades of meaning. A spin-off is the capacity of the gross, high-level
concept or ability to degrade gracefully in the first way mentioned in the
previous example. That is, it turns out to be robust enough to withstand
some damage to the system in which it is distributively encoded. Thus,
suppose the coffeepot unit or its links to other units got destroyed. The
system would still have a functional kitchen schema, albeit one lacking a
coffeepot default.
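
A minimal sketch of this arrangement, with invented observations and a crude covariance-style learning rule standing in for the published simulation, shows how links acquired from co-occurrence let one clamped kitchen unit drag in the rest of the kitchen while leaving bedroom and bathroom items quiet.

import numpy as np

items = ["oven", "sink", "refrigerator", "coffeepot", "bed", "bathtub"]
rooms = [  # invented observations: which items were present in each observed room
    [1, 1, 1, 1, 0, 0],   # kitchens
    [1, 1, 1, 1, 0, 0],
    [1, 1, 1, 0, 0, 0],
    [0, 0, 0, 0, 1, 0],   # bedrooms
    [0, 0, 0, 0, 1, 0],
    [0, 1, 0, 0, 0, 1],   # a bathroom (with a sink)
]
X = np.array(rooms, dtype=float)

# weights from co-occurrence: positive when two items turn up together more often
# than chance, negative when they tend to avoid one another
p = X.mean(axis=0)                       # how often each item is on
co = (X.T @ X) / len(X)                  # how often each pair is on together
W = co - np.outer(p, p)
np.fill_diagonal(W, 0.0)

# turn on the oven unit and let activation spread for a few steps
a = np.zeros(len(items))
a[items.index("oven")] = 1.0
for _ in range(10):
    a = np.clip(a + 0.5 * (W @ a), 0.0, 1.0)
    a[items.index("oven")] = 1.0         # keep the oven unit clamped on

print({item: round(float(act), 2) for item, act in zip(items, a)})
# the kitchen items end up strongly active; bed and bathtub stay near zero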
The distinction between a local and a distributed representational system
is somewhat perspectival.* Distribution is in the eye of the beholder or at
best in the functional requirements of the system itself. Thus, in the example
described, we have a distributed (hence, gracefully degradeable) representa-
tion of kitchens but a local and quite gracelessly degradable representation
of coffeepots. Of course, we could in turn have a distributed representa-
tion of coffeepots (e.g., as a pattern of activation across a set of units
standing for physical and functional features of coffeepots). But there will
always be some local representation at the bottom line, even if it lies well
below the level of any features we might fix on in daily talk. Conversely,
even an intuitively local network like the Jets and the Sharks net described
in section 3 can be seen as a distributed representation of some slightly
artificial concept such as Jet-Shark membership. The second and, I think,
more interesting property of the room network is the multiplicity and
flexibility of the emergent schemata it supports. McClelland, Rumelhart,
et al. ran a simulation involving forty names of household features. After
fixing connectivity strengths according to a rough survey of human opin-
ions, they found that the network stored five basic schemata, one for each
type of room they had elicited opinions on. (They were: kitchen, office,
bedroom, bathroom, and living room.) The sense in which it stored five
basic patterns was that if you set just one description unit on (e.g., the oven
or bed unit) the system would always settle on one of five patterns of
activation. But the system proved much more flexible than this state of
affairs initially suggests. For many other final patterns of activation proved
possible if more than one description was turned on. In fact, there were
2⁴⁰ possible states into which it might settle, one for each vertex of a
40-dimensional hypercube. It is just that some of these points are easier to
reach than others. It will thus find a “sensible” pattern completion (subject
to how sensible its stored knowledge is) even for the input pattern “bed,
bath, refrigerator.” To show this in action, McClelland, Rumelhart, et al.
describe a case where “bed, sofa” is given as the input pattern. The system
then sets about trying to find a pattern of activation for the rest of its units
that respects, as far as possible, all the various constraints associated with
the presence of a bed and a sofa simultaneously. What we want is a set of
values that allows the final active schema to be sensitive to the effects of
the presence of a sofa on our idea of typical bedroom. In effect, we are
asking for an unanticipated schema: a typical bedroom-with-a-sofa-in-it.
The system’s attempt is shown in figure 5.5. It ends up having chosen
“large” as the size description (whereas in its basic bedroom scenario the
size is set at “medium”) and having added “easy-chair,” “floor-lamp,” and
“fireplace” to the pattern. This seems like a sound choice. It has added a
subpattern strongly associated with sofa to its bedroom schema and has
adjusted the size of the room accordingly.
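
The settling process itself can be caricatured in a few lines. The units, links, and weights below are invented, not those of the published room network, but the mechanism is the same in outline: clamp the description units, update repeatedly, and let mutually inhibitory alternatives (here the size units) fight it out.

import numpy as np

units = ["bed", "sofa", "dresser", "easy-chair", "floor-lamp", "medium", "large"]
W = np.zeros((7, 7))

def link(a, b, w):                       # a symmetric soft constraint between two units
    i, j = units.index(a), units.index(b)
    W[i, j] = W[j, i] = w

link("bed", "dresser", 0.4)
link("bed", "medium", 0.3)               # a typical bedroom is medium-sized ...
link("sofa", "large", 0.5)               # ... but a sofa pulls toward a larger room
link("sofa", "easy-chair", 0.4)
link("sofa", "floor-lamp", 0.4)
link("medium", "large", -0.8)            # the size descriptors exclude one another

a = np.zeros(7)
clamped = [units.index("bed"), units.index("sofa")]
for _ in range(30):
    a[clamped] = 1.0                     # hold the given description units on
    a = np.clip(a + 0.5 * (W @ a), 0.0, 1.0)
a[clamped] = 1.0

print({u: round(float(x), 2) for u, x in zip(units, a)})
# settles on a "bedroom-with-a-sofa": dresser, easy-chair, floor-lamp, and "large"
# come on, with "medium" suppressed by the mutual inhibition between the sizes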
Here, then, we have a concrete example of the kind of flexible and
sensible deployment of cognitive resources that characterizes natural in-
telligence. Emergent schemata obviate the need to decide in advance what
possible situations the system will need to cope with, and they allow it to
adjust its default values in a way maximally consistent with the range of
possible inputs. Just this kind of natural flexibility and ‘informational holism’
constitutes, I believe, a main qualitative advantage of PDP approaches over
their conventional cousins.
5 Distributed Memory
My final example concerns the process of learning and remembering. It
follows work originally presented in McClelland and Rumelhart 1985 and
modified in volume 2 of McClelland, Rumelhart, et al. The goal of the work
is to generate a model of memory in which the storage of traces of specific
experiences gives rise in a very natural way to a general, nonspecific
understanding of the nature of the domain in question. Thus, for example,
storage of traces of specific experiences of seeing dogs will give rise to a
general, prototypical idea of doghood. This neatly sidesteps a recurrent
problem in modeling memory, namely, the choice between representing
specific and general information. In terms of its behavior, the model looks
as if it explicitly generates and stores prototypes (of e.g., the typical dog).
But as in the schema example, there are no such explicit, stored items.
Instead, the prototype-based understanding is an emergent property of the
system’s way of dealing with specific experiences. The model shares many
of the features of our two previous examples, but it enables us to extend
our discussion to include:
[Figure 5.5 appears here: columns of boxes showing the successive activation states of the room network's units, which are labeled with household items (oven, computer, coat-hanger, toilet, bathtub, television, sofa, bed, telephone, and so on), size descriptors from very-small to very-large, and the structural features window, door, walls, and ceiling.]
Figure 5.5
The output of the room network with bed, sofa, and ceiling initially clamped. The result may
be described as a large, fancy bedroom. White boxes indicate active units. The vertical sets
of such boxes, reading from left to right, indicate the successive states of activation of the
network. The starting state (far left) has the bed, sofa, and ceiling units active. The figure is
from Rumelhart, Smolensky, et al., 1986, 34.
are obtained by varying any one of the features of the prototypical dog at
random. Now give each individual dog a name. For each dog code its name
as a pattern of activation across eight units. Give the network a series of
experiences of individual dogs by activating the units that correspond to
the dog description and the dog names. After each such exposure, allow
the system to deploy the delta rule to lay down a memory trace in the form
of a pattern of altered connectivity and to facilitate recall of the last dog
description.
After 50 such runs the system had never been exposed to the proto-
typical dog but only to these distorted instances. The system was then
given a fragment of the prototype as input and was able to complete the
prototypical pattern. No name units became active, as these, being different
for each individual dog, tended to cancel out. Indeed, the network had, in
effect, extracted the pattern common to all the slightly distorted inputs,
producing a general idea of the prototypical member of the set of which
the inputs were instances. Plato scholars will envy the system’s ability to
see the true form of doghood on the basis of these distorted shadows on
the wall of the cave.
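
A rough sketch of the mechanism, with invented sizes, patterns, and learning rate, runs as follows: a simple auto-associator is trained with the delta rule on distorted exemplars, each paired with a different random name, and is then cued with a fragment of the never-presented prototype.

import numpy as np

rng = np.random.default_rng(0)
n_desc, n_name = 16, 8
prototype = rng.choice([-1.0, 1.0], size=n_desc)      # the prototypical dog description

W = np.zeros((n_desc + n_name, n_desc + n_name))
lr = 0.01
for _ in range(50):                                    # 50 exposures to individual dogs
    desc = prototype.copy()
    desc[rng.integers(n_desc)] *= -1                   # distort one feature at random
    name = rng.choice([-1.0, 1.0], size=n_name)        # each dog gets its own name
    x = np.concatenate([desc, name])
    W += lr * np.outer(x - W @ x, x)                   # delta rule on the recall error

cue = np.concatenate([prototype[: n_desc // 2],        # a fragment of the prototype,
                      np.zeros(n_desc // 2 + n_name)]) # with no name information
out = W @ cue
print("fraction of prototype recovered:",
      float(np.mean(np.sign(out[:n_desc]) == prototype)))
print("mean |activation|  description:", round(float(np.mean(np.abs(out[:n_desc]))), 2),
      " names:", round(float(np.mean(np.abs(out[n_desc:]))), 2))
# the description units complete to (very nearly) the prototype, while the name
# units, having been paired with a different pattern on every exposure, stay weak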
The system can do more than extract the prototype from the examples.
It can also re-create the pattern of activation for a specific dog if it has had
a number of exposures to the dog in question and it is given as its
recall-prompting input a disambiguating cue, something distinctive about
that dog: the name of the dog or some distinctive physical feature.
Such, in brief, is the model. Clearly, it does quite well in its main goal of
exhibiting prototype knowledge in the absence of any explicit prototype
generation and storage procedure. The way of encoding and retrieving
specific information results in a functional correlate of prototype-based
reasoning. I gave details of a similar phenomenon in section 4 on emergent
schemata. This is, in fact, a rather general property of PDP approaches.
They exhibit behavior that, taken at face value, strongly suggests a reliance
on some special mechanism aimed at the generation and storage of explicit
hypotheses concerning the central structures of a domain. But in fact, no
special mechanism is required and the hypotheses are not explicitly stored,
at least not in any normal sense. In a somewhat parallel case, some classical
linguistic theorizing about human language acquisition suggests that our
competence arises as the end result of a three-stage process.
In a PDP model the storage and retrieval strategy targeted on the specific
utterances will yield, in much the same way as described above, behavior
that looks as if it depends on the formulation and deployment of specific
linguistic rules (see, e.g., McClelland, Rumelhart, and the PDP Research
Group 1986, vol. 2, chap. 18). But there is no special mechanism required
to seek these rules and no need to store them explicitly in advance of some
occasion of deployment. It is perhaps misleading to say that the network
does not in some sense learn and deploy the rules. For it becomes struc-
tured in a way that makes it yield outputs that tend to conform to the rule
in a nicely flexible manner. Insofar as rules can ever be stored inside a head,
or a mechanism, this seems to me to amount to a version of such storage.
What is interesting, however, is that such rules depend on no special
mechanism of rule generation and storage and are represented in a manner
that makes them extremely flexible and sensitive to contextual nuances
(more on this in chapter 6). In only this (very important) sense, I believe, do
“distributed models ... provide alternatives to a variety of models that
postulate abstract summary representations such as prototypes, ... seman-
tic memory representations, or even linguistic rules” (McClelland, Rumelhart,
et al. 1986, vol. 2, p. 267).
It remains only to mention two further features of the current example
and to offer some comments on its limitations. The two further features are
superpositional storage and a capacity to model fine-grained experimental
data. By “superpositional storage” I mean the property that one network of
units and connections may be used to store a number of representations, so
long as they are sufficiently distinct (the term often used is “orthogonal”)
to coexist without confusion. Thus one network can be trained to exhibit
behavior appropriate to knowledge of a number of distinct prototypes,
such as dog, cat, and bagel (McClelland, Rumelhart, and the PDP Research
Group 1986, vol. 2, p. 185). This is because the delta rule can find a set of
weights that allows it to complete to quite different patterns according to
whether it is given a nonambiguous cue for a dog, cat, or bagel. Interest-
ingly, if it is given an input that is indeterminate between say, a cat and a
dog, it will complete to a blended overall pattern, as if it had an idea not
just of dogs and cats but also of something halfway between dogs and cats
(this property will loom large in later discussions). Indeed, it does have such
an idea, insofar as its prototypes come into being only in response to par-
ticular calls and so function in a maximally flexible way.
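
The superposition and blending effects can be caricatured even with plain Hebbian storage (the patterns here are invented): one weight matrix holds both a "dog" and a "cat" pattern, a fragment of one pattern completes mainly to that pattern, and a cue halfway between the two completes to a blend.

import numpy as np

rng = np.random.default_rng(1)
dog = rng.choice([-1.0, 1.0], size=32)
cat = rng.choice([-1.0, 1.0], size=32)

W = (np.outer(dog, dog) + np.outer(cat, cat)) / 32.0   # superpositional Hebbian storage
np.fill_diagonal(W, 0.0)

dog_cue = np.concatenate([dog[:16], np.zeros(16)])     # half of the dog pattern
mixed_cue = (np.concatenate([dog[:16], np.zeros(16)]) +
             np.concatenate([cat[:16], np.zeros(16)])) / 2

for label, cue in [("dog cue", dog_cue), ("ambiguous cue", mixed_cue)]:
    out = W @ cue
    print(label,
          " corr with dog:", round(float(np.corrcoef(out, dog)[0, 1]), 2),
          " corr with cat:", round(float(np.corrcoef(out, cat)[0, 1]), 2))
# the dog cue completes mainly to the dog pattern; the ambiguous cue yields an
# output correlated with both stored patterns at once, a blend of dog and cat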
The other remaining feature—the capacity to model fine-grained ex-
perimental data—constitutes a major attraction of the approach. By fine-
grained experimental data I mean data on the way performance in sup-
posedly central cases is affected by other factors, including context and
breakdowns. One of the striking things about some PDP models is that
they have ended up, often unintentionally, producing fine-grained behavior
Figure 5.6
An inclusive-or network (input units connected directly to an output unit).
Figure 5.7
A simple exclusive-or network with one hidden unit. From Rumelhart, Hinton, and Williams 1986, 321.
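
For readers who want the exclusive-or network of figure 5.7 spelled out, here is one standard weight assignment that computes it with a single hidden unit; the particular thresholds and weights are a common textbook choice and need not be exactly those in the figure.

def step(net, threshold):
    return 1 if net > threshold else 0

def xor_net(a, b):
    hidden = step(a + b, 1.5)                 # comes on only when both inputs are on
    return step(a + b - 2 * hidden, 0.5)      # the hidden unit vetoes the double-on case

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", xor_net(a, b))      # 0 0 -> 0, 0 1 -> 1, 1 0 -> 1, 1 1 -> 0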
6 Biology Revisited
Chapter 4 detailed some biological constraints on natural intelligence. The
PDP approach, it seems to me, goes a long way toward meeting these
constraints. (More philosophical or conceptual constraints, which must be
simultaneously satisfied, are dealt with in succeeding chapters.)
Natural intelligent systems, I argued, will set high store on robustness,
fast sensory processing and motor control, and flexibility in dealing with
new situations. Standard evolutionary pressures will favor economical,
approximate solutions to problems. And the gradualistic holism of the
evolutionary process means that the general form of solutions to early
information-processing problems will heavily constrain the form of more
recent ones. On all these counts, it appears, the PDP approach scores quite
well. Let’s review the situation.
PDP approaches deploy a means of encoding and processing informa-
tion that is particularly well suited to evolutionarily basic tasks like low-
level vision and sensorimotor control. For these are paradigm cases of
tasks requiring the simultaneous satisfaction of a large number of soft con-
straints. Most of the basic tasks listed in chapter 4, section 4, fall into this
category. PDP architectures (i.e., networks of units connected by excitatory
and inhibitory links and set up to follow learning rules) would be a natural
evolutionary choice as a means of performing such tasks at an acceptably
fast speed. The use of such architectures as the basic form of computational
solutions to more recent demands (e.g., semantic memory and even sen-
tence processing—see the next chapter) thus respects the requirement of
continuity with the form of earlier solutions.
Such an approach also brings a range of further benefits in the form of
various emergent properties associated with parallel distributed means of
encoding and retrieval. Thus, achievements that look from an intuitive
mind’s-eye perspective to be a number of distinct achievements, each in
need of its own computational model, can be seen merely as different high-
level ways of describing the same underlying form of processing—roughly,
a constant attempt to configure the system to a best match for various ex-
ogenous and endogenous inputs. Fix the mode of encoding, updating, and
retrieval and you get various forms of generalization, graceful degradation,
default assignment, prototype extraction effects, and so forth. It begins to
look very much as if PDP goes at least some way to finding the domes and
arches out of which the spandrels visible to the mind’s eye emerge. Many
of the distinctions drawn at the upper level may still reflect our proper
epistemological interests without marking any differences in the underlying
computational form or activity of the system. Thus, we distinguish, say,
memory and confabulation. But both may involve only the same pattern
completing capacity. The difference lies, not in the computational sub-
structure, but in the relation of that structure to states of the world that
impinged on the knower. (For a discussion of how PDP from an internal
perspective blurs the distinction between memory and confabulation,
see McClelland, Rumelhart, and the PDP Research Group 1986, vol. 1,
p. 80-81.)
Obviously, there is considerable and biologically attractive economy
about all this. The use of one underlying algorithmic form to achieve so
many high-level goals and the superpositional storage of data should
appeal to a thrifty Mother Nature. And the double robustness of such
systems (being capable of acting sensibly on partially incorrect data and
capable of surviving some hardware damage) is a definite natural asset.
Finally, the sheer flexibility of the system’s use of its stored knowledge
constitutes a major biological advantage (and philosophical advantage too,
as we shall see). The capacity sensibly to deploy stored knowledge in a
way highly sensitive to particular (perhaps novel) current needs will surely
be a hallmark of any genuine intelligence.
In sum, the kind of approach detailed in the present chapter seems to
offer a real alternative (or complement) to MIND-style theorizing. What
makes PDP approaches biologically attractive is not merely their neuro-
physiological plausibility, as many seem to believe. They also begin to
meet a series of more general constraints on any biologically and evolu-
tionarily plausible model of human intelligence. PDP approaches may not
be the only kind of computational approach capable of meeting such con-
straints. But work within the tradition of semantically transparent systems
has so far failed to do so and instead has produced fragile, inflexible
systems of uncertain neural plausibility.
Chapter 6
Informational Holism
1 In Praise of Indiscretion
Discretion, or at any rate discreteness, is not always a virtue. In the
previous chapter we saw how a profoundly nondiscrete, parallel distributed
encoding of room knowledge provided for rich and flexible behavior. The
system supported a finely gradated set of potential, emergent schemata. In
a very real sense, that system had no single or literal idea of the meaning
of, say, “bedroom.” Instead, its ideas about what a bedroom is are inextri-
cably tied up with its ideas about rooms and contents in general and with
the particular context in which it is prompted to generate a bedroom
schema. The upshot of this is the double-edged capacity to shade or alter
its representation of bedrooms along a whole continuum of uses informed
by the rest of its knowledge (this double-edged capacity is attractive yet
occasionally problematic—see chapter 9). This feature, which I shall call
the “informational holism” of PDP, constitutes a major qualitative differ-
ence between PDP and more conventional approaches. This difference is
tightly linked to the way in which PDP systems will fail to be semantically
transparent, in the sense outlined in chapter one.
The present chapter has two goals. First, to elaborate on the nature of
informational holism in PDP. Second, to discuss what conceptual relation
parallel distributed encoding bears to the more general phenomenon of
informational holism. On the latter issue, options range from the very weak
(PDP sustains such holism, but so can work in the STS paradigm) to the
very strong (only PDP can support such holism). The truth, as ever, seems
to lie somewhere in between.
Table 6.1
Microfeatures of a model of case-role assignment

Feature        Values
volume         small, medium, large
pointiness     pointed, rounded
breakability   fragile, unbreakable
same word are just different patterns of activation [of microfeatures]; really
different readings, ones that are totally unrelated, ... simply have very little
in common. Readings that are nearly identical with just a shading of a dif-
ference are simply represented by nearly identical patterns of activation”
(McClelland and Kawamoto 1986, 315).
The power of PDP systems to shade meanings across a whole continuum
of cases enables them to model a number of effects. Most straightforwardly,
it enables them to disambiguate words according to the context built up by
the rest of the sentence. Thus take a sentence like “The bat ate the fruit.” In
this case the bat is clearly an animal not a cricket bat, and a PDP model
could use the context of occurrence to determine this fact. The features that
constitute the distributed representation of a live bat would be radically
different from those appropriate to a cricket bat. The presence of the other
words designating an act of eating fruit encourages activation of the fea-
tures of the live bat.
This kind of effect, as McClelland and Kawamoto point out, can quite
easily be captured in a conventional or in a local connectionist approach. In
each case one would have a separate unit or memory store for each of the
readings of “bat,” and a set of rules or heuristics (in the conventional case)
or a pattern of connectivity strengths (in the local connectionist version)
determining which chunk to deploy when. This works until we need to
model very-fine-grained differences in meaning. Separate chunks for a fruit
bat and a cricket bat seem okay. But words seem to take on different shades
of meaning in a continuously varying fashion, one that seems unspecifiable
in advance. Thus, consider:
(1) The boy kicked the ball.
(2) The ball broke the window.
(3) He felt a ball in his stomach.
(Sentences (1) and (2) are from McClelland and Kawamoto 1986, 315.) In
case (1) we may imagine a soft, toy ball. In case (2) we imagine a hard ball
(a tennis or cricket ball). In case (3) we have a metaphorical use: there is no
ball in his stomach, but a feeling of a localized, hard lump. Everyday talk
and comprehension is full of such shading effects according to overall
context. Surely we don’t want to commit ourselves to predetermining all
such uses in advance and setting up a special chunk for the semantic mean-
ing of each.
The PDP approach avoids such ontological excess by representing all
these shades of meaning with various patterns in a single set of units
representing microfeatures. The patterns for sentences (1) and (2) might
share, e.g., the microfeature values spherical and game object, while the
pattern for sentences (2) and (3) share the values small and hard. One inter-
esting upshot here is the lack of any ultimate distinction between meta-
phorical and literal uses of language. There may be central uses of a word,
and other uses may share less and less of the features of the central use. But
there would be no firm, God-given line between literal and metaphorical
meanings; the metaphorical cases would simply occupy far-flung corners of
a semantic-state space. There would remain very real problems concerning
how we latch on to just the relevant common features in understanding
a metaphor. But it begins to look as if we might now avoid the kind of
cognitivist model in which understanding metaphor is treated as the com-
putation of a nonliteral meaning from a stored literal meaning according to
high-level rules and heuristics. Metaphorical understanding, on the present
model, is just a limiting case of the flexible, organic kind of understanding
involved in normal sentence comprehension. It is not the icing on the cake.
It is in the cake mix itself.
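
A toy illustration, with invented microfeatures, makes the geometry of this suggestion vivid: the three readings of "ball" are just three points in one feature space, nearby readings sharing many features and the metaphorical reading sharing few.

readings = {  # hypothetical microfeature sets for "ball" in sentences (1)-(3)
    "(1) kicked ball":  {"spherical", "game-object", "soft", "large"},
    "(2) window ball":  {"spherical", "game-object", "hard", "small"},
    "(3) stomach ball": {"hard", "small", "localized", "body-sensation"},
}
names = list(readings)
for i, a in enumerate(names):
    for b in names[i + 1:]:
        print(a, "/", b, "-> shared microfeatures:", sorted(readings[a] & readings[b]))
# (1)/(2) share the spherical, game-object core; (2)/(3) share only hard and small;
# the metaphorical reading (3) simply occupies a far corner of the same feature space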
So far, then, we have seen how the informational holism of distributed
models enables them to support the representation of subtle gradations of
meaning without needing to anticipate each such gradation in advance or
dedicate separate chunks of memory to each reading. And we also saw
hints of how this might undermine the rigidity of some standard linguistic
categories like metaphorical versus literal use. One other interesting aspect
of this informational holism concerns learning. The system learns by altering
its connectivity strengths to enable it to re-create patterns in its inputs.
At no stage in this process does it generate and store explicit rules stating
how to go on. Superpositional storage of data means that as it learns about
one thing, its knowledge of much else is automatically affected to a greater
or lesser degree. In effect, it learns gradually and widely. It does not (to use
the simplified model described) learn all about bats, then separately about
balls, and so on. Rather it learns about all these things all at once, by
example, without formulating explicit hypotheses. And what it can be said
to know about each is informed by what it knows about the rest—recall
the case of its knowledge that only hard things break things.
In sum, PDP approaches buy you a profoundly holistic mode of data
storage and retrieval that supports the shading of meanings and allows
gradual learning to occur without the constant generation and revising of
explicit rules or hypotheses about the pattern of regularities in the domain.
3 Symbolic Flexibility
Smolensky (1987) usefully describes PDP models as working in what he
calls the subsymbolic paradigm. In the subsymbolic paradigm, cognition is
not modeled by the manipulation of machine states that neatly match (or
stand for) our daily, symbolic descriptions of mental states and processes.
Rather, these high-level descriptions (he cites goals, concepts, knowledge,
between the conceptual entity (“kitchen” etc.) and the microfeatures in the
network. In a treatment with many microfeatures, where such items as “bed”
and “sofa” are merely approximate top-level labels for subtle and context-
sensitive complexes of geometric and functional properties, the distance will
be great indeed, and a conceptual model only a very high approximation.
Another route to the approximation claim is to regard the classical
accounts as describing the competence of a system, i.e., its capacity to solve
a certain range of well-posed problems (see Smolensky 1988, 19). In
idealized conditions (sufficient input data, unlimited processing time) the
PDP system will match the behavior specified by the competence theory
(e.g., settling into a standard kitchen schema on being given “oven” and
“ceiling” as input). But outside that idealised domain of well-posed prob-
lems and limitless processing time, the performance of a PDP system will
diverge from the predictions of the competence theory in a pleasing way.
It will give sensible responses even on receipt of degraded data or under
severe time constraints. This is because although describable in that idealized
case as satisfying hard constraints, the system may actually operate by
satisfying a multitude of soft constraints. Smolensky here introduces an
analogy with Newtonian mechanics. The physical world is a quantum
system that looks Newtonian under certain conditions. Likewise with the
cognitive system. It looks increasingly classical as we approach the level of
conscious rule following. But in fact, according to Smolensky, it is a PDP
system through and through.
In the same spirit Rumelhart and McClelland suggest: “It might be argued
that conventional symbol processing models are macroscopic accounts,
analogous to Newtonian mechanics, whereas our models offer more micro-
scopic accounts, analogous to quantum theory.... Through a thorough
understanding of the relationship between the Newtonian mechanics and
quantum theory we can understand that the macroscopic level of de-
scription may be only an approximation to the more microscopic theory”
(Rumelhart and McClelland 1986, 125). To illustrate this point, consider a
simple example due to Paul Smolensky. Imagine that the cognitive task to
be modeled involves answering qualitative questions on the behavior of a
particular electrical circuit. (The restriction to a single circuit may appall
classicists, although it is defended by Smolensky on the grounds that a
small number of such representations may act as the chunks utilized in
general purpose expertise—see Smolensky 1986, 241.) Given a description
of the circuit, an expert can answer questions like “If we increase the
resistance at a certain point, what effect will that have on the voltage, i.e.,
will the voltage increase, decrease, or remain the same?”
Suppose, as seems likely, that a high-level competence-theoretic specifi-
cation of the information to be drawn on by an algorithm tailored to
answer this question cites various laws of circuitry in its derivations (what
Smolensky refers to as the “hard laws” of circuitry: Ohm’s law and Kirchhoff’s
law). For example, derivations involving Ohm’s law would invoke the
equation
voltage = current x resistance.
How does this description relate to the actual processing of the system?
The model represents the state of the circuit by a pattern of activity over a
set of feature units. These encode the qualitative changes found in the
circuit variables in training instances. They encode whether the overall
voltage falls, rises, or remains the same when the resistance at a certain point
goes up. These feature units are connected to a set of what Smolensky calls
“knowledge atoms,” which represent patterns of activity across subsets of
the feature units. These in fact encode the legal combinations of feature
states allowed by the actual laws of circuitry. Thus, for example, “The
system’s knowledge of Ohm’s law ... is distributed over the many knowl-
edge atoms whose subpatterns encode the legal feature combinations for
current, voltage and resistance” (Smolensky 1988, 19). In short, there is a
subpattern for every legal combination of qualitative changes (65 subpat-
terns, or knowledge atoms, for the circuit in question).
At first sight, it might seem that the system is merely a units-and-
connections implementation of a lookup table. But that is not so. In fact,
connectionist networks act as lookup tables only when they are provided
with an overabundance of hidden units and hence can simply memorize
input-output pairings. By contrast, the system in question encodes what
Smolensky terms “soft constraints,” i.e., patterns of relations that usually
obtain between the various feature units (microfeatures). It thus has general
knowledge of qualitative relations among circuit microfeatures. But it does
not have the general knowledge encapsulated in hard constraints like Ohm’s
law. The soft constraints are two-way connections between feature units
and knowledge atoms, which incline the network one way or another but
do not compel it, that is, they can be overwhelmed by the activity of other
units (that’s why they are soft). And as in all connectionist networks, the
system computes by trying simultaneously to satisfy as many of these soft
constraints as it can. To see that it is not a mere lookup tree of legal
combinations, we need only note that it is capable of giving sensible
answers to (inconsistent or incomplete) questions that have no answer in
a simple lookup table of legal combinations.
The soft constraints are numerically encoded as weighted inter-unit con-
nection strengths. Problem solving is thus achieved by “a series of many
node updates, each of which is a microdecision based on formal numerical
rules and numerical computations” (Smolensky 1986, 246).
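
A drastically simplified stand-in for this arrangement (the constraints and weights below are invented, and an exhaustive search stands in for the network's gradual settling) still captures the key point: the system holds only weighted subpatterns, never Ohm's law itself, yet a well-posed question gets the answer the hard law dictates.

from itertools import product

FEATURES = ["R", "I", "V"]          # qualitative change in resistance, current, voltage
CHANGES = ["up", "down", "same"]

# "knowledge-atom"-like subpatterns of feature values that tend to go together,
# each with a weight; none of them is the hard law V = I x R
SOFT_CONSTRAINTS = [
    ({"R": "up", "V": "up"}, 2.0),
    ({"R": "down", "V": "down"}, 2.0),
    ({"I": "up", "V": "up"}, 2.0),
    ({"I": "down", "V": "down"}, 2.0),
    ({"R": "same", "I": "same", "V": "same"}, 3.0),
]

def goodness(state):
    # total weight of the soft constraints that this assignment satisfies
    return sum(w for pat, w in SOFT_CONSTRAINTS
               if all(state[f] == v for f, v in pat.items()))

def settle(clamped):
    # brute-force maximization standing in for the network's settling process
    best = None
    for values in product(CHANGES, repeat=len(FEATURES)):
        state = dict(zip(FEATURES, values))
        if any(state[f] != v for f, v in clamped.items()):
            continue
        if best is None or goodness(state) > goodness(best):
            best = state
    return best

# "If we increase the resistance and hold the current fixed, what happens to V?"
print(settle({"R": "up", "I": "same"}))   # V ends up "up", as Ohm's law predicts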
The network has two properties of special interest to us. First, it can be
shown that if it is given a well-posed problem and unlimited processing
time, it will always give the correct answer as predicted by the hard laws
of circuitry. But, as already remarked, it is by no means bound by such laws.
Give it an ill-posed or inconsistent problem, and it will satisfy as many as
it can of the soft constraints (which are all it really knows about). Thus,
“outside of the idealised domain of well-posed problems and unlimited
processing time, the system gives sensible performance” (Smolensky 1988,
19). The hard rules (Ohm’s law, etc.) can thus be viewed as an external
theorist’s characterization of an idealized subset of its actual performance (it
is no accident if this brings to mind Dennett's claims about the intentional
stance—see Dennett 1981).
Second, the network exhibits interesting serial behavior as it repeatedly
tries to satisfy all the soft constraints. This serial behavior is characterized
by Smolensky as a set of macrodecisions each of which amounts to a “commit-
ment of part of the network to a portion of the solution.” These macrodeci-
sions, Smolensky notes, are “approximately like the firing of production
rules. In fact, these ‘productions’ ‘fire’ in essentially the same order as in a
symbolic forward-chaining inference system” (Smolensky 1988, 19). Thus,
the network will look as if it is sensitive to hard, symbolic rules at quite a
fine grain of description. It will not simply solve the problem “in extension”
as if it knew hard rules. Even the stages of problem solving may look as if
they are caused by the system’s running a processing analogue of the steps
in the symbolic derivations available in the competence theory.
But the appearance is an illusion. The system has no knowledge of the
objects mentioned in the hard rules. For example, there is no neat sub-
pattern of units that can be seen to stand for the general idea of resistance,
which figures in Ohm’s law. Instead, some sets of units stand for resistance
at R₁, and other sets for resistance at R₂. In more complex networks the
coalitions of units that, when active, stand in for a top-level concept like
resistance are, as we saw, highly context-sensitive. That is, they vary
according to context of occurrence. Thus, to use Smolensky’s own exam-
ple, the representation of coffee in such a network would not consist of
a single recurrent syntactic item but a coalition of smaller items (micro-
features) that shift according to context. Coffee in the context of a cup
may be represented by a coalition that includes the features (liquid) and
(contacting-porcelain). Coffee in the context of a jar may include the features
(granule) and (contacting-glass). There is thus only an “approximate equiv-
alence of the ‘coffee vectors’ across contexts,” unlike the “exact equivalence
of the coffee tokens across different contexts in a symbolic processing
system” (Smolensky 1988, 17). By thus replacing the conceptual symbol
“coffee” with a shifting coalition of microfeatures, the so-called dimension
shift, such systems deprive themselves of the structured mental representa-
tions deployed in both a classical competence theory and a classical symbol-
processing account (level 2). Likewise, in the simple network described,
would not say that it no longer demonstrates intelligence but only that it
is too slow, or that the particular approach, though successful, is imprac-
tical” (Krellenstein 1987, 155).
However, I shall not pursue this line. Instead, for the purpose of argument
I shall accept the point about irrelevance of hardware. For much of the
interest of PDP, it seems to me, lies not in the notion of parallel hardware
so much as in the particular properties of any virtual machine (whatever its
hardware base) running parallel cooperative algorithms of the kind we have
been discussing. The particular properties I have in mind (some of which
we have already met) can be grouped under the headings of “Searches” and
“Representation.”
Searches
Even the most conventional work in AI accepts a relationship between
degree of intelligence and efficiency of search. The point of heuristic
searches (chapter 1, section 4) is precisely to increase the intelligence of the
system by reducing the extent of the search space it must traverse to solve
a particular problem. Some problems require finding the best way of simul-
taneously satisfying a large number of soft constraints. In such cases PDP,
we saw, provides an elegant solution to the problem of efficient, content-
sensitive searches. A search is driven by partial descriptions and involves
simultaneously applying competing hypotheses against one another. It is
not dependent on fully accurate or even consistent information in the
partial description. PDP methods thus introduce a qualitatively different
kind of search into AI. This kind of search may not always be the best or
most efficient option; as always, it all depends on the nature of the space
involved.
Representation
We have seen how PDP models (or those PDP models which interest us)
employ distributed representations in superpositional storage. The deep in-
formational holism of PDP depends on this fact. Besides explaining shadings
of meaning, discussed above, this mode of representation has important
side effects. Some of these were discussed in chapter 5 (e.g., graceful
degradation, generalization, content-addressable retrieval). One very im-
portant side effect that I have not yet fully discussed is cross talk.
Cross talk is a distinctive PDP pathology that occurs when a single
network processes multiple patterns with common features. Because the
patterns have common features, there is a tendency for the overall pattern
of activation appropriate to one pattern to suffer interference from the
other pattern because the system tries simultaneously to complete the
other pattern. At times I have praised this property by calling it “free
generalization.” When it occurs in some contexts, however, it is a source of
errors. Thus, a system that seeks to recognize words might occasionally fall
into error when two simultaneously presented words share some features,
e.g., “sand” and “lane” might get read as “land” and “sane” (see McClelland
1986, 139). Or in a model of skilled typing, the keystrokes appropriate to
two successive letters may interfere and cause overlap or exchange (see
Rumelhart and Norman 1982).
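
The word example can be made concrete with a toy letter-in-position encoding (the encoding is invented; the word pair is the one from the passage above): the blended feature set produced by presenting two words at once supports the spurious words just as strongly as the words actually shown.

WORDS = ["sand", "lane", "land", "sane"]

def features(word):
    # distributed encoding as letter-in-position features
    return {(i, ch) for i, ch in enumerate(word)}

blend = features("sand") | features("lane")     # simultaneous presentation

for word in WORDS:
    print(word, "supported features:", len(features(word) & blend), "of", len(word))
# every one of the four words matches all four of its letter-position features,
# so the system can misread the pair "sand"/"lane" as "land" and "sane"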
Cross talk might be regarded as an irritating side effect of parallel dis-
tributed representation were it not for two facts. First, the kind of error
caused by cross talk is familiar to us in many areas of life. (Recall once again
the difficulty of remembering the phone numbers of two friends if the
numbers are very similar. We tend to mix up elements of each. That's cross
talk.) It turns out that rather fine-grained experimental data on human
performance, including such error profiles, can be explained and even
predicted using PDP models (see McClelland 1986, 139-140). This is
presumably good news for PDP considered as an attempt to model human
processing mechanisms.
But second and more important, the source of such occasional errors is
also the source of much of the power and flexibility of such systems. The
tendency to generalize is one example of this. Another even more powerful
effect is to provide a computational underpinning for the capacity to think
analogically, to see parts of our knowledge as exemplifying patterns and
structures familiar to us from other parts. This tendency to see one part of
an experience in terms of another, to see links and similarities and patterns
all around us, is surely at the heart of human creativity. It is what Hofstad-
ter (1985) calls “fluid thinking.”
At its best cross talk may underpin such classic anecdotes as Kekulé’s
discovery of the benzene ring after dreaming of snakes biting their own
tails, or Wittgenstein’s capacity to see a court room simulation of a motor
accident using model vehicles as indicating a deep truth about the nature of
language. At its worst, cross talk pathologically amplified may explain the
schizophrenic’s tendency to see deep links between every item of daily ex-
perience. Either way, the idea of an informational substructure that forces us
continuously to seek and extend patterns (the very substructure responsible
for cross talk) is psychologically highly attractive. For, as Hofstadter puts it:
People perceive patterns anywhere and everywhere, without know-
ing in advance where to look. People learn automatically in all aspects
of life. These are just facets of common-sense. Common-sense is not
an “area of expertise,” but a general—that is, domain-independent—
capacity that has to do with fluidity in representation of concepts, an
ability to sift what is important from what is not, an ability to find
unanticipated analogical similarities between totally different con-
cepts. (1985, 640).
Chapter 7
The Multiplicity of Mind: A Limited Defence of
Classical Cognitivism
2 Against Uniformity
All too often, the debate between the proponents and doubters of PDP
approaches assumes the aspect of a holy war. One reason for this, I suspect,
is an implicit adherence to what I shall call the general version of the unifor-
mity assumption. It may be put like this:
Every cognitive achievement is psychologically explicable using only
the formal apparatus of a single computational architecture.
I shall say more about the terms of this assumption in due course. The
essential idea is easy enough. Some classical cognitivists believe that all
cognitive phenomena can be explained by models with a single set of basic
types of operation (see the discussion in chapter 1, section 4; in section 5
below; and in chapter 8). This set of basic operations defines a computa-
tional architecture in the sense outlined in chapter 1, section 4. Against this
view some PDP theorists seem to urge that the kinds of basic operation
made available by their models will suffice to construct accurate psycho-
logical models of all cognitive phenomena (see chapters 5 and 6 above).
Each party to this dispute thus appears to endorse its own version of the
uniformity claim. The classical-cognitivist version is:
Every cognitive achievement is psychologically explicable by a model
that can be described using only the apparatus of classical cognitivism.
The PDP version is:
Every cognitive achievement is psychologically explicable by a model
that can be described using only the apparatus of PDP.
The argument I develop will urge that we resist the uniformity assumption
in all its guises. Instead, I endorse a model of mind that consists of a multi-
tude of possibly virtual computational architectures adapted to various task
would find the task quite difficult. But suppose I allow you to use pen,
paper, and the Arabic numerals. The task becomes simple. For the series
7, 4, 9, 5, 2, 1, 6, 9, isolate every second number (4, 5, 1, 9), add 2 (6, 7, 3, 11),
and sum the total (27).
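
Spelled out as the explicit sequence of simple steps that the external formalism makes available, the task is trivial:

series = [7, 4, 9, 5, 2, 1, 6, 9]
every_second = series[1::2]                 # [4, 5, 1, 9]
plus_two = [n + 2 for n in every_second]    # [6, 7, 3, 11]
print(sum(plus_two))                        # 27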
Rumelhart, Smolensky, et al. develop a similar example involving long
multiplication. Most of us, they argue, can learn to just see the answer to
some basic multiplication questions, e.g., we can just see that 7 x 7 is 49.
This, they suggest, is evidence of a pattern-completing mechanism of the
usual PDP variety. But for most of us longer multiplications present a dif-
ferent kind of problem. 722 x 942 is hard to do in the head. Instead, we
avail ourselves (at least in the first instance—see below) of an external
formalism that reduces the bigger task to an iterated series of familiar
relaxation steps. We write:
722
x 942
One thing does seem clear. The apparatus that Rumelhart et al. give
won't quite do. The difficulty of seeing how we construct external for-
malisms in the first place is solved, they say, by the facts of cultural trans-
mission (Rumelhart, Smolensky, et al. 1986, 47). Representational systems,
they rightly point out, are not easy to come by. Those we have grew grad-
ually out of a long historical process of alteration and addition to simple
systems. This is fine. They are other examples, no doubt, of the pervasive
principle of gradualistic holism, examined in chapter 4. But it leaves un-
touched the real problem of how we can recognize anything as an external
representational formalism at all. It is here, I believe, that the most pro-
found problems lie.
And there are further difficulties. Even if one has both an external repre-
sentational formalism and an understanding of what it is for some squiggles
to represent something, still the very deployment of the formalism looks
more problematic than Rumelhart et al. allow. Take the example of long
multiplication. It is not simply a matter of performing an iterated series of
basic pattern-matching operations. For as I remarked earlier, we must store
the intermediate results on paper (or in our mental model) according to a
well-devised scheme. That is, if we compute 7 × 7 as part of a long multi-
plication, we do not simply store 49; rather, we store a “9” in one place in
a sequence on paper and carry the 4 to the next operation. What kind of
control and storage structures are necessary for this? And does PDP alone
enable us to instantiate them?
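
A small sketch indicates the kind of control and storage the scheme demands: each single-digit step is the sort of thing a pattern completer can "just see," but the written digits, the carries, and the column positions all have to be stored and sequenced explicitly.

def long_multiply(a, b):
    result = 0
    for shift, digit_b in enumerate(reversed(str(b))):
        carry, row = 0, []
        for digit_a in reversed(str(a)):
            product = int(digit_a) * int(digit_b) + carry   # one "just seen" digit fact
            row.append(product % 10)                        # the digit we write down
            carry = product // 10                           # the digit we carry
        if carry:
            row.append(carry)
        row_value = int("".join(str(d) for d in reversed(row)))
        result += row_value * (10 ** shift)                 # position supplied by the formalism
    return result

print(long_multiply(722, 942))   # 680124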
All these issues, I believe, are important lacunas in speculations about
real symbol processing. For the purposes of this chapter, I am assuming
they can be overcome. But we should bear them firmly in mind, especially
in the light of the suggestion that understanding the mind requires under-
standing a number of virtual architectures, with a correct psychological model
associated with each (see below).
involve more than one of these virtual machines. We also need to consider
the qualitative effects of implementing a symbol processor in a parallel
distributed architecture. The PDP group’s insistence that the cognitivist
account, though often useful, must always have the status of some more
or less accurate approximation to the true computational, psychological
story is, I suggest, based on a subtle misreading of the moral of such
cases. I return to these matters in the next chapter. For the present, the
lesson is straightforward. If the account just developed is at all plausi-
ble, any assumptions of uniformity are premature. There may be a full
spectrum of relations between PDP and cognitivist models of cognition
according to whether the task aspect under study involves intuitive process-
ing, conscious reasoning, or a full simulation of a gross symbol-processing
mechanism.
6 BACON, an Illustration
discovery have two prime characteristics, which should by now attract our
attention.
• The flash of insight is typically fast. The idea just comes to us, and
we have no conscious experience of working for it.
• The flash of insight involves using rather abstract perceived pat-
terns in one domain of our experience to suggest ways of structuring
our ideas about some other apparently far removed domain.
In the light of these characteristics it is not absurd to suggest that some
PDP mechanism is operating in such cases. Simon appears to reject this
idea, insisting that these very fast processes are in no way fundamentally
different from the processes of serial, heuristic search used by BACON
(a full quote is given in chapter 1, section 4). This time, it seems, it may be
the conventional theorist who is too quickly assuming uniformity across
the cognitive domain.
A better position is well explained by D. Norman: “People interpret the
world rapidly, effortlessly. But the development of new ideas, or evaluation
of current thoughts proceeds slowly, serially, deliberately. People do seem
to have at least two modes of operation, one rapid, efficient, subconscious,
the other slow, serial and conscious” (1986, 542). According to this model
(which was also endorsed by Smolensky; see chapter 6, section 3), the com-
putational substrate of human thought comprises at least two strands. One,
the fast, pattern-seeking operations of a PDP mechanism, the other the
slow, serial, gross-symbol-using, heuristic-guided search of classic cogni-
tivism. If my earlier speculations are at all correct, this latter strand may at
times be dependent on a virtual symbol-processing architecture, possibly
created by our capacity to exploit real environmental structures (but the
genesis of the simulation is not what is at issue here). Were it not for their
strange commitment to a single functional architecture (see chapter 1),
such a picture ought to be quite acceptable to theorists like Langley, Simon,
Bradshaw, and Zytkow. Certainly, they actually set out to model the
processes of slow, conscious, serial reasoning. For example, in defence of
the seriality of their approach, they suggest that “for the main processes
in any task requiring conscious attention, short term memory serves as a
severe bottleneck that forces processes to be executed serially” (Langley
et al. 1987, 113; my emphasis). It seems, then, as if the folk-psychological
category of “scientific discovery” fails to pick out a single computational
kind. Instead, it papers over the difference between conscious toil and
unconscious insight to fix on the product of both processes (new scien-
tific ideas), a product of undoubted significance in human life. This is just
what the conjectures of chapter 3 should lead us to expect. The folk-
psychological view of the cognitive terrain is a view from within the
environmentally and culturally rich mesh of human practices. It has no
Chapter 8
Structured Thought, Part 1
think bRa also. But they allow that this need not be so. “It is,” they write, “an
empirical question whether the cognitive capacities of infraverbal orga-
nisms are often structured that way” (1988, 41). Now, it is certainly true
that an animal might be able to respond to aRb and not to bRa. But my
claim is that in such a case (ceteris paribus) we should conclude not that it
has, say, the thought “a is taller than b” but cannot have the thought “b is
taller than a.” Rather, its patent incapacity to have a spectrum of thoughts
involving a, b, and the taller-than relation should defeat the attempt to
ascribe to it the thought that a is taller than b in the first place. Perhaps it
has a thought we might try to describe as the thought that a-is-taller-than-
b. But it does not have the thought reported with the ordinary sentential ap-
paratus of our language. For grasp of such a thought requires a grasp of its
component concepts, and that requires satisfying the generality constraint.
In short, Fodor and Pylyshyn’s “empirical” observation that you don’t
find creatures whose mental life consists of seventy-four unrelated thoughts
is no empirical fact at all. It is a conceptual fact, just as the “thinking” robot's
failure to have a single, isolated thought is a conceptual fact. Indeed, the
one is just a limiting case of the other. A radically punctate mind is no mind
at all.
These observations should begin to give us a handle on the actual nature
of thought ascription. Thought ascription, as we saw in chapter 3, is a
means of making sense of a whole body of behavior (actual and counter-
factual). We ascribe a network of thoughts to account for and describe a rich
variety of behavioral responses. This picture of thought ascription echoes
the claims made in Dennett (1981). The folk-psychological practice of
thought ascription, he suggests, “might best be viewed as a rationalistic
calculus of interpretation and prediction—an idealizing, abstract, instru-
mentalistic interpretation method that has evolved because it works” (1981,
48). If we put aside the irrealistic overtones of the term “instrumentalism”
(a move Dennett himself now approves of—see Dennett 1987, 69-81), the
general idea is that thought ascription is an “abstract, idealising, holistic”
process, which therefore need not correspond in any simple way to the
details of any story of in-the-head processing. The latter story is to be told
by what Dennett (1981) calls “sub-personal cognitive psychology.” In
short, there need be no neat and tidy quasireductive biconditional linking
in-the-head processing to the sentential ascriptions of belief and thought
made in daily language. Instead, a subtle story about in-the-head processing
must explain a rich body of behavior (actual and counterfactual, external
and internal), which we then make holistic sense of by ascribing a systematic
network of abstract thoughts.
It may now seem that we have succeeded in merely relocating the system-
aticity that Fodor and Pylyshyn require. For though it is a conceptual fact,
and hence as unmysterious to a connectionist as to a classicist, that thoughts
are systematic, it is a plain old empirical fact that behavior (which holistically
warrants thought ascriptions) is generally as systematic as it is. If behavior
wasn't systematic, the upshot would be, not punctate minds, but no minds.
But that it is systematic is an empirical fact in need of explanation. That
explanation, according to Fodor and Pylyshyn, will involve wheeling
out the symbolic combinatorial apparatus of classical AI. So doesn’t the
classicist win, though one level down, so to speak? No, at least not without
an independent argument for what I called conceptual-level compositional
structure. It’s time to say what that means.
One pivotal difference between classical accounts and those that are
genuinely and distinctively connectionist lies, according to Fodor and
Pylyshyn, in the nature of the internal representations they posit. Recall
that classicists posit internal representations that have a semantic and syn-
tactic structure similar to the sentences of a natural language. This is often
put as the claim that “classical theories—but not connectionist theories—
postulate a ‘language of thought’” (Fodor and Pylyshyn 1988, 12). And
what that amounts to is at least that the internal representation, like a
sentence of natural language, be composed of parts that, together with
syntactic rules, determine the meanings of the complex strings in which
they figure. It is further presumed that these parts will more or less line up
with the very words that figure in the sentences that report the thoughts.
Thus, to have the thought that John loves the girl is to stand in some
relation to a complex internal token whose proper parts have the context-
independent meanings of “John,” “loves,” and so on. This is what it is to
have a conceptual-level compositional semantics for internal representations.
Distributed connectionists, in contrast, were seen not to posit recurrent
internal items that line up with the parts of conceptual-level descriptions.
Thus, “The coffee is in the cup” would, we saw, have a subpattern that
stands for “coffee.” But that subpattern will be heavily dependent on
context and will involve microfeatures that are specific to the in-the-cup
context. We need not dwell further on the details of this difference here.
For our purposes, the important point is simply this: There is no inde-
pendent argument for the conceptual-level compositionality of internal
representations. And without one, systematicity does not count against
connectionism.
Let us see how this point works. Fodor and Pylyshyn require a kind
of systematicity that argues for a language of thought, i.e., for a system
of internal representations with conceptual-level compositionality. One
approximation to such an argument in their text is the following: “It is ...
only insofar as ‘the’ ‘girl’ ‘loves’ and ‘John’ make the same semantic con-
tribution to ‘John loves the girl’ that they make to ‘The girl loves John’ that
understanding the one sentence implies understanding the other” (Fodor
and Pylyshyn 1988, 42). If the locus of systematicity in need of explanation
4 Cognitive Architecture
Fodor and Pylyshyn also criticize connectionists for confusing the level of
psychological explanation and the level of implementation. Of course, the
brain is a connectionist machine at one level, they say. But that level may
not be identical with the level of description that should occupy anyone
interested in our cognitive architecture. For the latter may be best described
in the terms appropriate to some virtual machine (a classical one, they
believe) implemented on a connectionist substructure. A cognitive archi-
tecture, we are told, “consists of the set of basic operations, resources,
functions, principles, etc. ... whose domain and range are the representa-
tional states of the organism” (Fodor and Pylyshyn, 1988, 10). Fodor and
Pylyshyn’s claim is that such operations, resources, etc. are fundamentally
classical; they consist of structure-sensitive processes defined over internal,
classical, conceptual-level representations. Thus, if we were convinced of
the need for classical representations and processes, the mere fact that the
brain is a kind of connectionist network ought not to impress us. Connec-
tionist architectures can be implemented in classical machines and vice
versa.
This argument in its pure form need not concern us if we totally reject
Fodor and Pylyshyn’s reasons for believing in classical representations and
processes. But it is, I think, worth pausing to note that an intermediate
Questioner: Are you claiming that that’s how human beings think then?
Speaker: Oh no. Absolutely not. Of course, our brains don’t use a
predicate calculus. In fact, it’s unlikely that the algorithms we use bear
any relation to logical calculi at all.
Questioner: So your project is really part of technological AI. You
want some program to get a certain input-output mapping right, but
you don't really care about how humans think.
Speaker: Well, not exactly. Really, it’s hard to say what the project is,
because it is human thought that we're trying to model.
Such exchanges are by no means uncommon. They are not restricted to
fledgling AI workers, nor are they the exclusive hallmark of conventional
cognitivists.
Here is an admittedly highly speculative hypothesis about the cause of
such confusion. The received wisdom is that AI comes in two varieties:
technological AI, in which the goal is simply to get a machine to do
something with no commitment to producing a model of human psychol-
ogy, and psychological AI (or cognitive science), in which the goal is to
produce a computational model of human or animal psychological states
and processes.
But suppose that the arguments developed in chapter 3 have some force.
Suppose, that is, that thought ascription is essentially a matter of imposing
a holistic interpretation upon a large body of behavior in an environmental
context. The individual thoughts thus ascribed are perfectly real, but they
are not the kind of entities that have neat, projectible, computational
analogues in the brain. What then becomes of the project of psychological
AI or, more generally, cognitive science?
The radical conjecture I would like briefly to pursue is this. Cognitive
science turns out to encompass two projects, each laudable and legitimate
but absolutely distinct. These projects coincide with two ways of under-
standing the notion of a psychological model. A psychological model may
be a model of the complex structure of human (or animal) thought, i.e., the
holistic network of ascriptions of contentful states. Or it could be a model
of the computational operations in the brain that in part make possible the
rich and varied behavior we describe using propositional-attitude talk.
Contrary to what Fodor thinks, these models will typically be nonisomor-
phic, though there may be exceptions (see below). In short, there are two
kinds of cognitive science: descriptive cognitive science and causal cogni-
tive science.
Descriptive cognitive science attempts to give a formal theory or model
of the structure of the abstract domain of thoughts, using the com-
puter program as a tool or medium.
and grammars of (1) but also on the structure of the brain, psycholinguistic
evidence, and even, perhaps, evolutionary conjectures concerning the origins
of speech and language (see, e.g., Tennant 1984). In short, what is needed is
clarity concerning the goals of various studies, not a victory of one choice
of study over another. Devitt and Sterelny strike a nice balance, concluding
that linguists are usefully studying not internal mechanisms but “the truth-
conditionally relevant syntactic properties of linguistic symbols” (1984,
146), while nonetheless allowing that such studies may illuminate some
general features of internal mechanisms and hence (quite apart from their
intrinsic interest) may still be of use to the theorist concerned with brain
structures.
What is thus true of the study of grammar is equally true, I suggest, of the
study of thought. Contentful thought is what is described by propositional-
attitude ascriptions. These ascriptions constitute a class of objects suscep-
tible to various formal treatments, just as the sentences judged grammatical
constitute a class of objects susceptible to various formal treatments. In
both cases, computational approaches can help suggest and test such treat-
ments. But in both cases these computational treatments and a psychologi-
cally realistic story about the brain basis of sentence production or holding
propositional attitudes may be expected to come apart.
Chapter 9
Structured Thought, Part 2
McClelland 1986 (216-271). The point of the exercise for Rumelhart and
McClelland was to provide an alternative to the psychologically realistic
interpretation of theories of grammar described briefly in the previous
chapter. The counter-claim made by Rumelhart and McClelland is that “the
mechanisms that process language and make judgments of grammaticality
are constructed in such a way that their performance is characterisable by
[grammatical] rules, but that the rules themselves are not written in explicit
form anywhere in the mechanism” (1986, 217).
Thus construed, the past-tense-acquisition network would aim to pro-
vide an alternative to what I called propositional psychological realism in
chapter 8, section 6, i.e., the view that grammatical rules are encoded in a
sentential format and read by some internal mechanism. But this, as we
saw, is a very radical claim and is by no means made by all the proponents
of conventional symbol-processing models of grammatical competence. It
turns out, however, that this PDP model in fact constitutes a challenge even
to the weaker, and more commonly held, position of structural psycho-
logical realism. Structural psychological realism is here the claim that the
in-the-head information-processing system underlying grammatical com-
petence is structured in a way that makes the rule-invoking description
exactly true. As Pinker and Prince put it, “Rules could be explicitly inscribed
and accessed, but they also could be implemented in hardware in such a
way that every consequence of the rule-system holds. [If so] there is a clear
sense in which the rule-theory is validated” (1988, 168).
The past-tense network challenges structural psychological realism by
generating the systematic behavior of past-tense formation without re-
specting the information-processing articulation of a conventional model.
At its most basic, such articulation involves positing separate, rule-based
mechanisms for generating the past tense of regular verbs and straight-
forward memorization mechanisms for generating the past tense of irregular
verbs. Call these putative mechanisms the nonlexical and the lexical com-
ponents respectively. On the proposed PDP model, “The child need not
decide whether a verb is regular or irregular. There is no question as to
whether the inflected form should be stored directly in the lexicon or
derived from more general principles. ... A uniform procedure is applied for
producing the past tense form in every case” (Rumelhart and McClelland
1986, 267).
One reason for positing the existence of a rule-based, nonlexical com-
ponent lies in the developmental sequence of the acquisition of past tense
competence. It is this developmental data that Rumelhart and McClelland
are particularly concerned to explain in a novel way. The data show three
stages in the development of a child’s ability to correctly generate the past
tense of verbs (Kuczaj 1977). In the first stage the child can give the cor-
rect form for a small number of verbs, including some regular and some
irregular ones. In the second stage the child overregularizes; she seems to
have learned the regular “-ed” ending for English past tenses and can give
this ending for new and even made-up verbs. But she will now mistakenly
give an “-ed” ending for irregular verbs, including ones she got right at
stage one. The overregularization stage has two substages, one in which
the present form gets the -ed ending (e.g., “come” becomes “comed”) and
one in which the past form gets it (e.g., “ate” becomes “ated” and “came”
becomes “camed”). The third and final stage is when the child finally gets
it right, adding “-ed” to regulars and novel verbs and generating various
irregular or subregular forms for the rest.
Classical models, as Pinker and Prince note, account for this data in an
intuitively obvious way. They posit an initial stage in which the child has
effectively memorized a small set of forms in a totally unsystematic and
unconnected way. This is stage one. At stage two, according to this story,
the child manages to extract a rule covering a large number of cases. But
the rule is now mistakenly deployed to generate all past tenses. At the final
stage this is put right. Now the child uses lexical, memorized, item-indexed
resources to handle irregular cases and nonlexical, rule-based resources to
handle regular ones.
Classical models, however, typically exhibit a good deal more structure
than this bare minimum (see, e.g., the model in Pinker 1984). The processing
is decomposed into a set of functional components including a lexicon of
structural elements (items like stems, prefixes, suffixes, and past tenses), a
structural rule system for such elements, and phonetic elements and rules.
A classical model so constructed will posit a variety of mechanisms that
represent the data differently (morphological and phonetic representations)
with access and feed relations between the mechanisms. In a sense, the
classical models here are transparent with respect to the articulation of
linguistic theory. Distinct linguistic theories dealing with, e.g., morphology
and phonology are paired with distinct in-the-head, information-processing
mechanisms.
The PDP model challenges this assumption that in-the-head mechanisms
mirror structured, componential, rule-based linguistic theories. It is not
necessary to dwell in detail on the Rumelhart and McClelland model to see
why this is so. The model takes as input a representation of the verb con-
structed entirely out of phonetic microfeatures. It uses a standard PDP
pattern associator to learn to map phonetic microfeature representations of
the root form of verbs to a past-tensed output (again expressed as a set of
phonetic microfeatures). It learns these pairings by the usual iterated pro-
cess of weight adjustments described in previous chapters. The basic struc-
ture of the model is thus: phonetic representations of root forms are input
into a PDP pattern associator, and phonetic representations of past forms
result as output.’
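By way of illustration only, the bare bones of such a pattern associator can be sketched in present-day code. The sketch below makes no claim to reproduce the Wickelfeature encoding or the training regime of the actual Rumelhart and McClelland model; the vector sizes, learning rate, and root/past pairs are invented stand-ins for phonetic microfeature representations.

    # A toy pattern associator: iterated weight adjustment maps a root-form
    # microfeature vector onto a past-tense microfeature vector.
    import numpy as np

    rng = np.random.default_rng(0)
    N_IN, N_OUT = 16, 16                  # sizes of the invented feature vectors
    weights = np.zeros((N_OUT, N_IN))

    def output(root_vec, threshold=0.5):
        # An output unit comes on when its summed, weighted input exceeds threshold.
        return (weights @ root_vec > threshold).astype(float)

    def train(pairs, epochs=50, lr=0.1):
        # Delta-rule style weight adjustment over repeated presentations.
        global weights
        for _ in range(epochs):
            for root_vec, past_vec in pairs:
                error = past_vec - output(root_vec)
                weights += lr * np.outer(error, root_vec)

    # Random binary patterns standing in for root/past phonetic representations.
    pairs = [(rng.integers(0, 2, N_IN).astype(float),
              rng.integers(0, 2, N_OUT).astype(float)) for _ in range(5)]
    train(pairs)
    print(all((output(r) == p).all() for r, p in pairs))   # True once the mapping is learned

The point of the sketch is simply that one uniform procedure is applied to every root form, regular or irregular; nothing in the mechanism itself marks the distinction.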
produce almost any behavior you care to name. But a deep reliance on
highly structured inputs may reduce the psychological attractiveness of
such models. Moreover, the space of counterfactuals associated with an
input-driven model may be psychologically implausible. Given a different
set of inputs, these models might go straight to stage 2, or even regress
from stage 2 to stage 1. It is at least not obvious that human infants enjoy
the same degree of freedom.
Blending
We saw in section 2 above how the model generates errors by blending
two such patterns as from “eat” to “ate” and from “eat” to “eated” to
produce the pattern from “eat” to “ated.” By contrast a conventional rule-
based account would posit a mechanism specifically geared to operate on
the stems of regular verbs, inflecting them as required. If this nonlexical
component were mistakenly given “ate” as a stem, it would simply inflect
it sausage-machine fashion into “ated.” The choice, then, is between an
explanation by blending within a single mechanism and an explanation of
misfeeding within a system that has a distinct nonlexical mechanism. Pinker
and Prince (1988, 157) point to evidence which favors the latter, classical
option.
If blending is the psychological process responsible, it is reasonable to
expect a whole class of such errors. For example, we might expect blends
of common middle-vowel changes and the “-ed” ending (from “shape” to
“shipped” and from “sip” to “sepped”). Children exhibit no such errors. If,
on the other hand, the guilty process is misfeed to a nonlexical mechanism,
we should expect to find other errors of inflection based on a mistaken stem
(from “went” to “wenting”). Children do exhibit such errors.
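To fix ideas, a toy rendering of what a blend amounts to may help. The encoding below is invented (it is not the Wickelfeature scheme of the original model): two candidate output patterns for the same root are superposed, and reading off the strongest phoneme in each slot yields the blended form.

    # Superposing the outputs for "ate" (/eyt/) and "eated" (/iytid/) and
    # reading off the strongest phoneme per slot yields "ated" (/eytid/).
    def pattern(phonemes, strength):
        return {(i, p): strength for i, p in enumerate(phonemes)}

    def blend(*patterns):
        combined = {}
        for pat in patterns:
            for key, act in pat.items():
                combined[key] = combined.get(key, 0.0) + act
        return combined

    def read_off(combined, n_slots):
        out = []
        for i in range(n_slots):
            cands = {p: a for (j, p), a in combined.items() if j == i}
            if cands:
                out.append(max(cands, key=cands.get))
        return " ".join(out)

    ate = pattern(["ey", "t"], 0.6)              # irregular vowel-change output
    eated = pattern(["iy", "t", "i", "d"], 0.5)  # overregularized "+ed" output
    print(read_off(blend(ate, eated), 4))        # -> "ey t i d", i.e., "ated"

A misfeeding account, by contrast, needs no superposition at all: the nonlexical component is simply handed the wrong stem and inflects it mechanically.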
Microfeature representations
The Rumelhart and McClelland model relies on the distinctive PDP device
of distributed microfeature representation. The use of such a form of repre-
sentation buys a certain kind of automatic generalization. But it may not
be the right kind. The model, we saw, achieves its ends without applying
computational operations to any syntactic entities with a projectible se-
mantics given by such labels as “stem” or “suffix.” Instead, its notion of
stems is just the center of a state space of instances of strings presented for
inflection into the past tense. The lack of a representation of stems as such
deprives the system of any means of encoding the general idea of a regular
past form (i.e., “stem + ed”). Regular forms can be produced just in case
the stem in a newly presented case is sufficiently similar to those encoun-
tered in training runs. The upshot of this is a much more constrained gen-
eralization than that achieved within a classical model, which incorporates
a nonlexical component. For the latter would do its work whatever we gave
it as input. Whether this is good or bad (as far as the psychological realism
of the model is concerned) is, I think, an open question. For the moment, I
simply note the distinction. (Pinker and Prince clearly hold it to be bad; see
Pinker and Prince 1988, 124.)
A more general worry, stemming from the same root, is that generaliza-
tion based on pure microfeature representation is blind. Pinker and Prince
note that when humans generalize, they typically do so by relying on a
theory of which microfeatures are important in a given context. This knowl-
edge of salient features can far outweigh any more quantitative notion of
similarity based simply on the number of common microfeatures. They
write, “To take one example, knowledge of how a set of perceptual features
was caused ... can override any generalizations inspired by the object's
features themselves: for example, an animal that looks exactly like a skunk
will nonetheless be treated as a raccoon if one is told that the stripe was
painted onto an animal that had raccoon parents and raccoon babies” (Pinker
and Prince 1988, 177). Human generalization, it seems, is not the same
as the automatic generalization according to similarity of microfeatures
found in PDP. Rather, it is driven by high-level knowledge of the domain
concerned.
To bring this out, it may be worth developing a final example of my
own. Consider the process of understanding metaphor, and assume that a
successful metaphor illuminates a target domain by means of certain fea-
tures of the home domain of the metaphor. Suppose further that both the
metaphor and the target are each represented as sets of microfeatures thus:
<MMF1, ..., MMFn> and <TMF1, ..., TMFn> (MMF = metaphor micro-
feature, TMF = target microfeature). It might seem that the necessary
capacity to conceive of the target in the terms suggested by the metaphor
is just another example of shading meaning according to context, a ca-
168 Chapter 9
pacity that, as we’ve seen, PDP systems are admirably suited to exhibit.
Thus, just as we earlier saw how to conceive of a bedroom along the lines
suggested by inclusion of a sofa, so we might now expect to see how to
conceive of a raven along the lines suggested by the contextual inclusion
of a writing desk.
But in fact there is a very important difference. For in shading the
meaning of bedroom, the relevant microfeatures (i.e., sofa) were already
specified. Both the joy and the mystery of metaphor lie in the lack of any such
specification. It is the job of one who hears the metaphor to find the salient
features and then to shade the target domain accordingly. In other words,
we need somehow to fix on a salient subset of <MMF1, ..., MMFn>. And
such fixation must surely proceed in the light of high-level knowledge
concerning the problem at hand and the target domain involved. In short,
not all microfeatures are equal, and a good many of our cognitive skills
depend on deciding according to high-level knowledge which ones to attend
to in a given instance.
4 Pathology
And the bad news just keeps on coming. Not only do we have the charges
of the Pinker and Prince critique to worry about. There is also a body of
somewhat recalcitrant pathological data.
Consider the disorder known as developmental dysphasia. Develop-
mental dysphasics are slow at learning to talk, yet appear to suffer from
no sensory, environmental, or general intellectual defect. Given the task
of repeating a phonological sequence, developmental dysphasics will typi-
cally return a syntactically simplified version of the sentence. For example,
given “He can’t go home,” they produce “He no go” or “He no can go.”
The simplifications often include the loss of grammatical morphemes—
suffixes marking tense or number—and generally do not affect word
stems. Thus “bees” may become “bee,” but “nose” does not become “no.”
(The above is based on Harris and Coltheart 1986, 111.) The existence of a
deficit that can impair the production of the grammatical morphemes while
leaving the word stem intact seems prima facie to be evidence for a distinct
nonlexical mechanism. We would expect such a deficit whenever the non-
lexical mechanism is disengaged or its output ignored for whatever reason.
Or again, consider what is known as surface dyslexia. Some surface
dyslexics lose the capacity correctly to read aloud irregular words, while
retaining the capacity to pronounce regular words intact. When faced with
an irregular word, such patients will generate a regular pronunciation for
it. Thus, the irregular word “pint” is pronounced as if it rhymed with
regular words like “mint.” This is taken to support a dual-route account of
reading aloud, i.e., an account in which a nonlexical component deals with
regular words. “If the reading system does include these two separate pro-
cessing components, it might be possible that neurological damage could
impair one component whilst leaving the other intact, to produce [this]
specific pattern of acquired dyslexia” (Harris and Coltheart 1986, 244).
Such data certainly seems to support a picture that includes at least some
distinct rule-based processing, a picture that on the face of it is ruled out by
single-network PDP models.
However, caution is needed. Martin Davies has pointed out that such a
conclusion may be based on an unimaginative idea of the ways in which a
single network could suffer damage (Davies, forthcoming, 19). Davies does
not develop a specific suggestion in print,* but we can at least imagine the
following kind of case. Imagine a single network in which presented words
must yield a certain level of activation of some output units. And imagine
that by plugging into an often-repeated pattern, the regular words have, as
it were, worn a very deep groove into the system. With sufficient training,
the system can also learn to give correct outputs (pronunciation instruc-
tions) for irregular words. But the depth of groove here is always less than
that for the regular words, perhaps just above the outputting threshold. Now
imagine a kind of damage that decrements all the connectivity strengths by
10 percent. This could move all the irregular words below the threshold,
while leaving the originally very strong regular pattern functional. This
kind of scenario offers at least the beginnings of a single network account
of surface dyslexia. For some actual examples of the way PDP models could
be used to account for pathological data, see McClelland and Rumelhart
1986, which deals with various amnesic syndromes.
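The flavor of such a scenario is easy to convey with a toy calculation. The word list, strengths, and threshold below are invented; the only point is that a uniform decrement silences just those items that were barely above threshold to begin with.

    # Regulars have "worn a deep groove" (high net input to the output units);
    # the irregular sits just above threshold. A uniform 10 percent decrement
    # to all strengths silences only the irregular.
    THRESHOLD = 1.0
    strengths = {"mint": 1.8, "hint": 1.7, "tint": 1.9,   # regular words
                 "pint": 1.05}                             # irregular word

    def pronounceable(strengths, damage=0.0):
        return [w for w, s in strengths.items() if s * (1.0 - damage) > THRESHOLD]

    print(pronounceable(strengths))               # all four words before damage
    print(pronounceable(strengths, damage=0.10))  # "pint" drops below threshold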
Pathological data, I conclude, at best suggests a certain kind of classical
structuring of the human information-processing system into lexical and
nonlexical components. But we must conclude with Davies that such data
is not compelling in advance of a thorough analysis of the kinds of break-
down that complex PDP systems can exhibit. It seems, then, that we are
left with the problems raised by the Pinker and Prince critique. In the next
section I shall argue that although these problems are real and significant,
the conclusions to which they lead Pinker and Prince are by no means
commensurate with their content.
models are not mere isotropic node tangles, they will themselves have
properties that call out for explanation. We expect that in most cases these
explanations will constitute the macro-theory of the rules that the system
would be said to implement” (Pinker and Prince 1988, 171). There are two
claims here that need to be distinguished.
(1) Any PDP model exhibiting some classical componential struc-
turing is just an implementation of a classical theory.
(2) The explanation of this broad structuring will typically involve
the use of classical rule-based models.
Claim (1) is clearly false. Even if a large connectionist system needs to
deploy a complete, virtual, symbol-processing mechanism (recall chapter 7),
it by no means follows that the overall system produced merely imple-
ments a classical theory of information processing in that domain. This is
probably best demonstrated by some examples.
Recall the example (chapter 8, section 4) of a subconceptually imple-
mented rule interpreter. This is a virtual symbol processor—a symbol pro-
cessor and rule-user realized in a PDP substructure. Now take a task such as
the creation of a mathematical proof. In such a case, we saw, the system
could use characteristic PDP operations to generate candidate rules that
would be passed to the rule interpreter for inspection and deployment.
Such a system has the best of both worlds. The PDP operations provide
an intuitive (best-match), context-sensitive choice of rules. The classical
operations ensure the validity of the rule (blends are not allowed) and its
strict deployment.
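A schematic sketch may make the division of labor vivid. Everything below is invented for illustration (the rule format, the proposer, the tiny rule set); the point is only that soft, best-match proposal and hard, exact deployment are separable roles.

    # A stand-in for the mixed system: a PDP-like component proposes candidate
    # rule applications (some may be near misses), and a virtual rule
    # interpreter deploys only those that are exactly valid.
    VALID_RULES = {("p", "p->q"): "q",     # modus ponens over stored formulas
                   ("q", "q->r"): "r"}

    def soft_propose():
        # Stand-in for soft, context-sensitive retrieval of candidate steps.
        return [("p", "p->q"), ("p", "q->r")]   # the second candidate is a near miss

    def interpret(candidates):
        # Hard check: no blends, no near misses, strict application only.
        return [VALID_RULES[c] for c in candidates if c in VALID_RULES]

    print(interpret(soft_propose()))   # -> ['q']; the invalid candidate is rejected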
Some such story could be told for any truly rule-governed domain. Take
chess, for example. In such a domain a thoroughly soft and intuitive system
would be prone to just the kinds of errors suggested by Pinker and Prince.
The fact that someone learns to play chess using pieces of a certain shape
ought not to cause her to treat the bishops in a new set as pawns because
of their microfeature similarity to the training pawns. Chess constitutes a
domain in which absolute hard, functional individuation is called for; it also
demands categorical and rigid rule-following. It would be a disaster to
allow the microfeature similarity of a pawn to a bishop to prompt a
blending of the rules for moving bishops and pawns. A blend of two good
rules is almost certain to be a bad one. Yet a combined PDP and virtual
symbol-processing system would again exhibit all the advantages outlined.
It would think up possible moves fluidly and intuitively, but it could
subject these ideas to very high-level scrutiny, identify pieces by hard,
functional individuation and be absolutely precise in its adherence to the
explicit rules of the game.
As a second example, consider the problem of understanding metaphor
raised earlier. And now imagine a combined PDP and virtual symbol-
processing (VSP) system that operates in the following way. The VSP
system inspects the microfeature representation of the metaphor and the
target. On the basis of high-level knowledge of the target domain it
chooses a salient set of metaphor microfeatures. It then activates that set
and allows the characteristic PDP shading process to amend the represen-
tation of the target domain as required.
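In the same schematic spirit, the two steps can be sketched as follows. All feature names and weights are invented; the sketch claims only to separate a knowledge-driven selection step from a PDP-style shading step.

    # Step 1 (VSP): high-level knowledge names the salient metaphor
    # microfeatures. Step 2 (PDP shading): the target representation is
    # partially activated toward those features.
    metaphor_mfs = {"black": 0.9, "ominous": 0.8, "feathered": 0.7}
    target_mfs = {"wooden": 0.9, "flat": 0.8, "used-for-writing": 0.9}

    def select_salient(mmfs, salient_for_context):
        return {f: a for f, a in mmfs.items() if f in salient_for_context}

    def shade(target, salient, rate=0.5):
        shaded = dict(target)
        for f, a in salient.items():
            shaded[f] = shaded.get(f, 0.0) + rate * a
        return shaded

    salient = select_salient(metaphor_mfs, {"ominous"})
    print(shade(target_mfs, salient))   # the target now carries some 'ominous' activation

Quantitative overlap of microfeatures plays no role in the selection step; that is precisely the work the high-level knowledge is doing.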
Finally, consider the three-stage-developmental case itself, and imagine
that there is, as classical models suggest, a genuine distinction between
lexical and nonlexical processing strategies. But suppose, in addition, that
the nonlexical process is learned by the child and that the learning process
itself is to be given a PDP model. This yields the following picture:
Stage 1. Correct use, unsystematic. This stage is explained by a pure
PDP mechanism of storage and recall.
Transition. A PDP model involving endogenous (and perhaps innate)
structuring, which forces the child to generate a nonlexical proces-
sing strategy to explain to itself the regularities in its own language
production
Stage 2. Overregularization due to sudden reliance on a newly formed
nonlexical strategy
Transition. A PDP model of tuning by correction
Stage 3. Normal use. The coexistence of a pure PDP mechanism of
lexical access and a nonlexical mechanism implemented with PDP
If some such model were accurate (and something like this model is in fact
contemplated in Karmiloff-Smith 1987), we would not have a classical picture
of development, although we might have a classical picture of adult use.°
To sum up, the mere fact that a system exhibits a degree of classical
structuring into various components (one of which might be a rule inter-
preter) does not force the conclusion that it is a mere implementation of a
classical theory. This is so because (a) the classical components may call
and access powerful PDP operations of matching, search, blending and
generalization and (b) the developmental process by which the system
achieves such structure may itself require a PDP explanation. Claim (1) thus
fails. It may be, however, that to understand why the final system must
have the structure it does, we will need to think in classical, symbol-
manipulating terms. This second claim (claim 2, p. 171) is considered in the
next section.
ulate gross symbolic representations but lack any rich pattern matching
substructure.
If this picture is correct, we should maintain a dual thesis concerning
explanation and instantiation. We should hold that good psychological
explanations will often involve mixed models and hence will require analysis
in both PDP and classical symbol-manipulating terms. But we may also hold
that instantiating any contentful psychological state requires not just the
manipulation of gross symbolic structures but also access to the output of
a powerful subsymbolic processor. A virtual symbol processor provides
guidance and rigor; the PDP substrate provides the fluidity and inspiration
without which symbol processing is but an empty shell. In words that Kant
never used: subsymbolic processing without symbolic guidance is blind;
symbolic processing without subsymbolic support is empty.
Chapter 10
Reassembling the Jigsaw
1 The Pieces
All the pieces of the jigsaw are now before us, and their subgroupings
are largely complete. Semantically transparent AI models have been de-
scribed and compared with highly distributed connectionist systems. Various
worries about the power and methodology of both kinds of work have
been presented. The possibility of mixed models of cognitive processing
has been raised, and the nature of folk-psychological talk and its role in a
science of cognitive processing has been discussed. Along the way I have
criticized the arguments in favor of Fodor's radical cognitivism, and I was
forced to distinguish two projects within cognitive science: one descriptive
and involving the essential use of classical representations; the other con-
cerned with modeling the computational causes of intelligent behavior and
typically not dependent on such representations. At the end of the previous
chapter I also drew a distinction internal to causal cognitive science: the
distinction between the project of psychological explanation (laying out
the computational causes of intelligent behavior) and that of instantiation
(making a machine that actually has thoughts). These two projects, I sug-
gested, may come apart. This final chapter (which also functions as a kind
of selective summary and conclusion) expands on this last piece of the
jigsaw and tries to display as clearly as possible the overall structure of
what I have assembled. In effect, it displays a picture of the relations of
various parts of an intellectual map of the mind.
One word of warning. Since I should be as precise as possible about
what each part of this intellectual map is doing, for the duration of the
chapter I shall largely do away with shorthand talk of representations,
beliefs, and so on to describe contemporary computer models (recall chap-
ter 6, section 2, and chapter 5, footnote 4). At times this will result in
language that is somewhat cumbersome and drawn out.
2 Building a Thinker
What does it take to build a thinker? Some philosophers are sceptical that a
sufficient condition of being a thinker is satisfying a certain kind of formal
description (see chapter 2). Such worries have typically focused on the
kinds of formal descriptions appropriate to semantically transparent Al. In
one sense we have seen virtue in such worries.’ It has indeed begun to
seem that satisfying certain formal descriptions is vastly inadequate to
ensure that the creature satisfying the description has a cognitive apparatus
organized in a way capable of supporting the rich, flexible actual and
counterfactual behavior that warrants an ascription of mental states to it.
(Apologies for the lengthy formulation—you were warned!) Some reasons
for thinking this were developed in chapter 6, where I discussed the holism
and flexibility achieved by systems that use distributed representations and
superpositional storage.
In short, many worries can usefully be targeted in what I am calling the
project of instantiation. They can be recast as worries to the effect that
satisfying the kind of formal description that specifies a conventional,
semantically transparent program will never isolate a class of physical
mechanisms capable of supporting the rich, flexible actual and counterfactual
behavior that warrants ascribing mental states to the system instantiating
such mechanisms. The first stage in an account of instantiation thus involves
the description of the general structure of a mechanism capable of sup-
porting such rich and flexible behavior at the greatest possible level of abstrac-
tion from particular physical devices. Searle seems to believe that we reach
this level of abstraction before we leave the realms of biological descrip-
tion (see chapter 2). I see no reason to believe this, although it could con-
ceivably turn out to be true. Instead, my belief is that some nonbiological,
microfunctional description, such as that offered by a value-passing PDP
approach, will turn out to specify at least one class of physical mechanisms
capable of supporting just the kind of rich and flexible behavior that
warrants ascribing mental states.
This is not to say, however, that its merely satisfying some appropriate
formal description will warrant calling something a thinker. Instead, we need
to imagine a set of conditions jointly sufficient for instantiating mental states,
one of which will involve satisfying some microfunctional description like
those offered by PDP. I spoke of systems that could be properly credited
with mental states if they instantiated such descriptions. And I spoke also
of a mechanism that, suitably embodied, connected and located in a system
would allow us to properly describe it in mentalistic terms. These provisos
indicate the second and final stage of an account of instantiation.
Instantiating a mental state may not be a matter of possessing a certain
internal structure alone. In previous chapters we discovered two reasons to
believe that the configuration of the external world might figure among
the conditions of occupying a mental state. The first reason was that
the ascription of mental states may involve the world (chapter 3). The
content of a belief may vary according to the configuration of the world
(recall the twin earth cases [chapter 3, section 4]). And some beliefs (e.g.,
those involving demonstratives) may be simply unavailable in the absence
of their objects. What this suggests is that instantiating certain mental states
may involve being suitably located and connected to the world. (It does
not follow, as far as I can see, that a brain in a vat can have no thoughts at
all.)
We also noted in chapters 4 and 7 a second way in which external facts
may affect the capacity of a system to instantiate mental states. This is
the much more practical dimension of exploitation. A system (i.e., a brain
or a PDP machine) may need to use external structures and bodily opera-
tions on such structures to augment and even qualitatively alter its own
processing powers. Thus, suppose that we accept that stage one of an
instantiation account (an account of brain structures) involves a microfunc-
tional specification of something like a PDP system. We might also hold
that instantiating some mental states (for example, all those involving con-
scious, symbolic and logical reasoning) requires that such systems emulate
a different architecture. And we might believe that such emulation is made
possible only by the capacity of an embodied system located in a suitable
environment to exploit real-world structures to reduce complex, serial
processing tasks to an iterated series of PDP operations. PDP systems are
essentially learning devices, and learning devices (e.g., babies) come to
occupy mental states by interacting with a rich and varied environment.
For these very practical reasons the project of full instantiation may be
as dependent on embodiment and environmental structure as on internal
structure.
Most important of all, I suspect, is the holistic nature of thought ascrip-
tion. Thoughts, we may say, just are what gets ascribed using sentences
expressing propositional attitudes of belief, desire, and the like. Such ascrip-
tions are made on the basis of whole nexuses of actual behavior. If this is
the case, to have a certain thought is to engage in a whole range of
behaviors, a range that, for daily purposes, is usefully codified and ex-
plained by a holistically intertwined set of ascribed beliefs and desires.
Since there will be no neat one-to-one mapping of thoughts so ascribed to
computational brain states (see chapters 3 and 8 especially), it follows a
fortiori that there will be no computational brain state that is a sufficient con-
dition of having that thought. The project I have called descriptive cognitive
science in effect gives a formal model of the internal relations of sentences
used to ascribe such thoughts. This is a useful project, but instantiating that
kind of formal description certainly won't give you a thinker. For the
sentences merely describe regularities in the behavior and are not geared to
pick out the syntactic entities that are computationally manipulated to
produce the behavior.
To sum up, the project of instantiation is just the project of creating a
system properly described as occupying mental states. And it involves two
stages (to be pursued cooperatively, not in series). Stage 1 is the descrip-
tion, at the highest possible level of abstraction, of a class of mechanisms
capable of supporting the kind of rich, flexible, actual and counterfactual
behavior needed to warrant the use of a mentalistic vocabulary. That level
of description will turn out, I believe, to be a microfunctional one. It may
well turn out to involve in part a microfunctional specification of PDP
systems in terms of value passing, thresholds, and connectivity strengths.
I also urged that no highly semantically transparent model can fulfil the
requirements of stage 1 of an instantiation, despite the claim made by
Newell and Simon that such approaches capture the necessary and suffi-
cient conditions of intelligent action (see chapter 1, sections 2, 3, and 4).
Stage 2 of the project of instantiation will involve the embodying and
environmental embedding of mechanisms picked out in stage 1. Only once
these systems are embodied, up, and running in a suitably rich environment
will we be properly warranted in our ascriptions of mental states.
3 Explaining a Thinker
The project of instantiation and the project of psychological modeling and
explanation are different. This may seem obvious, but I suspect a great deal
of confusion within cognitive science is a direct result of not attending to
this distinction.
First and most obviously, the project of instantiation requires only that
we delimit a class of mechanisms capable of providing the causal substruc-
ture to ground rich and varied behavior of the kind warranting the ascrip-
tion of mental states. There may be many such classes of mechanisms, and
an instantiation project may thus succeed without first delimiting the class
of mechanisms that human brains belong to. But we may put this notion
aside at least as far as our interest in PDP is concerned. PDP is certainly
neurally inspired and aims to increase our knowledge of the class of
mechanisms to which we ourselves are in some significant way related.
Second and more importantly, even if the microfunctional description
that (for the instantiation project) delimits the class of mechanisms to which
we belong is entirely specified by a PDP-style account, correct psychologi-
cal models and explanations of our thought may also require accounts
couched at many different levels. To bring this out, recall my account of
Marr's picture of the levels of understanding of an information processing
task (chapter 1, section 5). Psychological explanation, according to Rumelhart
4 Some Caveats
Epilogue
The Parable of the High-Level Architect
This little story won’t make much sense unless it is read in the context
provided by chapter 4, section 5, and with an eye to the general distinction
between semantically transparent and semantically opaque systems.’
One fine day, a high-level architect was idly musing (reciting Words-
worth) in the cloistered confines of Kings College Chapel. Eyes raised to that
magnificent ceiling, she recited its well-publicized virtues (“that branching
roof, self-poised and scooped into ten thousand cells, where light and shade
repose ...”). But her musings were rudely interrupted.
From a far corner, wherein the fabric of reality was oh so gently parting,
a hypnotic voice commanded: “High Level Architect, look you well upon
the splendours of this chapel roof. Mark well its regular pattern. Marvel
at the star shapes decorated with rose and portcullis. And marvel all the
more as I tell you, there is no magic here. All you see is complex, physical
architecture such as you yourself might re-create. Make this your pro-
ject: go and build for me a roof as splendid as the one you see before
you.”
The high-level architect obeyed the call. Alone in her fine glass and steel
office, she reflected on the qualities of the roof she was to re-create. Above
all, she recalled those star shapes, so geometric, so perfect, the vehicle of
the rose and portcullis design itself. “Those shapes,” she concluded, “merit
detailed attention. Further observation is called for. I shall return to the
chapel.”
There ensued some days of patient observation and measurement. At
the end of this time the architect had at her command a set of rules to
locate and structure the shapes in just the way observed. These rules, she
felt sure, must have been followed by the original designer. Here is a small
extract from the high-level architect’s notebook.
To create ceiling shapes instruct the builder (Christopher Paul Ewe)
as follows:
if (build-shapes) then
[(space-shapes (3-foot intervals)),
(align-shapes (horizontal)),
(arrange-shapes (point-to-point)),
(locate-shapes (intersection-of-pillar-diagonals))].
Later she would turn her attention to the pillars, but that could wait.
When the time came, she felt, some more rules would do the trick. She had
an idea of one already. It went “If (locate-pillar) then (make-pillar (45°,
star-shape)).” It was a bit rough, but it could no doubt be refined. And of
course, there’d be lots more rules to discover. “I do hope,” she laughed,
“that Christopher Paul Ewe is able to follow all this. He'll need to be a fine
logical thinker to do so.” One thought, however, kept on returning like a
bad subroutine. “Why are things arranged in just that way? Why not have
some star shapes spaced further apart? Why not have some in a circle
instead of in line? Just think of all the counterfactual possibilities. What an
unimaginative soul the original architect must have been after all.”
Fortunately for our heroine’s career, this heresy was kept largely to
herself. The Society for the Examination and Reconstitution of Chapels
gave her large research grants and the project of building a duplicate
ceiling went on. At last a prototype was ready. It was not perfect, and the
light and shadow had a subtly different feel to it. But perhaps that was mere
superstition. C. P. Ewe had worked well and followed instructions to the
letter. The fruits of their labors were truly impressive.
One day, however, a strange and terrible thing happened. An earth-
quake (unusual for the locale) devastated the original chapel. Amateur
video, miraculously preserved, records the event. The high-level architect,
upon viewing the horror, was surprised to notice that the star shapes fell
and smashed in perfect coincidence with the sway and fall of neighbouring
pillars. “How strange,” she thought; “I have obviously been missing a
certain underlying unity of structure here.” The next day she added a
new rule to her already massive notebooks: “If (pillar-falls) then (make-fall
(neighboring-star-shape)).” “Of course,” the architect admitted, “such a rule
is not easy for the builder to follow. But it’s nothing a motion sensor and
some dynamite can’t handle.”
Appendix
Beyond Eliminativism
1 A Distributed Argument
The main body of the text contains, in a somewhat distributed array, an
argument against the use of connectionist models to support the position
known as eliminative materialism. In this appendix, I gather those distrib-
uted threads and weave them into an explicit rejection of eliminativism.
The appendix expands on various hints given in chapters 3 and 10 and
connects these, in a somewhat unexpected way, with the idea of mixed
symbolic and connectionist models, introduced in chapter 7. As a bonus, it
introduces a new and interesting way of describing connectionist systems
with a statistical technique known as cluster analysis.
The appendix begins by laying out various types of descriptions of
connectionist systems of the kind we have been considering (section 2). In
section 3 it goes on to expand on the idea (chapter 10) of explanations
that seek to group systems into equivalence classes defined for various
purposes. Each such grouping requires a special vocabulary, and the con-
structs of any given vocabulary are legitimate just insofar as the grouping
is interesting and useful. Section 4 then shows that relative to such a model
of explanation, the constructs of both symbolic AI and commonsense psy-
chology may have a legitimate role to play in giving psychological expla-
nations. This role is not just that of a useful approximation. Section 5 is a
speculative section in which the argument for the theoretical usefulness of
such symbolic constructs is extended to individual processing in a very
natural way. Here the cognizer, in the process of regulating, debugging,
and understanding her own representations, creates symbols to stand for
sets of distributed activity patterns. The section points out the difficulties
for a pure distributed approach that may be eased by the addition of such
symbolic constructs, and it relates my speculations to the continuing debate
over the “correct” architecture of cognition.
profile of any given network. (For a more detailed account see Smolensky
1988, sections 1 and 2.)
These mathematical specifications play a large and important role in
connectionist cognitive science. They are often the only way to under-
stand the distinctive wrinkles in the learning behavior of different kinds
of connectionist systems (for example, Boltzmann-machine learning versus
various forms of supervised learning). They also figure in explanations of
specific behaviors and pathologies. In this sense, as Smolensky observes,
“the explanations of behavior provided are like those traditional in the
physical sciences, unlike the explanations provided by symbolic models”
(Smolensky 1988, 1).
in most (or perhaps even all) of the toy examples found in the literature.
Nonetheless, there is clearly a theoretical commitment to something more
radical. Thus, concerning the coffee example, Smolensky adds, “we should
really use subconceptual features, but even these features (e.g., ‘hot liquid’)
are sufficiently low level to make the point” (1988, 16).
The official line on the semantic shift (or dimension shift) in connec-
tionist representation is that dimension-shifted representations must be of
features that are more subtle than those in an ordinary task analysis of the
problem. The claim is that the elements of a subsymbolic program do not
refer to the “same concepts as are used to consciously conceptualise the
task domain” (Smolensky 1988, 5). Or again, “the units do not have the
same semantics as words of natural language” (Smolensky 1988, 6). We can
now see that these claims can be taken in two ways. The stronger way is
to take the claims to mean that the content to be associated with the
activation of a given unit in context cannot be captured by any formulation
in natural language, however long and hyphenated. The weaker way is to
take the claims to mean that individual unit activations don’t have the
semantics of the single words that occur in a conscious task analysis of the
domain. The latter is clearly the safer claim, at least as long as we avoid
being too imaginative concerning the nature of the task analysis. But it
seems that Smolensky believes that the former, more radical reading will
ultimately prove correct. He thus notes, “Semantically, the subconceptual
level seems at present rather close to the conceptual level” (1988, 8). But
this, he conjectures, is probably because the choice of input and output
representations, a crucial factor in determining what a system will learn, is
heavily based on “existing theoretical analyses of the domain.” It may well
be that truly subsymbolic models (i.e., in the strong sense) will not become
available unless input and output representations can be divorced from our
existing analyses of the domain. Whether this is possible is a question that
would take us too far afield.
It is likely, however, that the real importance of subsymbolic representa-
tion lies not just in what gets represented but in the special properties of
the representational medium that connectionists employ. Much work in
ordinary AI (vision, natural-language processing) depends, after all, on the
representation and manipulation of features quite invisible to daily, con-
scious reflection on the task at hand. Where the two paradigms differ most
radically is surely in the general mode of representation and its associated
properties. In particular, subsymbolic (i.e., connectionist) representation
naturally embodies a kind of semantic metric (I owe this term to Andler
1988), which powers the distinctive features of generalization, graceful
degradation, and so on. The semantic metric is best pictured as a spatial
arrangement of units in a multidimensional space arranged so that seman-
tically related items are coded for by spatially related feature units. This
fact renders each individual unit pretty well expendable, since its near
neighbors will do almost the same job in generating patterns of activation.
And it is this same fact that allows such systems to generalize (by grouping
the semantically common parts of various items of knowledge), to extract
prototypes, and so on. Classical representation does not involve any such
built-in notion of semantic metric.
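A toy calculation brings out what the metric buys. The feature space and item placements below are invented; the sketch shows only that when related items lie close together, losing a single dimension barely disturbs classification.

    # Items placed in a small semantic feature space; related items are near.
    import numpy as np
    items = {"dog": np.array([0.9, 0.8, 0.1]),
             "wolf": np.array([0.85, 0.75, 0.15]),   # near "dog"
             "teacup": np.array([0.1, 0.2, 0.9])}    # unrelated, far away

    def nearest(probe, lesioned=None):
        # Classify by nearest stored item, optionally zeroing one dimension
        # to mimic the loss of a unit.
        def dist(v):
            a, b = probe.copy(), v.copy()
            if lesioned is not None:
                a[lesioned] = b[lesioned] = 0.0
            return np.linalg.norm(a - b)
        return min(items, key=lambda k: dist(items[k]))

    probe = np.array([0.88, 0.78, 0.12])       # a novel, dog-like input
    print(nearest(probe))                       # -> 'dog' (generalization)
    print(nearest(probe, lesioned=1))           # -> 'dog' (graceful degradation)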
Distributed (i.e., microfeatural) representations with a built-in semantic
metric are also responsible for the context dependence of connectionist
representations of concepts. Recall that in what I am calling pure distrib-
uted connectionism there are no units that represent classical conceptual-
level features, such as coffee. Instead, coffee is represented as a set of active
microfeatures. The point about context dependence is that this set will vary
according to the surrounding context. For example, “coffee in cup” may
involve a distributed representation of coffee that includes contacting por-
celain as a microfeature. But “coffee in jar” would not. Conceptual-level
entities (or “symbols,” to fall in with a misleading terminology) thus have
no stable and recurrent analogue as a set of unit activations. Instead, the
unit-activation vector will vary according to the context in which the
symbol occurred. This, we saw, is an important feature (though at times it
may be a positive defect). It is directly responsible for the oft-cited fluidity
of connectionist representation and reasoning.
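The point can be put in miniature as follows (the microfeature names are invented, echoing the coffee example): the two tokens of “coffee” overlap on a common core but are not identical, so no fixed, symbol-sized vector recurs across contexts.

    core = {"hot-liquid", "brown", "bitter"}
    coffee_in_cup = core | {"contacting-porcelain", "curved-sides-nearby"}
    coffee_in_jar = core | {"granular", "glass-contact"}

    print(coffee_in_cup & coffee_in_jar)   # the shared, semantically common core
    print(coffee_in_cup ^ coffee_in_jar)   # the context-specific remainder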
If it is not the dimension shift in itself so much as the dimension shift in
conjunction with a built-in semantic metric that is the crucial fact in connec-
tionist processing, then a question arises about the status of the subsym-
bolic level of description. For such descriptions seemed to involve just
listing a set of microfeatures corresponding to an activation vector. But
such a listing leaves out all the facts of the place of each feature in the
general metric embodied by the network. And these facts seem to be of
great semantic significance. What a microfeature means is not separable
from its place in relation to all the other representations the system em-
bodies. For this reason, I would dispute the claim that subsymbolic descrip-
tion (at least, if it is just a listing of microfeatures) affords an accurate
interpretation of the full numerical specifications available in level 1. Perhaps
the resources of natural language (however cannily deployed) are in princi-
ple incapable of yielding an accurate interpretation of an activation vector.
At first sight, such a concession may seem to give the eliminativist an easy
victory. Fortunately, this impression is wrong, as we shall see in due course.
tion its hidden unit space more subtly (in fact, into a distinctive pattern for
each of 79 possible letter-to-phoneme pairings). Cluster analysis as carried
out by Rosenberg and Sejnowski in effect constructs a hierarchy of parti-
tions on top of this base level of 79 distinctive stable patterns of hidden-
unit activation. The hierarchy is constructed by taking each of the 79
patterns and pairing it with its closest neighbor, i.e., with the pattern that
has most in common with it. These pairings act as the building blocks for
the next stage of analysis. In this stage an average activation profile
between the members of the original pair is calculated and paired with its
nearest neighbor drawn from the pool of secondary figures generated by
averaging each of the original pairs. The process is repeated until the final
pair is generated. This represents the grossest division of the hidden-unit
space that the network learned, a division that, in the case of NETtalk,
turned out to correspond to the division between vowels and consonants.
Cluster analysis thus provides a picture of the shape of the space of the
possible hidden-unit activations that power the network’s performance.
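The procedure can be approximated in a few lines. The vectors below are random stand-ins for the 79 stable hidden-unit patterns, and the sketch merges one nearest pair at a time rather than pairing all patterns in parallel as Rosenberg and Sejnowski did; it is a simplified variant of their analysis, not a reconstruction of it.

    # Agglomerative clustering: repeatedly find the closest pair of activation
    # profiles, replace it by its average, and record the grouping.
    import numpy as np
    rng = np.random.default_rng(0)
    patterns = [(rng.random(8), f"p{i}") for i in range(6)]   # toy profiles

    def closest_pair(items):
        best, best_d = None, float("inf")
        for i in range(len(items)):
            for j in range(i + 1, len(items)):
                d = np.linalg.norm(items[i][0] - items[j][0])
                if d < best_d:
                    best, best_d = (i, j), d
        return best

    def cluster(items):
        while len(items) > 1:
            i, j = closest_pair(items)
            merged = ((items[i][0] + items[j][0]) / 2.0,
                      f"({items[i][1]} {items[j][1]})")
            items = [it for k, it in enumerate(items) if k not in (i, j)] + [merged]
        return items[0][1]   # brackets show the hierarchy; outermost = grossest split

    print(cluster(patterns))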
A few comments. First, it is clear that the clusterings learned by NETtalk
(e.g., the vowel and consonant clusterings at the top level) do not involve
novel, unheard of subsymbolic features. This may be due in part to the
system’s reliance on input and output representations that reflect the clas-
sical theory. Even so, the metric of similarity built into the final set of
weights still offers some clear advantages over a classical implementation.
Such advantages will include generalization, various forms of robustness,
and graceful degradation.
For our purposes, the most interesting questions concern the status of the
cluster-theoretic description. Is it an accurate description of the system’s
processing? One prominent eliminativist, Churchland (1989), answers firmly
in the negative. Cluster analysis, he argues, is just another approximate,
high-level description of the system’s gross behavior (see my comments on
levels 4 and 5) and does not yield an accurate description of its processing.
The reason for this is illuminating. It is that the system itself knows nothing
about its own clustering profile, and that profile does not figure in the
statement of the formal laws that govern its behavior (the activation-
evolution and connection-evolution equations of level 1). Thus, Church-
land notes, “the learning algorithm that drives the system to new points in
weight space does not care about the relatively global partitions that have
been made in activation space. All it cares about are the individual weights
and how they relate to apprehended error. The laws of cognitive evolution,
therefore, do not operate primarily at the level of the partitions.... The
level of the partitions certainly corresponds more closely to the “concep-
tual” level..., but the point is that this seems not to be the most important
dynamical level” (1989, 25).
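Churchland's point can be made vivid with a toy weight-update loop (a sketch only: I assume a simple delta-rule-style gradient step with invented sizes and targets, not NETtalk's actual training regime). Notice that the update equations mention only individual weights, activations, and the apprehended error; nothing in them refers to clusters or partitions of activation space.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(3, 5))          # the individual connection weights
x = rng.normal(size=5)               # an input activation vector
target = np.array([1.0, 0.0, 0.0])   # desired output

for _ in range(100):
    y = np.tanh(W @ x)                        # activation evolution (one layer)
    error = y - target                        # the "apprehended error"
    grad = np.outer(error * (1 - y ** 2), x)  # gradient with respect to each individual weight
    W -= 0.1 * grad                           # connection evolution: weight-by-weight update
    # Nothing above mentions the partitions of activation space.
```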
But if you give the system an ill-posed problem or artificially curtail its
processing time, it still gives what Smolensky calls "sensible performance."
This is explained by the underlying subsymbolic nature of its processing,
which will always satisfy as many soft constraints as it can, even if given
limited time and degraded input. The moral of all this, as Smolensky sees it,
is that the theorist may analyze the system at the higher level of, e.g., a set
of production rules. This level will capture some facts about its behavior.
But in less ideal circumstances the system will also exhibit other behavior
that is explicable only by describing it at a lower level. Thus, the unified
account of cognition lies at one of the lower levels (level 2 or level 1,
according to your preference). Hence the famous analogy with Newtonian
mechanics. Symbolic AI describes cognitive behavior, much as Newtonian
mechanics describes physical behavior. They each offer a useful and accu-
rate account in a circumscribed domain. But the unified account lies else-
where (in quantum theory in physics, and in connectionism in cognitive
science). Thus, commenting on the model for solving circuitry problems,
Smolensky notes, “A system that has, at the micro-level, soft constraints
satisfied in parallel, appears at the macro-level, under the right circum-
stances to have hard constraints, satisfied serially. But it doesn’t really,
and if you go outside the “Newtonian” domain you see that it’s really
been a quantum system all along” (1988, 20). Such “Newtonian” analyses
are conceded to be useful in that they may help describe interrelations
between complex patterns of activity that approximate various conceptual
constructs in which the theorist is interested. As Smolensky points out
(1988, 6), such interactions will not be “directly described by the formal
definition of a subsymbolic model”; instead, they must be “computed by
the analyst.”
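The contrast can be sketched with a toy constraint network (the weights and inputs below are invented for illustration; this is not Smolensky's circuit model). The network relaxes into a state that satisfies as many of its soft constraints as it can, whether it is given clean input and ample settling time or degraded input and curtailed time.

```python
import numpy as np

# Symmetric weights encode soft constraints between three binary (+1/-1) units.
W = np.array([[ 0.0,  1.0, -1.0],
              [ 1.0,  0.0, -1.0],
              [-1.0, -1.0,  0.0]])

def settle(state, steps):
    """Relax the network asynchronously; each update satisfies as many of
    the soft constraints bearing on one unit as it can."""
    state = state.copy()
    for t in range(steps):
        i = t % len(state)
        state[i] = 1.0 if W[i] @ state > 0 else -1.0
    return state

clean    = np.array([ 1.0,  1.0, -1.0])   # well-posed input: looks like hard-rule behavior
degraded = np.array([ 1.0, -1.0,  1.0])   # ill-posed, noisy input
print(settle(clean, 30))
print(settle(degraded, 2))                # curtailed settling time: still a sensible state
```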
If pure distributed connectionism is correct, the individual words used in a belief ascription will not have discrete, recurrent
analogues in the actual processing of the system. Thus, the word “chair”
will not have a discrete analogue, since “chair” will be represented as an
activation vector across a set of units that stand for subsymbolic micro-
features, and it will not have a single recurrent analogue (not even as an
activation vector), since the units that participate and the degree to which
they participate will vary from context to context.
The radical eliminativist takes these facts and conjoins them with a
condition of causal efficacy, which states: a psychological ascription is only
warranted if the items it posits have direct analogues in the production (or
possible production) of behavior. Thus ascribing the belief that cows can’t
fly to John is justified only if there is some state in John in which we can
in principle identify a discrete, interpretable substate with the meaning of
“cow,” “fly,” and so on. Since, according to connectionism, there are no
such discrete, recurrent substates, the radical eliminativist concludes that
commonsense psychology is mistaken and does not afford an accurate
higher-level description of the system in question (John). This is not to say
that such descriptions are dispensable in practice; it is to say only that they
are mistaken in principle.
In the next section I shall sketch an account of explanation that dis-
sociates the power and accuracy of higher-level descriptions from the
condition of causal efficacy, which thereby gives a more liberal, more
plausible, and more useful picture of explanation in cognitive science and
daily life.
3 Explanation Revisited
The eliminativist argues her case as follows.
Step 1. Suppose that pure distributed connectionism offers a correct
account of cognition.
Step 2. It follows that there will be no discrete, recurrent, in-the-
head analogues to the conceptual-level terms that figure in folk-
psychological belief ascription.
Step 3. Hence by the condition of causal efficacy, such ascriptions are
not warranted, since they have no in-the-head counterpart in the
causal chains leading to action.
Step 4. Hence, the causal explanations given in ordinary terms of
beliefs and desires (e.g., “She went out because she believed it was
snowing”) are technically mistaken.
My claim will be that even if pure distributed connectionism offers a
correct and (in a way) complete account of cognition, the eliminativist con-
clusion (step 4) doesn’t follow. It doesn’t follow for the simple reason that
good causal explanation in psychology is not subject to the condition of
causal efficacy. Likewise, even if pure distributed connectionism is true, it
does not follow that the stories told by symbolic AI are mere approxi-
mations. Instead, I shall argue, these various vocabularies (e.g., of folk-
psychology and of symbolic AI) are geared accurately to capture legitimate
and psychologically interesting equivalence classes, which would be invisi-
ble if we restricted ourselves to subsymbolic levels of description. In a
sense, then, I shall be offering a version of Dennett's well-known position
on folk-psychological explanation but extending it, in what seems to me
to be a very natural way, to include the constructs of symbolic AI (e.g.,
schemata, productions). If I am right, it will follow that many defenders
of symbolic AI and folk psychology (especially Fodor and Pylyshyn) are
effectively shooting themselves in the feet. For the defences they attempt
make the condition of causal efficacy pivotal, and they try to argue for
neat, in-the-head correlates to symbolic descriptions (see, e.g., Fodor 1987;
Fodor and Pylyshyn 1988). This is accepting terms of engagement that
surely favor the eliminativist and that, as we shall see, make nonsense of a
vast number of perfectly legitimate explanatory constructs.
What we need, then, is a notion of causal explanation without causal
efficacy. I tried for such a notion in Clark, forthcoming. But a superior case
has since been made by Frank Jackson and Philip Pettit, so I begin by
drawing on their account. Jackson and Pettit ask the reader to consider the
following case. "Electrons A and B are acted on by independent forces Fa
and Fb respectively, and electron A then accelerates at the same rate as
electron B. The explanation of this fact is that the magnitude of the two
forces is the same.... But this sameness in magnitude is quite invisible to
A.... This sameness does not make A move off more or less briskly” (1988,
392-393). Or again, “We may explain the conductor’s annoyance at a
concert by the fact that someone coughed. What will have actually caused
the conductor’s annoyance will be the coughing of some particular person,
Fred, say” (Jackson and Pettit 1988, 394). This is a nice case. For suppose
someone, in the interests of “accuracy,” insisted that the proper (fully causal)
explanation of the conductor's annoyance was in fact Fred's coughing. There
is a good sense in which their “more accurate” explanation would in fact be
less powerful. For the explanation which uses “someone” has the advantage
of making it clear that “any of a whole range of members of the audience
coughing would have caused annoyance in the conductor” (Jackson and
Pettit 1988, 395). This increase in generality, bought at the cost of sacrificing
the citation of the actual entity implicated in the particular causal chain in
question, constitutes (I want to say) an explanatory virtue, and it legiti-
mizes a whole range of causal explanations that fail to meet the condition
learned. We have clearly lost explanatory power for explaining the per-
formance of an individual network. For as we saw, the network will per-
form well with degraded information in a way that cannot be explained
by casting it as a standard symbolic AI system. But as with the cluster
analysis, we gain something else. For we can now define an even wider
equivalence class that is still, I suggest, of genuine psychological interest.
Membership of this new, wider class requires only that the system behave
in the ways in which the pure production system would behave in some
central class of cases. The production-system model would thus act as an
anchor, dictating membership of an equivalence class, just as the cluster
analysis did in the previous example. And the benefits would be the same
too. Suppose there turns out to be a lot of systems (some connectionist,
some classical, some of kinds still undreamed of) all of which nonaccidentally
approximate the behavior of the pure production system in a given range
of cases. They are all united by being able to convert text to phonemes. If
we seek some principled and informative way of grouping them together
(i.e., not a bare disjunction of systems capable of doing such and such), we
may have no choice but to appeal to their shared capacity to approximate
the behavior of such and such a paradigmatic system. We can then plot
how each system manages, in its different way, to approximate each separ-
ate production. Likewise, there may be a variety of systems (some con-
nectionist, some not) capable of supporting knowledge of prototypical
situations. The symbolic AI construct of a schema or frame may help us
understand in detail, beyond the gross behavior, what all these systems
have in common (e.g., some kind of content addressability, default assign-
ment, override capacity, and so forth). In short, we may view the constructs
of symbolic AI, not as mere approximations to the connectionist cognitive
truth, but as a means of highlighting a higher level of unity between
otherwise disparate groups of cognitive systems. Thus, the fact that a
connectionist system a and some architecturally novel system of the future
b are both able to do commonsense reasoning may be explained by saying
that the fact that a and b each approximate a classical script or frame-based
system causally programs their capacity to do commonsense reasoning.
And this means that a legitimate higher-level shared property of a and b is
invisible at the level of a subsymbolic analysis of a. This is not to say, of
course, that the subsymbolic analysis is misguided. Rather, it is to claim
that that analysis, though necessary for many purposes, does not render
higher levels of analysis defunct or of only heuristic value.
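Here is a minimal sketch of the kind of grouping I have in mind (the systems, test cases, and threshold are all hypothetical stand-ins). Membership in the equivalence class turns only on nonaccidentally approximating the paradigmatic system's behavior over a central class of cases, not on sharing its internal architecture.

```python
# Hypothetical reference behavior: a paradigmatic production-system model of
# text-to-phoneme conversion, stood in for here by a simple lookup table.
paradigm = {"cat": "k-ae-t", "dog": "d-ao-g", "gave": "g-ey-v"}
central_cases = ["cat", "dog", "gave"]

def in_equivalence_class(system, tolerance=1.0):
    """A system belongs to the class iff it matches the paradigmatic model
    on (enough of) the central class of cases, whatever its architecture."""
    hits = sum(system(word) == paradigm[word] for word in central_cases)
    return hits / len(central_cases) >= tolerance

# Two architecturally different systems (both hypothetical):
classical_system     = lambda w: paradigm[w]                     # rule-based lookup
connectionist_system = lambda w: {"cat": "k-ae-t", "dog": "d-ao-g",
                                  "gave": "g-ey-v"}.get(w, "?")  # trained-network stand-in

print(in_equivalence_class(classical_system))      # True
print(in_equivalence_class(connectionist_system))  # True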
Finally, let us move on to full-fledged folk-psychological talk. On the
present analysis, such talk emerges as just one more layer in rings of ever-
more explanatory virtue. The position is beautifully illustrated by Daniel
Dennett's left-handers thought experiment. "Suppose," Dennett says, "that
the sub-personal cognitive psychology of some people turns out to be dra-
matically different from that of others.” For example, two people may have
very different sets of connection weights mediating their conversions of
text to phonemes. More radically still, it could be that left-handed people
have one kind of cognitive architecture and right-handed people another.
For all that, Dennett points out, we would never conclude on those grounds
alone that left-handers, say, are incapable of believing.
Let left- and right-handers be as internally different as you like, we
already know that there are reliable, robust patterns in which all
behaviourally normal people participate—the patterns we traditionally
describe in terms of belief and desire and the other terms of folk
psychology. What spread around the world on July 20th, 1969? The
belief that a man had stepped on the moon. In no two people was the
effect of the receipt of that information the same ..., but the claim
that therefore they all had nothing in common ... is false, and obvi-
ously so. There are indefinitely many ways one could reliably distin-
guish those with the belief from those without it. (Dennett 1987, 235)
In other words, even if there is no single internal state (say, a sentence in
the language of thought) common to all those who are said to believe that
so and so, it does not follow that belief is an explanatorily empty construct.
The sameness of the forces acting on the two electrons is itself causally
inefficacious but nonetheless figures in a useful and irreducible mode of
explanation (program explanation), which highlights facts about the range
of actual forces that can produce a certain result (identical acceleration). Just
so the posit of the shared belief highlights facts about a range of internal
cognitive constitutions that have some common implications at the level of
gross behavior. This grouping of apparently disparate physical mechanisms
into classes that reflect our particular interests is at the very heart of the
scientific endeavor. To suppose that the terms and constructs proper to
such program explanations are somehow inferior or dispensable is to
embrace a picture of science as an endless and disjoint investigation of
individual causal mechanisms.
There is, of course, a genuine question about what constructs best serve
our needs. The belief construct must earn its keep by grouping together
creatures whose gross behaviors really do have something important in
common (e.g., all those likely to harm me because they believe I am a
predator). In recognizing the value and status of program explanations I am
emphatically not allowing that anything goes. My goal is simply to counter
the unrealistic and counterproductive austerity of a model of explanation
that limits “real explanations” to those that cite causally efficacious features.
The eliminativist argument, it seems, depends crucially on a kind of austerity
that the explanatory economy can ill afford.
involve not just practice but very carefully chosen practice, practice aimed
at some aspect of her swing that she feels is the root of the trouble (say,
wrist control). For the expert golfer, having a higher-level articulation of
the swing into wrist, arm, and leg components is an essential aid to im-
proving and debugging performance.
What's true of golf is often true of life in general, and the present case is
no exception. The really expert cognizer, I suggest, builds for itself various
higher-level representations of its own lower-level (pure distributed con-
nectionist) reasoning. These representations (which must group activation
vectors into rough classes according to their various roles in producing
behavior) provide the key to efficient improvement and debugging. The
studies by Karmiloff-Smith referred to above provide evidence that just
such a process of higher-level representation occurs in children. Thus, she
shows how various aspects of linguistic competence seem to arise in three
phases. The first phase involves the child’s attainment of basic behavioral
success. The child learns to produce the required linguistic forms. Internally
driven by a goal of control over the organization of internal representa-
tions, the child then goes on to establish a structured description of her
own basic processing (phase two). “The initial operation of phase two is to
re-describe the phase one representations in a form which allows for (albeit
totally unconscious) access” (Karmiloff-Smith 1986, 107). This added cog-
nitive load, however, may cause new errors, which are only corrected once
a balance of phase 1 (procedural competence) and phase 2 (redescriptive
competence) is achieved. This balancing act constitutes phase 3. The gen-
eral message (backed by a wealth of data in Karmiloff-Smith 1985, 1986,
1987) is that the child (and also the adult, for this is a general learning
pattern) first learns to produce correct outputs, and then yields to endo-
genous pressure to form representations of the form of the processing that
yields the outputs. These higher-level representations are much closer to
classical discrete symbol structures than to highly distributed connectionist
representations.
Perhaps, then, the pure distributed connectionist underrates the value
of the discrete symbol structures used in natural language and classical
AI. Such structures, according to the radical connectionist, serve at best
two roles, to mediate interpersonal communication and to help the novice
acquire rudimentary skills in a domain. I have argued, in contrast, that such
symbol structures may also be a vital aid to the expert by providing an artic-
ulated model of her own problem-solving strategies. This model, though
lacking the fluidity and power provided by the connectionist substratum,
may be an indispensable aid to debugging one’s own performance.
Finally, I wish to consider an internal correlate to the role of interper-
sonal communication. Just as the gross symbol structures of public lan-
guage facilitate communication between whole agents who may have very
6 A Double Error
The eliminativist, I have argued, makes a double error. First, her conditional
argument is flawed. Even if pure distributed connectionism were a complete,
formal account of individual processing, it would not follow that the higher-
level constructs of symbolic AI and folk psychology were inaccurate, mis-
guided, or dispensable. Instead, such constructs may be the essential and
accurate grouping principles for explanations which prescind from causal
process to causal program. The eliminativist’s conditional argument was
seen to rest on an insupportable condition of causal efficacy, a condition
that, if applied, would rob us of a whole range of perfectly intelligible and
legitimate explanations both in cognitive science and in daily life.
Second, the antecedent of the eliminativist’s conditional is itself called
into doubt by the power and usefulness of higher-level articulations of
processing. Such articulations, I argued, may do in the first person what
program explanations do in the third. That is, they may enable the system
to observe and mark the common contributions of a variety of activation
vectors. Such marking would provide the system with the resources to
reflect on (and hence improve and debug) its own basic processing strate-
gies. In this way, entities that the pure distributed connectionist views as
emergent at a higher level of external analysis (high-level clusterings, types,
Notes

Chapter 1
1. On the pro side, see Pylyshyn 1986. On the con side, Searle 1980 and Dreyfus 1981,
both discussed at length in chapter 2. For rare philosophical treatments sensitive to
alternative computational approaches, see Boden 1984b; Davies, forthcoming; Hofstadter
1985; and, somewhat surprisingly, Dreyfus and Dreyfus 1986.
2. See, for example, Marr and Poggio 1976; Hinton and Anderson 1981; and McClelland,
Rumelhart, and the PDP Research Group 1986, as well as almost any work in computa-
tional linguistics.
3. For thorough summaries see Haugeland 1985, Hodges 1983, Turing 1937, and Turing
1950.
4. In fact, the idea of list processing (in which data structures contain symbols that point to
other data structures, which likewise contain symbols that point to other data structures
and so on, thus facilitating the easy association of information with symbols) was intro-
duced by Allen Newell and Herbert Simon in their program Logic Theorist of 1956.
For a detailed treatment of list processing see E. Charniak and D. McDermott 1985,
chapter 2, and also A. Newell and H. Simon 1976. One attractive and important feature
of LISP was its universal function eval, which made it as adaptable as a universal Turing
machine, barring constraints of actual memory space in any implementation.
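The basic idea can be sketched in a few lines (a toy illustration in Python rather than LISP; the symbols and structures below are invented for the example): a symbol points to a structure that can itself contain symbols pointing to further structures, making it easy to associate information with symbols.

```python
# Toy list-processing sketch: symbols point to structures that themselves
# contain symbols pointing to further structures.
memory = {
    "coffee": ["isa", "beverage", "made-from", "beans"],
    "beverage": ["isa", "liquid"],
}

def properties(symbol, seen=None):
    """Follow 'isa' pointers, collecting the structures associated with a symbol."""
    seen = seen or []
    entry = memory.get(symbol, [])
    seen.append((symbol, entry))
    if "isa" in entry:
        properties(entry[entry.index("isa") + 1], seen)
    return seen

print(properties("coffee"))
```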
5. A production system is essentially a set of "if-then" pairs in which the "if" specifies a
condition that, if satisfied, causes an action (the “then”) to be performed. Each “if-then”
pair or condition-action rule is called a production (see, for example, Charniak and
McDermott 1985, 438-39).
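For illustration, a minimal production-system interpreter might look as follows (the working memory, the recognize-act loop, and the sample rules are my own additions for the sketch, not drawn from any particular system).

```python
# Minimal production-system sketch: each production is a condition-action pair.
productions = [
    (lambda wm: "goal: boil water" in wm and "kettle filled" not in wm,
     lambda wm: wm.add("kettle filled")),
    (lambda wm: "kettle filled" in wm and "kettle on" not in wm,
     lambda wm: wm.add("kettle on")),
    (lambda wm: "kettle on" in wm,
     lambda wm: wm.add("water boiled")),
]

working_memory = {"goal: boil water"}

# Recognize-act cycle: fire the first production whose condition is satisfied,
# and stop when no firing changes working memory.
changed = True
while changed:
    changed = False
    for condition, action in productions:
        if condition(working_memory):
            before = set(working_memory)
            action(working_memory)
            if working_memory != before:
                changed = True
                break

print(working_memory)
```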
6. The AM program, as Lenat himself later pointed out, worked partly due to the amount
of mathematical knowledge implicit in LISP, the language in which it was written (see
Lenat 1983a; Lenat 1983b; Lenat and Brown 1984; Ritchie and Hanna 1984). This result
seems related to some observations I make later concerning the amount of the work of
scientific discovery already done by giving BACON data arranged in our representa-
tional notation. Lenat’s later work, EURISKO, avoids the “defect” of trading on the
quasi-mathematical nature of LISP. But it too relies on our giving it notationally pre-
digested data. I do not see this as a defect: human babies inherit a rather impressive
notation in the form of a well-structured public language. I still affirm the comments in
chapter 10 that we need the right substructure below the public notation when instantia-
tion, not just psychological explanation, is the issue.
7. I am especially indebted to Martin Davies for many fruitful conversations on the topic.
Chapter 2
1. Thanks to Lesley Benjamin for spotting the shampoo example and for some stimulating
conversation about the problems involved in parsing it.
2. See, e.g., Torrance 1984, 23. In fairness to Dreyfus, he goes on to say that his point is
not that humans can recognize subtleties currently beyond the reach of simple programs
but that “in any area there are simple taken-for-granted responses central to human
understanding, lacking which a computer program cannot be said to have any under-
standing at all” (Dreyfus 1981, 190). As will become apparent, I agree with this but
believe the real point concerns a degree of flexibility of behavior, which cannot be
modeled by the SPSS approach. This concerns not exactly what is and is not known but
rather how it is known. And this fits better with the second line of Dreyfus’s argument
than the first.
3. These replies are well known (see Searle 1980 and, e.g., Torrance 1984). I do not think
that the discussion here has been very illuminating, and I therefore choose to ignore it
for present purposes.
4. I am indebted to Aaron Sloman for teaching me to put this point in terms of structural
variability.
5. This was pointed out to me by Michael Morris in conversation.
Chapter 3
1. At a minimum, the eliminative materialist must believe that this is the primary point
of the practice. In conversation Paul Churchland has accepted that folk-psychological
talk serves a variety of other purposes also, e.g., to praise, to blame, to encourage, and
so on. But, he rightly says, so did witch talk. That in itself is not sufficient to save the
integrity of witch talk. I agree. The primary point of witch talk, however, was to pick
out witches. In this it failed, since there were none (we believe). Comparably, Church-
land holds that the primary purpose of folk-psychological talk is to fix on neurophys-
iologically sound states of the head so as to facilitate the prediction and explanation of
others' behavior or (more properly) their bodily movements. I deny that this is the
primary purpose.
2. I am especially grateful for the simple but suggestive picture of mental-state ascriptions
that exploit the cheap and available resource of talk about the external world.
3. As used by (among others) Aaron Sloman, Barry Smith, and Steve Torrance.
Chapter 5
1. Aaron Sloman has fought hard to convince me of this. I thank him for his persistent
caution.
2. See especially pages 27-31.
3. This example was originally developed in McClelland 1981 and subsequently redeployed
in McClelland, Rumelhart, and the PDP Research Group 1986. My presentation is based
on the 1986 version.
4. Philosophically, this talk of representation is lax and misleading. I, like many others, use
it because it is useful and concise. What I mean by representation here and in similar
occurrences elsewhere is just the part of a system’s internal structure that is particularly
implicated in an environmentally embedded system's capacity to behave in ways that
warrant talk of that system’s having a representation of whatever is in question. This is
quite a mouthful, as you can see. And the notion of what is “particularly implicated” in
the relevant behavior is intuitively clear but hard to make precise.
5. For a full account of how such learning rules work (e.g., the generalized delta rule and the
Boltzmann learning rule), see McClelland, Rumelhart, and the PDP Research Group
1986, vol. 1, chapters 7 and 8.
Chapter 6
1. A related issue here concerns our capacity to change verbs into nouns and vice versa.
Thus newly coined uses like “She wants to thatcher the organization” or “Don't crack
wise with me” are easily understood. This might be partially explained by supposing
that the verb/noun distinction is (at best) one microfeature among many and that other
factors, like position in the sentence, can force a change in this feature assignment while
leaving much of the semantics intact. (This phenomenon is dealt with at length in
Benjamin, forthcoming).
2. Many of the ideas in this section (including the locution "an equivalence-class of algo-
rithms”) are developed out of conversations with Barry Smith. He is not to blame, of
course, for the particular views I advance.
Chapter 7
1. See, e.g., Hinton 1984.
2. Lecture given to the British Psychological Society, April 1987.
3. This chapter owes much to Hofstadter (1985), whose suggestive comments have doubt-
less shaped my thought in more ways than I am aware. The main difference, I suspect, is
that I am kinder to the classical symbolic accounts.
Chapter 8
1. As ever, this kind of claim must be read carefully to avoid Church-Turing objections. A
task will count as beyond the explanatory reach of a PDP model just in case its perfor-
mance requires that to carry it out, the PDP system must simulate a different kind of
processing (e.g., that of a Von Neumann machine).
2. He allows that intentional realism can be upheld without accepting the LOT story (see
Fodor 1987, 137). But as I suggest in section 3, he does seem to believe that some form of
physical causation is necessary for the truth of intentional realism. This, I argue, is a very
dangerous assumption to make.
3. A parsing tree is a data structure that splits a sentence up into its parts and associates
those parts with grammatical categories. For example, a parsing tree would separate the
sentence “Fodor exists” into two components “Fodor” and “exists” and associate a
grammatical label with each. These labels can become highly complex. But the standard
simple illustration is
        S
       / \
      NP  VP
      |    |
    Fodor exists
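In code, such a tree might be sketched as a nested structure (a toy Python rendering, not any particular parser's format):

```python
# A minimal parse-tree sketch: each node is (grammatical label, children...),
# with bare strings as the words at the leaves.
parse_tree = ("S",
              ("NP", "Fodor"),
              ("VP", "exists"))

def leaves(tree):
    """Collect the words at the leaves of a parse tree."""
    if isinstance(tree, str):
        return [tree]
    label, *children = tree
    return [word for child in children for word in leaves(child)]

print(leaves(parse_tree))  # ['Fodor', 'exists']
```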
Chapter 9
1. The actual structure of the model is complicated in various ways not germane to present
concerns. See Rumelhart and McClelland 1986 for a full account.
2. A trivial model would be one that merely used a PDP substrate to implement a conven-
tional theory. But there are complications here; see section 5.
3. This example is mentioned in Davies, forthcoming, 19.
4. Thanks to Martin Davies for suggestive conversations concerning these issues.
5. I owe this suggestion to Jim Hunter.
6. This point was made in conversation by C. Peacocke.
7. For example, Sacks reports the case of Dr. P., a music teacher who, having lost the holistic
ability to recognize faces, makes do by recognizing distinctive facial features and using
these to identify individuals. Sacks comments that the processing these patients have
intact is machinelike, by which he means like a conventional computer model. As he puts
it:
Classical neurology ... has always been mechanical.... Of course, the brain is a
machine and a computer..., but our mental processes which constitute our being and
our life, are not just abstract and mechanical [they] involve not just classifying and
categorising, but continual judging and feeling also. If this is missing, we become
computer-like, as Dr. P. was.... By a sort of comic and awful analogy, our current
cognitive neurology and psychology resembles nothing so much as poor Dr. P.! (Sacks
1986, 18-19)
Sacks admonishes cognitive science for being “too abstract and computational.” But he
might as well have said “too rigid, rule-bound, coarse-grained, and serial.”
Chapter 10
1. This is not to say that the philosophers who raised the worries will agree that they are
best localized in the way I go on to suggest. They won't.
Epilogue
1. The story is inspired by two sources: the Gould and Lewontin critique of adaptationist
thinking, reported in chapter 4, and Douglas Hofstadter’s brief comments on operating
systems (1985, 641-642).
Bibliography
Adams, D. 1985. So long and Thanks for All the Fish. London: Pan.
Andler, D. 1988. Representations in cognitive science: Beyond the Pro and the Con. CREA
research paper, Paris.
Armstrong, D. 1970. The nature of mind. Reprinted in N. Block, ed., Readings in Philosophy
of Psychology, vol. 1, pp. 191-199. London: Methuen and Co., 1980.
Baddeley, R. 1987. Connectionism and gestalt theory. Unpublished manuscript. University
of Sussex.
Baron-Cohen, S., Leslie, A., and Frith, U. 1985. Does the autistic child have a “theory of
mind"? Cognition 21: 37-46.
Benjamin, L. Unpublished. How nouns can verb. Draft research paper. University of Sussex.
Block, N. 1980. Troubles with functionalism. In N. Block, ed., Readings in Philosophy of
Psychology, vol. 1, pp. 268-305. London: Methuen and Co.
Bobrow, D., and Winograd, T. 1977. An overview of KRL, a knowledge representation
language. Cognitive Science 1: 3-46.
Boden, M. 1984a. Animal perception from an AI viewpoint. In C. Hookway, ed., Minds,
Machines and Evolution. Cambridge: Cambridge University Press.
Boden, M. 1984b. What is computational psychology? Proceedings of the Aristotelian Society,
suppl. 58: 17-35.
Brady, M., Hollenbach, J., Johnson, T., Lozano-Perez, T., and Mason, M., eds. 1983. Robot
Motion: Planning and Control. Cambridge: MIT Press.
Broadbent, D. 1985. A question of levels: Comment on McClelland and Rumelhart. Journal
of Experimental Psychology: General 114: 189-192.
Charniak, E., and McDermott, D. 1985. Introduction to Artificial Intelligence. Reading, Mass.:
Addison-Wesley.
Churchland, P. 1979. Scientific Realism and the Plasticity of Mind. Cambridge: Cambridge
University Press.
Churchland, P. 1981. Eliminative materialism and the propositional attitudes. Journal of
Philosophy 78, no. 2: 67-90.
Churchland, P. 1986. Neurophilosophy: Towards a Unified Theory of the Mind-Brain. Cam-
bridge: MIT Press.
Churchland, P. 1989. On the nature of theories: A neurocomputational perspective. In P. M.
Churchland, The Neurocomputational Perspective. Cambridge: MIT Press.
Churchland, P., and Churchland, P. 1978. Commentary on cognition and consciousness in
non-human species. Behavioural and Brain Sciences 4: 565—560.
Churchland, P., and Churchland, P. 1981. Functionalism, qualia, and intentionality. In J. Biro
and R. Shahan, eds. Mind, Brain, and Function. Oklahoma: University of Oklahoma
Press, 1982.
Chomsky, N., and Katz, J. 1974. What the linguist is talking about. In N. Block, ed., Readings
in Philosophy of Psychology, vol. 2, 1980. pp. 223-237. London: Methuen and Co.
Fodor, J. 1980b. Some notes on what linguistics is about. In N. Block, ed., Readings in
Philosophy of Psychology, vol. 2, pp. 197-207. London: Methuen and Co.
Fodor, J. 1985. Fodor's guide to mental representations: The intelligent Auntie’s vade-
mecum. Mind 94: 77-100.
Fodor, J. 1986. Individualism and supervenience. Proceedings of the Aristotelian Society, Suppl.
60: 235-263.
Fodor, J. 1987. Psychosemantics: The Problem of Meaning in the Philosophy of Mind. Cambridge:
MIT Press.
Fodor, J., and Pylyshyn, Z. 1988. Connectionism and cognitive architecture: A critical
analysis. Cognition 28: 3—71.
Gould, S., and Lewontin, R. 1978. The spandrels of San Marco and the Panglossian Paradigm:
A critique of the adaptationist programme. Reprinted in E. Sober, ed., Conceptual Issues
in Evolutionary Biology, Cambridge: MIT Press, 1984.
Hallam, J., and Mellish, C., eds. 1987. Advances in Artificial Intelligence. Chichester: Wiley and
Sons.
Harcourt, A. 1985. All’s fair in play and politics. New Scientist, no. 1486, December.
Harris, M., and Coltheart, M. 1986. Language Processing in Children and Adults. London:
Routledge and Kegan Paul.
Haugeland, J. 1981. The nature and plausibility of cognitivism. In J. Haugeland, ed., Mind
Design, pp. 243-281. Cambridge: MIT Press.
Haugeland, J. 1985. Artificial Intelligence: The Very Idea. Cambridge: MIT Press.
Hayes, P. 1979. The naive physics manifesto. In D. Michie, ed., Expert Systems in the
Micro-Electronic Age. Edinburgh: Edinburgh University Press.
Hayes, P. 1985a. The second naive physics manifesto. In J. Hobbs and R. Moore, eds.,
Formal Theories of the Commonsense World, pp. 1-36. Norwood, N.J.: Ablex.
Hayes, P. 1985b. Naive physics I: Ontology for liquids. In J. Hobbs and R. Moore, eds.,
Formal Theories of the Commonsense World. Norwood, N.J.: Ablex.
Hebb, D. 1949. The Organization of Behavior. New York: Wiley and Sons.
Hinton, G. 1984. Parallel computations for controlling an arm. Journal of Motor Behavior 16:
171-194.
Hinton, G., and Anderson, J. 1981, eds. Parallel Models of Associative Memory. Hillsdale, N.J.:
Erlbaum.
Hobbs, J. 1985. Introduction to J. R. Hobbs and R. Moore, eds., Formal Theories of the
Commonsense World, pp. xi-xxii. Norwood, N.J.: Ablex.
Hobbs, J., and Moore, R., eds. 1985. Formal Theories of the Commonsense World. Norwood,
N.J.: Ablex.
Hodges, A. 1983. Alan Turing: The Enigma. New York: Simon and Schuster.
Hofstadter, D. 1985. Waking up from the Boolean dream, or, Subcognition as computation.
In his Metamagical Themas: Questing for the Essence of Mind and Pattern, pp. 631-665.
Harmondsworth: Penguin.
Hornsby, J. 1986. Physicalist thinking and behaviour. In P. Pettit and J. McDowell, eds.,
Subject, Thought, and Context. Oxford: Oxford University Press.
Hull, D. 1984. Historical entities and historical narratives. In C. Hookway, ed., Minds,
Machines and Evolution. Cambridge: Cambridge University Press.
Humphreys, N. 1983. Nature's psychologists. Consciousness Regained. New York: Oxford
University Press.
Israel, D. 1985. A short companion to the naive physics manifesto. In J. Hobbs and
R. Moore, eds., Formal Theories of the Commonsense World, pp. 427-447. Norwood,
N.J.: Ablex.
Jacob, F. 1977. Evolution and tinkering. Science 196, no. 4295: pp. 1161-1166.
Jackson, F., and Pettit, P. 1988. Functionalism and broad content. Mind 97, no. 387:
381-400.
Kahneman, D., Slovic, P., and Tversky, A., eds. 1982. Judgement under Uncertainty: Heuristics
and Biases. Cambridge: Cambridge University Press.
Karmiloff-Smith, A. 1984. Children's problem solving. In M. E. Lamb, A. L. Brown, and
B. Rogoff, eds., Advances in Developmental Psychology, vol. 3, pp. 39-90. Hillsdale, N.J.:
Erlbaum.
Karmiloff-Smith, A. 1985. Language and cognitive processes from a developmental perspec-
tive. Language and Cognitive Processes 1, no. 1: 61-85.
Karmiloff-Smith, A. 1986. From metaprocesses to conscious access: Evidence from children’s
metalinguistic and repair data. Cognition 23: 95-147.
Karmiloff-Smith, A. 1987. Beyond modularity: A developmental perspective on human
consciousness. Draft manuscript of a talk given at the annual meeting of the British
Psychological Society, Sussex, April.
Katz, J. 1964. Mentalism in linguistics. Language 40: 124-137.
Krellenstein, M. 1987. A reply to parallel computation and the mind-body problem. Cogni-
tive Sciences 11: 155-157.
Knuth, D. 1973. Sorting and Searching. Reading, Mass.: Addison-Wesley.
Kohler, W. 1929. Gestalt Psychology. New York: Liveright.
Kuczaj, S. A. 1977. The acquisition of regular and irregular past tense forms. Journal of Verbal
Learning and Verbal Behaviour 16: 589—600.
Lakatos, I. 1974. Falsification and the methodology of scientific research programmes. In
I. Lakatos and A. Musgrave, eds., Criticism and the Growth of Knowledge. Cambridge:
Cambridge University Press.
Langley, P. 1979. Rediscovering physics with BACON 3. Proceedings of the Sixth International
Joint Conference on Artificial Intelligence 1: 505-508.
Langley, P., Simon, H., Bradshaw, G., and Zytkow, J. 1987. Scientific Discovery: Computational
Explorations of the Creative Process. Cambridge: MIT Press.
Lenat, D. 1977. The ubiquity of discovery. Proceedings of the Fifth International Joint Conference
on Artificial Intelligence 2: 1093-1105.
Lenat, D. 1983a. Theory formation by heuristic search. Artificial Intelligence 21: 31-59.
Lenat, D. 1983b. EURISKO: A program that learns new heuristics and domain concepts.
Artificial Intelligence 21: 61-98.
Levi-Strauss, C. 1962. The Savage Mind. London: Weidenfeld and Nicolson.
Lieberman, P. 1984. The Biology and Evolution of Language. Cambridge: Harvard University
Press.
Lycan, W. 1981. Form, function, and feel. Journal of Philosophy 78, no. 1: 24—50.
Maloney, J. 1987. The right stuff. Synthese 70: 349-372.
McClelland, J. 1981. Retrieving general and specific knowledge from stored knowledge of
specifics. Proceedings of the Third Annual Conference of The Cognitive Science Society
(Berkeley) 170-172.
McClelland, J. 1986. The programmable blackboard model of reading. In J. McClelland,
D. Rumelhart, and the PDP Research Group, Parallel Distributed Processing: Explorations
in the Microstructure of Cognition, vol. 2 pp. 122-169. Cambridge: MIT Press.
McClelland, J., and Kawamoto, A. 1986. Mechanisms of sentence processing: Assigning
roles to constituents of sentences. In J. McClelland, D. Rumelhart, and the PDP
Research Group, Parallel Distributed Processing: Explorations in the Microstructure of
Cognition, vol. 2, pp. 216-271. Cambridge: MIT Press.
McClelland, J., and Rumelhart, D. 1985a. Distributed memory and the representation of
general and specific information. Journal of Experimental Psychology: General 114, no. 2:
159-188.
McClelland, J., and Rumelhart, D. 1985b. Levels indeed! A response to Broadbent. Journal of
Experimental Psychology: General 114, no. 2: 193-197.
McClelland, J., and Rumelhart, D. 1986. Amnesia and distributed memory. In J. McClelland,
D. Rumelhart, and the PDP Research Group, Parallel Distributed Processing: Explorations
in the Microstructure of Cognition, vol. 2, p. 503-529. Cambridge: MIT Press.
McClelland, J., Rumelhart, D., and Hinton, G., 1986. The appeal of PDP. In Rumelhart,
McClelland, and the PDP Research Group. Parallel Distributed Processing: Explorations in
the Microstructure of Cognition, vol. 1, pp. 3-44. Cambridge: MIT Press.
McClelland, J., Rumelhart, D., and the PDP Research Group. 1986. Parallel Distributed Pro-
cessing: Explorations in the Microstructure of Cognition, vol. 2. Cambridge: MIT Press.
McCulloch, G. 1986. Scientism, mind, and meaning. In P. Pettit and J. McDowell, eds.,
Subject, Thought, and Context. Oxford: Oxford University Press. 1986.
McCulloch, W., and Pitts, W. 1943. A logical calculus of the ideas immanent in nervous
activity. Bulletin of Mathematical Biophysics 5: 115-133.
McDermott, D. 1976. Artificial intelligence meets natural stupidity. In J. Haugeland, ed.,
Mind Design. Cambridge: MIT Press, 1981.
McGinn, C. 1982. The structure of content. In A. Woodfield, ed., Thought and Object,
pp. 207—259. Oxford: Oxford University Press.
Marr, D. 1977. Artificial intelligence: A personal view. In J. Haugeland, ed., Mind Design,
p. 129-142. Cambridge: MIT Press, 1981.
Marr, D. 1982. Vision. New York: W. H. Freeman and Co.
Marr, D., and Poggio, T. 1976. Cooperative computation of stereo disparity. Science 194:
283-287.
Michaels, C., and Carello, C. 1981. Direct Perception. Englewood Cliffs, N.J.: Prentice-Hall.
Michie, D., and Johnston, R. 1984. The Creative Computer. Harmondsworth: Penguin.
Millikan, R. 1986. Thoughts without laws, cognitive science with content. Philosophical
Review 95: 47-80.
Minsky, M. 1974. A framework for representing knowledge. MIT lab memo 306. Cam-
bridge, Mass. Excerpts in J. Haugeland, ed., Mind Design (Cambridge: MIT Press,
1981).
Minsky, M., 1980. K-lines: A theory of memory. Cognitive Science 4: 117-133.
Minsky, M., and Papert, S. 1969. Perceptrons. Cambridge: MIT Press.
Newell, A. 1980. Physical symbol systems. Cognitive Science 4: 135-183.
Newell, A., and Simon, H. 1976. Computer science as empirical inquiry. In J. Haugeland, ed.,
Mind Design. Cambridge: MIT Press.
Norman, D. 1986. Reflections on cognition and parallel distributed processing. In J. Mc-
Clelland, D. Rumelhart, and the PDP Research Group, Parallel Distributed Processing:
Explorations in the Microstructure of Cognition, vol. 2, pp. 110-146. Cambridge: MIT
Press.
Pettit, P., and McDowell, J., eds., 1986. Subject, Thought and Context. Oxford: Oxford
University Press.
Pinker, S. 1984. Language Learnability and Language Development. Cambridge: Harvard Uni-
versity Press.
Pinker, S., and Prince, A. 1988. On language and connectionism: Analysis of a parallel dis-
tributed processing model of language acquisition. Cognition 28: 73-193.
Poggio, T., and Koch, C. 1987. Synapses that compute motion. Scientific American, May,
pp. 42-48.
Premack, D., and Woodruff, G. 1978. Does the chimpanzee have a theory of mind? Be-
havioural and Brain Science 4: 515-526.
Putnam, H. 1960. Minds and machines. In S. Hook, ed., Dimensions of Mind. New York: New
York University Press.
Putnam, H. 1967. Psychological Predicates. In W. Capitan and D. Merill, eds., Art, Mind, and
Religion, pp. 37-48. University of Pittsburgh Press.
Putnam, H. 1975a. The meaning of "meaning." In H. Putnam, Mind, Language, and Reality,
pp. 215-271. Cambridge: Cambridge University Press.
Putnam, H. 1975b. Philosophy and our mental life. In H. Putnam, Mind, Language, and
Reality, pp. 291-303. Cambridge: Cambridge University Press.
Putnam, H. 1981. Reductionism and the nature of psychology. In J. Haugeland, ed., Mind
Design, pp. 205-219. Cambridge: MIT Press.
Pylyshyn, Z. 1986. Computation and Cognition. Cambridge: MIT Press.
Ridley, M. 1985. The Problems of Evolution. Oxford: Oxford University Press.
Ritchie, G., and Hanna, F. 1984. AM: A case study in AI methodology. Artificial Intelligence
23: 249-268.
Robbins, A. Unpublished. Representing type and category in PDP. Draft doctoral disserta-
tion. University of Sussex.
Rosenblatt, F. 1962. Principles of Neurodynamics. New York: Spartan Books.
Rumelhart, D., Hinton, G., and Williams, R. 1986. Learning internal representations by error
propagation. In D. Rumelhart, J. McClelland, and the PDP Research Group, Parallel Distributed
Processing: Explorations in the Microstructure of Cognition, vol. 1, pp. 318-362. Cambridge:
MIT Press.
Rumelhart, D., and McClelland, J. 1986. On learning the past tenses of English verbs. In
J. McClelland, D. Rumelhart, and the PDP Research Group, Parallel Distributed Process-
ing: Explorations in the Microstructure of Cognition, vol. 2, pp. 216-271. Cambridge:
MIT Press.
Rumelhart, D., and McClelland, J. 1986. PDP models and general issues in cognitive science.
In D. Rumelhart, J. McClelland, and the PDP Research Group, Parallel Distributed Pro-
cessing: Explorations in the Microstructure of Cognition, vol. 1, pp. 110-146. Cambridge:
MIT Press.
Rumelhart, D., McClelland, J., and the PDP Research Group. 1986. Parallel Distributed Pro-
cessing: Explorations in the Microstructure of Cognition, vol. 1. Cambridge: MIT Press.
Rumelhart, D., and Norman, D. 1982. Simulating a skilled typist: A study in skilled motor
performance. Cognitive Science 6: 1-36.
Rumelhart, D., Smolensky, P., McClelland, J., and Hinton, G. 1986. Schemata and sequential
thought processes in PDP models. In J. McClelland, D. Rumelhart, and the PDP Re-
search Group, Parallel Distributed Processing: Explorations in the Microstructure of Cogni-
tion, vol. 2, pp. 7-58. Cambridge: MIT Press.
Rutkowska, J. 1984. Explaining infant perception: Insights from artificial intelligence. Cogni-
tive studies research paper 005. University of Sussex.
Rutkowska, J. 1986. Developmental psychology’s contribution to cognitive science.
In KS. Gill, ed. Artificial Intelligence for Society, pp. 79-97. Chichester, Sussex: John
Wiley.
Ryle, G. 1949. The Concept of Mind. London: Hutchinson.
Sacks, O. 1986. The Man Who Mistook His Wife for a Hat. London: Picador.
Schank, R., and Abelson, R. 1977. Scripts, Plans, Goals, and Understanding. Hillsdale, N.J.:
Lawrence Erlbaum Associates.
Schilcher, C., and Tennant, N. 1984. Philosophy, Evolution, and Human Nature. London:
Routledge and Kegan Paul.
Schreter, Z., and Maurer, R. 1986. Sensorimotor spatial learning in connectionist artificial
organisms. Research abstract FPSE. University of Geneva.
Searle, J. 1969. Speech Acts: An Essay in the Philosophy of Language. Cambridge: Cambridge
University Press.
Searle, J. 1980. Minds, brains, and programs. Reprinted in J. Haugeland, ed., Mind Design,
pp. 282-307. Cambridge: MIT Press, 1981.
Searle, J. 1983. Intentionality. Cambridge: Cambridge University Press.
Searle, J. 1984. Intentionality and its place in nature. Synthese 61: 3—16.
Sejnowski, T., and Rosenberg, C. 1986. NETtalk: A parallel network that learns to read
aloud. Johns Hopkins University Technical Report JHU/EEC-86/01.
Shortliffe, E. 1976. Computer-Based Medical Consultations: MYCIN. New York: Elsevier.
Simon, H. 1962. The architecture of complexity. Reprinted in H. Simon, ed., The Sciences of
the Artificial. Cambridge: Cambridge University Press, 1969.
Simon, H. 1979. Artificial intelligence research strategies in the light of AI models of
scientific discovery. Proceedings of the Sixth International Joint Conference on Artificial
Intelligence 2: 1086—1094.
Simon, H. 1980. Cognitive science: The newest science of the artificial. Cognitive Science 4,
no. 2: 33—46.
Simon, H. 1987. A psychological theory of scientific discovery. Paper presented at the
annual conference of the British Psychological Society. University of Sussex.
Sloman, A. 1984. The structure of the space of possible minds. In S. Torrance, ed., The Mind
and The Machine. Sussex: Ellis Horwood.
Smart, J. 1959. Sensations and brain processes. Philosophical Review 68: 141-156.
Smith, M. 1974. The evolution of animal intelligence. In C. Hookway, ed., Minds, Machines
and Evolution. Cambridge: Cambridge University Press.
Smolensky, P. 1986. Information processing in dynamical systems: Foundations of harmony
theory. In D. Rumelhart, J. McClelland, and the PDP Research Group. Parallel Distri-
buted Processing: Explorations in the Microstructure of Cognition, vol. 1, pp. 194-281.
Cambridge: MIT Press.
Smolensky, P. 1987. Connectionist AI, and the brain. Artificial Intelligence Review 1: 95-109.
Smolensky, P. 1988. On the proper treatment of connectionism. Behavioural and Brain
Sciences 11: 1-74.
Sterelny, K. 1985. Review of Stich, From Folk Psychology to Cognitive Science. Australasian
Journal of Philosophy 63, no. 4: 510-520.
Stich, S. 1971. What every speaker knows. Philosophical Review 80: 476-496.
Stich, S. 1972. Grammar, psychology, and indeterminacy. Reprinted in N. Block, ed., Read-
ings in Philosophy of Psychology, vol. 2, pp. 208-222. London: Methuen and Co., 1980.
Stich, S. 1983. From Folk Psychology to Cognitive Science. Cambridge: MIT Press.
Tannenbaum, A. 1976. Structured Computer Organization. Englewood Cliffs, N.J.: Prentice-
Hall.
Tennant, N. 1984a. Intentionality, syntactic structure, and the evolution of language. In
C. Hookway, ed., Minds, Machines, and Evolution. Cambridge: Cambridge University
Press.
Tennant, N., and Schilcher, C. 1984. Philosophy, Evolution, and Human Nature. London:
Routledge and Kegan Paul.
Tennant, N. 1987. Philosophy and biology: Mutual enrichment or one-sided encroachment.
La nuova critica 1-2: 39-55.
Thagard, P. 1986. Parallel computation and the mind-body problem. Cognitive Science 10:
301-318.
Torrance, S. 1984. Philosophy and AI: Some issues. Introduction to S. Torrance, ed., The
Mind and the Machine, pp. 11-28. Sussex: Ellis Horwood.
Turing, A. 1937. On computable numbers, with an application to the Entscheidungsproblem.
Proceedings of the London Mathematical Society 42: 230-265.
Turing, A. 1950. Computing machinery and intelligence. Mind 59: 433-460.
Van Fraassen, B. 1980. The Scientific Image. Oxford: Oxford University Press.
Vogel, S. 1981. Behaviour and the physical world of an animal. In P. Bateson and P. Klopfer,
eds., Perspectives in Ethology, vol. 4. New York: Plenum Press.
Walker, S. 1983. Animal Thought. London: Routledge and Kegan Paul.
Warrington, C., and McCarthy, R., 1987. Categories of knowledge: Further fractionations
and an attempted integration. Brain 110: 1273-1296.
Winograd, T. 1972. Understanding natural language. Cognitive Psychology 1: 1-191.
Winston, P. 1975. The Psychology of Computer Vision. New York: McGraw Hill.
Wittgenstein, L. 1969. On Certainty. Oxford: Blackwell.
Woodfield, A., ed. 1982. Thought and Object. Oxford: Oxford University Press.
Index
Evolutionarily basic tasks, 72-73, 104-105, 114
Excitatory links, 86. See also Connectionism
Exclusive or, 102-103
Expert skill, 204-205
Explanation
  and generalizations, 196-202
  Marr's levels of, 18
  varieties of, 15, 17, 128, 171, 173, 196-202
Explicit rules. See Rules
Feedback loops, 66
Flatfish, 71-72
Fodor, J., 1, 17, 19-20, 48, 143-152, 156, 160, 197, 203
Folk psychology. See also Eliminative materialism
  and broad content, 42-46, 179
  criticisms of, 39-42
  and folk physics, 52
  holism of, 48-50 (see also Holism)
  innateness of, 51-52
  as a level of description of connectionist systems, 195-196
  as a theory, 38-39
Formal properties, 10, 31-34
Frame-based reasoning, 26-30, 92-95, 200
Function, sameness of, 129
Functionalism, 21-24, 36, 58-59. See also Microfunctionalism
Generalizations, 92, 181-182, 196-202
Generalized delta rule. See Delta rule
Gestalt theory, 84
Gould, S., 76-80
Graceful degradation, 62, 89-90, 95
Gradualistic holism, 68-70
Grammar, 21, 154-157, 162-166
Guesses, 62
Hallam, D., 158
Harcourt, A., 51
Hayes, P., 52, 158-159
Heuristic search, 13
Hidden units, 102-103. See also Connectionism
High-level descriptions, 111-112, 198-202
Hinton, G., 132, 152, 207
Hobbs, J., 158
Hofstadter, D., 13, 15, 123, 174
Holism
  of belief and desire, 5, 48-50, 57-58, 145, 147
  of evolutionary processes, 66-72
  in PDP storage, 107-111, 120-121
Horizontally limited microworlds. See Microworlds
Hornsby, J., 44, 53
Humphreys, N., 51
Identity theory, 22-23
Implementation theory, 124, 129, 144, 170-172
Informational holism, 107-121
Information processing, 29, 64
Information-processing warfare, 62-63
Inhibitory links, 86
Insight, flash of, 15, 139-141
Intentional realism, 160
Intuitive processor, 14, 137-139. See also Connectionism; Subsymbolic paradigm
Israel, D., 158
Jackson, F., 197-198
Jacob, F., 70
Jets and Sharks model, 86-92
Karmiloff-Smith, A., 165, 172, 204-205
Katz, J., 155-156
Kawamoto, A., 108-111
Kings College chapel, 77
K-line theory, 207
Kludge, 69-72, 134
Knowledge-representation language, 29
Koch, C., 183
Krellenstein, M., 121, 129
Kuczaj, S., 162
Lakatos, I., 40
Langley, P., 14, 16
Language-of-thought hypothesis, 146, 148, 201. See also Representations; Cognitivism
Learning, 165-166. See also Connectionism
Lewontin, R., 76-80
Lexicon, 162
Lieberman, P., 69
Linguistic theories, 160-163. See also Grammar; Computational linguistics
LISP, 10, 13
Visual cliff, 52
Vogel, S., 63-64
Walker, S., 73
Warrington, C., 151
Winograd, T., 25-26
Winston, P., 26
Woodfield, A., 42, 47
Zytkow, J., 16
BRADFORD BOOKS
of related interest
PARALLEL DISTRIBUTED PROCESSING
Explorations in the Microstructure of Cognition
Volume 1, Foundations
by David E. Rumelhart, James McClelland, and the PDP Research Group
Volume 2, Psychological and Biological Models
by James McClelland, David E. Rumelhart, and the PDP Research Group
These books by a pioneering neurocomputing group describe a new paradigm for cognitive theory.
"These two volumes may turn out to be among the ... most important books
yet written for cognitive psychology. They are already among the most
controversial.”
—Stephen E. Palmer, Contemporary Psychology
This handbook includes two 5¼-inch floppy disks for IBM PC and IBM PC
compatible computers. The software contains many of the models described
in Parallel Distributed Processing: Explorations in the Microstructure of
Cognition, volumes 1 and 2.
“These programs really work! ... The reader who is able to spend adequate
time will be most satisfied with the knowledge gained. There is no question but
that this book will be a major milestone.”
—Boston Computer Society Artificial Intelligence Newsletter