This Content Downloaded From 125.16.189.232 On Sat, 05 Oct 2024 09:04:27 UTC
Springer is collaborating with JSTOR to digitize, preserve and extend access to Computers and
the Humanities
Nick Cercone is in the computing science program at Simon Fraser University, Burnaby, British Columbia.
[Figure 1. Word formation. Labels: WORD FORMATION; COMPOSITION; DERIVATION/INFLEXION (including overlap); PREFIXES (derivational); SUFFIXES (derivational, inflexion).]

text, preliminary analysis of structural and affix relationships can aid in determining the word's function in the utterance. Our ability to infer or guess meanings can only be enhanced through extensive morphological analysis.

(iv) Derivational information
Derivational affixes can affect word meaning, often in a systematic way, for example "-esque," like a; "non-," negation. A word like "booklet" can be satisfactorily understood as "little book" through morphological analysis without explicitly storing both words.

Figure 1 illustrates word formation as either (i) composition: the formation of a word by the close combination of two or more elements each of which is also a separate word, e.g., "goldsmith"; or (ii) derivation/inflexion: formation by the close combination of two or more elements only one of which can be a separate word, e.g., "kindness."

Word formation by composition generates compound words. There are compound nouns ("goldsmith," noun-noun; "blackboard," adjective-noun; "drawbridge," verb-noun), compound adjectives ("blameworthy," noun-adjective; "overanxious," adverb-adjective), compound pronouns ("myself"), compound verbs ("overcome," adverb-verb; "daresay," verb-verb), and compound participles such as "airborne." Back formation is a special type of composition in which the second element denotes an agent or action, for example "housekeeper." Word formation also gives repetition compounds ("goody-goody," "fifty-fifty"). Nevertheless, most composite words yield little useful affix information except where there is overlap with derivation.

Inflexion and derivation are related, but while inflexion modifies a word (book-books-book's), derivation can result in the formation of a different word (kind-kindness).1 Prefixes in English are always derivational. Suffixes may be derivational or inflexional, so we must distinguish between the two kinds.

The computer program which performs morpho-
# R NEW:MACLISP
# 21:50.18
=(RESTORE 'CHKPT) =NIL
=(STEM 'INTEREST) =(INTEREST NIL INTEREST NIL)
=(STEM 'INTERESTED) =(INTERESTED NIL INTEREST (ED) )
=(STEM 'INTERESTING) =(INTERESTING NIL INTEREST (ING) )
=(STEM 'INTERESTINGLY) =(INTERESTINGLY NIL INTEREST (ING LY) )
=(STEM 'BASHES) =(BASHES NIL BASH (S) )
=(STEM 'BATHES) =(BATHES NIL BATHE (S) )
=(STEM 'BATHS) =(BATHS NIL BATH (S) )
=(STEM 'LEANING) =(LEANING NIL LEAN (ING) )
=(STEM 'LEAVING) =(LEAVING NIL LEAVE (ING))
=(STEM 'DENTED) =(DENTED NIL DENT (ED) )
=(STEM 'DANCED) =(DANCED NIL DANCE (ED) )
=(STEM 'KISSES) =(KISSES NIL KISS (S) )
=(STEM 'CURVED) =(CURVED NIL CURVE (ED) )
=(STEM 'CURLED) =(CURLED NIL CURL (ED) )
=(STEM 'ROTTING) =(ROTTING NIL ROT (ING) )
=(STEM 'ROLLING) =(ROLLING NIL ROLL (ING) )
=(STEM 'PLAYED) =(PLAYED NIL PLAY (ED))
=(STEM 'PLIED) =(PLIED NIL PLY (ED) )
=(STEM 'REALEST) =(REALEST NIL REAL (EST) )
=(STEM 'PALEST) =(PALEST NIL PALE (EST) )
=(STEM 'KNIVES) =(KNIVES NIL KNIFE (S) )
=(STEM 'PRETTILY) =(PRETTILY NIL PRETTY (LY))
=(STEM 'NOBLY) =(NOBLY NIL NOBLE (LY) )
=(STEM 'PATRONIZINGLY) =(PATRONIZINGLY NIL PATRON (IZE ING LY) )
=(STEM 'RELIABLE) =(RELIABLE NIL RELY (ABLE) )
=(STEM 'ACCESSIBLE) =(ACCESSIBLE NIL ACCESS (IBLE) )
=(STEM 'ACREAGE) =(ACREAGE NIL ACRE (AGE) )
=(STEM 'MILAGE) =(MILAGE NIL MILE (AGE) )
=(STEM 'STOPPAGE) =(STOPPAGE NIL STOP (AGE) )
=(STEM 'CULTURAL) =(CULTURAL NIL CULTURE (AL) )
=(STEM 'RIDDANCE) =(RIDDANCE NIL RID (ANCE) )
=(STEM 'OPERATION) =(OPERATION NIL OPERATE (ATION) )
=(STEM 'STARVATION) =(STARVATION NIL STARVE (ATION) )
=(STEM 'ACCURACY) =(ACCURACY NIL ACCURATE (CY) )
=(STEM 'CONSTANCY) =(CONSTANCY NIL CONSTANT (CY) )
=(STEM 'CAPTAINCY) =(CAPTAINCY NIL CAPTAIN (CY) )
=(STEM 'DUKEDOM) =(DUKEDOM NIL DUKE (DOM) )
=(STEM 'HANDFUL) =(HANDFUL NIL HAND (FUL) )
=(STEM 'PATRIOTISM) =(PATRIOTISM NIL PATRIOT (ISM) )
=(STEM 'SOCIALIST) =(SOCIALIST NIL SOCIAL (IST) )
=(STEM 'VISIBILITY) =(VISIBILITY NIL VISIBLE (ITY) )
=(STEM 'SENTIMENTALITY) =(SENTIMENTALITY NIL SENTIMENT (AL ITY) )
=(STEM 'CIVILIZE) =(CIVILIZE NIL CIVIL (IZE) )
=(STEM 'PENNILESS) =(PENNILESS NIL PENNY (LESS) )
=(STEM 'RESTLESS) =(RESTLESS NIL REST (LESS) )
=(STEM 'CHILDLIKE) =(CHILDLIKE NIL CHILD (LIKE) )
=(STEM 'ARGUMENT) =(ARGUMENT NIL ARGUE (MENT) )
=(STEM 'SHIPMENT) =(SHIPMENT NIL SHIPMENT NIL)
=(STEM 'DRUNKENNESS) =(DRUNKENNESS NIL DRUNK (EN NESS) )
=(STEM 'GOODNESS) =(GOODNESS NIL GOOD (NESS) )
=(STEM 'WICKEDNESS) =(WICKEDNESS NIL WICKED (NESS) )
=(STEM 'NERVOUS) =(NERVOUS NIL NERVE (OUS) )
=(STEM 'ASLEEP) =(ASLEEP A SLEEP NIL)
=(STEM 'ANTEROOM) =(ANTEROOM ANTE ROOM NIL)
=(STEM 'ANTICHRIST) =(ANTICHRIST ANTI CHRIST NIL)
=(STEM 'ARCHBISHOP) =(ARCHBISHOP ARCH BISHOP NIL)
=(STEM 'AUTOBIOGRAPHY) =(AUTOBIOGRAPHY AUTO BIOGRAPHY NIL)
=(STEM 'BEMOAN) =(BEMOAN BE MOAN NIL)
=(STEM 'BIANNUAL) =(BIANNUAL BI ANNUAL NIL)
=(STEM 'COUNTERACT) =(COUNTERACT COUNTER ACT NIL)
(Listing 1-continued)
=(STEM 'DECODE) =(DECODE DE CODE NIL)
=(STEM 'ENDANGER) =(ENDANGER EN DANGER NIL)
=(STEM 'EMBED) =(EMBED NIL EMBED NIL)
=(STEM 'HYPERACTIVE) =(HYPERACTIVE HYPER ACT (IVE) )
=(STEM 'IMMORAL) =(IMMORAL IM MORAL NIL)
=(STEM 'INTERRELATIONSHIP) =(INTERRELATIONSHIP INTER RELATE (ATION SHIP) )
=(STEM 'MISCONDUCT) =(MISCONDUCT MIS CONDUCT NIL)
=(STEM 'NONSTOP) =(NONSTOP NON STOP NIL)
=(STEM 'POSTWAR) =(POSTWAR POST WAR NIL)
=(STEM 'RECONSIDER) =(RECONSIDER RE CONSIDER NIL)
=(STEM 'SUBAWARENESS) =(SUBAWARENESS SUB AWARE (NESS))
=(STEM 'SUPERMARKET) =(SUPERMARKET SUPER MARKET NIL)
=(STEM 'ULTRACONSERVATION) =(ULTRACONSERVATION ULTRA CONSERVE (ATION) )
=(STEM 'UNNECESSARY) =(UNNECESSARY UN NECESSARY NIL)
=(STEM 'UNREST) =(UNREST UN REST NIL)
=(STEM 'COEDUCATION) =(COEDUCATION CO EDUCATE (ATION) )
=(STEM 'COOPERATIONAL) =(COOPERATIONAL CO OPERATE (ATION AL) )
=(STEM 'DEHUMANIZE) =(DEHUMANIZE DE HUMAN (IZE) )
=(STEM 'INEQUALITY) =(INEQUALITY IN EQUAL (ITY) )
=(STEM 'REELIGIBILITY) =(REELIGIBILITY RE ELIGIBLE (ITY) )
=(STEM 'LOUDLY) =(LOUDLY NIL LOUD (LY) )
=(STEM 'LENGTHWAYS) =(LENGTHWAYS NIL LENGTH (WAY S) )
=(STEM 'HOMEWARDS) =(HOMEWARDS NIL HOME (WARD S) )
=(STEM 'NONLOUDLY) =(NONLOUDLY NON LOUD (LY) )
=(STEM 'BEER) =(BEER NIL BEER NIL)
=(STEM 'MURDER) =(MURDER NIL MURDER NIL)
=(STEM 'OTHER) =(OTHER NIL OTHER NIL)
=(STEM 'ARABESQUE) =(ARABESQUE NIL ARAB (ESQUE) )
=(STEM 'REALIZE) =(REALIZE NIL REAL (IZE) )
=(STEM 'GROTESQUE) =(GROTESQUE NIL GROTESQUE NIL)
=(STEM 'NONAGENARIAN) =(NONAGENARIAN NIL NONAGENARIAN NIL)
=(STEM 'NONALIGNMENT) =(NONALIGNMENT NON ALIGN (MENT) )
=(MTS)
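The behaviour visible in Listing 1 can be approximated by a short sketch. The Python below is not the author's MACLISP program: the affix tables, the root list, and the names stem, split_suffixes, PREFIXES, SUFFIXES, and ROOTS are hypothetical and chosen only for illustration. It assumes that a word found whole in the root list is never split (so "interest" is not read as "inter" + "est"), that suffixes are peeled from the outside in, and that a final "e" may need to be restored to the root ("leaving" gives "leave").

```python
# Illustrative tables only; the paper's actual affix and root data are not reproduced here.
PREFIXES = ["counter", "inter", "non", "un", "re", "de"]
SUFFIXES = ["ation", "ness", "ing", "ize", "ed", "ly", "s"]
ROOTS = {"interest", "bathe", "bath", "lean", "leave", "good",
         "loud", "stop", "code", "rest"}


def split_suffixes(word):
    """Return (root, suffixes) with suffixes listed innermost first, or None."""
    if word in ROOTS:
        return word, []
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) > len(suffix):
            base = word[:-len(suffix)]
            # Try the bare base first, then the base with a restored final "e".
            for candidate in (base, base + "e"):
                result = split_suffixes(candidate)
                if result is not None:
                    root, inner = result
                    return root, inner + [suffix]
    return None


def stem(word):
    """Return (word, prefix, root, suffixes) in the order used by Listing 1, or None."""
    word = word.lower()
    # A word that analyses without a prefix is never given one.
    result = split_suffixes(word)
    if result is not None:
        root, suffixes = result
        return word, None, root, suffixes or None
    for prefix in PREFIXES:
        if word.startswith(prefix):
            result = split_suffixes(word[len(prefix):])
            if result is not None:
                root, suffixes = result
                return word, prefix, root, suffixes or None
    return None


print(stem("interestingly"))
print(stem("unrest"))
```

With these toy tables the sketch reproduces the shape of several Listing 1 lines (INTERESTINGLY, BATHES, LEAVING, UNREST, NONLOUDLY); spelling changes such as PLIED to PLY or KNIVES to KNIFE would need further rules that are not attempted here.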
Lexical categories are properties associated with lexical items used in parsing. Through categories, the representation of an appropriate lexical item can be selected from the lexicon. Categories are identified with classes (or parts of speech) rather than expressions of a particular syntagmatic relation among items in an utterance. Generally, classes are separated into open and closed classes. Characteristically, closed classes have a strictly limited membership which cannot be increased by adding new formations or loanwords (words which have been incorporated by one language from another language). The significance of closed-class items is best expressed by their grammatical function. In contrast, open classes have a large, flexibly increasing membership. The meaning of open class words is best expressed through synonyms. The difference between the classes represents a mixture of criteria, both statistical (the number of forms in a class), and diachronic (concern with the way in which language changes over time).

The lexical categories for English shown in Table 1 adapt categories used by Woods et al. (1972) and Winograd (1972). The lexical items used in this research have been classified according to class and feature (features shown in Tables 2 and 3). Some of the decisions made in this classification were arbitrary, especially those pertaining to whether a word or group of words should form a new class or be given a new feature within a class. However, the classification scheme is used to aid, not constrain, parsing; detailed concern about this type of arbitrariness is unwarranted. Please note that the problem with the use of coordinators, which can link words, phrases, clauses, or sentences, has not been addressed. Any adequate solution to this problem might entail substantial changes in the interpretation or assignment of closed categories (and syntactic features).

The syntactic features which can be attached to the various lexical items are shown in Table 2 (open category) and Table 3 (closed category) and explained in detail below. These features are necessary to insure formal agreement in person, number, or tense between two or more lexical items, or parts of sentences.

Most of these terms are used in their ordinary sense. The special labels are as follows: PERS
# R NEW:MACLISP
# 22:39.27
=(RESTORE 'CHKPT)
=NIL
=(CLASS 'DRINK)
=(N (NS ((0 0) (/*DRINK2)) ((0 0) (/*DRINK4))) A (PRES
= ((0 0) (/*DRINK1 P1 P2)) ((0 0) (/*DRINK1A P1 P2))
= ((0 0) (/*DRINK3 P1 P2))))
=(CLASS 'DRINKS)
=(N (NP ((0 0) (/*DRINK2)) ((0 0) (/*DRINK4))) A (PRES
= TPS ((0 0) (/*DRINK1 P1 P2)) ((0 0) (/*DRINK1A P1 P2))
= ((0 0) (/*DRINK3 P1 P2))))
=(CLASS 'DRINKER)
=(N (PERS ((0 0) (/*DRINK5))))
=(CLASS 'DRINKERS)
=(N (NP PERS ((0 0) (/*DRINK5))))
=(CLASS 'DRINKING)
=(N (NS ((0 0) (/*DRINK6))) A (PART ((0 0) (/*DRINK1
= P1 P2)) ((0 0) (/*DRINK1A P1 P2)) ((0 0) (/*DRINK3
= P1 P2))) NM (ADJ CLASF ((0 0) (/*DRINK2 P1 P2)) ((0
= 0) (/*DRINK4 P1 P2)) ((0 0) (/*DRINK5 P1 P2)) ((0
= 0) (/*DRINK6 P1 P2))))
=(CLASS 'DRINKINGS)
=(N (NP NS ((0 0) (/*DRINK6))))
=(CLASS 'DRINKABLE)
=(N (NS ((0 0) (/*DRINK2)) ((0 0) (/*DRINK4))) NM (ADJ
= CLASF ((0 0) (/*DRINK2 P1 P2)) ((0 0) (/*DRINK4 P1
= P2)) ((0 0) (/*DRINK5 P1 P2)) ((0 0) (/*DRINK6 P1 P2))))
=(CLASS 'DRINKABLES)
=(N (NP NS ((0 0) (/*DRINK2)) ((0 0) (/*DRINK4))))
=(CLASS 'DRINKWISE)
=(AM (AT3 ((0 0) (/*DRINK1 KIND)) ((0 0) (/*DRINK1A
= KIND))))
=(CLASS 'DRINKLIKE)
=(NM (ADJ ((0 0) (/*DRINK2 P1 P2)) ((0 0) (/*DRINK4
= P1 P2)) ((0 0) (/*DRINK5 P1 P2)) ((0 0) (/*DRINK6 P1
= P2))))
=(CLASS 'DRINKETTE)
=(N (DIM ((0 0) (/*DRINK2)) ((0 0) (/*DRINK4))))
=(CLASS 'DRINKETTES)
=(N (NP DIM ((0 0) (/*DRINK2)) ((0 0) (/*DRINK4))))
=(CLASS 'DRINKIE)
=(N (DIM ((0 0) (/*DRINK2)) ((0 0) (/*DRINK4))))
=(MTS)
OPEN CATEGORIES
N ...... nominal, typically either a noun (man, airplane, city) or a proper noun (John, Canada)
A ...... action, typically a verb (walk, throw, fly)
NM ..... nominal modifier, typically an adjective (tall, happy)
AM ..... action modifier, typically an adverb (quickly, suddenly)
CLOSED CATEGORIES
indicates a personal nominal (e.g., employee); DIM indicates a diminutive (e.g., booklet). POSS indicates possession, as in "John's." The time features differ if the nominal is a time word (e.g., day, year-TIME) or indicates a relative time (e.g., yesterday-FTIME). The feature AUX indicates an auxiliary (i.e., a verb form used in forming the tenses, moods, and voices of other verbs). Included in the auxiliaries are the features BE, DO, HAVE, WILL, and MODAL,3 which help determine constituents of action phrases.

Classifiers may also be nominals, as in Winograd's (1972) example, "water meter cover adjustment screw." The type features attached to AM's are specific to adverbial modifiers. In part, adverbial modifiers are based on Zadeh's (1972) work, especially the adverbial modifiers with features "AT1" and "AT2." Features classify adverbs according to how they operate in an utterance. The feature AT1 is attached to adverbs that act on single fuzzy sets as in "John was VERY decent"4 where "very" raises the criteria of all aspects that contribute to decency. The feature AT2 applies when the adverb operates on a predicate in the following way: In "John was ESSENTIALLY decent," "essentially" accentuates those aspects of decency which are most crucial to its possession and de-emphasizes those features which are less crucial. The features AT11 and AT22 are similar to AT1 and AT2. They indicate more context dependence; the effect of AM's with these features is partially determined by their proximity to the verb they modify, for example the adverb "slightly" as well as some sentence adverbials. The feature AT3 applies to predicate limiting adverbs such as "emotionally" and "healthwise." Manner adverbs, e.g., quickly, quietly, etc., have the feature AT4 attached. Whenever a word acts as an adverb of degree it is given the feature AT5, as in "I was DEAD tired." Finally AT6 is the applicable feature for modal adverbs, such as "certainly," and "possibly."

3.2 Meaning Representations for Word Concepts.
Associated with open-class category words are meaning representations: one for each sense of the word. The structure of a meaning representation is
[Residue of the syntactic feature tables (Tables 2 and 3). Surviving row labels: OPEN CATEGORIES: N's, AM's; PRO's; DET's. Surviving feature entries: DEF = definite; INDEF = indefinite.]
based on the semantic network notation developed by Schubert (1974). Pragmatic and semantic information are included in a meaning representation for words.

Figures 2 through 7 show networks that illustrate some of the main senses of the word "drink," concentrating on its action aspects. For illustrative purposes Figures 2, 4 and 7 are divided into a pragmatic section and a semantic section. The pragmatic section includes the template(s) that guides the parse of the utterance and two lists: the first contains propositions that represent the implications that are likely to be needed for the comprehension of subsequent text; and the second contains propositions representing critical implications that we expect to match in the surface structure. In Figure 2 this first list is (P3) and the second list is (P1,P2). The semantic section contains the network that represents the meaning of the word sense. Figures 3, 5, and 6 show various nominal senses of the word "drink."

Notice that Figures 2, 4, and 7 all have the notion of "change in containment location" in common. This corresponds to a "general concept" that subsumes not only differing senses of "drink," but also other more specific concepts as well, like "eating" or "receiving an enema." This observation has led to the following consideration.

When creating the meaning representations (as extended semantic networks) for concepts, it is desirable to avoid the duplication of propositions in storage. If we extract more general concepts from the specific concepts that they subsume (totally or in part), we can avoid duplication by associating the common propositions with the more general concept. In a sense the work of both Schank (1972) and Wilks (1973) supports the contention that the meaning of a concept is best represented by predications at the highest level of generality that adequately explain the term's meaning. Thus we extract from "drinking" (and "eating," etc.) the structure shown in Figure 8. We might reasonably label the concept expressed by this structure "ingest." It is important to note, however, that while Schank and Wilks might conclude that "ingesting" is a primitive action, I consider it a general concept. This applies to all primitive actions put forward by Schank and Wilks. Examination of
[Figure 2. "(John) 'drinks' (water)"; "(Mary) 'drinks' (prune juice)". Pragmatics and semantics sections. Surviving labels: (P3); (P1,P2); P2: liquid; P3: in; P4: mouth-of; P7: moving; P8: then; P9: location; P10: then; P11: cause; DIR; y.]
Figure 8 shows clearly that ingesting is "not a primitive action" but one whose meaning is expressed in terms of causes, motion, time, and other concepts. At this point the original representations for the various action senses of "drink," i.e., Figures 2, 4, and 7, can be replaced with more simplified diagrams based on the general concept "ingest" (Figure 8). In similar fashion Figure 10 diagrams one meaning of "eating," again based on the general concept "ingest."

The key to effective use of the meaning representation for comprehension centers on developing propositions with arguments that we expect to match in the surface utterance. The lexical item for "drink" would contain, among other things, pointers to a list of the arguments that we expect to match with words in the text and are most frequently needed for comprehension. At times, however, other propositions may be required for comprehension. The word sense illustrated in Figure 2 shows that we expect, in an utterance about drinking, an anim(x) and a liquid(y), propositions P1 and P2. But the question can be posed, "What is the effect of John's drinking?" To answer this question entails a further investigation of other propositions in the network, especially the first list of implications. Although it is implicit in the semantic structure, we make explicit in the pragmatic structure the inference that "x - drink - y" necessarily implies that it causes y's location to be "in" x at some time after x initiates the drinking action. Of course, since this implication is common to all senses of "drink" (and eats, inhales, etc.) it is
[Figure 3. "(John is a) 'drinker'". Surviving label: drinker.]

[Figure 4. "(John) 'drinks' (whiskey)"; "(John) 'drinks'"; "(Mary has a) 'drinking' (problem)"; "(Mary) 'drinks' (a lot)". Pragmatics and semantics sections. Surviving labels: P3: in; P4: mouth-of; P8: then; P9: location; P11: cause; P12: cause; P13: before; P14: inebriated.]

[Figure 5. "(Throw John into the) 'drink'". Figure 6. "(John is drinking a) 'drink'". Surviving labels across both figures: x; liquor; body-of-water; drink2.]

[Figure 7. "(My car) 'drinks' (gasoline)"; "(The donut) 'drinks' (coffee)". Pragmatics and semantics sections. Surviving labels: P3: in; P4: thru-part; P8: then; P11: cause.]

[Figure 8. "ingest". Pragmatics and semantics sections. Surviving labels: (P3); P3: in; P4: thru-part; P7: moving; P8: then; P9: location; P10: then; P11: cause; WHO; WHAT; THRU; DIR; x; y.]

[Figure 9. "(John) 'drinks' (water)". Pragmatics and semantics sections. Surviving labels: (P1,P2); THRU; WHAT; ingest; liquid.]

[Figure 10. "(John) 'eats' (cake)". Pragmatics and semantics sections. Surviving labels: (P1,P2); THRU; WHAT; ingest; food; y.]

[Figure residue, apparently from Figure 11. Surviving labels: PRED; A; B; C; anim; P1; P4; x; mouth-of.]
abstracted [...] the same gener[...] well, as shown in Figure 8.

The semantic structure for "d[...] as properties attached to each [...] properties include ARGS, the [...]ments in the word sense; IMPL[...]tions; the formulas [...] propositions P1, P2 [...] arguments to predicates that m[...] the given word sens[...] form

arg1 arg2 ... argi WORD argi+1 ... argn.

The implications make the most commonly used inferences into part of the representation of a concept. The propositions, for example P1 and P4 shown in Figure 2 are, in turn, represented as shown in Figure 11. See Appendix A for sample lexical entries, in particular the entry for "drink."

Many advantages accrue by representing meaning in this way. First, unlike Wilks' (1973) meaning formulas, the representation is suggestive of explicating the meaning of a word. I see no justification for (binary) lexical decomposition trees as meaning representations for words since such trees neither suggest the type of processing required nor the propositions they encode.5

A second and major advantage is that the meaning
# R NEW:MACLISP
# 20:23.08
(RESTORE 'CHKPT)
= NIL
= (MTS)
Listing 3. Lexical manipulation and maintenance
representation for a word is n[...] terms of "primitives." Rather, e[...] propositions that form the netw[...] word's meaning can be represent[...] manner. In particular the notion [...] no more "primitive" than "dri[...] representing word meanings enh[...]ational schema for comprehensio[...] of detail can be in[...] the meaning [...] adding propositions to the networ[...]

Third, inference mechanisms, h[...] algorithms, and superimposed k[...] schemas can be incorporated usin[...] for word meanings as easily as i[...]sentation. Incomplete informatio[...] be inferred, when necessary, dire[...]ing representation, in some [...] argument. This type of meaning [...] lexical items is further explaine[...]

For a brief sketch of parsing (w[...] stated grammar) based on this [...] Appendix C.

4. Formal Specification of Lexica[...]
Table 4 shows the grammar by w[...] entered in the dictionary. Th[...] basically the Backus Naur Form with the addition of the Klee[...] metalinguistic characters include [...] operator *, the form ::=, and [...] surround phrase-class names wh[...] entities. The form ::= can be read as "is of the form." The bar denotes alternation, one form or the other. And the * defines an arbitrarily repeatable (zero or more) constituent when surrounded by brackets; otherwise, it defines an arbitrarily repeatable (one or more) constituent; e.g., <a*> means zero or more a's, while a* or (<a>)* means one or more a's.

In Appendix A, examples of closed and open category items are shown as they exist in the lexicon (Cercone, 1975a). They were constructed according to the syntax rules shown in Table 4.

5. Lexical Manipulation and Maintenance
In order to enable the rapid retrieval of relevant lexical information, a scheme was developed that exploits the way tree structures are stored in LISP. The root form is a binary branching tree that suggests a search method similar to a binary search. Letters in the query word serve as an index to a subset of lexical entries which contain the letters in corresponding positions. For example, in "drink," the "d" would be used to locate the lexical items beginning with "d." All other lexical items would not be considered further. The letter "r" would locate all items that begin with "dr" and so on until the word is found. In this way the number of searches needed to locate a lexical item is directly proportional to the number of letters and the size of the lexicon. This is easily done in LISP.

This lexical structure was designed for a small (300 words) dictionary used with an experimental program designed to create extended semantic-network meaning-representations for various utterances. The program, with relatively few heuristics, creates network structures on the average of about three seconds of CPU time per sentence (simple sentences, active voice only). This is accomplished with an "interpreter only" LISP system running under the Michigan Terminal System [MTS] on an IBM 360/67 computer. Ninety-five percent of this CPU time is devoted to non-lexical referencing operations. Experiments are being designed to gather exhaustive statistics to determine the running times of lexical manipulations (insertion, deletion, searching, etc.) for different size lexicons with various types of organizations (alphabetic, letter-frequency oriented, use-frequency oriented, etc.).

Listing 3 is a sample output which shows, in the following order, a search through dictionaries for lexical items, the merging of two dictionaries, a search for the same items in the merged version, an addition to the merged dictionary, and finally a search for the newly added lexical item. The FIND routine has two arguments: the first is the word to be found and the second specifies the dictionary to be searched.

The algorithms for manipulating and maintaining lexical items as shown in Listing 3 are given in Appendix B.

6. Conclusions
In this paper the construction of a lexicon, as well as the manipulation and maintenance of lexical items, has been explained. This lexicon has been used in an experimental program, see Cercone (1975a), that was designed to create semantic structures from utterances for the ultimate purpose of understanding natural language. This lexicon has proved to be significant to the experimental program because of the ease with which lexical items can be manipulated and maintained. Since the lexicon has a uniform structure, the routines which access and manipulate lexical information are relatively simple to understand and use.
Acknowledgments
I would like to thank Rici Liknaitzky for his ideas and programming aid, especially w[...] maintenance routines. I am indebted to Len Schubert, Jeff Sampson, and Carol Murchison f[...] reading and suggestions. I would also like to thank the reviewers, especially Don Ross of the [...] Minnesota, for their invaluable suggestions and insight. Part of this work was supported by [...] Research Council of Canada, grant A4309.
Appendix A
The lexicon is organized as a general list structure comprised of many similar structures. The f[...] list of all root forms beginning with the letter A, the second, those beginning with B, etc. [...] element a similar list organization is imposed. Each meaning-sense of a word contains [...] corresponding meaning-representation (a proposition-based semantic network, see secti[...] Cercone, 1975b). "Drinking," "drinkable," and "drinklike" all have pointers to *DRIN[...] *DRINK5, and *DRINK6 as possible meanings within the utterance in which they appear. The [...] sample entries, first from the closed category (Figure A1) and then from the open cate[...] lexicon.
(B(E(F(O(R(E(*(BIND () (*BEFORE1))
(PREP () (*BEFORE2))
((AM (AIT)) (*BEFORE3))) ))))
(H(I(N(D(*(PREP ()(*BEHIND))) ))))
(L(O(W(*(PREP () (*BELOW1))
(AM ((AP AA)) (*BELOW2))) )))
(N(E(A(T(H(*(PREP () (*BENEATH1))
(AM ((AP AA)) (*BENEATH2))) ))))
(S(I(D(E(* (PREP () (*BESIDE))) )))))
(O(T(H(*(QNTRF ((NP COLL)) (*BOTH1))
(PRO ((INDEF)) (*BOTH2)) (AM () (*BOTH3))) )))
(U(T(* (BIND () (*BUT1)) (AM () (*BUT2))) ))
(Y(* (PREP () (*BY1)) (PRT () (*BY2))) ))
(D(O(W(N(*(PREP ()(*DOWN1)) (PRT ()(*DOWN2))))))
(E(A(C(H(* (QNTFR ((NS)) (*EACH1)))))
(PRO ((INDEF NS COLL)) (*EACH2))) )))
(I(T(H(E(R(*(QNTRF ((NS NP)) (*EITHER1))
(PRO ((INDEF NS NP)) (*EITHER2))) ))))
(G(H(T(*(QNTRF ((NP)) (*EIGHT1)) (NUM () (*EIGHT2)))
(H(*(ORD () (*EIGHTH))) )))))
(L(S(E(* (AM () (*ELSE))) )))
(V(E(R(Y(* (QNTFR ((NS)) (*EVERY)))
(O(N(E(* (PRO ((INDEF)) (*EVERYONE))) )))
(T(H(I(N(G(* (PRO ((INDEF NS)) (*EVERYTHING)))))))))))
(X(C(E(P(T(*(PREP () (*EXCEPT)) (CONJ () (*EXCEPT2)))))))))
(F(E(W(* (QNTRF ((NONUM NP COLL)) (*FEW)))
(E(R(*(QNTRF ((NONUM NP COLL)) (*FEW))) ))))
(I(F(T(H(*(ORD ()(*FIFTH))) )))
(R(S(T(*(ORD () (*FIRST))) )))
(V(E(* (QNTFR ((NP)) (*FIVE1)) (NUM () (*FIVE2))))))
(O(R(* (PREP () (*FOR1)) (CONJ () (*FOR2))))
(U(R(*(QNTRF ((NP)) (*FOUR1)) (NUM () (*FOUR2)))
(T(H(*(ORD () (*FOURTH))))))))
(R(O(M(*(PREP () (*FROM))) ))))
Appendix B
Algorithms for the routines shown in Listing 2 and Listing 3 are presented as a brief description followed by its LISP code.6
The CLASS routine has one argument: the word to be classified. If the word does not appear as a lexical entry, the message "I do not know the word" appears followed by the word. If the word appears in the closed-category lexicon, the entry's meaning, as appearing in CLOSCAT, is returned. Otherwise the relevant portions of the entry in the open category lexicon (OCAT) are extracted and returned (using the routine FINDER), based on morphology.
(DEFPROP CLASS
 (LAMBDA (WORD)
  (PROG (W S A SL LEX)
   ; W is the STEM result: (word prefix root suffixes)
   (SETQ W (STEM WORD))
   (RETURN
    (COND
     ; unknown word: report it
     ((NOT W) (PRINT '(I DO NOT KNOW THE WORD)) (PRINT WORD))
     ; closed-category root: return its stored meaning
     ((SETQ A (CHK (EXPLODC (CADDR W)) CLOSCAT)) (CAAR A))
     ; open-category root: A is the last suffix, SL the reversed suffix list
     (T (SETQ A (CAR (SETQ SL (REVERSE (CADDDR W)))))
        (SETQ LEX (CAR (CHK (EXPLODC (CADDR W)) OCAT)))
        (COND
         (SL (COND
              ((NOT (EQ A 'S))
               (MAPCAN '(LAMBDA (X) (FINDER X A)) LEX))
              ; plural of a suffixed form: classify by the inner suffix, mark NP
              ((CDR SL)
               (SETQ A (FINDER (CAR LEX) (CADR SL)))
               (LIST (CAR A) (CONS 'NP (CAR (CDR A)))))
              (T (MAPCAN '(LAMBDA (X) (FINDER X A)) LEX))))
         (T (MAPCAN '(LAMBDA (X) (FINDER X NIL)) LEX))))))))
 EXPR)
(DEFPROP FINDER
 (LAMBDA (CAT SUF)
  (PROG (ANS X)
   ; step through the entries of category CAT, stopping at a SYN marker
   (DO I (CDR CAT) (CDDR I)
       (OR (NULL I) (EQ 'SYN (CAAR I)))
       ; look for the feature group keyed by suffix SUF
       (COND ((SETQ X (DO J (CAR I) (CDR J)
                          (OR (NULL J) (EQ SUF (CAAR J)))))
              (SETQ ANS (APPEND (APPEND (CDAR X) (CADR I)) ANS)))))
   (RETURN (COND (ANS (LIST (CAR CAT) ANS))))))
 EXPR)
The FIND routine has two arguments: the first is the word and the second is the dictionary in which to
search. The searching algorithm has been described in Section 5.
(DEFPROP FIND
(LAMBDA (W D)
(PROG ()
(COND
(D (COND
(W (DO J D (CDR J) (NULL J)
(COND ((EQ (CAR W) (CAAR J))
(RETURN (CHK (CDR W) (CDAR J)))) )))
((EQ (CAAR D) '*) (RETURN (LIST (CDAR D))))
(T (RETURN NIL))) )) ))
EXPR)
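The letter-by-letter search can be illustrated with a small sketch. This is a Python stand-in, not a translation of the MACLISP above: nested dictionaries replace the nested lists of Appendix A, and the names find, FLAG, and LEXICON are hypothetical. The two sample entries follow the closed-category entries for "but" and "by" in Appendix A, written as (category, features, senses) tuples, which is an assumed encoding.

```python
FLAG = "*"  # marks the end of a root form, like the "*" in the Appendix A entries

# Each letter indexes the sub-lexicon of entries sharing the prefix read so far.
LEXICON = {
    "b": {
        "u": {"t": {FLAG: [("BIND", [], ["*BUT1"]), ("AM", [], ["*BUT2"])]}},
        "y": {FLAG: [("PREP", [], ["*BY1"]), ("PRT", [], ["*BY2"])]},
    }
}


def find(word, lexicon):
    """Follow one branch per letter; return the meaning field or None."""
    node = lexicon
    for letter in word:
        node = node.get(letter)
        if node is None:
            return None  # no entry begins with the letters read so far
    # The word is an entry only if the end-of-root flag is present here.
    return node.get(FLAG)


print(find("by", LEXICON))
print(find("bu", LEXICON))
```

Each step discards every entry that does not share the prefix read so far, so a lookup inspects one branch per letter of the query word.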
To add items to an existing lexicon, the routines DICTADD, ADD, and MUNG are used. DICTADD has
four arguments. The first specifies the lexicon to which the second argument is to be added. The third specifies
the flag, i.e., the character to be inserted after the letters of the lexical entry to designate the end of the root
and the beginning of the meaning field; the fourth argument specifies the meaning field. ADD is invoked from
DICTADD and does the actual addition by searching for the proper position and invoking MUNG to ready the
item for the addition. MUNG is invoked from ADD to consolidate the parts of the lexical entry that become
the single lexicon entry.
(DEFPROP DICTADD
(LAMBDA (DICT WORD FLAG PROP)
(PROG (TDICT)
(SETQ TDICT (CONS NIL DICT))
(ADD TDICT WORD FLAG PROP)
(RETURN (CDR TDICT))))
EXPR)
(DEFPROP MUNG
(LAMBDA (WORD FLAG PROP)
(COND ((NULL WORD) (CONS FLAG PROP))
((CONS (CAR WORD)
(LIST (MUNG (CDR WORD) FLAG PROP))))))
EXPR)
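Since the code for ADD is not reproduced here, the following Python sketch shows one way the addition could work, under stated assumptions: the lexicon is modelled as nested dictionaries rather than LISP lists, the function names dictadd and mung merely echo the routines described above, and the dictionary is updated in place (the MACLISP DICTADD instead returns the updated list). Overwriting an existing meaning field is also an assumption.

```python
def mung(letters, flag, prop):
    """Build the chain of nested nodes for the letters still to be inserted."""
    if not letters:
        return {flag: prop}
    return {letters[0]: mung(letters[1:], flag, prop)}


def dictadd(lexicon, word, flag, prop):
    """Insert word with meaning field prop; flag marks the end of the root."""
    node = lexicon
    for position, letter in enumerate(word):
        if letter not in node:
            # First letter with no branch: graft the rest of the word here.
            node.update(mung(word[position:], flag, prop))
            return lexicon
        node = node[letter]
    # Every letter already had a branch; attach the meaning at this node.
    node[flag] = prop
    return lexicon


lexicon = {}
dictadd(lexicon, "by", "*", ["PREP"])
dictadd(lexicon, "but", "*", ["BIND"])
print(lexicon)
```

Entries that share leading letters share the corresponding branches, so "by" and "but" above are stored under a single "b" node.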
To combine lexicons and merge items with the same root into one lexicon, the COMBINE routine is used.
COMBINE has four arguments; the first two specify the old and new dictionaries. The old dictionary is
combined to the new one, so the first argument names the combined dictionary. The old dictionary is not
destroyed and may still be used. The third argument is the flag (as in DICTADD). The fourth argument,
specified as NIL, is used internally, since the COMBINE routine is recursive, for building up the new
dictionary. COMBINE uses the ADD routine.
(DEFPROP COMBINE
(LAMBDA (ODICT NDICT FLAG SOFAR)
(MAPC
'(LAMBDA (X)
(COND
((EQ (CAR X) FLAG)
(ADD (CONS () ODICT)(REVERSE SOFAR) FLAG (CDR X)))
((COMBINE ODICT (CDR X) FLAG (CONS (CAR X) SOFAR)))))
NDICT))
EXPR)
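A merge in the spirit of COMBINE can be sketched as follows. This is a Python illustration over nested dictionaries, not the author's routine; the name combine, the default flag "*", and the choice to concatenate the meaning fields of entries that share a root are all assumptions. As in the description above, the old dictionary is left intact.

```python
import copy


def combine(odict, ndict, flag="*"):
    """Return a new lexicon holding the entries of odict plus those of ndict."""
    merged = copy.deepcopy(odict)  # the old dictionary itself is not modified

    def walk(node, target):
        for key, value in node.items():
            if key == flag:
                # Same root in both lexicons: keep both sets of meanings.
                target[flag] = target.get(flag, []) + list(value)
            else:
                walk(value, target.setdefault(key, {}))

    walk(ndict, merged)
    return merged


old = {"b": {"y": {"*": ["PREP"]}}}
new = {"b": {"y": {"*": ["PRT"]}, "u": {"t": {"*": ["BIND"]}}}}
print(combine(old, new))
```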
Although not appearing in any of the above listings, the routines DICL, DICLIST, JUSTN, and
JUSTNAMES can be used to list lexicons neatly. DICLIST and JUSTNAMES have one argument, the name of
a dictionary. The former lists the contents of the dictionary exactly as they appear in storage, while the latter
gives just the root forms of the lexical items. DICLIST invokes DICL and JUSTNAMES invokes JUSTN in
analogous manners, i.e., to print the listing.
(DEFPROP DICL
(LAMBDA (NDICT FLAG SOFAR)
(MAPC'(LAMBDA (X)
(DEFPROP JUSTNAMES
(LAMBDA (NDICT FLAG SOFAR)
(MAPC '(LAMBDA (X)
(COND ((EQ (CAR X) FLAG)
(PRINT (IMPLODE (CAR (LIST
(REVERSE SOFAR) FLAG (CDR X))))))
((JUSTNAMES (CDR X) FLAG (CONS
(CAR X)
SOFAR)))))
NDICT))
EXPR)
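The recursive walk used to list root forms can be illustrated in the same way. The Python below is a sketch over an assumed nested-dictionary lexicon; the name just_names echoes JUSTNAMES but the function returns the root forms as a list rather than printing them, which is a deliberate simplification.

```python
def just_names(lexicon, flag="*", so_far=""):
    """Collect the root forms stored in a letter-indexed lexicon."""
    names = []
    for key, value in lexicon.items():
        if key == flag:
            names.append(so_far)  # the letters read so far spell a root form
        else:
            names.extend(just_names(value, flag, so_far + key))
    return names


lexicon = {
    "b": {"y": {"*": []}, "u": {"t": {"*": []}}},
    "e": {"l": {"s": {"e": {"*": []}}}},
}
print(sorted(just_names(lexicon)))
```

The accumulator so_far plays the role of SOFAR in the MACLISP version, except that letters are appended in reading order, so no final reversal is needed.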
Appendix C
The following discussion explains how English is parsed in an experimental program that uses the lexical
structure. A semantic structure expressing a particular utterance is formed according to simple structural rules.
The central role of verbs is acknowledged and preferred semantic categories for the subjects and objects of
verbs guide each choice in the creation of meaning structures. Word sense disambiguation for verbs, modifiers,
and nominals follows naturally in this approach, vide Cercone (1975a). Extensive trial and error searches are
eliminated since the interpretation takes on a "slot and filler" character. The approach to interpretation is
almost completely semantically oriented and syntax is used only when meaning-analysis fails.
Initial Classification
Initially the text is read (either in discourse mode or from an external file for longer text) and broken into
clauses (at present this process is very unsophisticated). Each clause is then "classified" in the following
manner. Words are morphologically analyzed and, based on that analysis, are classified to determine all of their
possible syntactic functions. For example, the form "drinks" of the root word "drink" can only be used
nominally or as an action. The root form is located in the lexicon and using affix information from the
morphological analysis, all of the possibilities for the word are extracted. When all words in the clause are
classified, the next phase, parsing, begins.
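The classification step can be pictured with a small Python sketch. It is not the program's actual routine: the lexicon, the suffix table, and the function name classify are all hypothetical, and real morphological analysis handles far more than two suffixes.

```python
# Hypothetical miniature lexicon: root form -> word classes of the root.
LEXICON = {"drink": {"noun", "verb"}, "give": {"verb"}}

# Hypothetical affix table: suffix -> (class required of the root, function of the result).
SUFFIXES = {
    "s": [("noun", "nominal"), ("verb", "action")],
    "er": [("verb", "nominal")],
}
BARE = [("noun", "nominal"), ("verb", "action")]  # rules for an uninflected root


def classify(word):
    """Return every syntactic function the word could have in a clause."""
    functions = set()
    if word in LEXICON:
        functions |= {fn for cls, fn in BARE if cls in LEXICON[word]}
    for suffix, rules in SUFFIXES.items():
        root = word[: -len(suffix)]
        if word.endswith(suffix) and root in LEXICON:
            # Use the affix information to keep only the functions the root allows.
            functions |= {fn for cls, fn in rules if cls in LEXICON[root]}
    return functions


drinks = classify("drinks")
drinker = classify("drinker")
```

Under these assumptions "drinks" comes out as either a nominal or an action, while "drinker" can only be a nominal.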
Parsing
Traditionally, the purpose of parsing sentences has been to output syntactic trees. These trees served as
input to semantic routines charged with the generation of meaning structures. Winograd (1972) and Woods
(1970) tried, with some degree of success, to integrate the two processes and have each guide the other.
Schank (1972) and Wilks (1973) have stressed that syntactic processing was secondary to meaning analysis and
should be necessary only when the resolution of ambiguity by meaning analysis alone had failed. Their parsing
phase is almost completely semantically oriented. One important by-product in the method to be described is
the detection of the correct "sense" of nominals and actions and, although not yet implemented, modifiers as
well (I am restricting utterances to active voice).
The parsing proceeds as follows. Words in a classified clause are scanned from left to right in search of a
suitable candidate for an action. Once found, the sentence is separated into
((FIRST PART) (ACTION CANDIDATE) (SECOND PART)).
The action candidate contains, among other things, a list of possible action "senses" that this particular root
form may have. These senses are ordered by a scheme, albeit a very superficial scheme, to be described later.
Associated with word senses are templates as described in Cercone (1975a). For example the sense *GIVE1 of
the root form "give" has a template
X GIVE Y Z
X GIVE Z TO Y.
The template is used to guide the parsing. In this example X, Y and Z are variables representing the
arguments of the predicate "give" that we expect to find in the surface utterance, in the given order. If an
argument is not present in the utterance, the implication template can be used to infer arguments. More
detailed information concerning the arguments is obtained by examining the network propositions for the
sense of "give" in question, those which involve the arguments. Thus X would represent an
nominal capable of "giving."
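A minimal Python sketch of template-guided matching follows. It assumes, purely for illustration, that each argument is a single word and that a template is a list of slots; the names split_clause and match are hypothetical, and the checks on semantic categories described in the text are left out.

```python
# The two surface patterns given for the sense *GIVE1.
TEMPLATES = {"*GIVE1": [["X", "GIVE", "Y", "Z"], ["X", "GIVE", "Z", "TO", "Y"]]}
VARIABLES = {"X", "Y", "Z"}


def split_clause(words, is_action):
    """Scan left to right; return (first part, action candidate, second part)."""
    for i, word in enumerate(words):
        if is_action(word):
            return words[:i], word, words[i + 1:]
    return None


def match(pattern, predicate, first, second):
    """Bind one word to each template variable; return None if the pattern does not fit."""
    surface = first + [predicate] + second
    if len(surface) != len(pattern):
        return None
    bindings = {}
    for slot, word in zip(pattern, surface):
        if slot in VARIABLES:
            if word == predicate:
                return None  # a variable may not absorb the action itself
            bindings[slot] = word
        elif slot != word.upper():
            return None  # literal slots (GIVE, TO) must match the surface word
    return bindings


first, action, second = split_clause(
    ["John", "gives", "books", "to", "Mary"], lambda w: w == "gives"
)
bindings = next(
    b for p in TEMPLATES["*GIVE1"]
    if (b := match(p, "GIVE", first, second)) is not None
)
```

Here the first pattern fails on length, and the second binds X to "John", Z to "books" and Y to "Mary".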
This is similar to what Schank does when parsing in conceptual dependency theory. If the word
surface utterance do not satisfy the constraints for arguments, one of four reasons is likely. First,
syntactic constructions could exist. Second, a different "sense" of the action is "correct." Third, the
action-candidate is not the valid action of the clause. Finally, some other reason, like slang express
metaphor might be the cause.
Whenever arguments fail to satisfy a predicate, a search for alternative implication templates begins. If this
fails then the list of senses for the root form is further examined. If other senses of the action candidate exist,
they are examined further to see if arguments in the surface utterance match variables in the template. This
procedure is repeated until the correct sense of the action candidate is found or the list of senses is exhausted.
If the sense list is exhausted, scanning continues in the surface clause for another suitable action candidate and
the process is repeated.
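The search order just described (templates within a sense, then the remaining senses, then the next action candidate) can be sketched as three nested loops in Python. Every data item below is hypothetical: the sense names *BANK1 and *BANK2, the category labels, and the restriction to one-word subjects and objects are assumptions made only to keep the example short.

```python
# All lexicon data here is invented for illustration.
CATEGORY = {"pilot": "human", "teller": "human", "plane": "aircraft", "money": "valuable"}
SENSES = {"banks": ["*BANK1", "*BANK2"]}  # ordered senses for each action word
TEMPLATES = {  # preferred categories for the arguments X and Y
    "*BANK1": [{"X": "human", "Y": "valuable"}],  # to deposit something
    "*BANK2": [{"X": "human", "Y": "aircraft"}],  # to tilt an aircraft
}


def satisfies(template, first, second):
    """Return bindings if the one-word subject and object fit the template, else None."""
    if len(first) != 1 or len(second) != 1:
        return None
    x, y = first[0], second[0]
    if CATEGORY.get(x) == template["X"] and CATEGORY.get(y) == template["Y"]:
        return {"X": x, "Y": y}
    return None


def parse_clause(words):
    """Try each action candidate, each of its senses, and each template, in that order."""
    for i, word in enumerate(words):            # scan for action candidates
        for sense in SENSES.get(word, []):      # senses in their stored order
            for template in TEMPLATES[sense]:   # alternative templates first
                bindings = satisfies(template, words[:i], words[i + 1:])
                if bindings is not None:
                    return word, sense, bindings
    return None  # no candidate, sense, or template fits


result = parse_clause(["pilot", "banks", "plane"])
```

For "pilot banks plane" the first sense is rejected because a plane is not something deposited, so the second sense is selected; a clause such as "plane banks money" fits no sense and yields no parse.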
The matching of predicates' arguments in surface text to variables in implication templates includes detecting
the correct sense of nominals and modifiers as well. The sentence "A drinker drinks many drinks" has as the
second argument of the predicate "drinks" the word "drinks." Possible nominal senses for that word
include an alcoholic beverage, a body of water (throw John into the drink), or a thirst quencher. Thus, if the
first sense of a nominal fails as argument, all other senses must be examined before deciding not to accept the
argument. This reasoning applies with respect to modifiers in a similar but not identical fashion. For example,
a "yellow cake" is a type of cake much like a chocolate cake, whereas a "yellow car" is something that is
yellow and something that is a car. Using these methods, sentences such as "A 'drinker' 'drinks' many
'drinks' " and "The pilot 'banked' his plane near the river 'bank' over the 'bank' that he 'banks' on
'banking' service" present little difficulty.
Morphological analysis is important since only those forms that can authentically be considered actions
need be examined. In the example, "A drinker drinks many drinks," morphological analysis rules out
"drinker" immediately as an action candidate. Thus, we are quickly able to arrive at the right choice.
Both Schank and Wilks used their intuition to set up their respective meaning representations. The way they
defined and used semantic "primitives" is one example. One way in which my intuition has influenced the
experimental program can be shown with the following superficial scheme for choosing word senses.
"bank"
frequency count will be incremented by one (if it is the correct sense). This would continue
sense fails to be correct. At this point we would examine the second, first, and fourth senses u
the correct meaning sense (i.e., the ith term). The 13 is added to g3, li is set to one (non zer
zero, and the ith meaning sense is selected whenever the term "bank" is encountered.
The list of modifiers found in the clause, further classified as to function, and associated with the
predicate arguments they modify, is also given as part of the parsing phase.
Once the parsing phase has been completed, the meaning representation is built for the clause and this
structure is integrated into the semantic network, vide Schubert (1974). The first step involves building an
intermediate structure based only on the predicate of the clause and its arguments. After this structure is
created, it may be altered to accommodate other information detected in the parsing phase. This
includes mainly modifiers (only some adjectival modifiers are now analyzed, howeve
quantificational are planned).
The following IBM Reports are available on request to IBM Corporation, Armonk, N.Y.
"Data Entry of Chinese and Kanji Characters" no. 5249, edited by E. F. Yhap. A keystroke system with 37
keys, upper and lower shift, is proposed for data entry of Chinese and Kanji characters into computer systems.
This data entry method has been applied to the 881 Kanji characters which are prescribed as a minimum
requirement for the six elementary grades in Japanese schools by the Japanese Ministry of Education. The
resulting average number of keystrokes for this set of 881 characters is just a little under 4.2 keystrokes per
character. Reasonably high rates of character input are therefore expected to be achievable (60 cpm or better).
Other advantages claimed (but not yet tested) for this data entry method are ease of operator training, and
lack of operator mental fatigue.
"An Organization for a Dictionary of Senses" no. 5548, edited by Dick H. Fredericksen. This paper describes a
lexical organization in which "senses" are represented in their own right, along with "words" and "phrases,"
by distinct data items. The objective of the scheme is to facilitate recognition and employment of synonyms
and stock phrases by programs which process natural language. Besides presenting the proposed organization,
the paper characterizes the lexical "senses" which result.
"On Natural Language Based Query Systems" no. 5577, edited by Stanley R. Petrick. Some of the arguments
which have been given both for and against the use of natural languages in question-answering (QA) systems
are discussed. Several QA systems are evaluated in assessing the current level of QA system development.
Finally, certain pervasive difficulties which have arisen in developing natural language based QA systems are
identified, and the approach which has been taken to overcome them in the REQUEST System is described.
"The Request System" no. 5604, edited by Warren J. Plath. REQUEST is an experimental Restricted English
QUESTion-answering system which is currently capable of analyzing and answering a variety of English
questions, spanning a significant range of syntactic complexity, with respect to a small Fortune-500-type data
base. The long-range objective of this work is to explore the possibility of providing non-programmers with a
convenient and powerful means of accessing information in formatted data bases without having to learn a
formal query language. In order to address the somewhat conflicting requirements of understandability for the
machine and maximum naturalness for the user, the REQUEST System employs a language processing
approach featuring: (1) the use of restricted English; (2) a two-phase, compiler-like organization; and (3)
linguistic analysis based on a transformational grammar. The present paper explores the motivation for this
approach in some detail and also describes the organization, operation, and current status of the system.