Morphological Analysis and Lexicon Design for Natural-Language Processing

Author(s): Nick Cercone


Source: Computers and the Humanities, Jul.-Aug., 1977, Vol. 11, No. 4 (Jul.-Aug., 1977), pp. 235-258
Published by: Springer

Stable URL: https://www.jstor.org/stable/30199902

JSTOR is a not-for-profit service that helps scholars, researchers, and students discover, use, and build upon a wide
range of content in a trusted digital archive. We use information technology and tools to increase productivity and
facilitate new forms of scholarship. For more information about JSTOR, please contact support@jstor.org.

Your use of the JSTOR archive indicates your acceptance of the Terms & Conditions of Use, available at
https://about.jstor.org/terms

Springer is collaborating with JSTOR to digitize, preserve and extend access to Computers and
the Humanities

This content downloaded from 125.16.189.232 on Sat, 05 Oct 2024 09:04:27 UTC
All use subject to https://about.jstor.org/terms
Computers and the Humanities, Vol. 11, pp. 235-258. PERGAMON PRESS, 1978. Printed in the U.S.A.
0010-4817/78/0706-0235$2.00/0
Copyright © 1978 Pergamon Press

Morphological Analysis and Lexicon Design for Natural-Language Processing
NICK CERCONE

Nick Cercone is in the computing science program at Simon Fraser University, Burnaby, British Columbia.

1. Introduction
Language use presumes knowledge about words. Part of this knowledge is functional (how words are used) and part of it deals with the meanings of words. The lexical component of a computer memory model should be organized to deal adequately with both.

Before we begin designing a lexicon, we might observe how any standard dictionary is organized. This consideration quickly leads to chagrin. Words are not always assigned meanings in any consistent way; rather, various methods are used, be they extra-linguistic (e.g., diagrams) or explicit definitions. Also, meanings are given in English, which is not a suitable formalism for lexical memory. This informal structure is not conducive to the construction of an "in toto" memory model.

The experimental program reported in Cercone (1975b) explores the nature and computational use of meaning representations for word concepts in an automated natural language understanding system. Word meanings are represented as extended semantic networks (see Schubert, 1974) based on propositions. These meaning representations are accessed via a lexicon.

Functional knowledge about words, on the other hand, can aid language interpretation when meaning analysis alone stalls or when making initial assumptions concerning anaphoric references or those associated with word order. For example, if a system were canonically to represent one sense of "give" as a three-place predicate *GIVE1(x,y,z) corresponding to the sentence "John gave Mary the book," where argument x is bound to John, y to Mary, and z to book, then the appearance of the preposition "to" in "John gave the book to Mary" signals a change in the order of arguments. As another example, consider the definite determiner in the description "the red haired woman drinking wine" in contrast to the indefinite determiner in "a red haired woman drinking wine." Any two sentences in which these descriptions appear have quite distinct meanings.

This paper describes the construction and maintenance of a lexicon designed to operate with the meanings and functions of words.

2. Morphological Analysis of English Words
The length of lexical items (the number of characters) and their codification can greatly affect subsequent processing times in automated language understanding systems, and lexical design must account for this. One obvious storage simplification is to perform morphological analysis on words. This facility would permit all entries with the same root form to be stored as a single lexical item along with indicators for permissible inflections. Morphological analysis is an integral component of the experimental program of Cercone (1975a) for many reasons, including the following advantages:

(i) Storage economy
It would be absurd to store all forms of lexical items directly, since well-defined spelling rules exist which specify all normal word formations. A small, relatively simple analysis routine can save vast amounts of storage.

(ii) Interpretive assistance
A by-product of morphological analysis is the discovery of affixes that were added to the root word to form the word under analysis. Often these affixes, especially in the case of inflexional endings, determine the use of the word in an utterance, or at least narrow its possibilities.

(iii) Learning new words
Whenever an unknown word is encountered in
text, preliminary analysis of structural and affix relationships can aid in determining the word's function in the utterance. Our ability to infer or guess meanings can only be enhanced through extensive morphological analysis.

(iv) Derivational information
Derivational affixes can affect word meaning, often in a systematic way: for example, "-esque" adds the sense "like a," and "non-" adds negation. A word like "booklet" can be satisfactorily understood as "little book" through morphological analysis without explicitly storing both words.

[Figure 1 taxonomy diagram; node labels: WORD FORMATION; COMPOSITION; DERIVATION/INFLEXION (including overlap); DEAD AFFIXES; LIVING AFFIXES; PREFIXES; SUFFIXES; DERIVATIONAL; INFLEXION]
Figure 1. Short taxonomy of English word formation

Figure 1 illustrates word formation as either (i) composition, the formation of a word by the combination of two or more elements each of which is also a separate word, e.g., "goldsmith"; or (ii) derivation/inflexion, formation by the close combination of two or more elements only one of which can be a separate word, e.g., "kindness."

Word formation by composition generates compound words. There are compound nouns ("goldsmith," noun-noun; "blackboard," adjective-noun; "drawbridge," verb-noun), compound adjectives ("blameworthy," noun-adjective; "overanxious," adverb-adjective), compound pronouns ("myself"), compound verbs ("overcome," adverb-verb; "daresay," verb-verb), and compound participles such as "airborne." Back formation is a special type of composition in which the second element denotes an agent or action, for example "housekeeper." Word formation also gives repetition compounds ("goody-goody," "fifty-fifty"). Nevertheless, most composite words yield little useful affix information except where there is overlap with derivation.

Inflexion and derivation are closely related, but while inflexion modifies a word (book-books-book's), derivation can result in the formation of a different word (kind-kindness).1 Prefixes in English are always derivational. Suffixes may be derivational or inflexional, so we must distinguish between the two kinds.

The computer program which performs morphological analysis treats prefix-root-suffix relationships. Affixes are classified into two groups: living or dead. Examples of the dead group include for- ("forgive," "forget"), with- ("withhold"), -ant ("servant"), and -le ("handle"). The program is restricted to living affixes.

There are approximately 75 English prefixes. Most of the living ones amongst these are analyzed; they include the following: a-, ante-, anti-, arch-, auto-, be-, bi-, co-, counter-, de-, dis-, em-, en-, ex-, extra-, fore-, hyper-, in-, inter-, mal-, mis-, non-, post-, pre-, pro-, re-, semi-, sub-, super-, trans-, ultra-, and un-. Since some living prefixes occur infrequently, they are stored as separate lexical items along with root words. Usually the morphological analyzer can easily remove prefixes, leaving just the root plus suffixes.

For this reason, the major part of the program is devoted to the analysis of derivational and inflexional suffixes. Suffix analysis can help determine the word's part of speech. For example, a word with the suffix "-est" can be an adjective, and one with "-ist" would be a noun. Furthermore, we can determine the part of speech by examining the root, since inflexional suffixes do not change the part of speech. Inflexional suffixes usually come last in a word and do not "pile up." Since an analysis of regularly inflected words would always yield the root plus the inflexional suffix, inflected forms need not be stored separately; Winograd (1972) has shown this for many cases. On the other hand, roots with derivational suffixes are not always related to one particular syntactic class: "-ful" forms an adjective in "forgetful" but a noun in "handful." Derivational suffixes can "pile up," as in "fertilizers," whose two suffixes "-ize" and "-er" do not close the word, followed by the inflexional suffix "-s," which does.2 Because of this phenomenon, when suffix removal does not expose the root form, the analysis must proceed recursively and must take into account the entire derivation up to that point of analysis. This is necessary to analyze words like "relationship" or "patronizingly." The suffixes in the program include: -able, -ible, -age, -al, -ance, -ation, -cy, -dom, -ed, -en, -er, -est, -ful, -ing, -ish, -ist, -ity, -ize, -less, -like, -ly, -ment, -ness, -ous, -s, -'s, -s', -ship, -some, -ster, -ward, -way, -wise. This list is far from exhaustive.

The basic algorithm for performing morphological analysis is given in Cercone (1974). Listing 1 is a sample output of that analyzer operating with a lexical structure constructed according to the syntax rules given in Table 4. The algorithm was implemented using a converted MACLISP modified to run on an IBM 360/67 under the Michigan Terminal System [MTS]. The STEM routine takes one argument: the English word to be analyzed. STEM returns a list containing four elements: (1) the original word; (2) the prefixes; (3) the root form; and (4) a list of suffixes. Whenever the analysis yields a root form from the original word with no prefix or suffix, the word NIL appears in the second and/or fourth position. In the bicolumnar form below, the left column entries show STEM routine invocations and the entries in the right show the result of the analysis.

Typically words have multiple meanings and multiple forms. As humans, we have a remarkable ability correctly to recognize and parse contiguous words in utterances. In the interpretive phase of understanding it is desirable to identify the functions and meanings of words, and morphological analysis can aid this phase. For example, consider the multiple forms of the lexical item "drink." If the form "drinking" appears in an utterance, it can be regarded as a participle, a noun, or an adjective. The form "drinkings" can only be regarded as a noun. Listing 2 shows how the interpretive phase uses a morphological analyzer to identify a word's function. The lexical entry for "drink" appears in Figure A.2 of Appendix A. The CLASS routine (see Appendix B) extracts only the relevant portion of the lexical entry based on the morphology of its argument, i.e., some form of "drink" in this example.

Learning the meanings of new or unfamiliar words is not considered at this time, although information obtained through morphological analysis would be valuable in doing so. It is simply more economical in terms of processing time to store unfamiliar words or morphological rarities explicitly than to include programs for their analysis.

3. The Organization of the Lexicon
By way of introduction, let me point out that the next two subsections are distinguished for explanatory purposes only; there is no intent to dichotomize syntax and semantics. It is a rather touchy issue to decide whether some of the features and category items in the next section are syntactic at all.

3.1 Lexical Categories and Syntactic Features.
Lexical items are items of vocabulary; usually, but not necessarily, words. Traditionally (the Aristotelian view), they are said to have both lexical (material) and grammatical (formal) meaning. This distinction between meanings can best be expressed in terms of open or closed classes (sets of alternatives), as explained below.

# R NEW:MACLISP
# 21:50.18
=(RESTORE 'CHKPT) =NIL
=(STEM 'INTEREST) =(INTEREST NIL INTEREST NIL)
=(STEM 'INTERESTED) =(INTERESTED NIL INTEREST (ED) )
=(STEM 'INTERESTING) =(INTERESTING NIL INTEREST (ING) )
=(STEM 'INTERESTINGLY) =(INTERESTINGLY NIL INTEREST (ING LY) )
=(STEM 'BASHES) =(BASHES NIL BASH (S) )
=(STEM 'BATHES) =(BATHES NIL BATHE (S) )
=(STEM 'BATHS) =(BATHS NIL BATH (S) )
=(STEM 'LEANING) =(LEANING NIL LEAN (ING) )
=(STEM 'LEAVING) =(LEAVING NIL LEAVE (ING))
=(STEM 'DENTED) =(DENTED NIL DENT (ED) )
=(STEM 'DANCED) =(DANCED NIL DANCE (ED) )
=(STEM 'KISSES) =(KISSES NIL KISS (S) )
=(STEM 'CURVED) =(CURVED NIL CURVE (ED) )
=(STEM 'CURLED) =(CURLED NIL CURL (ED) )
=(STEM 'ROTTING) =(ROTTING NIL ROT (ING) )
=(STEM 'ROLLING) =(ROLLING NIL ROLL (ING) )
=(STEM 'PLAYED) =(PLAYED NIL PLAY (ED))
=(STEM 'PLIED) =(PLIED NIL PLY (ED) )
=(STEM 'REALEST) =(REALEST NIL REAL (EST) )
=(STEM 'PALEST) =(PALEST NIL PALE (EST) )
=(STEM 'KNIVES) =(KNIVES NIL KNIFE (S) )
=(STEM 'PRETTILY) =(PRETTILY NIL PRETTY (LY))
=(STEM 'NOBLY) =(NOBLY NIL NOBLE (LY) )
=(STEM 'PATRONIZINGLY) =(PATRONIZINGLY NIL PATRON (IZE ING LY) )
=(STEM 'RELIABLE) =(RELIABLE NIL RELY (ABLE) )
=(STEM 'ACCESSIBLE) =(ACCESSIBLE NIL ACCESS (IBLE) )
=(STEM 'ACREAGE) =(ACREAGE NIL ACRE (AGE) )
=(STEM 'MILAGE) =(MILAGE NIL MILE (AGE) )
=(STEM 'STOPPAGE) =(STOPPAGE NIL STOP (AGE) )
=(STEM 'CULTURAL) =(CULTURAL NIL CULTURE (AL) )
=(STEM 'RIDDANCE) =(RIDDANCE NIL RID (ANCE) )
=(STEM 'OPERATION) =(OPERATION NIL OPERATE (ATION) )
=(STEM 'STARVATION) =(STARVATION NIL STARVE (ATION) )
=(STEM 'ACCURACY) =(ACCURACY NIL ACCURATE (CY) )
=(STEM 'CONSTANCY) =(CONSTANCY NIL CONSTANT (CY) )
=(STEM 'CAPTAINCY) =(CAPTAINCY NIL CAPTAIN (CY) )
=(STEM 'DUKEDOM) =(DUKEDOM NIL DUKE (DOM) )
=(STEM 'HANDFUL) =(HANDFUL NIL HAND (FUL) )
=(STEM 'PATRIOTISM) =(PATRIOTISM NIL PATRIOT (ISM) )
=(STEM 'SOCIALIST) =(SOCIALIST NIL SOCIAL (IST) )
=(STEM 'VISIBILITY) =(VISIBILITY NIL VISIBLE (ITY) )
=(STEM 'SENTIMENTALITY) =(SENTIMENTALITY NIL SENTIMENT (AL ITY) )
=(STEM 'CIVILIZE) =(CIVILIZE NIL CIVIL (IZE) )
=(STEM 'PENNILESS) =(PENNILESS NIL PENNY (LESS) )
=(STEM 'RESTLESS) =(RESTLESS NIL REST (LESS) )
=(STEM 'CHILDLIKE) =(CHILDLIKE NIL CHILD (LIKE) )
=(STEM 'ARGUMENT) =(ARGUMENT NIL ARGUE (MENT) )
=(STEM 'SHIPMENT) =(SHIPMENT NIL SHIPMENT NIL)
=(STEM 'DRUNKENNESS) =(DRUNKENNESS NIL DRUNK (EN NESS) )
=(STEM 'GOODNESS) =(GOODNESS NIL GOOD (NESS) )
=(STEM 'WICKEDNESS) =(WICKEDNESS NIL WICKED (NESS) )
=(STEM 'NERVOUS) =(NERVOUS NIL NERVE (OUS) )
=(STEM 'ASLEEP) =(ASLEEP A SLEEP NIL)
=(STEM 'ANTEROOM) =(ANTEROOM ANTE ROOM NIL)
=(STEM 'ANTICHRIST) =(ANTICHRIST ANTI CHRIST NIL)
=(STEM 'ARCHBISHOP) =(ARCHBISHOP ARCH BISHOP NIL)
=(STEM 'AUTOBIOGRAPHY) =(AUTOBIOGRAPHY AUTO BIOGRAPHY NIL)
=(STEM 'BEMOAN) =(BEMOAN BE MOAN NIL)
=(STEM 'BIANNUAL) =(BIANNUAL BI ANNUAL NIL)
=(STEM 'COUNTERACT) =(COUNTERACT COUNTER ACT NIL)

Listing 1. Output from morphological analysis

(Listing 1-continued)
=(STEM 'DECODE) =(DECODE DE CODE NIL)
=(STEM 'ENDANGER) =(ENDANGER EN DANGER NIL)
=(STEM 'EMBED) =(EMBED NIL EMBED NIL)
=(STEM 'HYPERACTIVE) =(HYPERACTIVE HYPER ACT (IVE) )
=(STEM 'IMMORAL) =(IMMORAL IM MORAL NIL)
=(STEM 'INTERRELATIONSHIP) =(INTERRELATIONSHIP INTER RELATE (ATION SHIP) )
=(STEM 'MISCONDUCT) =(MISCONDUCT MIS CONDUCT NIL)
=(STEM 'NONSTOP) =(NONSTOP NON STOP NIL)
=(STEM 'POSTWAR) =(POSTWAR POST WAR NIL)
=(STEM 'RECONSIDER) =(RECONSIDER RE CONSIDER NIL)
=(STEM 'SUBAWARENESS) =(SUBAWARENESS SUB AWARE (NESS))
=(STEM 'SUPERMARKET) =(SUPERMARKET SUPER MARKET NIL)
=(STEM 'ULTRACONSERVATION) =(ULTRACONSERVATION ULTRA CONSERVE (ATION) )
=(STEM 'UNNECESSARY) =(UNNECESSARY UN NECESSARY NIL)
=(STEM 'UNREST) =(UNREST UN REST NIL)
=(STEM 'COEDUCATION) =(COEDUCATION CO EDUCATE (ATION) )
=(STEM 'COOPERATIONAL) =(COOPERATIONAL CO OPERATE (ATION AL) )
=(STEM 'DEHUMANIZE) =(DEHUMANIZE DE HUMAN (IZE) )
=(STEM 'INEQUALITY) =(INEQUALITY IN EQUAL (ITY) )
=(STEM 'REELIGIBILITY) =(REELIGIBILITY RE ELIGIBLE (ITY) )
=(STEM 'LOUDLY) =(LOUDLY NIL LOUD (LY) )
=(STEM 'LENGTHWAYS) =(LENGTHWAYS NIL LENGTH (WAY S) )
=(STEM 'HOMEWARDS) =(HOMEWARDS NIL HOME (WARD S) )
=(STEM 'NONLOUDLY) =(NONLOUDLY NON LOUD (LY) )
=(STEM 'BEER) =(BEER NIL BEER NIL)
=(STEM 'MURDER) =(MURDER NIL MURDER NIL)
=(STEM 'OTHER) =(OTHER NIL OTHER NIL)
=(STEM 'ARABESQUE) =(ARABESQUE NIL ARAB (ESQUE) )
=(STEM 'REALIZE) =(REALIZE NIL REAL (IZE) )
=(STEM 'GROTESQUE) =(GROTESQUE NIL GROTESQUE NIL)
=(STEM 'NONAGENARIAN) =(NONAGENARIAN NIL NONAGENARIAN NIL)
=(STEM 'NONALIGNMENT) =(NONALIGNMENT NON ALIGN (MENT) )
=(MTS)
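The behaviour recorded in Listing 1 can be approximated, for illustration, by a small affix stripper. The Python sketch below is not the MACLISP STEM routine (whose algorithm is given in Cercone, 1974); the tiny ROOTS lexicon, the PREFIXES and SUFFIXES tables, and the spelling repairs in spelling_candidates are assumptions chosen so that a few entries of Listing 1 come out the same. It follows the strategy described in Section 2: try the unprefixed word first, strip at most one prefix, and remove suffixes recursively until a stored root is exposed, returning the four-element result with None in place of NIL.

```python
# Hypothetical sketch of a STEM-like analyzer; NOT the original MACLISP program.
# ROOTS, PREFIXES, SUFFIXES and the spelling repairs are small illustrative assumptions.

ROOTS = {"interest", "bath", "bathe", "bash", "knife", "ply", "rot", "dance",
         "patron", "rely", "moral", "sleep", "relate", "embed", "bed"}
PREFIXES = sorted(["a", "de", "em", "im", "in", "inter", "re", "un"], key=len, reverse=True)
SUFFIXES = ["ation", "ship", "able", "ing", "ize", "al", "ly", "ed", "s"]
VOWELS = set("aeiou")


def spelling_candidates(base):
    """Yield plausible root spellings once a suffix has been cut off (assumed rules)."""
    yield base                                   # interest + ing
    yield base + "e"                             # danc + ed   -> dance
    yield base + "ate"                           # rel + ation -> relate
    if base.endswith("e"):
        yield base[:-1]                          # bashe + s   -> bash
    if base.endswith("i"):
        yield base[:-1] + "y"                    # pli + ed    -> ply
    if len(base) > 2 and base[-1] == base[-2] and base[-1] not in VOWELS:
        yield base[:-1]                          # rott + ing  -> rot
    if base.endswith("ve"):
        yield base[:-2] + "fe"                   # knive + s   -> knife
        yield base[:-2] + "f"


def strip_suffixes(word):
    """Return (root, suffixes) if suffix removal reaches a stored root, else None."""
    if word in ROOTS:
        return word, []
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 2:
            for candidate in spelling_candidates(word[:-len(suffix)]):
                if len(candidate) >= len(word):
                    continue                     # only shorter forms, so recursion terminates
                found = strip_suffixes(candidate)  # recurse: derivational suffixes pile up
                if found:
                    root, inner = found
                    return root, inner + [suffix]
    return None


def stem(word):
    """Return (word, prefix, root, suffixes) in the spirit of STEM; None stands for NIL."""
    w = word.lower()
    # Try the unprefixed word first, so stored words such as "embed" are not split.
    attempts = [(None, w)] + [(p, w[len(p):]) for p in PREFIXES
                              if w.startswith(p) and len(w) - len(p) >= 2]
    for prefix, rest in attempts:
        found = strip_suffixes(rest)
        if found:
            root, suffixes = found
            return (word.upper(),
                    prefix.upper() if prefix else None,
                    root.upper(),
                    [s.upper() for s in suffixes] or None)
    return (word.upper(), None, word.upper(), None)  # unanalyzable: keep word as its own root
```

For example, stem("PATRONIZINGLY") returns ("PATRONIZINGLY", None, "PATRON", ["IZE", "ING", "LY"]), and because "embed" is stored as a root, stem("EMBED") returns ("EMBED", None, "EMBED", None) rather than splitting off "em-", much as in Listing 1. Requiring every candidate to be shorter than the word it came from is one simple way to guarantee that the recursion terminates.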

Lexical categories are properties associated with lexical items used in parsing. Through categories, the representation of an appropriate lexical item can be selected from the lexicon. Categories are identified with classes (or parts of speech) rather than expressions of a particular syntagmatic relation among items in an utterance. Generally, classes are separated into open and closed classes. Characteristically, closed classes have a strictly limited membership which cannot be increased by adding new formations with loanwords (words which have been incorporated by one language from another language). The significance of closed-class items is best expressed by their grammatical function. In contrast, open classes have a large, flexibly increasing membership. The meaning of open class words is best expressed through synonyms. The difference between the classes represents a mixture of criteria, both statistical (the number of forms in a class) and diachronic (concern with the way in which language changes over time).

The lexical categories for English shown in Table 1 adapt categories used by Woods et al. (1972) and Winograd (1972). The lexical items used in this research have been classified according to class and feature (features shown in Tables 2 and 3). Some of the decisions made in this classification were arbitrary, especially those pertaining to whether a word or group of words should form a new class or be given a new feature within a class. However, the classification scheme is used to aid, not constrain, parsing; detailed concern about this type of arbitrariness is unwarranted. Please note that the problem of the use of coordinators, which can link words, phrases, clauses, or sentences, has not been addressed. Any adequate solution to this problem might entail substantial changes in the interpretation or assignment of closed categories (and syntactic features).

The syntactic features which can be attached to the various lexical items are shown in Table 2 (open category) and Table 3 (closed category) and explained in detail below. These features are necessary to ensure formal agreement in person, number, or tense between two or more lexical items, or parts of sentences.

Most of these terms are used in their ordinary sense. The special labels are as follows: PERS

# R NEW:MACLISP
# 22:39.27
= (RESTORE 'CHKPT)
= NIL

= (CLASS 'DRINK)
= (N (NS ((0 0) (/*DRINK2)) ((0 0) (/*DRINK4))) A (PRES
  ((0 0) (/*DRINK1 P1 P2)) ((0 0) (/*DRINK1A P1 P2))
  ((0 0) (/*DRINK3 P1 P2))))

= (CLASS 'DRINKS)
= (N (NP ((0 0) (/*DRINK2)) ((0 0) (/*DRINK4))) A (PRES
  TPS ((0 0) (/*DRINK1 P1 P2)) ((0 0) (/*DRINK1A P1 P2))
  ((0 0) (/*DRINK3 P1 P2))))

= (CLASS 'DRINKER)
= (N (PERS ((0 0) (/*DRINK5))))

= (CLASS 'DRINKERS)
= (N (NP PERS ((0 0) (/*DRINK5))))

= (CLASS 'DRINKING)
= (N (NS ((0 0) (/*DRINK6))) A (PART ((0 0) (/*DRINK1 P1 P2))
  ((0 0) (/*DRINK1A P1 P2)) ((0 0) (/*DRINK3 P1 P2))) NM (ADJ CLASF
  ((0 0) (/*DRINK2 P1 P2)) ((0 0) (/*DRINK4 P1 P2))
  ((0 0) (/*DRINK5 P1 P2)) ((0 0) (/*DRINK6 P1 P2))))

= (CLASS 'DRINKINGS)
= (N (NP NS ((0 0) (/*DRINK6))))

= (CLASS 'DRINKABLE)
= (N (NS ((0 0) (/*DRINK2)) ((0 0) (/*DRINK4))) NM (ADJ CLASF
  ((0 0) (/*DRINK2 P1 P2)) ((0 0) (/*DRINK4 P1 P2))
  ((0 0) (/*DRINK5 P1 P2)) ((0 0) (/*DRINK6 P1 P2))))

= (CLASS 'DRINKABLES)
= (N (NP NS ((0 0) (/*DRINK2)) ((0 0) (/*DRINK4))))

= (CLASS 'DRINKWISE)
= (AM (AT3 ((0 0) (/*DRINK1 KIND)) ((0 0) (/*DRINK1A KIND))))

= (CLASS 'DRINKLIKE)
= (NM (ADJ ((0 0) (/*DRINK2 P1 P2)) ((0 0) (/*DRINK4 P1 P2))
  ((0 0) (/*DRINK5 P1 P2)) ((0 0) (/*DRINK6 P1 P2))))

= (CLASS 'DRINKETTE)
= (N (DIM ((0 0) (/*DRINK2)) ((0 0) (/*DRINK4))))

= (CLASS 'DRINKETTES)
= (N (NP DIM ((0 0) (/*DRINK2)) ((0 0) (/*DRINK4))))

= (CLASS 'DRINKIE)
= (N (DIM ((0 0) (/*DRINK2)) ((0 0) (/*DRINK4))))

= (MTS)

Listing 2. Relevant selection from lexical entry
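The selection that CLASS performs in Listing 2 can be pictured as a lookup from a word's suffix sequence to the categories, features and senses that its morphology licenses. The Python sketch below is a hypothetical simplification: the READINGS table, the toy peel function (which knows only the endings -ing, -er and -s), and the sense groupings are assumptions modelled on the output in Listing 2, not the lexical entry of Figure A.2 or the CLASS routine of Appendix B. It omits the (0 0) fields and the P1/P2 markers, and it reduces the features of "drinkings" to NP.

```python
# Hypothetical sketch of CLASS-style selection for forms of "drink"; not the Appendix B routine.
# Sense names echo Listing 2; SENSES and READINGS are a simplified, assumed encoding.

SENSES = {
    "noun": ["*DRINK2", "*DRINK4"],                            # "a drink"
    "verb": ["*DRINK1", "*DRINK1A", "*DRINK3"],                # "John drinks"
    "agent": ["*DRINK5"],                                      # "drinker"
    "gerund": ["*DRINK6"],                                     # "drinking" as a noun
    "modifier": ["*DRINK2", "*DRINK4", "*DRINK5", "*DRINK6"],  # NM readings
}

# Suffix sequence -> list of (category, features, sense group).
READINGS = {
    (): [("N", ("NS",), "noun"), ("A", ("PRES",), "verb")],
    ("s",): [("N", ("NP",), "noun"), ("A", ("PRES", "TPS"), "verb")],
    ("er",): [("N", ("PERS",), "agent")],
    ("er", "s"): [("N", ("NP", "PERS"), "agent")],
    ("ing",): [("N", ("NS",), "gerund"), ("A", ("PART",), "verb"),
               ("NM", ("ADJ", "CLASF"), "modifier")],
    ("ing", "s"): [("N", ("NP",), "gerund")],
}


def peel(word, root="drink", suffixes=("ing", "er", "s")):
    """Split a form of ROOT into its suffix sequence, or return None."""
    if not word.startswith(root):
        return None
    rest, found = word[len(root):], []
    while rest:
        for suffix in suffixes:
            if rest.startswith(suffix):
                found.append(suffix)
                rest = rest[len(suffix):]
                break
        else:
            return None                      # an ending this toy table does not know
    return tuple(found)


def class_of(word):
    """Return only the readings licensed by the word's morphology."""
    suffixes = peel(word.lower())
    if suffixes is None:
        return []
    return [(cat, feats, SENSES[group])
            for cat, feats, group in READINGS.get(suffixes, [])]
```

With this table, class_of("drinking") yields an N, an A and an NM reading, in line with the three-way ambiguity of "drinking" described in Section 2, while class_of("drinkings") yields only a noun reading.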

OPEN CATEGORIES

N ...... nominal, typically either a noun (man, airplane, city) or a proper noun (John, Canada)
A ...... action, typically a verb (walk, throw, fly)
NM ..... nominal modifier, typically an adjective (tall, happy)
AM ..... action modifier, typically an adverb (quickly, suddenly)

CLOSED CATEGORIES

CONJ = conjunction (and, or, but)
BIND = binder (before, if)
PREP = preposition (to, for, over)
PRO = pronoun (I, you, they)
DET = determiner (the, a, those)
ORD = ordinal (first, second, last, final)
NEG = negative (not)
COMP = comparative (more, less, greater)
OP = operation (plus, times)
QWORD = question nominal (who, what, why)
QNTFR = quantifier (some, any, none)
PRT = particle (knock "out")
NUM = number (one, two, three)
INTJ = interjection (oh)

Table 1. Lexical categories

indicates a personal nominal (e.g., employee); DIM indicates a diminutive (e.g., booklet). POSS indicates possession, as in "John's." The time features differ according to whether the nominal is a time word (e.g., day, year: TIME) or indicates a relative time (e.g., yesterday: FTIME).

The feature AUX indicates an auxiliary (i.e., a verb form used in forming the tenses, moods, and voices of other verbs). Included in the auxiliaries are the features BE, DO, HAVE, WILL, and MODAL,3 which help determine constituents of action phrases.

Classifiers may also be nominals, as in Winograd's (1972) example, "water meter cover adjustment screw." The type features attached to AM's are specific to adverbial modifiers. In part, adverbial modifiers are based on Zadeh's (1972) work, especially the adverbial modifiers with features "AT1" and "AT2." Features classify adverbs according to how they operate in an utterance. The feature AT1 is attached to adverbs that act on single fuzzy sets, as in "John was VERY decent,"4 where "very" raises the criteria of all aspects that contribute to decency. The feature AT2 applies when the adverb operates on a predicate in the following way: in "John was ESSENTIALLY decent," "essentially" accentuates those aspects of decency which are most crucial to its possession and de-emphasizes those features which are less crucial. The features AT11 and AT22 are similar to AT1 and AT2, but they indicate more context dependence; the effect of AM's with these features is partially determined by their proximity to the verb they modify, for example the adverb "slightly" as well as some sentence adverbials. The feature AT3 applies to predicate-limiting adverbs such as "emotionally" and "healthwise." Manner adverbs, e.g., quickly, quietly, etc., have the feature AT4 attached. Whenever a word acts as an adverb of degree it is given the feature AT5, as in "I was DEAD tired." Finally, AT6 is the applicable feature for modal adverbs, such as "certainly" and "possibly."

3.2 Meaning Representations for Word Concepts.
Associated with open-class category words are meaning representations: one for each sense of the word. The structure of a meaning representation is

OPEN CATEGORIES

N's
NS ............. singular         TIME ........... time
NP ............. plural           FTIME .......... functional time
COLL ........... collective       PERS ........... personal
POSS ........... possessive       DIM ............ diminutive

A's
AUX ............ auxiliary        BE ............. be
WILL ........... future           DO ............. do
HAVE ........... have             MODAL .......... modality
TRANS .......... transitive       ITRNS .......... intransitive
PART ........... participle       IREG ........... irregular
PRES ........... present          PAST ........... past
INF ............ infinitive       TPS ............ 3rd person

NM's
ADJ ............ adjective
COM ............ comparative
SUP ............ superlative
CLASF .......... classifier

AM's
AT1 ............ adverb type one
AT11 ........... adverb type one one
AT2 ............ adverb type two
AT22 ........... adverb type two two
AT3 ............ adverb type three
AT4 ............ adverb type four
AT5 ............ adverb type five
AT6 ............ adverb type six
AAA ............ adverb modifying another adverb or adjective
AA ............. adverb modifying an action
AP ............. adverb modifying a preposition or prepositional phrase
ADT ............ adverb specifying definite time
AIT ............ adverb specifying indefinite time
AL ............. adverb specifying location
AJ ............. adverbial adjunct

Table 2. Syntactic features (open categories)
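The AT1 feature in Table 2 draws on Zadeh's (1972) treatment of hedges, in which "very" acts as a concentration operator that squares each membership grade of a fuzzy set. Squaring lowers every grade strictly between 0 and 1, which is one way to read the remark that "very" raises the criteria for decency. The Python sketch below applies that operator to an invented fuzzy set DECENT; the names, the grades, and the 0.5 threshold used to read off crisp members are assumptions for illustration only.

```python
# Illustrative sketch of Zadeh's concentration operator for an AT1 hedge such as "very".
# The fuzzy set DECENT and its membership grades are invented for this example.

DECENT = {"John": 0.9, "Mary": 0.6, "Bill": 0.3}


def very(fuzzy_set):
    """Concentration: square every membership grade (grades strictly between 0 and 1 shrink)."""
    return {x: mu ** 2 for x, mu in fuzzy_set.items()}


def members(fuzzy_set, threshold=0.5):
    """Crisp reading: the elements whose grade reaches the threshold."""
    return sorted(x for x, mu in fuzzy_set.items() if mu >= threshold)


print(members(DECENT))        # ['John', 'Mary']
print(members(very(DECENT)))  # ['John']
```

Mary counts as decent at the 0.5 threshold but not as very decent, since 0.6 squared is 0.36.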

CONJ BIND PREP NUM
ORD OP PRT NEG
COMP QWORD INTJ

PRO's
NP ............. plural           COLL ........... collective
NS ............. singular         POSS ........... possessive
REL ............ relative         PERS ........... personal
DEF ............ definite
DEM ............ demonstrative
INDEF .......... indefinite       SUB ............ subject
OBJ ............ object

DET's
DEF ............ definite         INDEF .......... indefinite
NP ............. plural           COLL ........... collective
NS ............. singular         DEM ............ demonstrative
QDET ........... question

QNTFR's
NE ............. negative         NS ............. singular
NONUM .......... no number        NP ............. plural
COLL ........... collective

Table 3. Syntactic features (closed categories)

based on the semantic network notation developed by Schubert (1974). Pragmatic and semantic information is included in a meaning representation for words.

Figures 2 through 7 show networks that illustrate some of the main senses of the word "drink," concentrating on its action aspects. For illustrative purposes, Figures 2, 4, and 7 are divided into a pragmatic section and a semantic section. The pragmatic section includes the template(s) that guide the parse of the utterance and two lists: the first contains propositions that represent the implications likely to be needed for the comprehension of subsequent text; the second contains propositions representing critical implications that we expect to match in the surface structure. In Figure 2 the first list is (P3) and the second list is (P1,P2). The semantic section contains the network that represents the meaning of the word sense. Figures 3, 5, and 6 show various nominal senses of the word "drink."

Notice that Figures 2, 4, and 7 all have the notion of "change in containment location" in common. This corresponds to a "general concept" that subsumes not only differing senses of "drink," but also other, more specific concepts, like "eating" or "receiving an enema." This observation has led to the following consideration.

When creating the meaning representations (as extended semantic networks) for concepts, it is desirable to avoid the duplication of propositions in storage. If we extract more general concepts from the specific concepts that they subsume (totally or in part), we can avoid duplication by associating the common propositions with the more general concept. In a sense, the work of both Schank (1972) and Wilks (1973) supports the contention that the meaning of a concept is best represented by predications at the highest level of generality that adequately explain the term's meaning. Thus we extract from "drinking" (and "eating," etc.) the structure shown in Figure 8. We might reasonably label the concept expressed by this structure "ingest." It is important to note, however, that while Schank and Wilks might conclude that "ingesting" is a primitive action, I consider it a general concept. This applies to all the primitive actions put forward by Schank and Wilks. Examination of

[Figure 2 diagram: drink1 network. PRAGMATICS: (P3), (P1,P2). SEMANTICS: P1 anim x; P2 liquid y; P3 in; P4 mouth-of; P5 < ? >; P6 cause; P7 moving; P8 then; P9 location; P10 then; P11 cause; DIR]

Figure 2.
"(John) 'drinks' (water)"
"(Mary) 'drinks' (prune juice)"

Figure 8 shows clearly that ingesting is "not a primitive action" but one whose meaning is expressed in terms of causes, motion, time, and other concepts. At this point the original representations for the various action senses of "drink," i.e., Figures 2, 4, and 7, can be replaced with simpler diagrams based on the general concept "ingest" (Figure 8). In similar fashion, Figure 10 diagrams one meaning of "eating," again based on the general concept "ingest."

The key to effective use of the meaning representation for comprehension centers on developing propositions with arguments that we expect to match in the surface utterance. The lexical item for "drink" would contain, among other things, pointers to a list of the arguments that we expect to match with words in the text and that are most frequently needed for comprehension. At times, however, other propositions may be required for comprehension. The word sense illustrated in Figure 2 shows that we expect, in an utterance about drinking, an anim(x) and a liquid(y), propositions P1 and P2. But the question can be posed, "What is the effect of John's drinking?" Answering this question entails a further investigation of other propositions in the network, especially the first list of implications. Although it is implicit in the semantic structure, we make explicit in the pragmatic structure the inference that "x - drink - y" necessarily implies that it causes y's location to be "in" x at some time after x initiates the drinking action. Of course, since this implication is common to all senses of "drink" (and eats, inhales, etc.) it is

[Diagram: node "drinker".]

Figure 3.
"(John is a) 'drinker'"


[Diagram for drink1a, divided into PRAGMATICS and SEMANTICS: P1 anim x, P2 liquor y, P3 in, P4 mouth-of, P5 ?, P6 cause, P7 moving, P8 then, P9 location, P10 then, P11 cause, P12 cause, P13 before, P14 inebriated.]

Figure 4.
"(John) 'drinks' (whiskey)"
"(John) 'drinks'"
"(Mary has a) 'drinking' (problem)"
"(Mary) 'drinks' (a lot)"

[Diagrams with nodes labeled "body-of-water", "drink2", and "x - liquor".]

Figure 5.
"(Throw John into the) 'drink'"

Figure 6.
"(John is drinking a) 'drink'"


[Diagram for drink3, divided into PRAGMATICS and SEMANTICS: P1 inanim x, P2 liquid y, P3 in, P4 thru-part, P5 ?, P6 excessive amount, P7 moving, P8 then, P9 location, P10 then, P11 cause, P12 cause.]

Figure 7.
"(My car) 'drinks' (gasoline)"
"(The donut) 'drinks' (coffee)"

[Diagram for the general concept "ingest", divided into PRAGMATICS and SEMANTICS: x (WHO), y (WHAT), THRU, DIR; P3 in, P4 thru-part, P5 ?, P7 moving, P8 then, P9 location, P10 then, P11 cause.]

Figure 8.
"ingest"


[Diagram: drink1 expressed through "ingest", divided into PRAGMATICS and SEMANTICS: anim x (WHO), mouth-of (THRU), liquid (WHAT), (P1,P2).]

Figure 9.
"(John) 'drinks' (water)"

[Diagram: eat1 expressed through "ingest", divided into PRAGMATICS and SEMANTICS: anim (WHO), mouth-of (THRU), food y (WHAT), (P1,P2).]

Figure 10.
"(John) 'eats' (cake)"

[Diagram, apparently Figure 11 (caption not preserved): nodes PRED, A, B, C, anim, P1, P4, x, and mouth-of.]


abstracted into part of the representation of a general concept as well, as shown in Figure 8. The semantic structure for "d[rink]" [...] as properties attached to each [...]. The properties include ARGS, the arguments in the word sense; IMPL, the implications; the formulas [...] propositions P1, P2 [...] arguments to [...] meaning predicates that [...] the given word sense [...] form

arg1 arg2 ... argi WORD argi+1 ... argn.

The implications make the most commonly used inferences the same [...] concept. The propositions, for example P1 and P4 shown in Figure 2, are, in turn, represented as shown in Figure 11. See Appendix A for sample lexical entries, in particular the entry for "drink." Many advantages accrue by representing meaning in this way. First, unlike Wilks' (1973) formulas, the representation is suggestive of m[...] explicating the meaning of a word. I see no justification for (binary) lexical decomposition trees as meaning representations for words, since such trees suggest neither the type of processing required nor the propositions they encode.5

A second and major advantage is that the meaning

<lexical entry> ::= (<root> (<meaning*>))
<root> ::= root of word given meaning
<meaning> ::= <lexical category> <category value> <synonym> <antonym> <compound> <idiom> <abbrev>
<lexical category> ::= <open category> | <closed category>
<open category> ::= N | A | NM | AM
<closed category> ::= CONJ | PREP | ...
<category value> ::= (<root feature list> <word sense formula*>)*
<synonym> ::= SYN <synonym value>* | <lambda>
<antonym> ::= ANT <antonym value>* | <lambda>
<idiom> ::= ID <idiom value>* | <lambda>
<abbrev> ::= ABB <abbrev value>* | <lambda>
<compound> ::= COMPD <compound value>* | <lambda>
<synonym value> ::= a synonym for the root
<antonym value> ::= an antonym for the root
<idiom value> ::= the idiom or slang expression involving the root
<abbrev value> ::= abbreviation for the root
<compound value> ::= (<tree>*)
<tree> ::= (<word> <result> <tree>*)
<result> ::= <word> | <lambda>
<word> ::= any word
<lambda> ::=
<root feature list> ::= (<morph code> <root feature*>)*
<word sense formula> ::= (the construction of semantic units and/or concepts that express the correct word sense | a function to be applied)
<morph code> ::= -ING | -ED | ... | <lambda>
<root feature> ::= AT1 | DEF | BIND | ...

Table 4. Syntax for lexical items


# R NEW:MACLISP
# 20:23.08
(RESTORE 'CHKPT)
NIL
(FIND 'SOME OCAT)
NIL
(FIND 'DRINK OCAT)
(((N ((NIL NS) (S NP) (ABLE NS) (ETTE DIM) (IE DIM))
  (((0 0) (/*DRINK2)) ((0 0) (/*DRINK4))) ((ING NS))
  (((0 0) (/*DRINK6))) ((ER PERS) (EER PERS) (IST PERS))
  (((0 0) (/*DRINK5))) (SYN DRAFT POTATION BEVERAGE LIQUOR)
  (ID BOOZE HOOCH MOONSHINE))
 (A ((NIL PRES) (S PRES TPS) (ING PART))
  (((0 0) (/*DRINK1 P1 P2)) ((0 0) (/*DRINK1A P1 P2)) ((0 0) (/*DRINK3 P1 P2)))
  (SYN CONSUME SWALLOW IMBIBE GUZZLE TOAST) (ID SWIG SOP/-UP))
 (AM ((WISE AT3) (WAYS AT3))
  (((0 0) (/*DRINK1 KIND)) ((0 0) (/*DRINK1A KIND))))
 (NM ((ABLE ADJ CLASF) (ING ADJ CLASF) (LIKE ADJ))
  (((0 0) (/*DRINK2 P1 P2)) ((0 0) (/*DRINK4 P1 P2))
   ((0 0) (/*DRINK5 P1 P2)) ((0 0) (/*DRINK6 P1 P2))))))
(FIND 'SEXUAL OCAT)
NIL
(FIND 'SOME ADVERB)
NIL
(FIND 'DRINK ADVERB)
NIL
(FIND 'SEXUAL ADVERB)
(((AM ((LY AT3 AAA AA)) (((0 0) (/*SEXUAL1))))))
(COMBINE OCAT ADVERB '* NIL)
NIL
(FIND 'SEXUAL OCAT)
(((AM ((LY AT3 AAA AA)) (((0 0) (/*SEXUAL1))))))
(FIND 'SOME OCAT)
NIL
(FIND 'DRINK OCAT)
(((N ((NIL NS) (S NP) (ABLE NS) (ETTE DIM) (IE DIM))
  (((0 0) (/*DRINK2)) ((0 0) (/*DRINK4))) ((ING NS))
  (((0 0) (/*DRINK6))) ((ER PERS) (EER PERS) (IST PERS))
  (((0 0) (/*DRINK5))) (SYN DRAFT POTATION BEVERAGE LIQUOR)
  (ID BOOZE HOOCH MOONSHINE))
 (A ((NIL PRES) (S PRES TPS) (ING PART))
  (((0 0) (/*DRINK1 P1 P2)) ((0 0) (/*DRINK1A P1 P2)) ((0 0) (/*DRINK3 P1 P2)))
  (SYN CONSUME SWALLOW IMBIBE GUZZLE TOAST) (ID SWIG SOP/-UP))
 (AM ((WISE AT3) (WAYS AT3))
  (((0 0) (/*DRINK1 KIND)) ((0 0) (/*DRINK1A KIND))))
 (NM ((ABLE ADJ CLASF) (ING ADJ CLASF) (LIKE ADJ))
  (((0 0) (/*DRINK2 P1 P2)) ((0 0) (/*DRINK4 P1 P2))
   ((0 0) (/*DRINK5 P1 P2)) ((0 0) (/*DRINK6 P1 P2))))))
(DICTADD OCAT '(S O M E) '*
  '((PRO (INDEF NS NP)) (QNTRF (COLL NS NP NONUM))))
NIL
(FIND 'SOME OCAT)
(((PRO (INDEF NS NP)) (QNTRF (COLL NS NP NONUM))))
(MTS)

Listing 3. Lexical manipulation and maintenance


representation for lexical items [...] a word is not [...] terms of "primitives." Rather, [...] propositions that form the network [...] word's meaning can be represented [...] manner. In particular the notion [...] no more "primitive" than "dri[...]" [...] representing word meanings enh[...]ational schema for comprehension [...] of detail can be [...] the meaning [...] adding propositions to the [...] network [...]. Third, inference mechanisms, h[...] algorithms, and superimposed k[...] schemas can be incorporated using [...] as [...] for word meanings easily as i[...] information [...]sentation. Incomplete [...] be inferred, when necessary, dire[...]ing representation, in some [...] argument. This type of meaning [...] lexical items is further explained [...]. For a brief sketch of parsing (w[...] stated grammar) based on this [...] Appendix C.

4. Formal Specification of Lexical Items
Table 4 shows the grammar by which [...] entered in the dictionary. This is basically the Backus Naur Form with the addition of the Kleene [...]. [...] metalinguistic characters include [...] operator *, the form ::=, and [...] surround phrase-class names wh[...] entities. The form ::= can be read as "is of the form." The bar denotes alternation, one form or the other. And the * defines an arbitrarily repeatable (zero or more) constituent when surrounded by brackets; otherwise, it defines an arbitrarily repeatable (one or more) constituent; e.g., <a*> means zero or more a's, while a* or <a>* means one or more a's.
In Appendix A, examples of closed and open category items are shown as they exist in the lexicon (Cercone, 1975a). They were constructed according to the syntax rules shown in Table 4.

5. Lexical Manipulation and Maintenance
In order to enable the rapid retrieval of relevant lexical information, a scheme was developed that exploits the way tree structures are stored in LISP. The root form is a binary branching tree that suggests a search method similar to a binary search. Letters in the query word serve as an index to a subset of lexical entries which contain the letters in corresponding positions. For example, in "drink," the "d" would be used to locate the lexical items beginning with "d." All others would not be considered further. The letter "r" would locate all items that begin with "dr," and so on until the word is found. In this way the number of searches needed to locate a lexical item grows with the number of letters in the word and with the size of the lexicon. This is easily done in LISP.
This lexical structure was designed for a small (300-word) dictionary used with an experimental program designed to create extended semantic-network meaning-representations for various utterances. The program, with relatively few heuristics, creates network structures in about three seconds of CPU time per sentence on average (simple sentences, active voice only). This is accomplished with an "interpreter only" LISP system running under the Michigan Terminal System [MTS] on an IBM 360/67 computer. Ninety-five percent of this CPU time is devoted to non-lexical referencing operations. Experiments are being designed to gather exhaustive statistics to determine the running times of lexical manipulations (insertion, deletion, searching, etc.) for different size lexicons with various types of organizations (alphabetic, letter-frequency oriented, use-frequency oriented, etc.).
Listing 3 is a sample output which shows, in the following order, a search through dictionaries for lexical items, the merging of two dictionaries, a search for the same items in the merged version, an addition to the merged dictionary, and finally a search for the newly added lexical item. The FIND routine has two arguments: the first is the word to be found and the second specifies the dictionary to be searched.
The algorithms for manipulating and maintaining lexical items as shown in Listing 3 are given in Appendix B.

6. Conclusions
In this paper the construction of a lexicon, as well as the manipulation and maintenance of lexical items, has been explained. This lexicon has been used in an experimental program (see Cercone, 1975a) that was designed to create semantic structures from utterances for the ultimate purpose of understanding natural language. The lexicon has proved significant to the experimental program because of the ease with which lexical items can be manipulated and maintained. Since the lexicon has a uniform structure, the routines which access and manipulate lexical information are relatively simple to understand and use.


Acknowledgments

I would like to thank Rici Liknaitzky for his ideas and programming aid, especially with the [...] maintenance routines. I am indebted to Len Schubert, Jeff Sampson, and Carol Murchison for their [...] reading and suggestions. I would also like to thank the reviewers, especially Don Ross of the University of Minnesota, for their invaluable suggestions and insight. Part of this work was supported by the National Research Council of Canada, grant A4309.

Appendix A

The lexicon is organized as a general list structure composed of many similar structures. The first element is a list of all root forms beginning with the letter A, the second, those beginning with B, etc. Within each element a similar list organization is imposed. Each meaning-sense of a word contains a pointer to its corresponding meaning-representation (a proposition-based semantic network; see section [...] and Cercone, 1975b). "Drinking," "drinkable," and "drinklike" all have pointers to *DRINK2, *DRINK4, *DRINK5, and *DRINK6 as possible meanings within the utterance in which they appear. The figures below give sample entries, first from the closed category (Figure A.1) and then from the open category lexicon (Figure A.2).

(B(E(F(O(R(E(*(BIND () (*BEFORE1))
(PREP () (*BEFORE2))
(AM ((AT1)) (*BEFORE3))) ))))
(H(I(N(D(*(PREP ()(*BEHIND))) ))))
(L(O(W(*(PREP () (*BELOW1))
(AM ((AP AA)) (*BELOW2))) )))
(N(E(A(T(H(*(PREP () (*BENEATH1))
(AM ((AP AA)) (*BENEATH2))) ))))
(S(I(D(E(* (PREP () (*BESIDE))) )))))
(O(T(H(*(QNTRF ((NP COLL)) (*BOTH1))
(PRO ((INDEF)) (*BOTH2)) (AM () (*BOTH3))) )))
(U(T(* (BIND () (*BUT1)) (AM () (*BUT2))) ))
(Y(* (PREP () (*BY1)) (PRT () (*BY2))) ))
(D(O(W(N(*(PREP ()(*DOWN1)) (PRT ()(*DOWN2))))))
(E(A(C(H(* (QNTRF ((NS)) (*EACH1)))))
(PRO ((INDEF NS COLL)) (*EACH2))) )))
(I(T(H(E(R(*(QNTRF ((NS NP)) (*EITHER1))
(PRO ((INDEF NS NP)) (*EITHER2))) ))))
(G(H(T(*(QNTRF ((NP)) (*EIGHT1)) (NUM () (*EIGHT2)))
(H(*(ORD () (*EIGHTH))) )))))
(L(S(E(* (AM () (*ELSE))) )))
(V(E(R(Y(* (QNTRF ((NS)) (*EVERY)))
(O(N(E(* (PRO ((INDEF)) (*EVERYONE))) )))
(T(H(I(N(G(* (PRO ((INDEF NS)) (*EVERYTHING)))))))))))
(X(C(E(P(T(*(PREP () (*EXCEPT)) (CONJ () (*EXCEPT2)))))))))
(F(E(W(* (QNTRF ((NONUM NP COLL)) (*FEW)))
(E(R(*(QNTRF ((NONUM NP COLL)) (*FEW))) ))))
(I(F(T(H(*(ORD ()(*FIFTH))) )))
(R(S(T(*(ORD () (*FIRST))) )))
(V(E(* (QNTRF ((NP)) (*FIVE1)) (NUM () (*FIVE2))))))
(O(R(* (PREP () (*FOR1)) (CONJ () (*FOR2))))
(U(R(*(QNTRF ((NP)) (*FOUR1)) (NUM () (*FOUR2)))
(T(H(*(ORD () (*FOURTH))))))))
(R(O(M(*(PREP () (*FROM))) ))))

Figure A.1. Closed category lexical entries


(D(R(A(N(K(*(A ((NIL IREG PAST))
(((0 0) (*DRINK1 P1 P2))
((0 0) (*DRINK1A P1 P2))
((0 0) (*DRINK3 P1 P2))) ) ))))
(I(N(K(*(N ((NIL NS)(S NP)(ABLE NS)(ETTE DIM)(IE DIM))
(((0 0) (*DRINK2))
((0 0) (*DRINK4)))
((ING NS))
(((0 0) (*DRINK6)))
((ER PERS)(EER PERS)(IST PERS))
(((0 0) (*DRINK5)))
(SYN DRAFT POTATION BEVERAGE LIQUOR)
(ID BOOZE HOOCH MOONSHINE) )
(A ((NIL PRES)(S PRES TPS)(ING PART))
(((0 0) (*DRINK1 P1 P2))
((0 0) (*DRINK1A P1 P2))
((0 0) (*DRINK3 P1 P2)))
(SYN CONSUME SWALLOW IMBIBE GUZZLE TOAST)
(ID SWIG SOP-UP) )
(AM ((WISE AT3)(WAYS AT3))
(((0 0) (*DRINK1 KIND))
((0 0) (*DRINK1A KIND))) )
(NM ((ABLE ADJ CLASF)(ING ADJ CLASF)(LIKE ADJ))
(((0 0) (*DRINK2 P1 P2))
((0 0) (*DRINK4 P1 P2))
((0 0) (*DRINK5 P1 P2))
((0 0) (*DRINK6 P1 P2))) ) ))))
(U(N(K(*(N ((NIL NS)(S NP)) (((0 0) (*DRUNK1))))
(A ((NIL IREG PART))
(((0 0) (*DRINK1 P1 P2))
((0 0) (*DRINK1A P1 P2))
((0 0) (*DRINK3 P1 P2))) ) ))))))
(E(A(T(*(N ((S NP)(IE DIM))
(((0 0) (*EAT3)))
((ER PERS)(EER PERS))
(((0 0) (*EAT3)))
(SYN FOOD)
(ID MUNCHIES GRUB FULLERS GRUMBLIES))
(A ((NIL PRES)(S PRES TPS)(ING PART))
(((0 0) (*EAT1 P1 P2))
((0 0) (*EAT2 P1 P2)))
(SYN CONSUME DEVOUR FEED FARE ERODE WEAR)
(ID GOBBLE) )
(NM ((ABLE ADJ CLASF)(ING ADJ CLASF))
(((0 0) (*EAT3))) ) )
(E(N(* (A ((NIL IREG PART))
(((0 0) (*EAT1 P1 P2))
((0 0) (*EAT2 P1 P2))) ) )))))

Figure A.2. Open category lexical entries

Appendix B

Algorithms for the routines shown in Listing 2 and Listing 3 are presented, each as a brief description followed by its LISP code.6
The CLASS routine has one argument, the word to be classified. If the word does not appear as a lexical entry, the message "I do not know the word" appears, followed by the word. If the word appears in the closed-category lexicon, the entry's meaning, as it appears in CLOSCAT, is returned. Otherwise the relevant portions of the entry in the open category lexicon (OCAT) are extracted and returned (using the routine FINDER), based on morphology.


(DEFPROP CLASS
 (LAMBDA (WORD)
  (PROG (W S A SL LEX)
        (SETQ W (STEM WORD))
        (RETURN
         (COND
          ((NOT W) (PRINT '(I DO NOT KNOW THE WORD)) (PRINT WORD))
          ((SETQ A (CHK (EXPLODC (CADDR W)) CLOSCAT)) (CAAR A))
          (T (SETQ A (CAR (SETQ SL (REVERSE (CADDDR W)))))
             (SETQ LEX (CAR (CHK (EXPLODC (CADDR W)) OCAT)))
             (COND
              (SL (COND
                   ((NOT (EQ A 'S))
                    (MAPCAN '(LAMBDA (X) (FINDER X A)) LEX))
                   ((CDR SL)
                    (SETQ A (FINDER (CAR LEX) (CADR SL)))
                    (LIST (CAR A) (CONS 'NP (CAR (CDR A)))))
                   (T (MAPCAN '(LAMBDA (X) (FINDER X A)) LEX))))
              (T (MAPCAN '(LAMBDA (X) (FINDER X NIL)) LEX))))))))
 EXPR)

(DEFPROP FINDER
 (LAMBDA (CAT SUF)
  (PROG (ANS X)
        (DO I (CDR CAT) (CDDR I)
            (OR (NULL I) (EQ 'SYN (CAAR I)))
            (COND ((SETQ X (DO J (CAR I) (CDR J)
                               (OR (NULL J) (EQ SUF (CAAR J)))))
                   (SETQ ANS (APPEND (APPEND (CDAR X) (CADR I)) ANS)))))
        (RETURN (COND (ANS (LIST (CAR CAT) ANS))))))
 EXPR)
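FINDER's walk is easier to follow on concrete data. The following is a minimal sketch in Python (not the paper's MACLISP): the noun portion of the "drink" entry from Figure A.2 is transcribed as alternating root-feature lists and word-sense lists, with the empty string standing for NIL (no suffix) and each word sense formula reduced to its sense name. The names `finder` and `DRINK_N` are illustrative assumptions.

```python
from typing import Optional

# Noun (N) portion of the "drink" entry in Figure A.2: root-feature lists alternate
# with word-sense lists, followed by the SYN and ID lists. "" stands for NIL.
DRINK_N = [
    "N",
    [("", "NS"), ("S", "NP"), ("ABLE", "NS"), ("ETTE", "DIM"), ("IE", "DIM")],
    ["*DRINK2", "*DRINK4"],
    [("ING", "NS")],
    ["*DRINK6"],
    [("ER", "PERS"), ("EER", "PERS"), ("IST", "PERS")],
    ["*DRINK5"],
    ["SYN", "DRAFT", "POTATION", "BEVERAGE", "LIQUOR"],
    ["ID", "BOOZE", "HOOCH", "MOONSHINE"],
]


def finder(category: list, suffix: str) -> Optional[tuple]:
    """Collect the features and senses that `suffix` selects within one category."""
    name, body = category[0], category[1:]
    answer: list = []
    # Walk (root-feature list, word-sense list) pairs until the SYN/ID section.
    for i in range(0, len(body) - 1, 2):
        features, senses = body[i], body[i + 1]
        if features[0] in ("SYN", "ID"):
            break
        for suf, *feats in features:
            if suf == suffix:
                # Like FINDER, prepend each match: features first, then senses.
                answer = feats + senses + answer
                break
    return (name, answer) if answer else None


print(finder(DRINK_N, "ING"))  # ('N', ['NS', '*DRINK6'])
print(finder(DRINK_N, "LY"))   # None
```

With the empty suffix, `finder(DRINK_N, "")` yields the bare-root reading `('N', ['NS', '*DRINK2', '*DRINK4'])`.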

The FIND routine has two arguments: the first is the word and the second is the dictionary in which to
search. The searching algorithm has been described in Section 5.

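(DEFPROP FIND
 (LAMBDA (W D)
  ;; W is the word as a list of letters (cf. EXPLODC in CLASS) and D a list of
  ;; letter nodes; each step to the next letter is made through CHK.
  (PROG ()
        (COND
         (D (COND
             (W (DO J D (CDR J) (NULL J)
                    (COND ((EQ (CAR W) (CAAR J))
                           (RETURN (CHK (CDR W) (CDAR J)))))))
             ((EQ (CAAR D) '*) (RETURN (LIST (CDAR D))))
             (T (RETURN NIL)))))))
 EXPR)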
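The search described in Section 5 can be sketched in Python as below, assuming nested dicts in place of the nested letter lists and the key "*" in place of the flag that ends a root. The names `find` and `LEX` and the sense lists are illustrative, not taken from the original program. Each letter selects one branch, so roots that do not share the prefix are never examined; in this dict version each step is a single hash lookup, whereas the LISP routine scans the sibling nodes at each level.

```python
from typing import Optional

FLAG = "*"  # ends a root and starts its meaning field, as in the lexicon

# Hypothetical fragment in the spirit of Figure A.1: nested dicts replace the
# nested letter lists, and meanings are reduced to sense names.
LEX = {
    "B": {
        "U": {"T": {FLAG: ["*BUT1", "*BUT2"]}},
        "Y": {FLAG: ["*BY1", "*BY2"]},
    },
}


def find(word: str, lexicon: dict) -> Optional[list]:
    """Follow `word` letter by letter; return its meaning field, or None."""
    node = lexicon
    for letter in word.upper():
        node = node.get(letter)
        if node is None:       # no stored root continues with this letter
            return None
    return node.get(FLAG)      # None when `word` is only a prefix of other roots


print(find("by", LEX))   # ['*BY1', '*BY2']
print(find("bu", LEX))   # None (a prefix, not a root)
```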

To add items to an existing lexicon, the routines DICTADD, ADD, and MUNG are used. DICTADD has four arguments. The first specifies the lexicon to which the second argument is to be added. The third specifies the flag, i.e., the character to be inserted after the letters of the lexical entry to designate the end of the root and the beginning of the meaning field; the fourth argument specifies the meaning field. ADD is invoked from DICTADD and does the actual addition by searching for the proper position and invoking MUNG to ready the item for the addition. MUNG is invoked from ADD to build, from the letters not already in the lexicon, the nested chain of letter nodes ending in the flag and the meaning field; this chain becomes the single new lexicon entry.


(DEFPROP DICTADD
(LAMBDA (DICT WORD FLAG PROP)
(PROG (TDICT)
(SETQ TDICT (CONS NIL DICT))
(ADD TDICT WORD FLAG PROP)
(RETURN (CDR TDICT))))
EXPR)

(DEFPROP ADD
 (LAMBDA (DICT WORD FLAG PROP)
  (COND ((NULL WORD)
         (COND ((EQ FLAG (CAADR DICT))
                (NCONC (CADR DICT) PROP))
               ;; CONS (not LIST) gives the new meaning node the same
               ;; (FLAG . PROP) shape that MUNG builds
               ((RPLACD DICT (CONS (CONS FLAG PROP)
                                   (CDR DICT))))))
        ((PROG NIL
               (DO J (CDR DICT) (CDR J) (NULL J)
                   (COND ((EQ (CAR WORD) (CAAR J))
                          (RETURN (ADD (CAR J) (CDR WORD) FLAG PROP)))))))
        ((NCONC DICT (LIST (MUNG WORD FLAG PROP))))))
 EXPR)

(DEFPROP MUNG
(LAMBDA (WORD FLAG PROP)
(COND ((NULL WORD) (CONS FLAG PROP))
((CONS (CAR WORD)
(LIST (MUNG (CDR WORD) FLAG PROP))))))
EXPR)
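The effect of DICTADD, ADD, and MUNG together can be sketched in Python, assuming the same nested-dict stand-in for the letter lists (key "*" for the flag). `dict_add` and the entry `"*SO1"` are hypothetical; the meanings for SOME are those added in Listing 3. One simplification: in a dict the position of the flag key does not matter, whereas ADD places the meaning node first among its siblings so that FIND can test for it directly.

```python
FLAG = "*"


def dict_add(lexicon: dict, word: str, meanings: list) -> dict:
    """Insert `word` with `meanings`, reusing any letter nodes already present."""
    node = lexicon
    for letter in word.upper():
        # Reuse an existing branch (ADD's search) or create a new node (MUNG's job).
        node = node.setdefault(letter, {})
    # Append to an existing meaning field, or start one (ADD's two cases).
    node.setdefault(FLAG, []).extend(meanings)
    return lexicon


lex: dict = {}
dict_add(lex, "some", [("PRO", ("INDEF", "NS", "NP")),
                       ("QNTRF", ("COLL", "NS", "NP", "NONUM"))])
dict_add(lex, "so", [("AM", (), "*SO1")])   # hypothetical entry sharing "S", "O"
print(lex["S"]["O"].keys())                 # dict_keys(['M', '*'])
```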

To combine lexicons and merge items with the same root into one lexicon, the COMBINE routine is used. COMBINE has four arguments; the first two specify the old and new dictionaries. The new dictionary is merged into the old one, so the first argument names the combined dictionary; the new dictionary is not destroyed and may still be used. The third argument is the flag (as in DICTADD). The fourth argument, given as NIL, is used internally by the recursion to accumulate the letters of the root currently being traversed. COMBINE uses the ADD routine.

(DEFPROP COMBINE
 (LAMBDA (ODICT NDICT FLAG SOFAR)
  (MAPC
   '(LAMBDA (X)
     (COND
      ((EQ (CAR X) FLAG)
       (ADD (CONS () ODICT) (REVERSE SOFAR) FLAG (CDR X)))
      ((COMBINE ODICT (CDR X) FLAG (CONS (CAR X) SOFAR)))))
   NDICT))
 EXPR)
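A Python sketch of the merge shown in Listing 3, where (COMBINE OCAT ADVERB '* NIL) makes SEXUAL retrievable from OCAT. It assumes the nested-dict stand-in for the letter lists; `combine` and the sample meaning lists are hypothetical. As in COMBINE, the recursion carries the letters seen so far (SOFAR), and each complete root found in the new lexicon is added to the old one.

```python
FLAG = "*"


def combine(old: dict, new: dict) -> dict:
    """Merge every root stored in `new` into `old` (in place); `new` is not modified."""

    def walk(node: dict, sofar: str) -> None:
        for key, child in node.items():
            if key == FLAG:
                # A complete root: add its meanings under the same letters in `old`.
                target = old
                for letter in sofar:
                    target = target.setdefault(letter, {})
                target.setdefault(FLAG, []).extend(child)
            else:
                walk(child, sofar + key)   # like SOFAR, accumulate the letters

    walk(new, "")
    return old


ocat = {"D": {"R": {"I": {"N": {"K": {FLAG: ["N", "A", "AM", "NM"]}}}}}}
adverb = {"S": {"E": {"X": {"U": {"A": {"L": {FLAG: ["AM"]}}}}}}}
combine(ocat, adverb)
print(sorted(ocat))   # ['D', 'S']
```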

Although not appearing in any of the above listings, the routines DICL, DICLIST, JUSTN, and JUSTNAMES can be used to list lexicons neatly. DICLIST and JUSTN have one argument, the name of a dictionary. The former lists the contents of the dictionary exactly as they appear in storage, while the latter gives just the root forms of the lexical items. DICLIST invokes DICL, and JUSTN invokes JUSTNAMES, to print the listing.

(DEFPROP DICLIST (LAMBDA (DIC) (DICL DIC '* NIL)) EXPR)

(DEFPROP JUSTN (LAMBDA (DIC) (JUSTNAMES DIC '* NIL)) EXPR)

(DEFPROP DICL
 (LAMBDA (NDICT FLAG SOFAR)
  (MAPC '(LAMBDA (X)
          (COND ((EQ (CAR X) FLAG)
                 (PRINT (LIST (REVERSE SOFAR) FLAG (CDR X))))
                ((DICL (CDR X) FLAG (CONS (CAR X) SOFAR)))))
        NDICT))
 EXPR)

(DEFPROP JUSTNAMES
(LAMBDA (NDICT FLAG SOFAR)
(MAPC '(LAMBDA (X)
(COND ((EQ (CAR X) FLAG)
(PRINT (IMPLODE (CAR (LIST
(REVERSE SOFAR) FLAG (CDR X))))))
((JUSTNAMES (CDR X) FLAG (CONS
(CAR X)
SOFAR)))))
NDICT))
EXPR)
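The two listing routines can be sketched in Python under the same nested-dict assumption: one generator yields each root with its meaning field in storage order (the role of DICL), and a second keeps only the root forms (the role of JUSTNAMES). `list_entries`, `just_names`, and `LEX` are illustrative names.

```python
from typing import Iterator

FLAG = "*"


def list_entries(node: dict, sofar: str = "") -> Iterator[tuple]:
    """Yield (root, meanings) pairs in storage order, like DICL."""
    for key, child in node.items():
        if key == FLAG:
            yield sofar, child
        else:
            yield from list_entries(child, sofar + key)


def just_names(lexicon: dict) -> list:
    """Return only the root forms, like JUSTNAMES."""
    return [root for root, _ in list_entries(lexicon)]


LEX = {"B": {"U": {"T": {FLAG: ["*BUT1", "*BUT2"]}},
             "Y": {FLAG: ["*BY1", "*BY2"]}}}
for root, meanings in list_entries(LEX):
    print(root, FLAG, meanings)
# BUT * ['*BUT1', '*BUT2']
# BY * ['*BY1', '*BY2']
```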

Appendix C

The following discussion explains how English is parsed in an experimental program that uses the lexical
structure. A semantic structure expressing a particular utterance is formed according to simple structural rules.
The central role of verbs is acknowledged and preferred semantic categories for the subjects and objects of
verbs guide each choice in the creation of meaning structures. Word sense disambiguation for verbs, modifiers,
and nominals follows naturally in this approach, vide Cercone (1975a). Extensive trial and error searches are
eliminated since the interpretation takes on a "slot and filler" character. The approach to interpretation is
almost completely semantically oriented and syntax is used only when meaning-analysis fails.

Initial Classification
Initially the text is read (either in discourse mode or from an external file for longer text) and broken into
clauses (at present this process is very unsophisticated). Each clause is then "classified" in the following
manner. Words are morphologically analyzed and, based on that analysis, are classified to determine all of their
possible syntactic functions. For example, the form "drinks" of the root word "drink" can only be used
nominally or as an action. The root form is located in the lexicon and using affix information from the
morphological analysis, all of the possibilities for the word are extracted. When all words in the clause are
classified, the next phase, parsing, begins.
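The classification step can be sketched as follows, assuming a naive stemmer that simply peels off whatever follows a known root (the paper's STEM routine, described in Cercone (1974), is not reproduced here). The suffix entries for DRINK are flattened from Figure A.2; `LEXICON` and `classify` are hypothetical names. The output agrees with the text: "drinks" can only be read nominally (N) or as an action (A).

```python
# Suffix-feature entries per category for the root DRINK, flattened from Figure A.2
# ("" stands for NIL, i.e., the bare root). Irregular forms such as DRANK are omitted.
LEXICON = {
    "DRINK": {
        "N": [("", "NS"), ("S", "NP"), ("ABLE", "NS"), ("ETTE", "DIM"), ("IE", "DIM"),
              ("ING", "NS"), ("ER", "PERS"), ("EER", "PERS"), ("IST", "PERS")],
        "A": [("", "PRES"), ("S", "PRES", "TPS"), ("ING", "PART")],
        "AM": [("WISE", "AT3"), ("WAYS", "AT3")],
        "NM": [("ABLE", "ADJ", "CLASF"), ("ING", "ADJ", "CLASF"), ("LIKE", "ADJ")],
    }
}


def classify(word: str) -> list:
    """Return every (category, features) reading that the lexicon licenses for `word`."""
    word = word.upper()
    readings = []
    for root, categories in LEXICON.items():
        if not word.startswith(root):
            continue
        suffix = word[len(root):]   # naive stemming: whatever follows the root
        for category, entries in categories.items():
            for suf, *features in entries:
                if suf == suffix:
                    readings.append((category, features))
    return readings


print(classify("drinks"))   # [('N', ['NP']), ('A', ['PRES', 'TPS'])]
print(classify("drinker"))  # [('N', ['PERS'])]
```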

Parsing
Traditionally, the purpose of parsing sentences has been to output syntactic trees. These trees served as
input to semantic routines charged with the generation of meaning structures. Winograd (1972) and Woods
(1970) tried, with some degree of success, to integrate the two processes and have each guide the other.
Schank (1972) and Wilks (1973) have stressed that syntactic processing was secondary to meaning analysis and
should be necessary only when the resolution of ambiguity by meaning analysis alone had failed. Their parsing
phase is almost completely semantically oriented. One important by-product of the method to be described is
the detection of the correct "sense" of nominals and actions and, although not yet implemented, of modifiers as
well (I am restricting utterances to active voice).
The parsing proceeds as follows. Words in a classified clause are scanned from left to right in search of a
suitable candidate for an action. Once found, the sentence is separated into
((FIRST PART) (ACTION CANDIDATE) (SECOND PART)).
The action candidate contains, among other things, a list of possible action "senses" that this particular root
form may have. These senses are ordered by a scheme, albeit a very superficial scheme, to be described later.
Associated with word senses are templates as described in Cercone (1975a). For example the sense *GIVE1 of
the root form "give" has a template


X GIVE Y Z

and an alternative (ALTERN) template

X GIVE Z TO Y.

The template is used to guide the parsing. In this example X, Y, and Z are variables representing the arguments of the predicate "give" that we expect to find in the surface utterance, in the given order. If an argument is not present in the utterance, the implication template can be used to infer arguments. More detailed information concerning the arguments is obtained by examining the network propositions for the sense of "give" in question, those which involve the arguments. Thus X would represent an [...] nominal capable of "giving."
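Template-guided matching for *GIVE1 can be sketched in Python as below. The two templates come from the text; everything else is an assumption for illustration: the semantic classes in `CLASSES`, the class constraints on X, Y, and Z, the names `match` and `match_sense`, and the premise that words have already been reduced to root forms during classification.

```python
from typing import Optional

# Hypothetical semantic classes for a few nominals (not from the paper's lexicon).
CLASSES = {"JOHN": {"anim"}, "MARY": {"anim"}, "BOOK": {"physobj"}}

# Sense *GIVE1 with its template and alternative (ALTERN) template from the text;
# the class constraints on X, Y, and Z are assumptions for illustration.
GIVE1 = {
    "templates": [["X", "GIVE", "Y", "Z"], ["X", "GIVE", "Z", "TO", "Y"]],
    "constraints": {"X": "anim", "Y": "anim", "Z": "physobj"},
}


def match(template: list, words: list, constraints: dict) -> Optional[dict]:
    """Bind template variables to words in order; None if any slot fails."""
    if len(template) != len(words):
        return None
    bindings = {}
    for slot, word in zip(template, words):
        if slot in constraints:                      # a variable: check its class
            if constraints[slot] not in CLASSES.get(word, set()):
                return None
            bindings[slot] = word
        elif slot != word:                           # a literal such as GIVE or TO
            return None
    return bindings


def match_sense(sense: dict, words: list) -> Optional[dict]:
    """Try the main template first, then the alternative template."""
    for template in sense["templates"]:
        bindings = match(template, words, sense["constraints"])
        if bindings is not None:
            return bindings
    return None


print(match_sense(GIVE1, ["JOHN", "GIVE", "BOOK", "TO", "MARY"]))
# {'X': 'JOHN', 'Z': 'BOOK', 'Y': 'MARY'}
```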
This is similar to what Schank does when parsing in conceptual dependency theory. If the words in the surface utterance do not satisfy the constraints for arguments, one of four reasons is likely. First, [...] syntactic constructions could exist. Second, a different "sense" of the action is "correct." Third, the action-candidate is not the valid action of the clause. Finally, some other reason, like a slang expression or a metaphor, might be the cause.
Whenever arguments fail to satisfy a predicate, a search for alternative implication templates begins. If this fails, then the list of senses for the root form is further examined. If other senses of the action candidate exist, they are examined further to see if arguments in the surface utterance match variables in the template. This procedure is repeated until the correct sense of the action candidate is found or the list of senses is exhausted. If the sense list is exhausted, scanning continues in the surface clause for another suitable action candidate and the process is repeated.
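The control loop just described can be sketched in Python: scan left to right for an action candidate, split the clause into first part, candidate, and second part, try the candidate's senses in their stored order, and resume scanning if none fits. The matchers stand in for template matching with a single template per sense (alternative templates would be tried within the sense loop). `NOMINALS`, `ACTIONS`, `subject_object`, and `parse_clause` are hypothetical; the two senses loosely follow Figures 2 (anim x, liquid y) and 7 (inanim x, liquid y).

```python
from typing import Callable, Optional

# Hypothetical semantic classes for nominals (not from the paper's lexicon).
NOMINALS = {"JOHN": "anim", "CAR": "inanim", "WATER": "liquid", "GASOLINE": "liquid"}


def subject_object(x_class: str, y_class: str) -> Callable[[list, list], Optional[dict]]:
    """Build a matcher for an 'x ACTION y' template with class constraints on x and y."""
    def matcher(first: list, second: list) -> Optional[dict]:
        if (len(first) == 1 and len(second) == 1
                and NOMINALS.get(first[0]) == x_class
                and NOMINALS.get(second[0]) == y_class):
            return {"x": first[0], "y": second[0]}
        return None
    return matcher


# Action senses in the order they are tried, after Figures 2 and 7.
ACTIONS = {"DRINKS": [("*DRINK1", subject_object("anim", "liquid")),
                      ("*DRINK3", subject_object("inanim", "liquid"))]}


def parse_clause(words: list) -> Optional[tuple]:
    """Scan for an action candidate whose sense fits; return (index, sense, bindings)."""
    for i, word in enumerate(words):
        for sense, matcher in ACTIONS.get(word, []):   # senses in stored order
            bindings = matcher(words[:i], words[i + 1:])
            if bindings is not None:
                return i, sense, bindings
        # no sense fits (or not an action): keep scanning for another candidate
    return None


print(parse_clause(["CAR", "DRINKS", "GASOLINE"]))
# (1, '*DRINK3', {'x': 'CAR', 'y': 'GASOLINE'})
```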
The matching of predicates' arguments in surface text to variables in implication templates includes finding the correct sense of nominals and modifiers as well. The sentence "A drinker drinks many drinks" has as the second argument of the predicate "drinks" the word "drinks." Possible nominal senses for that word include an alcoholic beverage, a body of water (throw John into the drink), or a thirst quencher. If the first sense of a nominal fails as an argument, all other senses must be examined before deciding not to accept it as an argument. This reasoning applies with respect to modifiers in a similar but not identical fashion. For example, a "yellow cake" is a type of cake much like a chocolate cake, whereas a "yellow car" is something that is yellow and something that is a car. Using these methods, sentences such as "A 'drinker' 'drinks' many 'drinks'" and "The pilot 'banked' his plane near the river 'bank' over the 'bank' that he 'banks' on [...] 'banking' service" present little difficulty.
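The rule that every sense of a nominal is tried before the word is rejected as an argument can be sketched in a few lines of Python. The three readings of "drinks" are the ones named in the text; the class labels attached to them, and the names `NOMINAL_SENSES` and `accept_argument`, are assumptions for illustration.

```python
from typing import Optional

# Hypothetical nominal senses of "drinks", following the three readings in the text;
# the class labels are assumptions.
NOMINAL_SENSES = {
    "DRINKS": [("alcoholic beverage", {"liquor", "liquid"}),
               ("body of water", {"body-of-water"}),
               ("thirst quencher", {"liquid"})],
}


def accept_argument(word: str, required: str) -> Optional[str]:
    """Return the first sense of `word` whose classes include `required`.

    Every sense is tried before the word is rejected as an argument.
    """
    for sense, classes in NOMINAL_SENSES.get(word, []):
        if required in classes:
            return sense
    return None


print(accept_argument("DRINKS", "body-of-water"))   # body of water
```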
Morphological analysis is important since only those forms that can authentically be considered as actions need be examined. In the example "A drinker drinks many drinks," morphological analysis rules out "drinker" immediately as an action candidate. Thus, we are quickly able to arrive at the right choice.
Both Schank and Wilks used their intuition to set up their respective meaning representations. The way they defined and used semantic "primitives" is one example. One way in which my intuition has [...] the experimental program can be shown with the following superficial scheme for choosing word senses.

"bank"

(g1,l1) (g2,l2) (g3,l3) (g4,l4)
(16,0) (92,0) (47,0) (12,0)

Associated with each sense of a word are gi's and li's [...] usage of the ith meaning sense of the word. Whenever [...] first examined to see if any context has been established [...] example, so no context has been established; then the [...] second sense is selected as the most likely candidate. If [...] selected in that order. Suppose that the third sense turns out to be the correct one, and, whenever the term "bank" is encountered [...]


frequency count will be incremented by one (if it is the correct sense). This would continue until the third sense fails to be correct. At this point we would examine the second, first, and fourth senses until we find the correct meaning sense (i.e., the ith term). The l3 is added to g3, li is set to one (nonzero), l3 is set to zero, and the ith meaning sense is selected whenever the term "bank" is encountered.
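The bookkeeping above can be sketched as a small Python class. This is one reading of a passage that is partly lost in the scan: it assumes that gi accumulates the long-run count for sense i, that a nonzero li marks the sense established by the current context, and that senses with equal gi keep their original order. `SenseChooser`, `candidates`, and `confirm` are hypothetical names, and sense indices are 0-based.

```python
class SenseChooser:
    """Order word senses by a global count g and a local (context) count l.

    Reconstructed reading of the "bank" scheme: a nonzero l[i] marks sense i as
    the one established by the current context.
    """

    def __init__(self, g: list) -> None:
        self.g = list(g)
        self.l = [0] * len(g)

    def candidates(self) -> list:
        """Sense indices in the order they should be tried."""
        context = [i for i, count in enumerate(self.l) if count > 0]
        by_g = sorted(range(len(self.g)), key=lambda i: -self.g[i])
        return context + [i for i in by_g if i not in context]

    def confirm(self, i: int) -> None:
        """Record that sense i was the correct one."""
        context = [j for j, count in enumerate(self.l) if count > 0]
        if context == [i]:
            self.l[i] += 1            # the context sense was right again
            return
        for j in context:             # context broken: fold local counts into g
            self.g[j] += self.l[j]
            self.l[j] = 0
        self.l[i] = 1                 # sense i now defines the context


bank = SenseChooser([16, 92, 47, 12])   # (g1..g4) for "bank", 0-based below
print(bank.candidates())                # [1, 2, 0, 3]: second, third, first, fourth
bank.confirm(2)                         # the third sense turns out to be correct
print(bank.candidates())                # [2, 1, 0, 3]
```

If the third sense is later wrong and the first sense proves correct, `confirm(0)` adds l3 to g3, clears l3, and sets l1 to one, so the first sense is tried first from then on.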
The list of modifiers found in the clause, further classified as to function and associated with the predicate arguments they modify, is also given as part of the parsing phase.
Once the parsing phase has been completed, the meaning representation is built for the clause and the structure is integrated into the semantic network, vide Schubert (1974). The first step involves creating an intermediate structure based only on the predicate of the clause and its arguments. After this structure is created, it may be altered to accommodate other information detected in the parsing phase. This information includes mainly modifiers (only some adjectival modifiers are now analyzed, however; [...] quantificational modifiers are planned).

NOTES American Journal of Computational Linguistics,


Volume 2, AJCL Microfiche 34, 64-81.
Cercone, N. (1974). "Computer Analysis of English Word
1. A word with either a different syntactic, class from the
original, a different meaning, or both. Formation," Technical Report TR74-6, Department
2. Suffixes may "pile-up" to about three or four in number of Computing Science, University of Alberta,
(e.g., normalizers) whereas prefixes are normally single. Edmonton, Alberta.
When suffixes do "pile-up," their order is fixed and weCercone, N., and L. K. Schubert (1974). "Toward a
can take advantage of this fact. State-Based Conceptual Representation," Department
3. They include auxiliaries of periphrasis, which assist in of Computing Science, TR74-19, University of
expressing the interrogative, negative, and emphatic forms Alberta, Edmonton, Alberta. (Also IJCAI-IV, pp.
of speech, viz. "do" ("did"); auxiliaries of tense, "have," 83-90.)
"be," "shall," "will"; of mood, "may," "should," Davidson, D., and G. Harman, eds. (1972). SEMANTICS OF
"would"; of voice, "be"; of predication (i.e., verbs of incomplete predication which require a verbal complement), "can," "must," "ought," "need," also "shall," "will," "may," when not auxiliaries of tense or mood. (OED s.v. Auxiliary, B. Sb., 3.)
4. The type features have all been placed under the category AM because of the nature of action-modifying adverbs; however, as type one adverbs show, this is not always the case.
5. Much of Wilks' representation of meaning in formulas is based on the lexical decomposition trees developed by Lakoff (1972). Those representations have their foundations in the "generative semantics" school of thought. The argument over the "correct" theory of grammar between advocates of transformational syntax on the one hand and generative semanticists on the other continues. An excellent critique of both avenues is presented in Bartsch and Vennemann (1972), pages 6-28. Much of the material they review has been reprinted in Davidson and Harman (1972); see especially Parsons, Montague, and Lakoff.
6. Excluded from this Appendix is the STEM routine, which has been described in Cercone (1974).

REFERENCES

Bartsch, R., and T. Vennemann (1972). SEMANTIC STRUCTURES, Athenäum Verlag, Frankfurt, Germany.
Cercone, N. (1975a). "Representing Natural Language in Extended Semantic Networks," Department of Computing Science, TR75-11, University of Alberta, Edmonton, Alberta.
Cercone, N. (1975b). "The Nature and Computational Use of a Meaning Representation for Word Concepts,"

NATURAL LANGUAGE, D. Reidel Publishing Company, Boston, Massachusetts.
Heny, F. W. (1973). "Sentence and Predicate Modifiers in English," in SYNTAX AND SEMANTICS, vol. 2, J. Kimball ed., Seminar Press, New York, New York, 217-245.
Lakoff, G. (1973). "Hedges: A Study in Meaning Criteria and the Logic of Fuzzy Concepts," Journal of Philosophical Logic, v 2, 458-508.
Lakoff, G. (1972). "Linguistics and Natural Logic," in SEMANTICS OF NATURAL LANGUAGE, D. Davidson and G. Harman eds., D. Reidel Publishing Company, Boston, Massachusetts.
Montague, R. (1972). "Pragmatics and Intensional Logic," in SEMANTICS OF NATURAL LANGUAGE, D. Davidson and G. Harman eds., D. Reidel Publishing Company, Boston, Massachusetts.
Moon, D. (1974). "MACLISP Reference Manual," Project MAC-M.I.T., Cambridge, Massachusetts.
Parsons, T. (1972). "Some Problems Concerning the Logic of Grammatical Modifiers," in SEMANTICS OF NATURAL LANGUAGE, D. Davidson and G. Harman eds., D. Reidel Publishing Company, Boston, Massachusetts, 127-141.
Quillian, M. (1968). "Semantic Memory," in SEMANTIC INFORMATION PROCESSING, M. Minsky ed., MIT Press, Cambridge, Massachusetts, 227-270.
Quine, W. (1960). WORD AND OBJECT, MIT Press, Cambridge, Massachusetts.
Russell, B. (1923). "Vagueness," Australasian Journal of Philosophy, 1, 84-92.
Schank, R. (1974). "Adverbs of Belief," Lingua, v 33, North Holland Publishing Company, 45-67.
Schank, R. (1972). "Conceptual Dependency: A Theory of Natural Language Understanding," Cognitive
Psychology, v 3, 552-631.
Schubert, L. (1974). "On the Expressive Power of Semantic Networks," TR74-18, Department of Computing Science, University of Alberta, Edmonton, Alberta. (Also AI, 7, 2, 163-198.)
Wilks, Y. (1973). "Preference Semantics," Stanford AI Project, Memo AIM-206, Stanford University, Stanford, California.
Winograd, T. (1972). UNDERSTANDING NATURAL LANGUAGE, Academic Press, New York, New York.
Woods, W., R. Kaplan, and B. Nash-Webber (1972). "The Lunar Sciences Natural Language Information System: Final Report," Bolt Beranek and Newman Inc., Cambridge, Massachusetts.
Zadeh, L. (1972). "A Fuzzy-Set-Theoretic Interpretation of Linguistic Hedges," Journal of Cybernetics, v 2, n 3, 4-34.

The following IBM Reports are available on request to IBM Corporation, Armonk, N.Y.

"Data Entry of Chinese and Kanji Characters" no. 5249, edited by E. F. Yhap. A keystroke system with 37 keys and upper
and lower shift is proposed for entering Chinese and Kanji characters into computer systems. This data entry method
has been applied to the 881 Kanji characters which the Japanese Ministry of Education prescribes as a minimum
requirement for the six elementary grades in Japanese schools. The resulting average for this set of 881 characters is
just under 4.2 keystrokes per character, so reasonably high rates of character input (60 cpm or better) are expected
to be achievable. Other advantages claimed (but not yet tested) for this data entry method are ease of operator
training and lack of operator mental fatigue.

"An Organization for a Dictionary of Senses" no. 5548, edited by Dick H. Fredericksen. This paper describes a
lexical organization in which "senses" are represented in their own right, along with "words" and "phrases,"
by distinct data items. The objective of the scheme is to facilitate recognition and employment of synonyms
and stock phrases by programs which process natural language. Besides presenting the proposed organization,
the paper characterizes the lexical "senses" which result.
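The abstract gives no implementation details, so the following is only a minimal Python sketch of the general idea, assuming a simple in-memory layout; the class and method names (`Sense`, `SenseDictionary`, `link`, `synonyms`) and the sample entries are hypothetical and are not taken from the report. Senses get their own entries, words and stock phrases point to them, and synonyms then fall out as expressions that share a sense.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Sense:
    """A lexical sense, stored as a data item in its own right."""
    sense_id: str
    gloss: str


@dataclass
class SenseDictionary:
    """Hypothetical lexicon in which words, phrases, and senses are distinct items."""
    # sense_id -> Sense
    senses: dict = field(default_factory=dict)
    # word or stock phrase -> set of sense_ids it can express
    expressions: defaultdict = field(default_factory=lambda: defaultdict(set))
    # sense_id -> set of words or stock phrases that express it
    realizations: defaultdict = field(default_factory=lambda: defaultdict(set))

    def add_sense(self, sense_id: str, gloss: str) -> None:
        self.senses[sense_id] = Sense(sense_id, gloss)

    def link(self, expression: str, sense_id: str) -> None:
        """Record that a word or stock phrase can express the given sense."""
        if sense_id not in self.senses:
            raise KeyError(f"unknown sense: {sense_id}")
        self.expressions[expression].add(sense_id)
        self.realizations[sense_id].add(expression)

    def synonyms(self, expression: str) -> set:
        """Other words or phrases sharing at least one sense with `expression`."""
        result = set()
        # .get avoids creating an empty entry for unknown expressions
        for sense_id in self.expressions.get(expression, ()):
            result |= self.realizations[sense_id]
        result.discard(expression)
        return result


if __name__ == "__main__":
    lexicon = SenseDictionary()
    lexicon.add_sense("S1", "to bring an activity or motion to an end")
    lexicon.add_sense("S2", "a place where vehicles regularly halt")
    for expr in ("stop", "halt", "cease", "come to a stop"):
        lexicon.link(expr, "S1")
    for expr in ("stop", "bus stop"):
        lexicon.link(expr, "S2")
    print(sorted(lexicon.synonyms("halt")))  # ['cease', 'come to a stop', 'stop']
    print(sorted(lexicon.synonyms("stop")))  # ['bus stop', 'cease', 'come to a stop', 'halt']
```

Keeping the word-to-sense and sense-to-word indexes separate lets a stock phrase such as "come to a stop" be treated exactly like a single word, which is the kind of synonym and phrase handling the abstract describes.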

"On Natural Language Based Query Systems" no. 5577, edited by Stanley R. Petrick. Some of the arguments
which have been given both for and against the use of natural languages in question-answering (QA) systems
are discussed. Several QA systems are evaluated in assessing the current level of QA system development.
Finally, certain pervasive difficulties which have arisen in developing natural language based QA systems are
identified, and the approach which has been taken to overcome them in the REQUEST System is described.

"The Request System" no. 5604, edited by Warren J. Plath. REQUEST is an experimental Restricted English
QUESTion-answering system which is currently capable of analyzing and answering a variety of English
questions, spanning a significant range of syntactic complexity, with respect to a small Fortune-500-type data
base. The long-range objective of this work is to explore the possibility of providing non-programmers with a
convenient and powerful means of accessing information in formatted data bases without having to learn a
formal query language. In order to address the somewhat conflicting requirements of understandability for the
machine and maximum naturalness for the user, the REQUEST System employs a language processing
approach featuring: (1) the use of restricted English; (2) a two-phase, compiler-like organization; and (3)
linguistic analysis based on a transformational grammar. The present paper explores the motivation for this
approach in some detail and also describes the organization, operation, and current status of the system.
