Compound Words
Maurice Gross
University Paris 7
2, place Jussieu

The essential feature of a lexicon-grammar is that the elementary unit of computation and storage is the simple sentence: subject-verb-complement(s). This type of representation is obviously needed for verbs: limiting a verb to its shape has no meaning other than typographic, since a verb cannot be separated from its subject and essential complement(s)². We have shown (M. Gross 1975) that given a verb, or equivalently a simple sentence, the set of syntactic properties that describes its variations is unique: in general, no other verb has an identical syntactic paradigm³. As a consequence, the properties of each verbal construction must be represented in a lexicon-grammar. The lexicon has no significance taken as an isolated component, and the grammar component, viewed as independent of the lexicon, will have to be limited to certain complex sentences.

Since be-Adjective terms are close to verbs, their description is quite similar, that is, they are considered as sentences. We have applied lexicon-grammar representation not only to the two obvious predicative parts of speech, verb and adjective, but to nouns and adverbs as well. In the same way as one adjoins the verb to be to adjectives, we have systematically introduced support verbs (Vsup) for nouns and adverbs, as in the following examples (Z.S. Harris 1976, M. Gross 1982, 1986):

Vsup =: to be Prep:
The text is in contradiction with the law

Vsup =: to have
This text has a certain importance for Bob⁴

Vsup =: to occur, etc.
Accidents occur at random
The accident (was, happened, occurred, took place) late at night

Support verbs are frequent in technical texts, and may have stylistic variants, as in this last example.

Grammatical elements such as determiners, prepositions and conjunctions do not belong to the lexicon-grammar in the same sense as the four major parts of speech do, since they are parts of structures or rules. For example, prepositions appear in the columns of the lexicon-grammar.

An early representation of verbs in a lexicon-grammar of about 12,000 verbs is given in figure 1. Each row of the matrix is an entry whose main construction is defined by a table or class code. In figure 1, the code 6 corresponds to the class of constructions subject-verb-direct sentential complement, noted:

(1) N0 V que P

(N0 is the subject and P stands for sentence.)

Each column is a syntactic property, and corresponds to a structure into which V may enter, roughly a syntactic transform of the main structure. For example, in the columns we have placed the Passive, Extraposed and Pronominal forms. Thus, the related structures are semantically close. A "+" sign at the intersection of a row and a column indicates that the entry in the row is accepted in the structure associated with the column; a "-" sign corresponds to unacceptability.

The process of accumulation that led to the formalized lexicon-grammar of 12,000 French verbs has run into what seemed at first to be a minor problem of representation of words: the difference between simple and compound words. On the one hand, there are simple words such as the verb know; on the other, complex (idiomatic) forms such as keep in mind. Both forms play the same syntactic and semantic role in sentences such as:

Bob knows that Max has moved to Tampa
Bob keeps in mind that Max has moved to Tampa
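
The representational point can be illustrated with a small Python sketch, offered as a toy under explicit assumptions and not as the author's system: if lexicon entries are sequences of lemmas rather than single words, keep in mind is stored as one unit next to know, and both sentences above are mapped to entries with the same construction. The construction labels, the toy inflection table LEMMA and the function find_predicate are hypothetical; a real system would need genuine morphological analysis and would also have to handle inserted material.

```python
# Hypothetical mini-lexicon: each entry is a sequence of lemmas, so the
# compound "keep in mind" is one unit, on a par with the simple verb "know".
# The construction labels are illustrative only.
LEXICON = {
    ("know",): "N0 V that S",
    ("keep",): "N0 V N1",
    ("keep", "in", "mind"): "N0 V that S",
}

# Toy inflection table (assumption): stands in for a real morphological analyser.
LEMMA = {"knows": "know", "knew": "know", "keeps": "keep", "kept": "keep"}

LONGEST = max(len(entry) for entry in LEXICON)


def find_predicate(words):
    """Return (entry, construction) for the leftmost entry, longest entries first."""
    lemmas = [LEMMA.get(w.lower(), w.lower()) for w in words]
    for i in range(len(lemmas)):
        for n in range(LONGEST, 0, -1):
            key = tuple(lemmas[i:i + n])
            if key in LEXICON:
                return " ".join(key), LEXICON[key]
    return None


for s in ("Bob knows that Max has moved to Tampa",
          "Bob keeps in mind that Max has moved to Tampa"):
    print(find_predicate(s.split()))
# ('know', 'N0 V that S')
# ('keep in mind', 'N0 V that S')
```

Trying longer entries first is a design choice of the sketch: it prevents the simple verb keep (given a different, equally hypothetical construction) from shadowing the compound keep in mind.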

Figure 1: Part of an early lexicon-grammar of French verbs. Rows are verbs (observer, obtenir, officialiser, omettre, orchestrer, oublier, ouïr, palper, parapher, passer sous silence, penser, percevoir, perdre de vue, perforer, pérorer); columns are syntactic properties; cells are marked + or -.

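The organization of the matrix can be mimicked with a small Python sketch, under clearly stated assumptions: each entry stores its class code and one +/- value per property column. The verbs are taken from figure 1, but the column names, the class codes and every +/- value below are invented for illustration and do not reproduce the figure; Entry, COLUMNS and the helper functions are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical property columns; a real table has many more.
COLUMNS = ("Passive", "Extraposition", "Pronominal")


@dataclass(frozen=True)
class Entry:
    """One row of the matrix: a verb, its class (table) code, and its signs."""
    verb: str
    table: str                 # class code of the entry (illustrative value below)
    cells: tuple[bool, ...]    # True = "+", False = "-", one per column

    def accepts(self, prop: str) -> bool:
        return self.cells[COLUMNS.index(prop)]


# Invented rows: the verbs occur in figure 1, the signs do not come from it.
LEXICON = [
    Entry("observer", "6", (True, True, False)),
    Entry("oublier", "6", (True, False, True)),
    Entry("penser", "6", (False, True, True)),
]


def entries_accepting(prop: str, rows=LEXICON) -> list[str]:
    """Read one column: the verbs marked '+' for the given property."""
    return [e.verb for e in rows if e.accepts(prop)]


def identical_paradigms(rows=LEXICON) -> list[tuple[str, str]]:
    """Pairs of entries sharing the same class code and the same signs."""
    first_seen = {}
    clashes = []
    for e in rows:
        key = (e.table, e.cells)
        if key in first_seen:
            clashes.append((first_seen[key], e.verb))
        else:
            first_seen[key] = e.verb
    return clashes


def render(rows=LEXICON) -> str:
    """Lay the matrix out with +/- signs, one row per verb."""
    lines = ["verb       T  " + " ".join(c[:3] for c in COLUMNS)]
    for e in rows:
        signs = "   ".join("+" if v else "-" for v in e.cells)
        lines.append(f"{e.verb:<10} {e.table}  {signs}")
    return "\n".join(lines)


print(render())
print(entries_accepting("Passive"))   # ['observer', 'oublier']
print(identical_paradigms())          # []
```

Reading a column lists the verbs accepted in a given transform; comparing whole rows is one simple way to check, on a real table, the observation that in general no two verbs share an identical syntactic paradigm.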
(1) The show took place nightly
                        at night
                        during a busy night
                        the night Bob missed his plane

By compound adverbs, or frozen or idiomatic adverbs, we mean adverbs that can be separated into several words, with some or all of their words frozen, that is, semantically and/or syntactically noncompositional. In (1), at night is a compound adverb; the lack of compositionality is apparent from lexical restrictions such as:

*at day, *at afternoon, *at evening

and from the impossibility of inserting material that is a priori plausible, syntactically and semantically:

*at (coming, present) night
*at (cold, dark) night
during the (coming, present) night
during a (cold, dark) night

5. Note that words or roots are often considered as units in most attempts to devise semantic representations.

The two other adverbs of (1) are free forms. Thus, the determiner Det and the modifiers (Adj and Relclause) of:

during Det Adj night Relclause

can vary freely (within semantic constraints). In the same way, the event associated with the sentence S in the form:

the night (E, that) S

can be expressed by a large variety of unconstrained forms.

Frozen or compound adverbs constitute the simplest case of compound forms because they do not allow variations of their components. As mentioned above, in at night no adjective is authorized. Moreover, one cannot insert a determiner: *at (a, this) night; the plural is forbidden: *at nights; and no relative clause can be appended: *at night (that, which) was agreed on. Such observations are general, and apply to many adverbs of varied form and lexical content:

It rained cats and dogs
*many cats and dogs
*big cats and dogs
*cat and dog
from time to time
*from times to times
*from a time to another time
*from long time to long time

Consequently, these compound adverbs could be identified by a simple recognition procedure, for they do not require any lemmatization or syntactic analysis to be reduced to a dictionary form, as is the case, for example, with verb forms.

A lexical study of compound adverbs has been performed for French, and a systematic inventory has been compiled from various dictionaries. Running texts have been examined as well. It is interesting to note that whereas current dictionaries list about 1,500 one-word adverbs, most of them in -ment (-ly), we have found over 5,000 compound adverbs.

These compound adverbs have been classified according to their syntactic shape. The syntactic forms are described at the elementary level of sequences of parts of speech. We use symbols with obvious interpretations such as Prep, Det, Adj, N, V, Conj (for conjunction), and W for a variable ranging over verb complements, etc. We write:

Prep N =: at night
Prep Det N =: in the end
Prep Det Adj N =: in the long run
Prep Det N of Det N =: in every sense of the word
                       at the point of a gun
Prep Det N Conj Det N =: time and again
V W =: to begin with
S =: all things being equal

Figure 2 shows the classes that have been defined on this basis, together with examples and the number of items in each class.

Figure 2: Frozen Adverbs (M. Gross 1986). Classes of French compound adverbs by syntactic shape, each with an example and the number of items; rows include PC (Prep C, e.g. en bref, 1,160), PDETC (Prep Det C, e.g. contre toute attente, 570), PAC (Prep Adj C, e.g. de sa belle mort, 440), PVCO ((V) comme C, e.g. comme un cheveu sur la soupe, 210) and PJC (Conj C, e.g. et tout le tremblement, 100).

The examples discussed so far are entirely frozen. Hence, as a practical matter, they can be located in a text by using the string search function available in any text editor. There are, however, more complex examples that require deeper analysis. Consider for example the idiomatic adverb in the sentence:

Max proposed solutions from the top of his hat

It is largely frozen: no other determiner is allowed, no adjective can be appended to either noun, etc., but the person of the possessive adjective Poss may vary. This possessive adjective must refer to the subject of the sentence, and varies accordingly:

*Max proposed ideas from the top of your hat
*My sister proposed ideas from the top of his hat
Bob and Max proposed ideas from the top of their hat(s)

In this case, the recognition procedure is no longer a simple string matching operation, since a variable slot must be dealt with inside the fixed string. More general matching rules are required here⁶. Once this compound adverb has been identified in a text to be processed, it can be given an interpretation, for example in terms of a simple adverb such as leisurely or lightly, and the referential information carried by Poss can then be ignored. However, one can easily construct particular discourses where the obligatory coreference relation involved will disambiguate some analysis. Thus, not only must the variation of Poss be accounted for at the lexical level, but its referential information has to be kept for possible use in a parser.

6. PROLOG rules are particularly well adapted to recognizing such frozen forms (P. Sabatier 1980).

Other compound adverbs offer different degrees of variation. There are cases where one part of the adverb is frozen and another part is entirely free:

Max organized a party in honor of Bob
Max hid the car at the far end of the parking lot

The parts in honor and at the far end are frozen; for example, they do not allow modifiers. The parts of N are free, for we observe variations such as:

Max organized a party in his honor
Max hid the car at the far end, I think, of the parking lot

Consider the adverbials:

for the sake of ruining things
for the sake of Bob
for God's sake

We call the combination for-sake frozen, since the noun sake does not occur elsewhere than in adverbial phrases with the preposition for: it cannot be the subject or object of any verb. On the other hand, the modifiers of sake are quite varied and regular from the point of view of the syntax of noun modifiers⁷.

7. There are nonetheless restrictions on them: *for a heavenly sake.

Technical or specialized families of adverbs come close to being frozen adverbs:

(2) They elected Bob on the (first, second) ballot
(3) Max ate his noodles in a bowl

The special semantic relations that hold between the adverbial complement and the rest of the sentence are limited. There are few verbs such as to eat which combine with in a bowl and which have the non-locative interpretation of (3). The usual interpretation is that found in:

Max put his noodles in a bowl

Entering frozen adverbs into a lexicon-grammar raises many new questions. The bulk of adverbs can be described by means of the following type of derivation (Z.S. Harris 1976):

Bob left; that Bob left occurred at 9
= Bob left, this occurred at 9
= Bob left at 9

and support verbs play a crucial role here. However, there are cases where no general support verb is found and where adverbs have to be considered as a part of the elementary sentence. Consider the adverb in:

Bob sang at the top of his voice

It is syntactically and semantically analogous to free adverbs such as noisily, powerfully. For these two free adverbs, a derivational source involving the adjective is available:

The way Bob sang was (noisy, powerful)

This is not the case for at the top of his voice, which is practically limited to modifying the verbs of saying. Moreover, the obligatory coreference link of his leads to a representation where this adverb is not analyzed. Thus two semantically similar types of adverbs have to be represented quite differently in the lexicon-grammar.

All the situations just exemplified with adverbs are quite common, and are also encountered with nouns, adjectives and verbs. The paradox of representation they lead to can only be solved by introducing a complex level of semantic equivalence for the entries of the lexicon-grammar.

2. Compound nouns

In general, compound nouns allow variations of determiners and modifiers, but many situations are encountered:

- the moon is a frozen combination (definite article-noun) which behaves like a proper name, because of its unicity of reference. It cannot be modified by adjectives without losing its reference: *the (big, yellow) moon;
- crude oil takes restricted determiners. Since it is a mass noun, there are difficulties in accepting its plural. It can be modified by adjectives and nouns as in (cheap, high quality) crude oil, but these cannot modify oil: *crude (cheap, high quality) oil;
- stroke of luck has unrestricted determiners and modifiers, but no insertion is allowed immediately before or next to of; in particular, luck cannot be modified: *stroke of good luck⁸;
- board of governors can be modified in several ways: board and governor take separate determiners and modifiers: the powerful boards of the twelve governors of my bank. Such a compound noun comes close to being a free form. It is the limited number of second nouns such as director, governor or regent that suggests we are dealing with a compound noun. Also, the meaning of these phrases is noncompositional in the sense that they have a legal or institutional meaning that their components do not clearly have.

8. A stroke of bad luck would be a different compound word, whose relation to a stroke of luck is only etymological.

The variations of form we have enumerated can be partly handled by attaching a finite automaton to a given entry; this automaton describes the main grammatical changes allowed. The adjunction of free relative clauses to compound nouns may require a different treatment.

The kinds of variation of compound nouns are so numerous that determining whether a given nominal construction is a compound noun or not almost requires an original demonstration. Thus, automating the construction of a lexicon is an activity that will present severe limitations.

Determining the support verbs for compound nouns does not seem to raise problems other than those encountered with simple nouns.

REMARK

Compound nouns raise other questions in some languages:
- in German, where no blanks occur between components, segmentation is a problem;
- in French (G. Gross 1985), where the spelling of the plural is in general not standardized, extra variations have to be expected.

3. Compound verbs

(1) N0 V C1 =: Bob hit the jackpot
(2) N0 V N1 Prep C2 =: Bob took your project into account
(3) N0 V C1 Prep C2 =: Bob took the bull by the horns
(4) N0's C0 V C1 =: Bob's dream came true

We outlined in 1 the description of a lexicon-grammar of French verbs and the reasons why compound verbs had to be separated from simple ones.

Systematic search through dictionaries (monolingual, bilingual, and specialized) has yielded close to 20,000 compound verbs belonging to the same level of language as the 12,000 simple verbs. A syntactic classification has been built for them (Figure 3).

Compound verbs are the most complex forms that have to be entered into a lexicon⁹.

9. ... It was for all the world as if S, which need an extra level of complexity (L. Danlos 1985).

The compounds discussed previously were simple
because by and large they were topologically connected, that is, either their parts could not be separated by any extraneous linguistic material or else the inserted material could be easily described (i.e. by means of a finite automaton).

Figure 3: Frozen Verbs (M. Gross 1982). Classes of French compound verbs by structure, each with an example and the number of entries; examples include Il est parti sans laisser d'adresse, La moutarde monte au nez de Max, Il a mal aux cheveux and Les rieurs sont du côté de Max.

At that time, Bob will be hitting the jackpot

Sentential inserts can separate a verb from its complements:

Bob hit, it seems to me, the jackpot

In example (2), the direct complement N1 is free and general; hence, sentential structures can separate the verb from its second (frozen) complement:

Bob took the fact that Jo was absent yesterday into account

10. As a matter of fact, when an utterance is found to be ambiguous, with one analysis as a frozen form and the other as a free form, ignoring competing free forms altogether is a good parsing strategy.

4. Some Conclusions

How to organize the lexicon of compound utterances is an open question. From a computational point of view, many solutions are available for the lookup of a compound term:

... compound verbs, one would have to synthesize a matching utterance, rather than simply looking it up. Such a procedure can always be simulated sequentially.

In all cases, the representation of utterances which we have used, namely the sequences of syntactic categories, allows for the separation of the lexicon of compound forms into classes for which direct access can be provided. In this way, dictionary lookup can be sped up¹¹.

11. The same use of sequences of syntactic categories is found in a string grammar (Z.S. Harris 1961), which has proven to be quite efficient in syntactic recognition (N. Sager 1981, M. Salkoff 1973, 1979).

REMARK

In favor of left-to-right analysis, one could point to the fact that complex terms can often be abbreviated and that abbreviations are mostly right truncations. In such situations, the remaining part (the leftmost part) of the truncated term must carry the information that describes the right context, in order to allow reconstruction of the reduced part. There are, however, examples where abbreviations are carried out on the left part of a term (e.g. a programming language reduced to a language).

Preliminary figures have shown that compound terms form the essential part of a lexicon-grammar. It is also interesting to observe that they force both the linguist and the computer specialist to adopt a much more abstract view of language:

- syntactically, it has become a rather general habit to attach properties to individual words. In the case of compounds, this mode of representation is no longer possible: why privilege one part of a compound with marks rather than some other part? For example, there is no reason to attach the Passive marking to the verb rather than to either of the complements of the utterance to put the cart before the horse. Lexicon-grammar representations eliminate such questions by delocalizing the syntactic information and attaching it to the full sentence. In this sense, compound expressions provide a powerful motivation for representing lexical and syntactic phenomena in the form of a lexicon-grammar.
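
A minimal Python sketch of the lookup scheme described in the conclusions, assuming the text has already been tagged with syntactic categories: compound forms are grouped into classes keyed by their category sequence (as in Prep N =: at night), so a candidate span is compared only against the class that matches its tags. The tag names, the tiny class contents and the greedy, longest-match, left-to-right scan are illustrative choices, not the author's implementation, and variable slots such as Poss are not handled.

```python
# Hypothetical classes of compound adverbs, keyed by category sequence
# (cf. Prep N =: at night); each class is a set of word sequences.
CLASSES = {
    ("Prep", "N"): {("at", "night")},
    ("Prep", "Det", "N"): {("in", "the", "end")},
    ("Prep", "Det", "Adj", "N"): {("in", "the", "long", "run")},
}
MAX_LEN = max(len(key) for key in CLASSES)


def find_compounds(tagged):
    """Left-to-right scan of (word, tag) pairs, trying the longest span first.

    The tags are assumed to come from some earlier tagging step.
    Returns non-overlapping (start, end, form) triples.
    """
    found = []
    i = 0
    while i < len(tagged):
        for n in range(min(MAX_LEN, len(tagged) - i), 0, -1):
            span = tagged[i:i + n]
            key = tuple(tag for _, tag in span)        # category sequence
            form = tuple(w.lower() for w, _ in span)
            # Direct access: only the class matching the key is searched.
            if form in CLASSES.get(key, ()):
                found.append((i, i + n, " ".join(form)))
                i += n
                break
        else:  # no compound starts at position i
            i += 1
    return found


sentence = [("It", "Pron"), ("rained", "V"), ("at", "Prep"), ("night", "N"),
            ("in", "Prep"), ("the", "Det"), ("end", "N")]
print(find_compounds(sentence))
# [(2, 4, 'at night'), (4, 7, 'in the end')]
```

Because the key is computed from the tags, a span such as *at the night (tagged Prep Det N) is compared only with the Prep Det N class and rejected, without scanning any other class. Forms with variable slots, such as Poss in from the top of Poss hat, would need the more general matching rules mentioned earlier.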
REFERENCES
Gross, Gaston; Vivès, Robert (eds.) 1986. Syntaxe des noms. Langue française 69. Paris: Larousse, 128 p.