4.3 Vocabulary connections: multi-word items in English
Rosamund Moon
(COBUILD and the University of Birmingham)
Introduction
In looking at vocabulary, it is natural to focus on the word as the
primary unit. Dictionaries help to reinforce this by representing the
lexicon as a series of headwords or individual lexical items. But while
this is a practical approach, it may also be dangerously isolationist, as in
many respects a ‘word’ is an arbitrary unit. It is, after all, just a string of
characters, or a sequence of one or more morphemes, which is bounded
at either end by a space or by punctuation. Text studies and corpus
studies have revealed the significance and the intricacy of the links
between words: for example, their strong clustering tendencies and the
patterns which are associated with them. This chapter will consider
lexical connections between words in English, with particular emphasis
on multi-word lexical items. It will then review some of the problems
facing L2 teachers with respect to these items.
Collocation
Traditional models of language, or at least models of Western
European languages, are generally built on grammatical principles,
with the clause or sentence being the focal unit. In such models,
connections are the syntactic relationships between elements in the
clause or sentence. A sentence such as:
The bushes and trees were blowing in the wind, but the rain had
stopped. (The Bank of English: a collection of over 300 million
words of written and transcribed oral English texts which is
held at COBUILD, the University of Birmingham, and is
managed by HarperCollins Publishers)
contains 14 orthographic words and it can be analysed in a number of
different ways, according to the grammatical model in operation. For
example, it is a single sentence, consisting of two finite clauses; it
contains two noun-phrases and two verb-phrases at a primary-level
analysis in a transformational-generative model, or two noun phrases,
two verb phrases, and one adjunct or prepositional phrase in a systemic
model of grammar. The syntactic connections manifest in the sentence
enable the hearer/reader to (re)construct its meaning. In a similar vein, a
discourse-based grammatical model would draw attention to the cohesive
significance of the occurrences of the; the way in which the time
frame is indicated; and the logical connection, or in this case disjunction,
signalled by but.
In contrast to these, a collocationist model would take into account
considerations such as the predictability of the co-occurrences of words
in the slots that constitute the underlying structural frame. For example,
it might consider the statistical significance of the lexical frame
‘SOMETHING blowing in the wind’ as against ‘the wind blowing
SOMETHING’, or the significance of rain being followed by the verb stop
rather than end, or finish, or any other verb for that matter. (In fact, in
The Bank of English, the commonest verb which follows rain as subject
is fall, and stop is more typical or usual with rain than end.)
Collocation studies are now inevitably associated with corpus studies,
since it is difficult and arguably pointless to study such things except
through using large amounts of real data. Important papers on lexical
collocation are Halliday (1966), Jones and Sinclair (1974), and Sinclair
(1987 and 1991: see below). Church and Hanks (1990) outline a
statistical basis for calculating collocational significance.
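
As an illustration of the kind of calculation involved, the sketch below computes a pointwise mutual information score, log2 of P(x, y) / (P(x) P(y)), which is the general form of the association measure discussed by Church and Hanks. The toy sentences, the function name, and the use of simple adjacency rather than a wider span are assumptions made for the example; it does not describe how The Bank of English was actually analysed.

```python
import math
from collections import Counter

def pmi_scores(sentences):
    """Pointwise mutual information for adjacent word pairs (toy illustration)."""
    words = Counter()
    pairs = Counter()
    for sentence in sentences:
        tokens = sentence.lower().split()
        words.update(tokens)
        # Count each word together with its immediate right-hand neighbour.
        pairs.update(zip(tokens, tokens[1:]))
    n_words = sum(words.values())
    n_pairs = sum(pairs.values())
    scores = {}
    for (x, y), count in pairs.items():
        p_xy = count / n_pairs
        p_x = words[x] / n_words
        p_y = words[y] / n_words
        scores[(x, y)] = math.log2(p_xy / (p_x * p_y))
    return scores

# Invented sentences, for illustration only.
toy_corpus = [
    "torrential rain flooded the town",
    "the rain stopped",
    "torrential rain fell all night",
    "the wind blew all night",
]
scores = pmi_scores(toy_corpus)
```

In this toy data the pair torrential + rain scores higher than the + rain, because torrential occurs nowhere else, whereas the is spread across several contexts. With real data, very rare pairs can score misleadingly high, so a minimum frequency threshold is usually applied as well.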
Such analyses and observations are far-reaching and of great importance.
Blow and wind are not merely random co-occurrences in this
individual sentence, nor are they just inevitably collocates because they
happen to fall into the same lexical set of ‘weather’ and are topic-related.
But it is part of the meaning (in the broadest sense of the word)
of ‘wind’ that it blows and causes things to blow about, and it is part of
one of the meanings of ‘blow’ that it is what meteorological phenomena
such as the wind do (see also Firth, 1951/1957, who sees collocations as
part of a word’s meaning).
This can be seen more strikingly with the first word of the following
example:
Torrential rain burst river banks and flooded homes in the
North-East. [The Bank of English]
The adjective torrential is shown in The Bank of English to be severely
restricted in lexicogrammatical terms. Ninety-eight per cent of instances
are in premodifying position, and 99 per cent collocate with the
word rain or (much less often) a semantically related word such as
downpour or storm. It is part of the meaning of torrential that it
concerns, qualifies and categorises ‘rain’: lexical form and meaning are
inseparable here.
In his central paper on collocation, Sinclair (1987: 319-325; also
1991: 110-115) sets out two principles which account for the structural
patterning of lexis. They are opposites which are complementary and
co-exist. The open choice principle:
is a way of seeing language text as the result of a very large
number of complex choices. At each point where a unit is
completed – a word or a phrase or a clause – a large range of
choice opens up, and the only restraint is grammaticalness.
Thus the open choice principle is essentially a traditional approach to
language: compare slot-and-filler approaches, and the distinction
between syntagm (or syntax-governed structural frame) and paradigm
(or lexical/grammatical set of words available for each slot in the
frame). In contrast, the idiom principle:
is that a language user has available to him or her a large
number of semi-preconstructed phrases that constitute single
choices, even though they might appear to be analysable into
segments.
So the idiom principle restricts the choices not just in a given slot, but in
the surrounding co-textual slots.
Just as a syntactic view of language observes rules underpinning
grammatically well-formed utterances, a collocationist view of language
observes the strong patterning in the co-occurrence of words. This itself
can be seen to some extent as rule-governed and motivated (that is, it
reflects some subliminal or underlying system or process of analogy),
however prolific the rules are and difficult to codify.
Complementing this kind of approach is that of psycholinguistics,
which observes how language is processed – and often acquired – in
chunks or groups of words, rather than on a word-by-word basis. This
is explored at length in, for example, the work of Peters (1983). It has
important repercussions with respect to vocabulary learning and
teaching, since words are again and again shown not to operate as
independent and interchangeable parts of the lexicon, but as parts of a
lexical system (see Aitchison, 1987). In acquiring full knowledge of a
word like the noun wind, a learner has, of course, to acquire its
meaning(s), pronunciation and morphology; also its grammatical behaviour
(noun; countable, also after the with general/homophoric reference);
its set-relationship with hyponyms such as gale and breeze; its
collocating verbs (blow, sweep), adjectives (strong, north, cold, light,
bitter, prevailing), and partitive nouns (gust, breath, puff) in the frame
‘a ... of wind’; and so on.
Torrential rain is an example of a restricted collocation (Aisenstadt,
1979; 1981). Restricted collocations are cases where certain words
occur almost entirely in the co-text of one or two other words, or of a
narrow set of words. The adjective torrential must be learned as part of
some kind of lexical unit. There are many other similar cases. For
example, collectives such as packs of hounds/wolves/dogs, flocks of
sheep/birds/seagulls, and swarms of bees/insects, and intensifiers such as
stone, which occurs only in stone deaf/dead/cold/cold sober. The
phenomenon can be seen in irreversible binomials (Malkiel, 1959),
where strings such as hot and cold, cause and effect, and Mr and Mrs
tend to occur in a fixed order. (The motivation for the ordering is
discussed by Lakoff and Johnson (1980: 132-3); also by Carter and
McCarthy (1988: 25), who point out the extent to which the ordering is
culture-specific.) The phenomenon can also be seen in valency patterns
which govern the typical lexicalisations of the subjects or objects of
verbs. For example, PEOPLE guzzle DRINK, VEHICLES guzzle FUEL.
All this is the kind of information which is the very stuff of learners’
dictionaries.
Multi-word items: terms and categories
The remainder of this chapter will look at multi-word items, which in
many respects can be seen as extreme cases of fixed collocations. It is
essential to begin with terms. There are many different forms of multi-
word item, and the fields of lexicology and idiomatology have generated
an unruly collection of names for them, with confusing results. In the
following, I shall be using a set of fairly general terms which are
relatively well-used or understood in the Anglo and Anglo-American
traditions, in preference to more specialist terminology. Note that there
is no generally agreed set of terms, definitions and categories in use.
First, a definition of a multi-word item itself. A multi-word item is a
vocabulary item which consists of a sequence of two or more words (a
word being simply an orthographic unit). This sequence of words
semantically and/or syntactically forms a meaningful and inseparable
unit. Multi-word items are the result of lexical (and semantic) processes
of fossilisation and word-formation, rather than the results of the
operation of grammatical rules. By this token, multi-word inflectional
forms of words, for example comparative forms of adjectives or passive
forms of verbs, can be separated out and excluded from the category
since they are formed grammatically. In the following sentence:
The bushes and trees were blowing in the wind, but the rain had
stopped.
were blowing and had stopped are verb groups or verb phrases, but they
are not multi-word items.
There are three important criteria which help distinguish holistic
multi-word items from other kinds of strings. They are institutionalisation,
fixedness, and non-compositionality.
Institutionalisation is the degree to which a multi-word item is conventionalised
in the language: does it recur? Is it regularly considered
by a language community as being a unit? Pawley (1986)
discusses the process and fact of institutionalisation of, in his terms,
‘lexicalization’.
Fixedness is the degree to which a multi-word item is frozen as a
sequence of words. Does it inflect? Do its component words inflect
in predictable or regular ways? For example, they rocked the boat
and not they rock the boated or they rocked the boats. Similarly,
does the item vary in any way, perhaps in its component lexis or
word order? For example, another kettle of fish and a different
kettle of fish are alternative forms, but on the other hand is not
varied to on another hand or on a different hand.
Non-compositionality is the degree to which a multi-word item cannot
be interpreted on a word-by-word basis, but has a specialised
unitary meaning. This is typically associated with semantic
non-compositionality: for example, when someone kicks the bucket (i.e.
‘dies’) they are not actually doing anything to a receptacle with their
foot, and cats’ eyes (luminous glass beads set into the road surface
to guide drivers) in British English are not in any degree biological.
However, non-compositionality can also relate to grammar or
pragmatic function. For example, of course is non-compositional
because it is ungrammatical, and the imperative valediction Take
care! can be said to be non-compositional because of its extralinguistic
situational function or ‘pragmatic specialisation’.
These three criteria operate together – in spoken English, in conjunction
with a phonological criterion where multi-word items often form single
tone units. The criteria are not absolutes but variables, and they are
present in differing degrees in each multi-word unit.
‘Multi-word item’ is a superordinate term. Looking more closely at
the different types of multi-word item:
Compounds are the largest and most tangible group, but arguably the
least interesting. They may differ from single words only by being
written as two or more orthographic words. They cannot properly
be separated out altogether, since variable hyphenation conventions
blur the distinction between compound multi-word items and
polymorphemic single words. An orthographic example is car park
which is also spelled carpark and car-park; a morphological
example is the group sedan chair, dining-chair and armchair which
ultimately are not so very different lexically. At the same time,
Pawley (1986: 108-120) makes the important point that hyphenation
or fusion of words is a technique by which a string is
designated as a unit and therefore lexicalised. For example, wild
flower seems to be a purely compositional, transparent string:
compare garden flower, wild bird. However, the increasingly
common spelling wildflower, predominant in American English,
shows how far it is becoming lexicalised as a unit: compare
wildlife.
Many open or two-word compounds are nouns: Prime Minister,
crystal ball, collective bargaining and so on. They are very commonly
terms or titles, or refer to things in the real world.
Compound verbs are typically hyphenated, and are comparatively
few in number. Some consist of two verbs strung together –
freeze-dry, spin-dry – but others are verbal uses of compound nouns –
short-circuit, rubber-stamp. Compound adjectives are also often
hyphenated. A common pattern consists of an adjective and
participle – long-haired, brown-eyed, three-legged – or of a modifier
and superordinate adjective – navy blue, powder blue, royal
blue.
Compounds are generally fixed but their institutionalisation can
vary as widely as any other lexical items. The degree to which they
are compositional varies too. In general, compounding is an
extremely productive process in word-formation. Fuller discussion
can be found in Bauer (1983: 201-16).
Phrasal verbs are combinations of verbs and adverbial or prepositional
particles. The verbs are typically but by no means always monosyllabic,
and of Germanic origin: particularly prolific are such
verbs as come, get, go, put, and take. The commonest particles are
up and out, followed by off, in, on, and down. Many of the phrasal
verb combinations themselves are very frequent. In The Bank of
English, give up constitutes just over 5 per cent of the evidence for
the lemma give (a lemma is the set of inflected forms which
comprise a single word: in this case, give, gives, gave, giving,
given). Yet it is a common item in its own right, with a frequency of
around 60 occurrences per million words of corpus text: roughly
the same level as lemmas such as address, adopt, airline, airport
and appearance.
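
The figures quoted here are normalised rates rather than raw counts. A minimal sketch of the arithmetic follows. The corpus size and the raw counts are hypothetical, chosen only so that the results land near the figures quoted above; they are not actual counts from The Bank of English.

```python
def per_million(raw_count, corpus_size):
    """Normalise a raw frequency to occurrences per million words."""
    return raw_count / corpus_size * 1_000_000

def share_of_lemma(item_count, lemma_count):
    """Percentage of a lemma's occurrences accounted for by one item."""
    return item_count / lemma_count * 100

# Hypothetical figures, for illustration only.
corpus_size = 300_000_000      # words in the corpus
give_up_count = 18_000         # occurrences of the phrasal verb
give_lemma_count = 350_000     # occurrences of give, gives, gave, giving, given

rate = per_million(give_up_count, corpus_size)
share = share_of_lemma(give_up_count, give_lemma_count)
print(f"give up: {rate:.0f} per million, {share:.1f}% of the lemma give")
```

Normalising in this way is what allows an item such as give up to be compared directly with single-word lemmas, whatever the size of the corpus.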
Phrasal verbs are typically a phenomenon of English and a few
cognate languages such as Dutch, and they are usually considered
problematic in terms of L2 teaching and learning for a number of
reasons, not least because they are common and fixed. They have
specialised meanings: these may range in compositionality from
transparent combinations such as break off and write down,
through completives such as eat up and stretch out, where the
particle reinforces the degree of the action denoted by the verb, to
relatively opaque combinations such as butter up and tick off.
Phrasal verbs have particular syntactic problems such as the placement
of any nominal or pronominal objects with respect to the
verb. They are stylistically heterogeneous, sometimes unmarked
(give up as against relinquish, forsake, abandon, cede, yield) but
sometimes informal or jargonistic (chill out, hang out, talk up, wise
up). There are many distinctions between British and American
English. For example, the varieties have different meanings for tick
off, and in combinations with round/around, British English
prefers round and American around. Lastly, phrasal verbs are often
presented as arbitrary combinations which cannot be analysed and
rationalised.
As in other cases, the stylistic and syntactic considerations
ultimately operate at the level of the individual item. However, the
situation is more complicated when their semantics are considered.
There are in fact systems underlying combinations, and neologisms
develop by analogy and in accordance with these systems. Phrasal
verbs are motivated and not arbitrary formations. For example, off
can be combined with the verbal use of most nouns which
designate barriers: hence block off, box off, cordon off, curtain off,
fence off, wall off and so on. This kind of information is rarely
made explicit in dictionaries, although it is sometimes implied in
entries for the individual particles, or in special features such as the
Particles Index in The Collins COBUILD Dictionary of Phrasal
Verbs (1989).
Idioms are a very complex group: not least because the term ‘idiom’
frequently occurs in the literature with a variety of different mean-
ings. I shall be using it in a relatively narrow sense, to refer to
multi-word items which are not the sum of their parts: they have
holistic meanings which cannot be retrieved from the individual
meanings of the component words. Classical examples include spill
the beans, have an axe to grind, and kick the bucket. Idioms are
typically metaphorical in historical or etymological terms. The
metaphor may be relatively straightforward to decode, as in a
snake in the grass or bite off more than one can chew, or obscure,
as in kick the bucket and rain cats and dogs. Idioms rate highly in
terms of non-compositionality. With regard to institutionalisation
they are generally infrequent: the purely compositional string kick
the ball is seven times as common in The Bank of English as kick
the bucket. In terms of fixedness, they are often held to be relatively
frozen and to have severe grammatical restrictions, but it will be
pointed out later that idioms are by no means as fixed as conven-
tional accounts suggest.
Fixed phrases has been deliberately chosen here as a very general term
to cover a number of multi-word items which fall outside the
previous categories. They include items such as of course, at least,
in fact, and by far as well as greetings and phatics such as good
morning, how do you do, excuse me, and you know. Many of
these are strongly institutionalised, in that they are very high
frequency items, and many are strongly fixed. Their composition-
ality is variable in kind and degree, and may arise from the fact
that they are grammatically ill-formed or because they have
specialist and non-predictable pragmatic functions. Similes – white
as a sheet, dry as a bone – and proverbs – it never rains but it
pours, enough is enough – can also be included in this category:
these kinds of item are typically very infrequent and often unstable
in form.
Prefabs: Finally, there is another group which has recently been the
subject of some of the most innovative research in English idioma-
tology. I shall refer to them as prefabricated routines or prefabs.
They are also referred to as ‘lexicalised sentence stems’ (Pawley
and Syder 1983) or ‘ready-made (complex) units’ (Cowie 1992),
and Nattinger and DeCarrico (1989, 1992) call them ‘lexical
phrases’, although they use this as a superordinate term to encom-
pass other kinds of multi-word item. Prefabs are preconstructed
phrases, phraseological chunks, stereotyped collocations, or semi-
fixed strings which are tied to discoursal situations and which form
structuring devices. For example, the thing/fact/point is, that
reminds me, I'm a great believer in ... and so on. They are
institutionalised because they are consistently and frequently used
as particular kinds of signal or convention, but they often vary
rather than being completely frozen. Their non-compositionality
stems from their discoursal uses, since their surface meanings can
be readily decoded.
Setting out categories in this way is not just an abstract task but a way
of identifying and drawing attention to the very range of units and their
differences. There are inevitably overlaps between the categories. For
example, is what are you driving at? a form of a phrasal verb, or a
prefab? This, however, merely reflects the fact that there are few discrete
categories in the lexicon: things simply do not work like that.
The question might well be asked at this point: how many multi-word
items are there in English? To which an appropriate answer is:
how long is a piece of string? There is no canonical list of multi-word
items. The largest specialist dictionaries of English multi-word items,
The Oxford Dictionary of Phrasal Verbs (1993) and The Oxford
Dictionary of English Idioms (1993), contain some 15,000 phrasal
verbs, idioms and fixed phrases, but the total number of multi-word
items in current English is clearly much higher. There is no clear
boundary to the set of prefabs, and almost no limit to the number of
compounds which might be coined as terms. Furthermore, language
changes and vocabulary items fall in and out of use. The set of multi-word
items is effectively open-ended and is not static. However, it is
safe to say that many thousands of multi-word items are in use and
within the competence of proficient speakers of English.
Traditions and models of multi-word items
The field of idiomatology or combinatorics is one of the most heavily
explored in lexicology. In addition to work in Britain and the US, there
are rich literatures and strong traditions in German and Russian/East
European lexicology. Substantial critical reviews in English are provided
by Makkai (1972), Fernando (1978), Fernando and Flavell (1981),
Wood (1981), and Gläser (1988). Weinreich (1963) gives a rare English-language
overview of the earlier Soviet/Russian tradition. I do not
propose to look at psycholinguistic aspects here, but Cacciari and
Tabossi (1993) provide a useful collection of papers on this, which
covers the main ideas involved.
It is important to draw attention to the complexity of the subject,
although this chapter is not the place for an in-depth study of the
theoretical aspects of word combinatorics. At least some of the unruli-
ness and apparent conflicts in the literature result directly from there
being substantially different models applied to multi-word items, which
foreground different characteristics.
Semantics-based models are in many respects the most traditional. They
attempt to differentiate between categories of multi-word items according
to degrees of compositionality, and they aim to identify, as it
were, the irreducible semantic building blocks of the lexicon. Important
work here is Makkai (op. cit.) and Mitchell (1971).
In contrast, syntax-based models take grammatical well-formedness as
their starting-point. Multi-word items – and in particular idioms and
fixed phrases – are often non-compositional because they do not obey
rules. For example, kick the bucket never passivises, and by and large and
how come? are grammatically ill-formed. In these models, the structural
peculiarities of multi-word items become criterial features. Important
papers here are Katz and Postal (1963), Weinreich (1969), and Katz
(1973); also Fraser (1970), who develops a model of frozenness or
grammatical rigidity in idioms, with seven levels on a scale of idiomaticity.
Both Makkai (op. cit.) and Healey (1968) incorporate structural
properties into their analyses of multi-word items, although semantics
remains their starting-point.
Soviet/Russian traditions in combinatorics have tended to focus on
collocation and types of collocation. This leads to a strong emphasis
being placed on phraseology and usage and fits well with the sorts of
observation of the lexicon which are made by corpus linguistics. They
then build in semantic (and syntactic) criteria in order to separate out
classes of, for example, pure idioms.
Lexicography has to varying degrees been influenced by all these
ideas. Unfortunately, dictionaries perpetuate a black and white distinction
between kinds of items. Either something is a ‘phrase’, ‘phrasal
verb’, or ‘compound’, or it is not: a decision has to be taken because of
placement conventions in paper books. Lexicographical techniques can
be devised to deal with hybrid cases such as restricted collocations or
completive particles such as out in semi-phrasal verbs such as spread
out, for example by mentioning the collocating word explicitly in the
definition. A traditional form of words in British lexicography has been
something along the lines of ‘torrential (of rain) very heavy’ or ‘spread
(often followed by out) to extend, move, or open outwards ...’.
However, the very fact that this kind of important information is
embedded into the definitions of senses of single words may imply that
the collocations are peripheral features of these senses and words, or it
may reinforce an inappropriately rigid distinction between single and
multi-word items. It may also mislead by subordinating within the entry
some items which are very important in the overall vocabulary. For
example, if give up is made a run-on or subordinate part of a dictionary
entry for give, it will inevitably seem less prominent and less important
than much less common vocabulary items which are given full head-
word status.
Models provide ways of categorising the different kinds of unit in the
lexicon: rather like sorting out chemical compounds from elements. But
a layperson might well argue that what is important about, say, the
difference between substances like carbon and carbon dioxide – or
indeed graphite and diamonds – is what they are used for: not what they
consist of. A further group of models/approaches can be termed
functional. Important work here is Pawley and Syder (1983), and
Nattinger and DeCarrico (1992). Here, multi-word items are integrated
into the vocabulary in terms of their pragmatics. This leads to a more
practical approach where multi-word items can be integrated into a
dynamic model of language-in-use, rather than language-as-artifact,
and seen as enabling devices (see further below).
Is it possible to synthesise these models? To some extent, yes. Models
can – and should – be developed using a complex of features: semantic
criteria, rule-conformism, collocation fixedness and so on. As Fernando
and Flavell say:
idiomaticity is a phenomenon too complex to be defined in
terms of a single property. Idiomaticity is best defined by
multiple criteria, each criterion representing a single property.
(1981: 19)
Yet the range of multi-word items is sufficiently heterogeneous that it is
difficult to see how any model can account for all types in a way that is
helpful to theorists, practitioners and learners alike. More important,
corpus evidence consistently calls into question the givens of idioma-
tology and even suggests a need for new kinds of model altogether.
Multi-word items and corpus evidence
The models in the previous section set out to prove their robustness
through conventional modes of argumentation such as establishing
examples and counter-examples, exceptions to rules and so on. They
have, however, for the most part been based on intuition, introspection
and idiolect. In contrast, corpus linguistics over recent years has made it
possible to examine lexis in a more scientific and objective way. ‘First
generation’ corpora (Leech 1991: 10) of up to one million words
showed limited evidence for many multi-word items: they proved
simply too rare and too genre-specific to show up. For example,
Norrick (1985: 6-7) reports that he observed only two instances of
proverbs in Svartvik and Quirk’s Corpus of English Conversation
(170,000 words), and Strässler (1982: 77-81) found only 92 instances
of idioms in a corpus of just over 100,000 words of spoken interaction
of various kinds.
‘Second generation’ corpora of around 20 million words were able to
improve on this situation. In a study of a corpus of 18 million words of
British English, the Oxford Hector Pilot Corpus, I investigated the
distributions of 6,700 idioms and fixed phrases: the set of items more or
less corresponded to the sort of set to be found in the large British
learners’ dictionaries. The results are reported in detail in Moon (1994
and forthcoming). To summarise them: I found that more than 70 per
cent of the items I looked at had frequencies of less than one per million
words of corpus text. (To set this in context, some single words with
frequencies in The Bank of English of one per million are algebra,
altruistic, chairperson, predictability and unaccompanied.) In fact, 40
per cent of my target set of items occurred in the corpus with such low
frequencies that they were no better than random chance: that is, it was
entirely a matter of chance that these items were found at all, and so
their presence or absence was statistically insignificant or meaningless.
Of the more frequent items, 21 per cent of the whole set had frequencies
in the range 1-5 per million; 4 per cent in the range 5-10 per million;
and just over 3 per cent had frequencies of 10 per million and above.
Only 16 individual items occurred more often than 100 per million, and
these included at all, at least, in fact, of course and take place. These
figures are set out in Table 1.
Table 1 Overall corpus frequencies of idioms and fixed phrases

Rate of occurrence in corpus        Percentage of items
less than 1 per 4 million words     40%
1 per 1-4 million words             32%
1-2 per million words               12%
2-5 per million words               9%
5-10 per million words              4%
10-50 per million words             3%
50-100 per million words            <1%
over 100 per million words          <1%
total                               100%
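
The banding behind Table 1 can be expressed as a small function. The sketch below assigns a rate (in occurrences per million words) to one of the table's bands and tallies a distribution. The function name, the handling of values that fall exactly on a boundary, and the sample rates are assumptions made for illustration; they do not reproduce the procedure or data of the study itself.

```python
from collections import Counter

# Upper bounds (occurrences per million words) for the bands in Table 1.
BANDS = [
    (0.25, "less than 1 per 4 million words"),
    (1, "1 per 1-4 million words"),
    (2, "1-2 per million words"),
    (5, "2-5 per million words"),
    (10, "5-10 per million words"),
    (50, "10-50 per million words"),
    (100, "50-100 per million words"),
]

def frequency_band(rate_per_million):
    """Return the band for a rate; a value on a boundary goes into the higher band."""
    for upper, label in BANDS:
        if rate_per_million < upper:
            return label
    return "over 100 per million words"

# Invented rates for a handful of hypothetical items.
sample_rates = [0.1, 0.4, 0.9, 1.5, 3.0, 7.2, 120.0]
distribution = Counter(frequency_band(rate) for rate in sample_rates)

for label, count in distribution.items():
    print(f"{label}: {count / len(sample_rates):.0%}")
```

Percentages computed in this way are rounded, which is why a table of this kind can show several bands as under 1 per cent while the column still totals 100 per cent.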
Certain kinds of item were found to occur much less frequently than
others. Almost no similes and proverbs occurred more frequently than
one per million, and most occurred with such low frequencies that they
must be considered less good than random chance. Of the idioms I
looked for, half occurred with frequencies which were less good than
random chance. Thirty-seven per cent of the set had frequencies which
were statistically significant but still less than one per million, and only
11 per cent of the set occurred more often than this. The frequencies are
set out in Table 2.
Table 2 Frequencies of idioms, proverbs, and similes

Rate of occurrence in corpus        idioms    proverbs    similes
less than 1 per 4 million words     51%       84%         91%
1 per 1-4 million words             37%       16%         8%
1-2 per million words               8%        <1%         1%
2-5 per million words               3%        <1%         0%
5-10 per million words              <1%       0%          0%
10-50 per million words             <1%       0%          0%
over 50 per million words           0%        0%          0%
total                               100%      100%        100%
(number of items investigated       2,657     376         146)
These are benchmarking statistics, but crosschecking with other
corpora throws up comparable figures and distributions. For example,
some recent research at COBUILD looked at over 4,000 idioms, using
The Bank of English (a ‘third generation’ corpus with several hundred
million words). Thirty per cent occurred less often than once per
ten million words of corpus text; 35 per cent occurred 1-3 times per ten
million words, and while around 20 per cent had frequencies of at least
one instance per two million words, very few of these reached the
one-per-million threshold.
Of course, distribution statistics are affected by corpus content, and
the following section will consider multi-word items and genre.
However, the general tendencies are shown up again and again in
different corpora. There are a lot of multi-word items in the language
but a lot of them are very infrequent.
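The arithmetic behind such figures is simple normalisation: a raw count is divided by the corpus size and scaled to a per-million rate, which is then assigned to a frequency band. The following is a minimal sketch in Python, not the procedure actually used in the study. The function names, the exclusive upper bounds, and the reading of the band labels as numeric cut-offs are all assumptions made for illustration.

```python
# Illustrative only: band edges are one interpretation of the labels used in
# Table 2, expressed as occurrences per million words (upper bounds exclusive).
BANDS = [
    (0.25, "less than 1 per 4 million words"),
    (1.0, "1 per 1-4 million words"),
    (2.0, "1-2 per million words"),
    (5.0, "2-5 per million words"),
    (10.0, "5-10 per million words"),
    (50.0, "10-50 per million words"),
]
TOP_BAND = "over 50 per million words"


def per_million(count, corpus_size):
    """Normalise a raw count to occurrences per million words of corpus text."""
    if corpus_size <= 0:
        raise ValueError("corpus_size must be positive")
    return count * 1_000_000 / corpus_size


def band(rate):
    """Return the label of the first band whose upper bound exceeds the rate."""
    for upper, label in BANDS:
        if rate < upper:
            return label
    return TOP_BAND


# Hypothetical example: an item found 9 times in an 18-million-word corpus.
rate = per_million(9, 18_000_000)
print(rate, band(rate))
```

On these assumptions, an item with one instance per two million words has a rate of 0.5 per million and falls in the second band, below the one-per-million threshold mentioned above.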
Variability in multi-word items
One case where the use of corpora has pointed up some shortcomings in
previous idiomatological views relates to the stability of the forms of
multi-word items. Lexical phrases and prefabs are often fluid, of course,
but idioms also show remarkable degrees of variation. In my study of
expressions in an 18-million-word corpus, I found that 40 per cent of
the items under investigation regularly varied and were unstable in
form: this figure does not include deliberate, jocular, or ad hoc
exploitations of idioms as in puns. The findings are discussed in detail in Moon
(1994a and forthcoming), and they are entirely borne out by recent
work with The Bank of English.
Some of the principal kinds of variation are represented in the
following:
British/American variations:
not touch someone/something with a bargepole (British)
not touch someone/something with a ten foot pole (American)
hold the fort (British)
hold down the fort (American)
varying lexical component:
burn your boats/bridges
throw in the towel/sponge
unstable verbs:
show/declare/reveal your true colours
cost/pay/spend/charge an arm and a leg
truncation:
silver lining/every cloud has a silver lining
last straw/it’s the last straw that breaks the camel's back
transformation:
break the ice/ice-breaker/ice-breaking
blaze a trail/trail-blazer/trail-blazing
In extreme cases, there are no fixed lexical items at all, but merely some
sort of lexico-semantic core which can be considered an idiom-schema
(Moon, op. cit. and forthcoming):
wash your dirty linen/laundry in public (mainly British English)
air your dirty laundry/linen in public (mainly American English)
do your dirty washing in public (British English)
wash/air your dirty linen/laundry
wash/air your linen/laundry in public
dirty washing/linen/laundry
This suggests that in any new model of idiom, it might be better to have
a notion of ‘preference of form’ or ‘preferred lexical realisation’ rather
than ‘fixedness of form’, and better to build in the fact that there is a
complex relationship between deep semantics and surface lexis, rather
than it all being a simple case of individual anomalous strings with non-
compositional meanings.
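One way to picture an idiom-schema of this kind is as a pattern with lexical slots rather than a fixed string. The sketch below, in Python, searches a text for realisations of the ‘dirty linen’ schema and tallies which forms occur. It is an illustration under stated assumptions, not a method taken from the studies cited: the names `SCHEMA` and `count_variants` are hypothetical, the verb inflections are handled only by simple suffixes (so did is missed), and variants lacking dirty (wash your linen in public) are not covered.

```python
import re
from collections import Counter

# Slots of the schema: an optional verb plus possessive, then "dirty" + noun,
# then an optional "in public" tail. Simplified for illustration.
VERBS = r"(?:wash|air|do)(?:es|ed|ing)?"
NOUNS = r"(?:linen|laundry|washing)"
SCHEMA = re.compile(
    rf"\b(?:(?P<verb>{VERBS})\s+\w+\s+)?dirty\s+(?P<noun>{NOUNS})\b"
    r"(?P<tail>\s+in\s+public)?",
    re.IGNORECASE,
)


def count_variants(text):
    """Tally realisations as (verb form, noun, has 'in public') triples."""
    counts = Counter()
    for match in SCHEMA.finditer(text):
        verb = (match.group("verb") or "").lower()
        noun = match.group("noun").lower()
        counts[(verb, noun, match.group("tail") is not None)] += 1
    return counts


# Invented sample text, not corpus data.
sample = (
    "They hate washing their dirty linen in public. "
    "He aired his dirty laundry. Dirty washing again."
)
for variant, n in count_variants(sample).items():
    print(variant, n)
```

Counting realisations in this way over a corpus would give a rough picture of which lexical realisation is preferred, which is closer to the notion of ‘preference of form’ than a single canonical string would be.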
Multi-word items in text and discourse
Corpora provide one kind of evidence for multi-word items; texts
provide another. By looking at the densities of different kinds of multi-
word item in particular text types, it can be seen that there are often
strong genre preferences. For example, idioms are especially associated
with journalism as well as informal conversation. Certain subgenres
seem to attract exceptionally heavy use of idioms. McCarthy (1992b:
62) and McCarthy and Carter (1994: 113) point out the frequent use of
idioms in horoscopes in journalism, as the following demonstrate:
Taurus: Try as you might to keep your feet on the ground, a
relationship is absorbing both your time and imagination.
However, wait until after November 3 if you are planning to
hitch your wagon to this star. (Weekly Observer [Birmingham]
27 October 1995)
Leo: Pride comes before a fall. Besides that, although colleagues,
relatives or partners are inclined to get on your nerves or your
wick, no one can really hold a candle to you this week.
(Metronews [Birmingham] » November 1995)
In such horoscopes, the use of items with general applicability, rather
than specific meanings, enables any number of interpretations on the
part of the reader/client, and so maximises the chances of their being
relevant to his/her individual situation, as well as lightening the tone
and enhancing the interpersonal relationship between writer/expert and
reader/client.
I want to explore multi-word items, text and genre by looking briefly
at three more extracts. Firstly, one from a handbook on painting:
The binder for oil paint consists of a vegetable drying oil, such
as linseed or poppy oil, which dries by absorbing oxygen from
the air. This is known as oxidative drying, and is a very slow
process. Acrylic paints, in contrast, are physically drying, which
means that they dry rapidly through evaporation of the water
contained in the binder. As the water evaporates, the acrylic
resin particles fuse to form a fairly compact paint film in which
each minute particle of pigment is coated in a film of resin. The
result is a permanently flexible paint film which is water-
resistant, does not yellow and reveals no sign of ageing.
(Collins Artist's Manual 1995: 148)
It is a fairly technical piece of writing, and it includes a lot of compound
words: oil paint, poppy oil, oxidative drying (here defined as a term),
acrylic paint, water-resistant, as well as one-off compound strings such
as paint film. The only other multi-word item is in contrast: a common
discourse-structuring device. The multi-word items chosen reflect the
genre: technical terms and clear signalling of structure and clause
relationships. This choice is entirely predictable.
Secondly, the opening of a report on a soccer match, which reveals
quite different distributions:
What a sorry, sorry night for English football. Another lesson,
another pupil turned master. Leeds went out 8-3 on aggregate.
The expectation was that Leeds would go down with all guns
blazing, pursuing the impossible dream with the gusto which
had characterised their spirited fightback in the first leg. It was
not to be. PSV Eindhoven are too accomplished, too astute and
methodical a team to permit that sort of gung-ho nonsense.
Chasing goals to complete the most improbable of recoveries,
Leeds succeeded only in leaking them at regular intervals. It was
not a pretty sight. Outplayed by men of vastly superior technique,
it was, if not embarrassing, then utterly comprehensive.
Seeking to overturn a 5-3 deficit, the Leeds manager Howard
Wilkinson was expected to throw caution to the wind. He
decided instead that a difficult task would be approached
without the former England striker Brian Deane. (The Guardian:
1 November 1995)
This is a highly emotive and evaluative text, where the evaluation is
conveyed by the use of words such as the adjectives sorry, spirited,
accomplished, astute, methodical, gung-ho and so on, and by rhetorical
devices such as the chain of contrasts and repetition, as in What a sorry,
sorry night and Another lesson, another pupil turned master. Multi-word
items include the compound adjective gung-ho, the phrasal verbs
go out (semi-technical in sport) and go down (informal), the fixed
phrase on aggregate (semi-technical in sport), and the idioms all guns
blazing and throw caution to the wind, as well as the items the
impossible dream and not a pretty sight. Note in particular that the
idioms (with) all guns blazing and throw caution to the wind are used in
complex ways: each evaluates positively what is being set up as the
desirable course of action, but each is then contrasted with the actual
situation which is evaluated as unsatisfactory. So the idioms both
evaluate and form prefaces to evaluations.
Finally, a short extract from a screenplay:
JULES Hash is legal there, right? [there = in Amsterdam]
VINCENT Yeah, it’s legal, but it ain’t a hundred per cent legal. I
mean, you just can’t walk into a restaurant, roll a joint, and
start puffin’ away. I mean, they want you to smoke in your
home or certain designated places.
JULES Those are hash bars?
VINCENT Yeah, it breaks down like this: it’s legal to own it and,
if you're the proprietor of a hash bar, it’s legal to sell it. It’s
legal to carry it, but, but, but that doesn’t matter ’cause – get
a load of this, alright – if the cops stop you, it’s illegal for
them to search you. I mean that’s a right the cops in
Amsterdam don't have.
(Q. Tarantino, Pulp Fiction, 1994: 14)
This interchange attempts to replicate natural speech patterns. It
includes compounds such as hash bar and the deliberately discordant
legalese of certain designated places; the intensifier a hundred per cent;
the informal phrasal verbs puff away and break down, and prefabs and
fixed phrases I mean, get a load of this, and al(l)right. It has a different
flavour altogether.
By looking at the multi-word items in texts, their stylistics can be tied
into text analyses and enable fuller understanding of the text and often
its subtext: this is demonstrated in an analysis of a newspaper editorial
in Moon (1994b). It also shows up the fact that multi-word items have
important roles with respect to the structure of text. For example, to
generalise crudely, compounds typically denote and have high information
content – often because they are technical terms or have specific
reference. Fixed phrases and prefabs often organise and provide the
framework for an utterance or the argument of a text; or they are
situationally bound, as in ritualistic formulae of greeting, thanking and
so on. Idioms typically evaluate and connote, and are shorthand,
rhetorically powerful ways of conveying judgements. What also appears
in studying them in context is that idioms often have other discoursal
roles, for example as prefaces or summarisers.
The expectation was that Leeds would go down with all guns
blazing,
prefaces a restatement in
pursuing the impossible dream with the gusto which had
characterised their spirited fightback ...
as well as prefacing the confounding of the expectation in It was not to
be. Idioms are textual choices as well as semantic choices, and this is a
crucial point: they are rhetorically significant. Discussion of this and
related points can be found in Moon (1992), McCarthy (1992b), Moon
(1994a and forthcoming), and McCarthy and Carter (1994).
Recent work on the teaching of units above the level of single words
has indeed emphasised the importance of teaching multi-word items as
enabling devices, providing structures and frames, and functioning
within real discourse situations. For example, Nattinger and DeCarrico
list the following as ‘Summarizers’:
OK so
so then
in a nutshell
that’s about it/all there is to it
remember that this means X
in effect
to make a long story short
what I’m trying to say is X (1992: 95)
These strings are idiomatic, but different in kind. Nattinger and
DeCarrico (op. cit.: 113-73) demonstrate convincingly the utility of
teaching/learning such units so that the canonical structures and frames
of different discourse genres may be better recognised and reproduced.
Similarly, Willis (1990: p. v, 83, 115-20) draws attention to discourse-
structuring frames in his discussion of a lexically-oriented syllabus for
L2 language learners. Coulmas (1979a, 1979b) and Nattinger (1980)
also discuss the significance of routines in pedagogy, and more pragmatic
approaches to the items. The examination of texts shows up the
crucial importance of this: the multi-word items chosen are not arbitrary
or casual, but integral parts of the whole discourse.
Second-language learning perspectives
Multi-word items are typically presented as a problem in teaching and
learning a foreign language. Their non-compositionality, whether
syntactic, semantic or pragmatic in nature, means that they must be
recognised, learned, decoded and encoded as holistic units. The
phenomena of multi-word items and idiomaticity are generally held to be
universals in natural languages: this is discussed, for example, by
Makkai (1972) and Fernando and Flavell (1981). Foreign learners are
therefore likely to be aware of similar phenomena in their first language.
Carter points out:
The emphasis on problems may in itself be dangerous since it
concedes to idiomaticity and fixed expressions a problematic
status and this ignores arguments concerning the naturalness
and pervasive normality of such ‘universal’ relations in
language. (1987: 136)
Similarly, Baker and McCarthy comment:
The more naturally MWUs [multi-word units] are integrated
into the syllabus, the less ‘problematic’ they are. (1988: 32)
At the same time, multi-word items are language-specific and they have
particular sociocultural connotations and associations: see, for example,
Alexander (1985, 1989). Even where analogous multi-word items exist
in both L1 and L2, they are unlikely to be exact counterparts, and there
may be different constraints on their use: Odlin (1989: 55) points out
restrictions on proverb use in English in comparison with other cultures.
This contributes to the difficulties facing L2 learners when confronted
with multi-word items.
Because many multi-word items, particularly metaphorical multi-
word items, are marked, infrequent, and generally considered ‘difficult’,
they may be taught sparingly as receptive vocabulary items. In
discussing pedagogical aspects of multi-word items, Gairns and Redman
(1986: 35) point out that many L2 speakers are communicating with
other L2 speakers rather than L1 speakers, and the use of ‘idioms’ or
idiomatic strings may inhibit full understanding: thus the use of multi-
word items breaches conventions of politeness. Multi-word items may
also be used by L2 speakers in inappropriate semantic or discoursal
contexts, leading to further communicative errors. Teachers and
learners can avoid the problem by avoiding such items altogether, and in
many cases this can be justified because of the relative infrequency of
occurrence of many kinds of multi-word item, as discussed above. Yet
this is not a real solution. The appropriate use and interpretation of
multi-word items by L2 speakers is a sign of their proficiency, as is
pointed out, for example, by Kjellmer (1991), particularly with regard
to the creative exploitation and manipulation of multi-word items, and
Low (1988), with regard to both institutionalised and non-institutionalised
metaphors. It is also precisely the point made in the discussions of
routines by Pawley and Syder (1983), Nattinger and DeCarrico (1992),
and Coulmas (1979a, 1979b). It is a difficult situation: these items are
hard, but they need to be acquired at some stage. And the difficulty of
the situation is compounded by the inadequacy or misleadingness of
many teaching and reference materials.
Errors in the use of multi-word items can be categorised crudely as
formal, pragmatic or stylistic. Formal errors with multi-word items may
arise simply through failure to recognise a string as non-compositional.
Bensoussan (1992: 106) reports errors by Hebrew-speakers or
Arabic-speakers such as to wonder a little for little wonder and the smallest for
at least. There may be lexical errors, and Irujo (1986: 296) reports kill
two birds with one rock, swallow it hook, cord, and sinker, and come
low or high water in a group of Venezuelan Spanish-speakers; also putt
something fast on her (confusing pull a fast one and put something over
on someone) and kicked the towel (confusing throw in the towel and
kick the bucket). Idioms may be transferred and translated literally,
although the resulting calque is not an institutionalised item in the
target/L2 language. For example, a personal letter from a French-
speaker with a native-like fluency in English contains the following:
This must be totally uninteresting to you. You must have ‘other
cats to thrash’.
He uses a calque of French avoir d'autres chats à fouetter instead of the
analogous English idiom have other fish to fry. In this case, the writer
has demarcated the calque in quotes and may well have intended the
choice to be jocular and marked, but the problem of mistranslation and
miscommunication remains. In other cases, there may be interference
from similar multi-word items in the L1, although, as Irujo (op. cit.:
292) points out, it is not always possible to identify these as discrete:
For example, does put your leg in your mouth result from
interference from the similar Spanish idiom meter la pata (‘to
put in the leg’), or is it an overextension of the English word
foot? (1986: 292)
Malcolm Coulthard (personal communication) observes that the problems
of interference and mistranslation may be exacerbated with L1s
such as Malay where a single lexeme kaki denotes both ‘foot’ and ‘leg’.
Finally, formal errors may arise where syntactic ‘rules’ for multi-word
items are not known or observed, so that items are strangely pluralised,
or used in an untypical tense, aspect or voice.
Pragmatic errors include those arising from the use of multi-word
items in inappropriate discoursal contexts or from misunderstandings of
the discoursal situation in some way. This kind of error may be
intervarietal as well as interlingual: for example, speakers of British
English and speakers of American English have different formulaic
routines in everyday interactions such as greetings and shopping. How
are you today? may be interpreted by a British speaker as a request for
information about their health, but by an American speaker as a simple
greeting formula. Pragmatic errors also include the use of multi-word
items with inappropriate or aberrant evaluations. For example, sit on
the fence might be used to mean ‘stay impartial’ with no negative
evaluation, whereas it is usually used to criticise a refusal or failure to
commit. Similarly, pearl of wisdom might be used to express approval
of what someone has said, whereas it is typically used ironically by
native speakers to express contempt. There are obvious problems in
encoding if the wrong evaluation is given, and in decoding if the implied
evaluation is not understood.
Stylistic errors in the use of multi-word items may arise through use
of an excessively marked multi-word item – very rare, dated or
over-informal – or in an inappropriate genre. In recommending the teaching
of only ‘useful’ multi-word items, Gairns and Redman comment:
in deciding what is useful, it is worth considering whether
an idiom can be incorporated into the students’ productive
vocabulary without seeming incongruous alongside the rest of
their language. Certain native speakers might ‘get the ball
rolling’, but few foreign learners could carry off this idiom
without sounding faintly ridiculous. (1986: 36)
Such concerns about the use of (marked) multi-word items by non-
native speakers and their ‘sounding faintly ridiculous’ may be prompted
by experience of these speakers’ subtly infelicitous uses of multi-word
items, or overuse of very marked or rare items.
A number of studies report that L2 learners typically avoid using
multi-word items, even where the languages are closely related and have
apparently parallel expressions. Irujo (op. cit.) observes this in relation
to Spanish and English, and Kellerman (1977) in relation to Dutch and
English: similarly Hulstijn and Marchena (1989) with respect to Dutch
and English phrasal verbs. The most likely reason for this is that non-
native speakers are suspicious of apparently cognate or identical items
in their two languages. They have learned to be wary of ‘false friends’
and know only too well that there may be subtle but crucial distinctions
in meaning, usage, or register which may lead to misreadings and
misunderstandings. These distinctions may be imagined; but they may
be real, as is the case of a pair cited by Fernando and Flavell (1981: 83):
skate on thin ice is conceptually and semantically related to Serbian
navući nekoga na tanak led ‘pull someone onto the ice’, but the English
implies that someone is voluntarily taking a risk and the Serbian that
they are forced to behave in a risky way. A complicated case is pointed
out by Platt et al. (1984: 108): Singaporean and Malay English have a
multi-word item to shake legs ‘to be idle’, a calque of Malay goyang
kaki, which has the opposite connotation and meaning from the similar
British English to shake a leg ‘to be active, to get up’.
Teaching multi-word items
Various pedagogical techniques for the acquisition of multi-word items
have been suggested. Celce-Murcia and Rosensweig (1979: 251) suggest
that the most appropriate strategy for teaching them is the use of short
dialogues: certainly, at more advanced levels the use of contextualised
examples would show up discoursal features. Lattey (1986) advocates
a pragmatic classification of multi-word items which would foreground
correlates and restrictions as well as setting them in appropriate
sociocultural frames. Gläser (1988: 264) advocates classifying multi-
word items according to subject areas, or the speech acts they encode,
as well as making (advanced) learners aware of the range of multi-
word item types and of the differing degrees of transparency and
opacity. Irujo (op. cit.: 298) recommends that full advantage be taken
of latent knowledge of L1 multi-word items. She also suggests that
multi-word items to be taught should be ‘carefully chosen on the basis
of frequency, need, transparency, and syntactic and semantic simplicity’
(ibid.: 300).
In fact, frequency is often mentioned as a useful criterion for judging
which items should be taught. Alexander (1987: 114-5) comments that
learners should be made aware of the relative frequencies of multi-word
items in the L2, as do Carter (op. cit.) and Carter and McCarthy (1988:
56). The use of corpora helps here to provide a more objective and less
idiolectal or idiosyncratic basis for judgements about frequency, since
frequency is very hard to assess intuitively. For example, Arnaud
(1992a) established a rank-list of the French proverbs best known to his
informants, university undergraduates, but the rank-list conflicts markedly
with his survey of proverb frequencies in spoken and written data
(Arnaud and Moon, 1993).
Table 3 Multi-word items and frequencies
item                teaching intuitions    frequency per million in The Bank of English
tip of my tongue    possibly productive    we