Multi-Word Items in English

4.3 Vocabulary connections: multi-word items in English

Rosamund Moon
COBUILD and the University of Birmingham

Introduction

In looking at vocabulary, it is natural to focus on the word as the primary unit. Dictionaries help to reinforce this by representing the lexicon as a series of headwords or individual lexical items. But while this is a practical approach, it may also be dangerously isolationist, as in many respects a ‘word’ is an arbitrary unit. It is, after all, just a string of characters, or a sequence of one or more morphemes, which is bounded at either end by a space or by punctuation. Text studies and corpus studies have revealed the significance and the intricacy of the links between words: for example, their strong clustering tendencies and the patterns which are associated with them. This chapter will consider lexical connections between words in English, with particular emphasis on multi-word lexical items. It will then review some of the problems facing L2 teachers with respect to these items.

Collocation

Traditional models of language - or at least models of Western European languages - are generally built on grammatical principles, with the clause or sentence being the focal unit. In such models, connections are the syntactic relationships between elements in the clause or sentence. A sentence such as:

The bushes and trees were blowing in the wind, but the rain had stopped. (The Bank of English: a collection of over 300 million words of written and transcribed oral English texts which is held at COBUILD, the University of Birmingham, and is managed by HarperCollins Publishers)

contains 14 orthographic words and it can be analysed in a number of different ways, according to the grammatical model in operation.
For example, it is a single sentence, consisting of two finite clauses; it contains two noun phrases and two verb phrases at a primary-level analysis in a transformational-generative model, or two noun phrases, two verb phrases, and one adjunct or prepositional phrase in a systemic model of grammar. The syntactic connections manifest in the sentence enable the hearer/reader to (re)construct its meaning. In a similar vein, a discourse-based grammatical model would draw attention to the cohesive significance of the occurrences of the; the way in which the time frame is indicated; and the logical connection, or in this case disjunction, signalled by but.

In contrast to these, a collocationist model would take into account considerations such as the predictability of the co-occurrences of words in the slots that constitute the underlying structural frame. For example, it might consider the statistical significance of the lexical frame ‘SOMETHING blowing in the wind’ as against ‘the wind blowing SOMETHING’, or the significance of rain being followed by the verb stop rather than end, or finish, or any other verb for that matter. (In fact, in The Bank of English, the commonest verb which follows rain as subject is fall, and stop is more typical or usual with rain than end.) Collocation studies are now inevitably associated with corpus studies, since it is difficult and arguably pointless to study such things except through using large amounts of real data. Important papers on lexical collocation are Halliday (1966), Jones and Sinclair (1974), and Sinclair (1987 and 1991: see below). Church and Hanks (1990) outline a statistical basis for calculating collocational significance.
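To make the idea of collocational significance concrete, the following is a minimal sketch of a mutual-information-style score, in the spirit of the statistical approach just mentioned rather than a reproduction of any published procedure. The toy token list, the window size and the function name pmi_scores are assumptions made purely for illustration; a real study would use a large corpus and would need to handle low counts with care.

```python
import math
from collections import Counter

def pmi_scores(tokens, window=1):
    """Score ordered word pairs by pointwise mutual information.

    A pair (w, v) is counted whenever v follows w within `window` tokens.
    Probabilities are simple relative frequencies over the token count,
    which is a simplifying assumption of this sketch.
    """
    n = len(tokens)
    word_counts = Counter(tokens)
    pair_counts = Counter()
    for i, w in enumerate(tokens):
        for v in tokens[i + 1 : i + 1 + window]:
            pair_counts[(w, v)] += 1

    scores = {}
    for (w, v), count in pair_counts.items():
        p_pair = count / n
        p_w = word_counts[w] / n
        p_v = word_counts[v] / n
        # High values mean the pair co-occurs more often than chance predicts.
        scores[(w, v)] = math.log2(p_pair / (p_w * p_v))
    return scores

# Invented toy data, not corpus evidence.
tokens = ("the rain stopped and the wind was blowing "
          "the rain fell the wind was blowing").split()
scores = pmi_scores(tokens, window=1)
for pair in [("wind", "was"), ("the", "rain")]:
    print(pair, round(scores[pair], 2))
```

In this toy data the pair (wind, was) scores higher than (the, rain), because the is frequent everywhere and so its co-occurrence with rain is less surprising. The same logic, applied to hundreds of millions of words, is what lets a corpus distinguish a genuine collocate from a merely frequent neighbour.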
Such analyses and observations are far-reaching and of great importance. Blow and wind are not merely random co-occurrences in this individual sentence, nor are they just inevitably collocates because they happen to fall into the same lexical set of ‘weather’ and are topic-related. But it is part of the meaning (in the broadest sense of the word) of ‘wind’ that it blows and causes things to blow about, and it is part of one of the meanings of ‘blow’ that it is what meteorological phenomena such as the wind do (see also Firth, 1951/1957, who sees collocations as part of a word’s meaning).

This can be seen more strikingly with the first word of the following example:

Torrential rain burst river banks and flooded homes in the North-East. [The Bank of English]

The adjective torrential is shown in The Bank of English to be severely restricted in lexicogrammatical terms. Ninety-eight per cent of instances are in premodifying position, 99 per cent collocate with the word rain or (much less often) a semantically related word such as downpour or storm. It is part of the meaning of torrential that it concerns, qualifies and categorises ‘rain’: lexical form and meaning are inseparable here.

In his central paper on collocation, Sinclair (1987: 319-325; also 1991: 110-115) sets out two principles which account for the structural patterning of lexis. They are opposites which are complementary and co-exist. The open choice principle:

is a way of seeing language text as the result of a very large number of complex choices. At each point where a unit is completed - a word or a phrase or a clause - a large range of choice opens up, and the only restraint is grammaticalness.
Thus the open choice principle is essentially a traditional approach to language: compare slot-and-filler approaches, and the distinction between syntagm (or syntax-governed structural frame) and paradigm (or lexical/grammatical set of words available for each slot in the frame). In contrast, the idiom principle:

is that a language user has available to him or her a large number of semi-preconstructed phrases that constitute single choices, even though they might appear to be analysable into segments.

So the idiom principle restricts the choices not just in a given slot, but in the surrounding co-textual slots.

Just as a syntactic view of language observes rules underpinning grammatically well-formed utterances, a collocationist view of language observes the strong patterning in the co-occurrence of words. This itself can be seen to some extent as rule-governed and motivated (that is, it reflects some subliminal or underlying system or process of analogy), however prolific the rules are and difficult to codify.

Complementing this kind of approach is that of psycholinguistics, which observes how language is processed - and often acquired - in chunks or groups of words, rather than on a word-by-word basis. This is explored at length in, for example, the work of Peters (1983). It has important repercussions with respect to vocabulary learning and teaching, since words are again and again shown not to operate as independent and interchangeable parts of the lexicon, but as parts of a lexical system (see Aitchison, 1987).
In acquiring full knowledge of a word like the noun wind, a learner has, of course, to acquire its meaning(s), pronunciation and morphology; also its grammatical behaviour (noun; countable, also after the with general/homophoric reference); its set-relationship with hyponyms such as gale and breeze; its collocating verbs (blow, sweep), adjectives (strong, north, cold, light, bitter, prevailing), and partitive nouns (gust, breath, puff) in the frame ‘a ... of wind’; and so on.

Torrential rain is an example of a restricted collocation (Aisenstadt, 1979; 1981). Restricted collocations are cases where certain words occur almost entirely in the co-text of one or two other words, or of a narrow set of words. The adjective torrential must be learned as part of some kind of lexical unit. There are many other similar cases. For example, collectives such as packs of hounds/wolves/dogs, flocks of sheep/birds/seagulls, and swarms of bees/insects, and intensifiers such as stone, which occurs only in stone deaf/dead/cold/cold sober. The phenomenon can be seen in irreversible binomials (Malkiel, 1959), where strings such as hot and cold, cause and effect, and Mr and Mrs tend to occur in a fixed order. (The motivation for the ordering is discussed by Lakoff and Johnson (1980: 132-3); also by Carter and McCarthy (1988: 25) who point out the extent to which the ordering is culture-specific.) The phenomenon can also be seen in valency patterns which govern the typical lexicalisations of the subjects or objects of verbs. For example, PEOPLE guzzle DRINK, VEHICLES guzzle FUEL. All this is the kind of information which is the very stuff of learners’ dictionaries.

Multi-word items: terms and categories

The remainder of this chapter will look at multi-word items, which in many respects can be seen as extreme cases of fixed collocations. It is essential to begin with terms.
There are many different forms of multi-word item, and the fields of lexicology and idiomatology have generated an unruly collection of names for them, with confusing results. In the following, I shall be using a set of fairly general terms which are relatively well-used or understood in the Anglo and Anglo-American traditions, in preference to more specialist terminology. Note that there is no generally agreed set of terms, definitions and categories in use.

First, a definition of a multi-word item itself. A multi-word item is a vocabulary item which consists of a sequence of two or more words (a word being simply an orthographic unit). This sequence of words semantically and/or syntactically forms a meaningful and inseparable unit. Multi-word items are the result of lexical (and semantic) processes of fossilisation and word-formation, rather than the results of the operation of grammatical rules. By this token, multi-word inflectional forms of words, for example comparative forms of adjectives or passive forms of verbs, can be separated out and excluded from the category since they are formed grammatically. In the following sentence:

The bushes and trees were blowing in the wind, but the rain had stopped.

were blowing and had stopped are verb groups or verb phrases, but they are not multi-word items.

There are three important criteria which help distinguish holistic multi-word items from other kinds of strings. They are institutionalisation, fixedness, and non-compositionality.

Institutionalisation is the degree to which a multi-word item is conventionalised in the language: does it recur? Is it regularly considered by a language community as being a unit? Pawley (1986) discusses the process and fact of institutionalisation or, in his terms, ‘lexicalization’.

Fixedness is the degree to which a multi-word item is frozen as a sequence of words. Does it inflect? Do its component words inflect in predictable or regular ways?
For example, they rocked the boat and not they rock the boated or they rocked the boats. Similarly, does the item vary in any way, perhaps in its component lexis or word order? For example, another kettle of fish and a different kettle of fish are alternative forms, but on the other hand is not varied to on another hand or on a different hand.

Non-compositionality is the degree to which a multi-word item cannot be interpreted on a word-by-word basis, but has a specialised unitary meaning. This is typically associated with semantic non-compositionality: for example when someone kicks the bucket (i.e. ‘dies’) they are not actually doing anything to a receptacle with their foot, and cats’ eyes (luminous glass beads set into the road surface to guide drivers) in British English are not in any degree biological. However, non-compositionality can also relate to grammar or pragmatic function. For example, of course is non-compositional because it is ungrammatical, and the imperative valediction Take care! can be said to be non-compositional because of its extralinguistic situational function or ‘pragmatic specialisation’.

These three criteria operate together - in spoken English, in conjunction with a phonological criterion where multi-word items often form single tone units. The criteria are not absolutes but variables, and they are present in differing degrees in each multi-word unit.

‘Multi-word item’ is a superordinate term. Looking more closely at the different types of multi-word item:

Compounds are the largest and most tangible group, but arguably the least interesting. They may differ from single words only by being written as two or more orthographic words. They cannot properly be separated out altogether, since variable hyphenation conventions blur the distinction between compound multi-word items and polymorphemic single words.
An orthographic example is car park which is also spelled carpark and car-park; a morphological example is the group sedan chair, dining-chair and armchair which ultimately are not so very different lexically. At the same time, Pawley (1986: 108-120) makes the important point that hyphenation or fusion of words is a technique by which a string is designated as a unit and therefore lexicalised. For example, wild flower seems to be a purely compositional, transparent string: compare garden flower, wild bird. However, the increasingly common spelling wildflower, predominant in American English, shows how far it is becoming lexicalised as a unit: compare wildlife.

Many open or two-word compounds are nouns: Prime Minister, crystal ball, collective bargaining and so on. They are very commonly terms or titles, or refer to things in the real world. Compound verbs are typically hyphenated, and are comparatively few in number. Some consist of two verbs strung together - freeze-dry, spin-dry - but others are verbal uses of compound nouns - short-circuit, rubber-stamp. Compound adjectives are also often hyphenated. A common pattern consists of an adjective and participle - long-haired, brown-eyed, three-legged - or of a modifier and superordinate adjective - navy blue, powder blue, royal blue.

Compounds are generally fixed but their institutionalisation can vary as widely as any other lexical items. The degree to which they are compositional varies too. In general, compounding is an extremely productive process in word-formation. Fuller discussion can be found in Bauer (1983: 201-16).

Phrasal verbs are combinations of verbs and adverbial or prepositional particles. The verbs are typically but by no means always monosyllabic, and of Germanic origin: particularly prolific are such verbs as come, get, go, put, and take. The commonest particles are up and out, followed by off, in, on, and down. Many of the phrasal verb combinations themselves are very frequent.
In The Bank of English, give up constitutes just over 5 per cent of the evidence for the lemma give (a lemma is the set of inflected forms which comprise a single word: in this case, give, gives, gave, giving, given). Yet it is a common item in its own right, with a frequency of around 60 occurrences per million words of corpus text: roughly the same level as lemmas such as address, adopt, airline, airport and appearance.

Phrasal verbs are typically a phenomenon of English and a few cognate languages such as Dutch, and they are usually considered problematic in terms of L2 teaching and learning for a number of reasons, not least because they are common and fixed. They have specialised meanings: these may range in compositionality from transparent combinations such as break off and write down, through completives such as eat up and stretch out, where the particle reinforces the degree of the action denoted by the verb, to relatively opaque combinations such as butter up and tick off. Phrasal verbs have particular syntactic problems such as the placement of any nominal or pronominal objects with respect to the verb. They are stylistically heterogeneous, sometimes unmarked (give up as against relinquish, forsake, abandon, cede, yield) but sometimes informal or jargonistic (chill out, hang out, talk up, wise up). There are many distinctions between British and American English. For example, the varieties have different meanings for tick off, and in combinations with round/around, British English prefers round and American around. Lastly, phrasal verbs are often presented as arbitrary combinations which cannot be analysed and rationalised.

As in other cases, the stylistic and syntactic considerations ultimately operate at the level of the individual item.
However, the situation is more complicated when their semantics are considered. There are in fact systems underlying combinations, and neologisms develop by analogy and in accordance with these systems. Phrasal verbs are motivated and not arbitrary formations. For example, off can be combined with the verbal use of most nouns which designate barriers: hence block off, box off, cordon off, curtain off, fence off, wall off and so on. This kind of information is rarely made explicit in dictionaries, although it is sometimes implied in entries for the individual particles, or in special features such as the Particles Index in The Collins COBUILD Dictionary of Phrasal Verbs (1989).

Idioms are a very complex group: not least because the term ‘idiom’ frequently occurs in the literature with a variety of different meanings. I shall be using it in a relatively narrow sense, to refer to multi-word items which are not the sum of their parts: they have holistic meanings which cannot be retrieved from the individual meanings of the component words. Classical examples include spill the beans, have an axe to grind, and kick the bucket. Idioms are typically metaphorical in historical or etymological terms. The metaphor may be relatively straightforward to decode, as in a snake in the grass or bite off more than one can chew, or obscure, as in kick the bucket and rain cats and dogs. Idioms rate highly in terms of non-compositionality. With regard to institutionalisation they are generally infrequent: the purely compositional string kick the ball is seven times as common in The Bank of English as kick the bucket. In terms of fixedness, they are often held to be relatively frozen and to have severe grammatical restrictions, but it will be pointed out later that idioms are by no means as fixed as conventional accounts suggest.
Fixed phrases has been deliberately chosen here as a very general term to cover a number of multi-word items which fall outside the previous categories. They include items such as of course, at least, in fact, and by far as well as greetings and phatics such as good morning, how do you do, excuse me, and you know. Many of these are strongly institutionalised, in that they are very high frequency items, and many are strongly fixed. Their compositionality is variable in kind and degree, and may arise from the fact that they are grammatically ill-formed or because they have specialist and non-predictable pragmatic functions. Similes - white as a sheet, dry as a bone - and proverbs - it never rains but it pours, enough is enough - can also be included in this category: these kinds of item are typically very infrequent and often unstable in form.

Prefabs: Finally, there is another group which has recently been the subject of some of the most innovative research in English idiomatology. I shall refer to them as prefabricated routines or prefabs. They are also referred to as ‘lexicalised sentence stems’ (Pawley and Syder 1983) or ‘ready-made (complex) units’ (Cowie 1992), and Nattinger and DeCarrico (1989, 1992) call them ‘lexical phrases’, although they use this as a superordinate term to encompass other kinds of multi-word item. Prefabs are preconstructed phrases, phraseological chunks, stereotyped collocations, or semi-fixed strings which are tied to discoursal situations and which form structuring devices. For example, the thing/fact/point is, that reminds me, I’m a great believer in ... and so on. They are institutionalised because they are consistently and frequently used as particular kinds of signal or convention, but they often vary rather than being completely frozen.
Their non-compositionality stems from their discoursal uses, since their surface meanings can be readily decoded.

Setting out categories in this way is not just an abstract task but a way of identifying and drawing attention to the very range of units and their differences. There are inevitably overlaps between the categories. For example, is what are you driving at? a form of a phrasal verb, or a prefab? This, however, merely reflects the fact that there are few discrete categories in the lexicon: things simply do not work like that.

The question might well be asked at this point: how many multi-word items are there in English? To which an appropriate answer is: how long is a piece of string? There is no canonical list of multi-word items. The largest specialist dictionaries of English multi-word items, The Oxford Dictionary of Phrasal Verbs (1993) and The Oxford Dictionary of English Idioms (1993), contain some 15,000 phrasal verbs, idioms and fixed phrases, but the total number of multi-word items in current English is clearly much higher. There is no clear boundary to the set of prefabs, and almost no limit to the number of compounds which might be coined as terms. Furthermore, language changes and vocabulary items fall in and out of use. The set of multi-word items is effectively open-ended and is not static. However, it is safe to say that many thousands of multi-word items are in use and within the competence of proficient speakers of English.

Traditions and models of multi-word items

The field of idiomatology or combinatorics is one of the most heavily explored in lexicology. In addition to work in Britain and the US, there are rich literatures and strong traditions in German and Russian/East European lexicology. Substantial critical reviews in English are provided by Makkai (1972), Fernando (1978), Fernando and Flavell (1981), Wood (1981), and Gläser (1988).
Weinreich (1963) gives a rare English-language overview of the earlier Soviet/Russian tradition. I do not propose to look at psycholinguistic aspects here, but Cacciari and Tabossi (1993) provide a useful collection of papers on this, which covers the main ideas involved.

It is important to draw attention to the complexity of the subject, although this chapter is not the place for an in-depth study of the theoretical aspects of word combinatorics. At least some of the unruliness and apparent conflicts in the literature result directly from there being substantially different models applied to multi-word items, which foreground different characteristics.

Semantics-based models are in many respects the most traditional. They attempt to differentiate between categories of multi-word items according to degrees of compositionality, and they aim to identify, as it were, the irreducible semantic building blocks of the lexicon. Important work here is Makkai (op. cit.) and Mitchell (1971).

In contrast, syntax-based models take grammatical well-formedness as their starting-point. Multi-word items - and in particular idioms and fixed phrases - are often non-compositional because they do not obey rules. For example, kick the bucket never passivises; by and large and how come? are grammatically ill-formed. In these models, the structural peculiarities of multi-word items become criterial features. Important papers here are Katz and Postal (1963), Weinreich (1969), and Katz (1973); also Fraser (1970), who develops a model of frozenness or grammatical rigidity in idioms, with seven levels on a scale of idiomaticity. Both Makkai (op. cit.) and Healey (1968) incorporate structural properties into their analyses of multi-word items, although semantics remains their starting-point.

Soviet/Russian traditions in combinatorics have tended to focus on collocation and types of collocation.
This leads to a strong emphasis being placed on phraseology and usage and fits well with the sorts of observation of the lexicon which are made by corpus linguistics. They then build in semantic (and syntactic) criteria in order to separate out classes of, for example, pure idioms.

Lexicography has to varying degrees been influenced by all these ideas. Unfortunately, dictionaries perpetuate a black and white distinction between kinds of items. Either something is a ‘phrase’, ‘phrasal verb’, or ‘compound’, or it is not: a decision has to be taken because of placement conventions in paper books. Lexicographical techniques can be devised to deal with hybrid cases such as restricted collocations or completive particles such as out in semi-phrasal verbs such as spread out, for example by mentioning the collocating word explicitly in the definition. A traditional form of words in British lexicography has been something along the lines of ‘torrential (of rain) very heavy’ or ‘spread (often followed by out) to extend, move, or open outwards ...’ However, the very fact that this kind of important information is embedded into the definitions of senses of single words may imply that the collocations are peripheral features of these senses and words, or it may reinforce an inappropriately rigid distinction between single and multi-word items. It may also mislead by subordinating within the entry some items which are very important in the overall vocabulary. For example, if give up is made a run-on or subordinate part of a dictionary entry for give, it will inevitably seem less prominent and less important than much less common vocabulary items which are given full headword status.

Models provide ways of categorising the different kinds of unit in the lexicon: rather like sorting out chemical compounds from elements.
But a layperson might well argue that what is important about, say, the difference between substances like carbon and carbon dioxide - or indeed graphite and diamonds - is what they are used for, not what they consist of.

A further group of models/approaches can be termed functional. Important work here is Pawley and Syder (1983), and Nattinger and DeCarrico (1992). Here, multi-word items are integrated into the vocabulary in terms of their pragmatics. This leads to a more practical approach where multi-word items can be integrated into a dynamic model of language-in-use, rather than language-as-artifact, and seen as enabling devices (see further below).

Is it possible to synthesise these models? To some extent, yes. Models can - and should - be developed using a complex of features: semantic criteria, rule-conformism, collocation fixedness and so on. As Fernando and Flavell say:

idiomaticity is a phenomenon too complex to be defined in terms of a single property. Idiomaticity is best defined by multiple criteria, each criterion representing a single property. (1981: 19)

Yet the range of multi-word items is sufficiently heterogeneous that it is difficult to see how any model can account for all types in a way that is helpful to theorists, practitioners and learners alike. More important, corpus evidence consistently calls into question the givens of idiomatology and even suggests a need for new kinds of model altogether.

Multi-word items and corpus evidence

The models in the previous section set out to prove their robustness through conventional modes of argumentation such as establishing examples and counter-examples, exceptions to rules and so on. They have, however, for the most part been based on intuition, introspection and idiolect. In contrast, corpus linguistics over recent years has made it possible to examine lexis in a more scientific and objective way.
‘First generation’ corpora (Leech 1991: 10) of up to one million words showed limited evidence for many multi-word items: they proved simply too rare and too genre-specific to show up. For example, Norrick (1985: 6-7) reports that he observed only two instances of proverbs in Svartvik and Quirk’s Corpus of English Conversation (170,000 words), and Strässler (1982: 77-81) found only 92 instances of idioms in a corpus of just over 100,000 words of spoken interaction of various kinds.

‘Second generation’ corpora of around 20 million words were able to improve on this situation. In a study of a corpus of 18 million words of British English, the Oxford Hector Pilot Corpus, I investigated the distributions of 6,700 idioms and fixed phrases: the set of items more or less corresponded to the sort of set to be found in the large British learners’ dictionaries. The results are reported in detail in Moon (1994 and forthcoming). To summarise them: I found that more than 70 per cent of the items I looked at had frequencies of less than one per million words of corpus text. (To set this in context, some single words with frequencies in The Bank of English of one per million are algebra, altruistic, chairperson, predictability and unaccompanied.) In fact, 40 per cent of my target set of items occurred in the corpus with such low frequencies that they were no better than random chance: that is, it was entirely a matter of chance that these items were found at all, and so their presence or absence was statistically insignificant or meaningless. Of the more frequent items, 21 per cent of the whole set had frequencies in the range 1-5 per million; 4 per cent in the range 5-10 per million; and just over 3 per cent had frequencies of 10 per million and above. Only 16 individual items occurred more often than 100 per million, and these included at all, at least, in fact, of course and take place.
These figures are set out in Table 1.

Table 1 Overall corpus frequencies of idioms and fixed phrases

Rate of occurrence in corpus         Percentage of items
less than 1 per 4 million words      40%
1 per 1-4 million words              32%
1-2 per million words                12%
2-5 per million words                9%
5-10 per million words               4%
10-50 per million words              3%
50-100 per million words             [illegible]
over 100 per million words           [illegible]
total                                100%

Certain kinds of item were found to occur much less frequently than others. Almost no similes and proverbs occurred more frequently than one per million, and most occurred with such low frequencies that they must be considered less good than random chance. Of the idioms I looked for, half occurred with frequencies which were less good than random chance. Thirty-seven per cent of the set had frequencies which were statistically significant but still less than one per million, and only 11 per cent of the set occurred more often than this. The frequencies are set out in Table 2.

Table 2 Frequencies of idioms, proverbs, and similes

Rate of occurrence in corpus         idioms        proverbs      similes
less than 1 per 4 million words      51%           84%           91%
1 per 1-4 million words              37%           16%           8%
1-2 per million words                8%            <1%           1%
2-5 per million words                3%            [illegible]   0%
5-10 per million words               [illegible]   0%            0%
10-50 per million words              [illegible]   0%            0%
over 50 per million words            0%            0%            0%
total                                100%          100%          100%
(number of items investigated        2657          376           146)

These are benchmarking statistics, but crosschecking with other corpora throws up comparable figures and distributions. For example, some recent research at COBUILD looked at over 4,000 idioms, using The Bank of English (a ‘third generation’ corpus with several hundred million words). Thirty per cent occurred less often than once per ten million words of corpus text; 35 per cent occurred 1-3 times per ten million words, and while around 20 per cent had frequencies of at least one instance per two million words, very few of these reached the one-per-million threshold.
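The arithmetic behind these rates is simple normalisation, and a short sketch may help readers who want to place their own counts on the same scale. The band labels follow the rows of Table 1; the function names, the treatment of boundary values and the example count of 9 hits are assumptions for illustration only, not figures from the study.

```python
# Upper bounds (occurrences per million words) paired with Table 1 labels.
# Treating each upper bound as exclusive is an assumption of this sketch.
BANDS = [
    (0.25, "less than 1 per 4 million words"),
    (1.0, "1 per 1-4 million words"),
    (2.0, "1-2 per million words"),
    (5.0, "2-5 per million words"),
    (10.0, "5-10 per million words"),
    (50.0, "10-50 per million words"),
    (100.0, "50-100 per million words"),
]

def per_million(raw_count, corpus_size):
    """Normalise a raw count to occurrences per million words of text."""
    return raw_count * 1_000_000 / corpus_size

def frequency_band(rate):
    """Return the Table 1 band that a per-million rate falls into."""
    for upper, label in BANDS:
        if rate < upper:
            return label
    return "over 100 per million words"

# Hypothetical item: 9 hits in an 18-million-word corpus.
rate = per_million(9, 18_000_000)
print(rate, "->", frequency_band(rate))
```

Normalising in this way is what allows counts from an 18-million-word corpus to be compared with counts from a corpus of several hundred million words, as in the crosschecking described above.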
Of course, distribution statistics are affected by corpus content, and the following section will consider multi-word items and genre. However, the general tendencies are shown up again and again in different corpora. There are a lot of multi-word items in the language but a lot of them are very infrequent.

Variability in multi-word items

One case where the use of corpora has pointed up some shortcomings in previous idiomatological views relates to the stability of the forms of multi-word items. Lexical phrases and prefabs are often fluid, of course, but idioms also show remarkable degrees of variation. In my study of expressions in an 18-million-word corpus, I found that 40 per cent of the items under investigation regularly varied and were unstable in form: this figure does not include deliberate, jocular, or ad hoc exploitations of idioms as in puns. The findings are discussed in detail in Moon (1994a and forthcoming), and they are entirely borne out by recent work with The Bank of English.

Some of the principal kinds of variation are represented in the following:

British/American variations:
  not touch someone/something with a bargepole (British)
  not touch someone/something with a ten foot pole (American)
  hold the fort (British)
  hold down the fort (American)
varying lexical component:
  burn your boats/bridges
  throw in the towel/sponge
unstable verbs:
  show/declare/reveal your true colours
  cost/pay/spend/charge an arm and a leg
truncation:
  silver lining/every cloud has a silver lining
  last straw/it’s the last straw that breaks the camel’s back
transformation:
  break the ice/ice-breaker/ice-breaking
  blaze a trail/trail-blazer/trail-blazing

In extreme cases, there are no fixed lexical items at all, but merely some sort of lexico-semantic core which can be considered an idiom-schema (Moon, op. cit.
and forthcoming):

  wash your dirty linen/laundry in public (mainly British English)
  air your dirty laundry/linen in public (mainly American English)
  do your dirty washing in public (British English)
  wash/air your dirty linen/laundry
  wash/air your linen/laundry in public
  dirty washing/linen/laundry

This suggests that in any new model of idiom, it might be better to have a notion of ‘preference of form’ or ‘preferred lexical realisation’ rather than ‘fixedness of form’, and better to build in the fact that there is a complex relationship between deep semantics and surface lexis, rather than it all being a simple case of individual anomalous strings with non-compositional meanings.

Multi-word items in text and discourse

Corpora provide one kind of evidence for multi-word items; texts provide another. By looking at the densities of different kinds of multi-word item in particular text types, it can be seen that there are often strong genre preferences. For example, idioms are especially associated with journalism as well as informal conversation. Certain subgenres seem to attract exceptionally heavy use of idioms. McCarthy (1992b: 62) and McCarthy and Carter (1994: 113) point out the frequent use of idioms in horoscopes in journalism, as the following demonstrate:

Taurus: Try as you might to keep your feet on the ground, a relationship is absorbing both your time and imagination. However, wait until after November 3 if you are planning to hitch your wagon to this star. (Weekly Observer [Birmingham] 27 October 1995)

Leo: Pride comes before a fall. Besides that, although colleagues, relatives or partners are inclined to get on your nerves or your wick, no one can really hold a candle to you this week.
(Metronews [Birmingham] November 1995)

In such horoscopes, the use of items with general applicability, rather than specific meanings, enables any number of interpretations on the part of the reader/client, and so maximises the chances of their being relevant to his/her individual situation, as well as lightening the tone and enhancing the interpersonal relationship between writer/expert and reader/client.

I want to explore multi-word items, text and genre by looking briefly at three more extracts. Firstly, one from a handbook on painting:

The binder for oil paint consists of a vegetable drying oil, such as linseed or poppy oil, which dries by absorbing oxygen from the air. This is known as oxidative drying, and is a very slow process. Acrylic paints, in contrast, are physically drying, which means that they dry rapidly through evaporation of the water contained in the binder. As the water evaporates, the acrylic resin particles fuse to form a fairly compact paint film in which each minute particle of pigment is coated in a film of resin. The result is a permanently flexible paint film which is water-resistant, does not yellow and reveals no sign of ageing. (Collins Artist’s Manual 1995: 148)

It is a fairly technical piece of writing, and it includes a lot of compound words: oil paint, poppy oil, oxidative drying (here defined as a term), acrylic paint, water-resistant, as well as one-off compound strings such as paint film. The only other multi-word item is in contrast: a common discourse-structuring device. The multi-word items chosen reflect the genre: technical terms and clear signalling of structure and clause relationships. This choice is entirely predictable.

Secondly, the opening of a report on a soccer match, which reveals quite different distributions:

What a sorry, sorry night for English football. Another lesson, another pupil turned master. Leeds went out 8-3 on aggregate.
The expectation was that Leeds would go down with all guns blazing, pursuing the impossible dream with the gusto which had characterised their spirited fightback in the first leg. It was not to be. PSV Eindhoven are too accomplished, too astute and methodical a team to permit that sort of gung-ho nonsense. Chasing goals to complete the most improbable of recoveries, Leeds succeeded only in leaking them at regular intervals. It was not a pretty sight. Outplayed by men of vastly superior technique, it was, if not embarrassing, then utterly comprehensive.

Seeking to overturn a 5-3 deficit, the Leeds manager Howard Wilkinson was expected to throw caution to the wind. He decided instead that a difficult task would be approached without the former England striker Brian Deane. (The Guardian: 1 November 1995)

This is a highly emotive and evaluative text, where the evaluation is conveyed by the use of words such as the adjectives sorry, spirited, accomplished, astute, methodical, gung-ho and so on, and by rhetorical devices such as the chain of contrasts and repetition, as in What a sorry, sorry night and Another lesson, another pupil turned master. Multi-word items include the compound adjective gung-ho, the phrasal verbs go out (semi-technical in sport) and go down (informal), the fixed phrase on aggregate (semi-technical in sport), and the idioms all guns blazing and throw caution to the wind, as well as the items the impossible dream and not a pretty sight. Note in particular that the idioms (with) all guns blazing and throw caution to the wind are used in complex ways: each evaluates positively what is being set up as the desirable course of action, but each is then contrasted with the actual situation which is evaluated as unsatisfactory. So the idioms both evaluate and form prefaces to evaluations.

Finally, a short extract from a screenplay:

JULES Hash is legal there, right?
[there = in Amsterdam]
VINCENT Yeah, it’s legal, but it ain’t a hundred per cent legal. I mean, you just can’t walk into a restaurant, roll a joint, and start puffin’ away. I mean, they want you to smoke in your home or certain designated places.
JULES Those are hash bars?
VINCENT Yeah, it breaks down like this: it’s legal to own it and, if you’re the proprietor of a hash bar, it’s legal to sell it. It’s legal to carry it, but, but, but that doesn’t matter ’cause – get a load of this, alright – if the cops stop you, it’s illegal for them to search you. I mean that’s a right the cops in Amsterdam don’t have.
(Q. Tarantino, Pulp Fiction, 1994: 14)

This interchange attempts to replicate natural speech patterns. It includes compounds such as hash bar and the deliberately discordant legalese of certain designated places; the intensifier a hundred per cent; the informal phrasal verbs puff away and break down, and prefabs and fixed phrases I mean, get a load of this, and al(l)right. It has a different flavour altogether.

By looking at the multi-word items in texts, their stylistics can be tied into text analyses and enable fuller understanding of the text and often its subtext: this is demonstrated in an analysis of a newspaper editorial in Moon (1994b). It also shows up the fact that multi-word items have important roles with respect to the structure of text. For example, to generalise crudely, compounds typically denote and have high information content, often because they are technical terms or have specific reference. Fixed phrases and prefabs often organise and provide the framework for an utterance or the argument of a text; or they are situationally bound, as in ritualistic formulae of greeting, thanking and so on. Idioms typically evaluate and connote, and are shorthand, rhetorically powerful ways of conveying judgements.
What also appears in studying them in context is that idioms often have other discoursal roles, for example as prefaces or summarisers. The expectation was that Leeds would go down with all guns blazing prefaces a restatement in pursuing the impossible dream with the gusto which had characterised their spirited fightback ... as well as prefacing the confounding of the expectation in It was not to be. Idioms are textual choices as well as semantic choices, and this is a crucial point: they are rhetorically significant. Discussion of this and related points can be found in Moon (1992), McCarthy (1992b), Moon (1994a and forthcoming), and McCarthy and Carter (1994).

Recent work on the teaching of units above the level of single words has indeed emphasised the importance of teaching multi-word items as enabling devices, providing structures and frames, and functioning within real discourse situations. For example, Nattinger and DeCarrico list the following as ‘Summarizers’:

  OK so
  so then
  in a nutshell
  that’s about it/all there is to it
  remember that this means X
  in effect
  to make a long story short
  what I’m trying to say is X
(1992: 95)

These strings are idiomatic, but different in kind. Nattinger and DeCarrico (op. cit.: 113-73) demonstrate convincingly the utility of teaching/learning such units so that the canonical structures and frames of different discourse genres may be better recognised and reproduced. Similarly, Willis (1990: p. v, 83, 115-20) draws attention to discourse-structuring frames in his discussion of a lexically-oriented syllabus for L2 language learners. Coulmas (1979a, 1979b) and Nattinger (1980) also discuss the significance of routines in pedagogy, and more pragmatic approaches to the items.
The examination of texts shows up the crucial importance of this: the multi-word items chosen are not arbitrary or casual, but integral parts of the whole discourse.

Second-language learning perspectives

Multi-word items are typically presented as a problem in teaching and learning a foreign language. Their non-compositionality, whether syntactic, semantic or pragmatic in nature, means that they must be recognised, learned, decoded and encoded as holistic units. The phenomena of multi-word items and idiomaticity are generally held to be universals in natural languages: this is discussed, for example, by Makkai (1972) and Fernando and Flavell (1981). Foreign learners are therefore likely to be aware of similar phenomena in their first language. Carter points out:

The emphasis on problems may in itself be dangerous since it concedes to idiomaticity and fixed expressions a problematic status and this ignores arguments concerning the naturalness and pervasive normality of such ‘universal’ relations in language. (1987: 136)

Similarly, Baker and McCarthy comment:

The more naturally MWUs [multi-word units] are integrated into the syllabus, the less ‘problematic’ they are. (1988: 32)

At the same time, multi-word items are language-specific and they have particular sociocultural connotations and associations: see, for example, Alexander (1985, 1989). Even where analogous multi-word items exist in both L1 and L2, they are unlikely to be exact counterparts, and there may be different constraints on their use: Odlin (1989: 55) points out restrictions on proverb use in English in comparison with other cultures. This contributes to the difficulties facing L2 learners when confronted with multi-word items.

Because many multi-word items, particularly metaphorical multi-word items, are marked, infrequent, and generally considered ‘difficult’, they may be taught sparingly as receptive vocabulary items.
In discussing pedagogical aspects of multi-word items, Gairns and Redman (1986: 35) point out that many L2 speakers are communicating with other L2 speakers rather than L1 speakers, and the use of ‘idioms’ or idiomatic strings may inhibit full understanding: thus the use of multi-word items breaches conventions of politeness. Multi-word items may also be used by L2 speakers in inappropriate semantic or discoursal contexts, leading to further communicative errors. Teachers and learners can avoid the problem by avoiding such items altogether, and in many cases this can be justified because of the relative infrequency of occurrence of many kinds of multi-word item, as discussed above.

Yet this is not a real solution. The appropriate use and interpretation of multi-word items by L2 speakers is a sign of their proficiency, as is pointed out, for example, by Kjellmer (1991), particularly with regard to the creative exploitation and manipulation of multi-word items, and Low (1988), with regard to both institutionalised and non-institutionalised metaphors. It is also precisely the point made in the discussions of routines by Pawley and Syder (1983), Nattinger and DeCarrico (1992), and Coulmas (1979a, 1979b). It is a difficult situation: these items are hard, but they need to be acquired at some stage. And the difficulty of the situation is compounded by the inadequacy or misleadingness of many teaching and reference materials.

Errors in the use of multi-word items can be categorised crudely as formal, pragmatic or stylistic. Formal errors with multi-word items may arise simply through failure to recognise a string as non-compositional. Bensoussan (1992: 106) reports errors by Hebrew-speakers or Arabic-speakers such as to wonder a little for little wonder and the smallest for at least.
There may be lexical errors, and Irujo (1986: 296) reports kill two birds with one rock, swallow it hook, cord, and sinker, and come low or high water in a group of Venezuelan Spanish-speakers; also pull something fast on her (confusing pull a fast one and put something over on someone) and kicked the towel (confusing throw in the towel and kick the bucket). Idioms may be transferred and translated literally, although the resulting calque is not an institutionalised item in the target/L2 language. For example, a personal letter from a French-speaker with a native-like fluency in English contains the following:

This must be totally uninteresting to you. You must have ‘other cats to thrash’.

He uses a calque of French avoir d’autres chats à fouetter instead of the analogous English idiom have other fish to fry. In this case, the writer has demarcated the calque in quotes and may well have intended the choice to be jocular and marked, but the problem of mistranslation and miscommunication remains. In other cases, there may be interference from similar multi-word items in the L1, although, as Irujo (op. cit.: 292) points out, it is not always possible to identify these as discrete:

For example, does put your leg in your mouth result from interference from the similar Spanish idiom meter la pata (‘to put in the leg’), or is it an overextension of the English word foot? (1986: 292)

Malcolm Coulthard (personal communication) observes that the problems of interference and mistranslation may be exacerbated with L1s such as Malay where a single lexeme kaki denotes both ‘foot’ and ‘leg’. Finally, formal errors may arise where syntactic ‘rules’ for multi-word items are not known or observed, so that items are strangely pluralised, or used in an untypical tense, aspect or voice.
Pragmatic errors include those arising from the use of multi-word items in inappropriate discoursal contexts or from misunderstandings of the discoursal situation in some way. This kind of error may be intervarietal as well as interlingual: for example, speakers of British English and speakers of American English have different formulaic routines in everyday interactions such as greetings and shopping. How are you today? may be interpreted by a British speaker as a request for information about their health, but by an American speaker as a simple greeting formula. Pragmatic errors also include the use of multi-word items with inappropriate or aberrant evaluations. For example, sit on the fence might be used to mean ‘stay impartial’ with no negative evaluation, whereas it is usually used to criticise a refusal or failure to commit. Similarly, pearl of wisdom might be used to express approval of what someone has said, whereas it is typically used ironically by native speakers to express contempt. There are obvious problems in encoding if the wrong evaluation is given, and in decoding if the implied evaluation is not understood.

Stylistic errors in the use of multi-word items may arise through use of an excessively marked multi-word item (very rare, dated or over-informal) or through use in an inappropriate genre. In recommending the teaching of only ‘useful’ multi-word items, Gairns and Redman comment:

in deciding what is useful, it is worth considering whether an idiom can be incorporated into the students’ productive vocabulary without seeming incongruous alongside the rest of their language. Certain native speakers might ‘get the ball rolling’, but few foreign learners could carry off this idiom without sounding faintly ridiculous.
(1986: 36)

Such concerns about the use of (marked) multi-word items by non-native speakers and their ‘sounding faintly ridiculous’ may be prompted by experience of these speakers’ subtly infelicitous uses of multi-word items, or overuse of very marked or rare items.

A number of studies report that L2 learners typically avoid using multi-word items, even where the languages are closely related and have apparently parallel expressions. Irujo (op. cit.) observes this in relation to Spanish and English, and Kellerman (1977) in relation to Dutch and English: similarly Hulstijn and Marchena (1989) with respect to Dutch and English phrasal verbs. The most likely reason for this is that non-native speakers are suspicious of apparently cognate or identical items in their two languages. They have learned to be wary of ‘false friends’ and know only too well that there may be subtle but crucial distinctions in meaning, usage, or register which may lead to misreadings and misunderstandings. These distinctions may be imagined; but they may be real, as is the case of a pair cited by Fernando and Flavell (1981: 83): skate on thin ice is conceptually and semantically related to Serbian navući nekoga na tanak led ‘pull someone onto the ice’, but the English implies that someone is voluntarily taking a risk and the Serbian that they are forced to behave in a risky way. A complicated case is pointed out by Platt et al. (1984: 108): Singaporean and Malay English have a multi-word item to shake legs ‘to be idle’, a calque of Malay goyang kaki, which has the opposite connotation and meaning from the similar British English to shake a leg ‘to be active, to get up’.

Teaching multi-word items

Various pedagogical techniques for the acquisition of multi-word items have been suggested.
Celce-Murcia and Rosensweig (1979: 251) suggest that the most appropriate strategy for teaching them is the use of short dialogues: certainly, at more advanced levels the use of contextualised examples would show up discoursal features. Lattey (1986) advocates a pragmatic classification of multi-word items which would foreground correlates and restrictions as well as setting them in appropriate sociocultural frames. Gläser (1988: 264) advocates classifying multi-word items according to subject areas, or the speech acts they encode, as well as making (advanced) learners aware of the range of multi-word item types and of the differing degrees of transparency and opacity. Irujo (op. cit.: 298) recommends that full advantage be taken of latent knowledge of L1 multi-word items. She also suggests that multi-word items to be taught should be ‘carefully chosen on the basis of frequency, need, transparency, and syntactic and semantic simplicity’ (ibid.: 300).

In fact, frequency is often mentioned as a useful criterion for judging which items should be taught. Alexander (1987: 114-5) comments that learners should be made aware of the relative frequencies of multi-word items in the L2, as do Carter (op. cit.) and Carter and McCarthy (1988: 56). The use of corpora helps here to provide a more objective and less idiolectal or idiosyncratic basis for judgements about frequency, since frequency is very hard to assess intuitively. For example, Arnaud (1992a) established a rank-list of the French proverbs best known to his informants, university undergraduates, but the rank-list conflicts markedly with his survey of proverb frequencies in spoken and written data (Arnaud and Moon, 1993).

Table 3 Multi-word items and frequencies

item                teaching intuitions     frequency per million in The Bank of English
tip of my tongue    possibly productive     [illegible]
