Collocations in Science Writing
270pp.
ISBN 3-8233-4945-7.
Language in Performance Series No. 22, Tübingen, Gunter Narr Verlag, 270pp.
Preface.
Christopher Gledhill (2000). Collocations in Science Writing.
CONTENTS
I. Introduction
  1. Aims
  2. Underlying assumptions
  3. Definitions of Collocation
II. Language and Science
  1. The Terminology of Science
  2. The Discourse of Science
  3. The Research Article Genre
    3.1 Titles
    3.2 Abstracts
    3.3 Introductions
    3.4 Methods and Results Sections
    3.5 Discussion Sections
  4. The Discourse Community
    4.1 The Discourse of Cancer Research
    4.2 A Textography of the Pharmaceutical Sciences Department
    4.3 Details of the Survey
III. Collocations and the Corpus
  1. Choice in the Grammar of Texts
  2. The Lexico-grammar
  3. Corpus Linguistics
  4. Corpus Analysis and Languages for Specific Purposes
  5. The Status of Corpus Evidence
  6. The Corpus and the Discourse Community
    6.1 The Language View of the Pharmaceutical Sciences Corpus
    6.2 The Design Criteria of the Corpus
    6.3 Choice of Material in the Corpus
    6.4 Corpus Typology
    6.5 Text Analysis
IV. Collocations and the Research Article
  1. Collocations of Salient Words in the Pharmaceutical Sciences Corpus
I. Introduction
1. Aims
The aim of this book is to explore the language of science writing. The
method is to describe scientific research articles on the basis of a computer-
held text archive (a corpus). While many features of language have been
identified in scientific texts, I examine one phenomenon in particular:
collocation. Collocation is a process by which words combine into larger
chunks of expression. Some collocations involve words which seldom occur
in other combinations (for example: ‘auburn hair’, ‘rancid butter’, ‘ups and
downs’). Others are turns of phrase made up of words that commonly occur
in many combinations (‘of course’, ‘so be it’, ‘as a matter of fact’). These
expressions are all related in phraseology, roughly defined here as ‘the
preferred way of saying things in a particular discourse’ (a formula adapted
from Kennedy 1984). My use of the term differs from that of lexicologists such
as Dobrovol’skij (1992) and Howarth (1998). The notion comes instead from
recent research in discourse analysis (Moon 1998a and 1998b) and happens
to correspond to the everyday use of the term in English to denote skilful
mastery of linguistic formulations (e.g. ‘in the phraseology of diplomatic
circles’). Whatever words we use to talk about these expressions, it is clear
they are a key part of the writing process, and it is impossible for a writer to
be fluent without a thorough knowledge of the phraseology of the particular
field he or she is writing in.
The more specific aim of this book is to demonstrate the role of
collocations in scientific English. Although much research has been carried
out to establish the range of these expressions in English and in other
languages, there remains a great deal to be said about the phraseology of
science, in particular the differences between the typical collocations of the
language as a whole and the kinds of expressions that are used in very
specialist writing. Intuitively, most English speakers are able to guess that
expressions such as ‘ups and downs’ and ‘so be it’ are rare in science writing.
Some expressions or words are seen as more central or stylistically typical in
the language than others, a concept critical to vocabulary studies and known
as centrality (Carter 1998). What distinguishes scientific English from other
Although there are several procedures for the preparation of chiral pyrrolidines
and pyrrolidinomes, the majority of these exhibit poor enantiomeric excesses,
lack versatility, suffer low yields or some combination thereof. Herein, we
describe an efficient asymmetric system of substituted pyrrolidines and
pyrrolidinomes that should find general applicability to a variety of modern
synthetic challenges. (J. Gardiner, 1992 ‘Total synthesis of
Didehydrodideoxythymidine d4T’).
This text has some predictable features of scientific prose and at the same
time has a very distinctive style that one would not necessarily associate with
science writing, or even with natural, well-formed English. The cohesive
devices thereof and herein strike the reader as archaic or legalistic rather than
technical, while some perfectly recognisable English words have taken on a
specialized meaning in novel combinations (exhibit excesses, lack versatility,
suffer low yields, find general applicability). It is clear that even this short
are useful units of expression, their relative value depends on their position
within the overall phraseological system. The use of the passive voice and
technical terms implies certain belief systems that are perpetuated in science
writing, and I hope to be able to put these systems in context from a
phraseological perspective.
Throughout this book, I wish to pursue three basic research aims. The first
is a practical one: to provide a method of describing language in a reliable
and objective manner. This is mainly achieved by the use of a computer-held
archive of texts (the corpus) collected specifically for the purpose of
linguistic analysis, and also by the use of software which calculates word
frequencies (the wordlist program) and collects word patterns (a
concordancer). However, I also try to demonstrate that the specialist corpus
requires a contextual basis, in particular one that takes account of the
processes of production of the corpus (as the property of a community of
scientists, as well as a text in relation to other scientific texts). Thus while the
methodology of this book follows the corpus linguistic approach of Sinclair
(1991), its theoretical basis also draws on theories of discourse and genre,
especially those of Halliday (1985) and Swales (1990). The practical
applications of such a method include the well-documented ability to use the
corpus as a tool for language teaching, as well as the possibility of using a
corpus as an editing tool and as a source of specialist information. One
simple application was suggested by one of my specialist informants: he
wanted to know what information to include in Abstracts and how to express
himself when writing them, because he felt that he needed to follow accepted
practice. Although the field I have chosen is very highly specialised, I also
wish to demonstrate that the methodology is sound and applicable to other
specialist genres.
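As a rough illustration of what a wordlist program and a concordancer do, the following Python sketch counts word frequencies and collects key-word-in-context lines for a chosen node word. It is not the software used for this study: the function names (`tokenize`, `wordlist`, `concordance`), the crude tokenizer and the `width` parameter are all assumptions made for the example. The first sentence of the sample is taken from the Gardiner extract quoted earlier; the second sentence is invented for the demonstration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (a deliberately crude tokenizer)."""
    return re.findall(r"[a-z]+(?:[-'][a-z]+)*", text.lower())


def wordlist(tokens):
    """Return (word, frequency) pairs, most frequent first."""
    return Counter(tokens).most_common()


def concordance(tokens, node, width=4):
    """Return (left context, node, right context) for every occurrence of `node`."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == node:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, tok, right))
    return lines


# First sentence: from the extract quoted in the text. Second sentence: invented.
sample = (
    "The majority of these exhibit poor enantiomeric excesses, "
    "lack versatility, suffer low yields or some combination thereof. "
    "These methods suffer low yields in most cases."
)

tokens = tokenize(sample)
freq = dict(wordlist(tokens))
hits = concordance(tokens, "suffer", width=2)

for left, node, right in hits:
    # Right-align the left context so that the node words line up in a column.
    print(f"{left:>25} [{node}] {right}")
```

A real corpus tool would need a more careful treatment of punctuation, hyphenation and sentence boundaries, but the two operations (counting and aligning contexts) are essentially these.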
The second aim of this book is a theoretical one: to establish a notion of
collocation within a theory of language, in particular to discuss the role of
collocations within texts. While collocations have become a central issue in
the study of vocabulary and lexicology (Carter 1998), their role in discourse
and genre analysis has not yet been fully explored. Although many studies
conceive of collocations as self-contained lexical units, with a
grammatical structure dependent on one lexical item (i.e. less restricted
forms of idioms), a number of studies have emerged recently in which the
collocational properties of words are seen as parts of a wider system (for
example, Francis 1993, Hunston and Francis 1998). It is possible to list the
collocational properties of words in corpus analysis, but it is also necessary to
explain how these expressions are related to each other in a particular
language or discourse. I intend to demonstrate that while science writing may
be very heavily constrained in certain respects, it also allows for considerable
2. Underlying Assumptions
Even within this very specific field, the complexity and degree of
specialisation involved in cancer studies mean that the corpus would be
meaningless without an account of its context. The corpus in turn must
represent a reasonably homogeneous linguistic community. The specific
linguistic practices of a professional group are at the heart of the genre
analysis approach (Swales 1990), although they have received little attention
in mainstream corpus linguistics. On the other hand, genre analysis has only
recently begun to use computer-based corpora. My hypothesis is that any
distinctive ‘style’ or phraseology I discover can be attributed to a broad
community of scientists in pharmacology and cancer research and contribute
to a description of the research article genre. Section II in particular explores
these themes and discusses in detail the context of the corpus.
3. Definitions of Collocation
dictionary makers, for example, examine the way lexical words behave in
certain combinations. The adjectives strong and powerful can thus be seen to
have a similar meaning but a different range of use with certain nouns: strong
argument, powerful argument versus strong tea / *powerful tea, *strong car /
powerful car. Once such a restriction is identified for a pair of words, we are
dealing with some form of collocation.
However, as the word ‘familiar’ suggests in my working definition, there
is more to collocation than the combination of two or more words. In the
following discussion, I attempt to synthesise three different ways of
categorising and defining the notion of collocation: Halliday’s statistical /
textual view, the semantic / syntactic tradition in lexicology, and the
discoursal / rhetorical model from discourse analysis. I then go on to propose
an overall model of phraseology which serves as a basis for the analysis
carried out in the rest of the book. In the corpus analysis sections of this
book, Halliday’s statistical definition is specifically taken as the first and
simplest stage of my analysis, but is then supplemented by further stages of
interpretation in order to determine the structural and rhetorical significance
of the collocations identified in the corpus.
A collocate can thus simply be seen as any word which co-occurs within an
arbitrarily determined distance or span of a central word or node. Collocation
is thus considered to be the frequency with which collocates co-occur with
one node relative to their frequency of collocation with other nodes. From the
point of view of many corpus linguists, all that separates collocation from
mere word co-occurrence is the statistical level at which the researcher is
happy to say that the co-occurrence is not accidental. This approach is also
‘textual’ in that it relies solely on the ability of the computer program to
analyse large amounts of computer-readable texts. Sinclair (1991:68) shows
this by noting that the independent probability of ‘set’ collocating with ‘off’
in the Cobuild corpus is just one in a million (1,855 instances of ‘set’
multiplied by 556 instances of ‘off’ from a total of 7.3 million words). Yet the
actual frequency of collocation is around 550 instances (that is: 70 in a
million). The expression ‘set off’ can thus be considered a significant
collocation without appeal to other semantic or lexical considerations
(1987b:153).
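The statistical definition can be made concrete with a short Python sketch. It does two things: it counts the collocates of a node within a fixed span, and it compares the observed frequency of a pair with the frequency expected if the two words were distributed independently. The corpus figures are those quoted in the paragraph above; the function names, the toy sentence and the use of a log-ratio (mutual information) score are assumptions for the example, since the text does not specify which statistic is to be applied.

```python
import math
from collections import Counter


def collocates(tokens, node, span=4):
    """Count every word occurring within `span` tokens either side of `node`."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            window = tokens[max(0, i - span):i] + tokens[i + 1:i + 1 + span]
            counts.update(window)
    return counts


def expected_cooccurrences(freq_node, freq_collocate, corpus_size):
    """Adjacent co-occurrences expected if the two words were independent."""
    return freq_node * freq_collocate / corpus_size


# Toy sentence, invented for the demonstration.
toy = "the alarm was set off when they set off early and set the table".split()
toy_counts = collocates(toy, "set", span=1)

# Figures as quoted in the text above.
FREQ_SET = 1855
FREQ_OFF = 556
CORPUS_SIZE = 7_300_000
OBSERVED_SET_OFF = 550

expected = expected_cooccurrences(FREQ_SET, FREQ_OFF, CORPUS_SIZE)
ratio = OBSERVED_SET_OFF / expected
mi = math.log2(ratio)  # log2 of observed/expected (an assumed choice of measure)

print(f"collocates of 'set' within one word: {dict(toy_counts)}")
print(f"expected by chance: {expected:.3f} occurrences in the whole corpus")
print(f"observed/expected ratio: {ratio:.0f}, log2 ratio: {mi:.2f}")
```

On the figures quoted, chance alone predicts well under one adjacent occurrence of the pair in the whole corpus, so an observed count in the hundreds is several thousand times higher than expected. Where the researcher draws the line between significant and accidental co-occurrence remains a matter of judgement, as the paragraph above notes.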
This perspective essentially emphasises collocation as co-occurrence
(words which frequently combine) and recurrence (combinations which
frequently occur in language). The notion of statistical collocation is integral
to Halliday’s theory of discourse, which is discussed in Section III. It
is sufficient to note here that a statistical view of language allows the linguist
to identify patterns that would not normally be recognised using traditional
categories. The textual view of collocation also emphasises the fact that
collocations are not disembodied lexical units inserted into the body of a text
without modification, but are the result of reformulations and paraphrases
which have developed throughout the length of a text. A textual collocation is
likely to have a specific textual function or may occur in a rather restricted
set of contexts. These expressions can be seen to be couched seamlessly in
the surrounding text, and in many of the examples we see below, the
collocational patterns of a specific phrase are motivated or triggered by other
phrases which appear to be at some distance (a phenomenon observed by
Phillips 1985 and Hoey 1991). This is what is meant by ‘long-range
collocation’.
I will use the term COLLOCATION as the most general term to refer to all
types of fixed combinations of lexical items. In this view idioms are a
special subclass of collocations, to wit those collocations with a non-
compositional, or opaque semantics. An idiom might even be defined as any
grammatical form whose meaning is not deducible from its structure. In this
view all morphemes are idioms. (van der Wouden 1997:9).
Makkai (1992) has similarly argued that collocations and idioms can be seen
as extended forms of words. Kjellmer makes a similar point:
Van der Wouden further makes the point that idioms and collocations share a
number of properties, not least of which is the ability to contain analogies
which are not carried on into the rest of the language system:
[...] you cannot predict that the meaning of sleep like a log will denote an
intense form of sleeping, but after you have learned what it means, you see
that like a log is an intensifier. The essence of collocation is that the
assignment of like a log to the meaning ‘very’ does not feed other
combinations. So even though we have a meaning for it, that meaning is
only valid in a certain collocation [...] (van der Wouden 1997:54-55).
From this discussion, it emerges that the distinction between idiom and
collocation is difficult to justify on purely semantic or syntagmatic grounds.
Instead, collocation constitutes a general system of abstract relations which
underpin much phraseology in the language, and range from relatively free to
relatively fixed expressions. A different perspective, although still within our
‘semantic / syntactic’ framework, relates collocational patterns to the wider
grammatical system, as in the work of Sinclair (1991). For example, Renouf
and Sinclair (1991) have noted that the meaning of a lexical item can be
predicted by the presence of grammatical items and the sequence in which
they are arranged. Thus in expressions such as an X of, X is often a quantity,
or in too Y in the Z, Y and Z are often time expressions (such sequences are
termed collocational frameworks). Louw (1993) has noted that clusters of
lexical collocations often share a similar semantic profile or ‘semantic
prosody’. Thus the NP subjects of the phrasal verb set in belong invariably to
a semantic field with negative associations (the bad weather, gangrene, the
rot, depression ... sets in). According to this perspective, the grammatical
patterns of co-occurrence are an intrinsic part of the meaning of an expression, and any
item which is inserted into the pattern can be re-interpreted in terms of the
existing collocational framework (e.g. a cacophony of musicians [collective],
the Labour party have set in [negative connotation]).
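A collocational framework of the kind described above (a fixed pair of grammatical words around an open slot) can be searched for with a regular expression. The Python sketch below collects the words that fill the slot in `a/an X of`. The function name `framework_fillers`, its parameters and the sample sentences are assumptions for the example; only the phrase ‘a cacophony of musicians’ comes from the text. The `left` and `right` arguments are treated as regular expression fragments.

```python
import re
from collections import Counter


def framework_fillers(text, left, right):
    """Count the single words that fill the slot in the framework `left X right`."""
    pattern = re.compile(
        rf"\b(?:{left})\s+([a-z]+(?:-[a-z]+)*)\s+(?:{right})\b",
        re.IGNORECASE,
    )
    return Counter(m.group(1).lower() for m in pattern.finditer(text))


# Invented sample, apart from 'a cacophony of musicians'.
sample = (
    "A number of studies report an increase of activity. "
    "A variety of methods gave a number of products and a cacophony of musicians."
)

fillers = framework_fillers(sample, left="an?", right="of")

for word, count in fillers.most_common():
    print(f"a/an {word} of: {count}")
```

Listing the fillers in this way is only the first step: deciding whether they form a semantic class (quantities, collectives and so on) still requires interpretation by the analyst.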
In a large-scale study of verb complementation, Hunston and Francis
(1998) similarly make a specific link between the grammatical form of an
expression (its underlying word class pattern) and its meaning, claiming that
the pattern is part of the meaning of the expression. Hunston and Francis
identify a number of collocations which share specific grammatical patterns
and yet also display a closely related meaning. Here is one example:
...sense and pattern tend to be associated with each other, such that a
particular sense of a verb may be identified by its pattern. The verb recover
has two main senses: ‘to get better’ following an illness or period of
unhappiness, and ‘to get back’ something that was lost. The first of these
senses has the pattern ‘V from n’ (e.g. He is recovering from a knee injury)
[...] and ‘V’ (e.g. It took her three days to recover), whilst the second has
the pattern ‘V n’ (e.g. Police... recovered stolen goods). (Hunston and
Francis 1998:51).
The principle of idiom is that a language user has available to him or her a
large number of semi-preconstructed phrases that constitute single choices,
even though they might appear to be analysable into segments. To some
extent, this may reflect the recurrence of similar situations in human affairs;
it may illustrate a natural tendency to economy of effort or it may be
From the ‘semantic / syntactic’ perspective, we have seen that the notion of
collocation has been extended from traditional restricted collocations and
idioms (curry favour, strike a chord) to less conventional notions such as
grammatical collocation (linking grammatical items with lexical items, as in
phrasal verbs refer to, answer for) and de-lexical verbs (have a break, take a
decision). Many of these patterns can be seen to obey underlying lexical
relationships. The notion has recently been applied to a much wider category
of expression following work in corpus analysis, including semantic prosody
(clusters of semantically related words: push through [a reform, a project, a
law...]), collocational frameworks (lexical and grammatical collocation: not
only... but also, find / make it [easy, difficult, hard, impossible] to + clause)
and colligation (collocation between grammatical categories, e.g. the set of
nouns that can introduce NP complement clauses: the idea, conviction, belief,
thought that). These patterns demonstrate the close correlation between
syntax and semantics and are seen as a confirmation of Halliday’s (1985)
notion of a lexico-grammar: a theory of lexis and grammar as an interrelated
continuum rather than as separate levels.
Fillmore and Atkins (1994) and Kay and Fillmore (1999) have similarly
questioned the need for a distinction between idiom and collocation on the
grounds of syntactic and semantic frozenness. Fillmore, Kay and O’Connor
emphasised the fact that collocations are culturally salient items which need
to be learnt as part of the language. According to their well-known definition,
fixed expressions are:
[…] phenomena larger than words, which are like words in that they have to
be learned separately as individual facts about pieces of the language, but
which also have grammatical structure [and] interact in important ways with
the rest of the language. (Fillmore, Kay and O’Connor 1988:501)
In a similar approach, Pawley and Syder have been influential in the area of
language learning theory, and were among the first to emphasise that
conversational gambits in natural speech were speech acts organised around
fixed expressions of the type it’s easy to talk (a reprimand for some
criticism), she’s busy right now (denying access by telephone) and I thought
you’d never ask (expressing relief after permission has been granted)
(1983:307). They pointed out that these expressions are effectively social
institutions, and have specific cultural functions in the language:
Lexical phrases are parts of language that often have clearly defined roles in
guiding the overall discourse. In particular, they are the primary markers
which signal the direction of discourse, whether spoken or written. When
they serve as discourse devices, their function is to signal, for instance,
whether the information to follow is in contrast to, in addition to or is an
example of information that is to proceed. (Nattinger and DeCarrico
1992:60)
To give another example, the British English greetings How do you do?,
How are you?, How do?, How’s it going?, How goes it?, Wotcha! etc. vary
from unmarked to marked in different contexts. The native speaker knows the
core items (depending on dialect) and knows implicitly their rhetorical value
in the phraseological system. How do you do? is felt to be the standard
prototypical form, but this does not mean that it is the unmarked, neutral
choice used in the majority of circumstances. The corollary of this is that
prototypical expressions do not correspond to typical expressions. In
addition, a notion of what constitutes ‘collocation’ or ‘idiom’ may also
depend on an appropriate register or style, and part of the meaning of an
idiomatic phrase is the specific context of use in which it is deemed to be
appropriate (a pragmatic dimension rather than a strictly textual one). Thus
from a discourse perspective, idioms (as relatively marked expressions) and
collocations (as relatively unmarked expressions) might not be fixed
categories, but may be perceived differently in different contexts.
Collocations can be said to have a less fixed pragmatic set of uses than
idioms; while lexical phrases, with their specific rhetorical roles, occupy a
position somewhere in-between. From this basic premise, we can postulate a
shifting rhetorical continuum between the usual phraseology of collocation
and other more unusual expressions (including original expressions which
break with collocational convention or stylistically marked idioms belonging
to another discourse).
of collocation in specialised texts and set out more fully Halliday’s concept
of the lexico-grammar.
The notion that grammatical items are closed class words will serve as my
basic rule-of-thumb in order to identify these items. However, I also wish to
explore the possibility that high frequency items (such as auxiliary verbs is
and has) play an important role in the formation of collocations and fixed
expressions, and assume therefore that such high frequency items are for the
purposes of my analysis ‘grammatical’. This frequency-based approach to
lexis is consistent with Sinclair’s view, and allows for a more nuanced
analysis of words which are often considered to be at the intersection
between grammar and lexis.
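The rule-of-thumb just described can be sketched in a few lines of Python: an item counts as ‘grammatical’ if it belongs to a closed-class list or if its relative frequency passes a threshold. Everything specific here is an assumption made for the example: the function name `grammatical_items`, the short and far from exhaustive closed-class list, the threshold value and the invented sample sentence. The text does not state a numerical cut-off.

```python
from collections import Counter

# Illustrative closed-class list only; a real list would be much longer.
CLOSED_CLASS = {"the", "and", "of", "in"}


def grammatical_items(tokens, closed_class=CLOSED_CLASS, min_relative_freq=0.2):
    """Return items treated as grammatical: closed-class words plus high-frequency words."""
    counts = Counter(tokens)
    total = len(tokens)
    frequent = {w for w, c in counts.items() if c / total >= min_relative_freq}
    attested_closed = closed_class & set(counts)
    return attested_closed | frequent


# Invented sample sentence.
tokens = "the tumour is treated and the drug is tested and the cell is grown".split()
items = grammatical_items(tokens)

print(sorted(items))
```

In this sample the auxiliary is does not appear in the closed-class list but is admitted on frequency alone, which is the effect the frequency-based approach is meant to have.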
This chapter sets the scene for the corpus design in section III and data
analysis in section IV. The aim here is to justify my specific object of enquiry
(science writing in cancer research) and my methodology (an approach
within discourse analysis). I set out here the theoretical basis for a corpus
analysis of cancer research articles. I explain briefly the relationship between
science and language from the point of view of terminology and then from
linguistics (especially genre analysis). In order to put the research article
genre in context, I then discuss a specific discourse community: the
Pharmaceutical Sciences Department, Aston University.
The language of science is a fruitful and well-documented area of
research, most notably in philosophy, sociology and linguistics. The role of
language in science was the object of enquiry of philosophers concerned with
hermeneutics and the reflective function of science (Gadamer, Wittgenstein
and Foucault) as well as theories of knowledge and scientific epistemology
(Bachelard, Piaget and Kuhn). In sociology there has been much research on
the discourse of science in relation to science policy and the public
understanding of science. There is particular interest in the ways in which
technical issues are affected by economics, politics and personal agendas
(Kevles 1995 sets out a comprehensive history of the discourse of cancer
research). For the most part, research on science writing in linguistics has
been the realm of applied linguistics, in particular the divergent fields of
terminology and discourse analysis. The two approaches can be summarised
as follows:
2) Discourse analysis discusses the activity of science writing and the role of
language use among specialists. Applied research on scientific discourse is
known as English for Specific Purposes (ESP: Swales 1981b, 1990), with the
emphasis being on the problems associated with the use of a specific national
language (English) in international science. In applied linguistics, ESP and
‘English for Academic Purposes’ have become widely recognised fields of
research, with dedicated academic journals (English for Specific Purposes,
ESPecialist, Fachsprache, Anglais de Spécialité). Many specialist areas have
come under scrutiny, especially in the medical sciences and areas such as
doctor-patient dialogue and the popularisation of science. The field has
several theoretical traditions, and applications tend to centre on language
teaching.
However, an alternative view has emerged, in which the central concept of the
term has been challenged, and the ‘special’ nature of the LSP has been
eroded, largely because of the increasing tendency for sciences to become
interdisciplinary. The emphasis has turned instead to ‘knowledge-banks’
rather than ‘term-banks’ (Papegaaij and Schubert 1988, Thomas 1993). Many
terminologists see the LSP as a variety of the general language, its difference
lying in functionality rather than abstraction or degree of specialism.
Following the functional linguists Hjelmslev, Bühler and Halliday, Sager,
Dungworth and McDonald (1980) consider the function of terminology and
the LSP within a system of discourses. Science writing is defined not just in
terms of conceptual abstraction, but in terms of its relation to different types
of discourse, and to different structures of knowledge. Firstly, conceptual
discourse is concerned with reference beyond the environment of the text
into the abstract conceptual world of scientific knowledge. Perceptual
discourse, on the other hand, involves reference to the immediate physical and
temporal context of the text itself. Finally, metalinguistic discourse
(including extratextual comment) is said to be untypical of scientific text and is
a resource that appears to fade away as the language becomes increasingly
graphic and conceptual. Sager et al. also make an interesting distinction
between the LSP and register (in the Hallidayan sense). Halliday uses
register to refer to the traditional ‘modes of discourse’ such as the language
• Scientific terms follow a chain of definition from LGP words to LSP terms.
• Scientific terms enjoy an absence of ambiguity in context and out of
context.
• Scientific terms avoid figurative or metaphorical meanings.
• Scientific terms have origins that can be definitely traced.
Béjoint asks whether such terms as key idea pointer, bone tissue or bacterial
culture can be considered unambiguous out of context, can ever be traced
back to original definitions or usages, or can be held as un-metaphorical.
Béjoint challenges the underlying assumption that greater precision can be
This discussion leads us to examine the scientific text itself and its role in the
formation of terminology. The Canadian linguist Pavel (1993 a / b) has
emphasised the role of the research article in the formation of terminology.
She postulates that terminological change is, contrary to stereotype,
unplanned and opportunistic, and largely emerges from the processes of
scientific writing itself. Other linguists (such as Linstromberg 1991) have
noted that metaphor is a key feature of science writing. In addition, Vidalenc
(1997) points out that the ‘natural language’ philosophers preferred simple
metaphors such as Aristotle’s substitutions and comparisons or Austin’s
speech acts. Salager-Meyer (1990a:354) argues that metaphors can become
dominant in specific research areas. She reports that 70% of head nouns in
medical terminology tend to be metaphorical collocations involving
structures (nerve roots, abdominal walls) while the rest involve processes,
functions and relations (migratory pain, vehicles of infection). In addition,
terminologists such as Koch (1991) and Pavel see the particular choice of a
metaphor as vital in the long-term chances of survival of a specific term, a
neo-Darwinian notion evoked by such writers as Cavalli-Sforza and Felman
(1989) on the cultural evolution of discourse and Chesterman (1997) in his
discussion of collocations and memes as translation units.
Pavel specifically examines the effects of interdisciplinary research in the
terminology of fractal science. Since fractal imagery is largely adapted as
Thus the role of the terminologist has moved from providing definitions and
basic grammatical features to setting out a phraseology of meaning. Besides
constituting patterns of particular importance in the conceptualisation of
fractal imagery, Pavel considers the role of these collocations within the text.
Her claim is that new formulations effectively reconstruct the terminological
knowledge structure of science. As new phrases become neologisms and
accepted terms, these in turn bring along their own suite of associated
metaphors, sometimes from different disciplines. Pavel refers to these
metaphors as LSP collocations (1993a:29). She recalls the example of the
theatre in one model of artificial intelligence (namely: Schank and Abelson
1977), where terms such as ‘scripts’, ‘actors’, ‘thematic roles’, ‘frames’ and
‘props’ help to conceptualise the brain as ‘a theater of mental representations’
(1993a:25). Such terms not only permit analogy in creating a new conceptual
space, but more importantly they bring along the phraseological patterns
from their original context. These terms are initiated, negotiated and finally
accepted by the wider scientific community:
...languages are seen not only as social tools that human communities have
created and are continually refining for communication purposes, but also
as agents that constantly condition individual behaviour by virtue of social
interaction in historically, geographically, and culturally defined settings.
(Pavel 1993a:23)
Even Descartes, that great and passionate advocate of method and certainty,
is in all his writings an author who uses the means of rhetoric in a
magnificent fashion. There can be no doubt about the fundamental function
of rhetoric within social life. But one may go further, in view of the
ubiquity of rhetoric, to defend the primordial claims of rhetoric over against
Discourse analysts therefore reject the term ‘special’ in LSP, and refer
instead to terms such as variety (Richards and Schmidt 1983). A variety is
commonly seen as a type of language which varies within a general system,
and there is no implication that it is limited in function to a specialism or set
apart from what is considered to be the general language system. As such it
serves as a generic term. Much work on scientific writing however has been
conducted on the basis of the LSP (as we have seen in terminology). Other
terms have come to be used for specific texts including ‘register’ (Halliday
1966, Biber 1996), ‘genre’ (Swales 1990), ‘text type’ (de Beaugrande and
Dressler 1981:85), ‘sublanguage’ (Lehrberger 1982, McEnery and Wilson
1996) and ‘special text unit’ (Sager et al. 1980). As might be expected, none
of these terms is exactly interchangeable and each carries with it a different
view of the relation between the general language and the specific variety.
Sager et al.’s ‘special text unit’ demonstrates the problems that emerge
when linguists attempt to pin down the variable features of texts. In this
functionalist model, the primary functions of texts are broken down into
categories: status and topic. ‘Status’ is determined by the knowledge
structure which a text aims to represent and modify. ‘Aspect’ is a subcategory
of status: the use to which the text is to be put (administrative, pedagogical,
descriptive...) (Sager et al. 1980:102). ‘Mode’ is also a subcategory of
‘status’, representing formality and planning involved in the text. ‘Topic’
involves participants’ knowledge and level of reference (from specialised to
popular) and also includes ‘field’ (from the very broad field of physics to the
narrower field of nuclear physics). Sager et al. (1980:120) claim that these
dimensions manifest themselves in various prototypical categories or special
text units:
question is termed a genre. The linguistic characteristics of the genre are seen
as secondary to its status in relation to other genres and its value depends on
the institutional framework of the scientists or specialists concerned. These
groups are in turn defined as discourse communities: ‘...socio-rhetorical
networks that form in order to work towards sets of common goals.’
(1990:9). Thus while speech communities are defined by the language they
speak (with different registers and dialects), discourse communities are
defined by what they are talking about (with different genres and jargons).
The discourse community always consists of individuals with different
interests and specialisms, but the group is also defined by common aims
and the fact that all members are aware of the central issues and debates that
preoccupy the community as a whole, even if they do not actually subscribe to
them all. Political parties, trade unions, professional associations, commercial
companies, government organisations, campaigning lobbies, and voluntary
interest groups are therefore all considered to be discourse communities.
Successful discourse communities evolve efficient mechanisms of interaction
and control. These mechanisms include ‘control of technical vocabulary’ and
the establishment of a professional ‘hierarchy of expertise’ (Swales 1990:32).
The texts used by the group, its genres, are central mechanisms of interaction
within the system and are seen as ‘...the properties of discourse
communities... classes of communicative events which typically possess the
features of stability, [rhetorical] move recognition and so on.’ (1990:9). In
other words, a genre is a particular language practice, a text type with a
variable but implicitly recognised set of linguistic features. Scientific
communities recognise a complex system of genres: text books, review
articles, peer-review articles, research journals, grant proposals, lab reports,
calls for papers, conferences, seminars, newsletters and so on. Unlike other
definitions of genre which we encounter below (Biber 1994, for example),
Swales’ notion of genre implies that there is a discourse community behind it
regardless of linguistic or functional definitions of the text.
The language of the genre is seen as very heavily constrained, at least
from the point of view of rhetorical structure and effect (Swales places less
emphasis on grammar and vocabulary). Swales claims that his analysis of
textual genres ultimately stems from Propp’s (1928) ‘Morphology of the
Folktale’. Folktales work because their readers are familiar with conventional
rhetorical events, so readers expect a damsel in distress (a conventional plot
device) or the couple lived happily ever after (a conventional ending). The
point is that these events have conventionalised (arbitrary) wording, and are
highly restricted in content and outcome. Research articles in science have
similar devices, which Swales terms ‘moves’ (described below). Swales thus
sees the genre as a means to an end, fulfilling a definite set of communicative
Swales’ approach has been influential, but it is so different from that of other
linguists that the basic terminology and the theories underlying the different
terms have become confused. The originality of Swales’ analysis is that
genres are defined in relation to other genres, not just by a series of internal
linguistic features or external social functions. This differentiates genre from
sublanguage used as a textual category by several corpus linguists, including
Barnbrook (1996), McEnery and Wilson (1996) and Pearson (1998). As the
term sublanguage itself is derived from terminology rather than discourse
analysis, many of these works are oriented to a linguistic description of
terminology, or tend to analyse very broad categories of text rather than
specific text types. Barnbrook (1996: 122) describes a sublanguage as
having:
This definition combines features of the LSP or ‘special language’ and the
‘artificial language’ (‘the use of special symbols’) as well as bringing other
important characteristics into the picture (such as unusual features of text
structure and ‘deviant’ grammar).
Rather confusingly, Swales’ view of genre also differs from the work of
Biber and Finegan (1994) where the term register is seen as a social
convention, and conversely genre is seen as a regular set of inter-related
linguistic features. We have also seen that register can be usefully defined as
the text types used to communicate between the discourse community and the
general speech community, a concept that is more in line with Halliday’s
view of register discussed below (Sager et al. 1998). Since Biber’s concept of
The claim advanced in this book is that discourse analysis provides a more
accurate account of the context and grammatical features of language
varieties than the register approach adopted elsewhere (Biber 1994, for
example). In Swales’ analysis, and unlike Biber’s (1986) concept of register
or Barnbrook’s (1996) use of the term ‘sublanguage’, the principle is that the
same grammatical feature may function differently in different contexts. Any
evidence to suggest that certain features function differently in the general
language and the specialist variety tends to undermine Biber’s view of
register, which places a high premium on identifying differing distributions
of linguistic features and grammatical categories. Biber’s ‘multifactorial’
approach has been to analyse large groups of grammatical features (from a
tagged corpus, such as passives and relative clauses) and to correlate their
relative frequency with certain intuitive internal functions of the texts
involved (such as abstraction, narrative structure). This has led to important
work on specialist texts (Biber, Conrad and Reppen 1998). However, this
approach does not account for the fact that the same grammatical features
may be present in two text corpora but function differently, in which case
linguistic cluster analysis is incapable of accounting for these features of the
genre. Swales therefore calls to attention the very specific means by which
specialist discourse appropriates existing linguistic features and changes their
nature. He calls this the discourse coherence of a linguistic feature, and the
principle is derived from Firth’s theory of meaning.
Swales (1981c) first demonstrated discourse coherence in his analysis of
the past participle in technical English. He found that participles function
mostly to bring the reader’s attention to non-linguistic text (a table, figure or
illustration as in the curve shown, the list given) or are used idiomatically as
premodifiers (as in a given reaction) in a similar way to classifiers as in a
certain reaction. He argued that these uses are particular to scientific
discourse, and have developed a unique function within the research article
genre. I have similarly noted (Gledhill 1995b) that numbers are used
throughout pharmaceutical research articles as ‘pronominals’, replacing
references to long chemical names. This has consequences for the rest of the
pronominal system of the text (especially the range of anaphora, as noted by
Liddy et al. 1987), and presumably implies that pronouns have a different
profile of use in chemistry texts. These examples certainly fit Barnbrook’s
description of ‘unusual features of text structure’ and perhaps also ‘lexical,
Another reason for adopting the genre analysis approach is that Swales has
established a tradition of analysing research article sections, not just research
articles as a whole. Such attention to ‘subgenres’ has only been tentatively
explored in recent corpus work (Biber, Conrad and Reppen 1998). Before the
introduction of large corpora, Swales (1990:134) showed that rhetorical
sections (Introductions, Methods and so on) have consistent and predictable
rhetorical structures of their own. While the model is well known and has in
many respects been surpassed by later work (Swales 1998), it remains the
first characterisation of science writing that emphasises differences in
wording and style rather than the assumption that the text has a consistent
system of expression throughout. Swales’ work was followed by a number of
studies extending his concepts to the entire research article genre and also
examining different lexico-grammatical features from the point of view of
‘discourse coherence’. In order to give a broad picture of the research article,
I summarise some of these studies below, separating those studies which
examine the research article as a whole from those which explore specific
sections. Since my main method is to analyse the role of collocation from one
section to the next, it is important to set out here a picture of the general
linguistic properties of each part of the research article in turn. To avoid
confusion subsections of the text (known as rhetorical sections) are
henceforth indicated by an initial capital letter: Title - Abstract - Introduction
- Methods - Results - Discussion.
Swales’ work remains the most detailed analysis of the inner workings of the
research article genre. In the context of the massive flow of written data in
science, Swales sees refereed journals as the ‘traffic officers’ (1991:94) of
scientific information: articles are channelled to the appropriate journals on
the basis of how original or significant they are perceived to be in the
discourse community. In the case of the research article each specialism has
its own conventions regarding graphic and textual format as well as devices
for academic accreditation and citation (Swales 1990:6). Despite these
differences, Swales claims that there is a fundamental underlying rhetorical
system.
At the discourse level, Swales identifies a stereotypical rhetorical structure
that is analogous to the knowledge structures of Schank and Abelson’s
(1977) scripts and Van Dijk and Kintsch’s (1989) textual macrostructure. In
particular, Swales (1981a, 1990) proposes that the rhetorical structure of
Introductions in research articles from a series of different specialisms can be
characterised by a macrostructure of one global purpose: to create a research
space (the CARS model). This aim is realised in obligatory and optional
stages in the argumentation of the text that Swales terms Moves (obligatory)
and Steps (optional) (1990:137). Since moves are rhetorical in nature they
represent a summary of many different pathways that the argument of a text
can go through. The first move, for example, ‘establish a territory’ is made up
of a series of steps which introduce specific areas of the research field as
important and relevant to the study, as well as stating the general topic of the
study and items of previous literature.
The linguistic features of move 1 include time references to previous
research (adjuncts of time such as recently, and use of the present perfect),
evaluative statements of importance or interest to the field (it is well-known
that) (1990:144) or, specifically in step 2 statements of amount or quality of
evidence established in the field (1990:145). In step 3 the linguistic
resources consist of a specification of previous findings followed by a
temporal qualification, reporting phrases (was found to be) or reporting verbs
(show, demonstrate, suggest), and bibliographic attribution (1990:149). The
A number of other linguistic studies have been carried out on the research
article as a whole. Some work has been carried out on the distribution of
lexical items in research articles (Inman 1978, Love 1993). Most research on
IMRD sections has however concentrated on rhetorical move analysis or
theme-rheme patterns (Nwogu 1989, Nwogu and Bloor 1991). In a different
direction, Atkinson (1992) has traced the historical development of the
scientific paper and the evolution of the IMRD sections (the core sections of
the research article) from letters to editors in the Edinburgh Medical Journal.
Many studies have established that grammatical features (most often
verbal tense, voice, or modality) are associated with specific rhetorical
functions, such as statements about the use of the passive or authorial
comment. Gerbert (1970) for example, analysed 24 verbs in English technical
writing, and found that the present represents a limited set of meanings
(scientific laws, processes and repeated actions, definitions, descriptions,
observations and material properties). The perfect aspect is used to indicate
relevance to the research process. Oster (1981) found that non-finite verbs
tend to be used for attribution and definition as pre-modifiers (tumor-derived
[...] the differences in the communicative purpose and its textual realization
between medical research types has been much greater than previously
assumed [...] (I. A. Williams 1996:195).
largely because scientific texts are treated as whole units and placed together
in order to arrive at coverage of several fields (with the assumption that they
are all related by degree of specialism).
However, there has been much corpus analysis of research articles in the
fields of terminology (Thomas 1993, Pearson 1996) and there is a growing
amount of corpus-based discourse analysis. In a corpus analysis of eleven
texts on oceanography, Banks (1994b) analyses the distribution of the
passive, personal pronouns, modal verbs and lexical hedging (in verbs and
adverbs) across rhetorical sections. He finds that there are phraseological
differences between modals such as can and may and that a high proportion
(69%) of modalised mental process verbs are used in the passive (it is
believed that...). He also notes that the lexical hedging of verbs with adverbs
(probably, generally) is so widespread towards the latter part of articles
(Results and Discussion sections) that their effect is at times redundant.
Myers (1989) has argued that such hedging is obligatory when the author
expresses some imposition on the community (claims, denials, coining of
new terms, apologising for speculation). More recently, Varttala (1999) has
compared hedging devices in a 50 text corpus of popular science and
technical research articles. All of this evidence of ‘hedging’ suggests that a
conventional voice has become entrenched in science writing, a point that is
supported by work on collocations and phraseology.
Corpus analysis on lexical collocation in research articles has also been
undertaken, either taking a phraseological perspective or concentrating on
typical NP complements of verbs. Zambrano (1987) analyses the
phraseological patterns common to Abstracts and Discussion sections,
including phrases identifying general problems, concerns of the research
article (this article / paper / study etc. shows /suggests / investigates etc.),
findings (involving nominal comparatives with show) and implications
(involving a high degree of modality: the possibility that, the fact that).
Master (1991) finds that inanimate nouns (shuttle, particle) are more likely to
be the subjects of active verbs than passives, and such verbs are more likely
to be verbs of causal processes (cause, affect, prevent) than reporting verbs
(show, indicate, suggest) (a distinction echoed in the PSC - the research
article corpus, as described later). Other work concentrates on the clause
patterns associated with certain families of nouns (Dubois 1981, Francis and
Kramer-Dahl 1991).
A small number of studies address the use of grammatical items and
cohesive devices. Thyman (1981) proposes that the description of non-linear
(simultaneous) events in scientific writing has led to changes in the use of
specific cohesive devices, such as the classifying and defining function of
this. This is widely used in the process of reformulation, a point noted in the
3.1 Titles
Very few studies have concentrated on research article Titles in their own
right. Apart from observations of their highly condensed nominal style, little
is known about the relationship between the Title and the rest of the research
article. Generally speaking, Titles are seen as sources for keywords in the
information sciences. For example, Diodato (1982) has studied the relative
frequency of Title words in 50 chemistry, history, mathematics and
philosophy papers. Her findings indicate that 70-80% of all Title words occur
in the Abstracts and the first paragraphs of articles. She finds that chemistry
papers are the only papers to have an increase in the amount of Title words
throughout the paper, with the largest increase in the final reference sections.
The implication is that Titles are a good indicator of subject-matter, but
Diodato has little to say about the role of the Title in staking out the research
article’s claims.
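A measure of this kind can be illustrated with a minimal sketch. It is not
Diodato's procedure, which is not set out here: the function name
title_word_coverage, the simple tokenisation and the sample abstract are all
assumptions made for illustration. The sketch computes the share of distinct
Title words that recur in another section of the same article.

```python
import re


def tokenize(text):
    """Return the set of lower-cased alphabetic word forms in a text."""
    return set(re.findall(r"[a-z]+", text.lower()))


def title_word_coverage(title, section):
    """Share of distinct Title words that also occur in a section.

    Hypothetical helper for illustration only: no stemming or stop-word
    filtering is applied, so 'dietary' and 'diet' count as different words.
    """
    title_words = tokenize(title)
    if not title_words:
        return 0.0
    return len(title_words & tokenize(section)) / len(title_words)


# Illustrative Title; the abstract sentence is invented for this sketch.
title = "Dietary fish oil delays puberty in female rats"
abstract = "We report that fish oil in the diet delays puberty in female rats."

coverage = title_word_coverage(title, abstract)
print(f"{coverage:.1%} of Title words occur in the Abstract")
```

Counting distinct word forms rather than running words keeps the figure from
being inflated by repeated function words. A fuller study would presumably
also normalise inflected forms before comparing sections.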
In a rare analysis of research article Titles as a subgenre, Jaime-Sisó
(1993) examines a corpus of 2 000 journal Titles from six fields of medicine
(all downloaded from the electronic indexing service MEDLINE). Jaime-Sisó
is particularly interested in grammatical change over time. She finds that
from 1980 to 1990 the number of Titles with active clauses (e.g. Dietary fish
oil delays puberty in female rats) rose steadily from 0% to 40%. She observes
that these Titles are used in dynamic areas of science (developmental
biology) and in high prestige journals with consistently high scores on the
impact factor scale (Williams 1996, see section 3 below for an explanation of
‘impact factors’). Jaime-Sisó also finds that the types of verbs involved in
these active-clauses (contribute to, is required for, contains) do not give
empirical facts or findings as such, but oblige the author to justify the novel
results elsewhere in the article. The Title effectively becomes a promissory
notice of results. The point here is that linguistic change reflects the changing
role of the Title in terms of its environment. Titles have to ‘compete’ for
readers’ attention, and the use of Titles to suggest (if not carry) significant
results corresponds to the growing use of graphic abstracts in chemistry and
in other fields. This also implies the increasing independence of the Title and
Abstract as ‘stand-alone’ text types, a concept introduced by Gläser (1991).
Jaime-Sisó is careful to note that the occurrence of active verbs has only
become prevalent in a restricted field: other fields have significantly not been
affected by the trend. These observations require more extensive comparative
work, but do provide an interesting picture of the Title as a key element in the
framing of scientific claims. Although Titles do not normally set out a
propositional argumentation as such (unless they contain a full clause, as
Jaime-Sisó has demonstrated), they clearly have a function in situating the
research article in a wider framework and one might assume that Titles vary
in ambition, from setting out very specific technical points to evoking or
questioning the general status quo.
3.2 Abstracts
knowledge of a specialist field. Gibson (1992) and Drury (1991) have both
demonstrated that non-author Abstracts which are perceived to be successful
tend to have topical sentence themes as opposed to textual and interpersonal
themes. Drury (1991) finds that rather than simplifying texts, summarisers
tend to render themes more abstract and technical (1991:436). The successful
summariser also reduces the number of relational and embedded material
verbs from the original text, introducing more material processes at the rank
of clause (1991:447: i.e. from It is thought that the temperature rises to The
increased temperature...). This is mirrored by increasing lexical density and
use of grammatical metaphor in successful summaries (Drury 1991:448).
Similarly, Salager-Meyer (1990b) finds that unsuccessful Abstracts are
particularly difficult to read, partly because they omit important moves
(conclusions or purpose) or order them in unexpected ways (results before
purpose, conclusion before results) and partly because the ‘valuable
signposts’ of discourse signalling and cohesive devices are usually absent in
Abstracts (1990b:378).
There has also been much descriptive linguistic work on a typology of
Abstracts. Generally, two main forms are recognised. The informative
Abstract introduces the main ideas and explains the essential points of the
original article. The indicative Abstract on the other hand reformulates the
article, following the progression of the article as closely as possible.
Informative Abstracts in particular are said to use markedly different
expressions and terms than the original text (Cleveland and Cleveland
1983:4). Grätz (1985) claims that most Abstracts in the sciences follow the
rhetorical structure of the original text closely and serve as indicative
Abstracts. However, Gläser (1991) has argued that the Abstract is a separate
genre rather than a rhetorical section, and points to its condensed presentation
of content and lack of deictic reference or stylistic devices. Endres-
Niggemeyer (1985) suggests that authors do not follow journals’ instructions
on Abstract and IMRD sections in any case. She argues that the categories
suggested by journals do not cater for the needs of the reader, and that
authors tend to structure Abstracts and other sections according to their own
specific objectives. This is an interesting observation, suggesting that
rhetorical sections are less clear cut than Swales and others have assumed,
and that scientists impose their own rhetorical goals rather more freely than
might have been expected. Endres-Niggemeyer proposes conceptual text types
situated around topical poles, such as the overview and model building
Abstract versus the practice oriented and theory-descriptive Abstract
(1985:45). These are the modes of discourse successfully adopted by authors
rather than the kinds of text requested by journals.
rather than longer clause complexes. Also, Kretzenbacher finds that Abstracts
tend to use nominal groups and finite verbs as attributive elements of clauses,
a typical construction in German (1990:101). Kretzenbacher also finds that
Abstracts have relatively more genitive attributes (part of the general nominal
style in German) and definite articles, while the main texts have relatively
more infinitives, anaphoric reference, and personal deictic reference.
In the first of a series of large corpus-based analyses of Abstracts, Salager-
Meyer (1992) analyses verb tense and voice usage and modality in 84
Abstracts (from 49 research papers, 21 reviews and 14 case reports). She
finds that the active past tense is the most frequent verb form (51% across all
types) and corresponds with the rhetorical moves of purpose, results,
methods and case presentation. The past passive is particularly prevalent in
the methods move, indicating that this is an obligatory form of expression. In
the purpose and conclusion moves on the other hand, Salager-Meyer finds
that the choice of tense is more open to rhetorical interpretation: the present
may be used to state basic truths, but also to emphasise that previous research
is relevant to the study. The present perfect also has a multiple function of
reference to past experiments, introducing a topic as well as distancing the
author from the findings (1992:106). The past tense is found to be much less
prevalent in moves of statement of the problem and data synthesis, where the
function of the past is to indicate the undeveloped nature of previous
findings. Finally, modality is also found to be move-related, with the most
frequent modal, may, indicating a high probability of claims in the
conclusion; can being associated with data synthesis, and should used in
preference to other modals in the recommendation move (1992:105). Such a
consistent use of verbs for rhetorical purposes (in tense or modal form)
further supports Swales’ observations about the controlled nature of scientific
discourse, but also suggests that tenses and verb forms imply a much more
sophisticated set of interpretations than was previously thought.
relatively freer style than other research article sections and are also
considered to provide the writer with a certain degree of stylistic freedom.
Apart from Swales’ (1990) analysis of Introductions set out above, West
(1980) has studied the use of that-nominals which are relatively more
frequent in the Introduction section as opposed to the other rhetorical
sections. Hanania and Akhtar (1985) found the present to be the usual tense
in the Introduction, associated with the functions of introducing background,
establishing assumptions and the purpose of the research. Gunawardena’s
(1989) analysis of 10 biology and biochemistry articles shows that the
present perfect is particularly prevalent in Introduction and Discussion
sections, where both sections relate shared experience as well as report past
research. In their analysis of 15 medical research articles, Nwogu and Bloor
(1991) found that Introduction and Discussion sections have overlapping
thematic structures (associated with explanation and argumentation) while
Methods and Results sections have relatively constantly changing theme
structures (associated with description). Finally, the similarity between
Introduction and Discussion sections has been often noted, especially in
terms of phraseology and use of modal verbs (Salager-Meyer 1992, Williams
1996, Gledhill 1996).
Methods and Results sections are the most inaccessible parts of the research
article to the non-specialist. However, for the expert reader these sections
usually constitute the first port of call, especially in the experimental
sciences. While few studies have concentrated on these sections in their own
right, a small number of comparative analyses have been carried out.
Generally speaking, Methods sections are found to be predictable and
repetitive, and generally set out procedures as well as detailed findings. It is
well known that Methods account for the vast majority of passive verbs,
especially in chemistry (Hanania and Akhtar 1985). Ironically, findings are not
always fully set out in Results sections, which are generally limited to
reformulating the Methods and summarising quantitative observations and
statistics. Evaluation and interpretation are reserved instead for the
Discussion section. Practices vary considerably from one journal to the next,
and sometimes these sections are combined or accompanied by
supplementary sections known as ‘Materials and Methods’, ‘Experimental’
or ‘Results/Discussion’.
For Swales (1990), Methods sections constitute the core science of the
research article. In most cases, especially in structural chemistry, the
In the previous sections, I have set out an introduction to the theory of the
terminology and discourse of science. In this section I examine these theories
in the context of a cancer research laboratory. In the first part, I explain the
context of cancer research and set out a basic explanation of cancer with a
view to defining the discourse of cancer research itself. I then conduct a
survey of cancer researchers, designed in part to provide a context for the
corpus set out in sections III and IV. Given that many of my informants have
themselves contributed their texts to the corpus, any light they can shed on
the writing process and their use of research articles is relevant to this study.
that journal. These correspond to the titles and bibliographic data listed in
Gledhill (1995b) and in Appendix 2].
This section describes some of the problems encountered when one considers
the extent to which a corpus can be ‘based on’ a very specific discourse
community.
This raises the potential distinction between the discourse community and a
community thrown together from the point of view of an institution (a
difficulty discussed in Swales 1998). Generally speaking, institutional
communities do not necessarily correspond to the notion of discourse
communities (defined by ‘what they talk about’ and social networking rather
than by socio-economic grouping). An extensive survey of 20 000 academics
by Boyer (1994) has suggested that many researchers in British universities
have a greater sense of identification with their discipline than with their
own institution. As we have seen above, simply because a researcher is
working on a ‘cure for cancer’ does not mean that he or she defines their own
specialism as ‘cancer research’. The survey below reveals that the research
goals of my informants were not fixed to cancer research per se and that
researchers did not always respond to the question ‘are you working on
cancer research?’.
For example, the structural chemists (SF, BF, JG) had recently won a
substantial grant from the Cancer Research Campaign - yet during the survey
they distanced themselves from cancer research per se. Such issues as
funding or research group membership are therefore not a clear guide to an
individual or group’s perception of community, at least as they present
themselves to outsiders. To complicate things further, one informant admitted
that there was an unofficial policy of understating involvement in cancer
research because of potential animal rights protests. In another example, the
pharmacist WF felt obliged to switch his research to DNA molecules from his
more original work on a specific inhibitor because of departmental policy.
Did WF feel he belonged to the community of ‘cancer researchers’? His
answer to this was not clear-cut. Such institutional matters of policy and
presentation presumably constitute an area of tension in the department, and
suggest that a corpus of texts on ‘cancer research’ is not a truly accurate
description of the kind of texts and genres that the scientists see as valid and
central to their professional work.
It might be possible to determine which texts to include in a specialised
corpus by referring to statistical measures of importance or centrality, such as
the impact factor. Such a measure would presumably separate the choice of
texts from the personal and subjective feelings of the researchers. As
mentioned above, the impact factor (IF) in the Science Citation Index is a
statistical measure of the number of references that have been made to a
Survey question 1) What is your title and position within the
Pharmaceutical Sciences department? The survey involves a wide range of
scientists: the chief academic administrator (PRL), three professors
(MT, WI and AG), two senior tutors (RL, KW), one senior lecturer (PL), five
lecturers (DP, WF, JG, SF, YW) and three research fellows (DA, HM, RW).
Survey question 2) What is your specialism, the main field to which you
would say you belong?
The symmetrical way the scientists fit into the department’s research groups
was not echoed by researchers’ opinions about their own specialism. All the
members of the Cancer Research Group described themselves first as
microbiologists, and stated that their general expertise was in cancer research
(MT, KW, YW: metabolic effects of cancer; PL: cellular properties of tumours
compared to other diseases; AG: chemotherapy and cellular delivery of drugs).
Another three microbiologists were interested in cancer and how
its treatment affected their own discipline, citing expertise in enzymology
(PRL), cell differentiation (DP) and developmental biology (RL). On the
other hand, the pharmacists and chemists also cited cancer as the first of
many applications of the synthetic molecules they are designing. WF is an
expert on the synthetic production of organic compounds that are part of the
chain structure of DNA, as well as cyclic compounds that can
inhibit carcinogenic factors. SF, WI and RW are each interested in the link
between growth inhibition and a specific family of compounds (phosphates).
JG is concerned with the synthesis that takes place between medical
compounds and their target sites. DA is interested in the structural elaboration
of chemical chains, with long term medical applications.
The perceptions of researchers about each other also made this a complex
issue, with RW describing the ‘pure chemist’ WF as a cancer researcher. As noted
above, these differing perceptions arise from the complexity of the problem,
and from the seeming impossibility, within the field, of conceiving of cancer
as a unitary entity or process.
Survey question 3) How would you describe your field of research in terms
of a) its aims?, b) its main concepts or objects of research?, c) its
methods?
they were keen to mention possible applications and diseases, their methods
differed more distinctly from their aims than those of the other research
groups.
The survey question suggests that informants state the aims
and methodology of the research discipline. However, it is hard to see how
these cannot also include claims of centrality and individual originality, and
this is how most respondents answered it. The phrasing of most of the
methods (items such as new, novel, development, accurately) and some of
the aims (WF, MT) emphasises at least some implicit claim of individual
originality within the context of an established research paradigm.
Survey question 4) How does your own specialism relate to those of your
colleagues inside and outside the university?
Survey question 5) What are the main sources of information for your
research?
Researchers in the sciences notoriously skim and scan their texts, often
using them indexically (as we see below). The range of sources is therefore
wider and more likely to be driven by indexes, whether traditional printed
indexes or those held on computer. Text books appear to be given much less priority,
although they are obviously important for teaching (not a priority in the
PSD). Research articles, indexes and electronic indexes were cited as primary
information sources. Researchers were asked to select five journals of general
interest and five that they considered essential to their own field. They found
this rather difficult, presumably because of the sheer number of possible
responses. Among the journals researchers mentioned, Nature, the British
Medical Journal (BMJ), the Lancet and the International Journal of
Cancer (IJC) were mentioned by over five researchers. Science,
Pharmaceutica Acta Helvetica (PAH), the British Journal of Pharmacology
(BJP), Cancer Chemotherapy and Pharmacology (CCP), Cancer Research
(CR), Journal of the Chemical Society Perkin Transactions (JCPT) and Journal of
the American Chemical Society (JOACS) were all mentioned more
than once.
Researchers also mentioned extensive use of the electronic Title and
Abstract databases MEDLINE, SCI, Index Medicus and ADONIS. Some
claimed that these were beginning to replace traditional ‘journal loyalties’
since a relevant title may be found in an index which covers hundreds of
journals, all from the researcher’s office. PRL suggested that regional and
specialised journals would flourish since their coverage could be made
more widely available through publication in indexes.
been taken. Nystrand’s dynamic reading model (1988) proposes that such
decisions are probabilistic, based on factors that are given different
weightings which change according to how far along the decision making
process the reader has gone. Researchers were asked to demonstrate with a
journal at hand which articles would attract their attention: JG proposed that
he read around ten papers per hour from as many journals.
Other researchers stated that they read from one morning a week to ‘every
spare moment’, in the library or on the train, and when they occasionally had
to check for specific information in the lab.
Key terms in Titles, as well as compounds in formulae, recognisable
diagrams and data formats are the first entry points and the first clues. The
respondents stated that specialist entities (a term I use later but first employed
by WF when talking of specific compounds, cell lines, diseases etc.) were the
main criteria, followed by or in combination with abstract properties or
processes (stability, expression, total synthesis). Both entities
and processes were inferable from titles, figures and reaction schemas, as
mentioned in the introduction. Neither had to be exactly in the researchers’
first list of major concepts. Another motivation for reading papers was
curiosity, to catch up with related fields, or according to PL ‘keep up to date
general science I should know’. DP stated that a half-relevant term
would ‘fish out a subset’ to provide a relevant connection. WI states certain
preliminary questions that the researcher brings to the journal:
Survey question 7) What information do you derive from titles, abstracts, and
other sections of the research article?
This revealed perhaps some of the most interesting discussion with the
expert informants. Two reading patterns emerged: browsing and consulting.
While browsing involves skimming the text for relevant details, consulting
involves what I term the ‘indexical’ function: researchers use a number of
different entry-points (graphics, keywords, bibliographic references) to
approach the text. The text therefore becomes non-linear, and is structured
accordingly to allow for this. Most generally, indexical reading takes place in
the lab, when a straightforward fact is required from a text book or an index.
The fact that some technical research articles are used in this way constitutes
a major difference with research articles in the humanities, for example, and
implies radical differences in the way the text is organised. Most chemistry
texts for example establish temporary codes for relevant chemical
compounds which allow the researcher to look directly at diagrams and then
jump straight into the text. The information derived from different parts of
claimed that there was an overlap between them, as Methods sections start
off as lab book transcriptions combining a template of measurements, while
the Results ‘re-ordered’ the measurements. This corresponds with an
unexpected symmetry in my corpus: all of the ‘Experimental’ sections
occurred in chemistry journals, and these often replaced Methods and Results
sections in these journals (especially in the shorter ‘communications’ papers).
Presumably the experimental data for the pharmacists can stand alone, while
the shape of the data and medical applications can be treated separately in the
Discussion section. In contrast, the microbiologists (PL, MT) saw Results
and Discussion sections as distinct from Methods. Indeed, in the corpus all
the joint Results/Discussion sections occur in microbiology and cancer
journals. PL stated that this was because experimental data are seen as an
‘extension to the research model’ (as AG implied above) and thus in
microbiology actual results should be interpreted and integrated in the
context of medical applications.
This implied distinction between applied biochemistry and theoretical
chemistry may be an oversimplification, but any distinction between these
two essentially different positions means that not all of the rhetorical sections
are equivalent, even if they have the same subtitle in different journals. As far
as the corpus is concerned, this forces us to down-play some of the
distinctions to be made between such sections as Methods / Results and
Discussion sections. In practical terms, I was also obliged to exclude a small
number of hybrid sections (most notably Results / Discussion sections) from
the main Wordlist comparison, since the two sections were completely
merged in some journals.
c) How is the writing related to the research activity, and where is it stored?
Research articles are not only read in non-linear fashion;
their production appears to be non-linear as well. Myers (1990) suggests that
a paper is built and redrafted by several writers from the ‘middle’ out towards
the Introduction and Discussion sections. Different members of the PSD
confirmed that they record reaction details of syntheses and other
measurements over a period of months in the lab book with its various
sections:
This template provides the shape of the Methods, Results and Experimental
sections. When transferred to the word processor, this list forms the backbone
of the research article that can be fleshed out by adding explanations of
unfamiliar procedures.
Survey question 10) What procedures exist to ensure the quality of research
writing?
This question attempted to raise issues of editing as well as peer-review.
All the researchers referred to the instructions for authors included in most
journals. The Journal of the Chemical Society (Perkin Transactions)
stipulates the format and the constitution of the research article, especially
concentrating on the Experimental section and on the organisation of material
(reaction schemes, the use of italics for position-defining prefixes, hyphens
for chemical bonds etc.) as well as setting out rules for the authentication
of novel compounds, this being the primary objective of the specialism.
Contributions are generally judged on criteria of:
In the first part of this book, I have demonstrated some of the complexities of
the terminology and discourse of cancer research. In this section, I set out the
theoretical and technical notions of phraseology and collocation on the basis
of Firth’s theory of meaning. This prepares the way for an analysis of
collocations in research articles in section IV. As collocational analysis
requires large amounts of authentic textual data, the final parts of this
section set out the design features of a representative corpus of cancer
research articles: the Pharmaceutical Sciences Corpus (PSC).
Here ‘context’ refers to textual context (co-text) in the first instance, but also
to semantic knowledge and Malinowski’s ‘context of situation’. The point is
argued in similar terms by Wittgenstein, who not only conflates meaning
with use, but also links our understanding of an instance with our knowledge
of the whole system:
is typically seen in the use of the passive. From a thematic point of view, the
passive effectively ‘saves’ new information in the message until the end of
the sentence. Although this is seen as a prototypical feature of science
writing, the same process occurs in other genres, especially news reporting
(McCarthy and Carter 1994).
Finally, the interpersonal metafunction involves the clause as a rhetorical
proposal which can be subjectively asserted or qualified. In science writing,
the interpersonal function is realised by various impersonal devices which
effectively obscure the direct involvement of the scientists or express some
degree of ‘polite’ hesitation in order not to overstate the claims of the author,
as pointed out by Myers (1989). Modality in science involves inanimate
subjects (results suggest that), the hedging of data using modals (it may be
the case that), the use of mental or verbal process nouns (projecting nouns
such as belief, suggestion) and, as might be expected, the generalised use of
the passive (cell growth was analysed). In the above example, the sentence
can be seen to have the same propositional meaning as This protein is a
major factor in breast cancer, but incorporates a further degree of modality
in the form of a mental process verb (thought). This is further modalised by a
passive (is thought to be) in contrast to a more direct alternative ‘we believe
this protein to be a major factor...’.
Thus from Halliday’s point of view, a specific grammatical form can be
treated to different kinds of interpretation within the same overall framework.
The passive emerges as a simultaneous collaboration of three different
choices: a way of placing the agent or medium (an ideational function) in the
‘new’ position of the clause (a textual function) at the same time as avoiding
the expression of personal involvement (an interpersonal function). Although
the metafunctions are often discussed in terms of clauses, they are not tied to
grammar alone and have provided a framework for lexical studies of idiom
(Fernando 1996) and the analysis of scientific texts (Wikberg 1990,
Mauranen 1993).
The concept of value-related choice is at the heart of Halliday’s systemic
grammar. As Halliday puts it:
The system of available options is the ‘grammar’ of the language, and the
speaker, or writer, selects within this system: not in vacuo but within the
context of speech situations. Speech acts thus involve the creative and
repetitive exercise of options in social and personal situations and settings.
(Halliday 1976:142)
The term ‘systemic’ therefore indicates choice within a system. The concept
of choice does not imply free expression with infinite possibilities, but
instead indicates a continuous spectrum from a typical to a more marked
This view is not far removed from Enkvist, who provided a definition of style
that is tailor-made for corpus linguists, being statistical in nature as well as
incorporating the idea of register change:
The style of a text is a function of the aggregate of the ratios between the
frequencies of its phonological, grammatical and lexical items, and the
frequencies of the corresponding items in a contextually related norm... past
contextual frequencies change into present contextual probabilities, against
whose aggregate the text is matched. (Enkvist 1964:28)
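Enkvist’s definition can be read as a simple computation. The following Python sketch is an illustration only, not a procedure taken from Enkvist or used in this study. It assumes that a text and its contextually related norm are both available as lists of word forms; the function names (relative_frequencies, style_ratios) and the toy data are invented for the example, and only lexical items are covered.

```python
from collections import Counter


def relative_frequencies(tokens):
    """Map each item to its share of all tokens in the sample."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {item: n / total for item, n in counts.items()}


def style_ratios(text_tokens, norm_tokens):
    """Ratio of an item's relative frequency in the text to that in the norm.

    Items absent from the norm are skipped, since their ratio is undefined.
    """
    text_freq = relative_frequencies(text_tokens)
    norm_freq = relative_frequencies(norm_tokens)
    return {
        item: text_freq[item] / norm_freq[item]
        for item in text_freq
        if item in norm_freq
    }


# Toy data, invented for illustration only.
text = "the cell was grown and the cell was analysed".split()
norm = "the results suggest that the cell was not grown in the lab".split()
ratios = style_ratios(text, norm)
```

A ratio above 1 marks an item that is relatively more frequent in the text than in the norm; a ratio below 1 marks one that is relatively less frequent.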
from the cell / the cell as it is grown). Halliday has shown that scientific texts
systematically construct compound nominals by building the nominal up
piece by piece until several explicit grammatical relations are finally hidden.
The following example demonstrates how compound nominals are typically
formed within a single text (Halliday 1992:70-71):
How glass cracks ...
The stress needed to crack glass ...
As a crack grows ...
The crack has advanced ...
will make slow cracks grow ...
The rate at which cracks grow ...
The rate of crack growth ...
We can decrease the crack growth rate ...
Glass fracture growth rate ...
This militates against the view of a text as a unit where every semantic
signifier and signal plays an equal and necessary role. Hoey’s conclusion is
that texts may make use of fixed expressions in order to allow the reader to
predict content and argumentation (1991a:154). He points to cloze testing
where informants successfully fill in lexical gaps and reconstruct coherent
text (he calls this the Jabberwocky principle, since the only clues lie
effectively in identifying the typical members of meaningful grammatical
frameworks). This may also explain the observations I set out in the survey,
which suggest that researchers read ‘indexically’; that is, they are able to
successfully predict and by-pass much of the linear detail of the research
articles they have to process. As an extended reformulation, the research
article need not be read from beginning to end for all purposes. Lundquist
(1989, 1992) appears to provide evidence for this by showing that non-
experts who read scientific texts tend to rely heavily on lexical networks to
establish long-range links, while experts do not need explicit signalling and
are thus able to skip and skim through the text and establish a meaningful but
partial reading of the text (1989:141).
However, Myers (1991) has argued that cohesive systems are in fact specific to
different registers, and take on different functions in the research article
genre. In his analysis of cohesion in science writing, Myers (1991:13) points
out that a reliance on lexical networks is not enough for non-expert readers.
Myers underlines the difficulty involved in deciding how cohesive lexical
repetitions really are, especially in terms of synonyms (DNA vs. genome) and
superordinates (molecule vs. product of transcription). He argues (1991:5)
that background knowledge of the scientific paradigm is essential for any
networks to be built up, and this accounts for the differing forms of cohesive
devices used in scientific and popularised texts. As with Hoey, he suggests
that phraseology may be the key to understanding cohesive relations:
2. The Lexico-grammar
In the Introduction, I set out some of the theoretical issues surrounding the
notion of collocation, and suggested that collocations can be analysed in
terms of three increasingly complex standpoints: statistical / textual,
semantic / syntactic and discoursal / rhetorical. I argued that these three
perspectives are compatible and bring considerable value to the notion of
collocation. The statistical / textual approach insists on collocation as a
product of on-going discourse and seeks data which is unconstrained by
theory and categories which may be ‘self-selecting’. The semantic / syntactic
approach on the other hand demonstrates the need to restrict the analysis of
collocation to meaningful expressions and the need to establish the internal
cohesive properties of each phrase. Finally, the ‘discoursal / rhetorical’
perspective underlines the textual function of collocation as well as the idea
that collocations operate in a system of alternative choices of expression. It is
not surprising that the three approaches lend themselves naturally to a three-
stage methodology (data analysis, data selection, interpretation), and I
attempt to set this out in my corpus methodology, below.
data base (Luhn 1968, Yang 1986, Källgren 1988a and 1998b, Wilbur and
Sirotkin 1992). Previous studies have claimed that high frequency items are
stable in use and meaning across different types of language, and the reverse
assumption is that if a word is stable it is a ‘grammatical item’ or a ‘function
word’. Sager et al. (1980:238), for example, associate a descending type /
token ratio (a measure of the density of different word forms) with increasing
levels of specialism in technical texts, that is: the most frequent words in the
language account for proportionally less of the total vocabulary of LSP texts.
They assume from this that high frequency words are of little use in the
analysis of specialist texts. Phillips also characterises grammatical items as
noise, distinguishing them from ‘carriers of local meaning in text’ (1985:66).
There are obvious justifications for this in an automatic analysis of semantic
structure in text. The assumption of redundancy has also been applied to
high-frequency items, even in collocational studies such as the BBI dictionary
(Benson et al. 1986) which eliminates common words (such as big, cause and
make). And the influential lexicologists Thoiron and Béjoint have stated that
high frequency words can collocate with ‘almost any words in the language’
(1992:7).
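The type / token ratio mentioned above can be made concrete with a short Python sketch. It is a minimal illustration rather than the measure as implemented by Sager et al.: the tokenisation rule (lowercased alphabetic strings), the function names and the sample sentence are all assumptions made for the example.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphabetic word forms."""
    return re.findall(r"[a-z]+", text.lower())


def type_token_ratio(tokens):
    """Distinct word forms (types) divided by running words (tokens)."""
    if not tokens:
        raise ValueError("cannot compute a ratio for an empty sample")
    return len(set(tokens)) / len(tokens)


# Invented sample sentence: 9 tokens, 6 types.
sample = "The cell was grown and the cell was analysed"
tokens = tokenize(sample)
ttr = type_token_ratio(tokens)
```

Because the ratio tends to fall as a sample gets longer, comparisons of this kind are normally made between samples of equal length.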
Yet if we are to adopt a systemic approach to discourse, it is important to
see grammatical items as fully part of the lexical system as a whole. While
Halliday proposes a theory of grammar and Sinclair works on lexis, both
view lexis as the bedrock of grammar and both see grammar and lexis in
terms of a continuum rather than a categorical divide. Halliday in fact terms
the complete grammatical system a ‘lexico-grammar’, where grammar is a
heavily constrained and abstract form of vocabulary rather than a separate
linguistic level:
Grammar and vocabulary are not two different things; they are the same
thing seen by different observers. There is only one phenomenon here, not
two. But it is spread along a continuum. At one end are small, closed, often
binary systems, of very general application, intersecting with each other but
each having, in principle, their own distinct realization [...] At the other end
are much more specific, loose, more shifting sets of features, realized not
discretely but in bundles called ‘Words’, like bench realizing ‘for sitting
on’, ‘backless’, ‘for more than one’, ‘hard surface’; the system networks
formed by these features are local and transitory rather than being global
and persistent (Halliday 1992:63)
lemma or base word. Thus goes and went are analysed separately from the
base form go, as though they are separate lexical items. As we have seen in
the previous chapter, Sinclair holds collocation to be a purely statistical and
syntagmatic feature of language: collocations do not have to be fully
grammatical, and are not necessarily limited to the boundaries of the phrase
or the clause. And as with Nattinger and DeCarrico’s approach, this feature
alone makes Sinclair’s idea of collocation a very different notion to the
mainstream view in lexicology and phraseology studies.
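The difference between counting word forms and counting lemmas can be shown in a small Python sketch. It is illustrative only: the lemma table is a hypothetical, hand-made mapping for the verb go, and the function names and the example sentence are likewise assumptions rather than part of any corpus tool described here.

```python
from collections import Counter

# Hypothetical, hand-made lemma table for illustration only; a real study
# would need a full lemmatiser or lexicon.
LEMMAS = {"goes": "go", "went": "go", "going": "go", "gone": "go"}


def count_forms(tokens):
    """Count each word form separately, as in a form-based analysis."""
    return Counter(tokens)


def count_lemmas(tokens, lemma_table=LEMMAS):
    """Count word forms after grouping them under their base form."""
    return Counter(lemma_table.get(tok, tok) for tok in tokens)


# Invented example sentence.
tokens = "she goes home and he went home before they go".split()
forms = count_forms(tokens)
lemmas = count_lemmas(tokens)
```

In the form-based count, goes, went and go each occur once and keep their own collocates; in the lemma-based count they are merged into a single item with a frequency of three.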
The starting point of the idiom principle is that the collocational behaviour
of a word is not an issue of individual item selection, but depends on the
unstable and shifting nature of the word as a whole unit and the indeterminate
nature of its grammatical class, at least in a historical perspective. Sinclair
points to word blends as clear instances of items that have lost their status as
separate words in English (because, of course, maybe, another, altogether,
alright etc.). Many of these expressions represent the kind of
grammaticalisation observed in the development of pidgin languages: the
gradual formation of grammatical words from bound lexical phrases
(Traugott and Heine 1991). For example, Tok Pisin uses the lexical items bye and
bye and finis from English as grammatical particles of aspect. Words are
therefore not fixed in position but may be used along a continuum from pure
vocabulary items to features of grammar. This degree of continuum from one
category to another is also evident in lexical paradigms. Hence suppletion
is seen in forms such as went (originally derived from the verb to wend),
which historically drifted into the paradigm for the verb to go. The
conjugation paradigm of a verb may be a cognitive reality, but its
constituents are historically contingent and unrelated.
This kind of long-term change suggests that the upper level boundary
between the lexical item and the phrase is in constant flux. But there is also
evidence for what might be seen as the development of larger-than-word
lexical items in the contemporary language. Nattinger and DeCarrico
(1992:24) and Willis (1993:88) refer to holophrastic phrases: prefabricated
chunks of language which lead a clichéd or marginal existence, including
wannabe, allgone, watsup? Similarly, high frequency content words (such as
the delexical verbs get, make, set, take) also depend on complements or
particles to be fully lexical semantic units (get even, get on, make for, make
way, set up, set off, take place, take part etc.). Sinclair has suggested that the
many combinations into which these words enter must form a large part of the
total lexicon (rather than a simple count of single lexical items), and that
many texts may be characterised as being largely ‘delexicalized’
(1987c:323). This modifies somewhat the traditional view of lexical density
(Ure 1971), which relies on a count of lexical forms and does not normally
collocation is not just simply about the grammatical items themselves. The
theory of lexico-grammar implies that grammatical items are simply
consistent elements in longer-range fundamental phraseology.
We have seen so far that a statistical analysis of collocation may be a
sufficient basis for establishing the basic collocational properties of words.
We have seen that grammatical collocation is an important feature of the
general language, at least in English, and that certain studies have posited a
fundamental role for collocation as a bridge between the notion of the word
and the text. However, in practice, as I have noted in the Introduction, the
statistical notion of collocation needs to be restricted (in terms of the internal
cohesion of the expression) and also requires a more contextual interpretation
(in terms of its place in the general discourse). These issues are well known
in the field of corpus linguistics and lead us to a wider discussion of
approaches to corpus analysis and the identification of collocations in
specific text archives.
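The statistical notion of collocation discussed in this section can be sketched in a few lines of Python. The example is a simplified illustration under stated assumptions, not the procedure used for the corpus in this book: it counts unordered pairs of word forms within a fixed span and scores a pair with a pointwise mutual information measure. The span value, the function names and the toy sentence are all assumptions.

```python
import math
from collections import Counter


def cooccurrence_counts(tokens, span=4):
    """Count unordered word pairs occurring within `span` words of each other."""
    pairs = Counter()
    for i, node in enumerate(tokens):
        for collocate in tokens[i + 1 : i + 1 + span]:
            if collocate != node:
                pairs[frozenset((node, collocate))] += 1
    return pairs


def mutual_information(tokens, word_a, word_b, span=4):
    """log2 of the observed pair probability over the product of the
    individual word probabilities.

    Returns None when the pair never co-occurs within the span.
    """
    n = len(tokens)
    freq = Counter(tokens)
    joint = cooccurrence_counts(tokens, span)[frozenset((word_a, word_b))]
    if joint == 0:
        return None
    return math.log2((joint / n) / ((freq[word_a] / n) * (freq[word_b] / n)))


# Invented toy sample; a real analysis needs a large corpus.
tokens = "cell growth was analysed and cell growth was inhibited".split()
score = mutual_information(tokens, "cell", "growth", span=1)
```

A score of this kind says nothing about the internal cohesion of an expression or its place in the discourse, which is why the statistical step is only the first stage of the analysis.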
3. Corpus Linguistics
and Mercer 1993), word association tests (Church and Hanks 1990), natural
language processing (especially the application of syntactic notation: Leech
and Fligelstone 1992), general lexicography (Clear 1987, Sinclair 1987),
semantic labelling for dictionaries and language research (Vossen et al.
1986), machine translation (Schubert 1986), the development of
terminological knowledge banks (Ahmad et al. 1991) and the development of
language teaching materials and syllabuses (Willis 1990, Johns and King
1993).
Generally speaking, there are three different schools in English-speaking
corpus linguistics. Firstly, there has been much corpus-based work in
computational linguistics and terminology, with a long tradition of statistical
modelling (Butler 1985a, Oakes 1996). Secondly, descriptive linguistics has
concentrated on the tagging and parsing of corpora, usually within a
generative framework (the Lancaster school: McEnery and Wilson 1996).
Similarly, corpora are also tagged for text type analysis (Biber, Conrad and
Reppen 1998). A third tradition involves the development of corpora for
applications such as language learning (as emphasised by Barnbrook 1996) or
dictionary-building (in a continuation of the Cobuild project: Sinclair and
Renouf 1991) as well as the statistical analysis of texts in authorship studies
(Oppenheim 1988). The third approach usually entails an emphasis on
statistical properties of the texts rather than parsing procedures. Since I have
adopted a view of collocation from Sinclair’s and Halliday’s perspective, the
third approach is particularly relevant to my methods of corpus design and
analysis.
The Brown corpus of one million words was one of the first electronic
stores of texts for the analysis of English, with the underlying aim to be as
representative of the general language as possible (Kučera and Francis 1967).
The Lancaster-Oslo/Bergen corpus (LOB, Svartvik and Quirk 1980, Svartvik
1992a/b, Leech 1991) was also built up to one million words and was one of
the first to attempt coverage of different language varieties, including 15
types of written text – although the texts were artificially curtailed, with a
maximum length of 2 000 words. Nevertheless, LOB constituted for some
time a major source of data for the study of text types (Biber 1986 et seq.).
While the first generation of corpora were developed for general linguistic
description, the second generation aimed at maximum coverage of the
language for the purposes of dictionary-building. These included, in the UK,
Birmingham’s Bank of English (once known as the Cobuild corpus: Sinclair
1991) and the British National Corpus (BNC) of Oxford University and
Longman (Burnard 1992). These corpora quickly built up the number of texts
to hundreds of millions of words by accessing the electronic press and other
networks that became available in the early 1990s. Although both corpora
had at one point over two billion words (Sinclair 1993a, Rundell and Stock
1992), each corpus has recently been limited to a selection of just over 100
million words. Another notable corpus project, the Cambridge Language
Survey, attempted to build up corpora and develop software in order to
compare seven major languages with particular emphasis on developing
agreed codings (tags) for semantic, functional and syntactic categories
(Atkins, Clear and Ostler 1992). These lexicographic corpora have now been
joined by a third generation of more fragmented text collections, including
dialect corpora, spoken corpora, restricted language corpora and other
specialist text collections (Svartvik 1992:12, Biber, Conrad and Reppen
1998).
As corpora grow in size and complexity, ‘representativeness’ or an idea of
what proportion of texts should be included in the corpus has proven to be a
major stumbling block. In his comparison of three major English language
corpora (Brown, LOB, and Cobuild), Ljung (1991) points out that within the
most frequent 1 000 items of each corpus, 204 words are not shared. Such
differences seem to undermine the claims of the corpus-builders that their
corpora are representative of the language in general. Ljung further notes
very important genre differences between the corpora, especially Cobuild,
with its large number of high frequency abstract nouns dealing with domains
of behaviour, geometric shape and politics - the kinds of lexical
preoccupations to be found in journalism (1991:249). Because of the wide
availability of journalistic texts in the initial years of corpus analysis,
linguists pointed out that the data in large corpora were susceptible to stylistic
bias (Rundell and Stock 1992). While quantitative representation is a
problem, there are also artificial barriers to inclusion which arbitrarily restrict
the nature of the corpus. For example, Burnard (1992) noted that his own
corpus, the BNC, had a no-translations policy which eliminated such
influential texts as the Bible. Similarly, Collins and Peters (1988) have
questioned the motivation behind the text categories of several corpora. They
note for instance that LOB gives as much weighting to belles lettres,
biographies and essays as to the Press or learned and scientific writings.
Nevertheless, genres are by their very nature unequal, and it is perhaps
unreasonable to describe the whole language on the basis of equally
represented text-types. One might argue that the spoken language and
dialogue should make up the vast majority of any general language corpus,
since the corpus may wish to represent exposure (from an individual’s point
of view) rather than textual variety. The other possibility is that each
recognised register or genre should have an equal footing because the
language system is not wholly represented in the more frequently
encountered varieties. These are clearly fundamental questions but with very
few straightforward solutions. It is for this reason that it may be prudent not
to scale down the corpus, but to favour the analysis of specialised genres.
However, as noted above in terms of the discourse community, even the
question of representativeness of a single subject matter (cancer research)
appears to be a complex issue.
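The kind of comparison reported by Ljung above (counting the items that are not shared among the most frequent words of several corpora) can be sketched as follows. This is only a minimal illustration: the function names and the toy token lists are my own assumptions, and Ljung's exact counting procedure is not described here.

```python
from collections import Counter

def top_items(tokens, n):
    """Return the set of the n most frequent word forms in a token list."""
    return {word for word, _ in Counter(tokens).most_common(n)}

def unshared_top_items(corpora, n):
    """Return items found in at least one top-n list but not in all of them."""
    tops = [top_items(tokens, n) for tokens in corpora.values()]
    in_all = set.intersection(*tops)
    in_any = set.union(*tops)
    return in_any - in_all

# Toy data, invented purely for illustration (not drawn from any real corpus).
corpora = {
    "corpus_a": "the cat sat on the mat the cat".split(),
    "corpus_b": "the dog sat on the log the dog".split(),
}
unshared = unshared_top_items(corpora, n=2)
print(sorted(unshared))
```

With real corpora, n would be set to 1 000 and the size of the returned set compared with the figure Ljung reports.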
frequency of function words. Fox (1993) has analysed the frequency of then
following sentence subjects as a characteristic of the language of law
enforcement. Choueka et al. (1983) studied collocation in the language of the
New York Times. Butler (1993) studied discontinuous collocational
frameworks in Spanish magazines and found that prose articles can be shown
to be different to interviews. He found that frameworks contain more textual
information in the former and interpersonal, discursive phrases in the latter.
Finally, Collot (1991) has examined the use of comparative constructions in
e-mail communication. As noted above, with some exceptions (Butler 1993,
Banks 1994b, Gledhill 1996 and Williams 1996) the focus of work even in
such a large area as stylistics or register studies has been on grammatical
categories rather than on collocation and phraseology.
al. (1983) in their study of the New York Times corpus and by Burnard
(1992:15) who terms them ‘text-oriented’ co-occurrences.
being carried out from Sinclair’s perspective does in fact exploit tagged
corpora.
Perhaps one of the more hotly contested points has been over the extent to
which it is necessary to mark up the corpus grammatically. The
‘collocationalists’ and followers of Sinclair argue that since they do not
impose traditional grammatical categories, only their approach can achieve
original insights about language:
Conversely, Leech and Fligelstone (1992) and others consider that the
counting of concordance items is at best ‘a trivial facility’ and that the only
significant data can come from annotated corpora. Aarts is of the opinion that
without some degree of syntactic classification, a corpus is useless:
[...] as everyone knows, the comparison of corpora containing just raw text
cannot go beyond linguistically rather trivial observations. (1992:180)
1996, Barnbrook 1996). In this light, some tag sets have attempted to
incorporate ‘discourse items’. Svartvik (1993:24) has proposed a 170 tag
system with labels such as greeting, fluency device, hedge and so on.
Linguists who impose tags on a text in such a ‘manual’ fashion are faced with
the difficult task of lemmatisation: whether to treat forms such as be, is, are
as one or different word types. Lemmatisation is particularly criticised by
Sinclair (1991) and Francis (1993) who point out that it is a redundant
process because collocational patterns tend to reveal differences between
word types: the collocations of be are different to the collocations of is and
this distinction is effectively eliminated if both are counted as the same
lexical item. There is also some statistical evidence in support of this.
Youmans, in his analysis of the ‘velocity’ or rate of change of frequency of
new words in texts found that lemmatisation does not significantly change
the curves of type / token ratios (1991:766). Whatever the accuracy of
tagging and parsing, I hope to demonstrate below that the quality of analysis
relies just as much on the depth of preparation of material as on the formulae
used to arrive at automatic analysis.
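The effect of lemmatisation on a type / token ratio can be illustrated with a short sketch. The lemma table and the example sentence are hypothetical, and the calculation is a plain ratio over one short text rather than Youmans's own procedure, which tracks the rate of change across a text.

```python
def type_token_ratio(tokens):
    """Number of distinct word forms divided by number of running words."""
    return len(set(tokens)) / len(tokens)

# Hypothetical lemma table: maps some inflected forms of 'be' to one lemma.
LEMMAS = {"is": "be", "are": "be", "was": "be", "were": "be", "been": "be"}

def lemmatise(tokens, lemma_map=LEMMAS):
    """Replace each token by its lemma where the table lists one."""
    return [lemma_map.get(tok, tok) for tok in tokens]

# Invented example sentence, 12 running words.
tokens = "the cells were treated and the drug was shown to be active".split()
raw_ttr = type_token_ratio(tokens)               # were, was, be counted apart
lemma_ttr = type_token_ratio(lemmatise(tokens))  # were, was, be counted as one
print(raw_ttr, lemma_ttr)
```

A toy sentence of this size says nothing about whether the difference matters statistically; it only shows where the two counts diverge.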
The fact remains that manual analysis of unrefined concordances can still
reveal much interesting data. This is especially true of features of discourse
which do not have categorical forms (such as evaluation, modality,
grammatical metaphor, discourse anaphora and so on), as the work of Stubbs
(1996) and others has demonstrated.
One of the more fundamental debates that have been conducted in corpus
linguistics centres on Sinclair’s claim that corpus work must attempt to
account for the naturalness of authentic data rather than a theoretical search
for an abstract notion of grammaticality. However, many linguists warn
against seeing the corpus as a guarantee of truly objective data. In Fillmore’s
(1992) analysis of the use of the word risk he demonstrates that the word has
a unique lexico-grammar in the language in that ‘running a risk’
conceptualises harm as a result of an action, while ‘taking a risk’ sees harm
as a result of a goal. But he cannot see how a computer could ever come to
determine such a pattern, or how it could rule out alternative expressions.
Chafe takes a similar stance:
A corpus cannot tell us what is not possible... Should it ever come about
that linguistics can be carried out without the intervention and suffering of
a native-speaker, I will probably lose interest in the enterprise. (Chafe
1992:59)
In a sense, this argument could be turned around against tagging, since Chafe
and Fillmore are discussing linguistic features that appear to be beyond
automatic parsing, but are not beyond more basic empirical quantification. In
any case, Chafe, Fillmore and others claim that Sinclair has missed the point
about intuition, and has ruled out the important function of negative data in
constructing a model of syntactic principles. For them it is important for the
model to be able to explain why certain features of language do not occur,
and the corpus does not provide this explanatory adequacy. They also point
out that there is nothing inauthentic about a native-speaker’s intuitions about
examples and counter examples (although as we have seen, other
generativists have made much use of corpora to test their hypotheses for
positive data).
Chafe’s point essentially contrasts the generative linguist’s preoccupation
with selected counter-examples with the empirical linguist’s interest in
authentically occurring data which is often more difficult to analyse.
Sinclair’s approach is not concerned with grammaticality but with an
account of naturalness in language. Native intuition and invented examples
may be enough to explain the underlying syntactic principles of potential
expression, but they are inappropriate when we need to address issues of
style and textual acceptability. He argues that although the corpus replaces
introspection in linguistic analysis (essentially guessing at data and inventing
examples), the computer still implies the use of human intuition (a native
speaker interpretation, a linguist’s skill in explanation), a factor that Fillmore
and Chafe appear to have overlooked.
In addition, a corpus of authentic texts is undoubtedly the product of a
human intuition, but the linguistic behaviour used to produce authentic texts
is uninhibited, unselfconscious and natural. The same cannot be said for
invented examples or examples created to prove some grammatical point.
Sinclair cites a continuum of examples from cryptical to explicit: we
searched (most cryptical), we searched all night, we searched all night for
the missing climbers (most explicit) (1984:206). He asks at what point or in
what context each of these kinds of utterance would be deemed to be natural,
and suggests that most authentic text occurs at some point in-between. In
natural speech, therefore, there is a happy medium between the cryptical and
the overtly explicit. This argument for authentic examples has been
particularly relevant in the field of lexicography, where the examples chosen
for each entry in his Cobuild dictionary were not designed for lexicographic
purposes but taken from authentic texts. Furthermore, Sinclair claims that
the internal grammatical relations of the sentence are not relevant when one
attempts to take account of the function or natural feel of the sentence in
context. As with Hoey’s discussion of lexical cohesion, we can see how
Sinclair’s approach moves our attention away from words in a sentence-
based grammar to items with a definite textual function.
In the preceding sections, I have set out Halliday and Sinclair’s perspectives
on discourse analysis and corpus linguistics. Halliday establishes the notion
of register as probable expression, and emphasises the changing role of
linguistic features as they are used in different rhetorical contexts. In
addition, we have seen that Halliday and Sinclair’s view of the lexico-
grammar prioritises the role of grammatical collocation and grammatical
items, and my corpus analysis below therefore concentrates on the
phraseology of these items and their distribution within the corpus. The
following sections discuss the main steps involved in the corpus analysis and
attempt to implement the ‘statistical / textual’ analysis of the corpus as a first
stage in the phraseological analysis of the research article genre.
It is now necessary to set out the principles underlying my choice of texts for
the Pharmaceutical Sciences Corpus (PSC). In brief, the PSC contains:
purposes. In the first instance, the rhetorical aims of the writers are known
and can be prioritised in the analysis: this is not an anonymous collection of
texts. In addition, we have seen that while there are many studies of
phraseology and lexico-grammar in the general language, few specialist
varieties have benefited from a large-scale corpus analysis of this kind. The
corpus does not represent the register of science writing, but instead focuses
on one genre (the research article) dealing with one very specific discourse
(cancer research). The usual problem of representativeness is therefore
minimised, although not entirely eliminated.
We have seen above that, historically speaking, corpus projects have
tended to opt to represent an entire register or language variety. These
projects have often found it difficult to delimit boundaries for their
constituent texts. For example, Renouf (1987b) states that the texts used in
the Cobuild corpus range from very broad registers (non-fiction, procedures,
argument-positional texts and narrative) to very specific genres (surveys, the
NATO-corpus, the Sizewell enquiry corpus). Since such a disparate collection
of texts is not clearly defined, Sinclair (1993), Atkins, Clear and Ostler
(1992), Ahmad et al. (1991) and others have argued for a more systematic
approach to text types in corpus linguistics. Sinclair (1993c:6-7) proposes
four principles of corpus design which I adopt in the following sections:
As stated earlier, the research article – despite its variety of forms - is seen as
a privileged statement of public research and is thus a major object of enquiry
in linguistics. Other texts, such as grant proposals and internal documents
mentioned in my survey can be ruled out of the corpus because they form
part of the non-public world of Auger’s (1989) ‘grey literature’. Instead of
exact representation of genres in the discourse community therefore, a
rhetorical overview of the department should emerge from a mixture of
authors’ own research articles. These texts are considered to be central to the
researchers’ work, and appear in the journals which the researchers regularly
use for ‘indexical’ purposes in the lab and for general research reading.
One cause of imbalance in this and perhaps many other corpora lies in the
range of potential criteria for the selection of texts as can be seen below
(from Sinclair 1993c: 6-7):
Medium-oriented choice:
1-Author Texts selected from informants’ own publications.
2-Access Texts chosen on the basis of free access, machine-
readability, etc.
Research-oriented choice:
3-Journal Texts from the same journals as informants’ papers.
4-Prestige Texts from recognised or prestige journals.
Topic-oriented choice:
5-Sample Texts from a wide sample of journals which cover the area
generally.
6-Centrality Texts or journals considered essential by informants.
7-Field Texts covering one research activity or concern only,
perhaps on the basis of bibliography or keywords.
8-Coverage Texts chosen at the level of overview or specialisation.
A combination of these criteria was used to select the texts for the PSC,
although some criteria account for more research articles in the corpus than
others (especially author, prestige and centrality but also access: see below).
Such variables cannot be made entirely distinct. As we saw in the survey of
the Pharmaceutical Sciences Department, the fourteen researchers had
published in their respective fields, and some of their articles provided a
substantial basis for the corpus as a sample of their output. However, their
contributions alone would result in a very heterogeneous body of texts, not
only in terms of different sub-fields as mentioned above, but in the degree of
coverage of the field. For example, one researcher donated an introductory
paper taking a long-term view of his work, in a journal which would have
had a wide readership: Trends in Pharmaceutical Sciences (TPS); whereas
another donated an article in the specialised Tetrahedron Letters (TL) which
was an incomplete part of a series of communications on a specialised drug.
Clearly, the readership of such a paper would be highly limited.
In an attempt to collect a representative spread of research articles, one
might calibrate the papers by criteria such as ‘field’, ‘centrality’ as suggested
above, or by classifying journals by ‘coverage of subject’ (general or
The compilation of the PSC involved 150 research articles from a selection of
22 journals. A full list of these articles and their source journals is set out in
Appendix 2. A target of 500 000 words was set as the initial corpus size. In
order to reach this target after the initial collection of papers from the authors
in the survey (which gave 46 papers, criteria 1 and 2, below), a further 104
random papers were selected according to prestige and accessibility (criteria
3 and 4, below). The number of articles collected from each journal was
largely determined by how many papers could be copied, a factor limited by
copyright restrictions (usually one paper from each issue was permitted for
research purposes). But equally crucial were the length of the article and
quality of paper for scanning. The following conditions of inclusion in the
corpus emerged:
4- Accessibility: The journals FAT, JPP and CAR were available on Medline
and could be immediately downloaded (abbreviations refer to journal titles
listed in Appendix 2). Article AC was submitted by a researcher from
Birmingham University. This gave 24 articles.
It was decided that the PSC would be split into several subcorpora
(pharmacology and cancer – the main division within the pharmaceutical
sciences department) but also into sections including Titles and Abstracts (as
subgenres in the research article) and Introduction, Methods, Results and
Discussion subsections (TAIMRD). Although the original 150 Titles and
Abstracts of the PSC are compared directly with other rhetorical sections, an
additional subcorpus was deemed to be necessary in order to obtain more
results. This was derived from the electronic index Medline. The PSC-
Medline subcorpus consists of the first 572 abstracts (58 332 running words)
selected by the keyword ‘cancer’ in December 1993. The subcorpus also
includes a separate text of the 572 corresponding Titles (7 626 tokens) for
comparison with the Abstracts. The Abstracts are all author-abstracts, from a
very wide variety of English-language journals and relate to cancer either
from within the Title or Abstract or from the list of keywords included as
Medline data (the keywords are discarded for this study). The Medline corpus
thus has the advantage of topical specificity as well as being a homogeneous
source of scientific texts. In the data analysis section, I compare the PSC
titles subcorpus with the PSC as a whole to give a picture of the salient
lexical items which are typical of titles within the PSC. These results can then
be analysed using the Medline corpus, since the PSC titles corpus alone is not
large enough to reveal interesting concordance data.
A number of scanning mistakes due to small print account for certain
anomalies of word counts in my data. In many cases, this meant that some
experimental sections had to be discarded as they often have smaller print
than the rest of the article. The texts that accompany tables were also
eliminated unless they had a considerable amount of argumentation, in which
case they were considered to be valuable parts of the rhetorical section in
which they were situated and added to the end of that section. Once post-
edited, all the texts were converted to text files for use on a PC-mounted
UNIX system for frequency tests and then converted to text files for analysis
by a PC wordlist and concordance package (detailed below).
The PSC thus consists of 150 research articles, with an average of 7
sections each. Using Roe’s word analysis programs (1993b:10) a UNIX word
frequency count calculates the total word count to be 515 073 running words
(tokens) (Roe takes a word to consist of any string of symbols bound by two
spaces, excluding figures). However, this number of words is probably too
large (some chemical symbols, Greek letters and mis-scans are also identified
by this procedure). A second count by the Wordlist program (Scott 1993)
gave 499 105 words, of which 24 253 were different words (types). The PSC
was then split into sections (including Abstracts) and counted using the
UNIX wordcount (percentages have been adjusted to take account of
overlapping sections such as MR and RD sections):
[JNCI, CCP, CL, FAT, JGM and JPP are not ranked within the first 600]
In terms of relating the PSC with its discourse community, the PSC therefore
includes many high impact journals, and has quite a specialised coverage
with the exception of such ‘introductory’ articles as TPS. It is surprising that
CCP (Cancer Chemotherapy and Pharmacology) is not a ‘very high’ prestige
journal: it was mentioned by researchers from both sides of the department
as a key link between them, as the title of the journal suggests.
Having compiled the PSC, the next stage involves a topical overview of
the specialisms covered in each research article. Two researchers (one from
each main division) helped to classify and gloss all the research articles in the
PSC according to the following research categories:
The corpus emerges with a large number of papers on the biology of cancer
(55% of the PSC), covering a range of probably the most important cancer
specialisms, from descriptions of the problem to testing biochemical
solutions to the problem (chemotherapy and immunohistochemistry), the
latter forming the larger part of the cancer research division. The minority
part of the corpus, pharmaceutical sciences (42%) is more diverse, covering
more specialisms than is perhaps suggested by the term ‘structural
chemistry’. As can be seen in Appendix A, some journals are topic-specific,
being mostly pharmaceutical and low impact (BJP, CCP, FAT, JCPT,
JOACS, JOC, JPP, PAH) while others have a range of specialisms (BMJ,
BJC, CAR, CL, CR, IJC, JGM) and tend to be high impact cancer research /
microbiology journals. The British Medical Journal (BMJ) was one of the most
favoured journals (more than five mentions). Unfortunately, no examples of
BMJ papers on cancer were available, so five random papers were included
as examples of the genre.
Knowing that your corpus is unbalanced is what counts. (Atkins et al. 1992:14)
such analysis would be of benefit to the genre analysis of the research article,
the rhetorical sub-section of the article remains the main focus of analysis in
this book and should serve as a model for future analysis of other dimensions.
The procedure used to prepare and compile the PSC is similar to that used in
the compilation of the Cobuild dictionary (as set out by Krishnamurthy 1987,
Clear 1987 and Sinclair 1991) and has been broken down into a series of
computational steps by Roe (1993a:10-13) on a UNIX-based system called
the ASTEC suite and later developed for the WINDOWS environment as the
Aston Text Analyser (ATA). Burnard (1992:21) describes UNIX in terms of
libraries of routines used for common procedures that can be integrated into a
common environment. While this makes the ASTEC analysis extremely
flexible, commercially available programs emphasise the presentation of data
which is an important consideration in concordance analysis. Further steps in
the analysis as well as comparison of the rhetorical sections were thus carried
out at a later stage by a PC-based collocation program (Microconcord:
Johns and Scott 1993) and the wordlist compiler (Wordlist: Scott 1993). The
differences in definitions of what is an acceptable and unacceptable ‘word’ in
these programs, and textual changes of format in converting the PSC for
these systems mean that consequent differences in word frequency lists must
be taken into account.
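The way in which two definitions of a ‘word’ lead to different counts for the same text can be sketched as follows. The first function loosely follows the definition attributed to Roe earlier (any string bounded by spaces, excluding figures); the second, purely alphabetic, definition is a hypothetical contrast and is not a description of how Wordlist or Microconcord actually tokenise.

```python
import re

# A 'figure' here is a string of digits with optional internal . or , separators.
FIGURE = re.compile(r"\d+(?:[.,]\d+)*")

def whitespace_tokens(text):
    """Space-delimited strings, skipping strings that are purely figures."""
    return [t for t in text.split() if not FIGURE.fullmatch(t)]

def alphabetic_tokens(text):
    """Hypothetical stricter definition: unbroken runs of ASCII letters only."""
    return re.findall(r"[A-Za-z]+", text)

# Invented sentence in the style of a Methods section.
text = "Cells were treated with 5 mM cisplatin ( n = 3 ) for 24 h ."
loose = whitespace_tokens(text)   # keeps brackets, '=' and the full stop
strict = alphabetic_tokens(text)  # keeps only alphabetic strings
print(len(loose), len(strict))
```

The loose count includes symbols and punctuation that the strict count drops, which is the sort of discrepancy that separates the two totals reported for the PSC.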
words in the PSC with the 17 million word Cobuild corpus (these figures
differ slightly from the Wordlist generated list in Appendix 1). This is
calculated by the ASTEC program by simply comparing two frequency lists
as follows:
Table 3: The Astec top ten lexical items in the PSC and Cobuild corpora.
The ASTEC comparison reveals clear differences between the specialist and
the general corpora, especially in the sharp increase in the proportion of
many prepositions in the PSC (this increase can be more clearly seen in the
first 100 words of the PSC in Appendix 1). It is also notable that the
conjunction / pronoun that at rank 7 in the general language corpus drops to
rank 12 in the PSC (with 3 359 occurrences) and the pronoun it at rank 8 in
Cobuild drops down to rank 41 in PSC (with 1 006 occurrences).
As part of ASTEC, the ‘COMMON’ program produced a list in
descending order of relative frequency of each item in the PSC and a figure
indicating the relative frequency in the Cobuild list. A clear pattern emerges
from this analysis: clumps of words are very significantly associated with the
PSC in the mid-range level of frequency as one would expect (between,
human, table, using, results, both, study, shown, protein, observed, DNA,
data are all at 0.4% or more compared to their occurrence in Cobuild: 0.14%
or less). Other higher frequency words have a slightly higher relative
frequency in the PSC: of, and, in, was, with, for, were, by, cells, at, from, or,
et al., these, after, also, mice, activity (all at 0.7% frequency or more in the
PSC). Conversely, several grammatical items have a significantly higher
percentage frequency in Cobuild than in the PSC: the, a, to, that, is, as, on,
this, are, be, not, which, an, have, it, all, has, but, other.
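A comparison of two frequency lists by relative frequency, of the general kind described above, can be sketched as follows. This is not the ASTEC ‘COMMON’ program itself, whose internals are not given here; the function names and the two toy token lists are assumptions made for illustration.

```python
from collections import Counter

def relative_frequencies(tokens):
    """Map each word form to its percentage of all running words."""
    total = len(tokens)
    return {w: 100.0 * c / total for w, c in Counter(tokens).items()}

def compare_lists(specialist, general):
    """List (word, % in specialist, % in general) in descending specialist %."""
    spec = relative_frequencies(specialist)
    gen = relative_frequencies(general)
    rows = [(w, pct, gen.get(w, 0.0)) for w, pct in spec.items()]
    rows.sort(key=lambda r: (-r[1], r[0]))  # ties broken alphabetically
    return rows

# Toy token lists, invented for illustration only.
specialist = "the cells of the mice".split()
general = "the cat and the dog saw the mice run off".split()
rows = compare_lists(specialist, general)
for word, spec_pct, gen_pct in rows:
    print(f"{word:<6} {spec_pct:5.1f} {gen_pct:5.1f}")
```

Words whose specialist percentage is well above their general percentage are candidates for the ‘clumps’ of corpus-specific items discussed above.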
PSC
Items at the top of the word list are relatively more frequent than those near
the bottom. This represents the first page of several, so all of these words are
particularly ‘salient’ or typical of Abstracts. Near the bottom of the list in
Appendix 4, it can be seen that immortalized is the 32nd most Abstract-
salient word (by virtue of its observed frequency in the Abstract, i.e. 13
tokens). This result is divided by the observed frequency of the word in the
PSC (69 tokens). Its occurrence is not judged by the program to be significant
(the chi-square is calculated as 17.9 but a p score is not shown). In fact, from
the Wordlist tables it can be seen that there is a statistical cut-off point in
terms of items that are too ‘infrequent’ compared to items from the whole
corpus. For Abstracts the cut-off point is 90. This means that while items
with fewer than 90 occurrences in the PSC may be very frequent in Abstracts
(i.e. ‘salient’), they are not given a p-score.
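A chi-square score of this kind can be sketched with the standard formula for a 2 x 2 contingency table. The exact calculation performed by Wordlist is not given here, so the layout below (one subcorpus set against the rest of the corpus) and all the counts in the example are assumptions; the critical values are the standard ones for 1 degree of freedom.

```python
def chi_square_2x2(freq_sub, size_sub, freq_rest, size_rest):
    """Chi-square for one word: a subcorpus against the rest of the corpus."""
    a = freq_sub                # the word, in the subcorpus
    b = freq_rest               # the word, elsewhere
    c = size_sub - freq_sub     # all other words, in the subcorpus
    d = size_rest - freq_rest   # all other words, elsewhere
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0
    return n * (a * d - b * c) ** 2 / denom

# Standard critical values of chi-square at 1 degree of freedom.
CRITICAL_DF1 = {0.05: 3.841, 0.01: 6.635, 0.001: 10.828}

def significance_level(chi2):
    """Strictest listed level that the score reaches, or None if it reaches none."""
    for p in sorted(CRITICAL_DF1):
        if chi2 >= CRITICAL_DF1[p]:
            return p
    return None

# Invented counts: 10 hits in a 100-word subcorpus, 10 hits in 900 other words.
score = chi_square_2x2(10, 100, 10, 900)
print(round(score, 2), significance_level(score))
```

On this threshold test alone a score of 17.9 would pass the 0.1% level, which shows that the program's refusal to give a p-score for low-frequency items is a separate cut-off applied on top of the statistic.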
On the other hand, but is the 31st most abstract-salient word, the first
grammatical item on the list and has a chi-square score of 18.1, which at 1
degree of freedom (Butler 1985a:176) places it even below the 0.1% level.
This is considered to be ‘highly significant’ (5% or less is regarded as
‘significant’) and those items with a p = 0.000 score in the lists are all
considered statistically very highly significant. Wordlist signals words that
are important to the corpus as a whole by showing their percentage if it is
greater than 0.1% (in the case of but 0.2%). As a statistically salient word as
well as a grammatical item, but therefore merits our attention. This word is
Some initial results are worth mentioning at this point. The following
grammatical items were identified by Wordlist as salient words in the
different parts of the corpus (I indicate by code the original subcorpus of each
item. Some items, like ‘both’ or ‘this’ are listed by their most frequent word
class as observed in the corpus):
Auxiliary / Modal verbs (11): was (A, M), did (A, R), been (I), has (I),
have (I, D), is (I, D), can (I), were (M), had (R), be (D), may (D).
Prepositions (8): of (T, A, I), for (T, M), on (T), in (T, A, R, D), to (I),
at (M), from (M), after (M, R).
Determiners (8): these (A), such (I), each (M), no (R), the (R), all (R),
our (D), this (D).
Conjunctions (5): and (T, M), but (A), that (A, D), both (A), when (R).
Pronouns (4): there (A, R), who (A), it (I), we (I, D).
Grammatical Adverbs (2): then (M), not (R, D).
The analysis covers 38 items in total, and certain items are salient in a
number of different sections of the research article. As mentioned above, this
allows for an analysis of phraseological distribution across the corpus: the
behaviour of in for example, can be analysed in Titles, Abstracts and Results
and Discussion sections. The salience of in in these sections can be regarded
as a result of its relative infrequency of use elsewhere (in Methods and
Introductions). Below I set out the analysis in two different ways: by
grammatical item (thus examining the changing phraseology of one item
investigation.
Underlined item: a highly frequent collocate of the node word.
{Item in curly brackets}: a cluster of semantically related lexical items.
<Items in angled brackets>: a fixed sequence of collocates.
We can see from the example concordances that the fixed sequence <in the
management of> is not just a phrase in itself but is related to a broader
phraseology. This is because it collocates with a consistent set of topical
patterns with few deviations from the pattern. For example, the expression is
introduced by a general statement of research, in particular the collocations
current trends in, diagnosis and... or a less fixed and more varied semantic
set (clinical histochemical approaches: {Teicoplanin in, irradiation in,
resistance in...}). However, the word management on its own has a different
phraseology. It allows the researcher to signal the general methodology to be
undertaken in the rest of the article: {anesthetic, neurosurgical,
psychological, interdisciplinary}. Similar modification of the type of cancer
is also involved to the right of the expression and these could be said to be
typical processes of inclusion of methodology and precision of problem in
the noun phrases of titles.
The advantage of this kind of visual analysis is that it reveals patterns that
may not easily be revealed by automatically derived collocation counts.
Having identified a pattern such as management of, it can be seen that the
expression is semantically modified by a topic that is only intuitively
accessible: a statement of the disease or its symptoms (Y cancer, Y patients).
The visual cues are not used in all cases, but it can be immediately gathered
from the above example that the term management involves two consistent
phraseologies.
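The concordance display on which this kind of visual analysis relies can be sketched as a simple key-word-in-context routine. The function names, the context width and the line limit are illustrative assumptions, and the example text is invented rather than taken from the corpus.

```python
def concordance(tokens, node, width=4, max_lines=5):
    """Return (left context, node, right context) for each hit of the node."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok.lower() == node.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, tok, right))
            if len(lines) == max_lines:
                break
    return lines

def format_lines(lines):
    """Right-align the left contexts so that the node words line up."""
    pad = max((len(left) for left, _, _ in lines), default=0)
    return [f"{left:>{pad}}  {node}  {right}" for left, node, right in lines]

# Invented title-like text, for illustration only.
tokens = ("current trends in the management of breast cancer "
          "and the management of pain").split()
lines = concordance(tokens, "management")
for row in format_lines(lines):
    print(row)
```

Aligning the node words in a column is what allows recurrent left-hand and right-hand patterns to be read off by eye.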
In order to signal where a reading of the concordance has revealed a large
scale lexical pattern, a semantic covering term is expressed in brackets and in
small capitals {DISEASE Y}. In the phraseological analysis section of the book I
have identified four major semantic categories: RESEARCH, CLINICAL, EMPIRICAL
and BIOCHEMICAL, with certain further subcategories. I have also used the
symbol X to demonstrate the many types of treatment-related names of
compounds (often with positive connotations), and Y for many disease-related
items. Finally, in order to make the optimum use of examples, a maximum of
five concordance lines is usually shown for each pattern.
This is useful for determining distribution according to position, but does not
give an immediate pattern that can be followed up by closer analysis by
This shows that patterns appear to be established even across such a wide
span (of + breast, of + human). The program also allows for a distribution
analysis not across several texts but within a text, giving a ‘bar code’ of the
co-occurrence of up to three items. In his own collocation program, Clear
(1993) takes a window of 5 words i.e. a span of 2 x 2 (two words to the left
of a node, the node itself, two words to the right of a node) and does not take
into account whether items are left or right collocates: they are all calculated
together. Clear uses two principles of information retrieval from corpora.
Precision is the measure of how successfully the system retrieves interesting
data. Recall is a measure of how much interesting data are actually found and
how much are lost. Phillips (1985) and Smadja (1993a) aim at a total
collocational description of a corpus, and thus recall is an important concept
for them. For the purposes of this book, however, precision is a sufficient
measure of the significance of what Clear terms mutual information.
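The windowing procedure attributed to Clear above can be sketched roughly as follows. This is only an illustrative sketch: the function names, the toy sentence and the sets of 'interesting' items are assumptions made here, not part of Clear's program, and the precision and recall formulas are the standard information-retrieval definitions rather than anything specified in the text.

```python
from collections import Counter


def window_collocates(tokens, node, span=2):
    """Count words within `span` positions of each occurrence of `node`.

    Left-hand and right-hand collocates are pooled into one count,
    as in the 2 x 2 window described above.
    """
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        lo = max(0, i - span)
        hi = min(len(tokens), i + span + 1)
        for j in range(lo, hi):
            if j != i:  # skip the node itself
                counts[tokens[j]] += 1
    return counts


def precision_recall(retrieved, relevant):
    """Standard IR definitions (an assumption; the text gives no formula)."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Toy sentence invented for illustration only.
tokens = ("the management of breast cancer and "
          "the treatment of human breast cancer").split()
collocates = window_collocates(tokens, "of")
print(collocates.most_common())

# Hypothetical judgement of which retrieved collocates are 'interesting'.
retrieved = {"the", "breast", "cancer", "human"}
relevant = {"breast", "cancer", "tumour"}
print(precision_recall(retrieved, relevant))
```

Because the two sides of the node are pooled, a count of this kind cannot show whether a collocate such as breast falls to the left or the right of of; a position-sensitive analysis would have to keep two counters.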
The MI score also reveals different patterns: it is not until the last half of the
MI table for of (see the Analysis section 11.1 and Appendix C for full details)
that right-hand collocates appear, suggesting that the use of of is largely
motivated by a limited set of left-hand research-activity or empirically
oriented words like presentation, department, majority, measurement which
are then qualified by a more diverse group of disease-related items (disease
Y, cancer X, patient...). This example illustrates the fact that frequency and
significance only tell half the story: there may be collocational patterns to be
discerned in the less statistically salient parts of the table.
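As a rough illustration of how an MI table of this kind might be built, the sketch below ranks collocates of a node by a mutual information score while recording the side on which each occurs. It assumes one common formulation of the score (log base 2 of observed over expected co-occurrence); the exact calculation used for the table is not given here. All frequencies are invented for the example and the names are hypothetical.

```python
import math


def mi_score(pair_freq, node_freq, coll_freq, corpus_size):
    """Pointwise mutual information (log base 2) of a node-collocate pair.

    Assumed formulation: log2(f(node, coll) * N / (f(node) * f(coll))).
    """
    return math.log2(pair_freq * corpus_size / (node_freq * coll_freq))


# Hypothetical figures, invented for illustration only.
CORPUS_SIZE = 100_000
NODE_FREQ = 1_000  # occurrences of the node 'of'

# collocate: (side of node, joint frequency with node, overall frequency)
table = {
    "presentation": ("L", 30, 40),
    "measurement": ("L", 20, 50),
    "cancer": ("R", 60, 600),
    "patients": ("R", 40, 500),
}

ranked = sorted(
    (
        (word, side, mi_score(pair, NODE_FREQ, freq, CORPUS_SIZE))
        for word, (side, pair, freq) in table.items()
    ),
    key=lambda row: row[2],
    reverse=True,
)

for word, side, score in ranked:
    print(f"{word:<14}{side}  {score:.2f}")
```

With these invented figures the left-hand items rank first because they rarely occur without of, whereas the right-hand items are frequent elsewhere in the corpus and so score lower, even though their joint frequency with of is higher.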
For a number of reasons the MI score was not used in the main analysis of
this book. To begin with, I examined fifty collocations of of to obtain the
above table. If ten items from each rhetorical section were analysed, I would
have to calculate a large number of collocates for each of the 38 items: that
means 1900 (38 x 50) two-word combinations. Since I am interested in
The context and specificity of the research article genre have been explored
in the introductory sections of this book. A theory of text has been proposed
in which collocations and phraseology are seen as central to the discourse of
science. In order to examine the research article genre more systematically,
the construction of the Pharmaceutical Sciences Corpus (PSC) was described
in section III. In this section, I examine the specific phraseological and
collocational properties of the corpus with a view to exploring the typical
style of scientific texts.
The description throughout the following sections attempts to answer a
basic hypothesis about the research article: collocational patterns are assumed
to correspond to rhetorical functions, and are also considered to be consistent
within different sections of the cancer research article (the so-called
rhetorical sections: Title, Abstract, Introduction, Methods, Results and
Discussion). In order to examine this specific claim, I set out firstly a
separate analysis of those grammatical items of statistical significance in
different research article sections (at times this extends to four sections per
item). On the basis of the remaining grammatical items (those which are only
salient in one specific section), I then examine the particular phraseology of
each rhetorical section in turn.
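The two-step ordering of the analysis can be pictured with a small sketch. The mapping below uses only a handful of items whose salient sections are reported later in this section (in, is, of, did); the entry for then is an assumption added to illustrate the single-section case, and the variable names are hypothetical.

```python
# Item -> rhetorical sections in which it is statistically salient.
salient_in = {
    "in": {"Title", "Abstract", "Results", "Discussion"},
    "is": {"Introduction", "Discussion"},
    "of": {"Title", "Abstract", "Introduction"},
    "did": {"Abstract", "Results"},
    "then": {"Methods"},  # assumption: treated here as a single-section item
}

# Step 1: items salient in several sections, in alphabetical order.
shared = sorted(word for word, secs in salient_in.items() if len(secs) > 1)

# Step 2: remaining items, grouped under the one section they characterise.
single = {}
for word, secs in sorted(salient_in.items()):
    if len(secs) == 1:
        (section,) = secs
        single.setdefault(section, []).append(word)

print(shared)
print(single)
```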
It can be seen that some sections are more ‘Cobuild-like’ than others.
Paradoxically, 35 of the 55 words set out in the table above are in fact
relatively more frequent in the Cobuild 1987 corpus than in the PSC (as
detailed in section 2.6 above). Patterns attributed to Cobuild items may
represent a ‘general language’ quality of that rhetorical section, although as
we demonstrate below, their use in fact changes significantly in the corpus.
Perhaps not surprisingly however, Introduction and Discussion sections have
a more ‘general language’ vocabulary, while the salient items in Titles and
Abstracts seem to be further away from general usage. Salient words that are
more frequent in the corpus (in Titles and Abstracts) presumably have
phraseological patterns which move the corpus as a whole away from the
general language. This sense of distance is of course a convenient metaphor:
the real difference lies in the high density of use of such items as prepositions
in these sections. Such features of language are noted in the analyses set out
below. In summary, when grammatical items are analysed in the corpus, we
are characterising a particularity of the rhetorical section that sets it apart
from other sections, not necessarily one that sets the corpus apart from
Cobuild or the general language. Some words, such as ‘between’ have a
higher rank in the PSC but are relatively stable across the corpus: they are
therefore not covered by this kind of analysis.
In the following sections, I first set out in alphabetical order those
grammatical items which are salient in several sections, so that the
behaviour of an item can be compared directly from one section to the next
(such as is, which is salient in Introduction and Discussion sections). Secondly, certain
items are very highly significant for that rhetorical section only, and can be
Each one of these items is analysed as a node word below, thus has and have
are analysed separately (it is worth noting here that each word form has a
sufficiently different set of collocates to justify this separation, a point
defended in our discussion of the lexico-grammar, above). These salient
words are analysed below with the data that motivate their selection (these
figures can also be seen in the Appendices). I have attempted to limit the
number of examples of collocation to five, although there is some variation in
this. With long examples I have sometimes had to omit all other elements
except the heads of complex nominals or omit modifying words which did
not fit into the span (for example, a long set of technical pre-modifiers placed
before a significant collocate of the node word).
One specific finding which emerges from the corpus needs to be signalled
here before I set out the data in full. There is a strong tendency for
collocations to cluster around lexical items that share similar semantic
characteristics. Four process types appear to predominate in the corpus data.
They are listed here from relative proximity to the scientists (research
processes) to relative distance (biochemical processes):
I find below that so-called ‘regular’ phraseological units typically restrict the
semantic components of the phrase to one of these process types (or even a
subtype). In other words, one of the defining characteristics of the process
types is that they occur in complementary distribution to each other. This is in
effect the principle behind the original Cobuild dictionary: senses are defined
by collocational or even grammatical behaviour. I use this classification to
describe the global characteristics of a phrase but emphasise here that these
categories emerged initially from the corpus analysis and need to be
considered in their phraseological environment.
It should also be noted here that I make reference to clause structure often
in terms of Hallidayan grammar (1985), including terms such as relational
(copular) clauses and material (transitive) clauses, adjuncts (sentence
modifiers) etc. The scientific processes (biochemical, clinical, empirical or
research) also relate closely to Halliday’s transitivity processes (material,
relational, verbal, mental, behavioural...). For example, most research
processes correspond semantically (if not phraseologically) with Halliday’s
mental or verbal processes.
In this section I set out alphabetically those grammatical items which are
salient in more than one research article section. Their relative rank of
salience in relation to the Wordlist comparison is included in brackets.
We have seen above that in a general lexical comparison between the PSC
and the Cobuild corpus, prepositions emerge as the most significantly
frequent items in science writing, whereas auxiliaries and modal verbs,
conjunctions, pronouns and determiners appear to be less prevalent. This
suggests that the research article genre differs from the general language at a
basic grammatical level in nominal groups (in which prepositions play a key
role), phrasal / prepositional verb usage and the use of sentence adjuncts. The
phraseology of ‘after’ is important in Methods sections in the expression of
time. The preposition does not however head a time-related PP (preposition
phrase), but instead introduces a clinical process performed before the action
indicated by the verb. The methodological procedure is thus presented in
reverse order in the sentence. Some typical examples include:
It is notable that these Titles (derived from the Medline subcorpus) involve
non-finite and finite clauses, which are, as we have noted above, a novel
characteristic of Titles in developmental biology. Besides relating previously
unrelated causes of disease, relationships are also established between
scientific disciplines:
As with the items ‘then’ and ‘each’ which we see below, the statistical
significance of ‘and’ in Methods sections is due to the general tendency to
sequence stages of clinical and empirical analysis. And is used in fixed
expressions which can be seen as routine collocations, as in the following
recurrent examples: cut and stained, cut and mounted, cut and plated,
cultured and plated, sected and stained with... treated and counterstained
with... removed and routinely stained with... developed and stained... However,
chronological sequence is not always respected in the phraseology, and
clinical processes such as collected seem to be expressed as a redundant
intensifier:
We have seen in the basic statistical count that verb forms, especially
auxiliary and modal forms such as did and have are in fact somewhat less
frequent in the PSC in comparison with Cobuild. The salience of did in
Abstracts and Results is therefore significant, because we are dealing
with a phraseology that is very specific to these two sections. The
auxiliary verb did is used in only two ways in Abstracts: to introduce the
negative not, and in elliptical expressions such as <as did the> + NP...
Perhaps surprisingly, the presentation of negative results is a key function in
Abstracts. Such findings are included partly to deflect possible criticism but
also because empirical negative results are just as newsworthy in the
discussion of null-hypotheses.
The subjects of did reflect the typical sentence themes of the Abstract:
processes of tumour growth (or stopping the growth) (propagation, growth,
expression, inhibition) and pharmaceutical molecules that are involved in
helping or hindering these processes (cholesterol, methyl chloride,
doxorubicin, heparin). Verbs that are negated tend to be empirical
measurement or reporting verbs prevalent after ‘but’ (<but did not>...
increase, decrease, show that). Typical subjects of these clauses are
quantitative empirical processes (efficiency, correlation, the data, sample
response). This pattern differs slightly for did in Results sections, where
negative findings tend to relate to empirical processes of causality rather than
quantification. The reason for the difference in expression may be that
Results sections tend to justify and explain negative findings (such as lack of
causality, effect or evidence) while Abstracts state data-related results,
leaving inferences about ‘higher’ empirical or research implications to the
main text.
I discuss the role of ‘did’ in Results sections in the next section (under not).
However, did is frequently used in two other important syntactic
environments. The first, after but, is as an intensifier of a biochemical process
or empirical finding (notice that in Abstracts expressions of this type involve
the negative not):
The second use is elliptical after the conjunction than and an empirical or
biochemical process verb in a comparison of findings (such a discursive
expression is also not used in Abstracts):
In the larger Medline control corpus of titles, two thirds of expressions of this
sort are placed in thematic position as in Bioreversible protection for the
phospho group:.... in a similar results-related pattern to the one described
under and. For is thus not widely used as an adjunct in this part of the
research article.
fifteen patients were <eligible for> entry into the present study
the control group <eligible for> the study
In order to be <eligible for> the study
two groups were <eligible for> the present study
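Concordance lines of this kind, with the node expression marked in angle brackets, can be produced by a simple key-word-in-context routine. The following is a minimal sketch under stated assumptions: the function name and the `width` parameter are hypothetical, and it is not the concordancing software used for the corpus.

```python
def concordance(tokens, phrase, width=4):
    """Return key-word-in-context lines with the node in angle brackets.

    `phrase` is a list of lower-case words; matching ignores case.
    `width` is the number of context words kept on each side.
    """
    n = len(phrase)
    lines = []
    for i in range(len(tokens) - n + 1):
        if [t.lower() for t in tokens[i:i + n]] == phrase:
            left = " ".join(tokens[max(0, i - width):i])
            node = " ".join(tokens[i:i + n])
            right = " ".join(tokens[i + n:i + n + width])
            lines.append(f"{left} <{node}> {right}".strip())
    return lines


text = "fifteen patients were eligible for entry into the present study"
lines = concordance(text.split(), ["eligible", "for"], width=3)
for line in lines:
    print(line)
```

A narrow width keeps each line short enough to align on the node, at the cost of truncating long nominal groups on either side.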
In Introductions ‘has, have’ are most often used with specific expressions of
past research reporting ‘have led to debate / has attracted attention’. In
Discussions, more specific research processes are emphasised.
Although most research is expressed actively in terms of we (see ‘we’
below), passivised reports of research processes are the next most frequent
use:
‘In’ is salient in four rhetorical sections in the corpus and presents us with the
opportunity to test whether phraseology is consistent throughout the corpus.
As noted above, prepositions appear to account for many of the major
differences in vocabulary and style between the PSC and the general
language (at least in terms of a comparison with Cobuild). The highly
frequent prepositions in and of in the corpus are thus key to an understanding
of the fundamental phraseology of the genre. In Titles, in introduces a
prepositional phrase functioning as either modifier or complement in
complex nominals (we have seen one use under and above). There are two
distinct semantic patterns:
1) In modifier expressions, the left collocate is a biochemical process and
the right collocate a clinical or biochemical entity. Where the head of the left
phrase is not the immediate collocate, the head item is usually an empirical or
clinical process. It is noticeable that for each left-collocate, a more or less
limited pattern emerges to the left again of this item (for example, gene
expression). Head items are noted in italics:
The only exception to this pattern involves the modifier (of X) in patients
with:
The spatial metaphor of in in Titles is not prevalent in the rest of the article.
‘In’ in Abstracts is used in three semantic patterns (the most frequent first).
certain specific semantic domains in the general language, then it may be that
determiners are also constrained by prepositions in the ESP.
In the first pattern, the most typical use of ‘in’ is to express data direction
(increase in, increases in: 61 occurrences) after a semi-technical
empirical verb such as ‘yields, expressed, produced’: {empirical process}
a/an {specific data shape} increase in {measurable, often disease-related
empirical item}:
Another frequent expression in the first pattern involves the empirical process
‘resulted in’ in which the direction of the data is emphasised by some
intensifier: {clinical process} resulted in {intensifier} {empirical measure /
biochemical process}. Unlike the yielded phraseology, this expression
generally allows for very explicit modality (if no explicit evaluation is
expressed, then a determiner or similar expression to the first pattern is used):
The writer may also choose to express positive results as a relation (is, be,
were) with higher. Such a phraseology is oriented towards an evaluation of
change in biochemical data (in animals or cells): {empirical measure} is
{empirical evaluation} higher in {animate material}:
This is related to the second, spatial use of ‘in’ in Results sections, in which
the preposition introduces a biochemical entity. In some cases, as in the last
examples, the biochemical entity is a data set itself. For example, ‘in’ is used
in the basic comparison of results where the data sets are expressed as
subjects or patients:
Mutations in turn are typically detected in genes (the p53 gene, exon 6 of
p53, k-ras exons, H-ras gene). An alternative wording is to premodify the
mutation with a gene classifier, thus enabling it to be detected in tumours
[variation in spelling here indicates the use of British spelling in such
journals as BMJ, BJ, etc.]:
The spatial use of ‘in’ also reveals terminological consistency within right-
hand collocates. For example, only nude mice are used for skin grafts:
while frameworks with other common lexical items also reveal the
terminological properties of related words. For example, tumours are
associated with a variety of physiological locations (from genes and cells to
Interestingly, while the Latin ‘in vivo’ is often used as a sentence adjunct, its
complementary expression ‘in vitro’ tends to be used as a premodifier in
noun groups, and so we get the following expressions (in such usage in vitro
functions as a single lexical item - as such in vitro is not as clear-cut a case
of in as in vivo):
The expression ‘as seen in’ is also involved in a longer fixed expression
observed in two structural chemistry texts:
Finally, the use of ‘in’ in lexical phrases in Results is more varied than for the
other prepositions we observe in the corpus, and we note here briefly such
expressions as in addition, in all, in comparison, in contrast. This suggests
To summarise the uses of ‘in’ so far: in Titles, expressions after ‘in’ modify
some biochemical item or process (metastases in, expression in, growth in) or
complement an empirical item (role of... in, change in). Such patterning
constitutes important evidence for grammatical and semantic correspondence,
in other words a lexico-grammatical system. In Abstracts, we noted mostly
nominal reformulations of quantitative results and a number of expressions
involving empirical quantification (increase in, decrease in, reduction in,
difference in). In Results sections the use of in extends to more complex
forms of quantification, a spatial use with biochemical entities and the use of
lexical phrases and cross references to other parts of the research article. In
Discussion sections the tendency is again to express empirical shapes and
directions of data (the most frequent pattern) and causal relations (the second
pattern). A third pattern involves research processes, and a fourth comprises
large numbers of discourse markers. Such increasing variation in the
phraseology of a single grammatical item supports a general observation that
the final sections of the research article become increasingly stylistically
diverse.
The role of the Discussion section also returns to explanation, in a similar
mode to that of Introduction sections. Thus the fixed expression <play a role
in> becomes a significant phrase in Discussions where some degree of
explicit evaluation is often present:
while ‘in_ with’ signals that findings have or have not been replicated
elsewhere:
When ‘is’ is used in equative relational clauses (i.e. where the verb simply
identifies one token as another), the element of evaluation is transferred to a
notion of ‘measure’ or ‘causality’ as in the fixed expressions ‘is one of the
most...is one of the main causes of’. In attributive clauses, on the other hand,
disease- and treatment-related items have stereotypical patterns. Only
disease-related items, for example, can be ‘associated with’:
The reason for these patterns stems fairly straightforwardly from the research
activity. Diseases are being associated with potential causes, while treatments
are being compared and measured. So phraseological patterns correlate
according to some convention with the common semantic categories
naturally involved in the research. This is complicated however by the
varying phraseologies of different word forms. I note later that these patterns
do not correspond with the use of ‘was’ (in Methods and Results sections).
Is also reveals a limited set of items which can introduce nominal
complement (projecting) clauses (known as ‘fact clauses’, as in the fact is
that: Halliday 1985:244). Fact clauses in the corpus are almost always
empirical and premodified by some degree of evaluation. The following list
gives all the possibilities:
However, there is one important exception to the evaluative pattern for ‘is’.
In the Introduction corpus, when the researchers are saying that something is
not something else, explicit evaluation becomes more implicit:
Although its sensitivity to ATP <is not> yet proven, mouse stamen have been examined...
Although cholesterol <is not> fully responsible for the formation of liposomes, it is often used in pharmaceutical liposome formulation
Although the regulation of MyoD1 <is not> fully understood [it and others] appear to perform critical functions
Despite massive lipid mobilisation, the plasma level of these metabolites <is not> elevated in the cachectic state...
While p52 expression <is not> detected, it is unlikely that overexpression is related to LMF factors outside the cell.
In Discussion sections, as with other grammatical items the patterns are more
distributed across a range of expressions, have a greater emphasis on research
processes and evaluation and have in some cases different lexical
components:
It is interesting that
It is apparent that
It is clear that
It is most likely that
However, these patterns contrast with ‘is related to’ which has as subject an
empirical observation which is related to more specifically biochemically
oriented items. Unlike empirical expressions in Abstracts and Results
sections, and as noted above in the phraseology of in, these phrases deal more
with qualitative explanation than with quantitative measurement. The
following pattern is shared by less frequent expressions (‘is present in’, and
‘is responsible for’):
can see that while verbs like ‘show’ are used in affirmative statements to
describe ‘increases in’ the data, or changes of the data shape (as described
under ‘in’ above) negative expressions with ‘show’ are used mostly to
explain the relevance of data or the idea that a specific biochemical
phenomenon did not take place. The implication is that in Results sections,
the researchers are making a statement about causality in relation to their
‘failed’ or negative hypotheses but use positive statements for reporting
changes in the data shape. This is contrary to the pattern in Abstracts, where
negative polarity is reserved for quantitative statements (usually related to
adversative expressions signalled by but).
The most frequent right-collocate of not is ‘show’: {biochemical entity,
usually living cells} did not show {biochemical process, usually treatment
related}:
Such biochemical process verbs have very much the same distribution as
nominalisations (cf. induction of tumor necrosis factor). But there are also
cases in which biochemical processes are explained rather than simply
observed, in which case the writers use less technical verbs such as ‘cause’
and ‘affect’. For example, ‘affect’ is very specifically limited to the chemical
process of (cell) binding:
Such expressions can be partly seen as brief claims or explanations, but can
equally be seen as fixed delexical phrases (such as take a bath, make one’s
fortune). Apart from biochemical or semi-technical explanations, the negative
in the Results section is also used to signal what the researchers didn’t find.
With ‘was / were’, we see below that the passive in Methods sections tends to
be used with technical biochemical process verbs. In Results, the passive
reverts to research process verbs and, at least in negative voice, is usually
modal: {biochemical process} could not be {research process}:
The negative also plays a key role in signalling gaps in existing research. The
expression ‘not known’ is part of the ‘end-game’ of the Discussion section
which allows for further applied research:
Another important signal for future research possibilities is ‘not clear’ where
negative findings are reformulated by higher empirical or research processes
(in italics):
It is therefore not clear why cells are not able to [use] serum plasmogen.
‘Of’ eclipses ‘the’ in an Astec comparison with the Cobuild corpus, and is a
salient word in Titles, Abstract and Introduction sections, thus marking its
phraseology as particularly typical of technical science writing. While the use
of of described below is somewhat complex, it is worth noting that the four or
five major uses of the preposition in the PSC can be contrasted with a very
broad set of uses in the general language: Cobuild, for example, lists 19 non-
idiomatic uses for ‘of’.
In Titles, as in the rest of the corpus, ‘of’ is fundamental to the
construction of complex nominals, in particular expressions of empirical
relations and quantification as well as compound nominal terminology. In
Titles there are no examples of quantification (a number of), or support (a
group of). Instead, the left-collocates of ‘of’ are nominalisations of research or
empirical processes (effect/s of x30, treatment of x24, study of x16,
evaluation of x15) while its right-collocates are nouns synonymous with the
illness or the patient (cancer x69, human x26, breast x25, patients x18,
tumor x15, prostate x13). The majority of the left-collocates of ‘of’ can be
divided into four groups of patterns. Research processes are the most frequent
left-collocates of of in Titles, and typical expressions from the Medline
control corpus include nominal research process titles premodified by a topic-
specific specifier and post-modified by illness-related items most often
involving cancer patients. The expression ‘study of’ is typical:
Clinical process phrases such as ‘treatment of’ and ‘management of’ share a
similar phraseology to ‘study of’:
Of empirical processes, the phrase ‘effect/s of’ is the most frequent in the
subcorpus and has the following phraseology: {treatment-related item X}
effect/s of {treatment X} on {illness-related item Y}:
find that there are many such ‘collocational cascades’ in the corpus. What is
interesting about them is that phrases such as ‘effects of’ appear to be implicit
in the longer chains, or are reformulated.
An idiomatic use of the phrase ‘a case of’ emerges. While the word ‘case’
on its own is involved in the longer phraseology ‘a case control study in
(Brazil / Greece / Sweden) of (subjects participating in the Nottingham
study / the blood screening programme)’, it also acts as head for 12 titles
introducing specific disease-related items which are then postmodified by a
response to the disease {treatment} or (in a minority of examples) an
explanation of its cause:
In the control corpus of Titles (as seen above), of plays a key role in nominal
groups with a typical treatment-of-disease pattern. Such a symmetrical
solution-problem pattern is expanded in Abstracts, the major difference being
that while items in the title corpus tend to predict of with no strong right-
collocates, in Abstracts there are just as many significant right-collocates,
such as human, these, was. Another difference from Titles is that Abstracts
involve the quantification or description of disease, where of introduces
semantic ‘support’ (not necessarily ‘head’): number, concentration, levels,
incidence, frequency, majority, presence ... of... cancer, tumour, oncogene,
growth, expression, patients, mice, human. A second pattern tends to
introduce either empirical or biochemical items that explain the potential
treatment of the disease (effect, role, mechanism, treatment / inhibition,
synthesis... of... drug X, doxorubicin, compounds, [disease Y]). As the first
element becomes more necessary to the interpretation of the next item, the
phrase introduced by of in the second group can be seen as ‘focus’ rather than
support (Sinclair 1991:82-83).
The ‘treatment-of-disease’ pattern can be seen as an overriding pattern,
but within this there is considerable phraseological change. There are four
different problem-solution patterns of complex stereotypical phraseology
involving of in the Abstract: (effect, loss, number, presence). There does not
seem to be any evidence to suggest that any such middle frequency item
(often termed sub-technical items: Francis 1993) shares the same phraseology
as any other. In particular, the solution-problem / treatment-disease pattern
seen in the Title does not appear to be fixed for each item in the Abstract. For
example, presence of has a specific pattern if post-modified: the role/
presence of {drug X} in {illness Y}. Other items require more explicit
modification. Effects and effect are usually in subject position and are almost
always pre-modified by a treatment-oriented item (growth-inhibitory,
antitumour, chemopreventive, protective) or a research-observation item
indicating some problem (adverse, side-effect, toxic). On the other hand,
presence is often used in a prepositional phrase functioning as qualifier,
(preceded by in, for, on) or in a subordinate clause where there is no explicit
statement of problem or solution, and where presence of signals an illness-
related specific item where a possible link with cancer is being explored:
retrovirus, ras proto-oncogenes, maternal toxicity.
In addition, the expression use of represents one of the more stereotypical
patterns of the Abstract. It is always preceded by some degree of measure or
a methods-oriented specification of use (daily, widespread, regular,
intensive, combined, clinical, potential) and followed by a specific drug X(1)
and an expansion of the treatment and illness (with drug X(2), in the study of
illness Y, in the treatment of, in the evaluation of Y) and finally followed by
some degree of evaluation or a research process: resulted in..., should be
considered, is discouraged, is discussed.
In a different kind of distribution, the significant collocate loss appears to
have become terminologised in the fixed expression loss of heterozygosity.
Loss also appears in thematic position whereby a research statement is
phrased in the passive or placed after the term (loss of X...was found,
occurred, occurring), although there are reporting instances such as suggest
that .... which form a separate pattern. The pattern occurs more regularly with
effect/s where specific reporting items are sometimes placed as hedges:
(effect/s of X... were found, reduced, appeared to be.., as shown..., and seem
to...). Interestingly, among most of the expressions of measurement-disease
mentioned above, the reporting verb precedes the expression (shows /
confirms / indicates ...the presence of, incidence of, absence of). The final,
fourth pattern is represented by the expression number of which is not
immediately preceded or followed by a reporting discourse item. It may be
that there is a differentiated pattern of phraseology in which of has a role as
constructor of nominalisations of measurement and qualification (i.e. the first
use mentioned above), in conjunction with expressions of research reporting
and evaluation (the second use). The writer can thus choose to emphasise the
‘self evidence’ of the data by evoking phrases involving number of, or may
wish to place the study in the position of sentence theme (that is: as subject or
in front of the subject in English). These patterns also suggest that choice of
expression in Titles is constrained to the extent that the writer must either use
measurement-disease phrases as a statement of research topic, or
alternatively thematicise the results and use an expression with items such as
effects.
‘Of’ in the Introduction serves to qualify empirical process nouns and to form
fixed biochemical or clinical terminology. This is the same function as in
Titles and Abstracts, the difference being that the fixed expressions and
collocations in the Introduction are expanded to longer stretches of
phraseology. In examining the very complex phraseology of of in this less
constrained environment, the assumption is that collocation operates at longer
boundaries than the phrase. The following left / right collocates demonstrate
the variety of collocation:
Right collocates >10: this, these, cells, human, compounds, drug, mice, drugs,
mice, methylene, studies, cancer, Bora, liver, cell, chloride, effects .
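Lists of this kind can be approximated with a short script. The following is a minimal sketch only and is not the software used for this study: the function name `collocates`, the `more_than` threshold parameter and the sample sentence are all assumptions introduced for illustration. It counts the words immediately to the left and right of a node word such as of, and keeps those occurring more often than a given threshold.

```python
import re
from collections import Counter

def collocates(text, node="of", more_than=0):
    """Count words immediately left and right of each occurrence of `node`.

    Hypothetical helper: only adjacent (span 1) collocates are counted, and
    only those with a frequency strictly greater than `more_than` are kept.
    """
    tokens = re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())
    left, right = Counter(), Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        if i > 0:
            left[tokens[i - 1]] += 1
        if i + 1 < len(tokens):
            right[tokens[i + 1]] += 1

    def keep(counter):
        return {w: n for w, n in counter.items() if n > more_than}

    return keep(left), keep(right)

# Invented sample sentence, not taken from the corpus.
sample = ("The effects of these compounds on the growth of human cells "
          "and the treatment of these mice were studied.")
left, right = collocates(sample, node="of")
print("left:", left)
print("right:", right)
```

Setting `more_than=10` on a full subcorpus would mirror the ">10" cut-off used in the list above; on a sentence-sized sample the threshold has to be much lower.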
we conclude that
we find that
‘That’ is the most significant salient word in Discussion sections. However,
Wordlist lists the word as one of the least salient in the other rhetorical
sections, with the one interesting exception of Abstracts. In
Discussion sections, ‘that’ indicates the primary use of complement that-
clauses which function as projections of research reports and facts (Halliday
1985:244). In terms of rhetorical function, that-clauses reformulate or
evaluate results. That-clauses can be divided into four patterns in Discussion
sections, in order of frequency of occurrence:
The first three lexical left-collocates of ‘that’ are all research processes
involved in the first pattern (verb complement clauses: suggest/s that,
indicate that, show/n that), but they have very different modalities associated
with their subordinate clauses. The first example, ‘suggest/s that’, is
introduced by an empirical measurement as subject, and the verb in the
subordinate clause usually has some degree of modality or phase:
These findings indicate that a cell has become committed to the.. lineage
These results indicate that the cell has been arrested early in..
development
The present study indicates that this parameter is highly correlated with
our data indicate that LIC is less immunogenic than other tumors
The second pattern we find is syntactically the same as the first, except that
the subject tends to be ‘we’ or (depending on the verb) ‘this study’ or the
names of other researchers. The most frequent pattern of this type
‘showed that’ tends to entail more evaluation or negative results than its
present tense counterpart ‘show that’. Also unlike ‘show that’, it has ‘we’ and
‘experiments’ as possible subjects:
The fourth main pattern for that involves embedded noun phrase
complements, and similarly demonstrates a modality projection between the
noun and its embedded verb. One of the most frequent noun phrase
complements is ‘the fact that’. The expression takes on a very specific
rhetorical role, by first stating negative results and then by setting out an
explanation:
The fact that [[this enhancement does not occur in females]] implies that such oncogenes were not involved
The fact that [[we cannot demonstrate this change]] suggests that AIN causes different effects...
The fact that [[the 150pp treated group was not killed earlier]] might be due to weakness in the dose monitor
The fact that [[2 MCR lines did not show higher activity]] confirmed that these reagents were highly specific
The fact that [[sequential accumulation of LOH was not observed]] might be due to early monitoring
The expression <might be due to>, as seen in the examples above, is also
related to the complex conjunction: <due to the fact that>. Here the writers
reformulate some anomaly and then explain it, while the new explanation
(which does not appear to be a reformulation of previous material) may
constitute a research result in itself:
The failure of the two mechanisms could be due to the fact that phenotypic substituents reach complex levels at low time intervals
These discrepancies were due to the fact that antibody columns are rarely 100% efficient
The ineffectiveness of thiamine may be due to the fact that thiamine has sizable groups present.
The unexpectedly high concordance is due to the fact that multiple immuno processes are involved
We can see that the fact that appears to collocate across clause boundaries
with the expression due to in the following example (it also consistently
colligates with a negative expression): The fact that we cannot demonstrate
this degree may be due to insufficient sensitivity of our method. Here we can
#1 We found that.. only anti B1 could mediate specific cytolysis.
#2 This is likely due to the fact that the difference is only one subclass.
The more frequent expression ‘due to’ reveals a regular pattern across
sentence boundaries in other parts of the discussion subcorpus (#1 negative
result or negative research process, #2 possible empirical explanation):
These examples also reveal the important reformulating role of deictic ‘this’
which is discussed later. The phraseology of The fact that differs from
alternative expressions, such as the possibility that where the embedded
clause itself contains the modalised explanation (the main clause, not shown
here, is usually an expansion of the hypothesis expressed in the embedded
complement clause):
Evaluation:                            Reformulation:
suggest that (+modal)                  indicate that
(empirical item) is that (+modal)      confirmed that
conclude that (+evaluation)            demonstrated that
showed that (+ neg. / modal)           show that (+/- neg.)
(we) reported that (+modal)            (we) reported that
it is possible that (+modal)           (we) found that (+quantification)
the possibility that (+ modal)         the observation that
the hypothesis that (+modal)
Negative evaluation:
it seems likely that (+neg.)
(adversative) it is clear that
the fact that (+ neg.)
(neg.) due to the fact that
The exclusive use of the past tense is in line with other expressions which
express new results in the research article as a whole. These expressions
typically precede highly significant items within the Abstracts subcorpus
which deal with statistical direction or relation (increased, decreased,
interval, correlated). The one or two exceptions to the pattern (qualitative
empirical items) seem to highlight the preponderance of quantitative
expressions elsewhere in Abstracts:
throughout the corpus that past tense or perfective aspect tend to correspond
to current claims in the research article, whereas the present tense is used to
express established fact or report past research. However, in Results sections
the pattern moves to the present tense (there is / are) and tends to be
embedded after NP or VP complement clauses. The most frequent pattern
involves projection, where the main clause is generally a research process
and introduces empirical observations with some degree of explicit
evaluation:
There is a cluster of grammatical and lexical features which coincide with the
negative ‘There appeared to be no...’ pattern:
1. Existential ‘there’.
2. Modality.
3. The use of the past tense.
We have seen in our discussion above that the simple past is the preferred
tense for presenting the research article’s present methodology and results.
Ironically, the present is used to introduce previous research. This appears to
conflict with previous research (Hanania and Akhtar 1985) and Malcolm’s
(1987) distinction (past for generalisations, present for specific data). In the
PSC we find that ‘was’ generally reports the research article’s {clinical}
methodology and non-quantitative {empirical process} results. In Abstract
sections, ‘was’ can be seen to have a completely different phraseological role to is. In
the Abstract, there are two patterns for is:
1) There is... followed by a statement of evidence: no evidence, no
molecular evidence, no indication + that, for this, to suggest etc. (contrast the
present tense with a negative in Abstracts, with the past tense usage in
Results).
2) Extraposed it and that-clauses: it is ...concluded, apparent, desirable,
essential, important, possible, believed, expected, likely that...followed by
explanation.
Was does not share any of these phraseological characteristics, and is instead
involved with statements of qualitative results where the subjects are either
key biochemical entities in the cell (peripherin, protein, nucleus, DNA,
glycoprotein, toxicity) or biochemical items involved with a tumour’s effect
on the metabolism (growth, weight, vasodilatation, expression). As in
Methods sections, was introduces passive participles which are often pre-
modified by a technical (biochemical) adverb:
Was / were have a relatively consistent phraseology across the corpus, although
in the expression There was / there were a different phraseology emerges in
Results sections (as discussed above). The significance of was in Methods
sections stems fairly straightforwardly from the prevalence of the passive in
the past tense description of biochemical and empirical observations. Verbs
used in the passive have very fixed collocational uses. A particularly frequent
pattern emerges with ‘detection’ which tends to be either ‘<carried out at>
{measurement item}’ or ‘accomplished + {method}’:
The repetitive nature of some of the methodological details in the corpus also
reveals a number of fixed expressions (and even idiosyncratic idioms)
involving ‘was’. The following examples are common to several different
texts, although of course there is also much repetition within the same text:
The plural ‘were’ tends to be used with plural biochemical entities (mice,
cells, controls etc.): ‘{biochemical entities} were {clinical process verb} by’.
Singular items on the other hand tend to have the following formulation:
‘{usually deictic} {empirical / research process} was {clinical / empirical
process verb}’. Thus singular and plural forms of the verb tend to coincide
with different semantic verb classes.
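A contrast of this kind can be checked roughly by listing the participles that follow each form of the verb. The sketch below is an assumption-laden illustration rather than the procedure used here: the function name `participles_after` is hypothetical, only regular participles ending in -ed are detected, and the sample sentences are invented.

```python
import re
from collections import Counter

# Matches 'was'/'were', an optional -ly adverb, then a regular -ed participle.
# Irregular participles (given, shown, etc.) are deliberately not covered.
PASSIVE = re.compile(r"\b(was|were)\s+(?:\w+ly\s+)?(\w+ed)\b")

def participles_after(text):
    """Count -ed participles following 'was' and 'were' separately."""
    counts = {"was": Counter(), "were": Counter()}
    for aux, verb in PASSIVE.findall(text.lower()):
        counts[aux][verb] += 1
    return counts

# Invented sample sentences, not taken from the corpus.
sample = ("The mice were injected with saline. Cells were incubated at 37 C. "
          "This analysis was performed twice. "
          "Detection was routinely performed by HPLC.")
counts = participles_after(sample)
print("was:", dict(counts["was"]))
print("were:", dict(counts["were"]))
```

Grouping the resulting verbs into clinical, empirical and research process classes would still have to be done by hand, since those categories are semantic rather than formal.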
These time expressions may have a role in situating a present tense verb
because the unmarked meaning of the present in articles is more usually to
#1 The cellular basis for this association is unknown,
#2 but we believe that comparing this in vivo... is meaningful.
#1 Even if methylene does not interact with hepatocyte...
#2 we believe that the magnitude is not sufficient.
#1 The reasons for the discrepancy are not entirely clear,
#2 but we believe that our technique of assessing transport... offers greater sensitivity.
#1 The relative LI’s did not differ between methylene-exposed controls.
#2 We believe that methylene-chloride exposure did not provide a selective growth advantage.
#1 The role of the negative phosphate backbone... is poorly characterized at present.
#2 We believe that improved progress can be made to enhance understanding in areas such as chemical drug design.
Thus expressions introduced by We conclude that can (as the verb promises)
stand as a summary of the main empirical observations. Expressions
introduced by We believe that are not representative of the results but signal
the perceived significance of the research in the eyes of the researchers.
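The two-step sequences discussed in this section (#1 a negative result or gap, #2 a statement introduced by a fixed expression) can be searched for mechanically. The following is a rough sketch under stated assumptions: the function name `paired_moves`, the small list of negative markers and the naive sentence splitter are all hypothetical choices made for the example, and only pairs that straddle a sentence boundary are found (not those joined by but within a single sentence).

```python
import re

# Hypothetical, deliberately small set of negative markers.
NEGATIVE = re.compile(r"\b(no|not|cannot|unknown|unclear|failure|poorly)\b",
                      re.IGNORECASE)

def paired_moves(text, signal="we believe that"):
    """Return (sentence 1, sentence 2) pairs where sentence 1 contains a
    negative marker and sentence 2 contains the signal expression."""
    # Naive splitter: breaks after '.', '!' or '?' followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    pairs = []
    for first, second in zip(sentences, sentences[1:]):
        if signal in second.lower() and NEGATIVE.search(first):
            pairs.append((first, second))
    return pairs

# The first two sentences are taken from the examples above; the third is invented.
sample = ("The relative LI's did not differ between methylene-exposed controls. "
          "We believe that methylene-chloride exposure did not provide a "
          "selective growth advantage. The cells were counted.")
for first, second in paired_moves(sample):
    print("#1", first)
    print("#2", second)
```

Passing `signal="due to"` instead would retrieve candidates for the #1 negative result / #2 empirical explanation sequence, although every hit would still need to be read in context.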
The data presented in the previous section set out the distribution of uses of
single grammatical items as they are used in the research article. While most
of the observations signal departures from predominant usage in the general
language, certain features of language can be seen to vary relatively
systematically from one grammatical item to the next. This was seen to
affect in particular such general grammatical features as verbal polarity,
tense and complementation, clausal extraposition and projection and complex
nominal modification. Grammatical items can also be seen to have consistent
patterns in terms of semantic clusters and collocational sets and reveal
consistent correlations between lexical or grammatical form and such
discourse features as modality. Such data also suggest a varying range of usage
from one rhetorical section of the article to another. This section of the book
explores this theme in more detail, by examining the specific role of
grammatical items which are found to be statistically salient in one section of
the article alone. I also set out here the statistics used to identify the
grammatical items examined in the previous section (this data is also
included in the Appendices).
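For readers who wish to experiment with the notion of salience, the comparison of a word's frequency in one subcorpus against a reference corpus can be sketched as follows. This is an illustration only: the statistic computed by the Wordlist program is not restated in this section, so the sketch substitutes the log-likelihood (G2) measure that is widely used for keyword comparison. The function names and the word counts in the example are assumptions; only the two subcorpus sizes (2300 and 29 136 words) are taken from the text.

```python
import math

def log_likelihood(a, b, c, d):
    """Log-likelihood (G2) for a word occurring `a` times in a subcorpus of
    `c` tokens and `b` times in a reference corpus of `d` tokens.

    Assumption: this stands in for the Wordlist statistic, which may differ.
    """
    expected_sub = c * (a + b) / (c + d)
    expected_ref = d * (a + b) / (c + d)
    g2 = 0.0
    for observed, expected in ((a, expected_sub), (b, expected_ref)):
        if observed > 0:
            g2 += observed * math.log(observed / expected)
    return 2 * g2

def overused(a, b, c, d):
    """True if the word is relatively more frequent in the subcorpus."""
    return a / c > b / d

# Hypothetical counts for one word: 50 occurrences in a 2300-word subcorpus
# against 200 occurrences in a 29136-word reference subcorpus.
score = log_likelihood(50, 200, 2300, 29136)
print(round(score, 2), overused(50, 200, 2300, 29136))
```

A word with the same relative frequency in both corpora scores zero; the larger the score, the less likely it is that the difference in frequency is due to chance.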
3.1 Titles
There are only 2300 words in the PSC titles subcorpus. To study phraseology
in Titles a larger control corpus was needed and so the Medline electronic
database was searched for a diskful of 572 titles relating to cancer (1 626
words) and, for comparison, their Abstracts were also analysed (58
332 words) as detailed in section III.6. However, the items we analyse in the
control corpus are determined by what is found to be salient in the PSC. The
Wordlist programme gives the following data (in the same format as
discussed in Section 2.6 above):
Table 11: Title salient grammatical items from the Wordlist program
‘On’ occurs in expressions that are either the topic of research or the
application of a specific empirical process. A limited set of items introduce
on, and its typical left-collocates have been listed under ‘of’ above (disease
related items):
In Titles ‘on’ is also a key element in fixed modifying expressions which add
embedded information about methodology, as in {research process 1} based
on {research process 2 / clinical process}:
1 The effect of surgical intervention and neck cancer on whole salivary flow.
(Modifier of effect)
2 Blood transfusion does not have adverse effect on survival after operations for
colorectal cancer. A pilot study. (Complement of effect).
3.2 Abstracts
There are 29 136 words in the PSC Abstracts subcorpus. The Wordlist data
reveal the following salient words:
Table 12: Abstract salient grammatical items from the Wordlist program
The very high significance of but (compared with other grammatical items in
Abstracts) suggests that the reporting of negative results is a fundamental
characteristic of Abstracts. Positive results are announced in a first clause and
then qualified. In particular ‘but’ is an explicit signal of reversal and
evaluation of the direction of quantifiable results (up, down or stable):
these (according to Appendix C2) here also coincides with Nwogu and
Bloor’s (1991) observation that abstracts tend to employ simple thematic
progression, linearly converting rheme to theme.
mice who took part in the control study were given doxorubicin based
analogues.
3.3 Introductions
The third exceptional empirical report in the first pattern has a unique
phraseology, involving a statement about a general research model or
technique as subject:
The difference between the two verbs is that in only follows utilized:
I suggested above that collocational patterns are not due solely to the
grammatical preferences of lexical elements (in this case verbs) but to a
general semantic ‘meaning’ that the collocational framework embodies. A
clear example of this can be seen with ‘show’. Since ‘show’ appears to fit
semantically into several categories of verb (empirical and research-oriented)
it is perhaps no accident that it is the sole verb to be used in both the passive
perfect ‘reporting’ pattern and the extraposed ‘research utterance’ pattern.
Furthermore, its use does not quite coincide with other verbs in terms of
phraseology and lexical collocation. In the first pattern (24 instances), the
expression introduces non-finite clauses in the same way as the verb report.
In this case, however, the clause does not present quantitative results (found
exclusively after has been reported to) but more qualitative findings:
The extraposed pattern for show is similar to other verbs such as establish,
which introduce an explanation rather than a specific quantifiable result. The
difference with other verbs lies in the choice of clause complex, and show is
used almost exclusively in thematically prominent subordinate clauses
introduced by Although:
Although it has been shown that the murine p53 used in all of these studies was mutated, its mechanisms are not fully understood.
Although it has been shown that p53 gene constructs with many different point mutations, the gene responsible for the two cancers has not been identified.
Although it has been shown that the hepatocytes are critical to the survival of the tumor, .... no correlation has been previously determined...
Although it has been shown that the cells that mediate cancer induced GVHD, structural studies of the enzymes have yet to be published.
As with ‘have’ and ‘been’, ‘has’ plays a key role in the phraseology of report,
taxonomy and evaluation. ‘Has been’ accounts for about 66% (188/284) of the
instances of ‘has’, and this usage is detailed above. The remaining phrases
using this item are collocational frameworks with ‘of’ (‘have the _ of’), in
which the whole expression functions as an attributive relational process:
As seen elsewhere in the corpus, the relational or possessive use of ‘has’ also
involves overt evaluation:
Most of the uses of ‘it’ have been described in the discussion of ‘it is’ and ‘it
has been’ + {research process} above. While the present tense is the
preferred tense in Introductions, with the verbs found, thought (x3),
reasoned, reported, shown the extraposed passive is expressed in the past
tense:
‘It’ is the most Cobuild-salient word in the corpus. The Astec ‘Common’
program shows that in relative frequency (not actual frequency), it is nearly
five times more likely to occur in the Cobuild corpus than in the PSC (the
ratio is 20:112 per 1000) and this would indicate that extraposed clauses are
a prototypical characteristic of Introductions rather than the rest of the corpus.
Extraposed active clauses (in that) are however overtaken in Introductions by
the use of non-finite extraposed to-clauses, such as evaluative research
utterances (it is essential to etc.) and it would be worthwhile to. Such action-
oriented phrases are described below.
The only permanent elements of the phraseology here are the grammatical
items ‘was to’, and the semantics of the surrounding clusters is highly
consistent: {research goal} was to {research process verb}. The only
exception to this seems to be where the aim is to act in a specific
methodology, for example the clinical process ‘generate and trap’. This may
seem unsurprising, but the important point about phraseology is that perfectly
plausible alternatives such as ‘to generate and trap’ are not as
prevalent as the research process expressions: they are exceptions. There is
no logical reason why the potential expression {research goal} was to
{empirical / clinical process} should not occur just as frequently in the
corpus. In the case of Introductions, goals are presented as global research
rather than the specific empirical or clinical processes. A possible corollary is
that what would be free or restricted collocation in the general language
becomes fixed either one way or another in the specific language because of
such overriding rhetorical constraints.
However, this does not exhaust the role of to as complementizer in noun
group projections in other salient expressions in Introductions. One
particularly regular projecting clause takes the form: {biochemical process:
possessive} ability to {biochemical process}:
While the present tense is exclusively used for the biochemical / technical
pattern (and can be seen to be used in reporting of results):
This appears to confirm our findings elsewhere that tense and aspect play a
role in phraseology (we see elsewhere that it does for is / was / have been).
Rather than representing a stance in relation to past and present (current)
research, the past tense appears to correspond to research-oriented
observations (relating to the overt mental or verbal activities of the
researchers) while the present corresponds to biochemical and empirical
observations (covert activity on the part of the researchers).
I have mentioned above that projected ‘to-clauses’ (such as the very
frequent have been found to, designed to) are characteristic of Introductions
while projected ‘that-clauses’ (The possibility that, it has been found that)
are preferred in Abstracts and Discussions. This may reflect an
increased use of indirect grammatical metaphor later on in the text. In
Introductions, for example, mental research processes (in the passive) project
explanatory clauses impersonally:
(drug X) was increased following short term exposure to TNF and other solvents
(drug X) undergoes induction involving exposure to high concentrations of TNF
Studies have demonstrated permeability following exposure to non-toxic doses
industrial exposure to methylene chloride
human exposure to higher concentrations
occupational exposure to benzocaine
Table 14: Methods salient grammatical items from the Wordlist program
(1985) who found that the passive in Methods was frequently
present tense (is identified, has been identified). Conversely Heslot (1982)
and Wingard (1981) found that the simple past was prevalent in Methods
sections, which also appears to be contradicted in this corpus. In the
literature, passive expressions in science writing have been characterised as a
novel relationship between subject and verb (Sager et al. 1980, Heslot 1982,
Hanania and Akhtar 1985, Swales 1990). It can be seen that grammatical
subjects correspond consistently with either clinical or empirical verbs (with
some exceptional cross-over):
Such a use of by for the medium of the sentence rather than the agent changes
our stereotypical view of the passive (in which by signals a grammatical
agent: prepared by the scientists etc.). In a collocational framework with ‘for’
(a Methods salient word) the passive construction is empirically oriented
rather than clinical:
With ‘at’ (another Methods salient word) the passive construction is used to
express some measurement together with clinical process verbs. As with the
patterns above, the collocational cascade only has one step in this pattern
since the phraseological possibilities for circumstantial elements are limited
to times/ temperatures:
{Clinical process}
Prepositions such as by and at have virtually only one use in the cancer
research article as opposed to a wide range of use in the general language.
‘At’ signals empirical measurement or quantification, either of temperature,
duration or increments of time. ‘At’ is necessary after a wide range of
passivised clinical process verbs as we have seen with ‘was / were’, or within
the collocational framework of ‘for (x hours) at (temperature x)’:
As stated above, many of these are repeated several times within the same
text, and listed in the methods section so that certain phrases achieve the
statistical status of idioms. Here is just one example of many, although we
can claim that this is unique in that it involves a triple collocational
framework with an inverted temperature / time expression (as compared with
the expressions above): was (stirred) at (temp.) for (time) until {empirical /
clinical process item}:
There are also a number of idiomatic uses of ‘at’, for example the expression
‘at risk’ in apposition to either tumors / carcinomas or animals / mice. The
lexical phrase ‘at least’ is perhaps the only exception to this general modifier
pattern, although it also fits into the broader expression of ‘measurement’:
We have seen above that the number of uses listed in the Cobuild dictionary for
certain words is usually highly restricted in the PSC. Although ‘then’ is an
important feature of narrative in English, there is simply no need for
argumentation in this section of the research article and, despite being a very
significantly ‘Cobuild-salient’ item, ‘then’ functions here in a restricted way
(it corresponds to 1 out of 10 possibilities in Cobuild (1995, 2nd edition)): as a
time-specifier before passivised verbs to signal a subsequent incremental step
in the methodology. The most fixed phraseology involves an idiomatic
expression ‘the solution was added dropwise and the suspension was then
heated’ (x4 instances). The following clinical verbs are most frequently used
in this construction:
{Clinical extraction}
{biochemical solution}
were activated with ethanol
were activated with an equal amount of saline
were activated with a cell suspension
were activated with the culture medium
were activated with blank human plasma
{subject-derived serum}
were incubated with a mouse monoclonal antibody
were incubated with monoclonal antibodies
were incubated with antimouse antiserum
{colouring agent}
were stained with 10% ammonium sulphide
were stained with Alcian blue stain
were stained with brilliant crystal blue
were stained with nitro-blue tetrazolium
were stained with monoclonal antibody
Table 15: Results salient grammatical items from the Wordlist program
‘No’ is the most significant salient word in the Results section, and its role in
signalling significant or contradictory data is similar to the ‘but...’ pattern in
Abstracts. ‘No’ functions uniquely as a determiner, a usage that is not among
the 12 uses of the word in the Cobuild 1995 dictionary. Its most frequent use
is in the expression ‘there was no significant {difference / correlation}’:
The changing preoccupations of the researchers can be seen in the fact that
the passive is preferred for research process verbs rather than the clinical
verbs observed earlier in the Abstract and Methods sections. When the term
‘significant’ is not chosen, another evaluative term is necessary with forms of
‘to be’:
{Empirical evaluation}
Other uses of ‘no’ reveal the delexical nature of verbs used to report findings.
The verb gave collocates regularly with the subject analysis, while revealed
corresponds with specific clinical methods:
The above patterns could have been expressed using an existential ‘there was
no’ (as in the Abstract) but here are used to emphasise the biochemical entity
or clinical process initiating the empirical lack of relationship.
The role of the relational processes ‘is a’ and ‘have a’ is linked with
evaluation in this corpus. ‘Had’ is more restricted however, and in the results
subcorpus, ‘had’ serves to signal some degree of quantification rather than
qualitative evaluation as for has / have in Introductions. The subject often
tends to be a biochemical subject:
This pattern has also been noted in relation to the determiner ‘no’ which can
stand in place of the evaluative quantifier, although this expression is limited
to biochemical compound subjects with empirical item ‘effect’ as head of
complement:
This is further proof that the past tense can be seen as a marked tense,
indicating proximity to current research.
Empirical framework:
<(followed, increased, affected, reflected, mediated) by the (addition,
method, end, presence, production) of>
Clinical framework:
<after the (infusion, administration, end, injection, delivery,
implantation, removal) of>
Research framework:
<during the (interval, period, intervals, periods) of (study,
observation)>
Measurement framework:
<(consistency, fraction, precision, on the basis, time course, grading)
of the (product, mean, estimation, loss, values, incidence, 21%,
accumulation) of the (first values, body weight, hyperplasmin, dose,
cell populations)>
It can be seen that in all of these frameworks (with the exception of the
biochemical sets) all members of the bracketed cluster share some semantic
similarity, even though they may not all fall into our rough 5-part category
system. This is perhaps not surprising - as Renouf and Sinclair (1991) point
out, collocational frameworks depend on their lexical elements to motivate
the structure. The regularity with which some are composed confirms the
view that prepositions are particularly important to the phraseological
specificity of the corpus. The same can also be said of items which have a
wide set of uses in one grammatical role but appear to have a unique
phraseology as prepositions (such as to).
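A collocational framework of this kind can be searched for mechanically. The following is a minimal sketch in Python, not the procedure used in this study: the function name framework_fillers and the sample sentences are invented for illustration, and a simple regular expression stands in for a proper concordancer. It counts the words that fill the slot in a frame such as ‘after the ... of’.

```python
import re
from collections import Counter


def framework_fillers(text, left, right):
    """Count the single words that occur between `left` and `right`.

    `left` and `right` are the fixed grammatical parts of the framework,
    e.g. "after the" and "of". Matching ignores case and tolerates any
    whitespace (including line breaks) between words.
    """
    left_pat = r"\s+".join(re.escape(w) for w in left.split())
    right_pat = r"\s+".join(re.escape(w) for w in right.split())
    pattern = re.compile(
        r"\b" + left_pat + r"\s+(\w+)\s+" + right_pat + r"\b",
        re.IGNORECASE,
    )
    return Counter(m.group(1).lower() for m in pattern.finditer(text))


# Invented sample sentences, loosely modelled on the clinical framework.
sample = (
    "Samples were taken after the infusion of the drug. "
    "After the administration of saline, body weight was recorded. "
    "Tumours regressed after the infusion of cisplatin."
)

fillers = framework_fillers(sample, "after the", "of")
for word, count in fillers.most_common():
    print(word, count)
```

Run over a whole subcorpus, a count of this sort gives a rough view of how semantically coherent the filler set of a framework is; it does not by itself assign the fillers to the process categories discussed above.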
{Research process}
should be evaluated
should be investigated
should be mentioned
should be justified
Other expressions share this pattern, such as ‘likely to be’ and ‘found to be’:
The evaluative pattern is in contrast with that associated with the phase-
modal ‘need to be’, which requires a research process as main verb:
This activity...
I have omitted one high-frequency item that is often used to
reformulate results but is difficult to classify as either research-oriented or
empirically oriented on the basis of its intrinsic meaning: this effect. We have already
seen that effect has a complex complement structure, accounting for several
complex collocational frameworks in Titles and Abstracts (in particular in
collocations with in and of). The word can be used to label observable and
measurable phenomena (such as this motion, this reaction) and at the same
time can be construed as a researcher’s interpretation or modelling of results
(this tendency, this frequency). The word appears to lie somewhere in
between this hypothesis (a clear research-orientation) and this activity (an
empirical observation). By reformulating observations as an effect the
researchers simultaneously explain results and comment on previous data
without proposing a new model:
#1 The increased liver weight was reversible.
#2 This effect could be the result of increased intracellular glycogens

#1 Treatment with 8-chloro cAMP drastically reduces R1 levels.
#2 This effect is even more pronounced in MCF LOA cells

#1 LUMO gap is correlated with downward shift.
#2 This effect is misleading. However, some shifts are involved...

#1 Both approaches resulted in 80% inhibition.
#2 This effect on ECM degradation indicates that cell UPA is much more efficient.

#1 EFF cells grew slightly faster in MEM.
#2 This effect was independent of oestrogens.
It can be seen from both of these items that reformulation is not just a process
of lexical selection, but also involves the rest of the clause which
accompanies the reformulating item. It seems that the meaning of
reformulations such as ‘this effect’ and ‘this result’ depends on the orientation
of the following clause. The semantics of a particular word are therefore
thrown into sharp relief by its context of use, but can also be seen to be stable
in rhetorical terms, at least in the context of a particular genre.
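Since the reading of ‘this effect’ depends on the clause that follows it, a concordance that keeps the right-hand context is a natural way to inspect such reformulations. The sketch below is illustrative only: the function kwic, its width parameter and the sample text (assembled from two of the examples above) are assumptions, not part of the original analysis.

```python
import re


def kwic(text, node, width=40):
    """Return (left, node, right) context triples for each hit of `node`.

    `width` is the number of characters of context kept on each side.
    """
    hits = []
    pattern = r"\b" + re.escape(node) + r"\b"
    for m in re.finditer(pattern, text, flags=re.IGNORECASE):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        hits.append((left, m.group(0), right))
    return hits


# Sample text built from examples quoted in this section.
sample = (
    "The increased liver weight was reversible. "
    "This effect could be the result of increased intracellular glycogens. "
    "EFF cells grew slightly faster in MEM. "
    "This effect was independent of oestrogens."
)

concordance = kwic(sample, "this effect")
for left, node, right in concordance:
    # Right-align the left context so that the node words line up.
    print(f"{left:>40} [{node}] {right}")
```

Sorting the resulting lines by their right-hand context would group the hits by the verb that follows the node, which is one way of checking whether a given reformulation leans towards an empirical or a research reading.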
The main focus of this book has been to examine the specific context of the
cancer research article. In previous sections, I proposed that grammatical
items are a useful starting point in the analysis of scientific texts. The
collocational behaviour of a selection of grammatical items was set out in the
preceding chapter in order to relate patterns of phraseology to the style and
rhetorical function of the different sections of the research article. I now
summarise the main findings of this study and examine some of the
implications and limitations of the analysis carried out in this book.
[Figure: diagram relating Phraseology, Lexico-grammar and Collocation to three levels of description: discoursal-rhetorical, semantic-syntactic and statistical-textual.]
The analysis of grammatical items in the preceding chapters of this book has
revealed a number of interesting properties of the scientific text. From the
point of view of genre analysis and English for Specific Purposes (ESP),
there is much to be said about the role of grammatical collocation and
scientific style. The data I set out above show how statistically significant
grammatical items can be identified using Wordlist (Scott 1993). This
provides a list of ‘salient’ words for each section of the research article (these
are summarised in section 4.3 below). Even this relatively simple,
mechanical step reveals that the distribution of grammatical items varies
Every text, from the discourses of technocracy and bureaucracy to the television
magazine and the blurb on the back of the cereal packet, is in some way affected
by the modes of meaning that evolved as the scaffolding for scientific
knowledge... In other words, the language of science has become the language of
literacy (Halliday and Martin 1993:11)
Titles
inhibition effects of chemotherapy on metastases (complex biochemical nominal)
Evaluation of prognostic factors in breast cancer (complex research nominal)
tobacco as a risk factor for lung cancer (nominal with goal)
The relation between clinical and histological outcome... (framework with conjunction)
pS2 is an independent factor of good prognosis in primary breast cancer (evaluation)
Abstracts
the mechanism of action of {compound Y} was shown to {+ empirical process} (complex
nominal expression of findings)
there was a significant increase in toxicity (quantitative report)
It is concluded that propagation did not increase (impersonal expression of quantitative
report)
subjects who receive active management (fixed embedded clause)
both normal and tumor cells (framework with co-ordinate conjunction).
Introductions
p53 gene resistance has been reported (fixed expression of report)
PIMO has received little attention (fixed expression of report)
studies have shown that... (fixed expression of report)
is an effective inhibitor (expression of evaluation)
(Compound X) is stable to the action of (Compound Y) (expression of empirical result)
use of agents such as dismutase (refocusing previous item)
it was also found that (reporting previous research)
Methods
aminids were censored from the organs (idiosyncratic expression of procedure)
was examined for external defects (clinical expression)
at each dose level (procedure)
(Compound Y) was then added dropwise (clinical expression)
was collected and concentrated (clinical sequence)
(data set) calculated from the bootstrap samples 24h after exposure to (fixed expression of
procedure)
Results
There was no significant change in radiosensitivity (qualitative report)
controls did not show RT activity (qualitative report)
mice had a decreased number of formations (quantitative report)
it appears that there are considerable differences (qualitative report)
after the infusion of (clinical framework)
no activity was observed when (X) was incubated (qualitative research report of clinical
process).
Discussion
data suggests that reactive oxygen would be important (modified report of results)
This result may be related to bleeding tendency (modified explanation)
It is interesting to note that (modified research report)
increasing data does not result in any further enhancement (qualitative report)
This evidence suggests that (including reformulation)
we have found that (report)
One of the more fundamental findings to emerge in our study is that the
phraseology in the corpus tends to correspond very consistently to a small set
of dominant semantic categories. In the Pharmaceutical Sciences Corpus
most lexical items were found to belong to four main process types:
RESEARCH, EMPIRICAL, CLINICAL and BIOCHEMICAL. These four
dimensions form a continuum in which they represent the relative
involvement of the author in the scientific activity (either in experimentation
or writing up). RESEARCH processes can be seen as the most overt
expressions of an author’s mental or behavioural involvement, and
BIOCHEMICAL processes are seen as the most distant from the author
(representing a chemical, material process with no overt external agent).
We have seen in chapter 2 that there is a body of linguistic theory that sees
such patterns as central to the way discourse is construed, or to reformulate
Halliday (1985), how we build and interpret the world through discourse. The
neo-Firthian view of language set out throughout this book sees the semantics
of the word as textually distributed and syntax as intimately linked with
lexical knowledge. In the specific context of cancer research articles,
knowledge of phraseology involves knowing which tense to use in expressing
biochemical and research processes and, to give a very specific example,
even a subconscious knowledge of duality in the discipline in the use of basic
co-ordinating conjunctions. Phraseological knowledge can be seen as a
central factor in the process of writing and reading in this specialist field. In
this regard, Francis (1993) has argued that such knowledge is a key
mechanism by which we move from ideas to linguistic form:
Given this view, that meanings acquire their own wordings, we can therefore
conceive of the broader system of phraseology as the set of linguistic forms
motivated by rhetorical aims and which further shape the discourse. It
follows that the collocational patterns we have identified are formulated in
previous text and must have a role in the processing of the text as a whole.
The intertextual function of collocation is therefore apparent. Clearly any
posit generic features of a text with much more certainty than earlier work.
There has recently been a considerable amount of research on lexical
collocation in technical genres (as in the work of Howarth 1996 and Pearson
1998) or on syndromes of inter-related grammatical categories in the
comparison of broader registers (Biber, Conrad and Reppen 1998). Only a
small number of studies have begun to examine the distribution of
grammatical collocations in a specialised genre, and none have established a
comparative analysis of collocation in sub-sections of a text. While the study
presented here shares similar methods with many computer-based studies of
authorship and information retrieval (for example Ager et al. 1979,
Moskovitch and Caplan 1979, Harris 1985, Phillips 1989, Ahmad et al. 1991
and Ide 1993), few of these have focused on grammatical collocation as a
means of ‘trawling’ or fishing out the phraseological properties of the text.
The aim of my analysis is therefore to balance those studies of genre which
concentrate on the macro-structure of texts (especially within ESP), and also
to provide an alternative contribution to mainstream work on the language of
science, which has tended to see collocations as an extension of terminology
rather than as a feature of text.
Recent studies of corpora of the general language (Sinclair 1991) have
begun to challenge the traditional way of seeing grammatical items. Whereas
lexical items vary in frequency and distribution across a variety of topics and
genres, high frequency grammatical items are assumed to remain the same.
Yet much of the evidence I have presented in this book suggests that this
picture is misleading. The interaction between a grammatical item and a
cluster of semantically-related lexical items suggests that grammatical words
should be seen not only as closed-class or high-frequency items, but also as
the fundamental elements of organisation in phraseological units. Many
grammatical items do of course lack propositional meaning when considered
in isolation, but it is important to consider the role of grammatical words
within longer phrases and their function in the grammatical reformulation of
the text. I have suggested above that grammatical items provide an efficient
way of arriving at a description of the most typical phraseology of the genre.
And we have also seen that grammatical items and grammatical
reformulation have an important role to play in Halliday’s theory of
grammatical metaphor, that is to say in the formation of textual meaning.
When considered from this perspective, it becomes clear that grammatical
items and their attendant phraseology have an important role to play in the
textual and interpersonal functions of the text.
We have seen that grammatical items are present in the most fundamental
phraseology of the Pharmaceutical Sciences Corpus, including such basic
expressions as we conclude that..., [compound X] has been shown to
which serves to enhance rather than detract from Halliday and Hasan’s
notion of textual cohesion.
It is worth admitting at this point that some features of phraseology which
do not involve isolated grammatical items may have escaped our statistical
trawling. It is fair to say that the reduced relative clauses mentioned in our
sample Discussion section above would be missed by a preliminary analysis
using Wordlist. Although reduced relatives involve a complex syntax and
consistent morphology, this is one aspect of lexical collocation which is
likely to be missed by our surface-based analysis. Generally speaking, there
is no a priori reason why lexical collocations should not form part of the
predominant phraseology of a textual genre. There is also no reason why
morphological features of the text cannot be taken into account. However,
the fact remains that grammatical collocation is involved in an immense
portion (if not a majority) of the typical kinds of expression to be found in a
particular text.
These observations suggest that although collocational patterns must be an
important first step in genre analysis, a closer reading of the text is also
required. Typical grammatical phraseology clearly needs to be compared
with other important lexical expressions. As we have seen in the sample text
above, non-typical formulations are likely to have significant roles to play in
the text. Another example from the corpus involves the unusual sentence
adverb ‘Forefront’ in the Introduction of Text JNCI: Forefront in this role is
tumor necrosis factor TNF... Since the text is written by a native-speaker, it
might be assumed that this is a rather marked expression, perhaps used to
signal that this sentence, above all others, is worthy of notice (in popularised
versions of this article TNF is hailed as a new discovery in our understanding
of cancer, as we see below). Such interesting and significant features of the
text should not be ignored, as they are also significant in terms of the text as a
whole. But it is also clear that the idiosyncratic nature of individual texts can
only be demonstrated by establishing in the first instance those elements
which are generic or salient in the broader corpus and ultimately in the
general language as a whole.
Such exceptions to the rule also indicate that while the global analysis of
collocation is essential in order to establish the major idiomatic
characteristics of the corpus, statistical collocations can only be considered to
be a limited area of style in which all the texts appear to overlap. Thus
generic collocations are important in the sense that they lay bare those areas
of the text which are truly individual or deviant. Such considerations have
long been recognised in the statistical analysis of authorship (in science
writing, Harris 1985), in forensic linguistics (Gibbons 1994) and studies on
information retrieval (Sparck-Jones 1971, Choueka et al. 1985, Frohman
As I suggested in the previous section, the research set out in this book leaves
a number of questions unanswered. It is not clear, for example, how
phraseology in science is determined and propagated within the discourse
community. There is no indication as yet whether the phraseological patterns
we have seen in a very specific genre are replicated in disciplines other than
cancer research. And there has been no space to discuss the historical
dimension of phraseology. For example, a collocational account would
certainly enhance the useful work carried out already by Biber and Finegan
(1988) and Atkinson (1992) on the history of the research article genre. I
have suggested above that the language of science can be defined in terms of
mechanisms of reformulation and phraseology, in particular by the
underlying tendency towards grammatical metaphor. But it must also be the
case that the research article creates its own new phraseology, and that one
aspect of successful research lies in the extent to which the new phraseology
has been able to penetrate (or be accepted by) the existing discourse and be
replicated as part of the established order. Studies such as Choueka et al.
(1985) and Busch (1992) argue that slight variation in the use of common
lexical collocations is an important indicator of novelty in technical writing.
This suggests a future research programme which explores the possibility that
language has a role to play in the natural selection of scientific ideas. I have
previously proposed a phraseological view of logogenesis (the evolution of
phrases within the text, Gledhill 1997), and would like to suggest that future
work be applied to ontogenetic development (the acquisition of phraseology
in the individual) and phylogenetic development (the evolution of phraseology
over time).
The reason for depletion of host tissues is not known, but is thought to arise
from differences in metabolism in the tumour-bearing state. (Biochemistry
Journal)
From 12 newspaper clippings in the local and national press, the first
sentence of the Independent suffices to show the processes of reformulation
which may take place:
The report displays several examples of phraseology which would not be out
of place in the Pharmaceutical Sciences Corpus: nominal compaction (the
use of ‘of’ and reduced relative clauses) as well as hedging with ‘may’. In
addition, there are a number of grammatical metaphors (underlined),
expressing impersonal ideas (treatment of..., new evidence that..., weight loss
associated with...). There is therefore a striking similarity between this
discourse and that of the original research articles. Since the journalists
themselves use press releases produced by the cancer research charities, this
is presumably reflected in the language of the popular report. Despite similar
phraseological features, the press reports are never quite the same as each
other, which leads to an interesting range of variable expressions. The
consequences of this are not yet clear. But it would seem to suggest that
stereotypical features of scientific writing such as nominalisation,
passivisation and general complexity of grammatical metaphor are just as
much a part of the popularised genre of science writing as the original
technical text. Science writing becomes less bound to an original text or
genre, and takes on a more abstract existence as a mode of meaning.
Beyond the corpus analysis carried out in this study, there is further work to
be done in genre and discourse analysis in general. Despite the immense
growth of specialised language corpora, there remains considerable scope for
the analysis of collocation in both descriptive and applied linguistics. Very
little work has been done for example on the comparative analysis of lexico-
grammars in languages other than English. While much work in corpus
linguistics has recently been devoted to language teaching (for example,
Johns and King 1993, Van Halteren 1994), Barnbrook (1996) points out that
corpora are a long way from being properly exploited as reference tools in
general linguistics. There is in contrast a strong tradition of corpus analysis in
literary and authorship studies (more recently including Potter 1991 and Ide
1993) and there have been interesting developments in forensic linguistics
and in the automatic detection of plagiarism (Coulthard 1994). But in each
case there remains much to be said about the comparative analysis of
collocation and phraseology. A large text corpus produced by second-
language learners of English has been examined extensively by Granger
(1996), and this research has shown that it is possible to examine
collocational differences between apprentice writers and professionals in
order to pin-point learners’ difficulties and design teaching materials. A
corpus of ‘apprenticeship’ texts may be a useful analytical tool not only in
monitoring the linguistic progress of apprentice writers, but also in analysing
how texts are edited and changed in their process of production, and how
coherence develops chronologically throughout the text (such work has been
taken on by Kouřilova, forthcoming). And in this respect, there are many
dimensions of the Pharmaceutical Sciences Corpus which remain unexplored,
for example the potential differences between single-author and team-
authored texts, between native-speaker and non-native texts, or between
papers on biology and those on structural chemistry. These fascinating
possibilities belong, of course, to another book.
Journals are alphabetically listed according to the Science Citation Index mnemonic code
(CCP, CL etc) and not according to title. The Journal’s rank in the SCI (1988) impact factor
table (compared with 1000 other journals) is listed as an approximate indicator of prestige.
The relative size of the journal as a percentage of the corpus is also noted. A Unix-based
word count has been used for this list; the total corpus comprises 150 papers and 519 201
running words. For each paper one of several field classifications is noted (generally: cancer
research / medicinal chemistry / pharmacology / structural chemistry). Only asterisked
authors (usually the lead writer) are noted in the case of multiple author papers.
BJC1:The influence of the schedule and the dose of gemcitabine on the anti- tumour efficacy
in experimental human cancer [Cancer Chemotherapy] Author: TB. Source: Brit J. Can
68/1 1993
BJC2:Regulation of cytochrome P450 gene expression in human colon and
breast tumour xenografts [Carcinogenesis] Author: MP, JR. Source: Brit J. Can 65/4
1992
BJC3: Allele loss from 5q21 (APCIMCC) and 18q21 (DCC) and DCC mRNA expression in
breast cancer [Carcinogenesis] Author: GH Source: Brit J. Can 65/5 1992
BJC4:Comparative radioimmunotherapy using intact or F(ab’)2 fragments of 13lI anti- CEA
antibody in a colonic xenograft model [Cancer Radioimmunology] Author: FS. Source:
Brit J. Can 65/6 1992
BJC5:Characterization of n-inedsine-resistant human sarcomas. [Cancer Chemotherapy]
Author: ML, OD,YD. Source: Brit J. Can 65/7 1992
BJC6:Strong HLA-DR expression in large bowel carcinomas is associated with
good prognosis [Etiology/Histopathology] Author: CV, NB, OP. Source: Brit J. Can 65/8
1992
BJC7:Response to adjuvant chemotherapy in primary breast cancer: no
correlation with expression of glutathione S-transferases [Cancer Chemotherapy] Author:
AL. Source: Brit J. Can 68/3 1993
CAR - Carcinogenesis.
[SCI 1988 Rank=326 Corpus %=8.475]
CAR1:Sensitivity to tumor promotion of SENCAR and C57BL/6J mice
correlates with oxidative events and DNA damage. [Tumour Promotor Carcinogenesis]
Author: NH. Car. 4/5 1993
CAR2: Ras protooncogene activation of methylene chloride. [Carcinogenesis]
Author: CK. Car. 5/5 1993
CAR3:Characterization of p53 mutations in methylene chloride-induced lung tumors
from B6C3F1 mice [Cancer Histology] Author: NE. Car. 1/6 1993
CAR4:Inhalation exposure to a hepatocarcinogenic concentration of methylene chloride does
not induce sustained replicative DNA synthesis in hepatocytes of female B6C3F1 mice
[Cancer Histopathology] Author: RS. Car. 2/6 1993
CCP11: Effect of toremifene on antipyrine elimination in the isolated perfused rat liver.
Author: TD 29/4 1992
CCP12:A limited sampling method for estimation of the carboplatin area under
the HNR curve. Cell-growth inhibition by and cytotoxicity of anthracyclines
in doxorubicin-sensitive and -resistant F4-6 cells. [Cancer Chemotherapy] Author: PI.
29/3 1992
CCP13:Pharmacokinetics of 10-ethyl-10-deaza- aminopterin, edatrexate, given weekly
for non- small-cell lung cancer [Cancer Chemotherapy] Author: KH. 29/2 1992
CCP14:Phase I clinical evaluation of [SP-4-3(R)]-[1,1-cyclobutanedicarboxylato(2-)] (2-
methyl-1,4-butanediamine-N,Nl) platinum in patients with metastatic solid
tumors [Cancer Chemotherapy] Author: VE. 29/1 1992
CCP15:Phase II study of high-dose ifosfamide in hepatocellular carcinoma
[Cancer Chemotherapy]
Author: RW. 28/6 1992
CCP16: Ifosfamide in advanced epidermoid head and neck cancer [Cancer Chemotherapy]
Author: SI. 28/5 1992
CR2: Monoclonal Antibodies to the Myogenic Regulatory Protein MyoD1: Epitope Mapping
and Diagnostic Utility. [Cancer Immunohistochemistry] Author: TW Vol 53/23 1992
CR3:Therapy with Unlabeled and 13lI-labeled Pan-B-Cell Monoclonal Antibodies
in Nude Mice Bearing Raji Burkitt’s Lymphoma Xenografts [Cancer
Immunohistochemistry] Author: ET Vol 53/24 1992
CR4: Inhibition of Cellular Proliferation by Peptide Analogues of Insulin-like Growth Factor
[Cancer Chemotherapy] Author: LK Vol 53/25 1992
CR5:Expression of the Endogenous 06-Methylguanine-DNA-
methyltransferase Protects Chinese Hamster Ovary Cells from Spontaneous G:C to A:T
Transitions1 [Cancer Carcinogenesis] Author: PS Vol 54/26 1993
CR6:Tumor-associated Mr 34,000 and Mr 32,000 Membrane Glycoproteins That Are Serine-
Phosphorylated Specifically in Bovine Leukemia Virus-induced Lymphosarcoma Cells’
[Cancer Carcinogenesis] Author:PR Vol 54/27 1993
CR7:Antitumor Effect of Interferon plus Cyclosporine A following
Chemotherapy for Disseminated Melanomal [Cancer Immunology] Author: SH Vol
54/28 1993
CR8: Tumorigenic Suppression of a Human Cutaneous Squamous Cell Carcinoma Cell Line
in the Nude Mouse Skin Graft Assay. [Cancer chemotherapy] Author: GU Vol 54/29
1993
CR9:A Retrovirus in Chinook Salmon (OncoYhynchus tshawytscha)
with Plasmacytoid Leukemia and Evidence for the Etiology of the Disease.
[Carcinogenesis] Author: AL Vol 52/17 1991
CR10: Expression and CpG Methylation of the Insulin-like Growth Factor II Gene in Human
Smooth Muscle Tumors [Carcinogenesis] Author: HT Vol 52/18 1991
CR11:Loss of Heterozygosity Involves Multiple Tumor Suppressor Genes in
Human Esophageal Cancers [Carcinogenesis] Author: YF Vol 54/19 1991
CR12:Induction of c-fos Gene Expression by Exposure to a Static Magnetic Field in HeLaS3
Cells1 [Carcinogenesis] Author: KH Vol 54/20 1991
TL: Synthesis of Antiviral Nucleosides from Crotonaldehyde. Part 3.1,2 Total Synthesis
of Didehydrodideoxythymidine (d4T) [Organic Chemistry] Author: JE, JG. Tetr Let Vol.
33/27 1992
T.P.S. - Trends in Pharmaceutical Sciences.
[SCI 1988 Rank= 94. Corpus %=0.231]
TPS: Newly identified factors that alter host metabolism in cancer cachexia [Cancer
Histopathology]
Author: MT. Source: JNCI Vol. 82/ 24
Titles
RANK WORD Titles Freq. (%) PSC Freq. (%) Chi2 Probability
2 Some items were mis-scanned in the original corpus. I have marked them [sic].
33 INEDSINE 1 1 28.0
34 MELANOMAL 1 1 28.0
35 MOIETIEY 1 1 28.0
36 SUBLCONES [sic] 1 1 28.0
37 ASSAY1 1 1 28.0
38 LYMPHOBLASTIC 1 1 28.0
39 AANALYSIS [sic] 1 1 28.0
40 PYRENEINDUCED 1 1 28.0
41 ARCHETAL 1 1 28.0
42 IMPORTANCEL 1 1 28.0
43 ANTLTUMOUR [sic] 1 1 28.0
44 ASPEYGILLUS 1 1 28.0
45 DISEASE1 1 1 28.0
46 DELOCALIZE 1 1 28.0
47 PREDICTABILITY 1 1 28.0
48 TRIAMINE 1 1 28.0
49 PREDICTABILITY 1 1 28.0
50 TRIAMINE 1 1 28.0
Abstracts
RANK WORD Abstracts Freq. (%) PSC Freq. (%) Chi2 Probability
1 ABSTRACT 32 (0.1%) 32 234.6
2 SUMMARY 39 (0.1%) 63 203.3 0.000
3 DOXORUBICIN 26 97 54.7 0.000
4 5FU 14 45 34.1
5 MYOD1 9 19 33.2
6 DOXO 16 59 33.0
7 KG 43 (0.1%) 303 30.4 0.000
8 SUGGEST 30 (0.1%) 177 30.3 0.000
9 HN9 5 5 29.9
10 H691VDS 5 6 26.4
11 HETEROZYGOSITY 13 50 24.8
12 ESTERS 12 44 24.2
13 MAMMARY 26 161 23.7 0.000
14 ACTIVE 33 (0.1%) 231 23.4 0.000
15 DOSES 29 193 22.8 0.000
16 STUDIED 26 164 22.8 0.000
17 RESISTANEE [sic] 4 4 22.4
18 SPIRAMYEIN 4 4 22.4
19 TUMOR 114 (0.4%) 1235 (0.2%) 21.8 0.000
20 INHIBITED 21 121 21.7 0.000
21 IOA 6 12 21.7
22 EXPRESSION 63 (0.2%) 582 (0.1%) 21.6 0.000
23 PATIENTS 63 (0.2%) 584 (0.1%) 21.3 0.000
24 CORRELATED 13 56 21.0
25 MHB 16 80 20.8 0.000
26 ACYLOXYBENZYL 9 29 20.7
27 ANTHRACENE 13 57 20.5
28 INDUCED 57 (0.2%) 521 (0.1%) 20.1 0.000
29 OA 4 5 19.2
30 NDENT 5 9 19.0
31 BUT 67 (0.2%) 663 (0.1%) 18.1 0.000
32 IMMORTALIZED 13 62 17.9
33 SHOWED 43 (0.1%) 375 17.4 0.000
34 INCREASED 43 (0.1%) 376 17.2 0.000
35 INTERVAL 12 56 16.9
36 PDL 4 6 16.7
37 GROWTH 69 (0.2%) 707 (0.1%) 16.4 0.000
38 DECREASED 23 161 15.9 0.000
39 CANCER 54 (0.2%) 522 (0.1%) 15.7 0.000
40 CONTRACTIONS 5 11 15.7
41 AZIDE 10 43 15.7
42 HAEMORRHAGE 8 29 15.5
43 THESE 119 (0.4%) 1399 (0.3%) 15.3 0.000
44 MANAGEMENT 17 104 15.3 0.000
45 ETHOXY 3 3 15.0
46 PROFICIENT 3 3 15.0
47 NONNAL 3 3 15.0
48 BENZOCAINE 12 61 14.7
49 PAA 4 7 14.6
50 TUMORS 82 (0.3%) 903 (0.2%) 14.4 0.000
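The Chi2 column can be illustrated with a short sketch. It assumes a plain Pearson chi-square on a 2x2 table (word vs. all other tokens, subcorpus vs. reference corpus) with no continuity correction, and it treats the two frequency columns as independent samples. The source does not state the exact formula or the corpus sizes, so the function name and the token totals below are hypothetical and the result is not expected to reproduce the tabulated values.

```python
def chi_square_2x2(freq_sub, size_sub, freq_ref, size_ref):
    """Pearson chi-square for one word in a 2x2 contingency table.

    freq_sub / size_sub: word count and total tokens in the subcorpus.
    freq_ref / size_ref: word count and total tokens in the reference corpus.
    No continuity correction is applied (an assumption of this sketch).
    """
    a, b = freq_sub, freq_ref                  # occurrences of the word
    c, d = size_sub - freq_sub, size_ref - freq_ref  # all other tokens
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        return 0.0
    return n * (a * d - b * c) ** 2 / denominator


# Hypothetical token totals: the real corpus sizes are not given in the table.
ABSTRACTS_TOKENS = 30_000
PSC_TOKENS = 500_000

# Frequencies for TUMOR taken from rank 19 of the Abstracts table.
score = chi_square_2x2(114, ABSTRACTS_TOKENS, 1235, PSC_TOKENS)
print(f"TUMOR: chi-square = {score:.1f} (illustrative only)")
```

A word whose relative frequency is identical in both samples scores zero; the score grows as the subcorpus frequency departs from what the reference corpus would predict.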
Introductions vs. PSC
RANK WORD Introductions Freq. (%) PSC Freq. (%) Chi2 Probability
1 ET 692 (1.2%) 1987 (0.4%) 652.5 0.000
2 AL 670 (1.1%) 1933 (0.4%) 626.3 0.000
3 BEEN 346 (0.6%) 966 (0.2%) 341.1 0.000
4 HAS 283 (0.5%) 741 (0.1%) 310.3 0.000
5 HAVE 359 (0.6%) 1127 (0.2%) 285.4 0.000
6 INTRODUCTION 83 (0.1%) 97 234.8 0.000
7 IS 643 (1.1%) 3169 (0.6%) 156.3 0.000
8 RECENTLY 52 102 84.3 0.000
9 STUDIES 135 (0.2%) 494 76.6 0.000
10 CANCER 140 (0.2%) 522 (0.1%) 76.0 0.000
11 SUCH 113 (0.2%) 388 73.7 0.000
12 GENES 82 (0.1%) 242 71.9 0.000
13 EFFECTS 112 (0.2%) 414 61.8 0.000
14 VARIETY 37 72 59.9 0.000
15 CAN 120 (0.2%) 468 58.1 0.000
16 ROLE 56 152 56.4 0.000
17 REPORT 37 79 53.0 0.000
18 IT 207 (0.3%) 1006 (0.2%) 52.2 0.000
19 WE 200 (0.3%) 972 (0.2%) 50.4 0.000
20 SUPPRESSOR 39 92 48.5 0.000
21 HUMAN 167 (0.3%) 784 (0.2%) 47.4 0.000
22 IMPORTANT 55 170 43.7 0.000
23 MANY 50 150 41.9 0.000
24 SYNTHESIS 61 (0.1%) 204 41.5 0.000
25 OF 2874 (4.8%) 21309 (4.3%) 41.4 0.000
26 CHIRAL 26 51 41.0 0.000
27 ARE 332 (0.6%) 1920 (0.4%) 39.7 0.000
28 BE 317 (0.5%) 1825 (0.4%) 38.8 0.000
29 SEVERAL 75 (0.1%) 284 38.7 0.000
30 REPORTED 95 (0.2%) 395 38.6 0.000
31 CLINICAL 48 151 36.7 0.000
32 TO 1233 (2.1%) 8631 (1.7%) 36.6 0.000
33 COMPOUNDS 76 (0.1%) 296 36.6 0.000
34 MECHANISMS 45 138 36.1 0.000
35 ITS 88 (0.1%) 365 36.0 0.000
36 OFTEN 29 68 35.9 0.000
37 SYSTEMS 37 104 34.5 0.000
38 CANCERS 36 100 34.3 0.000
39 SOME 77 (0.1%) 310 34.0 0.000
Methods vs. PSC
RANK WORD Methods Freq. (%) PSC Freq. (%) Chi2 Probability
1 WERE 2795 (2.0%) 5162 (1.0%) 876.5 0.000
2 H 1281 (0.9%) 1961 (0.4%) 620.2 0.000
3 WAS 2877 (2.1%) 6146 (1.2%) 576.7 0.000
4 ML 850 (0.6%) 1097 (0.2%) 562.8 0.000
5 C 1303 (0.9%) 2303 (0.5%) 454.8 0.000
6 MIN 506 (0.4%) 725 (0.1%) 277.5 0.000
7 MM 401 (0.3%) 540 (0.1%) 245.9 0.000
8 MMOL 282 (0.2%) 302 245.4 0.000
9 ADDED 295 (0.2%) 340 231.6 0.000
10 M 582 (0.4%) 973 (0.2%) 231.2 0.000
11 X 597 (0.4%) 1045 (0.2%) 212.4 0.000
12 G 520 (0.4%) 878 (0.2%) 201.7 0.000
13 D 487 (0.4%) 821 (0.2%) 189.5 0.000
14 SOLUTION 304 (0.2%) 428 171.7 0.000
15 HZ 240 (0.2%) 294 171.5 0.000
16 S 620 (0.5%) 1203 (0.2%) 166.9 0.000
17 WASHED 179 (0.1%) 190 157.0 0.000
18 THEN 282 (0.2%) 420 142.9 0.000
19 BUFFER 232 (0.2%) 313 141.2 0.000
20 AT 1324 (1.0%) 3287 (0.7%) 140.3 0.000
21 PH 304 (0.2%) 483 134.8 0.000
22 USING 412 (0.3%) 752 (0.2%) 131.2 0.000
23 PBS 143 (0.1%) 153 123.8 0.000
24 INCUBATED 184 (0.1%) 237 120.9 0.000
25 FOR 1919 (1.4%) 5224 (1.0%) 120.1 0.000
26 DESCRIBED 269 (0.2%) 436 114.0 0.000
27 WATER 209 (0.2%) 305 109.9 0.000
28 PERFORMED 181 (0.1%) 250 105.3 0.000
29 SODIUM 142 (0.1%) 173 101.7 0.000
30 EACH 323 (0.2%) 595 (0.1%) 100.2 0.000
31 CONTAINING 229 (0.2%) 370 97.6 0.000
32 V 288 (0.2%) 515 (0.1%) 96.5 0.000
33 I 828 (0.6%) 2029 (0.4%) 93.1 0.000
34 USED 391 (0.3%) 790 (0.2%) 92.7 0.000
35 SIGMA 100 102 91.7 0.000
36 CH 100 106 87.2 0.000
37 COLUMN 152 (0.1%) 212 86.7 0.000
38 DRIED 102 113 83.7 0.000
39 MEDIUM 221 (0.2%) 376 83.6 0.000
40 DISSOLVED 90 92 82.1 0.000
41 TEMPERATURE 145 (0.1%) 204 81.3 0.000
42 MIXTURE 137 188 80.4 0.000
43 MHZ 92 101 76.3 0.000
Discussions vs. PSC
RANK WORD Discussions Freq. (%) PSC Freq. (%) Chi2 Probability
1 THAT 1381 (1.2%) 3357 (0.7%) 341.8 0.000
2 BE 788 (0.7%) 1825 (0.4%) 225.6 0.000
3 MAY 383 (0.3%) 658 (0.1%) 223.2 0.000
4 IS 1167 (1.0%) 3169 (0.6%) 193.1 0.000
7 OUR 222 (0.2%) 381 129.0 0.000
9 IN 3991 (3.5%) 14349 (2.9%) 116.0 0.000
11 NOT 662 (0.6%) 1798 (0.4%) 108.9 0.000
12 THIS 704 (0.6%) 1997 (0.4%) 96.2 0.000
13 WE 395 (0.3%) 972 (0.2%) 92.9 0.000
14 HAVE 442 (0.4%) 1127 (0.2%) 92.1 0.000
IX. References
Austin J.L. 1962 * 1975 (eds. Urmson J.O. and Sbisà M). How to Do Things with Words
London: Oxford University Press
Baker D.B., Horiszy J.W. and Metanomski W.V. 1980. ‘History of Abstracting at Chemical
Abstracts Service.’ in Journal of Chemical Information and Computer Science Vol. 20:
193-201
Baker M., Francis G. and Tognini-Bonelli E. (eds.) 1993. Text and Technology Amsterdam:
John Benjamins
Barber C.L. 1962. ‘Some Measurable Characteristics of Modern Scientific Prose.’ in
Almqvist and Wiksell (eds.) Contributions to English Syntax and Philology: 21-43
Barnbrook G. 1996. Language and Computers Edinburgh University Press: Edinburgh
Barthes R. 1966. Mythologies. Paris: Seuil
Basili R., Pazienza M.T. and Velardi P. 1992. ‘A Shallow Syntactic Analyser to Extract
Word Associations from Corpora.’ in Literary and Linguistic Computing Vol.7/2: 113-
123
Banks D. 1994a ‘Clause Organization in the Scientific Journal Article’. Alsed-Lsp
Newsletter Vol. 17/2: 4-16.
Banks D. 1994b. Writ in Water: Aspects of the Scientific Journal Article. E.R.L.A.:
Université De Bretagne.
Banks D. 1997. ‘The Things We Make’. In Language Sciences. 19/4: 303-308.
Banks D. 1998. ‘Vague Quantification in the Scientific Journal Article.’ in Anglais de
Spécialité. GERAS: Presses de l’Université Victor-Segalen, Bordeaux No. 19/22: 17-27.
Bauer L. 1979. ‘On the Need for Pragmatics in the Study of Nominal Compounding.’ in
Journal of Pragmatics. 3/1: 45-50.
Béjoint H. 1988. ‘Scientific and Technical Words in General Dictionaries.’ in International
Journal of Lexicography Vol. 1/4: 354-368
Benson M. 1989. ‘The Collocational Dictionary and the Advanced Learner.’ in M.L. Tickoo
(ed.) Learner’s Dictionaries: State of the Art Singapore: SEAMO Regional Language
Centre: 84-93
Benson M., Benson E. and Ilson R. 1986. The Lexicographic Description of English
London: John Benjamins
Bernier C.L. 1972. ‘Terse Literatures 1: Terse Conclusions.’ in Journal of the American
Society for Information Science Vol. 21: 316-319
Bernier C.L. 1985. ‘Abstracts and Abstracting.’ in DYM: 423-444
Berry-Rogghe G. 1970. ‘Collocations: Their Computation and Semantic Significance.’
Unpublished Ph.D Thesis, UMIST, Manchester
Biber D. 1986. Variation across Speech and Writing Cambridge: Cambridge University Press
Biber D. 1989. ‘A Typology of English Texts.’ in Linguistics 27: 3-43
Biber D. 1992a. ‘On the Complexity of Discourse Complexity: A Multidimensional
Analysis.’ in Discourse Processes Vol. 15: 133-163
Biber D. 1992b. ‘Using Computer-Based Text Corpora to Analyze the Referential Strategies
of Spoken and Written Texts.’ in J. Svartvik (ed.) 1992: 215-252
Biber D. 1993. ‘The Multidimensional Approach to Linguistic Analyses of Genre Variation:
An Overview of Methodology and Findings.’ in Computers and the Humanities Vol. 26:
331-345.
Biber D., Conrad S. and Reppen R. 1994. ‘Corpus-Based Approaches to Issues in Applied
Linguistics.’ in Applied Linguistics Vol. 15/2: 169-189
Biber D., Conrad S., and Reppen R. 1996. ‘Corpus-Based Investigations of Language Use’.
In Annual Review of Applied Linguistics. 16: 115-136.
Biber D., Conrad S. and Reppen R. 1998. Corpus Linguistics: Investigating Language Structure
and Use. Cambridge: Cambridge University Press.
Biber D. and Finegan E. 1988. ‘Drift in Three English Genres from the 18th to the 20th
Centuries: A Metadiscoursal Approach.’ in M.Kytö et al. (eds.): 83-99
Biber D. and Finegan E. (eds.) 1994. Sociolinguistic Perspectives on Register Oxford:
Oxford University Press
Blackwell S. 1987. ‘Problems in the Automatic Parsing of Idioms.’ in R. Garside et al. (eds)
Syntax Versus Orthography: 110-119
Bloor T. and Bloor M. 1985. ‘Language for Specific Purposes: Practice and Theory’. CLCS
Occasional Papers: Trinity College, Dublin.
Borko H. and Chatman S. 1963. ‘Criteria for Acceptable Abstracts: A Survey of Abstractors’
Instructions.’ in American Documentation Vol. 14: 175-184
Boyer E. 1994. The Academic Profession: An International Perspective. California: Princeton
Press
Brekke M. 1991. ‘Automatic Parsing Meets the Wall.’ in S. Johansson and A.B. Stenström
(eds.): 83-103
Brett P. 1994. ‘A Genre Analysis of the Results Sections of Sociology Articles.’ in English
for Specific Purposes Journal Vol.13/1: 47-59
Briscoe T. 1990 ‘English Noun-Phrases Are Regular: A Reply to Professor Sampson.’ in J.
Aarts and W. Meijs 1990: 45-60
Britt M.A., Perfetti C.A. and Garrod S. 1992. ‘Parsing in Discourse: Context Effects and
Their Limits.’ in Journal of Memory and Language Vol.31: 293-314
Burnard L. 1992. ‘Tools and Techniques for Computer-Aided Text Processing.’ in C. Butler
(ed.): 1-28
Busch G. 1992. ‘Search and Retrieval.’ in BYTE, June: 274-282. New York: Bix Publishers
Butler C. 1985a. Computers in Linguistics Oxford: Basil Blackwell
Butler C. 1985b. Statistics in Linguistics Oxford: Basil Blackwell
Butler C. (ed.) 1992. Computers and Written Texts Oxford: Basil Blackwell
Butler C. 1993. ‘Between Grammar and Lexis: Collocational Frameworks in Spanish’
Unpublished Paper Presented at the 5th International Systemic Workshop on Corpus-
Based Studies, Universidad Complutense De Madrid, 26-29 July 1993
Buxton A.B. and Meadows A.J. 1978. ‘Categorisation of Information in Experimental
Papers and Their Author Abstracts.’ in Journal of Research Communication Studies Vol.
1: 161-182
Cahn, R. S. 1979. Introduction to Chemical Nomenclature. New York Press.
Carter R. 1998. Vocabulary. Applied Linguistic Perspectives. (2nd Edition). London:
Routledge.
Cavalli-Sforza L. and Felman M. 1989. Cultural Transmission and Evolution Princeton New
Jersey: Princeton University Press
Chafe W. 1992. ‘The Importance of Corpus Linguistics to Understanding the Nature of
Language.’ in Svartvik 1992a: 79-97
Chesterman A. 1997. Memes of Translation. The Spread of Ideas in Translation Theory.
Amsterdam: John Benjamins.
Choueka Y., Klein T. and Neuwitch E. 1983. ‘Automatic Retrieval of Idiomatic and
Collocational Expressions in a Large Corpus.’ in Journal for Literary and Linguistic
Computing Vol. 4: 34-38
Church K. W. and Hanks P. 1989. ‘Word Association Norms, Mutual Information and
Lexicography.’ in Computational Linguistics 16/1: 22-29
Church K. W. and Mercer R.L. 1993. ‘Introduction to the Special Issue on Computational
Linguistics Using Large Corpora.’ in Computational Linguistics Vol. 19/1: 1-24
Clarke D. F. and Nation I. S. P. 1980. ‘Guessing the Meanings of Words from Context:
Strategy and Techniques’. In System 8/3: 211-220.
Clear J. 1987. ‘Overview of the Role of Computing in Cobuild.’ in J.McH. Sinclair (ed.)
1987: 41-61
Clear J. 1993. ‘From Firth Principles. Collocational Tools for the Study of Collocation.’ in M.
Baker et al. (eds.) 1993: 271-292
Cleveland D.B. and Cleveland A.D. 1983. Introduction to Indexing and Abstracting
Princeton Colorado Libraries Unlimited
Collins P. and Peters P. 1988. ‘The Australian Corpus Project.’ in M. Kytö et al. (eds.): 103-
120
Collot M. 1991. ‘Electronic Language. A Pilot Study of a New Variety of English.’ in
Computer Corpora des Englischen in Forschung, Lehre und Anwendungen (CCE
Newsletter, Berlin) Vol. 5 (1/2): 13-31
Coulmas F. 1979. ‘On the Sociolinguistic Relevance of Routine Formulae’. In Journal of
Pragmatics. 2/3: 223-235.
Coulthard M. (ed.) 1994. Advances in Written Text Analysis London: Routledge.
Cowie A.P. (ed). 1998 Phraseology: Theory, Analysis, and Applications. Oxford: Oxford
University Press.
Cremmins E.T. 1982. The Art of Abstracting Philadelphia ISI Press
Cruse D.A. 1986. Lexical Semantics Cambridge University Press
De Beaugrande R. 1991. Linguistic Theory. The Discourse of Fundamental Works. Longman:
London.
De Beaugrande R. and Dressler W. 1981. Introduction to Text Linguistics London:
Longman
DeCarrico J. and Nattinger J. 1988. ‘Lexical Phrases for the Comprehension of Academic
Lectures.’ in ESP Journal Vol. 7/2: 91-101
Derewianka B. 1994. ‘Grammatical Metaphor and Fuzzy Boundaries’. Unpublished MS,
Presented at the 21st International Systemic Functional Congress, 1-5 August 1994.
Diodato V. 1982. ‘The Occurrence of Title Words in Parts of Research Papers: Variations
Among Disciplines.’ in Journal of Documentation Vol. 38/3: 192-206
Dobrovol’skij, D. 1992. ‘Phraseological Universals: Theoretical and Applied Aspects’. In M.
Kefer (ed.) Meaning and Grammar: Cross-Linguistic Perspectives. Berlin.
Dopkins S. and Morris R.K. 1992. ‘Lexical Ambiguity and Eye Fixation in Reading: A Test
of Competing Models of Lexical Autonomy Resolution.’ in Journal of Memory and
Language Vol.31: 461-476
Dronberger G.B. and Kronitz G.T. 1975 ‘Abstract Readability As A Factor in Information
Systems.’ in Journal of the American Society for Information Science Vol. 26: 108-111
Drury H. 1991. ‘The Use of Systemic Linguistics to Describe Student Summaries at
University Level.’ in E. Ventola (ed.) 1991: 431-456
Dubois B. L. 1981. ‘The Construction of Noun Phrases in Biomedical Journal Articles.’ in J.
Hoedt et al. (eds) Pragmatics and LSP Copenhagen: 49-67
Dubois B. L. 1997. The Biomedical Discussion Section in Context. London: Ablex
Publishing Corporation.
Endres-Niggemeyer B. 1985. ‘Referierregeln Und Referate- Abstracting Als
Regelgesteuerter Textverarbeitungsprozeß.’ in Nachrichten Für Dokumentaristen Vol.
36/1: 38-50
Enkvist N. 1964. ‘On Defining Style: An Essay in Applied Linguistics.’ in J. Spencer (ed.)
Linguistics and Style London: Oxford University Press.
Enkvist N. 1989. ‘From Text to Interpretability: A Contribution to the Discussion of Basic
Terms in Text Linguistics.’ in W. Heydrich et al. (eds.) 1989: 369-382
Escarpit R. 1976. Théorie Générale de l’Information et de la Communication Paris Hachette
Everaert M., Van Der Linden E., Schenk A., and Schreuder R. (eds.) 1995. Idioms:
Structural and Psychological Perspectives. Hillsdale, NJ: Lawrence Erlbaum Associates.
Fernando C. 1996. Idioms and Idiomaticity. Oxford: Oxford University Press.
Fidel R. 1986. ‘Writing Abstracts for Free-Text Searching.’ in Journal of Documentation
Vol. 42/1: 11-21
Fillmore C.J. 1992. ‘Corpus Linguistics, or Computer-Aided Armchair Linguistics.’ in
Svartvik (ed) 1992a: 35-60
Fillmore C.J. and Atkins S. 1994. ‘Starting Where the Dictionaries Stop: the Challenge of
Corpus Lexicography.’ in S. Atkins and Zampolli (eds.) Computational Approaches to
the Lexicon Oxford: Oxford University Press
Fillmore C.J., Kay P. and O’Connor M.C. 1988. ‘Regularity and Idiomacy in Grammatical
Constructions.’ in Language Vol. 64: 501-538
Firth J.R. 1935. ‘The Techniques of Semantics.’ in Transactions of the Philological Society.
36-72.
Firth J.R. 1957. Papers in Linguistics 1934-1951. Oxford: Oxford University Press.
Fischer R. 1998. Lexical Change in Present-Day English. Tübingen: Gunter Narr Verlag.
Fløttum K. 1985. ‘Methodological Problems in the Analysis of Student Summaries.’ in Text
Vol. 5/4: 291-308
Fontenelle T. 1994. ‘What on Earth are Collocations?’. In English Today No. 40 Vol. 10/4:
42-48.
Fox G. 1993. ‘A Comparison of ‘Policespeak’ and ‘Normalspeak’: A Preliminary Study.’ in
J. McH. Sinclair et al. (eds.) 1993: 184-195
Foucault M. 1972. The Archaeology of Knowledge London: Tavistock.
Francis G. 1985. ‘Anaphoric Nouns.’ Discourse Analysis Monograph No. 11: Birmingham:
Birmingham University English Language Research
Francis G. 1993. ‘A Corpus-Driven Approach to Grammar.’ in Baker et al. (eds.) 1993: 137-
156
Francis G. and Kramer-Dahl A. 1991. ‘From Clinical Report to Clinical Story: Two Ways of
Writing About A Medical Case.’ in E. Ventola (ed.) 1991: 339-368
Francis G. and Sinclair J. 1994. ‘I Bet He Drinks Carling Black Label. A Riposte to
Owen on Corpus Grammar.’ in Applied Linguistics Vol.15/2: 188-200
Fuller G. ‘Cultivating Science: Negotiating Discourse in the Popular Texts of Stephen Jay
Gould’. In J. R. Martin and R. Veel (eds). 1998. Reading Science: Critical and Functional
Perspectives on Discourses of Science. London: Routledge. 35-62.
Gadamer H.G. 1976. ‘On the Scope and Function of Hermeneutical Reflection.’ in D.E.
Linge (ed. and Trans.) Philosophical Hermeneutics University of California Press.
Gerbert M. 1970. Besonderheiten der Syntax in der Technischen Fachsprache des
Englischen Berlin: Halle.
Gerson S. 1989. ‘From ...to as an Intensifying Collocation.’ in English Studies Vol. 70: 360-
371
Gibson T.R. 1992. ‘Towards a Discourse Theory of Abstracts and Abstracting.’ Unpublished
Ph.D. Thesis, English Language Department: Nottingham
Gibbons J. 1994. Language and the Law. London: Addison Wesley.
Gläser, R. 1989. ‘Gibt Es Eine Fachsprachenphraseologie?’, in Fachsprache - Fremdsprache
- Muttersprache, VIIth International Conference ‘Angewandte Sprachwissenschaft Und
Fachsprachliche Ausbildung’: Technische Universität Dresden
Gläser R. 1991. ‘The LSP Genre Abstract - Revisited.’ in ALSED - Newsletter Vol. 13/4: 3-
11
Gläser R. 1992. ‘A Multi-Level Model for a Typology of LSP Genres.’ in Fachsprache Vol.
15/1-2: 18-26
Gläser R. 1998. ‘The Stylistic Potential of Phraseological Units in the Light of Genre
Analysis’. In A. P. Cowie (ed.): 125-143.
Gledhill C. 1995a. ‘Collocation and Genre Analysis.’ In Zeitschrift für Anglistik und
Amerikanistik Vol. 1:11-36
Gledhill C. 1995b. ‘Scientific Innovation and the Phraseology of Rhetoric. Posture,
Reformulation and Collocation in Cancer Research Articles.’ PhD thesis, University of
Aston.
Gledhill C. 1996. ‘Science as a Collocation. Phraseology in Cancer Research Articles’. in
Botley S., Glass J, McEnery T and Wilson A (eds.) Proceedings of Teaching and
Language Corpora 1996. Lancaster. UCREL Technical Papers Volume 9: 108-126.
Gledhill C. 1997. ‘Les collocations et la construction du savoir scientifique.’ in Martin J.
Anglais de Spécialité (ASp). No. 15-18 :85-104.
Gledhill C. 1999. ‘Towards a phraseology of English and French’. In C. Beedham (ed.)
Language and Parole in Synchronic and Diachronic Perspective. Proceedings of
Societas Linguistica Europaea XXXI. Oxford: Pergamon: 221-37.
Gledhill C. (forthcoming) ‘The phraseology of rhetoric, collocations and discourse in cancer
research abstracts’ in C. Barron and N. Bruce (eds.) ‘Knowledge and Discourse’
Proceedings of the International Multidisciplinary Conference. Hong Kong: 18-21 June
1996. University of Hong Kong, Hong Kong. April 1999.
Gnutzmann L. and Oldenburg H. 1992. ‘Contrastive Text Linguistics in LSP Research:
Theoretical Considerations and Some Preliminary Findings.’ in Schneider (ed.): 103-136
Godley T. 1993. ‘Terminological Principles and Methods in the Subject Field of Chemistry’
in B. Sonneveld and Loening (eds.): 141-163
Godman A. and Payne E.M.F. 1981 ‘A Taxonomic Approach to the Lexis of Science.’ in
Selinker et al. (eds.) 23-39
Gopnik M. 1972. Linguistic Structures in Scientific Text Den Haag: Mouton
Grätz N 1985. ‘Teaching EFL Students to Extract Structural Information from Abstracts.’ in
J.M. Kline and A.K. Pugh (eds.) Reading for Professional Purposes: Methods and
Materials in Teaching Languages: 225-335
Granger S. 1998. ‘Prefabricated Patterns in Advanced EFL Writing: Collocations and
Formulae’. In Cowie A. (ed) 1998: 1-21.
Grice H.P. 1975. ‘Logic and Conversation’ in P. Cole and J.Morgan (eds.) Syntax and
Semantics III New York: Academic Press
Guba E.G. and Lincoln Y.S. 1982. ‘Epistemological and Methodological Bases of
Naturalistic Inquiry’ in Educational Communication and Technology Journal Vol. 30/4:
233-252
Gunawardena C.N. 1989. ‘The Present Perfect in the Rhetorical Divisions of Biology and
Biochemistry Journal Articles.’ in English for Specific Purposes Vol. 8/3: 265-273.
Halliday M.A.K. 1961. Categories of the Theory of Grammar. Department of English
Language and General Linguistics Monographs. (Pp241-292). Edinburgh: Edinburgh
University Press.
Halliday M.A.K. 1966. ‘Lexis As A Linguistic Level’ in Bazell et al. (eds.) 1966 in Memory
of J.R.Firth London: Longman
Halliday M.A.K. 1976. ‘Functions and Universals of Language.’ in G. Kress (ed.) 1976
Halliday: System and Function in Language London: Oxford University Press
Halliday M.A.K. 1977. ‘Language Structure and Language Function.’ in J.Lyons (ed.) 1977
New Horizons in Linguistics Harmondsworth: Penguin Books
Halliday M.A.K. 1985 Introduction to Functional Grammar London: Edward Arnold
Halliday M.A.K. 1988. ‘On the Language of Physical Science.’ In M. Ghadessy 1988: 162-
177
Koch C. 1991. ‘On the Benefits of Interrelating Computer Science and the Humanities: the
Case of Metaphor.’ in Computers and the Humanities Vol. 25: 289-295
Kouřilova M. (Forthcoming) ‘Interactive Functions of Language in Peer Reviews of Medical
Papers Written By Non-Native Speakers of English’ Unpublished MS.
Kretzenbacher H.L. 1990. Rekapitulation: Textstrategien der Zusammenfassung von
Wissenschaftlichen Fachtexten Tübingen: Gunter Narr Verlag
Krishnamurthy R. 1987. ‘The Process of Compilation.’ in J.McH. Sinclair (ed.) 1987: 62-85
Kučera H. and Francis W. N. 1967. Computational Analysis of Present Day American
English Providence: Brown University Press
Lackstrom S., Selinker L. and Trimble L. 1972. ‘Grammar and Technical English.’ in
English Teaching Forum Sept-Oct.: 3-14
Lackstrom S., Selinker L. and Trimble L. (eds.) 1973. ‘Technical Principles and
Grammatical Choice.’ in TESOL Quarterly Vol. 7: 127-136
Latour B. and Woolgar S. 1986. Inside the Laboratory. The Construction of Scientific Facts
New York: Garland Press
Lakoff G. 1987. Women, Fire and Dangerous Things. What Categories Reveal about the
Mind. University of Chicago Press: California
Leech G. 1991. ‘The State of the Art in Corpus Linguistics.’ in K. Aijmer and B. Altenberg
1991: 8-29
Leech G. 1992. ‘Corpora and Theories of Linguistic Performance.’ in J. Svartvik (ed) 1992a:
105-125
Leech G. and Fligelstone S. 1992. ‘Computers and Corpus Linguistics.’ in C. Butler (ed.):
115-140
Lehrberger J. 1982. ‘Automatic Translation and the Concept of Sublanguage.’ in R.
Kittredge and J. Lehrberger (eds.) Sublanguage: Studies of Language in Restricted
Semantic Domains, Berlin: Walter De Gruyter: Chapter 3.
Lemke J.L. 1991. ‘Text Production and Dynamic Text Semantics.’ in E. Ventola (ed.) 1991:
23-37
Lemke J. L. 1998 ‘Multiplying Meaning. Visual and Verbal Semiotics in Scientific Text’. In
J. R. Martin, R. Veel (eds) 1998 Reading Science: Critical and Functional Perspectives
on Discourses of Science. London: Routledge. 87-113.
Lévi-Strauss C. 1962 La Pensée Sauvage Paris: Plon
Liddy E., Bonzi S., Katzer J., and Oddy E. 1987. ‘A Study of Discourse Anaphora in
Scientific Abstracts.’ in Journal of the American Society for Information Science Vol.
38: 255-261
Linstromberg S. 1991. ‘Metaphor and ESP: A Ghost in the Machine?’ in English for Specific
Purposes Vol. 10/3: 207-225
Ljung M. 1991. ‘Swedish TEFL Meets Reality.’ in S. Johansson and B. Stenström (eds.):
245-256
Love A. 1993. ‘Lexico-Semantic Features of Geology Textbooks’. In English for Specific
Purposes Vol.12/3: 197-218
Louw B. 1993. ‘Irony in the Text or Insincerity in the Writer? The Diagnostic Potential of
Semantic Prosodies.’ in Baker et al. (eds.) 1993: 157-176
Luhn H.P. 1968. ‘Key-Word-in-Context Information Index for Technical Literature.’ in C.K.
Schultz (ed.) H.P.Luhn: Pioneer of Information Sciences: Selected Works New York:
Spartan
Lundquist L. 1992. ‘Some Considerations on the Relations Between Text Linguistics and the
Study of Text for Specific Purposes.’ in Schröder (ed.): 231-243
Lundquist L. 1989. ‘Coherence in Scientific Text.’ in W. Heydrich et al. (eds.): 122-149
Luzon-Marco, M.J. 1999. ‘Corpus Analysis and Pragmatics: A Study of the Negative
Structure Fail to’ in ITL-Review of Applied Linguistics. 123/124: 37-55
Lyne A.A. 1975 ‘A Word-Frequency Count of French Business Correspondence.’ in IRAL
Vol. 13/2: 95-110
Lyne A. A. 1983. ‘Word Frequency Counts: Their Particular Reference to the Description of
Languages for Special Purposes and A Technique for Enhancing Their Usefulness’. In
Nottingham Linguistic Circular. 12/2: 130-140.
McCarthy M. 1984. ‘A New Look at Vocabulary in EFL’. In Applied Linguistics 5/1: 12-21.
McCarthy M. and Carter R. 1994. Language As Discourse. Perspectives for Language
Teaching New York: Longman
McEnery T. and Wilson A. 1996. Corpus Linguistics Edinburgh University Press:
Edinburgh
McKinlay J. 1983. ‘An Analysis of the Discussion Section of Medical Journal Articles.’
Unpublished MSc Thesis. ESP Collection, Language Studies Unit, Aston University
McKinney M. 1991. ‘Experimenting on and Experimenting with: Polywater and
Experimental Realism.’ British Journal of the Philosophy of Science Vol. 42: 295-307
Makkai A. 1992. ‘The Challenge of the Virtual Dictionary and the Future of Linguistics.’ in
International Journal of Lexicography Vol. 5/4: 252-269
Malcolm L. 1987. ‘What Rules Govern Tense Usage in Scientific Articles?’ in English for
Specific Purposes Journal Vol. 6/1: 31-43
Malinowski B. 1923. ‘The Problem of Meaning in Primitive Languages.’ Supplement to
C.K. Ogden and I.A. Richards (eds.) The Meaning of Meaning New York: Harcourt Brace
Jovanovich
Martin J.R. 1989. Ideation: the Company Words Keep Cambridge: Cambridge University
Press
Martin J.R. 1991. ‘Nominalization in Science and Humanities: Distilling Knowledge and
Scaffolding Text.’ in E. Ventola (ed.) 1991: 307-337
Master P. 1987. ‘Generic the in Scientific American’. In English for Specific Purposes Vol.
6/3: 165-186
Master P. 1991. ‘Active Verbs with Inanimate Subjects in Scientific Prose.’ in English for
Specific Purposes Vol. 10/1: 15-33
Mauranen A. 1993. ‘Theme and Prospection in Written Discourse.’ Baker et al. (eds.) 1993:
95-114
Mel’čuk I. 1995. ‘Phrasemes in Language and Phrasemes in Linguistics’. In Everaert et al.
(eds.): 167-232.
Mel’čuk I. 1998. ‘Collocations and Lexical Functions’ in Cowie (ed).: 23-54.
Meijs W. (ed.). 1987. Corpus Linguistics and Beyond Amsterdam: Rodopi
Meijs W. 1992. ‘Computers and Dictionaries’ in C. Butler (ed.): 141-165
Meyer P.G. 1988. ‘Statistical Text Analysis of Abstracts: A Pilot Study on Cohesion and
Schematicity.’ in Computer Corpora des Englischen Vol. 3: 17-40
Miall D.S. 1992. ‘Estimating Changes in Collocations of Key Words across A Large Text: A
Case Study of Coleridge’s Notebooks.’ in Computers and the Humanities Vol. 26: 1-12
Moon R. E. 1987. ‘The Analysis of Meaning.’ in J. McH. Sinclair (ed.) 1987: 86-103.
Moon R. E. 1992. ‘There Is Reason in the Roasting of Eggs. A Comparison of Fixed
Expressions in Native Speaker Dictionaries.’ in Euralex ‘92 Proceedings Oxford
University Press: 493-502
Moon R.E. 1994. ‘The Analysis of Fixed Expressions in Text’. In M. Coulthard (ed). Pp117-
135.
Moon, R.E. 1998a. Fixed Expressions and Idioms in English: A Corpus-Based Approach.
(Oxford Studies in Lexicography and Lexicology) Oxford: Oxford University Press.
Moon R.E. 1998b. ‘Frequencies and Forms of Phrasal Lexemes in English’. In A. P. Cowie
(ed).: 79-100.
Moskovitch G.M. and Caplan A. 1979. ‘Distributive Statistical Techniques in Linguistic and
Literary Research.’ in D.E. Ager, F.E. Knowles and J. Smith (eds.): 245-263
Muller C. 1968. Essai de Statistique Lexicale Paris: Librairie Klincksieck
Muller C. 1977. Principes et Méthodes de Statistique Lexicale Paris: Hachette Université
Myers G. 1989. ‘The Pragmatics of Politeness in Scientific Articles.’ in Applied Linguistics
Vol. 10 / 1: 1-35
Myers G. 1990. Writing Biology: Texts in the Social Construction of Scientific Knowledge
Milwaukee: University of Wisconsin Press
Myers G. 1991. ‘Lexical Cohesion and Specialized Knowledge in Science and Popular
Science Texts.’ in Discourse Processes Vol. 14/1: 1-26
Myers G. 1992. ‘Textbooks and the Sociology of Scientific Knowledge.’ in English for
Specific Purposes Vol. 11: 3-17
Nattinger J.R. and DeCarrico 1992. Lexical Phrases and Language Teaching Oxford: Oxford
University Press
Nattinger J.R. and DeCarrico 1989. ‘Lexical Acts and Teaching Conversation.’ in
Vocabulary Acquisition: AILA Review 6: 118-139
Nwogu K.N. 1989. ‘Discourse Variation in Medical Texts: Schema, Theme and Cohesion in
Professional and Journalistic Accounts.’ Unpublished Ph.D. Thesis, Language Studies
Unit, Aston University.
Nwogu K. N. and Bloor T. 1991. ‘Thematic Progression in Professional and Popular Medical
Texts.’ in Ventola (ed) 1991: 369-384
Nystrand M. 1982. What Writers Know. The Language, Process and Structure of Written
Discourse New York: Academic Press
Nystrand M. 1986. The Structure of Written Communication: Studies in Reciprocity Between
Writers and Readers Orlando Fl.: Academic Press
Oakes M. 1996. Statistics for Corpus Linguistics. Edinburgh: Edinburgh University Press.
Oppenheim R. 1988. ‘The Mathematical Analysis of Style: A Correlation-Based Approach.’
in Computers and the Humanities Vol. 22: 241-253
Oster S. 1981. ‘The Use of Tenses in Reporting Past Literature in EST.’ in English for
Academic and Technical Purposes: Studies in Honour of Louis Trimble L. Selinker, E.
Tarone and V. Hanzeli (eds.), Massachusetts: Newbury House: 76-90
Papegaaij and Schubert R. 1988. A Corpus-Based Bilingual Knowledge Bank for
Distributed Language Translation DLT Publications Amsterdam.
Pavel S. 1993a. ‘Neology and Phraseology as Terminology-in-the-Making.’ in H.B.
Sonneveld and K.L.Loening (eds.) 1993: 21-34
Pavel S. 1993b. ‘La Phraséologie en Langue de Spécialité. Méthodologie de Consignation
dans les Vocabulaires Terminologiques.’ Unpublished MS, Secrétariat d’État du Canada:
Direction de la Terminologie et des Services Linguistiques.
Pavel S. and Boileau P. 1994. Systèmes Dynamiques et Imagerie Fractale. Vocabulaire
Français-Anglais. Secrétariat d’État Du Canada: Direction De La Terminologie Et Des
Services Linguistiques. Canada
Pawley A. and Syder F.H. 1983. ‘Two Puzzles for Linguistic Theory: Nativelike Selection
and Nativelike Fluency.’ in Richards and Schmidt (eds.) 1985 Language and
Communication London: Longman: 191-226.
Pearson J. 1998. Terms in Context. Amsterdam: John Benjamins.
Pettinari C. 1982. ‘The Function of A Grammatical Alteration in 14 Surgical Reports.’ in W.
Frawley (ed.) 1982: 145-183.
Sinclair J. McH., Jones S. and Daley R. 1969. English Lexical Studies. UB Report for the
Office of Science and Technology Information.
Smadja F. 1993. ‘Retrieving Collocations from Text: Xtract.’ in Computational Linguistics
Vol. 19/1: 143-177
Smadja F. 1996. ‘Translating Collocations for Bilingual Lexicons: A Statistical Approach’.
In Computational Linguistics 22/1: 1-38.
Sonneveld H.B. and Loening K.L. (eds.) 1993. Terminology. Applications in
Interdisciplinary Communication. John Benjamins: Amsterdam
Souter C. 1990. ‘Systemic-Functional Grammars and Corpora.’ in Aarts and Meijs (eds.)
1990: 179-211
Sparck-Jones K. 1971. Automatic Keyword Classification for Information Retrieval. London:
Butterworths
Stubbs M. 1982. ‘Written Language and Society: Some Particular Cases and General
Observations.’ in M. Nystrand (ed.) 1982: 31-55
Stubbs M. 1987. ‘An Educational Theory of (Written) Language.’ in T. Bloor and J. Norrish
(eds.) BAAL 2: Papers from the Annual Meeting of the British Association for Applied
Linguistics London, CILT: 3-38
Stubbs M. 1993. ‘British Traditions in Text Analysis: From Firth to Sinclair.’ in M. Baker et
al. (eds.) 1993: 1-33
Stubbs M. 1994. ‘Grammar, Text and Ideology: Computer-Assisted Methods in the
Linguistics of Representation.’ in Applied Linguistics Vol. 15/2: 201-223
Stubbs M. 1996. Text and Corpus Analysis. Routledge: London.
Svartvik J. (ed.) 1992a. Directions in Corpus Linguistics: Proceedings of the Nobel
Symposium 82, Stockholm 4-8 August 1991.
Svartvik J. 1992b. ‘Corpus Linguistics Comes of Age.’ in J. Svartvik (ed.) 1992a: 7-13
Svartvik J. 1993. ‘Lexis in English Language Corpora.’ in Zeitschrift für Anglistik und
Amerikanistik Vol. XLI: 1/1: 13-31
Swales J. 1981a. Aspects of Article Introductions. Aston ESP Research Report No.1,
Language Studies Unit: Aston University
Swales J. 1981b. ‘Definitions in Science and Law: A Case for Subject Specific ESP
Materials.’ in Fachsprache Vol. 81/3: 106-112
Swales J. 1981c. ‘The Function of One Type of Participle in a Chemistry Textbook.’ in
Selinker et al. (eds.): 40-52
Swales J. 1990. Genre Analysis: English in Academic and Research Settings. Cambridge:
Cambridge University Press.
Swales J. 1998. Other Floors, Other Voices: A Textography of a Small University Building.
Mahwah, N.J.: Lawrence Erlbaum.
Swales J. and Najjar H. 1987. ‘The Writing of Research Article Introductions.’ in Written
Communication Vol. 4: 175-192
Tarasova T. 1993. ‘Non-Verbal Elements in Scientific Text.’ Unpublished Ph.D. Thesis,
Language Studies Unit, Aston University.
Thomas P. 1993. ‘Choosing Headwords from LSP Collocations for Entry into a
Terminology Data Bank (Term Bank).’ in Sonneveld H.B. and Loening K.L. (eds.) 1993:
46-68.
Thomas H. and Waxman J. 1995. ‘Oncogenes and Cancer.’ in J. Waxman and K. Sikora
(eds.) The Molecular Biology of Cancer: 1-17.
Thompson G. and Yiyun Y. 1991. ‘Evaluation in the Reporting Verbs Used in Academic
Papers.’ in Applied Linguistics Vol. 12/4: 365-382
Traugott E. and Heine B. 1991. Approaches to Grammaticalisation. Vol. II. Amsterdam:
John Benjamins.
Ure J. 1971. ‘Lexical Density and Register Differentiation.’ in G. E. Perren and J.L.M. Trim
(eds.) Applications of Linguistics. Cambridge: Cambridge University Press
Van Der Wouden T. 1997. Negative Contexts. Collocation, Polarity and Multiple Negation.
Routledge: London.
Van Dijk T. 1979. Macrostructures: An Interdisciplinary Study of Global Structures in
Discourse. Hillsdale, New Jersey: Lawrence Erlbaum
Van Dijk T. and Kintsch W. 1983. Strategies of Discourse Comprehension. New York:
Academic Press
Van Dijk T. and Kintsch W. 1978. ‘Cognitive Psychology and Discourse: Recalling and
Summarizing Stories.’ in W. Dressler (ed.) Current Trends in Textlinguistics. Berlin: De
Gruyter.
Van Halteren H. 1994. ‘Syntactic Databases in the Classroom.’ in Wilson and McEnery
(eds.): 17-28
Van Roey J. 1990. French-English Contrastive Lexicology: An Introduction. Louvain-La-
Neuve: Peeters.
Varttala T. 1999. ‘Remarks on the Communicative Function of Hedges in Popular Scientific
and Specialist Research Articles.’ in English for Specific Purposes. 18/2: 177-200.
Ventola E. (ed.) 1991. Functional and Systemic Linguistics: Approaches and Uses. Den
Haag: Mouton de Gruyter
Ventola E. and Mauranen A. 1991. ‘Non-Native Writing and Native Revising of Scientific
Articles.’ in E. Ventola (ed.): 457-492
Verschueren J. 1999. Understanding Pragmatics. London: Arnold.
Vidalenc J-L. 1997. ‘Quelques remarques sur l’emploi de la métaphore comme outil de
dénomination dans un corpus d’histoire des sciences.’ in Boisson C. and Thoiron P. (eds.)
1997. La Dénomination. Paris: Presses Universitaires de Lyon: 1-11.
Vossen P., den Broeder M. and Meijs W. 1986. ‘The LINKS Project: Building a Semantic
Database for Linguistic Applications.’ in Aarts and Meijs (eds.) 1986: 277-293
Weil B.H., Zarember I. and Owen H. 1963. ‘Technical Abstracting Fundamentals. Part II.
Writing Principles and Practices.’ in Journal of Chemical Documentation Vol. 3/1:
125-132
West G.K. 1980. ‘That-Nominal Constructions in Traditional Rhetorical Divisions of
Scientific Research Papers.’ in TESOL Quarterly Vol. 14: 483-489
Wikberg K. 1990. ‘Topic, Theme and Hierarchical Structure in Procedural Discourse.’ in J.
Aarts and W. Meijs (eds.) 1990: 281-254
Wilbur W.J. and Sirotkin K. 1992. ‘The Automatic Identification of Stop Words.’ in Journal
of Information Science Vol. 18/1: 45-55
Williams I. 1996. ‘Ifs and Buts. Impact Factors of Journals May Affect Decisions on
Resource Allocation.’ in Chemistry in Britain, February 1996: 31-33
Williams I. A. 1996. ‘A Contextual Study of Lexical Verbs in Two Types of Medical
Research Article.’ in English for Specific Purposes. Vol 15/3: 175-198.
Willis D. 1990. The Lexical Syllabus. London: Collins ELT
Willis D. 1993. ‘Grammar and Lexis: Some Pedagogical Implications.’ in Sinclair et al.
(eds.) 1993: 83-93
Wilson A. and McEnery T. (eds.) 1994. Corpora in Language Education and Research: A
Selection of Papers from TALC94. UCREL Technical Papers 4, Lancaster University.
Wingard P. 1981. ‘Some Verb Forms and Functions in Six Medical Texts.’ in L. Selinker, E.
Tarone and V. Hanzeli (eds.) English for Academic and Technical Purposes: Studies in
Honour of Louis Trimble: 53-64
Winter E. 1977. ‘A Clause Relational Approach to English Texts: A Study of Some
Predictive Lexical Items in Written Discourse.’ in Instructional Science Vol. 6/1: 1-92.