General Editor
Elena Tognini-Bonelli
Consulting Editor
Wolfgang Teubert
Advisory Board
Volume 9
Using Corpora to Explore Linguistic Variation
Edited by Randi Reppen, Susan M. Fitzmaurice and Douglas Biber
Northern Arizona University
Table of contents
Introduction
Part I
Exploring variation in the use of linguistic features
1. Cross-disciplinary comparisons of hedging: Some findings from the Michigan Corpus of Academic Spoken English
Deanna Poos and Rita Simpson
2. Would as a hedging device in an Irish context: An intra-varietal comparison of institutionalised spoken interaction
Fiona Farr and Anne O’Keeffe
3. Good listenership made plain: British and American non-minimal response tokens in everyday conversation
Michael McCarthy
4. Variation in the distribution of modal verbs in the British National Corpus
Graeme Kennedy
5. Strong modality and negation in Russian
Ferdinand de Haan
6. Formulaic language in English academic writing: A corpus-based study of the formal and functional variation of a lexical phrase in different academic disciplines
David Oakey
7. Lexical bundles in Freshman composition
Viviana Cortes
8. Pseudo-Titles in the press genre of various components of the International Corpus of English
Charles F. Meyer
Part II
Exploring dialect or register variation
10. Syntactic features of Indian English: An examination of written Indian English
Chandrika K. Rogers (formerly Balasubramanian)
11. Variation in academic lectures: Interactivity and level of instruction
Eniko Csomay
Part III
Exploring historical variation
12. The textual resolution of structural ambiguity in eighteenth-century English: A corpus linguistic study of patterns of negation
Susan M. Fitzmaurice
13. Investigating register variation in nineteenth-century English: A multi-dimensional comparison
Christer Geisler
Index
<TARGET "rep" DOCINFO
AUTHOR ""
TITLE "Introduction"
KEYWORDS ""
WIDTH "150"
VOFFSET "4">
Introduction
lexical expressions that occur with high frequencies in academic writing, what
she refers to as ‘lexical bundles’. Specifically, Cortes focuses on the use of lexical
bundles in freshman composition essays, contrasting these patterns of use with
those found in published academic writing and in conversation. Interestingly,
Cortes finds that freshman composition texts often use the same structural
lexical bundles found in academic prose, but that they are often used with
different functions.
Finally, the study of variation in the use of linguistic features can be
extended to grammatical constructions. The papers by Meyer and Hunston
illustrate studies of this type. Chuck Meyer focuses on variation in the use of
pseudo-titles, comparing the patterns of use in newspapers from the different
subcorpora of the International Corpus of English (ICE). Despite the British-
based stigma associated with the use of pseudo-titles in the press, Meyer finds
that their use appears to be increasing in frequency as well as becoming increas-
ingly formulaic, both in Britain and the US and in the journalistic discourse of
English in East Africa, New Zealand, the Philippines and Jamaica. Susan
Hunston takes a broader lexico-grammatical perspective, referred to as ‘pattern
grammar’, which studies the transitivity and complementation characteristics
of particular verbs. In addition, Hunston discusses how corpus-driven analyses
can be applied for pedagogical purposes in language teaching, especially for
illustrating both the patterns and the range of variation that students will
encounter in natural language.
The two papers in Part 2 of the book — ‘Exploring dialect and register
variation’ — were carried out to describe salient characteristics of dialects or
registers. These papers also focus on particular linguistic features, but the
primary research goal is to identify salient characteristics of the dialect/register,
rather than to describe the linguistic feature itself.
The paper by Chandrika Rogers is an example of a dialect study. Rogers
examines a range of linguistic features in written Indian English registers to
investigate the extent to which Indian English differs from British and Ameri-
can English in its syntax as well as its lexis and phonology. Her corpus findings
indicate that written Indian English does not appear to differ markedly from
other standard varieties in its syntax, but she concludes that this research needs to
be extended to spoken Indian English to investigate the issue further.
In contrast, the paper by Eniko Csomay illustrates a corpus-based study of
register variation. Csomay focuses on the linguistic description of academic
lectures, a hybrid register with a marked informational function coupled with
the restrictions of real-time production circumstances. Csomay considers a
</TARGET "rep">
References
Biber, D., Conrad, S. & Reppen, R. 1998. Corpus linguistics: Exploring language
structure and use. Cambridge: Cambridge University Press.
Biber, D. & Reppen, R. 2002. What does frequency have to do with grammar
teaching? Studies in Second Language Acquisition, 24(2), 199–208.
Part I
Chapter 1
1. Introduction
The linguistic strategy known as hedging has received a good deal of attention
over the past twenty-five years, especially from scholars interested in language and
gender. In her 1975 work Language and Woman’s Place, which started a boom
in research on language and gender, Robin Lakoff listed hedges as one of nine
qualities particular to feminine speech, viewing them as expressions of defer-
ence. In this tradition, Holmes (1986, 1998) has successfully challenged the
association of hedging with powerlessness, focusing instead on its politeness
functions, but she has nevertheless continued to find support for the notion
that hedging is more characteristic of women’s language than men’s. In
experimental studies, however, Meyerhoff (1992) and Dixon and Foster (1996)
failed to find significant gender differences in hedge usage, leading the latter to
claim that “if they do exist, gender differences in hedging are subtle and subject
to marked variation across speakers and contexts of use” (1996, 90). Our data,
taken from a corpus of natural interaction, support this claim. Like that of
Dixon and Foster, our research is in part a reaction against the “wanton
frequency counts and generalizations about women’s subordination in conver-
sation” that have plagued the field (p. 95).
In addition to the gender-related research, there have also been numerous
studies of hedging in written discourse from English for Academic Purposes
scholars (Crompton 1997, 1998; Hyland 1996, 1998a, 1998b, 2000; Kreutz and
Harres 1997; Markkanen and Schröder 1997; Salager-Meyer 1994; Varttala
1999; Vassileva 1997). Hyland in particular has written extensively on the
2. The Corpus
The data for this analysis come from the Michigan Corpus of Academic Spoken
English (MICASE) (Simpson et al. 1999). This corpus has been under development
at the University of Michigan’s English Language Institute for the past two
years, and at the time of this research consisted of almost 900,000 words and
over 100 hours of recordings — a little over half the target size of 1.5 million
words. Some of those transcripts are currently available on a searchable Web
site (http://www.hti.umich.edu/m/micase), and by mid-2002, all the transcripts
from the corpus will become publicly available at that location. The approximately
90 transcripts currently in MICASE include both classroom
speech events, such as lectures, discussion sections, seminars, and labs, as well
as non-classroom speech events, such as office hours, research group meetings,
and advising sessions, to name a few. With the exception of those speech events
that are not related to a specific discipline, such as undergraduate advising or
staff meetings, all of the speech events are classified according to one of the
graduate school’s four academic divisions; that is, biological and health sciences,
physical sciences and engineering, social sciences and education, and humani-
ties and arts.1
For this study, we relied primarily on those transcripts that were classified
into one of these four academic divisions, omitting non-discipline-specific
speech events. That subcorpus consisted of 64 speech events, totaling 722,423
words. Table 1 shows the distribution of speech event types for all four divi-
sions. Because the corpus is still under development, at this stage the four
academic divisions do not contain equal numbers of each speech event type or
equal numbers of total words. Looking at the total word counts for each
division, we see that the biological and health sciences division is significantly
smaller, and thus presumably not as representative, with only 122,762 words,
compared to the other three, each with close to, or over, 200,000 words. In
addition, the social sciences division contains more large lectures or colloquia
than any of the other three divisions.
However, the breakdown by speech event is not an entirely reliable indica-
tor of comparability of the speech, since some lectures can be quite interactive
compared to others and some discussion sections and seminars can be largely
monologic. A better indicator of comparability for these four subcorpora of
academic divisions is a category assigned to each event in MICASE called the
‘primary discourse mode’. There are four discourse mode categories, which for
our purposes can be condensed into three: monologic, interactive, and mixed.
Table 1. Breakdown of Speech Events and Word Counts, Subcorpus 1: 64 speech events across four academic divisions
Speech Event Type    hum/arts (S, W)    social sci (S, W)    bio/health sci (S, W)    phys sci/engin (S, W)
Advising 1 8,010
Interview 1 4,869
Discourse Mode    hum/arts (S, W, %)    social sci (S, W, %)    bio/health sci (S, W, %)    phys sci/engin (S, W, %)
divisions have higher percentages of interactive speech than the other two (51%
and 45%, compared to 36% and 22%). In spite of the unequal distribution of
speech types, this corpus is still useful for preliminary research into the primary
research questions, as long as these imbalances are kept in mind. So, for
example, if we assume that the hedges kind of and sort of are likely to be more
prevalent in interactive speech,2 then we would expect to see more hedges in the
biological and physical sciences and fewer in the social sciences. As we will see
below, however, these trends are exactly the opposite.
To investigate the gender variable, we had to use a smaller subcorpus.
Because we cannot at this point automatically identify the speaker of each token
in the quantitative analysis unless there is only one speaker in the transcript or
all the speakers are of a single sex, we limited the gender comparison to primari-
ly single-speaker monologic speech events such as lectures, including some
non-discipline-specific events. Unfortunately, the corpus at present does not yet
contain enough monologic speech events in all of the four academic divisions
to compare male versus female speakers within each division. As shown in
Table 3, the subcorpus of male speakers includes five speech events from the
physical sciences, two from the biological sciences, four from the social sciences,
four from the humanities, and one non-disciplinary event. The subcorpus of
female speakers includes two from the physical sciences, two from biology, six
from the social sciences, one from the humanities, and two non-disciplinary
events. In sum, we have investigated gender differences by comparing a
subcorpus of 13 female speech events (146,738 words) with a subcorpus of 16
male speech events (160,077 words).
Table 3. Breakdown of speech events and word counts, subcorpus 2: 13 female and 16 male speech events
Gender    hum/arts (S, W)    social sci (S, W)    bio/health sci (S, W)    phys sci/engin (S, W)    non-disciplinary (S, W)    total (S, W)
3. Quantitative findings
In the following section, we present the results of the quantitative portion of the
study. In calculating the frequency of kind of and sort of tokens, we did not
include tokens that are synonymous with type of, editing out clear non-hedging
examples such as:
that’s not the kind of mall that people socialize in
you can find them by the sort of reasoning I’ve been doing
Instances of kind of and sort of that are ambiguously synonymous with type of,
but are part of a noun phrase that conveys inexactitude, remain in the data set.
Thus examples such as:
any real coherent sort of way
that kind of thing
this kinda stuff
I need some sort of verb (to govern the infinitive)
are counted as hedges. Although weeding out the non-hedging sense of the
words seems like an obvious first step for this kind of study, none of the previous
researchers in the studies we found mentions doing this, which is further
evidence of the problem of “wanton frequency counts” mentioned earlier.
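The editing step described above can be sketched in code. The following is only an illustrative sketch, not the procedure the authors report using: it assumes that candidate tokens are located by pattern matching and that an analyst then reads each context and hand-codes the "type of" uses for exclusion. The function names, the inclusion of the reduced form "sorta", and the sample text are assumptions made for illustration.

```python
import re
from typing import List, Set, Tuple

# Candidate forms: "kind of", "sort of", and reduced spellings.
# "kinda" appears in the examples above; "sorta" is an assumption.
CANDIDATE = re.compile(r"\b(?:kind of|sort of|kinda|sorta)\b", re.IGNORECASE)


def concordance(text: str, width: int = 30) -> List[Tuple[int, str]]:
    """Return (character offset, surrounding context) for each candidate token."""
    hits = []
    for match in CANDIDATE.finditer(text):
        start = max(0, match.start() - width)
        end = min(len(text), match.end() + width)
        hits.append((match.start(), text[start:end]))
    return hits


def count_hedges(text: str, non_hedge_offsets: Set[int]) -> int:
    """Count candidates, skipping offsets hand-coded as 'type of' uses."""
    return sum(1 for offset, _ in concordance(text) if offset not in non_hedge_offsets)


# Illustrative text only; the second sentence is invented for the sketch.
sample = (
    "that's not the kind of mall that people socialize in. "
    "this kinda stuff is sort of hard to pin down."
)
hits = concordance(sample)
# The analyst judges the first hit ("the kind of mall") to mean "type of".
excluded = {hits[0][0]}
hedge_count = count_hedges(sample, excluded)
print(hedge_count)
```

The judgment about which tokens are hedges stays with the human coder; the code only keeps the bookkeeping consistent.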
Figure 1 summarizes the results comparing male and female speakers. The
points in the graph represent the overall frequencies of kind of and sort of for
males versus females in subcorpus two, calculated as a rate per 1,000 words.
Figure 1. Frequency of kind of/sort of Hedges for Female vs. Male Speakers (x-axis: Academic Division; y-axis: Frequency/1000 Words)
These results show little or no difference in the kind of/sort of frequencies for the
total subcorpus of males versus females, and for the social science and physical
science divisions. Of those two divisions, the social science division has the most
balanced number of words and transcripts for both genders, therefore providing
a more reliable basis for comparing gendered use of the phrases than the
physical sciences. That is to say, if female speakers are underrepresented, the
frequency of hedging is more likely to be attributable to individual speaker
variation. The other two divisions — biological science and humanities/arts —
show opposite tendencies; i.e., female speakers have a higher hedging frequency
in the biological sciences, and male speakers have a higher frequency in the
humanities. These numbers cannot necessarily be interpreted as strong evidence
for those trends, because of the relatively small word counts and number of
speakers being compared. However, based on the minimal differences in both
the overall hedging frequencies (2.26 for males versus 2.01 for females), and the
frequencies in the more balanced social sciences subset (1.38 versus 1.33), this
limited data set for gender comparisons definitely does not show a correlation
between gender and frequency of hedging in academic speech.
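The normalization used in these comparisons, a rate per 1,000 words, can be written out as a short sketch. The word counts below are the subcorpus sizes reported in Section 2. The token counts are not reported in the text: they are assumptions, back-calculated so that the rounded rates match the reported 2.01 and 2.26, and they serve only to illustrate the arithmetic.

```python
def rate_per_thousand(token_count: int, word_count: int) -> float:
    """Normalized frequency: tokens per 1,000 running words."""
    if word_count <= 0:
        raise ValueError("word_count must be positive")
    return 1000 * token_count / word_count


# Subcorpus sizes as reported for subcorpus two.
FEMALE_WORDS = 146_738
MALE_WORDS = 160_077

# Assumed token counts (not given in the text), chosen for illustration.
female_tokens = 295
male_tokens = 362

female_rate = rate_per_thousand(female_tokens, FEMALE_WORDS)
male_rate = rate_per_thousand(male_tokens, MALE_WORDS)
print(round(female_rate, 2), round(male_rate, 2))
```

Normalizing in this way is what allows subcorpora of unequal size to be compared at all, although it does not correct for the small number of speakers in each cell.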
Now we turn to the frequency counts for subcorpus one comparing the
four academic divisions. As illustrated in Figure 2a, these results show a clear
trend: hedging frequencies are lowest in the physical sciences, slightly higher in
the biological sciences, highest in the social sciences and second highest in the
humanities. The only surprising finding here is that the average frequency for
the social sciences is higher than for the humanities. However, we note here that
the social science subcorpus is anomalous in one respect: one speaker from the
longest transcript, an anthropology office hours session, uses an extremely high
number of sort of/kind of tokens — 12.01/1,000. (In order to speculate on
interactive motivations for sort of/kind of use, we look more closely at this
speaker’s hedging strategies in Section 5.) Therefore, we recalculated the
numbers without that transcript, resulting in a much lower average frequency
for the social sciences, 2.66/1,000, which is slightly higher than the average for
the biological sciences, as shown in Figure 2b. Omitting this outlier from the
picture, we see a definite increase in hedging frequencies going from left to
right, along the continuum of hard to soft disciplines, with the humanities
disciplines at the top.
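The effect of removing a single outlying transcript can be illustrated with a small sketch. The text does not state whether the division averages are pooled (total tokens over total words) or means of per-transcript rates; the sketch assumes pooling. The transcript names and all counts are hypothetical and do not reproduce the MICASE figures.

```python
from typing import Dict, Optional, Tuple

# Hypothetical (hedge tokens, words) per transcript in one division.
transcripts: Dict[str, Tuple[int, int]] = {
    "office_hours": (120, 10_000),  # outlier: 12.0 per 1,000 words
    "lecture_a": (30, 12_000),
    "seminar_b": (25, 9_000),
}


def pooled_rate(data: Dict[str, Tuple[int, int]], exclude: Optional[str] = None) -> float:
    """Pooled tokens per 1,000 words, optionally leaving one transcript out."""
    kept = [counts for name, counts in data.items() if name != exclude]
    tokens = sum(t for t, _ in kept)
    words = sum(w for _, w in kept)
    return 1000 * tokens / words


with_outlier = pooled_rate(transcripts)
without_outlier = pooled_rate(transcripts, exclude="office_hours")
print(round(with_outlier, 2), round(without_outlier, 2))
```

With few transcripts per division, one highly frequent user can move the division average substantially, which is why the recalculation matters for the cross-disciplinary comparison.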
Figure 2a. Frequency of kind of/sort of Hedges Across Academic Divisions (Frequency/1000 Words: Phys Sci/Engin 1.36; Bio/Hlth Sci 2.56; Soc Sci/Ed 4.1; Hum/Arts 3.73; Total (All Divisions) 2.94)
Figure 3 shows the range of hedging frequencies for each individual speech
event across the disciplines. This graph illustrates at least two important trends
that further support the above findings. First, the highest and lowest frequencies
for the physical sciences category are lower than the highest and lowest frequen-
cies for all three other divisions. Second, the average frequencies for all the
humanities transcripts going from lowest to highest are consistently higher than
the corresponding averages for all other divisions — with the exception of the
highest frequency transcript, which is surpassed by the aforementioned outlier
in the social sciences, and one in the biological sciences.
Figure 2b. Frequency of kind of/sort of Hedges Across Academic Divisions, without outlying speaker/transcript in Social Sciences (Frequency/1000 Words: Phys Sci/Engin 1.36; Bio/Hlth Sci 2.56; Soc Sci/Ed 2.66; Hum/Arts 3.73; Total (All Divisions) 2.94)
Figure 3. Range of Frequencies of kind of/sort of Across Academic Divisions, from Highest to Lowest (x-axis: Individual Speech Events; y-axis: Frequency/1000 Words; series: Phys Sci/Engin, Bio/Hlth Sci, Soc Sci/Ed, Hum/Arts)
clearer picture of the trend we see emerging here. A wordlist frequency count of
two-word phrases in each of the four disciplinary sub-corpora produced the
results shown in Table 4.
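A wordlist of two-word phrases of the kind described here can be sketched as follows. This is not the tool used in the study; it is a minimal illustration that assumes simple lowercase tokenization and that expresses each phrase's share as a percentage of all two-word sequences, which may differ from how the percentages in the study were computed. The function name and the sample sentence are hypothetical.

```python
import re
from collections import Counter
from typing import List, Tuple


def bigram_table(text: str) -> List[Tuple[int, str, int, float]]:
    """Return (rank, phrase, count, percent of all bigrams), most frequent first."""
    words = re.findall(r"[a-z']+", text.lower())
    bigrams = Counter(zip(words, words[1:]))
    total = sum(bigrams.values())
    table = []
    for rank, ((first, second), count) in enumerate(bigrams.most_common(), start=1):
        table.append((rank, f"{first} {second}", count, 100 * count / total))
    return table


# Invented sample sentence, for illustration only.
sample_text = "it is sort of a sort of thing you know"
table = bigram_table(sample_text)
for rank, phrase, count, percent in table[:3]:
    print(rank, phrase, count, round(percent, 1))
```

Ranks from such a list are comparable across subcorpora only if the same tokenization is applied to each.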
In the humanities, kind of was the seventh most frequent two-word phrase,
comprising .19% of that subcorpus, and sort of was the eighth most common
phrase, at .18%. Similarly, in the social sciences, sort of was seventh (.20%) and
kind of was eighth (.18%). In contrast, in the biological sciences subcorpus, sort
of was 16th (.14%) and kind of was 18th (.12%). Finally, in the physical sciences
and engineering subcorpus, kind of was only the 42nd most common two-word
phrase, comprising .08% of the total, and sort of was ranked 126th, at .05%.
Furthermore, a keyword comparison3 of these two-word phrases generated with
WordSmith Tools (Scott 1999) comparing the physical sciences with both the
humanities and the social sciences reveals the results shown in Table 5. When
the wordlist of two-word phrases from the humanities is compared against the
physical sciences, the number one ranked keyword phrase for the humanities
corpus is sort of, and kind of is ranked fourth. In comparing the social sciences
with the physical sciences, sort of is the second highest keyword phrase (after
you know) and kind of is the tenth highest. Neither kind of nor sort of showed up
as keyword phrases for the physical sciences.4 Furthermore, although the
present study has focused only on the hedges kind of and sort of, it is worth
noting that several other hedging-type phrases appear in the keyword lists for
the soft disciplines. In the humanities subcorpus, it seems is ranked fifth, and
you know, I mean, and I think are all in the top five for the social sciences,
whereas none of these or any similar phrases emerge as keyword phrases in the
physical sciences.
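A keyword comparison of this kind rests on a keyness statistic that tests whether a phrase is unusually frequent in one wordlist relative to another. The text does not say which statistic was selected; the sketch below assumes the log-likelihood statistic, one common choice for keyness, and uses hypothetical counts rather than the MICASE figures.

```python
import math


def log_likelihood(freq_study: int, size_study: int, freq_ref: int, size_ref: int) -> float:
    """Log-likelihood keyness of an item in a study corpus versus a reference corpus."""
    total_freq = freq_study + freq_ref
    total_size = size_study + size_ref
    # Expected frequencies if the item were equally likely in both corpora.
    expected_study = size_study * total_freq / total_size
    expected_ref = size_ref * total_freq / total_size
    score = 0.0
    for observed, expected in ((freq_study, expected_study), (freq_ref, expected_ref)):
        if observed > 0:  # a zero count contributes nothing to the sum
            score += observed * math.log(observed / expected)
    return 2 * score


# Hypothetical counts: a phrase occurring 20 times in one 1,000-bigram list
# and 5 times in another list of the same size.
keyness = log_likelihood(20, 1000, 5, 1000)
no_difference = log_likelihood(10, 1000, 10, 1000)
print(round(keyness, 2), no_difference)
```

The statistic itself is unsigned, so the direction of the difference has to be read from the relative frequencies: a phrase is key for the corpus in which its rate is higher.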
To summarize the quantitative findings, the preceding analyses offer
a wealth of evidence in support of our hypothesis that academic discipline,
broadly defined, is a stronger predictor of frequency of hedges with kind of and
sort of than is gender in this corpus. Naturally this finding raises the question of
why these correlations exist, so we consider here some possible explanations for
this trend. Is it possible that humanists are by nature more tentative, less
confident people, while scientists are more assertive and sure of themselves? In
other words, one might propose as an explanation that it is the personalities of
people who are drawn to the humanities and to the sciences that predispose
them to be “hedgers” or “non-hedgers.” In response to this proposal, we cite
evidence from an experimental study by psychologists Schachter et al. (1991,
1994), which showed that disciplinary differences in the filled pauses um and uh
were not a result of individual speaking styles, but rather of the subject matter
being discussed.5 Filled pauses are related to hedging in that at least some
hedges appear to function primarily as another kind of filled pause. In fact, this
finding is corroborated by frequency counts of these two filled pauses in the
four disciplinary subcorpora of MICASE; like kind of and sort of, these filled
pauses occur most frequently in the humanities and least frequently in the
physical sciences.
A more plausible explanation that may account for some of the differences
in frequencies is one put forth by Schachter et al., which attributes the differ-
ences in frequency of filled pauses to the vocabulary range available to the hard
versus the soft disciplines. They argue that speech about a humanities subject is
more likely to be punctuated with filled pauses than that on a scientific subject
because vocabulary in humanities fields is less standardized and speakers must
choose the best possible word from a field of possibilities. For example, they
write, “there are no synonyms for molecule or atom or ion… In contrast,
consider the alternatives for love, beauty, group structure, prejudice, or style”
(Schachter et al. 1994: 37). Their point is that language in the social sciences or
humanities is characterized by richer vocabularies than in the sciences, and is
therefore more likely to include pauses and filled pauses uttered by speakers
searching for the right word among many possibilities. Because sort of and kind
of can also act as filled pauses for a speaker searching for the precise expression
(as we discuss further in Section 5), this explanation may help account for some
of the disciplinary frequency differences.
A closely related explanation is that content in the humanities and social
sciences is by nature more open to multiple interpretations than content in the
hard sciences, which deals more with discrete, observable data, facts, and
processes. Simply put, there is more to hedge about in the softer disciplines than
in the sciences. Norms of interaction in the humanities and social sciences call
for presenting alternate points of view, stating and eliciting opinions, carefully
crafting arguments, and allowing for multiple possibilities — all of which can
and do involve the use of various hedging strategies. Finally, it must also be
considered that there are other hedging devices that occur more commonly in
the sciences, particularly those having to do with imprecise numerical expressions,
such as about, around, approximately, etc.
4. Pragmatic analysis
As Dixon and Foster likewise note (1997: 103), too often researchers assume
a one-to-one relationship between tokens of hedging and expressions of
tentativeness. Hedges have in fact been defined as words that convey inexact-
itude, uncertainty or tentativeness — or, to quote George Lakoff (1973) in his
Note that in this decontextualized setting, the utterance of these phrases does
not seem to reflect the speaker’s self-confidence or even expertise, but is rather
used, particularly in the first two examples, to describe a state of inexactitude.
In examples (f) and (g), sort of seems to function in parallel with the suffix -ish.
Closely related to the prototypical function of conveying inexactitude, sort
of and kind of can be used in an effort to soften the force of a stance or opinion,
as in the following examples. In these contexts — i.e. making an assertion or
stating an opinion — the semantic fuzziness conveyed through the use of
hedges begins to serve the pragmatic function of politeness, in that speakers
mitigate the force of their opinions or assertions.
i. okay, so, the fact that those people don’t consider planning is kind of
irrelevant right? (PS: dissertation defense)
j. well, I sort of a- I I sort of agree more with Paul on that (BS: graduate
student meeting)
k. well we kind of uh are reasonably sure of that (PS: engineering seminar)
l. in, the journal-equilibrium approach, um the kind of the ideal situation is
to end up with a competitive equilibrium… (PS: graduate student meeting)
Similarly, speakers often use sort of and kind of to mitigate a criticism or request,
as in examples (m) through (p). These types of examples are found with some
frequency in speech events that by definition involve a high degree of feedback,
such as office hours, dissertation defenses, or tutorials.
m. I was just kinda hoping you’d read over this and say this has to be changed,
or you know whatever (SS: student, anthropology office hours)
n. I mean I’ve told you this before the way you write is sort of chatty… you
can’t let your argument kind of disappear as you kind of tell me this little
story (SS: graduate student instructor, anthropology office hours)
o. but somehow I feel that surface needs to be broken up. And one way would
be doing with big roof and you notice this could extend down somehow
maybe to kind of acknowledge, the exterior of the building (HA: architec-
ture critique)
p. …ask us in office hours to see how much we’ll tell you and then, kind of just
write a paragraph that says (PS: office hours)
In the above examples, kind of and sort of modify “the iconography of the
jewelry,” “to nuance that view,” “Faustian bargain,” “situated in this political
and economic context,” and “the artisanal uses of wood.” It is true that kind of
and sort of hedges are here mitigating the force of the phrases they precede, but
they seem to do so not to modify the semantic meanings or force of the terms
(as in examples (f)–(h) of “typical” inexactitude hedges), but rather to modify
their pragmatic effects. The speakers seem to use kind of and sort of in these
Like the hedges preceding sophisticated vocabulary, these hedges also seem to
function as metapragmatic markers — drawing the listeners’ attention to the
non-literal terminology, and attesting to the speakers’ self-consciousness about
using an overt metaphor.
As mentioned earlier, our corpus included a clear outlier in the social sciences.
This anthropology teaching assistant used kind of and sort of hedges an average
of 12 times per thousand words during an office hours interaction (compared
with the social science division average of 2.66/1,000, excluding her file). This
office hours interaction, a discussion of term paper progress shortly before the
end of the semester, provides an especially rich setting for the use of hedges;
young students tentatively explain their ideas and solicit feedback and the
teacher diplomatically offers her suggestions. The consequently high number of
hedges in this transcript allows us to examine the main interactional functions
kind of and sort of may have in the context of a series of one-to-one encounters
with one speaker remaining constant. Additionally, the authors’ own familiarity
with the field of anthropology, and more importantly, with the conversational
norms for making assertions in the context of that discipline, aided their
“but um, also especially the Friends and um, the, housing activist ones are
kind of, um, none of those are organizations that are actually centered in
the African-American community…”
The speaker has used kind of here to keep control of the floor as she determines
the conclusion of her utterance. It is clear that this apparent hedge functions as
a filled pause because the speaker in fact changes the direction of the utterance
in order to make her point — kind of never modifies an adjective, as the
beginning of the utterance leads one to expect.
Perhaps this outlying speaker uses kind of and sort of frequently because she
speaks in a “feminine” way. This could be, but it seems equally likely that her
distinctive use of this form is due to other factors such as her role as social
scientist, her disciplinary convictions, and her position as a young person in a
teaching position — as well as personal style. In other words, she isn’t just
“doing” gender — she’s doing a lot of other things having to do with the
presentation of her identity. The central concern should not be negotiation of
gender role, because in this situation she has several other roles that supersede
her position as a woman — in particular, anthropology teacher.
6. Conclusion
This study has offered clear evidence that in the domain of academic speech,
there is no significant gender-related effect on speakers’ hedging frequencies,
but rather that there is a noticeable difference in hedging frequencies depending
on the academic division. In particular, looking at the two ends of the academic
division continuum, the physical sciences versus the humanities, there is a
conspicuous difference in kind of/sort of uses. Further, we have posited some
possible explanations for the finding that hedging frequencies are lowest in the
physical sciences and highest in the humanities.
These findings are significant not only for their implications in the study
of variation across registers of spoken English, but also because they offer food
for thought to EAP practitioners, who until now have had a wealth of informa-
tion available on the nature of hedging in academic writing, but no compara-
ble data about hedging strategies and distributions in academic speech.
Learning to express and interpret hedges appropriately is important for
advanced learners of English, because of the important interactional and social
functions they perform in addition to their role in conveying shades of
certainty or commitment.
Because of the relatively small corpus used for this study, there are of course
limitations to the generalizability of the data, and we realize the need for
additional research in this area. Besides replicating the current study with a
larger corpus and more tightly controlled subcorpora, other possible avenues
for further investigation include the effects of speech event type as well as
speaker age and/or academic position on the use of hedging devices. There is
also, of course, a need for studies examining a wider range of hedging devices in
speech, comparable to Hyland’s (1998) work on scientific research articles.
In this paper we have also shown that the phrases kind of and sort of, while
identified primarily as hedges, are multifunctional and serve a variety of often
overlapping sociopragmatic purposes in spoken interaction. They can be used
to reduce the force of an utterance or to convey inexactitude; they can be used
to mitigate criticisms, requests, and directives for the sake of politeness,
minimizing threats to the listener’s face; they can also be used as a subtle form
of accommodation to interlocutors who may not be familiar with technical
jargon or metaphorical references used by the speaker; finally, these phrases can
function as filled pauses, or floor-holding devices, in some speakers’ styles.
The two essential aspects of our argument are, first, that hedging is not
necessarily — or not always, at any rate — a gender-based phenomenon, and
second, that so-called hedge words like kind of and sort of are not merely, and
indeed not always, indicators of tentativeness. The argument that hedging is not
necessarily an indicator of gender is not new, nor is the argument that hedging
is not necessarily an indicator of linguistic tentativeness. However, we want to
reiterate these arguments together with a plea for context in the investigation of
language difference and for an understanding of speaker identity as multiplex
and not reducible, however convenient, to check-marked boxes. Even when
working with data that necessarily appear as essentialized categories, such as the
divisions organizationally necessary in a corpus such as MICASE, it is possible
to pursue a more nuanced understanding of the individual speaker and why he
or she might speak in a particular way. An attention to the most basic elements
of context demonstrates that use of particular linguistic forms does not index a
single element of identity. When thinking about gender and its construction
through language, one must look well beyond use of a single word, and beyond
that word’s supposed semantic meaning.
Notes
1. Although this division of academic fields of study into larger groupings is not universal
and there are a number of departments and courses within certain departments that are
interdisciplinary by design, we believe that on a macro level, the categories are useful —
especially when viewed as a continuum. Furthermore, the disciplines that fall clearly into
one category or another (such as English, philosophy, sociology, physics, chemistry, or
biology) are more numerous than those whose boundaries are less discrete, such
as linguistics, history, women’s studies, or biopsychology.
2. A frequency count comparing hedging rates in the interactive transcripts to those in the
monologic ones in each division (omitting events in the mixed category) indicates that this
assumption holds true for all four divisions.
3. Key words are those whose frequencies are unusually high in comparison to another text,
and thus characterize the text in question. The key word function of WordSmith Tools works
by comparing two existing word lists, calculating the frequencies of every word in both
corpora, and using a log likelihood test to determine the statistical significance of the
difference in frequencies, and to rank the words in order of keyness.
4. The keyword list for the physical sciences is consolidated from the two lists comparing the
physical sciences once against the humanities and once against the social sciences. There were
few substantive differences in the two lists.
5. They determined this by collecting and analyzing speech samples from professors both in
disciplinary lectures and in one-on-one interviews on a topic of equal familiarity to all of
them; they found significant differences for the disciplines only in the lecture speech.
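Note 3 describes keyness as a log likelihood comparison of word frequencies across two word lists. As a rough illustration only, the sketch below computes one common form of the log-likelihood statistic for a single word and ranks the words that are relatively more frequent in the first list. The function names and the toy counts are hypothetical, and the sketch is not claimed to reproduce the exact calculation performed by WordSmith Tools.

```python
import math


def log_likelihood(freq_a, size_a, freq_b, size_b):
    """Log-likelihood score for one word observed in two corpora.

    freq_a, freq_b: raw frequencies of the word in corpus A and corpus B.
    size_a, size_b: total number of word tokens in each corpus.
    """
    combined = freq_a + freq_b
    total = size_a + size_b
    # Frequencies expected if the word were equally common in both corpora.
    expected_a = size_a * combined / total
    expected_b = size_b * combined / total
    score = 0.0
    for observed, expected in ((freq_a, expected_a), (freq_b, expected_b)):
        if observed > 0:  # a zero count contributes nothing to the sum
            score += observed * math.log(observed / expected)
    return 2.0 * score


def keywords(counts_a, counts_b):
    """Rank words that are relatively more frequent in A than in B.

    Both arguments are word-to-frequency mappings and must be non-empty.
    """
    size_a = sum(counts_a.values())
    size_b = sum(counts_b.values())
    scored = []
    for word, freq_a in counts_a.items():
        freq_b = counts_b.get(word, 0)
        if freq_a / size_a > freq_b / size_b:
            scored.append((word, log_likelihood(freq_a, size_a, freq_b, size_b)))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy word lists, for illustration only.
toy_a = {"sort": 8, "of": 12}
toy_b = {"sort": 1, "of": 19}
print(keywords(toy_a, toy_b))
```

A word with the same relative frequency in both lists scores zero, and the score grows as the two relative frequencies diverge, which is what allows the words to be ranked in order of keyness.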
References
Aijmer, K. 1986. “Discourse variation and hedging.” Corpus Linguistics II: New Studies in the
Analysis and Exploitation of Computer Corpora, J. Aarts and W. Meijs (eds) 2–18.
Amsterdam: Rodopi.
Biber, D., Reppen, R., Clark, V., and Walter, J. 2001. “Representing spoken language in
university settings: The design and construction of the spoken component of the
T2K-SWAL Corpus.” In Corpus Linguistics in North America: Selections from the 1999
Symposium, R.C. Simpson and J.M. Swales (eds), Ann Arbor: University of Michigan Press.
Crompton, P. 1998. “Identifying hedges: definitions or divination.” English for Specific
Purposes 17(3): 303–313.
Crompton, P. 1997. “Hedging in academic writing: some theoretical aspects.” English for
Specific Purposes 16(4): 271–289.
Dixon, J. A., and Foster, D. H. 1997. “Gender and Hedging: From Sex Differences to Situated
Practice.” Journal of Psycholinguistic Research 26(1): 89–107.
Holmes, J. 1998. “Signaling gender identity through speech.” Moderna Språk 92(2): 122–128.
</TARGET "poo">
Holmes, J. 1986. “Functions of you know in women’s and men’s speech.” Language in
Society 15(1): 1–22.
Hyland, K. 2000. Disciplinary Discourses: Social Interactions in Academic Writing. Harlow,
England: Pearson Education Limited.
Hyland, K. 1998a. Hedging in Scientific Research Articles. Amsterdam: John Benjamins
Publishing Company.
Hyland, K. 1998b. “Boosting, hedging and the negotiation of academic knowledge.” Text
18(3): 349–382.
Hyland, K. 1996. “Nurturing hedges in the ESP curriculum.” System 24(4): 477–490.
Kreutz, H. and Harres, A. 1997. “Some observations on the distribution and function of
hedging in German and English academic writing.” In Culture and Styles of Academic
Discourse, A. Duszak (ed), 181–203. Berlin: Mouton de Gruyter.
Lakoff, G. 1973. “Hedges: A study of meaning criteria and the logic of fuzzy concepts.”
Journal of Philosophical Logic 2: 458–508.
Lakoff, R. 1975. Language and Woman’s Place. New York: Harper and Row.
Markkanen, R. and Schröder, H. 1997. Hedging and Discourse. Approaches to the Analysis of
a Pragmatic Phenomenon in Academic Texts. Berlin: Walter de Gruyter.
Meyerhoff, M. 1992. “A sort of something — hedging strategies on nouns.” Working Papers
on Language, Gender and Sexism 2(1): 59–73.
Salager-Meyer, F. 1994. “Hedges and textual communicative function in medical English
written discourse.” English for Specific Purposes 13(2): 149–170.
Schachter, S., Rauscher, F., Christenfeld, N., and Crone, K. T. 1994. “The vocabularies of
academia.” Psychological Science 5(1): 37–41.
Schachter, S., Christenfeld, N., Ravina, B., and Bilous, F. 1991. “Speech disfluency and the
structure of knowledge.” Journal of Personality and Social Psychology 60(3): 362–367.
Scott, M. 1999. WordSmith Tools, Version 3.0. Oxford: Oxford University Press.
Simpson, R. C., Briggs, S. L., Ovens, J., and Swales, J. M. 1999. The Michigan Corpus of
Academic Spoken English. Ann Arbor, MI: The Regents of the University of Michigan.
http://www.hti.umich.edu/micase
Varttala, T. 1999. “Remarks on the communicative functions of hedging in popular
scientific and specialist research articles on medicine.” English for Specific Purposes
18(2): 177–200.
Vassileva, I. 1997. “Hedging in English and Bulgarian academic writing.” In Culture and
Styles of Academic Discourse, A. Duszak (ed), 203–221. Berlin: Mouton de Gruyter.
<TARGET "far" DOCINFO
KEYWORDS ""
WIDTH "150"
VOFFSET "4">
Chapter 2
1. Introduction
on this interactional language feature that it has warranted review articles and
volumes (e.g., Clemen 1997, and Schröder and Zimmer 1997). Inevitably, with
such publication density comes diversity and conflict and this will become more
evident as our discussion unfolds. For present purposes, we have decided to
review hedging in terms of definition, gender, culture, genre, psycho-affective
aspects, and pedagogic application.
Throughout the research literature, hedges have eluded any widely-accepted
definition. Fundamental to the problem of definition is the divergence in
approach to the nature and realisation of hedging. Traditionally, hedges were
considered to be semantic modifiers or approximators in the spirit of the
original definition by Lakoff (1972: 195), who coined the term ‘hedge’ to
describe a word or phrase ‘whose job it is to make things fuzzier or less fuzzy’.
Lakoff is concerned with hedges in terms of the semantic contribution they
make to the statements in which they occur (Loewenberg 1982: 196), in that
hedges can weaken or strengthen category membership. This is in keeping with
Rosch (1978) who developed the prototype theory and views hedges as linguis-
tic devices that modify prototypical category membership e.g. A penguin is a
kind of bird. Such an approach is rooted in cognitive science where “semantic
grasp” has preceded analysis at the level of discourse, and therefore discounts
language function (Clemen 1997: 235).
Concurrent with this is the emergence of research which focuses on the
pragmatic aspect of hedges in discourse. Within this approach, research
questions focus more on why hedges are used and offer reasons such as polite-
ness, indirectness, vagueness and understatement — to name but a few. The
work of Brown and Levinson (1978) on politeness strategies has provided a
framework for investigating the role of hedging in domains such as mitigation
and indirectness. In this approach, hedges are context-dependent and are
integral to face saving strategies. Channell (1990), Clemen (1997), and Markkanen
and Schröder (1997) examine pragmatic strategies and their linguistic
components in terms of hedges from various perspectives.
In addition to these approaches, many researchers have attempted to
reclassify and subcategorise what have traditionally been collectively called
hedges. Prince et al. (1982), for example, suggest that hedges should be divided
into shields (those performing a pragmatic function) and approximators (those
performing a semantic function) and Rounds (1982) adds diffusers to this.
Hübler (1983) proposes understatements and hedges while Fraser (1975)
examines in some detail hedged performatives, and subsequently differentiates
between hedging and mitigation (Fraser 1980). Not surprisingly then, in the
words of Markkanen and Schröder (1997: 15), “through extension the concept
has lost some of its clarity and sometimes seems to have reached a state of
definitional chaos, as it overlaps with several other concepts”.
Gender has long been considered integral to the nature and use of hedges.
Preisler (1986), following in the Lakoff (1975) tradition, maintains that women
hedge more than men because their speech is more tentative and less assertive.
However, this viewpoint has become contentious and much research based on
naturally occurring speech data has failed to support such conjecture (see
Bradac et al. 1995, Dixon and Foster 1997, and Holmes 1986, 1990, 1993).
Lakoff’s original proposals, based primarily on hypothesis and personal
observation, have not only been challenged, but many findings now suggest that
the contrary may in fact be true. As with many areas of inquiry, evidence
remains inconclusive on the effect of gender on the use of hedges. On the other
hand, more recent work expands the sphere of investigation into the cultural
constraints on the use of hedges; for example, Crismore et al. (1993) cross
sociocultural borders by comparing the American and Finnish contexts.
Hinkel’s (1995) innovative study examines the use of modals on a comparative
and contrastive basis between native and non-native users of English in a
written context and finds that there are considerable sociocultural constraints
on the pragmatics associated with modality. Cultural values and norms also
form a central tenet of our present study.
Several researchers have examined the effects that the use of hedges and
intensifiers have on the listener in terms of features such as attractiveness,
authority, credibility etc. Results are conflicting and not easy to compare due to
dissimilarities in empirical procedures adopted by researchers such as Bradac et
al. (1995), Holmes (1990) and Hosman (1989). Furthermore, specific language
domains have formed test-beds for how and why native speakers employ
hedges. There has been considerable research into the use of hedges in academic
texts (Myers 1989, 1992; Fahnestock 1986, Hyland 1994, 1996, Salager-Meyer
1994, and Rounds 1982). Another significant corpus-based study into the use
of hedging in a professional spoken context is that of Prince et al. (1982). In
their corpus of 12 hours of physician to physician talk, they note that the most
salient linguistic feature, in terms of frequency, is that of hedges. Using Lakoff’s
(1972: 195) definition of a ‘hedge’ as a word or phrase ‘whose job it is to make
things fuzzier’, they identified between 150 and 450 hedges per hour, more than
one every fifteen seconds.
Some researchers address the practical application that their findings may have
in pedagogic terms (Hinkel 1995, Holmes 1988, Markkanen and Schröder 1997,
and Skelton 1988). However, most of the research in this area engages at either
a theoretical or descriptive level exclusively and it appears that much territory
remains unexplored in relation to pedagogic implications and applications in
foreign language teaching.
Table 1.
For the purposes of our analysis, we have isolated two sub-corpora of spoken
data from the Limerick Corpus of Irish English (L-CIE). The datasets were
selected primarily on the basis of comparability, in that both comprise asym-
metrical dyadic interactions in institutional settings.
Sub-corpus A (RPI): 55,000 words of radio phone-in conversations from
Liveline, a national Irish radio programme on Radio Telefís Éireann.
Sub-corpus B (POTTI): 52,000 words of post-observation teacher trainee
interaction — feedback on teaching practice which took place as part of the
Master of Arts in Teaching English as a Foreign Language programme at the
University of Limerick, Ireland.
A striking feature of both the RPI and POTTI data is the pervasive use of the
modal verb would as a hedging device. Table 1 profiles the frequency of a
commonly occurring cluster, I would say, and its contracted form I’d say,
compared across three corpora of Irish, British and American English: the Limerick
Corpus of Irish English1, the Cambridge and Nottingham Corpus of Discourse
in English (CANCODE) and a corpus of American spoken data from the
Cambridge International Corpus.2
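To make the cluster search concrete, the following is a minimal sketch of how the full form I would say and the contracted form I’d say might be counted in a transcript held as a plain string. The function name and the sample text are hypothetical, and the sketch makes no claim about how the L-CIE, CANCODE or CIC figures were actually retrieved.

```python
import re

# Full and contracted forms of the cluster; both straight and curly
# apostrophes are accepted because transcripts vary in their encoding.
FULL_FORM = re.compile(r"\bI would say\b")
CONTRACTED_FORM = re.compile(r"\bI['’]d say\b")


def cluster_counts(text):
    """Return raw counts of 'I would say' and 'I'd say' in a transcript."""
    return {
        "I would say": len(FULL_FORM.findall(text)),
        "I'd say": len(CONTRACTED_FORM.findall(text)),
    }


# Hypothetical sample, for illustration only.
sample = "I would say so. Given your time I would say so. I'd say that is fine."
print(cluster_counts(sample))
```

Raw counts of this kind only become comparable across corpora of different sizes once they are normalised to a common base, such as occurrences per million words.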
Table 2.
RADIO PHONE-IN
Register
– Mode: Spoken: voice only
– Spoken Genre Range: (diverse but finite) narrative, argumentative, expository,
directive, opinion-giving/seeking.
POST OBSERVATION TT INTERACTION
Register
– Mode: Spoken: face to face
– Spoken Genre Range: (less diverse than RPI) directive, observation-comment,
expository, reflection/self-direction, motivational.
4. A Quantitative Analysis
An initial search of would in the RPI and POTTI registers yielded the results
illustrated in Table 3 below. The distribution of would from a quantitative
perspective is strikingly similar, with RPI data producing 3930 occurrences, and
POTTI data 3942 occurrences per million words. At this level of analysis, it
seems that there may be nothing of significance or interest to note about the use
of would as a hedge between these registers.
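The per-million figures above rest on a simple normalisation of raw counts by sub-corpus size. The sketch below shows that arithmetic in both directions. The function names are hypothetical, the raw count of 110 is invented purely for illustration, and because the sub-corpus sizes are reported as round figures (55,000 and 52,000 words), any raw counts recovered from the published rates are approximate.

```python
def per_million(raw_count, corpus_size):
    """Normalise a raw frequency to occurrences per million words."""
    if corpus_size <= 0:
        raise ValueError("corpus_size must be positive")
    return raw_count * 1_000_000 / corpus_size


def implied_raw_count(rate_per_million, corpus_size):
    """Approximate raw frequency implied by a per-million rate."""
    return rate_per_million * corpus_size / 1_000_000


# Hypothetical raw count in a 55,000-word sub-corpus.
print(per_million(110, 55_000))

# Approximate raw counts implied by the reported rates for would.
print(implied_raw_count(3930, 55_000))  # RPI
print(implied_raw_count(3942, 52_000))  # POTTI
```

Normalising in this way is what allows two sub-corpora of slightly different sizes to be compared directly.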
Table 3.
RPI POTTI
Figure 1. [Chart comparing RPI and POTTI across I, You, He, She, We and They; vertical axis from 0 to 50000]
Table 4.
Item RPI POTTI
genre range of these registers (expository, directive, and so on), that is to say,
the speaker habitually seeks to downtone or make fuzzy when explaining,
directing etc. in the first person. The nil result for the pronoun he in POTTI is
explained by the female gender bias among the trainer and student cohorts. The
result for impersonal you, especially in the POTTI data, and the result for we,
are linked to the strategic use of ‘other attribution’ which we will return to at a
later stage. Table 5 offers a breakdown of other nominal colligates.
These results again relate to how speakers distance themselves from the
content of their utterance, for example:
Extract 1 (POTTI)
Trainer: Am is that what you meant by this?
Trainee: Am yes.
Trainer: Well now you see that is that would not be clear to me when I look
at this as a lesson plan+
Trainee: Right.
Trainer: +and see it as an aim I thought that you were going to look at
maybe differences in eating in various cultures.
Table 5.
Item RPI POTTI
We see that the trainer has chosen to change the pronoun from this to that
between her first and second turn. That can be used as a means of referring in
a non-central, marginalised manner (McCarthy 1994); in other words, by using
that the speaker seeks to put the criticism at a safe distance. In the speech genre
of argumentation (see Extract 2 below), the caller uses that would as a subtle
means of raising opposition or scepticism as if from a distance, whereas the direct
implication of the caller’s utterance is: I don’t agree with what you have said.
Extract 2 (RPI)
Presenter: …I mean people are still getting married and they’re getting
married for all best of reasons and they’re mad about one
another and they want to live happily ever after.
Caller: Yes. That would be the that would be the the pretty picture that’s
painted but as time goes on it’s cool these days ah and pardon
me for using that word because it’s a slang word I don’t like. But
as they say it’s cool to say “I’m separated”. It’s attractive.
Similarly in Extract 3, the trainer is trying to bring the trainee around to self-
direction. Her attempt at elicitation fails and consequently her utterance that
would be offers a palatable front for the implication that the trainee’s answer
was wrong. The more direct, face-threatening version would be: No, that’s at the
correction stage.
Extract 3 (POTTI)
Trainer: …if you’re not sure which words they’re not going to know and
which words they are going to know?
Trainee: Ask them well like ask concept questions or something?
Trainer: That would be at the correction stage but before they start to do the
activity so you might not know for example that am oh excuse me
that “cook” and “cooker” are going to be+
It is also worth noting that the high results for I, they and it plus would, in both
registers, are in keeping with the findings of Luukka and Markkanen (1997), who
say that this is representative of spoken interaction where levels of involvement
with oneself, the audience and what is being talked about are high.
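As an illustration of how Pronoun + would counts of this kind might be obtained, the sketch below tallies the personal pronouns that immediately precede would in a transcript. It is a simplified approximation under stated assumptions: the function name and sample text are hypothetical, contracted forms such as I’d are not counted, and no claim is made that this matches the procedure used for Table 4.

```python
import re
from collections import Counter

PRONOUNS = ("i", "you", "he", "she", "it", "we", "they")


def pronoun_would_counts(text):
    """Count personal pronouns occurring directly before 'would'."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    # Walk through adjacent token pairs and keep pronoun + would pairs.
    for left, right in zip(tokens, tokens[1:]):
        if right == "would" and left in PRONOUNS:
            counts[left] += 1
    return counts


# Hypothetical sample, for illustration only.
sample = ("Well yeah I would. I think most people would define it so "
          "they would see the whole island.")
print(pronoun_would_counts(sample))
```

The same loop could be pointed at a different set of left-hand items, for example demonstratives or Wh- words, by changing the tuple of items searched for.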
Table 6.
Item RPI POTTI
From the above table, you, they and we represent the pronoun domains
which are hedged using would and they are reflective of the results for Pronoun +
would in Table 4 (apart from the obvious lack of I in questioning). Table 7
shows the results for Wh- questions.
These results characterise the marked contrast in the communicative goals
of each register. In all cases, the POTTI data shows much higher frequencies.
The use of what would underpins the importance of reasoning and rationalising
in POTTI and how would indicates the essential place of methodological issues
involved. When and which would suggest that temporality, precision and decisive-
ness are vital and clearly illustrate the necessity to make hard and fast choices.
Extract 4 (POTTI)
Here the trainer is trying to bring about trainee reflection.
Trainer: Okay now if you had to change some general aspects of not
necessarily of the lesson because I’m not sure how much of that you
could change+
Trainee: Mm.
Trainer: +of either the lesson or your planning what would it be?
Trainee: Am+
On the other hand the narrative speech genre within the RPI speech genre range
does not involve hedged questioning using what, when, how and which. If such
a question type is used, it functions to seek clarification, rather than validation
or self-direction, and so on, from the story teller, and so it does not need to be
hedged. Also, it is more incremental in RPI to use declarative questions, which
function as formulations to be accepted or rejected, and these are frequently
hedged, for example:
Table 7.
Item RPI POTTI
Extract 5 (RPI)
In an opinion-giving unit, the presenter offers the following formulation:
Presenter: And you would think not all for the better?
Extract 6 (RPI)
Presenter: There would be the smallest little bit of prejudice in that no?
Table 8.
TOTAL 37
135
TOTAL 91
38
TOTAL 91
19
The Irish national broadcasting station, Radio Telefís Éireann, and an Irish
university are the institutional settings from where the RPI and POTTI data
originate respectively. These interactions are set within institutionally defining
parameters. As is the case in any conversation, participants enter into a ‘conver-
sational contract’ (after Fraser 1980: 343) where each party brings an under-
standing of some set of rights and obligations vis à vis the other (for example,
Clark and Carlson 1982 refer to the Principle of Responsibility at a sociocultural
level, and Thomas 1983 talks about pragmatic ground rules). Within institution-
al settings these rights, obligations and norms are fixed to a greater degree than
in everyday conversation, and this is largely due to the institutionalised roles of
the participants. Specific to this study are the roles of presenter and caller, and
trainer and trainee. These exogenous roles are not symmetrical in terms of
rights, obligations, and power. The presenter and the trainer, by virtue of role,
are bestowed more power in the interaction. In the case of the presenter, the
power semantic is, to a degree, less asymmetrical than in the case of the trainer-
trainee dyad, because it is mitigated by the caller being the ‘primary knower’
(term adapted from Berry 1981) of his or her own experience, problem or
opinion (O’Keeffe 1999). The trainer, on the other hand, is both the power role
holder and the ‘primary knower’ in terms of professional expertise.
Would is used strategically within these institutional conditions on a
relational or interpersonal level to redress the asymmetry of the power semantic
within the dyads, and on a transactional level to mitigate or downtone the
perlocutionary force of the utterances in ‘difficult’ or threatening speech genre
units, and to frame the focus of the talk into a safe hypothetical band. These
strategies are dealt with in greater detail in the following discussions.
(see other attribution — Halliday and Hasan 1976, McCarthy 1994). That
functions similarly in Extract 8 below (that’s what the theory tells us). Here we
see a further strategic use of would by the power role holder.
It is interesting to note here that the trainee is also hedging, using appropriate
devices marked in the above extract, such as kind of. We see that the trainee is
being self-directive and the inversion of advice-giver role accounts for the
hedging in this utterance, which could have been asserted as a statement: I
think that was appropriate. We see therefore that the trainee attends to the
negative face of the trainer, since advice normally comes from the trainer, the
power-role holder.
most+
Presenter: Yeah but but it wasn’t
Caller: +but I think most people worldwide would when they would say
Ireland they would see see the whole island right? Now I think
everybody be wo= would be aware that there’s been a conflict on
our island for a hell of a long time. But as as its territory I think
most people would would define that the island as being the
whole island.
Presenter: Well one million people in Northern Ireland wouldn’t.
Caller: Oh yes I know isn’t that the ongoing conflict?
This strategy frequently occurs in POTTI when the trainer seeks to initiate
reflection on the part of the trainee, and self-criticism is safely transposed to the
hypothetical band. Here is a typical example.
Extract 15 (POTTI)
Trainer: Yeah now what other instruction would you need ah?
Trainee: I should have told them that there were four words that wouldn’t
have been used that would not necessarily fit into the…
The main data for this paper are not only defined by respective institutional
settings and register issues, they are also rooted in a sociocultural context, that
is to say, both sets of data are from institutional settings within Irish society. It
is accepted by many researchers that the linguistic manifestations of hedging are
not only complex but that the functions they express cannot be identified in ‘a
social and textual vacuum’ (Holmes 1990: 186). In order to fully understand
hedging, we feel that sociocultural context needs to be considered as one of
the critical factors in explaining why speakers hedge in discourse. In Irish society,
directness is very often avoided and this is attested throughout our data. We
suggest that ‘forwardness’, which ranges from being direct to being
self-promoting, is not valued within Irish society. That Irish society does not place a
high value on powerful or direct speech is borne out by some of the above
results for the use of would as a means of downtoning assertiveness and
directness in asymmetrical interactions. In the extracts below, a notable feature
is indirectness in answering polar questions. Irish people will rarely answer a
polar question with a single word answer (yes or no); it is considered too direct
and impolite (Asián and McCullough 1998: 49). This is consistent with the
sociocultural norm of avoiding over-assertiveness.
Extract 17 (POTTI)
Here the trainer is giving advice to the trainee on how a particular exercise
should have been conducted based on the trainee’s performance in the
classroom.
Trainer: Do you think it would have been possible at all to just leave
them work through them all? <pause: two seconds> like it w= it
was always going to be a better idea to split up the sentences was
it?
Trainee: I would say so.
Trainer: Mm.
Trainee: Given your time I would say so.
Trainer: Umhum.
Trainee: Yeah maybe not into such small sections maybe into just three
different groups “can you work on the first two can you work on
the fourth to the eighth can you work on the eighth to the
twelfth”+
Extract 18 (RPI)
Here the topic is boarding schools. The caller phones the programme to talk
about his memories of being at a boarding school.
Presenter: Did you find it wo= girls very alien beings when you came up in
contact with them?
Caller: Well yeah I would. Ah yeah there was that sorry for the noise
there there was that ah there was that element am I mean when
I came to university first you know you’re used to an all out male
atmosphere so okay girls it’s kind of wo= what’s that creature
over there is that a girl…
Very frequently in our data, we find that when speakers talk about themselves,
they try to mitigate directness by using would as an epistemic downtoner, even
where the propositional content is undisputed.
Extract 19 (RPI)
Caller: I told him he could have piercing I mean no problem any organ
of the body he wanted anything you could undo but tattoos they
frighten me and as regards that lady yeah I would be really in
sympathy with her I would be saying ’’do you know what you’re
doing? Do you know who you are identifying yourself with?’’
Extract 20 (RPI)
Here the caller was convicted of murdering her colleague in Saudi Arabia. After
an extended period, she was released and cleared of the charge.
Presenter: You figure you were stitched up?
Caller: Oh yes very definitely.
Presenter: Why?
Caller: Am again ah I would have many theories on this.
We see that the caller expresses certainty in her agreement with the presenter’s
formulation, but in her next turn when it comes to asserting the reasons why,
we find would is used to downtone the assertion. In Extract 21 below, the topic
is the problem of female facial hair and we find would used systematically by
the caller to downtone facts about the past and present colour of her facial
hair.
Extract 21 (RPI)
Caller: …two years ago I discovered waxing+
Presenter: Yeah.
Caller: +and I thought it was brilliant. The best thing ever. Now I get it
done every about every two weeks and it takes longer to grow
back and it’s am brighter than it was. I would have had black
hair you know my hair would be brownish now but it was+
Presenter: Right.
Caller: +black in the teenage years.
7. Conclusion
broader pragmatic function for speakers of Irish English in our data. That is to
say that by framing a fact in a hedged way a speaker can attend to face needs in
a given context at a sociocultural level. This conclusion gives rise to a three-
tiered model for the analysis of hedging in spoken interaction (see Table 9).
Table 9.
Level 1
– Mode
– Interactive online production
– Shared immediate situation
– Main communicative purpose/content
– Audience
– Dialect domain
– Spoken genre range
Level 2
– Institutional context
– Institutional role of speaker
– Institutional power semantic
Level 3
– Sociocultural norms
Notes
1. The Limerick Corpus of Irish English is broadly based on the framework outlined in
McCarthy (1998: 8–12).
2. Results from CANCODE and CIC are based on an oral presentation by Prof. Michael
McCarthy, University of Limerick 1999, and reproduced here with his kind permission.
References
Hinkel, E. 1995. “The Use of Modal Verbs as a Reflection of Cultural Values”. TESOL
Quarterly 29(2): 325–343.
Holmes, J. 1984. “Hedging Your Bets and Sitting on the Fence: Some Evidence for Hedges as
Support Structures”. Te Reo 27: 47–62.
Holmes, J. 1986. “Functions of you know in Women’s and Men’s Speech”. Language in
Society 15(1): 1–21.
Holmes, J. 1988. “Doubt and Certainty in ESL Textbooks”. Applied Linguistics 9(1): 21–44.
Holmes, J. 1990. “Hedges and Boosters in Women’s and Men’s Speech”. Language and
Communication 10(3): 185–205.
Holmes, J. 1993. “New Zealand women are good to talk to: An Analysis of politeness strategies
in interaction”. Journal of Pragmatics 20: 91–116.
Hosman, L.A. 1989. “The Evaluative Consequences of Hedges, Hesitations, and Intensifiers.
Powerful and Powerless Speech Styles”. Human Communication Research 15(3): 383–406.
Hübler, A. 1983. “Understatement and Hedges in English”. Pragmatics and Beyond IV(6).
Amsterdam: John Benjamins.
Hyland, K. 1994. “Hedging in Academic Writing and EAP Textbooks”. English for Specific
Purposes 13(3): 239–256.
Hyland, K. 1996. “Writing without Conviction? Hedging in Science Research Articles”.
Applied Linguistics 17(4): 433–454.
Lakoff, G. 1972. “Hedges: A Study in Meaning Criteria and the Logic of Fuzzy Concepts”.
Papers from the Eighth Regional Meeting, Chicago Linguistic Society, 183–228.
Lakoff, G. 1973. “Hedges: A Study in Meaning Criteria and the Logic of Fuzzy Concepts”.
Journal of Philosophical Logic 2(4): 458–508.
Lakoff, R. 1975. Language and Woman’s Place. New York: Harper and Row.
Loewenberg, I. 1982. “Labels and Hedges: The Metalinguistic Turn”. Language and Style
XV(3): 193–207.
Luukka, M. R. and Markkanen, R. 1997. “Impersonalisation as a Form of Hedging”. In
Hedging and Discourse: Approaches to the Analysis of a Pragmatic Phenomenon in Academic
Texts, R. Markkanen and H. Schröder (eds), 168–187. Berlin: Walter de Gruyter.
Lysvag, P. 1975. “Verbs of Hedging”. In Syntax and Semantics (vol. 4), J. Kimball (ed),
125–154. New York: Academic Press.
Markkanen, R. and Schröder, H. 1997. “Hedging: A Challenge for Pragmatics and Dis-
course Analysis”. In Hedging and Discourse: Approaches to the Analysis of a Pragmatic
Phenomenon in Academic Texts, R. Markkanen and H. Schröder (eds), 3–18. Berlin:
Walter de Gruyter.
McCarthy, M. J. 1994. “It, This and That”. In Advances in Written Text Analysis, M.
Coulthard (ed), 197–208. London: Routledge.
McCarthy, M. J. 1998. Spoken Language and Applied Linguistics. Cambridge: Cambridge
University Press.
Myers, G. 1989. “The Pragmatics of Politeness in Scientific Articles”. Applied Linguistics
10(1): 1–35.
Myers, G. 1992. “Textbooks and the Sociology of Scientific Knowledge”. English for Specific
Purposes 11: 3–17.
</TARGET "far">
Chapter 3
Michael McCarthy
University of Nottingham
1. Introduction
Sinclair and Coulthard’s labelling system also allows for a single speaking turn
to include a new initiating move by the respondent, as in the following extract.
The new initiation, which transfers the status of <$2> from listener to speaker,
is, in its turn, responded to by the initial speaker.
<$2> And his mum paid for them as well so. Initiation 1
<$1> Good. Great. Good good good. //
Did he buy them both? Response 1 // Initiation 2
<$2> Yep. Response 2
The kinds of response items this chapter deals with also occur in the follow-up
(third) slot of the three-part exchange:
Good listenership made plain 51
In the present chapter I shall treat both the responding move and the follow-up
as types of responses, and refer to response moves to cover both cases.
Sinclair and Coulthard’s tripartite classification is highly relevant to the
study of spoken corpora, where the sequential positioning of words within the
initiation → response → follow-up framework says a great deal about their
typical environments of occurrence and their associated conversational func-
tions. Thus the interpretation of a word may be affected not only by its syntactic
function or by its basic lexical meaning, but by where it most typically occurs in
the conversational exchange structure. In this chapter, I wish to examine a set
of words which display a proclivity to occur as single-word responding or
follow-up moves or as the first word in extended responding or follow-up
moves, or to be a lexical element in those moves alongside functional particles
such as yes, no, oh and okay. The words under scrutiny, I shall argue, play a key
role in how effective listeners act verbally.
Duncan and Niederehe (1974) reassert the basic notion that the back-
channel encodes an understanding between speaker and listener that the turn
has not been given up, but they point to uncertainties over the boundary
between brief utterances and proper turns. Drummond and Hopper (1993a)
and Zimmerman (1993) take up just this issue in debating the range of roles of
acknowledgement tokens vis-à-vis retention of listenership or claim to speaker-
ship. And indeed it is probably the inherently scalar nature of the options that
listeners exercise, ranging from non-vocal acknowledgement (e.g. through body
language), through minimal responses (including non-lexical vocalizations),
tokens such as yes and okay, single lexical tokens, brief clauses and more
extended responses, that has resulted in the more stable and clearly circum-
scribed territory of non-lexical vocalisations being the nexus of more research
than in the other areas.
Duncan (1974) expands the typology of backchannel responses from non-
lexical vocalisations and yeah, and includes items such as right and I see, sentence
completions, clarification requests, brief restatements and head nodding and
shaking. Duncan’s range of items is indicative both of the potential range of
behaviour that may be considered relevant to the study of listenership and, once
again, of the difficulty in establishing the boundary between backchannelling,
turn-taking and floor-grabbing (e.g. whether a brief clarification request is a case
of the listener assuming the floor, albeit only momentarily).
A key paper in the interpretation of listener responses is Schegloff (1982).
Schegloff asserts that the turn-taking system is fundamentally designed to
‘minimize turn size’ (p. 73), that is to say there is an economy immanent in
communication: speakers say no more than is barely essential (although this
condition may be overridden by any speaker). Indeed, brief responsive turns of
various kinds which occur in everyday talk would seem to confirm the concept
of communicative economy. But central to my argument in this chapter is that
the ‘additional’ content which regularly occurs in response moves indicates that
listeners direct their attention as much towards the interactional/relational
aspects of the talk as to the transactional content and the need to keep the back
channel open, to mark boundaries and to acknowledge the incoming talk.
‘Economy’, therefore, takes both transactional and relational needs into
account. Schegloff recognises the role of vocalisations such as mm hmm, yeah
and uh huh and the importance of research paying attention to listeners in
general. To neglect the listener, and to focus only on the main speaker,
Schegloff states, leads to the unfortunate tendency to regard discourse as ‘a
single speaker’s, and a single mind’s, product’ (p. 74). And it is true that
British English speakers, and why this and the variation in items may be so,
Tottie is more hesitant, rightly concluding that her two relatively small samples
simply cannot provide reliable answers.
In the studies reported above, much important insight is offered, but
problems also emerge. One difficulty is the problem of terminology, which can
vary considerably (see Fellegy, 1995, for a brief discussion). In the present
chapter I will use the term non-minimal response to refer to the single-word
response moves exemplified, in order to reinforce the argument that speakers
regularly use tokens that more than satisfy the minimal requirements of keeping
the backchannel open, acknowledging and showing understanding of the
incoming talk, and marking discourse boundaries. In most cases, yes/yeah, no,
okay, or a conventionalised vocalisation would be sufficient to maintain the
economy and efficiency of the talk, to show agreement and/or acquiescence,
and to function as an appropriate response move. Nonetheless, listeners
regularly choose to do more, to orientate affectively towards their interlocutors,
and to create and consolidate interactional/relational bonds.
3. Corpus data
For the purposes of this chapter, the top-2000 word frequency lists for both the
British and American corpora were examined and the most likely candidates (based
on the previous studies reviewed above, and on observation and intuition) for
occurrence as single-word responses were listed. As a limiting criterion, at least
100 occurrences in each corpus was set as the level below which items would be
excluded from consideration, and after the initial count, a maximum of 1000
entries from each corpus was analysed for each item (via random sampling
options in the analytical software). This initial investigation produced the
following items (Tables 1 and 2), in descending order of frequency, for the
British and American data, respectively.
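The selection procedure just described can be sketched in a few lines of Python. This is an illustrative sketch only, not the routine of the analytical software actually used: the function name select_and_sample, the dictionary of concordance lines and the fixed random seed are all assumptions introduced here, and the item 'smashing' with its count is invented to show the exclusion rule.

```python
import random

MIN_OCCURRENCES = 100   # items below this level in a corpus are excluded
MAX_SAMPLE = 1000       # at most this many entries are analysed per item


def select_and_sample(concordances, seed=0):
    """Return, for each qualifying item, the entries to be analysed.

    `concordances` maps an item to the list of all its concordance
    lines in one corpus (a hypothetical data structure).
    """
    rng = random.Random(seed)  # fixed seed only so that the sketch is repeatable
    selected = {}
    for item, lines in concordances.items():
        if len(lines) < MIN_OCCURRENCES:
            continue  # too rare to be considered
        if len(lines) > MAX_SAMPLE:
            # random sample without replacement
            lines = rng.sample(lines, MAX_SAMPLE)
        selected[item] = list(lines)
    return selected


# Toy input: the counts for 'right' and 'perfect' follow the British figures;
# the entry for 'smashing' is invented.
toy = {
    "right": [f"right-{i}" for i in range(20823)],
    "perfect": [f"perfect-{i}" for i in range(122)],
    "smashing": [f"smashing-{i}" for i in range(40)],
}
sampled = select_and_sample(toy)
print({item: len(lines) for item, lines in sampled.items()})
```

Under these assumptions, an item with more than 1000 occurrences is reduced to a random 1000, an item with between 100 and 1000 occurrences is analysed in full, and anything rarer is dropped.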
These are simply gross frequencies, and tell us little about actual occurrenc-
es as responses. However, it is notable that there are differences in overall
Table 1. Total frequency of items occurring more than 100 times in the British corpus
Item Occurrences
Right 20823
Really 11731
Good 6351
Quite 4816
Sure 1690
Great 1292
Lovely 1039
Fine 956
Exactly 831
Absolutely 726
True 592
Definitely 549
Certainly 532
Brilliant 363
Wonderful 387
Excellent 187
Cool 192
Gosh 188
Wow 175
Perfect 122
Marvellous 104
frequency which exclude some items from the American list which occur in the
British list (viz. brilliant and marvellous), both of which fall below the 100 lower
limit in the American data. The 100 lower limit is not an otiose choice; in both
corpora, an occurrence of 100 or more places a word within the core vocabulary
which occurs within the ‘hump’ of high frequency words before the marked
fall-off in frequency occurring round about the 1700–1800th word in the rank
order frequency lists for the corpora (see McCarthy 1999 for further details). All
of these items occur as single-word responses. As a next step to gross frequency,
it is necessary to measure what percentage of the total occurrences for each item
realise the response-token function. If a very high percentage of the occurrences
of any word occur in that function, then this feature will be an important
component of the lexical profile of that word, and any description of that word
not taking its response-prone behaviour into account will be inadequate.
Let us consider, then, how many occurrences of each word are in fact
single-word responses, and calculate what percentage this represents. As stated
above, the maximum number of occurrences analysed for any individual token
is 1000 from each corpus, generated by random sampling of the total number
58 Michael McCarthy
Table 2. Total frequency of items occurring more than 100 times in the American corpus
Item Occurrences
Really 17738
Right 15445
Good 10091
Great 2437
True 2392
Quite 1872
Exactly 1459
Wow 1265
Wonderful 844
Certainly 773
Gosh 746
Fine 742
Cool 574
Absolutely 642
Sure 638
Definitely 563
Excellent 231
Perfect 164
Lovely 106
of occurrences. Tables 3 and 4 show the occurrences and the respective percent-
ages, in descending order of percentage frequency, for both corpora.
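The percentage calculation can be illustrated with a short Python sketch. It rests on one assumption that should be stated plainly: the percentage is taken here over the number of entries actually analysed, that is, the total number of occurrences or 1000, whichever is smaller. The function name response_percentage is introduced only for illustration; the counts are the British totals and the British single-word response counts for five of the items.

```python
MAX_SAMPLE = 1000  # maximum number of entries analysed per item


def response_percentage(total, responses, max_sample=MAX_SAMPLE):
    """Percentage of the analysed entries that are single-word responses.

    Assumption: the denominator is min(total, max_sample), i.e. the
    size of the sample that was actually examined.
    """
    analysed = min(total, max_sample)
    return 100 * responses / analysed


# item: (total occurrences, occurrences as single-word response), British data
british = {
    "right": (20823, 771),
    "exactly": (831, 275),
    "wow": (175, 121),
    "cool": (192, 85),
    "quite": (4816, 8),
}

for item, (total, responses) in british.items():
    print(f"{item:8s} {response_percentage(total, responses):5.1f}%")
```

On this reading, right and quite are both measured against a sample of 1000, whereas wow and cool are measured against all of their occurrences, since neither reaches the sampling ceiling.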
There are some interesting differences between the two corpora, some of
which can be explained by the make-up of the data. Figure 1 brings together
items where there is a marked difference in distribution between the American
and British data.
Right, fine and good all occur more frequently in the British data, and this
could be explained by the fact that the British corpus contains more service
encounters than does the American corpus, and these three tokens serve
characteristically as transactional boundary markers in such encounters, or it
could relate to a broader difference in preferences. Such a claim could only be
borne out, however, by further research into corpora more carefully balanced
in terms of genre. Lovely is a different matter, and on the basis of firm anecdotal
support and public reaction whenever British corpus extracts are presented to
American audiences, is perceived as a very British, and non-American, response
token. Absolutely has no obvious explanation for its uneven distribution and
may be a case of genuine dialect variation. The figures certainly confirm my
Item Response occurrences %
Right 771 77
Exactly 275 33
Fine 267 28
True 266 70
Great 233 23
Definitely 203 37
Good 198 20
Lovely 189 17
Absolutely 161 22
Gosh 144 76
Wow 121 69
Really 118 12
Sure 91 9
Cool 85 44
Brilliant 85 23
Excellent 78 42
Wonderful 57 15
Certainly 49 9
Marvellous 16 15
Perfect 14 11
Quite 8 0.8
own subjective impression that American speakers use absolutely more fre-
quently than British speakers in the response slot. Similarly, the absence of
brilliant and marvellous from the American data is almost certainly indicative of
a feature of dialect variation. Wow and sure would be felt by many British
speakers to be very typically American, and their greater frequency in the
American data would seem to confirm this. Cool might equally be thought of as
classically American, and indeed, most older generation British speakers would
consider it a relatively recent American import into British English via pop
culture. And yet its distribution is unexpectedly higher in the British data. This
is probably explained by the demographic spread of the two corpora; the
CANCODE British corpus contains a good deal of data collected by and
involving university students aged 18–25, while the American data has a much
broader age spread. Other items in Tables 3 and 4 are more closely matched:
true, great, definitely, gosh, really, excellent, wonderful, certainly, perfect and quite
all display consistency across the two varieties of English in terms of frequency
as response tokens.
Item Response occurrences %
Wow 978 98
True 614 61
Gosh 602 81
Exactly 597 60
Absolutely 433 67
Right 379 38
Sure 258 40
Great 260 26
Definitely 162 28
Cool 144 25
Wonderful 138 16
Excellent 122 53
Good 115 11
Really 96 10
Fine 81 11
Certainly 52 7
Perfect 19 11
Lovely 7 7
Quite 0 0
Tables 3 and 4 show more than just the frequency with which the listed
items operate as response tokens; the percentage figures show to what extent
each item is what might be called ‘response-prone’. For example, gosh and wow
are overwhelmingly used as reactive responses, with only a very small number
of their occurrences accounted for by other contexts (mostly in speech reports).
Quite is the opposite: it is a frequent word in both corpora (in its role as a
modifying adverb), but its occurrence as a response token is minimal in the
British corpus and non-occurring in the American. Intuition and subjective
impressions suggest that quite as a single word response token is at the very least
rather formal in contemporary British speech, and may be on the verge of being
perceived as an archaism.
In the British list, 13 words have over 20% of their occurrences as single-
word response tokens, and for eight of the words over 30% of all occurrences
are single-word responses. In the American list, 11 words have over 20% of
their occurrences as single-word response tokens, and, as in the British list, for
eight of the words over 30% of all occurrences are single-word responses. The
two lists for occurrences in the response slot in excess of 30% of all occurrences
are as in Table 5.
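The 30% cut-off can be applied mechanically to the percentage columns of Tables 3 and 4. The Python sketch below does so; the function name response_prone is introduced here purely for illustration, and the percentages are copied as printed in the two tables.

```python
# Percentage of analysed occurrences that are single-word responses,
# as printed in Table 3 (British) and Table 4 (American).
british_pct = {
    "right": 77, "exactly": 33, "fine": 28, "true": 70, "great": 23,
    "definitely": 37, "good": 20, "lovely": 17, "absolutely": 22,
    "gosh": 76, "wow": 69, "really": 12, "sure": 9, "cool": 44,
    "brilliant": 23, "excellent": 42, "wonderful": 15, "certainly": 9,
    "marvellous": 15, "perfect": 11, "quite": 0.8,
}
american_pct = {
    "wow": 98, "true": 61, "gosh": 81, "exactly": 60, "absolutely": 67,
    "right": 38, "sure": 40, "great": 26, "definitely": 28, "cool": 25,
    "wonderful": 16, "excellent": 53, "good": 11, "really": 10,
    "fine": 11, "certainly": 7, "perfect": 11, "lovely": 7, "quite": 0,
}


def response_prone(percentages, threshold):
    """Items whose response percentage exceeds `threshold`, highest first."""
    hits = [(item, pct) for item, pct in percentages.items() if pct > threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


for label, data in (("British", british_pct), ("American", american_pct)):
    print(label, response_prone(data, 30))
```

Each variety yields eight items above the 30% level, in the same order as the two columns of Table 5.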
Expressing percentage figures as in Table 5 also enables at least a glimpse to be
obtained of possible degrees of pragmatic specialisation which the words may
have undergone or are undergoing. There are interesting differences displayed
in the table, which may relate to different degrees of pragmatic specialisation in
each variety; for instance, absolutely and sure would seem to be more exclusively
pragmatically specialised as responses in American English than in British, with
right and definitely representing the opposite case, though again caution must
be expressed as regards possible imbalances in generic contexts.
In this section I consider the environments in which the response tokens occur
and illustrate the kinds of functions they typically fulfil. Each extract is labelled
according to its variety, British (Br.) or American (Am.), and items for com-
ment are in bold.
Figure 1. Items showing a marked difference in distribution between the two corpora (bar chart, Br. and Am.; vertical axis 0–100; items: wow, absolutely, exactly, sure, right, fine, good, lovely)
Table 5. Occurrences in the response slot in excess of 30% of all occurrences: British and American data
British % American %
right 77 wow 98
gosh 76 gosh 81
true 70 absolutely 67
wow 69 true 61
cool 44 exactly 60
excellent 42 excellent 53
definitely 37 sure 40
exactly 33 right 38
Extract 3 illustrates sociable agreement asserted with right and reinforced with
definitely. Duplicated and clustered tokens will be returned to in Sections 5.5
and 5.6.
Extract 3 (Am.)
⟨$2⟩ Everybody has some idiosyncrasy.
⟨$1⟩ Right.
⟨$3⟩ ⟨$E⟩ laughs ⟨\$E⟩
⟨$1⟩ Right. Yeah definitely. Definitely.
⟨$3⟩ ⟨$E⟩ laughs ⟨\$E⟩ Oh I can vouch for that. I can vouch for that.
The tokens may often be used for ironic effect. In Extract 4, the response token
functions as acknowledgement but also (it may be inferred) with an intended
irony. The irony is inferable from the lack of a response indicating surprise or
horror (e.g. wow, or gosh or really!), which might have been predicted had the
main speaker’s initiating move been taken seriously, and the use instead of a
response token normally associated with assent/acquiescence:
Extract 4 (Am.)
[Casual conversation]
⟨$1⟩ I made that myself. It’s for signing your name to get sick soon cards and
     and hate mail and poison pen letters.
⟨$2⟩ Sure.
Wow and gosh, in both varieties, represent strong affective reactions of surprise,
incredulity, delight, shock, horror, etc., by the listener:
Extract 5 (Am.)
⟨$2⟩ This would not be the tournament. But she says we can get it. Would be
     six hundred and four dollars. And so it’s almost a hundred dollars
     more.
⟨$1⟩ Wow.
⟨$2⟩ To, not do the tournament but if just the two of us go.
Extract 6 (Br.)
⟨$1⟩ Well we left about seven in the morning. Went home at seven at night
     then seven at morning. Then had to go home and then start and milk
     the cows by hand.
⟨$2⟩ ⟨$E⟩ laughs ⟨\$E⟩ Gosh.
⟨$2⟩ Yeah.
⟨$3⟩ Yeah. Fine. Okay Laura. That sounds fine.
Intensifying the tokens in this way is reminiscent of what Antaki (2000, 2002)
calls ‘high-grade assessments’, which in his data (phone calls and interview
data) relate to either institutional power or speakers claiming ‘ownership’ of the
stages of the interaction, in other words an assertion of power or (in our case
here) at the very least of confident equality of participation. Extracts 11 and 12
hardly manifest powerless or diffident speakers.
Examples include:
Extract 14 (Br.)
[Colleagues talking]
⟨$2⟩ We’re not environmentally sound in the Union.
⟨$1⟩ Oh definitely not.
⟨$2⟩ Even though we’ve got an environmental officer.
⟨$1⟩ No I know.
Extract 15 (Am.)
⟨$1⟩ It was like twelve hundred calories or something.
⟨$2⟩ Yeah and um
⟨$1⟩ Don’t go to a hospital if you want to get well.
⟨$3⟩ Yeah.
⟨$2⟩ Absolutely not.
Extract 18 (Br.)
⟨$1⟩ And how is Elaine?
⟨$2⟩ Erm now she was very well about a month ago.
⟨$1⟩ Right. Right.
⟨$2⟩ And then she had a really bad couple of days.
⟨$1⟩ Yeah.
⟨$2⟩ And now she’s sort of picking up again slowly.
⟨$1⟩ Right. Right.
Occasionally, triplets occur, which clearly serve greatly to intensify the response:
Extract 19 (Br.)
⟨$1⟩ So I’ll I’ll just phone you when you know and ask er where you want it
     sent or whatever and erm we’ll sort it out from there.
⟨$2⟩ Right. Fine. Fine. That’s great.
Many of the items that occur as single words also occur frequently in short
clauses with that’s.
Extract 23 (Am.)
[Giving street directions]
⟨$4⟩ You’ll come to a stop sign take a right and just follow it all the way out.
⟨$2⟩ Oh. Perfect.
⟨$1⟩ Great.
The minimal clause is particularly evident with true, which seems to show a
preference for the clausal environment over the single-word option:
Extract 24 (Am.)
⟨$1⟩ I know there’s a lot of salt in bread.
⟨$1⟩ That’s true.
⟨$1⟩ This doesn’t have much salt though.
Other words with a high occurrence of that’s clauses are good, great and fine,
for example:
Extract 25 (Br.)
[Assistant and customer in car spare parts department]
⟨$2⟩ Yeah. Bob’s up in erm Manchester tomorrow so he can’t come
     tomorrow.
⟨$1⟩ That’s fine.
⟨$2⟩ Er but I’ll get one of the lads in to come and do it for you.
⟨$1⟩ Lovely.
6. Conclusion
The analysis of non-minimal response tokens shows a set of items in British and
American English which occur within the core, first 2000 frequency lists for each
variety. Although there are differences between the two varieties (see Figure 1
above), they have more in common than that which separates them. Their
commonality is seen in the examination of local contexts and functions, and both
varieties show a strong orientation by listeners towards relational aspects of the
discourse. In both varieties, ‘good listenership’ seems to demand more than just
acknowledgement and transactional efficiency, and listeners orientate towards the
creation and maintenance of sociability and affective well-being in their responses.
Cross-corpora comparisons of this inter-varietal type and comparisons
across languages are useful in two major respects. In the first place, they may
confirm (or refute) that functional interpretations made from one variety or
language are not idiosyncratic to one culture or speech community.
Spoken corpora as a locus for research into human communication always run
the risk that features of talk may be culture-bound, and it is only in intervarietal
and interlingual studies that one can find safer ground for generalisations.
Transcription conventions
⟨$1⟩, ⟨$2⟩, etc.   used to label each speaker consecutively in each conversation.
⟨$E⟩ and ⟨\$E⟩     beginning and end of non-verbal language event (e.g. laughter)
+                  ‘latched’ turn, where a speaker continues a turn as if uninterrupted after an interruption or overlap.
References
Chapter 4
Variation in the distribution of modal verbs in the British National Corpus
Graeme Kennedy
Victoria University of Wellington
In his major study of English modal verbs, F. R. Palmer (1979: 1) made the
claim that “there is, perhaps, no area of English grammar that is both more
important and more difficult than the system of the modals”. It is not entirely
clear what led Palmer to accord such status and difficulty to modal verbs, nor
the extent to which he had in mind either linguistic description or language
learning. However, like most other studies of modals since the 1970s, it was
the semantics of modal use which was the central focus of his rich and detailed
analysis. Palmer’s study made use of the Survey of English Usage Corpus, the
last major pre-electronic corpus. Modern electronic corpora now make it
possible to explore the nature and use of linguistic phenomena in a much
wider variety of texts. Such descriptions go beyond exploring what is grammat-
ically and semantically possible, and add a distributional dimension which
characterizes linguistic features in terms of probability of occurrence. Corpus-
based distributional analysis also makes it possible to extend our understand-
ing of linguistic variation across different genres in different domains of use.
This will be illustrated in the present paper by exploring the distribution of
modal verbs and the complex verb phrase structures in which they occur in the
British National Corpus.
Verbs constitute about 20 per cent of all the word tokens used in English,
and in written texts, modal verbs typically constitute about 8 per cent of all verb
forms. Modals form a small, semi-closed set of nine auxiliary verbs, most of
which express both core ‘deontic’ meanings such as ‘obligation’, ‘intention’ or
‘permission’ (e.g. ‘You must be home by 10 pm’) and ‘epistemic’ meanings
associated with truth conditions and assessment of degrees of certainty (e.g.
‘You must be our new neighbours’). The ‘central’ English modals are usually
considered to be will, would, can, could, may, might, shall, should, must. In
74 Graeme Kennedy
from the last two decades of the 20th century makes it possible to explore
variation in a wide range of domains and genre types and to extend our
understanding of how modals are used. The BNC consists of 10 million words
of spoken English sampled from many domains of use by participants from four
socio-economic groupings in some 38 geographical locations in Great Britain.
There are also 90 million words of written English prose in the BNC, selected
from nine genres. Eighty percent of the written texts are ‘informative’ and
twenty percent ‘imaginative’. The written texts from informative genres were
selected from Natural and Pure Science, Applied Science, Social Science, World
Affairs, Commerce and Finance, Arts, Leisure, Belief and Thought. The project
has been sponsored by major British universities and publishers and the British
government.
This paper is based on an analysis of all 1.45 million words which are
grammatically tagged as modals in the BNC, with the exception of a very small
number of forms such as shalt, wilt and mayst. In such a large corpus there is
inevitably a margin of error in word counts because of incorrect tagging. For
example, word breaks with forms such as toucan, or can-can sometimes lead to
tagging errors. Occasional tagging of the month of May as a modal appears to
be one of the most persistent errors. Analysis of a carefully checked two-
million-word sample from the corpus shows that because of tagging errors, the
number of tokens of the modals may and will may be slightly exaggerated,
whereas the numbers of can, could, must and should may be slightly under-
estimated. However, mistagging is not considered to have had an undue
influence on the present analysis and this is supported in that the overall rank
ordering of the central modals in the BNC as shown in this study is identical to
that in the 40-million-word LSWE Corpus (Biber et al. 1999: 486).1
Columns 4 and 5 show that modals occur with much greater frequency in
spoken texts than in written. In the spoken texts in the corpus there are 215,485
modals in 10 million words. Whereas column 3 shows that overall in the BNC
modals occur at the rate of 14.6 per thousand words, they occur in spoken texts
at a rate of 21.5 modals per thousand words. Will, would and can are all more
frequent proportionately in spoken than in written texts. In the written texts of
the BNC there are 1,242,236 modals in 90 million words, occurring at the rate
of 13.8 modals per thousand words. In Coates’ 1983 study of modals in LOB
and the LLC, with texts from the 1960s, she estimated that spoken British
English had 17.7 modals per thousand words, and written British English had
14.6 per thousand words. The difference between spoken and written texts in
the frequency with which modals are used is thus shown in the BNC to be even
greater than Coates estimated.
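The rates quoted in this paragraph follow directly from the token counts and corpus sizes given above. A minimal Python sketch of the arithmetic, in which the function name per_thousand is introduced only for illustration and the corpus sizes are the rounded figures of 10 and 90 million words:

```python
def per_thousand(tokens, corpus_words):
    """Normalised frequency: occurrences per thousand running words."""
    return 1000 * tokens / corpus_words


SPOKEN_MODALS, SPOKEN_WORDS = 215_485, 10_000_000
WRITTEN_MODALS, WRITTEN_WORDS = 1_242_236, 90_000_000

spoken = per_thousand(SPOKEN_MODALS, SPOKEN_WORDS)
written = per_thousand(WRITTEN_MODALS, WRITTEN_WORDS)
overall = per_thousand(SPOKEN_MODALS + WRITTEN_MODALS,
                       SPOKEN_WORDS + WRITTEN_WORDS)
print(f"spoken {spoken:.1f}, written {written:.1f}, overall {overall:.1f}")

# Spoken/written gap in the BNC compared with Coates' estimates
bnc_gap = spoken - written
coates_gap = 17.7 - 14.6
print(f"BNC gap {bnc_gap:.1f}, Coates gap {coates_gap:.1f}")
```

Normalising to a common base of a thousand words is what allows the 10-million-word spoken component to be compared with the much larger written component.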
Column 3 in Table 2 shows that overall in the BNC, 10.7% of all the modals
occur in elided or contracted forms (not counting the contraction of not).
However, there is clearly a much higher proportion in spoken English. In the
spoken texts in the corpus (column 4), 29.2% of the modals occur in contracted
forms. In the written texts (column 5), the proportion of modals occurring in
contracted forms is 7.4%.
Column 7 shows that 172,882 of the modal tokens in the BNC (11.9%)
occur in negative contexts. Overall, columns 6 and 7 show that there are about
eight times as many modals in affirmative contexts as compared with negative
contexts, with 12.9 affirmative modals per thousand words, and 1.7 negative
modals per thousand words. But here too, there is obviously substantial
variation. If columns 6 and 7 are compared, the four most frequent modals
account for 71% of the tokens in column 6 for affirmative contexts, whereas
these same four account for 83.5% of the negative tokens. Further, in negative
contexts (column 7), almost half of the modal tokens come from just two
modals, can(not)/can’t and could (not).
Columns 8 and 9 show the distribution of modals in negative contexts in
the spoken and written texts. The 34,445 tokens in column 8 are almost 16% of
the total spoken modals and represent 3.4 negative modals per thousand words.
This should be compared with the 138,436 negative modals in the written texts
(column 9). Only 11.1% of the modals in written texts are in negative contexts,
occurring at 1.5 negative modals per thousand words. Thus negative modals are
over twice as frequent in spoken British English as in written in the 1990s —
especially can’t, won’t, wouldn’t. In the spoken texts of column 8, can’t/cannot
is more frequent than won’t/will not and wouldn’t combined.
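The proportions and rates for negative contexts can be checked in the same way. The Python sketch below uses only the counts quoted in the text; the helper names share and per_thousand are introduced for illustration, and the corpus sizes are again the rounded 10 and 90 million words.

```python
def share(part, whole):
    """Part expressed as a percentage of the whole."""
    return 100 * part / whole


def per_thousand(tokens, corpus_words):
    """Occurrences per thousand running words."""
    return 1000 * tokens / corpus_words


SPOKEN_MODALS, WRITTEN_MODALS = 215_485, 1_242_236
SPOKEN_NEG, WRITTEN_NEG = 34_445, 138_436
SPOKEN_WORDS, WRITTEN_WORDS = 10_000_000, 90_000_000

spoken_share = share(SPOKEN_NEG, SPOKEN_MODALS)
written_share = share(WRITTEN_NEG, WRITTEN_MODALS)
spoken_rate = per_thousand(SPOKEN_NEG, SPOKEN_WORDS)
written_rate = per_thousand(WRITTEN_NEG, WRITTEN_WORDS)
ratio = spoken_rate / written_rate

print(f"negative share: spoken {spoken_share:.1f}%, written {written_share:.1f}%")
print(f"negative rate:  spoken {spoken_rate:.1f}, written {written_rate:.1f} per thousand")
print(f"spoken/written ratio of rates: {ratio:.1f}")
```

The ratio of the two normalised rates comes out a little above two, which is the basis for describing negative modals as over twice as frequent in speech.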
Variation in the distribution of modal verbs 79
The rank ordering of individual modals in the BNC clearly varies consider-
ably according to whether the medium is spoken or written and whether or not
negation is involved. In spoken texts, the rank order of the modals is somewhat
different from the order in the written corpus. In spoken texts, can rises in the
ranking in negative contexts. The importance of examining the occurrence of
modals in negative contexts is well-illustrated with can. Columns 7 and 8 show
that where there is negation users of British English clearly prefer to say can’t, or
couldn’t, rather than won’t, mustn’t or shouldn’t, whereas in affirmative contexts
there is proportionately lower use of can and could. The high incidence of can in
negative contexts suggests that perhaps in speech, which is typically face-to-face,
external constraints prevail, or are preferred, rather than expressions of volition
or obligation (I can’t help you rather than I won’t help you). Unpleasant things
like refusals and prohibitions are perhaps best left to a faceless written medium
or are more appropriately expressed through the use of the passive voice (e.g.
Something must be done rather than You must do it), where external constraints
beyond the control of the speaker can also be implied.
1 All BNC 22.9 19.9 18.3 11.6 7.8 4.2 1.4 7.6 4.9 0.4 0.2 0.1 0.8 100
2 All BNC spoken 26.5 21.5 23.1 9.4 2.3 3.9 1.3 5.7 2.8 0.6 0.1 0.1 2.8 100
3 All BNC written 22.2 19.6 17.4 11.9 8.7 4.3 1.4 8.0 5.3 0.4 0.3 0.1 0.4 100
4 Imaginative prose 19.1 26.8 12.7 19.0 2.5 5.3 1.8 5.4 5.9 0.6 0.2 0.1 0.6 100
5 Natural and pure science 17.6 11.8 27.3 7.5 17.4 4.0 1.2 7.3 5.4 0.2 0.3 0.0 0.1 100
6 Applied science 27.5 12.2 22.6 8.0 12.2 3.2 0.4 8.3 5.0 0.2 0.2 0.0 0.2 100
7 Social science 19.8 14.5 19.5 7.7 14.8 4.4 1.8 10.5 5.8 0.4 0.4 0.0 0.5 100
8 World affairs 18.9 27.0 11.7 13.7 7.3 4.6 1.8 8.8 5.1 0.4 0.2 0.1 0.3 100
9 Commerce and finance 26.3 14.3 18.0 6.6 13.6 3.7 1.7 10.2 5.0 0.3 0.3 0.0 0.2 100
10 Arts 22.1 18.6 21.0 11.2 7.8 4.7 1.1 7.2 4.9 0.3 0.2 0.1 0.9 100
11 Belief and thought 16.6 16.9 22.6 9.7 11.6 5.2 2.0 7.2 6.7 0.6 0.4 0.1 0.4 100
12 Leisure 28.1 15.0 23.0 9.7 7.2 2.9 0.5 8.4 4.3 0.2 0.2 0.0 0.6 100
At a formal level of analysis, the modal verbs might seem to be used in quite a
simple canonical paradigm followed by the bare infinitive (including be and
have) of a lexical verb (e.g. Fred and Sue will/would/can etc. go to the movies).
Greenbaum (1996: 246–7) noted, however, that modals are followed not only by
the infinitive of a lexical verb. He wrote, “the auxiliaries appear in a set sequence:
modal – perfect have – progressive be – passive be – main verb…. It is
not usual for all to be present in one verb phrase though it is certainly possible”.
Table 4 shows the nine verb phrase structures in which modal verbs can
occur, at least two of which can have a noun or adjective instead of a past
participle (# 6, 9), and one of which (# 2) has about 25% of its tokens with be or
have as the infinitive. Mindt (1995) has explored the distribution of modal verb
phrase structures in a corpus of fiction texts. These modal verb-phrase structures
are sometimes given names such as ‘modal perfect passive’ (e.g. He could
have been told). All structures except 1 and 5 have be either as part of the passive
or progressive, or as a lexical verb in its own right. The multifunctional be is
probably responsible for many learners’ difficulties with modal verbs. Without
a distributional analysis a learner might conclude that all nine structures are
used equally since most grammars do not provide an analysis of their relative
frequencies. For language learners who are trying to learn to sequence the items
correctly, such an assumption would be a serious misconception, as Biber et al.
(1999: 498–501) have demonstrated: “While the majority of modals do not co-occur
with marked voice or aspect, particular modals show differing preferences
for these combinations”. They note, for example, that while can, could, should,
and must are fairly common in passive constructions, may, might, should and
must are the most frequently used modals with perfect aspect. Data from the
BNC strongly support the general picture of variation in the use of verb phrase
structures painted by Mindt and Biber et al.
Table 5 contains an analysis of the distribution of the nine modal verb
phrase structures in different genres of the BNC. For the corpus as a whole,
column 1 shows that there are huge differences in the relative distributions of
use of the nine structures. The modal + infinitive structure (structure 2)
accounts for 76% of all modal tokens in the corpus. Columns 2 and 4 show an even
higher proportion of use of structure 2 in spoken texts (84.9%) and imaginative
prose (81.3%). Structure 3 contains the passive voice, and together with
structure 2 accounts for over 90% of all modal tokens in the BNC as a whole, as
well as in most genres. Structure 1 (modal alone) occurs proportionately more
Table 5. Modal verb phrase structures in genres of the BNC (%)
Columns: 1 = All BNC; 2 = All BNC spoken; 3 = All BNC written; 4 = Imaginative prose; 5 = Natural and pure science; 6 = Applied science; 7 = Social science; 8 = World affairs; 9 = Commerce and finance; 10 = Arts; 11 = Belief and thought; 12 = Leisure
Modal structure                            1     2     3     4     5     6     7     8     9    10    11    12
1 Modal alone                            1.9   6.0   1.3   3.1   0.4   0.5   0.5   0.6   0.6   1.1   0.7   0.9
2 Modal + infinitive                    76.0  84.9  74.5  81.3  67.4  71.0  69.4  71.0  72.0  75.3  75.0  76.7
3 Modal + be + past ptc                 14.7   3.4  16.6   4.8  27.1  24.3  25.2  19.2  23.1  15.0  17.3  14.9
4 Modal + be + pres ptc                  1.5   1.8   1.5   1.7   0.7   1.2   1.0   1.1   1.1   1.8   1.3   2.0
5 Modal + have + past ptc                5.1   3.6   5.4   8.4   3.7   2.5   3.1   6.8   2.6   5.9   5.0   4.8
6 Modal + be + being + past ptc          0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0
7 Modal + have + been + past ptc         0.7   0.2   0.7   0.5   0.7   0.6   0.7   1.2   0.6   0.8   0.7   0.6
8 Modal + have + been + pres ptc         0.1   0.1   0.1   0.2   0.0   0.0   0.1   0.1   0.0   0.1   0.1   0.1
9 Modal + have been being + past ptc     0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0
Total                                    100   100   100   100   100   100   100   100   100   100   100   100
Variation in the distribution of modal verbs
83
84 Graeme Kennedy
in speech and imaginative prose, and reflects the use of ellipsis in conversational
discourse. (e.g. Can any of you come back tomorrow? — Yes, I can.) In the
spoken texts 6% of the modal tokens occur in structure 1. Conversely, structure
3 (modal with passive) is used much less in spoken English and imaginative
prose than in other genres, but proportionately more in the various ‘Sciences’,
‘World Affairs’ and ‘Commerce and Finance’ (columns 5–9). In the ‘Social
Science’ texts in the BNC 25% of the modal verb tokens occur in structure 3.
Structures 4, 5, 7 and 8 do not have high use in any genre, and structures 6 and
9 are extremely rare. In the BNC most tokens of structures 4,5,7 and 8 express
epistemic modality (e.g. ‘She must have known about it’).
Whereas Table 5 shows the distribution of the nine modal verb phrase
structures in different genres of the BNC, Table 6 shows the extent to which
different modals make use of the nine verb phrase structures.
As noted above, structure 2 (modal + infinitive) predominates for all modals in
the BNC taken together, with 76% of modal tokens occurring in this structure.
There is considerable variation, however. Whereas dare and used to have
over 94% of their tokens occurring in structure 2, should and must have about
63% and 65% respectively, perhaps because both should and must have a lower
proportion of deontic uses, and a correspondingly higher proportion of
epistemic uses than other modals in the active-voice structure 2.
In structure 3, on the other hand, where the mean for all tokens is 14.7%,
should (26.3%), must (19.5%) and can (21.9%) have a much higher proportion
of their tokens than any of the other modals. This structure contains the
passive voice which, by means of agent deletion, can be used to imply possibly
externally-imposed obligation. Things should be done, must be done or can be
done but not necessarily by the speaker or listener. Would, which seems to have
many of its tokens in reported speech, has a notably lower use of structure 3,
with only 7% of its tokens in that structure. Would, could, may, might, and
must also have a notably higher proportion of their uses in structure 5,
typically in epistemic uses (Mary must have known) than is the case for the
other modals. Will, can and shall are rarely used in structure 5. Table 6 also
shows clearly that none of the modals have a significant proportion of their
uses in the BNC in structures 6–9.
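The comparison of individual modals with the all-modal mean can be reproduced directly from the Table 6 figures. The following Python sketch is offered only as an illustration: the percentages for structures 3 and 5 are copied from Table 6, while the names TABLE6, MEAN and above_mean are hypothetical labels introduced here and are not part of the original analysis.

```python
# Percentage of each modal's tokens found in structure 3 (modal + be + past ptc.)
# and structure 5 (modal + have + past ptc.), copied from Table 6.
TABLE6 = {
    "will":     {3: 11.6, 5: 0.9},
    "would":    {3: 7.1,  5: 9.7},
    "can":      {3: 21.9, 5: 0.3},
    "could":    {3: 13.6, 5: 6.6},
    "may":      {3: 17.3, 5: 7.0},
    "might":    {3: 11.3, 5: 12.4},
    "shall":    {3: 9.5,  5: 0.5},
    "should":   {3: 26.3, 5: 6.1},
    "must":     {3: 19.5, 5: 11.3},
    "ought to": {3: 13.5, 5: 5.9},
    "need to":  {3: 14.7, 5: 5.9},
    "dare":     {3: 0.1,  5: 0.7},
    "used to":  {3: 3.9,  5: 0.1},
}

# Mean for all modals in the BNC (column 1 of Table 6).
MEAN = {3: 14.7, 5: 5.1}


def above_mean(structure):
    """Return (modal, share) pairs above the all-modal mean, highest share first."""
    hits = [(modal, row[structure]) for modal, row in TABLE6.items()
            if row[structure] > MEAN[structure]]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


for structure in (3, 5):
    print(structure, above_mean(structure))
```

A strict comparison with the mean is only one possible threshold. A modal such as need to, whose share in structure 3 equals the mean, is excluded by it.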
Table 7 shows the extent to which each individual modal contributes to the
total number of tokens of modal use for each of the nine modal verb phrase
structures. For structure 1 (the modal alone), 83% of the modal tokens come
from will, would, can and could, with can contributing over a quarter of all
tokens. In structure 2, it is will which contributes the highest proportion (24.9%)
of the modal tokens, while in structure 3, on the other hand, can is again the
most frequently-occurring modal (27.2%).

Table 6. Distribution of modal verbs in verb phrase structures of the BNC (%)
Columns: 1 = Mean for all modals in BNC; 2 = Will; 3 = Would; 4 = Can; 5 = Could; 6 = May;
7 = Might; 8 = Shall; 9 = Should; 10 = Must; 11 = Ought to; 12 = Need to; 13 = Dare; 14 = Used to

Modal structures                             1     2     3     4     5     6     7     8     9    10    11    12    13    14
1 Modal alone                              1.9   2.0   2.0   2.8   2.4   0.5   1.5   3.1   1.6   1.0   2.5   1.9   5.0   1.2
2 Modal + infin.                          76.0  82.6  78.8  74.6  75.7  72.4  70.9  83.4  63.2  65.3  72.8  76.7  94.2  94.5
3 Modal + be + past ptc.                  14.7  11.6   7.1  21.9  13.6  17.3  11.3   9.5  26.3  19.5  13.5  14.7   0.1   3.9
4 Modal + be + pres. ptc.                  1.5   2.8   1.5   0.3   0.6   1.5   1.8   3.4   1.7   1.4   4.0   0.3   0.0   0.3
5 Modal + have + past ptc.                 5.1   0.9   9.7   0.3   6.6   7.0  12.4   0.5   6.1  11.3   5.9   5.9   0.7   0.1
6 Modal + be + being + past ptc.           0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0
7 Modal + have + been + past ptc.          0.7   0.2   0.9   0.1   1.1   1.3   1.8   0.1   0.9   1.0   1.2   0.5   0.0   0.0
8 Modal + have + been + pres. ptc.         0.1   0.0   0.1   0.0   0.1   0.1   0.3   0.0   0.2   0.5   0.1   0.1   0.0   0.0
9 Modal + have been being + past ptc.      0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0
Total                                      100   100   100   100   100   100   100   100   100   100   100   100   100   100
As noted above, structure 5 is commonly used epistemically. It is particularly
noteworthy how will (4%) and would (37.7%) differ in the contribution they
make to the overall total for structure 5, and how this is the reverse of the
relationship between will (41.3%) and would (20%) in structure 4. Structures 6
and 9 are highly infrequent. Structure 6 has only 48 tokens in the whole of the
BNC, 25% of which occur with may as the modal verb. Structure
9 has only two tokens in the whole BNC, both with the modal
may. Structure 8, which is also relatively infrequent, has almost half of its
modal tokens contributed by must and would, typically with epistemic uses. The
marginal modals, ought to, need to, dare and used to, are very infrequent.
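The overall share of each structure can be recomputed from the raw token counts in Table 7. The sketch below is illustrative only: the counts are copied from column 1 of Table 7, and the names TOKENS, TOTAL and share are hypothetical. Because the published percentages are rounded, a value computed this way may differ in the last digit from the figure printed in Table 5.

```python
# Token counts for the nine modal verb phrase structures, copied from Table 7.
TOKENS = {
    1: 28_643,      # modal alone
    2: 1_108_046,   # modal + infinitive
    3: 213_906,     # modal + be + past ptc
    4: 22_190,      # modal + be + pres ptc
    5: 74_082,      # modal + have + past ptc
    6: 48,          # modal + be + being + past ptc
    7: 9_433,       # modal + have + been + past ptc
    8: 1_371,       # modal + have + been + pres ptc
    9: 2,           # modal + have been being + past ptc
}

TOTAL = sum(TOKENS.values())


def share(structure):
    """Percentage of all modal tokens that occur in the given structure."""
    return 100 * TOKENS[structure] / TOTAL


for structure, count in TOKENS.items():
    print(f"structure {structure}: {count:>9,} tokens, {share(structure):.2f}%")
print(f"total: {TOTAL:,} tokens")
```

Summing share(2) and share(3) gives the combined weight of the two dominant structures, which the text reports as over 90% of all modal tokens.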
4. Conclusions
Analysis of the BNC thus confirms the findings of earlier studies, often based on
smaller and less representative corpora, that there is substantial and
multidimensional variation in the use of modal verbs and the structures they occur in.
First, there are differences in the relative use made of individual modals. The
BNC generally supports the estimate of the relative frequency of the modal
verbs made by Coates (1983) based on the LLC and LOB texts from the 1960s.
The BNC suggests that in the 1990s the use of must and shall may be
proportionately less than found in Coates’ study, but the frequency of can and, to a
lesser extent, will may have increased. In the BNC will accounts for almost 23%
of all modal tokens, followed by would, can, and could, with can being especially
frequent in spoken texts. Semantically, those modals associated with the deontic
meanings of ‘willingness/intention’, ‘ability’, ‘habit’ and ‘hypothesis’, and the
epistemic uses of ‘prediction’, ‘certainty’ and ‘possibility/probability’ account
for 73% of the modal tokens in the BNC. The use of modals to express the
deontic meanings of ‘obligation/necessity’ and ‘permission’ is relatively
infrequent in the BNC.
Tables 2–7 suggest that there is also systematic variation associated with
whether the texts are of spoken or written origin, whether the verb phrase is
affirmative or negative, what genre a particular modal occurs in, and which
complex verb phrase structure a particular modal token is used in.
There are over 56% more modals per 1000 words in speech than in writing.
This is possibly because of the role of modality, especially in face-to-face spoken
Table 7. Distribution of modal verb phrase structures in the BNC
Columns: 1 = Tokens in all BNC; 2 = Will (%); 3 = Would (%); 4 = Can (%); 5 = Could (%); 6 = May (%);
7 = Might (%); 8 = Shall (%); 9 = Should (%); 10 = Must (%); 11 = Ought to (%); 12 = Need to (%);
13 = Dare (%); 14 = Used to (%); 15 = Total (%)

Modal structures                               1     2     3     4     5     6     7     8     9    10    11    12    13    14    15
1 Modal alone                             28,643  23.0  19.8  26.0  14.1   1.8   3.2   2.2   6.1   2.4   0.5   0.2   0.1   0.5   100
2 Modal + infin                        1,108,046  24.9  20.6  17.9  11.5   7.4   3.9   1.6   6.4   4.3   0.4   0.2   0.1   1.0   100
3 Modal + be + past ptc                  213,906  18.1   9.6  27.2  10.7  19.1   3.2   0.9  13.7   6.6   0.4   0.2   0.0   0.2   100
4 Modal + be + pres ptc                   22,190  41.3  20.0   3.8   4.7   7.6   5.0   3.2   8.5   4.7   1.1   0.1   0.0   0.2   100
5 Modal + have + past ptc                 74,082   4.0  37.7   1.2  14.9  10.7  10.3   0.1   9.2  11.0   0.5   0.3   0.1   0.0   100
6 Modal + be + being + past ptc               48  10.4  14.6   2.1   8.3  25.0  10.4   0.0  12.5  12.5   4.2   0.0   0.0   0.0   100
7 Modal + have + been + past ptc           9,433   6.1  27.2   1.0  19.0  15.0  11.7   0.3  11.0   7.9   0.7   0.2   0.0   0.0   100
8 Modal + have + been + pres ptc           1,371   2.8  22.6   1.3  10.1  11.7  14.3   0.2  11.3  25.1   0.6   0.2   0.0   0.0   100
9 Modal + have been being + past ptc           2   0.0   0.0   0.0   0.0 100.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   100
Total                                  1,457,721
Prose’ and ‘World Affairs’ than in other genres. The use of modals with
progressive aspect in the BNC is extremely rare, as is shown in Table 5 for
structures 4, 6, 8, and 9. Tables 6 and 7 suggest that variation among
individual modals in the extent to which they each use the nine modal structures
is even more striking than variation among genres.
It can be argued that corpus-based descriptions of aspects of language are
of interest in their own right because they can improve descriptive adequacy by
adding a distributional dimension to linguistic description (Kennedy, 1998).
Corpus-based descriptions of English are already being reflected in innovative
new grammars. The most comprehensive of these, Biber et al. (1999), is a
major work of scholarship. Although the uses which are made of grammatical
descriptions are not entirely predictable, their use in language education to
develop curricula and new teaching materials is well-established. For teaching
and learning the use of modal verbs, this analysis of their occurrence in the BNC
suggests that, among the individual modals, the nine ‘central’ modals and their
elided and negative forms continue to be especially important (although shall
is considerably less frequent than the other eight), and some should be the focus
of particular pedagogical attention so that learners can develop proficiency in
handling the semantic functions of ‘willingness/intention’, ‘certainty’,
‘possibility/probability’, ‘prediction’, ‘ability’, and ‘hypothesis’ in both affirmative and
negative contexts.
This analysis also suggests that while there is pervasive variation in the
distribution of modal verbs in different genres and media, there is substantial
stability in their use in complex verb phrase structures. The 100 million words in
the spoken and written texts of the BNC represent the equivalent of about 10,000
hours of continuous spoken discourse, equivalent to the exposure which an
individual might receive if exposed to English for 8 hours per day for 3.5 years.
Over this period of time, Table 7 shows that we could expect to meet structure 9
twice, structure 6 only 48 times, and structure 8 1,371 times. On the basis of this
evidence, structures 1, 2, 3 and 5 clearly have a high priority in language pedagogy.
As part of the consciousness-raising of teachers, we might expect that such insights
from corpus-based analysis will help associate use with usefulness, so that linguistic
items become part of language education not just because they exist, but
because they are used often enough to justify inclusion in instruction.
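The exposure estimate above rests on simple arithmetic, which can be sketched as follows. The corpus size, the 10,000-hour equivalence, the 8 hours per day and the token counts all come from the text and Table 7. The 365-day year and the names used in the code are assumptions made for this illustration only.

```python
CORPUS_WORDS = 100_000_000   # size of the BNC as stated in the text
CORPUS_HOURS = 10_000        # estimated equivalent in hours of continuous speech
HOURS_PER_DAY = 8            # daily exposure assumed in the text
DAYS_PER_YEAR = 365          # assumption made for this sketch

words_per_hour = CORPUS_WORDS / CORPUS_HOURS
days = CORPUS_HOURS / HOURS_PER_DAY
years = days / DAYS_PER_YEAR

# Tokens of the three rarest structures in the whole BNC (Table 7).
RARE_STRUCTURES = {9: 2, 6: 48, 8: 1_371}


def hours_per_encounter(tokens):
    """Average hours of exposure needed to meet one token of a structure."""
    return CORPUS_HOURS / tokens


print(f"{words_per_hour:,.0f} words per hour, {days:,.0f} days, {years:.2f} years")
for structure, tokens in RARE_STRUCTURES.items():
    print(f"structure {structure}: one token every "
          f"{hours_per_encounter(tokens):,.1f} hours on average")
```

On these assumptions the corpus corresponds to 1,250 days of exposure, which is of the same order as the 3.5 years cited above.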
The analysis of the use of modal verbs in a large corpus also demonstrates
that linguistic variation is characteristically a probabilistic phenomenon rather
than an absolute one. In the present study this has been illustrated at the level
of genre. It can be anticipated nevertheless that sociolinguistic or regional
</TARGET "ken">
varieties of the language are similarly likely to show not the presence or absence
of particular linguistic phenomena, but a tendency for them to be used more or
less than in other varieties.
Note
References
Aijmer, K. and Altenberg, B. (eds) 1991. English Corpus Linguistics: Studies in Honour of Jan
Svartvik. London: Longman.
Biber, D., Conrad, S., and Reppen, R. 1998. Corpus Linguistics. Cambridge: Cambridge
University Press.
Biber, D., Johansson, S., Leech, G., Conrad, S. and Finegan, E. 1999. Longman Grammar of
Spoken and Written English. Harlow: Pearson Education.
Coates, J. 1983. The Semantics of the Modal Auxiliaries. London: Croom Helm.
Collins, P. 1991. “The Modals of Obligation and Necessity in Australian English”. In Aijmer
and Altenberg (eds) 145–165.
Greenbaum, S. 1996. The Oxford English Grammar. Oxford: Oxford University Press.
Kennedy, G. 1998. An Introduction to Corpus Linguistics. London: Longman.
Mindt, D. 1995. An Empirical Grammar of the English Verb: Modal Verbs. Berlin: Cornelsen.
Palmer, F. 1979. Modality and the English Modals. London: Longman.
Quirk, R., Greenbaum, S., Leech, G. and Svartvik, J. 1985. A Comprehensive Grammar of the
English Language. London: Longman.
<TARGET "haa" DOCINFO
KEYWORDS ""
WIDTH "150"
VOFFSET "4">
Chapter 5
Strong modality and negation in Russian
Ferdinand de Haan
University of New Mexico
1. Introduction
deriving from the corpus is based on a current description of the language. All too
often people make use of 19th and early 20th century texts in Russian studies.
The study of modality and negation in Russian is important because there
are a number of areas in which Russian potentially differs from English. First,
the modal system of Russian is not as grammaticalized as the English system
(see Section 2), and this may lead to duplications and ambiguities. Second, the
Russian sentence structure differs from the English one. In English, the relative
scope of negation and modality is expressed by means of different modal verbs, while in
Russian the intended scope interpretation can depend on the position of the negation
(see De Haan 1997 for a typological discussion of the interaction
of modality and negation). A better understanding of the relationship between
syntactic and semantic scope is needed. This paper is a start in that direction.
One note on terminology. In this paper I will use the traditional terms
epistemic and deontic modality, for ease of reference. The two meanings are
shown in (1) below, using the English strong modal verb must as an example:
(1) a. John must go to New York tomorrow.
b. John must be at home: the light is on.
Sentence (1a) shows the deontic use: there is an obligation for John to perform
an action. Deontic modality deals with obligation and permission. Sentence
(1b) is epistemic: the speaker assigns a likelihood to the statement that John is
home. The relative degree of likelihood, or confidence on the part of the speaker
in what he or she is saying, is the domain of epistemic modality (see Palmer 1986
for an introduction to these terms). These terms are the traditional terms, but
in recent years other terms have been coined. Instead of deontic modality, quite
often the term root modality is used (e.g., in Coates 1983) and terms like agent-
oriented modality (Bybee, Perkins and Pagliuca 1994) or participant-oriented
modality (Van der Auwera and Plungian 1998) have been used. While all these
terms are supposed to have slight differences in meaning, in practice these
terms tend to be used interchangeably. There is no consensus on terminology
and the traditional terms are used in this paper, if only for the reason that these
terms are the most familiar.
The rest of this paper is structured as follows: Section 2 discusses the basic
modal elements in Russian and their relation to negation. Section 3 is an
introduction to indeterminacy as defined in Coates 1983. Sections 4 through 7
discuss the modals most commonly used in the literature when discussing
strong modality and negation. Section 8 summarizes the implications of
NEG-raising while Section 9 draws some conclusions.
Strong modality and negation in Russian 93
2. Modality in Russian
Some of these modals (namely the verb moč’ and the adverb dolžen) require a
subject in the Nominative (if there is one), while the impersonal modals require
that the subject is in the Dative (nado, nužno, nel’zja, for instance). Some
modals have an inherent past and future tense (e.g., the verbs moč’, past tense
mog; prixodit’sja, past tense prišlos’), others (the adverbs) need the auxiliary verb
byt’ ‘to be’ to form past and future tenses (e.g., dolžen, past tense dolžen byl).3
Simple negation in Russian is expressed with the particle ne, placed before
the verb. In case of a modal word in the sentence, ne can be placed before the
modal (as in sentence (3a) below), before the main verb (3b), or before both
(3c). All examples come from the scholarly literature.
(3) a. Ne nado vyzvat’I ego, on priedet sam.4
‘There is no need to summon him, he’ll come himself.’
b. Nado ne tol’ko vsestoronne podgotovit’sjaP k igre, nado tak že
osnovatel’no gotovit’sja k každoj trenirovke.
Even though the linear order would lead us to expect a wide scope interpretation
(i.e., a translation with need not or its equivalent), we have instead a narrow
scope interpretation, translated by should not.
This is the theory, but as the examples from the corpus will illustrate, in
practice the situation is more complicated. It is not quite clear, for instance, why
sentence (4) above is not translated with need not (and the interpretation of the
sentence then becomes He is a very punctual person so there is no need for him to
be late). In other words, can we always distinguish between the two scope
interpretations? This leads us to the problem of indeterminacy which is addressed in the
next section.
It has been claimed in the literature (e.g., Forsyth 1970, Rappaport 1985)
that there is a relation in Russian between modality, negation, and the choice of
aspect of the main verb.5 An example will illustrate this (example and discussion
from Rappaport 1985: 206). The modal nel’zja (see Section 6 for details) is
ambiguous between a negated possibility and a negated obligation. When the
interpretation of nel’zja is ‘impossible’, the main verb must be in the perfective
aspect form (as illustrated in sentence (5a)), but if the sense is ‘must not’, the
main verb has to be in the imperfective aspect form (sentence (5b)).
(5) a. Zdes’ net telefona, otsjuda nel’zja pozvonit’P.
‘There is no telephone here; it is impossible to call from here.’
b. Otsjuda nel’zja zvonit’I, my pomešaem ljudjam rabotat’.
‘One mustn’t call from here; we will disturb people working.’
3. Indeterminacy
In her corpus study of modality in English, Coates (1983) showed that it is often
difficult to determine the precise status of a given modal. This problem is
known as indeterminacy. Coates (1983: 14–7) distinguishes three types of
indeterminacy:
– Gradience, or the continuum of meaning of a given modal.
– Ambiguity, when it is not possible to determine which meaning is intended
and when the interpretation makes a difference.
– Merger, similar to ambiguity but with the difference that the two meanings
are not mutually exclusive.
The last two types of indeterminacy are especially important in this study. In
(6), an example of ambiguity is shown (Coates 1983: 16):
(6) He must understand that we mean business.
must be chosen. Given that the two meanings are distinct, this proves, according
to Coates, the existence of the epistemic-deontic distinction.6
Sometimes, in the case of merger, the two meanings are not mutually
exclusive. In such a case, given the context, both interpretations can make sense
and it is not necessary to determine the correct interpretation. The classic
example is the exchange of (7), from Coates (1983: 17):
(7) A: Newcastle Brown is a jolly good beer.
B: Is it?
A: Well it ought to be at that price.
In this exchange, the modal ought in the third sentence can be interpreted as
either a deontic modal (the brewers of Newcastle Brown have an obligation to
put out a good beer given the high price) or an epistemic modal (“It costs a lot,
therefore it is good”). In English, merger occurs often with the modals should
and ought. Indeterminacy occurs in Russian as well, as will be demonstrated in
the next sections.
The modal dolžen is the prototypical modal to express strong deontic modality,
or obligation in Russian. In sentence (8a) above, the obligation is an opinion
expressed by the writer of the article and is more subjective in nature. Typically,
the obligation stems from an unspecified source rather than from the discourse
participants. For this reason, subjects with dolžen are quite often inanimate. In
the examples under (9) below, the obligation is more objective in nature.
Sentence (9a), from an article on new equipment for ambulances, shows an
obligation in the form of a rule or law, while sentence (9b), from a text on
equipment for a new rocket ship, shows an obligation imposed by the laws of
physics (the discussion revolves around weight in space).
(9) a. V každoj takoj mašine dolžen byt’I i defibrilljator.
‘In every such car there must also be a defibrillator.’
(Izvestija, 1987)
b. Sama radioantenna dolžna vesit’I 700 kilogrammov.
‘The radio antenna alone must weigh 700 kilograms.’
(Izvestija, 1988)
The aspect of the main verb following dolžen is transparent. The presence of the
modal has no effect on the choice of aspect of the main verb. As can be seen
from the examples in (8) above, both aspects are possible, because the normal
rules for determining the aspectual choice of the main verb apply.8
Tense of the modal is expressed by means of adding the auxiliary verb byt’
‘to be’ as is shown in (10) below. The function of the auxiliary is to show that
the obligation existed in the past (but not in the present) as in sentence (10a),
or will exist in the future, as in (10b). In sentence (10a) the obligation existed in
the past but is no longer relevant for the present, hence the use of the past tense
byla. Sentence (10b) shows that there will exist an obligation in the future, but
this obligation does not yet exist in the present.
(10) a. A dal’še reč’ … dolžna byla pojtiP o preémnike.
‘And further the speech (FEM) … had to be (lit. go) about the
successor.’ (Izvestija, 1987)
b. Esli že narodnye zasedateli … budut ob”edineny v
samostojatel’nuju organizaciju, oni dolžny budut prinjat’I rešenie
sami, bez učastija sud’i-professionala.
‘If then the people’s assessors … will be united in an independent
organization, they will have to take decisions by themselves, without
participation of a professional judge.’ (Izvestija, 1987)
Most of the time the obligation is a present obligation (in the corpus this occurs
in 86.5% of all cases, as opposed to 12.8% of cases with a past tense and only
0.7% of dolžen with a future tense), or includes the present in its obligation.
However, it is quite often the case that the linear order does not reflect the
relative scope of the modal. The linear order is that of a wide scope negation
(i.e., ne dolžen), but the interpretation is narrow scope (i.e., dolžen ne, translated
as must not); examples are shown in (13) below.
(13) a. Zapomni ešče raz: nikto pro menja ne dolžen daže dogadyvat’sjaI.
‘Remember once again: nobody must even guess about my existence.’
(Rasputin, V., Zhivi i pomni)
b. Samo po sebe èto položenie edva li možet vyzvat’ vozraženija. No
pri opredelenii statusa èstonskogo jazyka ne dolžny uščemljat’sjaI
Unlike dolžen, the impersonal verb prixodit’sja ‘have to’ is a relatively
straightforward modal.10 In the corpus it is often used to denote multiple events, as in
sentence (14) below, whereas dolžen is rarely used for multiple events.
(14) Mne prixodilos’ ne raz slyšat’I slova kolleg.
‘I (DAT) had to listen more than once to the words of my colleagues.’
(Nauka i Zhizn’, 1988)
This confirms Nichols’ (1985: 99) observation that prixodit’sja often occurs with
explicit expressions of multiplicity, such as ne raz ‘more than once’, často ‘often’
whereas dolžen occurs rarely in such contexts. There is no 100% correlation
between prixodit’sja and its use to denote strong modality in multiple events,
however (see, e.g., sentence (15) below). Because of this occurrence with
multiple events, the main verb accompanying prixodit’sja is almost always in the
imperfective. In the corpus, only one occurrence had a main verb in the
perfective. Sentence (15) uses the main verb priznat’ ‘admit’ in the perfective
due to the fact that we are dealing with a singular event with an inherent end.
(15) No teper’ prixoditsja priznat’P, čto èto uže ne čisto biologičeskaja nauka,
a kompleksnaja, v kotoruju vovlečeny praktičeski vse storony žizni i
dejatel’nosti čeloveka.
‘But now one must admit that this is no longer a pure biological science,
but a complex one, which involves practically every aspect of life and
human endeavor.’ (Nauka i Zhizn’, 1987)
The modal prišlos’ in (16b) occurs without explicit main verb, but the intended
verb is dovoevat’ (‘to fight on’).
As is the case with dolžen, NEG-raising also occurs with prixodit’sja. In
sentence (16c) the interpretation is one of wide scope (there is clearly a
The modal nel’zja is an inherently negative form with a fairly broad set of
meanings. In textbooks, it is usually stated that nel’zja can denote negated
ability (impossible), negated possibility (cannot), and negated permission (must
not). It is frequently seen as the negative counterpart to the adverb možno
‘possible, allowed’ which cannot be combined with a negation.11
In addition, it is usually mentioned that nel’zja quite often occurs without
an overt subject, and the corpus bears this out. Of
the 301 occurrences of nel’zja in the corpus, 280 were without an overt subject
(93.0%), 15 occurrences were with a third person subject (5.0%), 4 with a first
person subject (1.3%), and 2 with a second person subject
(0.7%). While most modals can occur without an overt subject, none has such
a high correlation with the absence of the subject.
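The percentages reported for the subject types follow directly from the raw counts, as the following sketch illustrates. The counts are those given in the text. The dictionary labels and the helper percentage are hypothetical names introduced for this illustration.

```python
# Occurrences of nel'zja in the corpus by type of subject, as reported in the text.
SUBJECT_COUNTS = {
    "no overt subject": 280,
    "third person subject": 15,
    "first person subject": 4,
    "second person subject": 2,
}

TOTAL = sum(SUBJECT_COUNTS.values())


def percentage(count):
    """Share of all nel'zja occurrences, rounded to one decimal place."""
    return round(100 * count / TOTAL, 1)


for label, count in SUBJECT_COUNTS.items():
    print(f"{label}: {count} of {TOTAL} ({percentage(count)}%)")
```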
There is also an idiomatic expression with nel’zja, namely kak nel’zja lučše
‘better than ever’ (or a variation of this idiom), of which there are ten
occurrences in the corpus. This use of nel’zja has been disregarded in this study.
In the corpus, the range of nel’zja is restricted. The epistemic interpretation
of nel’zja is by far the most common (170 out of 301 examples).
This epistemic reading is in all cases the cannot, or impossible, one. Typical
examples are:
(18) a. Odnako mnogie iz nix naxodjatsja sejčas kak by na pereput’e-
ponimajut, čto dejstvovat’I po-staromu nel’zja, a po-novomu ne
mogut, ne umejut.
‘However, many of them find themselves at a crossroads as it were.
They understand that they can’t act as in the old days, but they cannot,
are not capable of, doing things the new way.’ (Pravda, 1988)
b. Slovami ètot zapax opisat’ nevozmožno. Pripomnit’P ego usiliem
mysli tože nel’zja. Kakoj on byl?
‘This smell could not be described in words. It was also impossible
to remember it with all (my) might. What was it?’
(Grekova, I., Kafedra, 1980)
We now turn to the role of the aspect of the main verb. It has often been
observed in the literature (e.g., Forsyth 1970, Rappaport 1985) that the choice
of aspect has an influence on the interpretation of nel’zja. Example sentences
were shown in (5) above. These sentences were all unambiguous because of
context. The claim that is frequently made is that the choice of aspect without
Even though the inability reading is present in sentence (22a), and therefore the
main verb should be in the perfective aspect, the imperfective form is used to
show that the action described (the showing of movies) is an ongoing activity,
which is expressed by the imperfective aspect in Russian. Conversely, because
the act of destroying a city is an action with an inherent endpoint, the perfective
aspect is appropriate in sentence (22b). In both cases, the normal rules that
govern aspectual choice override the alleged opacity of the modal.
This is certainly also the case in the corpus. In many cases there was no clear
relationship between the choice of aspect and the choice of modal interpretation.
In many cases the precise modality was not clear (as in sentence (23a)
below), and in other cases a different aspect than predicted was obtained.
(23) a. Slovno by zabyvaetsja i o tom, čto est’ vozrast, fiziologičeski naibolee
blagoprijatnyj dlja obučenija, kotoryj nel’zja upustit’P.
‘It is as though one forgets as well that it is the age which is physically
best suited for education, and which we can’t (mustn’t) let go to
waste.’ (Izvestija, 1987)
b. Upravljat’I, kak prežde, uže nel’zja.
‘Governing as before is now impossible.’ (Izvestija, 1988)
Since the main verb is perfective in sentence (23a) we would expect the negated
ability (impossible) interpretation. While this is certainly possible, the must not
interpretation is also possible, and we are dealing with indeterminacy. Since
both interpretations are possible, we are dealing with a case of merger. Sentence
(23b), on the other hand, is clearly a case of negated ability and the perfective
would have been appropriate. The imperfective is used, however, because the
action expressed (upravljat’ ‘to govern’) is durative in nature, and durative
verbs in Russian require the imperfective aspect.
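The mismatch between the textbook claim and the corpus examples can be made explicit with a small check. This is a sketch only: the mapping PREDICTED encodes the claim attributed above to Forsyth and Rappaport, the readings listed for each example are those given in the discussion of (5) and (23), and all names in the code are hypothetical.

```python
# Claim discussed in the text: the aspect of the main verb after nel'zja
# predicts the reading of the modal.
PREDICTED = {"perfective": "impossible", "imperfective": "must not"}

# (example label, aspect of the main verb, readings available per the discussion)
EXAMPLES = [
    ("5a", "perfective", {"impossible"}),
    ("5b", "imperfective", {"must not"}),
    ("23a", "perfective", {"impossible", "must not"}),  # merger: both readings possible
    ("23b", "imperfective", {"impossible"}),            # durative verb, negated ability
]


def follows_rule(aspect, readings):
    """True only if the predicted reading is the sole reading available."""
    return readings == {PREDICTED[aspect]}


for label, aspect, readings in EXAMPLES:
    status = "as predicted" if follows_rule(aspect, readings) else "not as predicted"
    print(f"({label}) {aspect}: {sorted(readings)} -> {status}")
```

Treating an indeterminate example such as (23a) as a failure of the rule is a design choice of this sketch. A looser check could accept any example whose set of readings includes the predicted one.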
The adverb nado ‘necessary’ in its modal sense can only be used as a strong
deontic modal.12 Thus, a sentence such as (24) below can only be interpreted as
an obligation, not as an epistemic necessity (Forsyth 1970: 264).
(24) Knigi nado sdavat’I v biblioteky vsegda vovremja.
‘Books must always be returned to the library on time.’
The aspect of the main verb in positive sentences is not influenced by nado, but
rather by the general rules that govern aspectual choice. Thus, the main verb is
in the imperfective in sentence (24) because we are dealing with a general truth.
When a negation is present, the situation changes.
The modal nado can be combined with ne ‘not’ by placing it either before
nado or before the main verb (it is also in principle possible to have two
negations, but this was not attested in the corpus). Examples from the corpus are:
(25) a. Ne nado zabyvat’I, čto reč’ idet ne tol’ko o rybe.
‘One must not forget that we are not just talking about fish.’
(Nauka i žizn’, 1989)
b. …, mnogie trudnosti, odolevajuščie naše narodnoe xozjajstvo, kak,
estestvenno, i puti ix preodolenija, iskat’I nado ne v samoj èkonomike.
‘(the cause of the) many problems that plague our economy, and of
course the way to their solution must not be found just in economics.’
(Izvestija, 1988)
It has been noted in the literature (e.g., Forsyth 1970: 246; Rappaport 1985: 212–3)
that the aspect of the main verb after ne nado is always imperfective. This is also
true in the corpus. In all 68 examples (out of 78) in the corpus with ne nado and
a main verb, the main verb was in the imperfective aspect. Sentence (25a) above
is an example of the imperfective aspect of the main verb. The aspectual choice of
the main verb combined with ne nado is therefore opaque: it always has to be in
the imperfective and this is due to the presence of ne nado.
Similarly, the statement that ne nado (and nado ne) are always deontic in
nature is also borne out by the corpus. Examples with nado and a negation
always refer to either a lack of obligation (which can be translated with need
not) or to an obligation not to do something (in which case it can be translated
with must not).
The problems surrounding nado and negation are similar to those that
occur with dolžen and negation. Scope relations are not always straightforward.
In principle the linear order should determine the scope of the negation. In
sentence (25b) the negation follows the modal, so that the must not translation
is appropriate, while in (26) below the negation precedes the modal and the
need not translation is appropriate.
(26) Ne nado dolgo naprjagat’I pamjat’, čtoby vspomnit’ lučšie kinoroli Eleny
Proklovoj.
‘One need not think long and hard in order to remember the best
movie roles of Elena Proklova.’ (Socialističeskaja industrija, 1988)
However, in many cases the linear order is not indicative of the scope relation
between the modal and the negation. In sentence (25a) above, the surface order
suggests a wide scope interpretation, yet it is clear that the intended interpretation
is narrow scope. This occurs quite often in the corpus. Of the 72 cases of ne
nado, 34 have a wide scope interpretation and 24 a narrow scope interpretation;
in a further 14 cases the intended scope interpretation is ambiguous.
The narrow scope linear order is unambiguous: the interpretation always
follows the linear order and nado ne is always interpreted as narrow scope.
The interesting fact about the narrow scope interpretation is not that it can
occur with a wide scope linear order (i.e., ne nado), but that it rarely occurs
with a narrow scope order (i.e., with the order nado ne). This order occurs only
6 times in the corpus. In only 2 of those 6 cases is there a true narrow scope
situation; sentence (25b) is an example. In the other 4 instances, nado ne
occurs in a contrastive situation (one must not do X, but Y); an example
is shown in (27) below.
(27) Oni ponjali, čto iskat’I nado ne točnye kopii izučaemogo gena, a vsex ego
rodstvennikov, to est’ geny, blizkie emu po strukture.
‘They understood that they had to find not exact copies of the gene
under investigation, but all of its relatives, that is to say, genes that
resembled it in structure.’ (Sputnik, 1986)
Examples such as (27) show that nado ne in these cases actually occurs in a
positive sentence. The negation is not sentential but applies only to a constituent
(see De Haan 1997 for a discussion and tests of sentence versus constituent
negation): what is denied is not the action expressed but the object of the verb. In the
concrete case of sentence (27), the negation has only the object of the verb to
find in its scope (the exact copies of the gene under investigation), not the verb
itself or the modal, so that the sentence expresses a positive obligation to
perform a certain action.
A wide scope linear structure with a narrow scope interpretation, i.e.,
situations in which the form ne nado has the interpretation must not, is very
common. As mentioned above, this occurred 24 times in the corpus. Thus, the
form ne nado is potentially ambiguous between a wide and a narrow scope
interpretation of the negation. This difference in interpretation has no impact
on the choice of the aspect of the main verb (it is still imperfective), showing
that the scope interpretation of nado and the negation is not a factor but that ne
nado inherently requires its main verb to be in the imperfective.13
In a number of cases it is not possible to decide without a doubt from the
context whether we are dealing with narrow or wide scope negation. An
example is shown in (28).
(28) Ne nado vygonjat’I iz školy Sašu Stameskina … Pered licom svoix
tovariščej po Leninskomu komsomolu ja toržestvenno obeščaju, čto
Stameskin stanet xorošim učenikom, graždaninom i daže
komsomol’cem.
‘(We) mustn’t/needn’t expel Sasha Stameskin from school. Before the
faces of his comrades of the Lenin Komsomol I swear that Stameskin will
become a good student, citizen, and even a (good) Komsomol member.’
(Vasil’ev, B., Zavtra byla vojna, 1984)
From the context it is not possible to determine whether we are dealing with a
wide or a narrow scope. Either interpretation is possible in this situation.
However, since both interpretations are mutually compatible, we are dealing
here with a case of merger. With merger, it is not necessary to decide which of
the two meanings is intended, and this is clearly going on in example (28). Since
14 out of 72 cases, or 19.4%, are indeterminate in this way, the correct scope
interpretation cannot be determined in a sizeable portion of all occurrences of
ne nado. But given that the two scope interpretations are not far apart in
meaning, this rarely creates any problem. Compare the findings of ne nado with
the analogous German combination nicht müssen “must / need not” which is
similarly ambiguous. Scope ambiguity also occurs when there is no main verb
that accompanies ne nado, as in (29).
(29) A prinimat’ snotvornoe ne privykla. — I ne nado, — podderžal ee Gil’e.
‘But I didn’t get used to taking something to help me sleep. — “And you
mustn’t / needn’t,” insisted Gil’e.’ (Lidin, V., Fedra, 1962)
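The counts reported above for ne nado can be checked with a few lines of arithmetic. The following is only an illustrative sketch: the three counts are those given in the text, while the variable names and the rounding to one decimal place are assumptions made here.

```python
# Scope interpretations of "ne nado" as reported in the text (72 cases in all).
counts = {"wide": 34, "narrow": 24, "ambiguous": 14}

# Total number of cases and the percentage share of each interpretation.
total = sum(counts.values())
shares = {label: round(100 * n / total, 1) for label, n in counts.items()}

if __name__ == "__main__":
    print(total)
    for label, share in shares.items():
        print(f"{label}: {share}%")
```

The share computed for the ambiguous cases corresponds to the 19.4% cited in the discussion of merger.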
The main function of NEG-raising in examples such as (13) and (16) is to show
that we are dealing with a negative sentence. The best way of ensuring that the
listener is aware that we are dealing with a negative sentence is to place the
negative element as early as possible, i.e., right before the modal. This view is
well attested in many languages (see De Haan 1997 and the references there for
discussion). Compare this with the English situation: the constructions must not
and need not have syntactically identical structures; in both cases, the negation
is sentential.14 Not so with the constructions modal + ne and ne + modal in
Russian. Only in the second case is the negation sentential. In the case of modal
+ ne, we are actually dealing with a positive sentence, but with a negated
constituent (the VP). Apparently it is felt that in the construction modal + ne
the negation in its narrow scope interpretation is still somehow sentential in
nature (as it is in English). Since the negation in the modal + ne construction
is not syntactically sentential, the negation is moved in front of the modal, creating the
construction ne + modal. This frees up the construction modal + ne, which can then
be used for contrastive situations, one of the natural functions of
constituent negation crosslinguistically. This situation is found almost without
exception in the corpus. Cases in which the combination modal + ne is not
contrastive may involve an implicit contrast.
The process of NEG-raising does entail that the construction ne + modal is
now ambiguous between a wide and a narrow scope interpretation. This
creation of ambiguity is clearly not considered to be a big problem. However,
there is some indication that the modal dolžen combined with ne is now mainly
used to denote the narrow scope, i.e., it must be translated by must not rather
than need not. Most of the clear-cut cases in the corpus (those not involving
indeterminacy) have the narrow scope interpretation. This could mean that
dolžen is moving in the direction of a verb with only one possible scope
interpretation when it is combined with a negation. If this is the case, it will have the
same status as English must and need, namely verbs which have only one scope
interpretation when they are combined with a negation.
9. Conclusions
Notes
4. In citing the Russian examples, I make use of the following conventions. First, the modal
is printed in bold and the main verb is underlined. If a subject is present, it is italicized.
Finally, according to standard practice, the aspect of the main verb (if present) is indicated
by means of either a superscript I (for imperfective aspect) or P (for perfective aspect).
5. In Russian, most verbs have two morphological forms, depending on the choice of aspect.
For instance, the Russian translation of the verb ‘to write’ is either pisat’ for situations
requiring the imperfective aspect (broadly speaking, actions in progress), or napisat’ when a
perfective interpretation is required (when an action is viewed in its totality). Both pisat’ and
napisat’ are infinitives and can be inflected for tense and person. The relationship between
imperfective and perfective verb forms is for the most part idiosyncratic and for almost every
verb, both forms must be memorized.
6. Coates uses the term root modality instead of deontic modality.
7. The form of the modal dolžen depends on the gender of the subject. The forms are dolžen
(for masculine singular nouns), dolžna (feminine singular), dolžno (neuter singular) and
dolžny (plural). The subject is in the nominative.
8. In the corpus, dolžen is very often accompanied by a main verb. Only in 1.7% of all cases
was there no main verb present. With other modals it is more common to omit a main verb
when it can be recovered from the context.
9. Syntactically, dolžen ne does not behave like ne dolžen, as explained in De Haan 1997.
Narrow scope negation fails the tests for sentence negation and must be considered an instance
of constituent negation. The combination dolžen ne is not the only combination with this
behavior; see Section 7 below on nado ne.
10. The impersonal verb prixodit’sja is the imperfective member of an aspectual pair, of
which the verb prijtis’ is the perfective member. It is one of a very few aspectual modal pairs
in Russian and the existence of the two verbs is due to the fact that their origin lies in motion
verbs. In fact, the pair prixodit’sja/prijtis’ can still be used as full verbs, with the meaning of
‘end up’. The distribution of prixodit’sja and prijtis’ is unclear and no attempt is made in this
paper to explain the reasons underlying the choice of prixodit’sja and prijtis’.
11. In the corpus, the combination ne možno does not occur, although možno ne does
sporadically.
12. In addition, nado can also serve as an adverb meaning ‘need’, as in the following
sentence:
(i) Odna iz našix ženščin zajavila, čto ne nado nam nikakix plat’ev iz-za granicy.
‘One of our women declared that we don’t need any dresses from abroad.’
(Literaturnaja gazeta, 1989)
This use of ne nado occurred 21 times in the corpus. Because this is not a modal usage, these
21 cases are disregarded in the rest of the paper.
13. Rappaport (1985: 212–3) explains this by stating that ne nado in its narrow scope
interpretation can be looked at as a negated deontic possibility by applying the elementary
modal logic conversion rule necessary not == not possible, or must not == can’t. In contexts
of negated possibility in Russian, the imperfective is also required. This explanation entails
that the modal nado can be viewed as either a strong modal (need) or a weak modal (can)
but the latter only in negative contexts. This seems strained.
14. This can be demonstrated with syntactic tests, such as the addition of tag questions or
tags with not even (or its equivalent in the language under discussion). See De Haan (1997)
for discussion.
References
Bybee, J., Perkins, R. and Pagliuca, W. 1994. The Evolution of Grammar: Tense, Aspect, and
Modality in the Languages of the World. Chicago: University of Chicago Press.
Chvany, C. V. 1974. The grammar of Dolžen: Lexical entries as a function of theory. In Slavic
Transformational Syntax. Michigan Slavic materials 10, R. D. Brecht and C. V.
Chvany (eds), 78–122. Ann Arbor: University of Michigan.
Coates, J. 1983. The Semantics of the Modal Auxiliaries. London: Croom Helm.
Collins, P. 1991. The modals of obligation and necessity in Australian English. In English
Corpus Linguistics, Karin Aijmer and Bengt Altenberg (eds), 145–65. London and New
York: Longman.
De Haan, F. 1997. The Interaction of Modality and Negation: A Typological Study. New York:
Garland.
Flier, M. S. and A. Timberlake (eds). 1985. The Scope of Slavic Aspect. Columbus, OH: Slavica
Publishers, Inc.
Forsyth, J. 1970. A Grammar of Aspect: Usage and Meaning in the Russian Verb. Cambridge:
Cambridge University Press.
Grenoble, L. A. 1992. Double negation in Russian. Linguistics 30, 731–52.
Horn, L. R. 1989. A Natural History of Negation. Chicago: University of Chicago Press.
Kytö, M. 1987. On the use of the modal auxiliaries indicating “possibility” in Early
American English. In Historical Development of Auxiliaries, M. Harris and P. Ramat
(eds), 145–70. Berlin: Mouton de Gruyter.
Kytö, M. 1991. Variation and Diachrony, with Early American English in
Focus: Studies on can/may and shall/will. Frankfurt a.M.: Peter Lang Verlag.
Nichols, J. 1985. Aspect and Inversion in Russian. In Flier and Timberlake (eds) 94–117.
Palmer, F. R. 1986. Mood and Modality. Cambridge: Cambridge University Press.
Rappaport, G. C. 1985. Aspect and Modality in Contexts of Negation. In Flier and
Timberlake (eds) 194–223.
Tottie, G. 1985. The negation of epistemic necessity in Present-day British and American
English. English World-Wide 6, 87–116.
Van der Auwera, J. and Plungian, V. A. 1998. Modality’s semantic map. Linguistic Typology
2, 79–123.
Chapter 6
Formulaic language in
English academic writing
A corpus-based study of the formal
and functional variation of a lexical phrase
in different academic disciplines*
David Oakey
University of Birmingham
1. Introduction
There has long been evidence from cognitive psychology, from studies of first
and second language acquisition, and from textual description, which suggests
that speakers may possess a non-homogeneous store of language knowledge
consisting of a system of generative grammatical rules and a store of pre-
assembled patterns, and that a speaker at times ‘bypasses’ the rules and retrieves
a pre-fabricated pattern instead. From a cognitive perspective Pawley and Syder
argued that the majority of a speaker’s output is in some part memorised, and
only “a minority of spoken clauses are entirely novel creations in the sense that
the combination of lexical items is new to the speaker.” (Pawley and Syder
1983: 205). Bolinger drew on the work of Van Lancker and suggested that
lateralisation of functions in the cortex “points to a side which files things and
a side which puts them together,” (Bolinger 1976: 13), and that formulaic
language is “part of the automatic or semi-automatic store which continues to
be more or less automatic, even when passed through the analytical sieve that
separates them.” (ibid.: 13). From a psycholinguistic perspective, Peters (1983
cited in Weinert 1995: 181) found evidence of formulaic language in first
language acquisition, and Hakuta (1974) saw it in data from child second
language learning.
Evidence for formulaic language is also apparent from the study of written
texts, particularly in corpus-driven research into collocation and patterns.
Sinclair’s ‘idiom principle’ resulted from the observation that speakers and
writers can use “a large number of semi-preconstructed phrases that constitute
single choices, even though they might appear to be analyzable into segments.”
(Sinclair 1991: 110). Hunston and Francis (2000) developed their previous
work on grammar patterns to propose a corpus-driven grammar of English
described in terms of patterns. Partington (1998) and Wray (1999) have
presented thorough surveys of formulaic language across these different fields
of reference.
The above evidence suggests that formulaic language of some kind features
widely in language learning and language use and thus may be important for
learners and users of second languages, as well as their teachers. This was
pointed out by Cowie, for example, who held that “the sheer density of ready-
made units in various types of written text is a fact that any approach to the
teaching of writing to foreign students has to come to terms with.” (Cowie
1992: 10). Lewis, an influential force in raising awareness of formulaic language
in English language teaching during the 1990s, similarly contends that
chunking of written text principally involves words, word partnerships
and, for those learning to write in a particular genre such as academic
English, developing an awareness of the sentence heads and frames
typical of the genre. (Lewis 1996: 15)
As already mentioned, the focus of the present study is the lexical phrase, a term
which has been used frequently in the English language teaching literature over
the past decade, not least due to the contribution made by Lewis. The term
dates back at least to an Artificial Intelligence paper by Becker (1975), although
it is also used to refer to types of compound nouns in the field of Information
Retrieval (cf. Krovetz 1997).
Nattinger and DeCarrico (1992) used the term to describe a pedagogically
applicable unit of formulaic language, one which had a categorical form and
pragmatic discourse function. They specified lexical phrases for, among other
uses, helping students with the organisation and form of their essays, based on
the observation that:
the typical essay a student writes in North American universities…adheres to
the following structure:
(1) Opening
a. Topic priming: sets the scene and prepares the reader for what is to
follow.
b. Topic nomination:
(a) statement of purpose: explains what the writer intends;
(b) statement of topic: explains what the writer will talk about;
c. Statement of organisation: explains how the writer will talk about the
topic
(2) Body: sets forth the argument, conveys the information.
(3) Closing: brings the argument to a close.
(Nattinger and DeCarrico 1992: 164)
They then provide lists of “representative lexical phrases for the above
categories.” (Nattinger and DeCarrico 1992: 165). However, this model of
academic discourse organisation, with three complex stages for ‘opening’, yet
only one each for ‘body’ and ‘closing,’ appears rather superficial when compared
with the texts which more advanced university students are expected to produce
and publish. While Nattinger and DeCarrico specified these lexical phrases for
wider application in the teaching of English academic writing, it is not clear to
what degree these phrases are “representative” of other genres such as theses,
journal articles and so on. One reason for this lack of clarity is the data used by
Nattinger and DeCarrico in formulating these lexical phrases. This data is rather
briefly described as “written discourse collected from a variety of textbooks for
ESL, textbooks for academic courses, letters to the editor of various news
publications, and personal correspondence.” (Nattinger and DeCarrico
1992: xvi). There is little information about the quantity and attestedness of this
data which obviously crosses a number of genres. Since Nattinger and
DeCarrico’s work, genre analysis has revealed much about the complex ways in
which texts in different genres are produced, leading to an interest in the
patterning and construction of writing in individual disciplines. It would seem
that lexical phrases may need to be more genre-specific than currently specified.
There arise, therefore, two questions which this study attempts to answer:
Are lexical phrases replicable as specified in authentic corpus data? — i.e. do
word strings with the same form as these lexical phrases occur in published
academic prose? The second question relates to the functional discourse
model: If such phrases do occur, what is their function in different disciplines?
The answer to this question has relevance for teachers: if lexical phrases are
seen to vary across disciplines, then a one-size-fits-all pedagogical approach
will not be sufficient. There now follows a brief description of the lexical
phrase used in this study.
The object of the present study is the lexical phrase it is/has been (often)
asserted/believed/noted that X. Nattinger and DeCarrico term this a “sentence
builder” lexical phrase which provides the framework for whole sentences
(ibid.: 42). This “sentence builder” category is discontinuous and highly
variable, and this particular lexical phrase has the potential to frame a long and
complex sentence. A writer whose first language was not English would need a
sound grasp of what variation is and is not permissible in order to do this
successfully. It would seem that more paradigmatic variation is possible than
specified, for example verbs such as argue or claim could be used here. Similarly,
syntagmatic variation would also seem possible: there is no reason why often
need stay in the specified position after been. The function of this lexical phrase
is intended to be “topic priming”, something defined by Nattinger and
DeCarrico for academic writing as the way the writer “sets the scene and
prepares the reader for what is to follow.” (ibid.: 164).
5. Methodology
Texts 91 23 18
Total words 2,607,749 1,413,493 620,424
Average words per text 28,657 61,456 34,468
equally. Nonetheless, it is fair to say that a subset of this size yields sufficient
occurrences for meaningful comparisons to be made between sub-genres.
5.2 Searches
The subset was searched using WordSmith Tools Version 3.0 (Scott 1999), and
strings were identified which had the same form as Nattinger and DeCarrico’s lexical
phrase. These occurrences were not immediately classed as lexical phrases. By
definition a lexical phrase is a discontinuous formulaic sequence which has
been assigned a pragmatic function, and thus each occurrence of a pattern
needs to be examined in context in order to identify the function it appears to
have in the text. Thus, two conditions need to be satisfied in order to identify a
lexical phrase: its form must approximate to the specified string, and then the
pragmatic function of the phrase must be the one specified.
Corpus analysis can speed the identification of word strings in a large
corpus, but traditional intuitive judgement must still play a large part in
deciding whether a particular string performs a particular function. For
example, if a researcher is looking for examples of it is the case that X, the
concordancing software could be instructed to search the corpus for all
occurrences of case where that occurs within nine words to the right. However, it is
still the researcher’s task to identify and remove those examples which refer to
legal or medical cases and so on. In proposing a pragmatic function for it is the
case that, such as “topic priming”, it is still the researcher’s intuition which
makes the judgement based on the context in which the string occurs.
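The kind of proximity search described above can be sketched in a few lines of Python. This is not how WordSmith Tools is implemented; the function name, the simple tokeniser, the five-word left context and the sample sentences are all assumptions made purely for illustration.

```python
import re

def find_case_that(text, window=9):
    """Return (left context, node, right context) for every 'case' that has
    'that' among the next `window` word tokens to its right."""
    tokens = re.findall(r"[A-Za-z']+", text)  # crude word tokeniser (assumption)
    hits = []
    for i, tok in enumerate(tokens):
        if tok.lower() != "case":
            continue
        right = tokens[i + 1 : i + 1 + window]
        if any(t.lower() == "that" for t in right):
            left = tokens[max(0, i - 5) : i]
            hits.append((" ".join(left), tok, " ".join(right)))
    return hits

# Invented sample text: one target use, one legal use, one use without 'that'.
sample = (
    "It is the case that formulaic sequences are common. "
    "The court heard the case and ruled that the appeal should fail. "
    "Each case was reviewed."
)

if __name__ == "__main__":
    for left, node, right in find_case_that(sample):
        print(f"{left:>30} [{node}] {right}")
```

In this sample the search returns the legal use of case alongside the target string, so the concordance lines still have to be read and filtered by the researcher.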
As mentioned above, sentence builder lexical phrases are highly variable
and discontinuous. This possibility of variation means that a search using a
fixed search string, e.g. it has been shown that, would yield only occurrences of
that string and miss any syntagmatic and paradigmatic variation. Similarly,
searching this corpus subset using the wild card search string it has been * that
would only highlight paradigmatic variation, and miss syntagmatic variation
such as the insertion of adverbs or conjunctions.
Allowance must also be made for scanning infelicities in the corpus subset.
There were many instances where two spaces occur between words
(e.g., it is assumed that …). These would not be picked up by a search string
with a single space. Cumulative wildcard search strings (Table 2 below) were
therefore used in order to ensure that all possible patterns, not just a
particular string, were found.
it * is                      has been
it is * that                 has * been
it is * * that               has * * been
it is * * * that             has * * * been
it is * * * * that           has * * * * been
it is * * * * * that         has * * * * * been
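The cumulative search strings listed above can be illustrated with a short sketch using regular expressions. It is not a description of WordSmith Tools: it assumes that each * stands for exactly one word and that any run of whitespace may separate words, which also covers the double spaces mentioned earlier. All function names and the sample text are hypothetical.

```python
import re

def wildcard_to_regex(pattern):
    """Compile a search string in which '*' stands for exactly one word and
    any run of whitespace may separate neighbouring words."""
    parts = [r"\S+" if tok == "*" else re.escape(tok) for tok in pattern.split()]
    return re.compile(r"\b" + r"\s+".join(parts) + r"\b", re.IGNORECASE)

def cumulative_patterns(prefix, suffix, max_gap):
    """Build search strings with 1..max_gap wildcards between prefix and suffix."""
    return [" ".join([prefix] + ["*"] * n + [suffix]) for n in range(1, max_gap + 1)]

def search(text, patterns):
    """Collect every match of every pattern, in pattern order."""
    found = []
    for p in patterns:
        found.extend(m.group(0) for m in wildcard_to_regex(p).finditer(text))
    return found

# Invented sample with a double space and two inserted adverbs.
sample = (
    "It is  assumed that X. "
    "It is now widely believed that Y. "
    "It is believed that Z."
)

if __name__ == "__main__":
    for match in search(sample, cumulative_patterns("it is", "that", 3)):
        print(repr(match))
```

Matching on runs of whitespace rather than a single space is one way to avoid missing strings affected by scanning errors. Widening the gap one wildcard at a time picks up inserted adverbs or conjunctions.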
6. Results: Form
The first question relating to the form of lexical phrases is easily answered. It
seems that in general strings of the form it is/has been (often)_________ that X
containing a passive verb form are not very frequent in the corpus subset. There
are a total of 362 simple present occurrences per million words, and 261 present
perfect occurrences (see Table 3 below). Biber et al. (1999: 674) find around
2000 that-clause types per million words of academic prose. Looking for the
exact forms specified by Nattinger and DeCarrico, the simple present form with
believed is the only form to occur across all three sub-genres, and the present
perfect form only occurs with noted.
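Frequencies per million words, as reported here, allow sub-genres of different sizes to be compared. As a minimal sketch of that normalisation (the function name and the raw counts used in the example call are hypothetical and are not taken from the study):

```python
def per_million(raw_count, corpus_words):
    """Normalise a raw frequency to occurrences per million words."""
    return raw_count * 1_000_000 / corpus_words

if __name__ == "__main__":
    # Hypothetical figures: 5 occurrences in a 2,000,000-word sample.
    print(per_million(5, 2_000_000))
```

A raw count of 5 in a two-million-word sample thus corresponds to 2.5 occurrences per million words.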
In general, occurrences of word strings of the form it is/has been ____ that
X containing a passive verb form vary across sub-genres (see Appendix 1).
Writers of medical texts use the most overall, a total of 269 per million words, and
social science writers use the fewest. Medical writers are also alone in using more of
these strings in the present perfect than in the simple present. Technical writing,
by contrast, uses nearly four times as many strings in the simple present as in
the present perfect.
Table 3. Occurrences of word strings of the form it is/has ____ that X per million words
Social Science   %   Medicine   %   Technical/Engineering   %   Total
Turning to other verbs used in the simple present form of this string, i.e. it
is _____ that X, no verb occurs more than 17 times per million words (see
Appendix 2). Biber et al. (1999: 663), when discussing verbs which control
that-clauses, deem features which occur this infrequently “less common”. The
strings it is known that and it is concluded that occur most frequently, both of
them in medical texts. The verb which occurs most across all three sub-genres
is assumed, which occurs between 10 and 12 times.
For strings in the present perfect, i.e. with the form it has been _____ that X,
the most striking feature of the results is that show and suggest are the most
common verbs for all three disciplines. In medical texts these verbs occur more
than 40 times per million words, which makes them “relatively common” by
the Biber criteria (see Appendix 3).
7. Results: Function
The second question relates to the function of these strings: whether or not they
can be termed lexical phrases in Nattinger and DeCarrico’s sense, and whether
variation across sub-genres is important for teachers of writing to students in
different academic disciplines. One example will serve to illustrate the
importance of context in judging the function of a string of the form it is/has been
_____ that X. In example 1 below, it is the writer who is doing the recognising,
[Figure: bar chart by sub-genre; horizontal axis: Social Science, Medical, Technical; vertical axis: 0–300]
[Figure: bar chart by verb; series: SocSci, Medical, Technical; horizontal axis: Verb; vertical axis: 0–16]
Appendix 3: Formal variation of it has been _____ that X (more than two occurrences per million words)
[Figure: bar chart by verb; series: SocSci, Medical, Technical; horizontal axis: Verb; vertical axis: 0–50]
(2) It is widely recognized that the proportion of women who suffer mental
disorders — particularly depression — exceeds that of men (Cochrane,
1983).
A related phenomenon in strings of this type has been noted by Johns (1991)
who observed that tense choice also affects function. He suggests that:
in English science and engineering academic abstracts, the present perfect is
specifically used to refer to the work of other scientists. For example It is
proposed that … suggests that the writer of the abstract is doing the proposing,
but It has been proposed that … suggests that the proposing is done by someone
other than the writer.
(Johns 1991 quoted in Baker 1992: 101)
lexical phrases are used for a function of this type, although with far more
paradigmatic variation allowed than simply asserted, believed, or noted.
(3) 7.2.1. Lexical Stress
An alternative means of increasing the number of units in the input
utterance, and thereby decreasing the number of word paths found,
would be to include stress in the lexicon and input utterance. It is
generally recognised that there is a great deal of information in the speech
wave — particularly prosodic information — which we are not yet able
to isolate and use.
(4) Abstract
It has been thought that aircraft maintenance problems i.e. planning/scheduling
activities, aircraft system/equipment failure diagnosis, etc.,
are too large and complex to be tackled successfully with computers.
7.2 Support
The most common function of these strings in all sub-genres is to bring in
support from outside sources. Writers bring in non-conflicting factual
information or arguments from outside sources in order to lend authority to their
position or stance. The fact that there is a clearly identifiable and thus teachable
function performed by this string means it could justifiably be termed a lexical
phrase in the sense intended by Nattinger and DeCarrico.
This function was seen in two varieties, “non-cited” and “cited”. “Non-
cited support” is when the writer introduces outside information as support,
the source of which is not attributed in any way (see examples 5 and 6 below).
This is the most common function of this lexical phrase in the corpus subset
(see Appendix 4), particularly in medical texts, where nearly 70% of
occurrences of the phrase in the present perfect performed this function.
(5) It is now realised that a million words is insufficient to produce an
adequate model of a language since many of the phenomena of the language
are so rare that they will be absent from such a corpus.
(6) The number of strands in the tight junctions may correlate inversely
with the permeability of the epithelium and it has been shown that the
crypt tight junctions have fewer strands than the villous junctions.
Therefore, the results of the current study may indicate that the tight
junctions are functionally altered, thereby allowing PT-gliadin to pass
into the intercellular space.
Appendix 4 Functions of it is/has been ____ that X
[Figure: bar chart; horizontal axis labels (given twice): Social Studies, Medical, Technical; vertical axis: 0–80; series: Topic priming, Self reference, Support (non-cited), Support (cited), Straw Man, Discarded; verbs labelled on the bars include report, show, suggest, argue, estimate, point out, accept, know, say]
It should be pointed out here that putting linguistic items in tidy functional
categories can obscure the fact that they can have multiple functions, a
phenomenon described by Moon (1998:21) as “cross-functioning.” So, for example,
in a journal article abstract the lexical phrase it is argued that is at one stage “topic
priming” by introducing the reader to the main idea contained in the paper, but
also acting as a “straw man” since the topic is subsequently negatively evaluated.
8. Conclusions
This study has attempted to test the notion of the lexical phrase in attested
corpus data by highlighting the relationship between the form and function of
one particular formulaic string. The results show that strings of this form are
indeed used by academic writers, but it is also clear that, if we accept in
principle the definition of a lexical phrase, the functions performed by this particular
phrase are more varied than originally specified. In addition to “topic priming”,
instances have been found in all three sub-genres where this phrase has been
used for attributed and non-attributed support, negatively evaluated
statements, and reference within the text.
While the main finding of this study is clear, more work remains to be done.
Since not all the texts in this BNC subset are complete, the distribution of
occurrences of different functions within whole texts, e.g. at the beginnings,
middles, or ends of sections or chapters, cannot be fully determined. It could be
argued that these results are not useful since a corpus of published academic
writing was searched for occurrences of something which was specified for use
in student academic essay writing. However, if we accept the basic definition of
the lexical phrase, we should also acknowledge that the functions of similar word
strings are likely to vary between these forms of writing, not least because of the
differences in length. For example, writing it has often been argued that X may
prime the topic of a five-paragraph student essay, but it is clear from the results
of this study that it is not the only possible function of this string. Therefore it is
valid for published data to be consulted in an attempt to make the lexical phrase
relevant to teachers across the broader field of academic writing.
Moving on to the wider debate about whether lexical phrases and formulaic
language are of use for teaching academic writing, opinions are divided. Lewis
(1993: 96) argues that “correctly identified lexical phrases can be presented to L2
learners in identifiable contexts, mastered as learned wholes and thus become
an important resource to (sic) mastering the syntax.” (emphasis added). It is
hoped that this study has not only “correctly identified” the various forms and
uses of this particular phrase, but also has suggested pedagogically useful
functions.
Notes
* The author wishes to thank the Centre for Research in Language and Linguistics, based in
the Language Institute at the University of Hull for supplying the copy of the British National
Corpus used in this study.
References
Arnaud, P. L. J. and Bejoint, H. (eds) 1992. Vocabulary and Applied Linguistics. Basingstoke:
Macmillan.
Aston, G. 2001. “Text Categories and Corpus Users: A Response to David Lee.” Language
Learning & Technology 5/3: 73–76.
Baker, M. 1992. In Other Words: A Coursebook on Translation. London: Routledge.
Becker, J. 1975. “The Phrasal Lexicon.” in Nash-Webber, B. and Schank, R. (eds) 70–77.
Biber, D., Johansson, S., Leech, G., Conrad, S., and Finegan, E. 1999. (eds) Longman
Grammar of Spoken and Written English. London: Longman.
Bolinger, D. 1976. “Meaning and Memory.” Forum Linguisticum 1/1: 1–14.
British National Corpus v 1.0 (1995) Oxford: Oxford University Computing Service.
British National Corpus World Edition (2000) Oxford: Oxford University Computing Service.
Cowie, A. P. 1992. “Multiword Lexical Units and Communicative Language Teaching.” in
P. L. J. Arnaud and H. Bejoint (eds). 1–12.
Hakuta, K. 1974. “Prefabricated Patterns and the Emergence of Structure in Second
Language Acquisition.” Language Learning 24/2: 287–297.
Hunston, S. 1995. “A corpus study of English verbs of attribution.” Functions of Language
2/2: 133–158.
Hunston, S. and Francis, G. 2000. Pattern Grammar. Amsterdam: Benjamins.
Krovetz, R. 1997. “Homonymy and polysemy in Information Retrieval.” in Proceedings of the
35th Annual Meeting of the Association for Computational Linguistics: 72–79.
Lee, D. 2001. “Genres, registers, text types, domains, and styles: clarifying the concepts and
navigating a path through the BNC jungle.” Language Learning & Technology 5/3: 37–72.
http://llt.msu.edu/vol5num3/lee/default.html
Lewis, M. 1993. The Lexical Approach. Hove: Language Teaching Publications.
Lewis, M. 1996. “Implications of a lexical view of language”. in J. Willis, D. Willis (eds). 10–16.
Moon, R. 1998. Fixed Expressions and Idioms in English: A corpus based approach. Oxford:
Clarendon Press.
Nash-Webber, B. and Schank, R. 1975. (eds) Theoretical Issues in Natural Language Processing
1. Cambridge, Mass.: Bolt, Beranek, and Newman.
Nattinger, J. R. and DeCarrico, J. 1992. Lexical Phrases and Language Teaching. Oxford:
Oxford University Press.
Partington, A. 1998. Patterns and Meanings: Using Corpora for English Language Research
and Teaching. Amsterdam: John Benjamins.
</TARGET "oak">
Pawley, A. and Syder, F. H. 1983. “Two puzzles for linguistic theory: nativelike selection and
nativelike fluency.” In J. C. Richards, R. W. Schmidt. (eds). 191–227.
Richards, J.C. and Schmidt, R.W. 1983. (eds). Language and Communication. London:
Longman.
Scott, M. 1999. WordSmith Tools 3.0. Oxford: Oxford University Press.
Sinclair, J. McH. 1991. Corpus, Concordance, Collocation. Oxford: Oxford University Press.
Weinert, R. 1995. “The Role of Formulaic Language in Second Language Acquisition.”
Applied Linguistics 16/2: 180–205.
Willis, J. and Willis, D. (eds). Challenge and change in language teaching. Oxford:
Heinemann.
Wray, A. 1999. “Formulaic language in learners and native speakers.” Language Teaching 32:
213–231.
<TARGET "cor" DOCINFO
KEYWORDS ""
WIDTH "150"
VOFFSET "4">
Chapter 7
Viviana Cortes
Iowa State University
1. Introduction
For several decades, linguists have been interested in the study of lexical
association patterns, that is, the way words seem to co-occur with another word
or words. Firth (1964) introduced the terms ‘collocation’ and ‘collocability’ to
express the habitual occurrence of a word with another word, and he tried to
explain the way in which collocations help complete the meaning of the word.
While in the past the study of these frequent lexical co-occurrences was
completed from a rather perceptual point of view, the growth of corpus
linguistics studies and the development of computer programs specifically
designed for the analyses of language corpora have introduced new research
paths and tools. These research methodologies have been employed in the study
of the distribution of lexical association patterns in natural contexts (Biber,
1993, 1996; Biber & Conrad, 1999; Biber, Johansson, Leech, Conrad, & Finegan,
1999; De Cock, 1998), which allowed not only the identification of collocations,
which are two words that tend to frequently co-occur in real language, but also
of ‘lexical bundles’, which are extended collocations, sequences of three, four,
five, or six words that statistically co-occur in a register (Biber et al., 1999).
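
The idea of a lexical bundle as a recurrent word sequence can be illustrated with a short sketch. The Python code below counts every 4-word sequence in a small set of texts and keeps those that reach a frequency cut-off and occur in more than one text. The function name, the cut-off values, and the toy texts are assumptions made for illustration; they are not the procedure or the thresholds used by Biber et al. (1999) or in this study.

```python
import re
from collections import Counter, defaultdict


def find_bundles(texts, n=4, min_count=2, min_texts=2):
    """Return {n-word sequence: frequency} for sequences that recur.

    min_count and min_texts are illustrative cut-offs, not values
    taken from the chapter.
    """
    counts = Counter()
    spread = defaultdict(set)  # bundle -> indices of the texts containing it
    for index, text in enumerate(texts):
        # Lowercase and keep letters plus apostrophes (e.g. "don't").
        words = re.findall(r"[a-z']+", text.lower())
        for start in range(len(words) - n + 1):
            bundle = " ".join(words[start:start + n])
            counts[bundle] += 1
            spread[bundle].add(index)
    return {
        bundle: freq
        for bundle, freq in counts.items()
        if freq >= min_count and len(spread[bundle]) >= min_texts
    }


# Hypothetical toy "corpus" of three short texts.
sample_texts = [
    "At the same time the writer describes the back of the room.",
    "The author argues that at the same time the reader is misled.",
    "I sat at the back of the class.",
]
bundles = find_bundles(sample_texts)
```

With these toy texts, only sequences that appear in at least two different texts survive the filter. In a real study the cut-offs would normally be normalized to corpus size.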
Biber et al. (1999) used computational analyses of a large corpus to identify
the most frequent lexical bundles in academic prose and conversation, and
produced a comprehensive list of examples and their grammatical categorizations.
Modeled after their investigation, this study analyzes a corpus of writing
from freshman composition courses to identify the most frequent 4-word
lexical bundles produced by freshman university students, and to establish a
comparison with those 4-word lexical bundles most frequently found in
academic prose and conversation (Biber & Conrad, 1999). The study includes
grammatical and functional analyses of the most frequent 4-word bundles
The varied research conducted on the study of word combinations has come a
long way: from the ethnographic and qualitative studies in first and second
language acquisition of formulaic routines of the 70s to the quantitative corpus-
based studies of natural language in the 90s. In addition, several studies tried to
account for the use of word combinations and their frequency in language in an
intuitive way. The purpose of this section is to present a survey of studies of
word combinations that will help create a better framework of reference for the
study of lexical bundles.
A wide variety of studies have concentrated on the investigation of word
combinations from a pedagogical point of view (DeCarrico & Nattinger, 1988;
Fillmore, 1979; Hakuta, 1974; Howarth, 1998; Pawley & Syder, 1983; Peters,
1983; Yorio, 1980). Other studies are descriptive studies of word combinations,
both perceptual (Allerton, 1984; Becker, 1975; Chafe, 1994; Cowie, 1988; Fraser,
1990) and empirical studies (Altenberg, 1993, 1998; Biber et al., 1999; Butler,
1997; De Cock, 1998; Moon, 1998).
Many times studies that claim they analyze different types of word combi-
nations use different labels to name the same type of word sequence. At other
times, studies use the same label to define different types of expressions. Pawley
(1985) centers his work around speech formulas, which he defines as “a
conventional pairing of a particular formal construction with a particular
conventional idea” (p. 88). He believes that these expressions have something in
common with idioms, lexical items, and grammatical constructions, but do not
fit in any of those groups. Cowie (1988) does not call these expressions “speech
formulas” but uses a variety of terms such as “ready-made expressions”,
“multi-word units”, or “prefabricated routines.” He divides these combinations into
two groups, which are different in intended conveyed meaning and structural
level. The first category contains pragmatically specialized units which have
evolved meanings that reflect their function in discourse, such as good morning,
or how are you?. The second category groups expressions like kick one’s heels or
pass the buck, which have developed referential meaning by being used as
invariable units. Moon (1992) explains the importance of fixed expressions for
lexicographers, as fixed expressions have significant functions in text, and the
3. Lexical bundles
311 papers
Personal pronoun + lexical 44% I don’t know what 2% I will use this
verb phrase (+ complement
clause)
(verb/adjective +) to-clause
fragment 5% 9% are likely to be 2% to appeal to the
indicating location (e.g., the back of the, the bottom of the, the edge of the), time
(as in at the same time) and other phrases with varied meanings, such as
expressions that indicate quantity (as in a lot of the, a part of the, a wide range of,
a wide variety of), among other different meanings. In this grammatical group,
bundles also presented some special uses, as in the case of at the same time. In
certain examples, at the same time was used as a temporal marker, used to link
5. Conclusion
The results of this study reflect the importance of different types of analysis in the
investigation of lexical bundles. When first analyzed, the lexical bundles produced
by freshman students in composition courses looked structurally similar to those
used in academic prose. However, a closer look showed that although the
grammatical structures confirmed that similarity, the bundles used in many
cases served as temporal or location markers, which are bundles not exclusively
used in academic prose. The extended analysis of the bundles in context showed
that when writing for these freshman composition courses, students made an
effort to produce lexical bundles that resembled bundles used in academic prose
more than those bundles used in conversational language. This can be seen in the
complete absence of contractions in the lexical bundles identified in this corpus,
as well as in the absence of bundles starting with the pronoun I followed by a
verb of perception (e.g., I don’t think so, I don’t know what, I think it is, I know
what you) or any other bundles which occur exclusively in conversational style.
The analysis of the lexical bundles that most frequently occur in freshman
composition and the examples of their use in context indicated that the
instructional tasks designed for these particular composition courses influence
students’ use of certain bundles. This is the case, for example, of those lexical
bundles that act as location and temporal markers, and those bundles that work
as instructions to the readers. The use of these bundles was often directly related
to the tasks in the course, as in the case of descriptions, which require the use of
many location and time markers, and rhetorical analyses of texts, which call for
the use of expressions that help organize the discourse, such as those bundles used to
guide the reader.
It cannot be predicted if the use of these lexical bundles will transfer to
Plan I am going to
In the introduction, I am going to draw the reader in with a
story….
Purpose to be able to
…they want the government to be able to limit that type of
weapons.
References
Allerton, D.J. (1984). Three (or four) levels of word cooccurrence restriction. Lingua, 63, 17–40.
Altenberg, B. (1993). Recurrent word combinations in spoken English. In J. D’Arcy (Ed.),
Proceedings of the Fifth Nordic Association for English Studies Conference (pp. 17–27).
Reykjavik: University of Iceland.
Altenberg, B. (1998). On the phraseology of spoken English: The evidence of recurrent
word-combinations. In A. Cowie (Ed.), Phraseology (pp. 101–122). Oxford: Oxford
University Press.
Altenberg, B. & Eeg-Olofsson, M. (1990). Phraseology in spoken English: Presentation of a
project. In J. Aarts and W. Meijs (Eds.), Theory and practice in corpus linguistics
(pp. 1–26). Amsterdam: Rodopi.
Becker, J. (1975). The Phrasal Lexicon. Bolt Beranek and Newman Report No. 3081. AI
Report No. 28.
Biber, D. (1993). Co-occurrence patterns among collocations: A tool for corpus-based lexical
knowledge acquisition. Computational Linguistics, 19, 549–556.
Biber, D. (1996). Investigating language use through corpus-based analyses of association
patterns. International Journal of Corpus Linguistics, 1(2), 171–197.
Biber, D. & Conrad, S. (1999). Lexical bundles in conversation and academic prose. In H.
Hasselgard and S. Oksefjell (Eds.), Out of corpora: Studies in honor of Stig Johansson
(pp. 181–190). Amsterdam: Rodopi.
Biber, D., Johansson, S., Leech, G., Conrad, S., & Finegan, E. (1999). Longman grammar of
spoken and written English. London: Longman.
Chafe, W. (1994). Discourse, consciousness, and time. Chicago: The University of Chicago Press.
Cowie, A. (1988). Stable and creative aspects of vocabulary. In R. Carter and M. McCarthy
(Eds.), Vocabulary and language teaching (pp. 126–139). London: Longman.
DeCarrico, J. & Nattinger, J. (1988). Lexical phrases for the comprehension of academic
lectures. English for Specific Purposes, 7, 91–102.
De Cock, S. (1998). A recurrent word combination approach to the study of formulae in the
speech of native and non-native speakers of English. International Journal of Corpus
Linguistics, 3, 59–80.
Fraser, B. (1990). An approach to discourse markers. Journal of Pragmatics, 14, 383–395.
</TARGET "cor">
Pseudo-Titles in the press genre of various components of the International Corpus of English
Chapter 8
Charles F. Meyer
University of Massachusetts at Boston
(2) Do pseudo-titles have the same structure in these varieties that they
have in British, American, and New Zealand English?
(3) What is the relationship between pseudo-titles and equivalent appo-
sitional structures, both in news broadcasts and newspapers that
permit the use of pseudo-titles, and in those that do not?
To answer these questions, I studied the structure and use of pseudo-titles and
equivalent appositives in one type of media English — press reportage — in
seven regional components of the International Corpus of English (ICE): Great
Britain, East Africa, Jamaica, New Zealand, the Philippines, Singapore, and the
United States. I demonstrate that while pseudo-titles are still uncommon in
British press reportage, they are very widespread in the press reportage of other
varieties, even those directly influenced by British English. This finding is a
reflection of the powerful influence that the American media has had on the
evolution of English as an international language (see Crystal 1997: 82–95). In
addition, I show that while there are general constraints on the form of
pseudo-titles, in New Zealand and Philippine press reportage, innovative
forms are evolving, making these varieties different from the others. This
finding is significant because most studies of the influence of American English
on other Englishes have focused mainly on lexical items, and such studies, as
Peters (2001: 299) observes, do not present “the full linguistic bottle.” The
spread of pseudo-titles in press writing, therefore, not only shows that a
grammatical construction can be borrowed from one variety into another but
that once the construction is borrowed, the constraints on its usage can
change, leading to new forms.
In principle, because titles are markers of respect, they are capitalized and pseudo-titles
are not. In practice, however, usage is more mixed. In (3a), National Security Advisor is
capitalized; in (3b) it is not.
(3) a. Senator Richard Lugar (R-IN) has followed up on efforts which
began last month by a bi-partisan group of Senators, Representa-
tives, and a large [group] of farm organizations by writing directly to
National Security Advisor Samuel Berger. (The Washington Report
January 22, 1999, http://ianrwww.unl.edu/nwga/page8.htm)
b. On March 14, for instance, national security adviser Sandy Berger
went on NBC’s “Meet the Press” to announce that the White House
had responded “swiftly” to each charge of espionage brought to its
attention. (The New Republic, May 24, 1999)
However, as will be shown in 4.9, examples such as (5) were rare in the corpus,
even in newspapers disallowing the use of pseudo-titles, contexts in which we
would expect to find more constructions with a determiner. Moreover, there
are other examples, such as (6a), that are identical in structure to (5), except
that the proper noun occurs before rather than after the noun phrase contain-
ing the determiner:
(6) “That’s something that we do worry about,” said Mr. Bacon, the Penta-
gon spokesman. (New York Times, March 31, p. A11)
If (4)–(7) are considered collectively, it makes more sense to say that pseudo-
titles and equivalent appositives are related not through the process of deter-
miner deletion but through the process of “systematic correspondence” (Quirk
et al. 1985: 57f), a process that captures the relationship between constructions
that differ in form but that are similar in meaning.
Bell’s (1988) study of pseudo-titles illustrates that their usage is time sensitive.
In a matter of 10 years, Bell (1988: 338) found a perceptible rise in the use of
pseudo-titles in three of four radio stations whose broadcasts he studied in 1974
and 1984. This finding suggests that any study of pseudo-titles should be based
on texts collected during a similar time frame and at least a decade after Bell’s
(1988) study was conducted to determine whether pseudo-title usage is
continuing to increase. The various corpora included in the International
Corpus of English (ICE) fulfil both these criteria.
The ICE Project was begun by Sidney Greenbaum in the late 1980s to study
the evolution of English as a world language. Participating in the project are
approximately twenty regional teams representing such countries as Australia,
Canada, East Africa (Kenya and Tanzania), Great Britain, Hong Kong, India,
Ireland, Jamaica, New Zealand, Singapore, and the United States. Each regional
team began collecting in the early 1990s one million words of speech and
writing divided into (ca.) 2,000 word samples representing various kinds of
English: spontaneous conversations, broadcast discussions, speeches, learned
and popular writing, and fiction, to name but a few of the various types of
English that are being collected (cf. Nelson 1996: 29 for a complete listing of text
categories). Three components of ICE (Great Britain, East Africa, and New
Zealand) are currently complete, and other components are presently in
progress, with an interim release of the entire corpus due in the near future.
Within both the spoken and written sections of the ICE corpus are various
kinds of journalistic English: news broadcasts, press reportage, and press
editorials. Although pseudo-titles can be found in any kind of “hard news
reportage” (Bell 1988: 327), such as press reportage or news broadcasts, I
decided to restrict my discussion to press news reportage, since at the time that
this study was conducted a number of components had completed the press
reportage section of their corpus but relatively few had finished collecting and
transcribing news broadcasts. The following ICE components were included in
the study: East Africa, Great Britain, Jamaica, New Zealand, the Philippines,
Singapore, and the United States. I supplemented examples from these compo-
nents with examples taken from other sources.
Even though each component of ICE contains 20 samples of press report-
age, in many components, the same newspaper is represented in more than one
sample. Since the use of pseudo-titles is sensitive to the editorial policy of a
Table 1. Number of Different Newspapers Included Within the Various ICE Components
Component        Total
Great Britain    15
United States    20
New Zealand      12
Philippines      10
Jamaica           3
East Africa       3
Singapore         2
The relatively small number of different newspapers in East Africa, Jamaica, and
Singapore reflects the fact that these countries have small populations and that in two
of them (East Africa and Singapore) English is not a primary, but a second,
language.
From the samples represented in Table 1, two kinds of information were
obtained. First, all samples were examined to determine whether they con-
tained pseudo-titles. Second, ten samples from ten different newspapers were
subjected to a detailed linguistic analysis (described below). Since the ICE
components from Jamaica, East Africa, and Singapore did not contain 10
samples from 10 different newspapers, these varieties were excluded from this
part of the analysis. This left four components (USA, GB, NZ, and Philippines)
for this part of the analysis.
To carry out this analysis, each pseudo-title and equivalent apposition in
ICE-USA, GB, NZ, and Philippines was assigned a series of tags (see Table 2) that:
a. Identified the regional variety and sample the construction occurred in
(variables 1 and 2: ‘Country’ and ‘Sample #’)
b. Specified whether the construction was a pseudo-title or an equivalent
appositive (variable 3: ‘Type’)
c. Indicated, if the construction was an equivalent appositive, whether its
‘correspondence relationship’ (variable 4) to a pseudo-title involved total
equivalence:
Durk Jager, executive vice president (ICE-USA) → executive vice president
Durk Jager
determiner deletion:
the Organising Secretary, Mr Stephen Kalonzo Musyoka (ICE-East
Africa) → Organising Secretary Mr Stephen Kalonzo Musyoka
or partial equivalence:
Ted Shackley, deputy chief of the CIA station in Rome in the 1970s
(ICE-GB:W2C-010 #62:10) → CIA deputy chief Ted Shackley but not
?deputy chief of the CIA station in Rome in the 1970s Ted Shackley
d. Noted the ‘Form’ (variable 5) of the construction: whether it was a simple
NP, a genitive NP, or an NP with some postmodification.
e. Measured the ‘Length’ (variable 6) of the pseudo-title or unit of the
appositive that could potentially become a pseudo-title, from values of one
word to six or more words.
Country          Sample #    Type              Correspondence relationship  Form                            Length
US (1)           W2C001 (1)  Pseudo-title (1)  Total Equivalence (1)        Simple NP (1)                   One word (1)
East Africa (3)  W2C003 (3)                    Partial Equivalence (3)      Multiple Post-Modification (3)  Three words (3)
Singapore (7)
Coding the data this way allowed for the results to be viewed from a variety of
different perspectives. For instance, because each construction is given a
number identifying the regional variety in which it occurred, it was possible to
compare pseudo-title usage in, say, British English and Philippine English.
Likewise, each sample represents a different newspaper. Therefore, by recording
the sample from which a construction was taken, it was possible to know which
newspapers permit pseudo-title usage, and which do not, and to determine the
extent to which newspapers use pseudo-titles and equivalent appositives
similarly or differently. Noting the length of each pseudo-title helped determine
whether pseudo-titles were longer in some varieties than others, a hypothesis
prompted by an initial survey of the data, which suggested that pseudo-titles
were lengthier in the Philippine samples than in the other samples.
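
The coding scheme can be pictured as one record per construction, with one field per variable. The Python sketch below is only an illustration: the field names follow the six variables listed above, but the class, the helper function, the value labels, and the three sample records are hypothetical and are not data from the study.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Construction:
    country: str         # variable 1: regional variety
    sample: str          # variable 2: sample (newspaper) identifier
    kind: str            # variable 3: "pseudo-title" or "appositive"
    correspondence: str  # variable 4: "total", "det-deletion" or "partial"
    form: str            # variable 5: e.g. "simple NP", "genitive NP"
    length: int          # variable 6: length in words


def count_by(records, *fields):
    """Cross-tabulate records by the named fields."""
    return Counter(tuple(getattr(r, f) for f in fields) for r in records)


# Hypothetical records, invented for illustration only.
records = [
    Construction("US", "W2C001", "pseudo-title", "total", "simple NP", 1),
    Construction("GB", "W2C010", "appositive", "partial", "postmodified NP", 9),
    Construction("US", "W2C001", "appositive", "total", "simple NP", 3),
]

# Compare construction types across varieties.
by_country_and_type = count_by(records, "country", "kind")

# Identify the samples (newspapers) that contain at least one pseudo-title.
samples_with_pseudo_titles = {r.sample for r in records if r.kind == "pseudo-title"}
```

Because every record carries all six values, the same list can be regrouped by any combination of variables without recoding the data.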
Table 3 lists for each ICE component the number of newspapers that did or did
not contain pseudo-titles.
Component        Without pseudo-titles   With pseudo-titles   Total
Great Britain    7                       8                    15
United States    1                       19                   20
New Zealand      1                       11                   12
Philippines      0                       10                   10
Jamaica          0                       3                    3
East Africa      0                       3                    3
Singapore        0                       2                    2
Totals           9 (14%)                 56 (86%)             65 (100%)
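
The totals and percentages in Table 3 can be checked with a few lines of Python. The counts are copied from the table. The sketch assumes that the first figure in each row counts newspapers without pseudo-titles and the second those with them, as the surrounding discussion implies; the variable names are illustrative.

```python
# (newspapers without pseudo-titles, newspapers with pseudo-titles),
# copied from Table 3 under the column reading stated above.
table3 = {
    "Great Britain": (7, 8),
    "United States": (1, 19),
    "New Zealand": (1, 11),
    "Philippines": (0, 10),
    "Jamaica": (0, 3),
    "East Africa": (0, 3),
    "Singapore": (0, 2),
}

without_total = sum(without for without, _ in table3.values())
with_total = sum(with_pt for _, with_pt in table3.values())
grand_total = without_total + with_total

# Percentages rounded to whole numbers, as in the table.
pct_without = round(100 * without_total / grand_total)
pct_with = round(100 * with_total / grand_total)

print(without_total, with_total, grand_total, pct_without, pct_with)
```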
revealed that this newspaper did indeed allow pseudo-titles: the 2,000 word
sample in ICE-USA simply did not contain any examples, a consequence in this
case of the relatively small size of the sample. Not only did the New York Times
contain borderline examples of pseudo-titles, such as former Vice President Dan
Quayle (cf. 1.0), but it and one British newspaper from ICE-GB forbidding
pseudo-titles, the Guardian, contained pseudo-titles in sports reportage.
Commenting on the inclusion of pseudo-titles in the sports reportage of the
New York Times, Siegal and Connolly (1999: 334) note that this “exception” to
the editorial policy of the paper has its roots in “tradition” and is not intended
to cover all cases where a pseudo-title could potentially be used but only “sports
positions” (e.g. Red Sox Manager Pat Agneau); other kinds of pseudo-titles, such
as suspended Braves pitcher Leigh Dann, are prohibited. The acceptance of
pseudo-titles in sports reportage suggests that the prohibition against pseudo-
title usage does not always extend to less formal kinds of writing.
Although one of the British-influenced varieties, East Africa, did contain
newspapers with pseudo-titles, there were cases clearly illustrating a mixture of
British and American styles. Example (8) begins with a pseudo-title, Lawyer
Paul Muite, but two sentences later contains not another pseudo-title, but
instead an equivalent appositive, a lawyer, Ms Martha Njoka, that has very
marked characteristics of British newspaper English: a title, Ms, before the
proper noun (American newspapers, with the exception of the New York Times,
use only the name) and no punctuation following the title to mark it as an
abbreviation (American usage would mandate a period).
(8) Lawyer Paul Muite and his co-defendants in the LSK contempt suit
wound up their case yesterday and accused the Government of manipu-
lating courts through proxies to silence its critics…Later in the after-
noon, there was a brief drama in court when a lawyer, Ms Martha Njoka,
was ordered out after she defied the judge’s directive to stop talking
while another lawyer was addressing the court. (ICE-East Africa)
Figure 1 plots the frequency of pseudo-titles and appositives in the four ICE
varieties that were subjected to detailed linguistic analysis: ICE-USA, Philip-
pines, NZ, and GB. As Figure 1 demonstrates, there were significant differences
between the distributions of pseudo-titles and equivalent appositives in the four
[Figure 1. Bar chart: count of pseudo-titles (PT) and equivalent appositives (Appo) by country (US, Phil, NZ, GB)]
Statistical Test    Value     Degrees of Freedom   Significance Level
Chi square          65.686    3                    p = .000
Likelihood Ratio    67.832    3                    p = .000
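
For readers who want to see how the two statistics are computed, the sketch below implements the Pearson chi-square and the likelihood ratio (G2) for a contingency table in plain Python. The counts are hypothetical, since the raw frequencies behind the figure are not given in the text, so the output will not match the values reported above. The only point of contact is the degrees of freedom, which for a table of two construction types by four varieties is (2 − 1) × (4 − 1) = 3.

```python
import math


def chi_square_and_g2(table):
    """Return (Pearson chi-square, likelihood ratio G2, degrees of freedom)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    g2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (observed - expected) ** 2 / expected
            if observed > 0:  # a zero cell contributes nothing to G2
                g2 += 2 * observed * math.log(observed / expected)
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, g2, df


# Hypothetical counts, NOT the study's data.
# Rows: pseudo-titles, appositives. Columns: US, Phil, NZ, GB.
hypothetical = [
    [60, 80, 80, 10],
    [50, 20, 15, 30],
]
chi2, g2, df = chi_square_and_g2(hypothetical)
```

The sketch assumes that every row and column total is greater than zero; a significance level would then be read from the chi-square distribution with the returned degrees of freedom.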
[Figure 2. Bar chart: count of correspondence relationships (Total Equiv., Det. Del., Partial Equiv.) by country (US, Phil, NZ, GB)]
with the other samples containing pseudo-titles, the differences were still not
statistically significant.
In each of the varieties, as Figure 2 illustrates, the correspondence relation-
ship with the highest frequency was the category of partial equivalence. There
are purely linguistic reasons why most appositives were in this category. There
are two types of constructions in this category that inhibit conversion to a
pseudo-title: those containing some kind of postmodification (9a) (Bell
1988: 336)4 and those containing a genitive noun phrase (10a).
(9) a. Ted Shackley, deputy chief of the CIA station in Rome in the 1970s
(ICE-GB:W2C-010 #62:1)
b. ?deputy chief of the CIA station in Rome in the 1970s Ted Shackley
Figure 3. The Form of Pseudo-Titles and Equivalent Appositives in the ICE Varieties
[Bar chart: count of each form (Simple NP, Gen. NP, Post. Mod.) by type (PT, Appo)]
[Figure. Bar chart: count by length (1–4 Words, 5 or More Words) and country (US, Phil, NZ, GB)]
Although ICE-NZ, Phil, and USA also had many pseudo-titles that were two or
three words long, ICE-NZ and Phil had significantly more pseudo-titles that
were lengthier than five words (12a-f), making these two varieties quite distinct
from the other two varieties:
(12) a. Salamat and Presidential Adviser on Flagship Projects in
Mindanao Robert Aventajado (ICE-Philippines)
b. Oil and Gas planning and development manager Roger
O’Brien (ICE-NZ)
c. Time Magazine Asia bureau chief Sandra Burton (ICE-
Philippines)
d. corporate planning and public affairs executive director
Graeme Wilson (ICE-NZ)
e. Autonomous Region of Muslime Mindanao police chief
Damming Unga (ICE-Philippines)
f. Wesley and former New Zealand coach Chris Grinter. (ICE-NZ)
At the end of his discussion of pseudo-titles, Bell (1988: 341–3) offers a number
of predictions about the future development of pseudo-titles in the various
national varieties of English. Since the present study was conducted on texts
published 10–15 years after those analyzed by Bell (1988), it is possible to test
whether Bell’s (1988) predictions have been fulfilled.
One prediction that Bell (1988: 342) makes is that pseudo-titles will spread
to other registers of English, possibly even conversational English. Because only
press reportage was investigated in this study, this prediction is difficult to test.
But a search of the entire ICE-GB found pseudo-titles in the categories of
broadcast interviews, sports commentaries, news broadcasts, and skills/hob-
bies.6 Whether pseudo-titles will ever find their way into spontaneous dialogues
is highly questionable, primarily because they serve no honorific function. It is
quite common for us to address people as President or Professor because these
are titles of respect. But it would be quite another matter to walk up to an
individual and say, “Hello, former radio broadcaster Franco Gupta.” Con-
ceivably, a pseudo-title could be used when talking about someone, as in
“Yesterday I spoke with former radio broadcaster Franco Gupta.” But there is
little communicative need for such structures because equivalent appositives of
this type (e.g. a former radio broadcaster, Franco Gupta) are extremely uncom-
mon in speech (Meyer 1992: 116).
Bell (1988: 341) also predicts that the “staccato, formulaic nature” of the
pseudo-title will increase, and that many pseudo-titles, such as Prime Minister,
will become acceptable as full titles. Because all ICE corpora are relatively short,
it is not possible to determine whether a given pseudo-title is formulaic. For
instance, following the U.S. invasion of Panama, Manuel Noriega was continually
referred to in the American press as Panamanian dictator Manuel Noriega.
But to discover usages such as this, one would need a huge monitor corpus,
such as the Bank of English, containing millions of words of text covering an
extended period of time. However, because three of the components studied —
ICE-East Africa, GB, NZ — are complete, it is possible to determine the extent
to which in the spoken and written language of a given variety, an expression
such as Prime Minister has fully moved into the category of titles.
Component        Full title   Pseudo-title   Appositive   Total
Great Britain    4            0              21           25
New Zealand      15           5              9            29
East Africa7     12           13             22           47
Table 5 lists the number of times Prime Minister was used as a full title (13a),
a pseudo-title (13b), or an equivalent appositive (13c).
(13) a. Prime Minister Obed Diamini (ICE-East Africa)
b. former prime minister Sir Robert Muldoon (ICE-NZ)
c. the Prime Minister Mr John Major (ICE-GB:S2B-012 #10:1:A)
Prime Minister is well established as both a full title and pseudo-title in ICE-NZ
and ICE-East Africa. However, in ICE-GB it occurred very infrequently as a full
title, most frequently in an appositive construction, and never as a pseudo-title.
These findings are consistent with other findings in this study: that of all the
varieties analyzed, British press English is the most conservative and the most
resistant to change.
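
The contrast among the three varieties is easier to see as proportions. The short Python sketch below converts the counts in Table 5 into shares of all Prime Minister constructions in each component. It assumes that the three figures in each row are, in order, full titles, pseudo-titles, and equivalent appositives, which is the order in which the text lists them; the variable names are illustrative.

```python
# Counts copied from Table 5, in the assumed order:
# (full title, pseudo-title, equivalent appositive).
table5 = {
    "Great Britain": (4, 0, 21),
    "New Zealand": (15, 5, 9),
    "East Africa": (12, 13, 22),
}
labels = ("full title", "pseudo-title", "appositive")

shares = {}
for component, counts in table5.items():
    total = sum(counts)
    # Percentage of each usage within the component, to one decimal place.
    shares[component] = {
        label: round(100 * count / total, 1)
        for label, count in zip(labels, counts)
    }

for component, row in shares.items():
    print(component, row)
```

On this reading, appositives account for 84% of the British instances, against roughly 31% in New Zealand and 47% in East Africa.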
The larger explanation for this finding lies in another of Bell’s (1988: 342)
predictions, namely that “as long as the prestige media within a country hold to
determiner deletion, the rule will keep its social force.” This study has
demonstrated quite strongly that the British-based stigma against pseudo-titles has
virtually disappeared world-wide, even in those varieties, such as East African,
Jamaican, New Zealand, and Singapore English, that are British influenced.
A few isolated newspapers (the New York Times in the United States,
the Press in New Zealand) still follow the British model. But outside
Great Britain, pseudo-titles are quite common — so common, in fact, that they
were used more frequently in the New Zealand and Philippine samples than in
the American samples, even though pseudo-titles originated in American
English. And in the New Zealand and Philippine samples new types of pseudo-
titles are evolving that are longer than pseudo-titles in the other varieties.
Notes
* I wish to thank Dr. Jie Chen, Computing Services, University of Massachusetts at Boston,
for help with the statistical tests discussed in this paper.
1. Examples taken from ICE components are identified simply by reference to the particular
component. Full citations are given for examples taken from sources outside the ICE
components.
2. Two statistical tests were applied to the results discussed in this section: chi square and
likelihood ratio. The likelihood ratio test (which is sometimes referred to as the “log
likelihood” or G2 test) is considered more reliable for lower frequencies, as is the case for the
data in this paper (cf. Dunning 1993 for details).
3. It is not possible to obtain valid statistical results for tests such as chi square when frequen-
cies fall below 5, as was the case with the category of exact equivalence in ICE-USA, Phil, and
NZ. To increase frequencies, I therefore combined the results for the categories of exact
equivalence and determiner deletion, since these categories represent very similar stylistic
choices (e.g. Dan Quayle, former Vice President or Dan Quayle, the former Vice President).
4. Bell (1988: 336) cites other linguistic criteria favoring or disfavoring conversion to a
pseudo-title not investigated in this study, such as the articles a and the favoring conversion.
5. A third statistical test, Kruskal Wallis, is included here because this test is less sensitive to low
cell frequencies (e.g. ICE-USA had only two instances of pseudo-titles lengthier than five words).
6. Because ICE-GB is fully tagged and parsed, it is relatively easy to search the whole corpus
and find instances of pseudo-titles. However, ICE-East Africa and ICE-NZ exist only in
lexical form. Therefore, these components were excluded from this analysis, since it was not
possible to automatically search them for pseudo-titles.
7. Because ICE-East Africa is longer than the other corpora (ca. 1,300,000 words), the figures
in Table 6 list number of occurrences per one million words.
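The two tests mentioned in Notes 2 and 3 can be illustrated for a small contingency table. The Python fragment below is a sketch only and is not the procedure actually used for this paper: the function names are hypothetical, and the Prime Minister counts serve simply as sample input. According to Note 7 the East Africa figures appear to be normalised, so a real test would need raw counts.

```python
import math

def expected_counts(table):
    """Expected cell counts if rows and columns were independent."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

def chi_square(table):
    """Pearson chi-square statistic: sum of (O - E)^2 / E over all cells."""
    expected = expected_counts(table)
    return sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(table, expected)
               for o, e in zip(obs_row, exp_row))

def log_likelihood(table):
    """Likelihood ratio statistic G2 = 2 * sum of O * ln(O / E); empty cells add 0."""
    expected = expected_counts(table)
    return 2 * sum(o * math.log(o / e)
                   for obs_row, exp_row in zip(table, expected)
                   for o, e in zip(obs_row, exp_row) if o > 0)

# Sample input only: columns are full title, pseudo-title, appositive.
prime_minister = [
    [4, 0, 21],    # Great Britain
    [15, 5, 9],    # New Zealand
    [12, 13, 22],  # East Africa
]
degrees_of_freedom = (len(prime_minister) - 1) * (len(prime_minister[0]) - 1)

print(chi_square(prime_minister), log_likelihood(prime_minister), degrees_of_freedom)
```

Both statistics are compared against the chi-square distribution with the degrees of freedom shown. The zero cell for Great Britain is the kind of low frequency that Note 3 warns about.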
</TARGET "mey">
References
Bell, A. 1988. “The British Base and the American Connection in New Zealand Media
English.” American Speech 63: 326–44.
Biber, D., Johansson, S., Leech, G., Conrad, S. and Finegan, E. 1999. Longman Grammar of
Spoken and Written English. Essex: Pearson Education Limited.
Crystal, D. 1997. English as a Global Language. Cambridge: Cambridge University Press.
Dunning, T. 1993. “Accurate Methods for the Statistics of Surprise and Coincidence.”
Computational Linguistics 19(1): 61–74.
Meyer, C.F. 1992. Apposition in Contemporary English. Cambridge: Cambridge University Press.
Nelson, G. 1996. “Markup Systems.” Comparing English Worldwide: The International Corpus
of English, S. Greenbaum (ed.), 36–53. Oxford: Clarendon Press.
Quirk, R., Greenbaum, S., Leech, G. and Svartvik, J. 1985. A Comprehensive Grammar of the
English Language. London: Longman.
Rydén, M. 1975. “Noun-Name Collocations in British Newspaper Language.” Studia
Neophilologica 67: 14–39.
Peters, P. 2001. “The Influence of American English on Australian and British English.”
Who’s Centric Now? The State of Postcolonial Englishes, B. Moore (ed.), 297–309. Oxford:
Oxford University Press.
Siegal, A. M. and Connolly, W. G. 1999. The New York Times Manual of Style and Usage,
Revised and Expanded Edition. New York: Times Books.
<TARGET "hun" DOCINFO
KEYWORDS ""
WIDTH "150"
VOFFSET "4">
Chapter 9
Susan Hunston
University of Birmingham
2. Pattern grammar
To this we might add that, when observing raw corpus data, traditional gram-
matical categories (such as ‘direct object’, ‘indirect object’, ‘noun clause’ and
‘extraposed clause’) might be unnecessary or even unhelpful (Hunston &
Francis 1998; 1999).
A pattern is a sequence of grammar words, word types or clause types which
co-occur with a given lexical item. An item may be said to control or ‘have’ a
pattern if the pattern occurs frequently and is dependent on the item in
question. Patterns are observable through concordance lines, though intuition
is also involved in deciding on dependency. Below are sets of concordance lines
for the verb decide, showing the patterns this verb controls.1 (For reasons of
space, only five lines for each pattern are shown: this masks the comparative
frequency of the patterns).
V that
when Bartoli was a free agent, he decided all he had to do was play a
jobs. The letter reads: ‘I have decided it is necessary to draw your
s postwar ‘economic miracle”. MITI decided that the computer industry ha
on World War II until in college I decided that I wanted to become a Mar
with infertility treatment, Lorna ‘decided that it really didn’t matter
V wh
another face, for a while. I must decide if I want my old one, or a nic
pregnancy” not Profet’s list to decide what’s best for her. Willi
are in Minnesota as tourists. They decided what they wanted to do and wh
to sent short messages and Glen decides whether or not he’s going to
A trial is set for December 4 to decide who will get permanent custody
V wh to-inf
of the issues facing charities when deciding how to utilise their investm
would be. When you’re a teenager, deciding what to do with your life, y
who, before Royan last week, had not decided whether to start Almox Ratina
up to the state attorney general to decide whether to appeal the judge’s
demolished.” The bank must now decide whether to sell the building,
V to-inf
So the churchwardens have decided no longer to pay the quota, a
Register office wedding. People decide to marry at a register office
the parents. <M01> And what did you decide to study? <M02>I-
ovulatory problems). When a couple decides to have a child, it is a decisio
availability of the plants. So he decided to start his own nursery, from
be V-ed
blabbered.’ The itinerary was decided at the highest level. The Hom
cus race in Santa Cruz County that was decided by 25 votes. There were, you
interpreters, the matter may be decided for you by the coupling.
Allende’s election itself was decided in the Chilean congress again
1), 7-5, 6-3. The match will be decided tomorrow,” he said of the dou
it be V-ed that
of the Battle of Britain. So it was decided that the celebrations would be
babes. For some reason, it was decided that the latest from the Mexican
completely done by now, but it was decided that rather than put Brett in da
give up his life so easily. It was decided that the only place to treat him
out from monetary union it has been decided that eight different coins will
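Concordance lines of the kind shown above can be produced with a simple keyword-in-context routine. The sketch below is illustrative only: the function name kwic and the 30-character window are assumptions, and the Bank of English software itself is not being reproduced here.

```python
import re

def kwic(text, node_forms, width=30):
    """Return keyword-in-context lines for any of the given word forms."""
    node = re.compile(
        r"\b(" + "|".join(map(re.escape, node_forms)) + r")\b", re.IGNORECASE
    )
    lines = []
    for match in node.finditer(text):
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        # Right-align the left context so that the node word starts in the same column.
        lines.append(f"{left:>{width}}{match.group(0)}{right}")
    return lines

FORMS = ["decide", "decides", "decided", "deciding"]
sample = ("So he decided to start his own nursery. "
          "The bank must now decide whether to sell the building.")

for line in kwic(sample, FORMS):
    print(line)
```

Sorting the resulting lines by the word to the right of the node is what brings a pattern such as V to-inf or V wh into view.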
To show the range of words that might control one pattern, below are
concordance lines showing some of the adjectives occurring in the pattern it
v-link ADJ that (referred to in Biber et al 1999 as an extraposed that-clause).
(Again for reasons of space, only a maximum of two lines for each adjective are
shown.)
ill Constable Jones, Mr Casey said It is apparent that fate intervened th
orses. Elegance being a key factor, it was appropriate that Joanne
all I can tell you.” From his tone it was clear that Dick Ryle had had
in Mecca to perform the haj. It is clear that the revolution in mas
National Union of Students said that it is crucial that universities give a
only to add the fateful words: ‘But it is essential that we end it in such
e Woodgate Valley Country Park, and it was fitting that two of the city’s
health clinic, or any hospital. It is important that the woman
is sewn up is not so important; but it is important that North Korea should
impossible. At the same time it is inevitable that those at home,
ters to Stick Letters <p> I suppose it was inevitable that this passionate
him to act so out of character? It was ironic that Penelope’s insistence
co-operation, however he said it was likely that Germany would have t
is a lethal muscle wasting disease. It is likely that any child with this
ll enjoyment of this exquisite poem, it is necessary that the reader should
a camera attached and working. It is obvious that the chance of a
which it had been contracted here, it was obvious that that was not what
<M02> And we asked them and it was overwhelming that- the the
portant parts of every relationship. It is possible that your partner’s mil
s a template for building proteins — it is possible that the FraX protein h
exchange transactions in London. It is revealing that the Socialists wh
st past the end of the year. Indeed it was significant that the Jakarta
n euthanasia is a guess at best. It is surprising that, following an
others. <p> Some doctors have said it is suspicious that the pills named
s ever as simple as presented. While it is true that vertical integration
Europe will require a high-wire act. It is true that Malcolm Rifkind, the
Pattern grammar, language teaching, and linguistic variation 171
et to him, but he couldn’t help it. It was typical that Robyn would have
who said: ‘In a civilised society, it is unacceptable that women are
in this Year of Remembrance. It is unfortunate that the article made
anti-government protests. He said it was unfortunate that a number of
More recent work has suggested that it is unlikely that family boundary
to testify. In retrospect, it is unlikely that a US court would
Below are individual examples of words and their patterns (in each case the
word with the pattern is underlined; in the coding, the symbol for the word
with the pattern is in capitals):
i. Crowds of near hysterical men jostled their way through to try to find news
of their wives and families. V way prep/adv
ii. He instructed family members in nursing techniques. V n in n
iii. Japan’s industrial output increased by 2%. V by amount
iv. The mood in Japan is changing and candidates want to identify themselves
with reform. V pron-refl with n
v. I can be very rude to motorists who hoot at me. V at n
vi. It’s an honour to finally work with her. it v-link N to-inf
vii. He was too high on drugs and alcohol to remember them. ADJ on n
viii. Do they have a chance of beating Australia? N of -ing
ix. We played that record all night long. n ADV
x. …a thinly disguised attack ADV -ed
xi. There’d be no telling how John would react to such news as this. DET n as n
xii. She let the dogs into the house and fed them. v PRON
2. attempts to do or get something: attempt, bash, chance, crack, effort, go, shot,
stab, tilt, try
Mr Downer said that he may one day get another chance at the leadership.
3. looking at someone or something, physically or metaphorically: glance,
glimpse, look, peek, smile
The island is bigger than a first glance at the map indicates.
4. someone is good or experienced at a particular activity: dab hand, expert,
genius, master, novice, old hand, past master, whizz, wizard
Dickens was a genius at creating characters of great depth and this film is
peppered with them.
5. critical comments: dig, protest, side-swipe
It’s a none-too-subtle dig at the officials of the Brisbane and Canberra clubs.
3.1 Accuracy
Patterns are important to language production in terms of both accuracy and
fluency. Even advanced learners tend to have imperfect control over patterns;
in fact, in the case of very advanced learners, pattern use is perhaps the greatest
source of a sense of non-idiomaticity in English. Below are some examples, with
the kind of advice the teacher might offer, based on the association of pattern
and meaning.
Teachers … discourage students to try to use the target language to express their
own ideas.
The verb discourage is not used with this pattern (though its opposite encourage
is). The correct pattern is ‘verb + noun + from + -ing’, so the phrase should read
discourage students from trying…. The teacher could point out that discourage is
similar in meaning to stop and prevent, which have the same pattern.
Criminals will find it difficult to evade from being arrested.
The verb evade is not used with this pattern (though escape is). The most likely
alternative is evade arrest (with the pattern ‘verb + noun’). The pattern ‘verb +
-ing’ is also possible (evade being arrested) but is much less frequent. The better-
known verb avoid also has these two patterns.
Not all undergraduates are given the privilege to stay in university accommodation.
The noun privilege is rarely used with this pattern (though the pattern ‘it + link
verb + noun + to infinitive’, as in It’s a privilege to meet you, is common). Much
more frequently found is the pattern ‘noun + of + -ing’ (the privilege of staying).
The nouns advantage, benefit, distinction, gift, honour, luxury and pleasure are
also used with this pattern.
3.2 Fluency
Control over patterns can be said to aid fluency as well as accuracy. This is
because, once a word has been learnt with its pattern, the learner can produce not
just one word but a whole phrase, a series of words, together. A single mental effort
produces a whole string of language. For example, here is a native speaker of
English talking about his addiction to cigarettes:
My nan sometimes says to me that I get really moody when I don’t have a cigarette
and I keep snapping at her she says but I try not to do it but I just keep doing it
and then she gives me a cigarette.
Each of the verbs in this short extract has a pattern, which translates into a
recognisable phrase:
say     V to n that    says to me that
get     V adj          get really moody
have    V n            don’t have a cigarette
keep    V -ing         keep snapping; keep doing
snap    V at n         snapping at her
try     V to-inf       try not to do
do      V n            do it; doing it
give    V n n          gives me a cigarette
Together, these phrases, which are not fixed lexically but are not random
either, make up a large proportion of the utterance. The speaker has produced a
novel utterance by putting together patterns belonging to the individual words
that are used. So, although a learner may never have heard or said keep snapping
at her before, it can be produced without hesitation by putting together the
pattern of keep (keep snapping) with the pattern of snap (snapping at her).
One way of interpreting fluency is as what has been called ‘pattern flow’.
When a word that is part of a pattern has a pattern of its own, the result is flow
from one pattern to the other. It is possible to show this diagrammatically, as in
the example below.
In this example, the lexical items tend, think, wrong and arm demonstrate a
typical behaviour. This gives the sequence I tend to think that it’d be wrong a sense
of naturalness and familiarity, such as might be associated with a fixed phrase that
is chosen by the speaker as a single item, rather than being constructed from the
raw materials of lexis and grammar. Yet the sequence is not frequently met (there
are no instances in the Bank of English corpus). It is not a single choice but
might be seen as a series of choices, each arising from the one before.
3.4 Consciousness-raising
Exercises designed to raise learners’ awareness of pattern can involve pieces of
language taken out of context. Such exercises have the benefit of traditional
parsing exercises in that they encourage learners to identify the parts that make
up a sentence, but because they require recognition of surface features only
they make far fewer demands in terms of metalanguage. They also direct
attention to specific items such as individual prepositions as well as to general
categories such as ‘noun’. Here is one such exercise, with the instructions to
learners given first:
Here are two sets of sentences. Each sentence from the first set matches one from the second
set in that the word in bold has the same pattern. Match up the two sets. (For example,
sentence 1c matches 2a because in both the verb is followed by of and a noun — died of a
heart attack and complained of a headache.)
Look at what you have written in Column 1. What kind of things do these words describe?
Look at what you have written in Column 4. What kind of things do these words describe?
The security door left the captain and the crew stuck in the cabin.
The work on identifying patterns to date has been done manually,3 most of it
by lexicographers compiling the Collins Cobuild English Dictionary (1995).
The patterns of around 20,000 words are given in that dictionary and/or in the
two major volumes of the ‘pattern grammar’ series. Now that this groundwork
has been done, it is a tractable problem to automate the identification of
patterns in running text. That is, a program can be written which, on
encountering a word, can check what patterns that word may have, and thus
can identify the elements of the pattern in the text (Mason and Hunston 2001).
There are various possible applications of this. Firstly, and most obviously, the
comparative frequency of patterns with individual lexical items can be
calculated. This would extend the work done in Biber et al (1998) which
compares, for example, the frequency of begin followed by a to-infinitive and
followed by an ‘-ing’ clause. Secondly, the relative frequency of different
patterns in various registers can be calculated. This would extend the work
described in Biber et al (1999) on complementation clauses, allowing a more
complete picture of verb, noun and adjective behaviour to emerge.
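The lookup procedure and the first application can be sketched as follows. This is a minimal illustration under stated assumptions, not the program reported in Mason and Hunston (2001): the mini-lexicon, the surface-string matching and the name identify_patterns are all hypothetical, and a working system would rely on tagged text.

```python
import re
from collections import Counter

WH = r"(?:what|whether|who|how|if|which|when|where|why)"

# Hypothetical mini-lexicon: each verb lists (pattern label, regex for what follows it).
# More specific patterns come first, so 'V wh to-inf' is tried before 'V wh'.
LEXICON = {
    "decide": {
        "forms": r"(?:decide|decides|decided|deciding)",
        "patterns": [
            ("V wh to-inf", rf"{WH}\s+to\s+\w+"),
            ("V wh", rf"{WH}\b"),
            ("V that", r"that\b"),
            ("V to-inf", r"to\s+\w+"),
        ],
    },
}

def identify_patterns(text, lexicon=LEXICON):
    """Return (verb form, pattern label) pairs found in running text."""
    found = []
    for entry in lexicon.values():
        for verb in re.finditer(rf"\b{entry['forms']}\b", text, re.IGNORECASE):
            rest = text[verb.end():].lstrip()
            for label, regex in entry["patterns"]:
                if re.match(regex, rest, re.IGNORECASE):
                    found.append((verb.group(0), label))
                    break
    return found

sample = ("So he decided to start his own nursery. "
          "The bank must now decide whether to sell the building. "
          "Lorna decided that it really didn't matter. "
          "They decided what they wanted to do.")

hits = identify_patterns(sample)
pattern_frequency = Counter(label for _, label in hits)
print(hits)
print(pattern_frequency)
```

Running the same count over sub-corpora for different registers would give the comparison described as the second application. A that-clause without that (as in decided all he had to do) is not caught by this surface matching, which is one reason why tagging would be needed in practice.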
Finally, the connection between pattern and meaning opens the possibility
of quantifying ways of expressing meanings in different registers via the
concept of ‘local grammar’ (Barnbrook and Sinclair 1995; Hunston and
Sinclair 2000). A local grammar is a grammar that seeks to account for, not the
whole of a language, but one meaning only. One example is a grammar of
definitions (Barnbrook and Sinclair 1995), another is a grammar of evaluation
(Hunston and Sinclair 2000). A grammar currently being written is that for
‘cause and effect’ (Allen 1999). Below are some examples of analysed expres-
sions of cause and effect. In each case, a sentence expressing causality is parsed
into semantic elements (‘cause’, ‘effect’ and ‘observer’). The parsing can be
done because a pattern is recognized along with one of a number of verbs or
nouns which use that pattern to express causality. In the first example, for
instance, identification of the verb lead with the pattern ‘noun1 + verb + noun2
+ to-infinitive’ is followed by a mapping of the meaning elements on to the
pattern (where ‘noun1’ = ‘cause’; ‘noun2 + to-infinitive’ = ‘effect’), allowing
the analysis to be made.
Drugs are certainly the cause of much crime but a large part of this is because of
their illegality.
Drugs = CAUSE
Much crime = EFFECT
IDENTIFYING PATTERN = noun…be…the cause of…noun
POSSIBLE NOUNS: agent; benefit; cause; consequence; effect; fruits; generator;
implications; legacy; outcome; product; result; root; secret; source
The effect of radiation is to shift the transition from ductile to brittle behaviour
to a higher temperature.
Radiation = CAUSE
Shift the transition…to a higher temperature = EFFECT
IDENTIFYING PATTERN = the effect of…noun…be…to-inf
POSSIBLE NOUNS: effect; result
The final exam determines whether you can sit for university entrance or not.
The final exam = CAUSE
Whether you can sit for university entrance or not = EFFECT
IDENTIFYING PATTERN = determine … wh
POSSIBLE VERBS: decide, determine, define, dictate, influence
Once a complete grammar of cause and effect is available, all instances can be
identified in a large corpus, and the frequency of instances in different registers
can be calculated.
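The mapping from an identifying pattern to the elements ‘cause’ and ‘effect’ can be sketched with three rules corresponding to the analysed examples above. The rule labels, the regular expressions and the function name analyse are assumptions made for illustration, and the noun list is only a subset of the one given. The sketch also assumes that each input is a single clause that has already been separated from the rest of its sentence.

```python
import re

WH = "whether|what|who|how|which|if"

# Each rule pairs a label with a regex whose named groups are the semantic elements.
RULES = [
    ("the effect of n be to-inf",
     re.compile(r"^the\s+(?:effect|result)\s+of\s+(?P<cause>.+?)\s+(?:is|was)\s+to\s+"
                r"(?P<effect>.+)$", re.IGNORECASE)),
    ("n be the cause of n",
     re.compile(r"^(?P<cause>.+?)\s+(?:is|are|was|were)\s+(?:\w+\s+)?the\s+"
                r"(?:cause|consequence|outcome|product|result|source)\s+of\s+"
                r"(?P<effect>.+)$", re.IGNORECASE)),
    ("determine wh",
     re.compile(r"^(?P<cause>.+?)\s+"
                r"(?:decides?|determines?|defines?|dictates?|influences?)\s+"
                rf"(?P<effect>(?:{WH})\b.+)$", re.IGNORECASE)),
]

def analyse(clause):
    """Map a clause on to cause and effect, or return None if no rule applies."""
    text = clause.strip().rstrip(".")
    for label, regex in RULES:
        match = regex.match(text)
        if match:
            return {"pattern": label,
                    "cause": match.group("cause"),
                    "effect": match.group("effect")}
    return None

for clause in [
    "Drugs are certainly the cause of much crime.",
    "The effect of radiation is to shift the transition to a higher temperature.",
    "The final exam determines whether you can sit for university entrance or not.",
]:
    print(analyse(clause))
```

Counting the clauses for which analyse returns a result, register by register, would give the kind of frequency comparison envisaged once the grammar is complete.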
An example may be given from a less complex meaning-type than cause and
effect (less complex in terms of the range of patterns used): the meaning
‘abstain from an action’ (see Francis et al 1996: 619–620). Typical realisations of
this meaning include:
‘verb + -ing’, with verbs such as avoid and (not) bother, as in avoided doing
the washing-up;
‘verb + to-infinitive’, with verbs such as (not) bother, fail, forget and refuse,
as in failed to do the washing-up;
‘verb + about + -ing’, with verbs such as forget and (not) bother, as in forgot
about doing the washing-up;
‘verb + from + noun or -ing’, with verbs such as abstain, desist, flinch,
recoil, refrain and shrink, as in refrained from house-work/doing the washing-
up;
‘verb + out of + noun or -ing’, with verbs such as drop, get and opt, as in
opted out of doing the washing-up.
corpus there are 436.4 instances per million words, taking all the
patterns together. The comparable figure for the spoken corpus is 85.
ii. Overall, the patterns with prepositions are less frequent than those with
non-finite clauses. The total frequency per million words over both
corpora is 135.4 for patterns with prepositions and 386 for patterns
with clauses. In both corpora, the patterns with a to-infinitive are more
frequent than any other group.
iii. The patterns with prepositions are especially infrequent in spoken
English. For example, the verbs with from occur a total of only 3.7 times
per million words. Many of the target verbs are not found in those
patterns at all in the spoken corpus. An exception to this general rule is
the expression GET out of, which occurs 34.3 times per million words
in the spoken corpus. In the Guardian corpus, most of the verb-pattern
combinations are found. Some of them are infrequent, but the relative-
ly large number of verbs pushes up the overall frequency. The verbs
with from, for example, occur a total of 36.4 times per million words.
iv. These verb-pattern combinations are much more frequent in the
Guardian than in the spoken corpus: ‘avoid + -ing’; ‘decline + to-
infinitive’; ‘fail + to-infinitive’; ‘refuse + to-infinitive’; ‘refrain + from +
n/ing’; ‘opt + out of + n/ing’.
v. These verb-pattern combinations are much more frequent in the spoken
corpus than in the Guardian corpus: ‘(not) bother + -ing’; ‘forget + to-
infinitive’; ‘(not) bother + about + n/ing’; ‘get + out of + n/ing’.
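The figures above are normalised to occurrences per million words so that corpora of different sizes can be compared. The sketch below shows the arithmetic and a simple consistency check on the reported rates. The function name and the raw count in the first example are hypothetical, and the corpus behind the 436.4 figure is left unnamed because the sentence reporting it is incomplete.

```python
def per_million(raw_count, corpus_size):
    """Normalise a raw count to occurrences per million words."""
    return raw_count * 1_000_000 / corpus_size

# Hypothetical raw figures, for illustration only: 50 hits in a 2-million-word corpus.
example_rate = per_million(50, 2_000_000)

# Rates quoted in the findings above (already per million words).
rates_by_corpus = {"corpus named in the incomplete sentence": 436.4, "spoken": 85.0}
rates_by_pattern = {"prepositions": 135.4, "non-finite clauses": 386.0}

# The two breakdowns describe the same instances, so their totals should agree.
total_by_corpus = sum(rates_by_corpus.values())
total_by_pattern = sum(rates_by_pattern.values())

# How many times more frequent the verbs with 'from' are in the Guardian corpus.
from_ratio = 36.4 / 3.7

print(example_rate, total_by_corpus, total_by_pattern, from_ratio)
```

The agreement of the two totals is a quick way of checking that no pattern has been counted twice or omitted.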
5. Conclusion
Notes
1. Concordance lines and examples are taken from the Bank of English corpus, currently
standing at over 400 million words, and jointly owned by HarperCollins publishers and the
University of Birmingham.
2. All examples appear to be from native speakers. Three examples occur in Australian
newspapers, suggesting a possibility that the pattern may stem from a regional variety. The
other examples come from books published in Britain (1), books published in the US (1),
and a British tabloid newspaper (1).
3. ‘Manually’ here means that the researchers examined the concordance lines and collo-
cational information for each word in turn, or for each pattern in turn. To this extent the
search was computer-assisted, but was not automatic.
References
Allen, C. 1999. A local grammar of cause and effect. Unpublished MA dissertation, Universi-
ty of Birmingham.
Barnbrook, G. and Sinclair, J. M. 1995. “Parsing Cobuild Entries”. In The Languages of
Definition: The Formalization of Dictionary Definitions for Natural Language Processing,
J. M. Sinclair, M. Hoelter and C. Peters (eds), 13–58. Luxembourg: Office for Official
Publications of the European Community.
Biber, D., Conrad, S. and Reppen, R. 1998. Corpus Linguistics: Investigating Language
Structure and Use. Cambridge: CUP.
Biber, D., Johansson, S., Leech, G., Conrad, S. and Finegan, E. 1999. Longman Grammar of
Spoken and Written English. London: Longman.
Francis, G., Hunston, S. and Manning, E. 1996. Collins Cobuild Grammar Patterns 1: Verbs.
London: HarperCollins.
Francis, G., Manning, E. and Hunston, S. 1997. Verbs: Patterns and Practice. London:
HarperCollins.
Francis, G., Hunston, S. and Manning, E. 1998. Collins Cobuild Grammar Patterns 2: Nouns
and Adjectives. London: HarperCollins.
Hornby, A. S. 1954. A Guide to Patterns and Usage in English. London: OUP.
Hudson, R. 1984. Word Grammar. Oxford: Blackwell.
Hunston, S. and Francis, G. 1998. “Verbs observed: a corpus-driven pedagogic grammar”.
Applied Linguistics 19: 45–72.
Hunston, S. and Francis, G. 1999. Pattern Grammar: A Corpus-driven Approach to the Lexical
Grammar of English. Amsterdam: Benjamins.
Hunston, S. and Sinclair, J. M. 2000. “A local grammar of evaluation”. In Evaluation in
Text: Authorial Stance and the Construction of Discourse, S. Hunston and G. Thompson
(eds), 75–101. Oxford: OUP.
Levin, B. 1993. English Verb Classes and Alternations: A Preliminary Investigation. Chicago:
The University of Chicago Press.
</TARGET "hun">
Lewis, M. 1993. The Lexical Approach: The State of ELT and a Way Forward. Hove: LTP.
Long, M. H. and Crookes, G. 1992. “Three approaches to task-based syllabus design”. TESOL
Quarterly 26: 27–56.
Mason, O. and Hunston, S. 2001. “The automatic recognition of verb patterns: a feasibility
study”. Paper read at the COMPLEX conference, University of Birmingham, 2001.
Rudanko, J. 1996. Prepositions and Complement Clauses: a Syntactic and Semantic Study of
Verbs Governing Prepositions and Complement Clauses in Present-day English. New York:
State University of New York Press.
Sinclair, J. 1991. Corpus Concordance Collocation. Oxford: OUP.
Willis, D. 1990. Lexical Syllabuses. London: HarperCollins.
Willis, D. and Willis, J. 1996. “Consciousness-raising activities”. In Challenge and Change in
Language Teaching, J. Willis and D. Willis (eds), 63–76. Oxford: Heinemann.
Part II
KEYWORDS ""
WIDTH "150"
VOFFSET "4">
Chapter 10
1. Introduction
The reason for choosing these particular syntactic features is the regularity and
frequency with which they are identified in the literature as characteristic
features of Indian English. A corpus-based approach has been adopted for this
study, and a certain feature will be identified as characteristic of Indian English
if it is found to differ consistently (judged by its frequency of occurrence) from
British and American English. My main source of comparison of Indian English
with British and American Englishes is Biber et al. (1999).
Researchers of Indian English (Hosali, 1991; Kachru, 1976) make the claim
that the use of stative verbs in the progressive is a feature of Indian English.
Hosali claims that “there are certain features of English usage which are
widespread in India” (p.65), and that the use of stative verbs in the progressive
is one example. The illustration she provides is “Are you having a cold?” Kachru
(1976) claims that the “be + ing” verb constructions in Indian English seem to
“violate the selectional restriction applicable to such constructions in the native1
varieties of English, where members of the sub-class of verbs such as hear and
see do not occur in the progressive tenses” (p. 17). A possible reason for their
occurrence in Indian English, Kachru explains, is that the progressive form is
permissible in Hindi. One wonders, however, how such a claim can be made of
Indian English in general without empirical support, given that so many Indian
languages other than Hindi influence it. Also without empirical
support, several other researchers (Bakshi, 1991; Lukmani, 1992; Verma,
1980) mention the same feature as being characteristic of Indian English. Some
examples that appear in the literature include the following:
1. a. I am not understanding the lesson
b. They were now knowing one another.
Schmied’s work is notable and different from other work on Indian English
because it is largely empirical. He conducted a study of the syntactic features of
Indian English on the Kolhapur Corpus, an untagged written corpus of Indian
English compiled in 1976. On the use of the progressive constructions in Indian
English, he states: “To tackle the question of whether ‘Indians’ tend to overuse
the progressive form, a broader retrieval form has been applied… And indeed,
although we have taken into account a certain error margin, the construction’s
frequency in the Kolhapur corpus exceeds that in the LOB corpus by far”
(p. 224). From Schmied’s statement, however, it is not clear whether he is
referring to all progressive forms, or to occurrences of stative verbs in the
progressive only.
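The difference between counting all progressives and counting only stative verbs in the progressive can be made concrete with a narrow retrieval pattern. The sketch below is an assumption-laden illustration, not the search used by Schmied or in the present study: the list of stative verbs is hypothetical and partial, and matching is done on untagged text, so every hit would still need manual checking.

```python
import re

# Hypothetical, partial list of stative verbs in their -ing form.
STATIVE_ING = ["having", "knowing", "understanding", "hearing", "seeing",
               "believing", "wanting"]
BE_FORMS = ["am", "is", "are", "was", "were", "be", "been", "being"]

# A form of 'be', at most one intervening word (not, now, a subject pronoun), then the verb.
PROGRESSIVE = re.compile(
    r"\b(" + "|".join(BE_FORMS) + r")\s+(?:\w+\s+)?(" + "|".join(STATIVE_ING) + r")\b",
    re.IGNORECASE,
)

def find_stative_progressives(text):
    """Return (form of be, stative verb) pairs for each candidate construction."""
    return [(m.group(1).lower(), m.group(2).lower()) for m in PROGRESSIVE.finditer(text)]

sample = ("I am not understanding the lesson. They were now knowing one another. "
          "Are you having a cold? She is reading a book.")

print(find_stative_progressives(sample))
```

A verb such as have is stative in only some of its senses, so a hit like are having shows where a candidate occurs, not that the use is non-standard.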
Use of the present or past perfect in place of the simple past tense is another
feature that researchers often identify as characteristic of the English spoken in
India. This is explained by Verma (1980), as follows: “In English, the present
perfect establishes a link between the past and the present. It is not used in the
environment of the simple past. In Indian English, this distinction is neutral-
ized” (p. 80). Examples of this use (Shekar & Hegde, 1996; Verma, 1980)
include the following:
2. a. I have worked there in 1960
b. I have read this book yesterday.
c. I had been there last year.
With reference to the use of prepositions, researchers comment that Indian
English contains “errors” of three types: prepositions are deleted where essen-
tial, prepositions are inserted where inessential, and “wrong” prepositions are
used (Hosali, 1991; Bakshi, 1991). The following are examples of sentences to
illustrate Indian patterns of preposition use:
3. a. She said she would neither resign nor bow down to their demands.
b. The next course will commence from Monday, 8 January.
That speakers of Indian English do sometimes use features that differ from
British or American English is not debated. What is being questioned, however,
is whether any features are used consistently enough to warrant their being called
characteristics of any register of Indian English. Sahgal and Agnihotri (1985)
echo this opinion:
Verma (1980) claims that certain syntactic patterns have become so well
established in IndE that they get passed on from one generation to the next,
acquiring the status of stable dialectal innovations. He also claims that these
patterns differ systematically in a rule-governed way from the native varieties of
English. The frequency with which educated Indians use these patterns in their
actual behavior, is, however, an empirical question. (p.117)
It is this empirical question that this paper attempts to answer. The significance
of this study is that it is one of the first empirical investigations of what gram-
matical features are actually used in Indian English, and the extent to which they
occur across different registers.
2. The Corpus
A corpus was specifically compiled for this study because no complete and
adequate corpus of Indian English exists for a syntactic study of this kind. The
only existing comparable corpus, the Kolhapur Corpus, is untagged and is now
over 30 years old.
The corpus used for this study currently has only a written component. A
spoken component, however, is being added to it. Currently the corpus consists
of approximately 800,000 words in 11 different registers. The different registers
and the number of words in each are provided in Table 1 below. Gathering texts
for 9 of the registers entailed downloading material from the Internet. Materials
mainly included articles from Indian newspapers and magazines. Newspapers
and magazines from different parts of India were chosen in order to get as wide
a representation as possible of the different language backgrounds in India. In
order to have a reasonable basis of comparison, I tried to include as many of the
registers as possible from other well-known corpora such as the Brown Corpus
and the Kolhapur Corpus. Texts were chosen only if I was sure they were written
by Indians in India. Once the texts were chosen, all names and any other forms
of identification were removed from them, and they were then saved as text files.
Appendix A describes each of the registers briefly.
Texts from the tenth register came from email messages written by
speakers of Indian English both to me and other speakers of Indian English.
The resulting sample includes a range of topics. All personal names and any
other identifying factors were removed from the emails, and they were then
saved as text files.
Texts for the eleventh register contributed about 96,000 words of Indian
fiction in English to the corpus. This fiction section included 27 short stories
originally written in 7 different languages. The short stories were all taken from
the Journal of Indian Literature, published in India. The stories were photocop-
ied from the journal and scanned into a computer. The scanned versions were
then checked for accuracy, and compared to their hard copies to make necessary
corrections. The versions on the computer were then also saved as text files. The
list of short stories, their authors, and other information about the authors and
their native languages is provided in Appendix A.
A limitation of the current collection of fiction is that all the short stories
were originally written in an Indian language and subsequently translated into
English either by the author or by a translator. It will be necessary to add fiction
originally written in English2 to the present collection, and to compare the
results of the present study with the results obtained by conducting a similar
analysis of fiction originally written in English. With the fiction added, the entire
corpus is represented in Table 1 below.
With all registers, once the files were saved as text files, they were all tagged
using Biber’s tagger. All analysis was conducted on the tagged versions of the texts.
A limitation of the corpus as it currently exists is that the sub-corpora are of
different sizes. In order to perform a thorough register analysis of the different
registers of Indian English represented in the corpus, it would be beneficial to have
sub-corpora of comparable sizes. However, it would be very difficult to greatly
increase the size of certain sub-corpora like Emails. The corpus is currently being
expanded to make the different sub-corpora as equal as possible. In any future
studies on this corpus, any difference in size between sub-corpora that might
still exist due to practical limitations will have to be noted.
3. Methodology
To conduct the necessary analysis of the three syntactic features under investi-
gation, two different computer programs were written. The first program was
used to analyze the first two features, and the second, to analyze the third feature.
In order to determine the frequency of occurrence of stative verbs in the
progressive, the first computer program generated KWIC concordance lines for
all instances where the six stative verbs that were under investigation (have,
know, want, like, hear, and look) occurred as finite progressive verbs. In order to
determine the patterns of occurrence of the present and past perfect, the first
computer program was then modified slightly so that it would generate KWIC
lines for all occurrences of the past or present perfect of any verb.
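The concordancing step described above can be sketched as follows. This is only a minimal illustration in Python, not the program used in the study: the study worked on tagged texts, whereas this sketch works on plain text and approximates a finite progressive as a form of BE immediately followed by the -ing form of one of the six verbs. The function names, the context width, and the sample sentences are all assumptions made for illustration.

```python
import re

# -ing forms of the six stative verbs named in the text.
STATIVE_ING = {"having", "knowing", "wanting", "liking", "hearing", "looking"}
# Full and contracted forms of BE (an approximation: the clitic 's is
# ambiguous with the possessive and with contracted "has").
BE_FORMS = {"am", "is", "are", "was", "were"}
BE_CLITICS = ("'m", "'re", "'s")


def is_be(token):
    """Return True if the token is, or ends in, a finite form of BE."""
    t = token.lower()
    return t in BE_FORMS or t.endswith(BE_CLITICS)


def kwic_stative_progressives(text, width=4):
    """Return (left context, keyword, right context) triples for each
    stative -ing form that directly follows a form of BE."""
    # Simple tokenizer: alphabetic words, optionally with one straight
    # apostrophe clitic (curly apostrophes are not handled).
    tokens = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)
    lines = []
    for i, tok in enumerate(tokens):
        if i > 0 and tok.lower() in STATIVE_ING and is_be(tokens[i - 1]):
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, tok, right))
    return lines


if __name__ == "__main__":
    # Invented sample sentences, not taken from the corpus.
    sample = ("I am knowing the answer. She is having a car. "
              "Knowing him helps. They were looking at it.")
    for left, key, right in kwic_stative_progressives(sample):
        print(f"{left:>30}  [{key}]  {right}")
```

Because the match is purely positional, each concordance line would still need to be read by hand to decide whether the verb is really being used statively, which is what a KWIC display is meant to support.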
In order to determine what prepositional verbs occurred in written Indian
English, the second computer program generated a list of all the prepositional
verbs that occurred in the texts being analyzed, and a frequency of each of the
verbs.
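A frequency list of this kind could be produced along the following lines. The sketch assumes the tagged text has already been read into (word, tag) pairs and uses the hypothetical tag labels VERB and PREP; the actual tag set of Biber's tagger is not described here, so these labels, the function name, and the sample data are assumptions made for illustration.

```python
from collections import Counter


def prepositional_verb_counts(tagged_tokens, verb_tag="VERB", prep_tag="PREP"):
    """Count verb + preposition sequences in a list of (word, tag) pairs.

    A verb immediately followed by a preposition is treated as a candidate
    prepositional verb. This over-generates (it also catches free
    combinations), so the resulting list is a starting point for manual
    inspection rather than a final result.
    """
    counts = Counter()
    for (word, tag), (next_word, next_tag) in zip(tagged_tokens, tagged_tokens[1:]):
        if tag == verb_tag and next_tag == prep_tag:
            counts[f"{word.lower()} {next_word.lower()}"] += 1
    return counts


if __name__ == "__main__":
    # Invented sample, not taken from the corpus.
    sample = [
        ("They", "PRO"), ("discussed", "VERB"), ("about", "PREP"), ("it", "PRO"),
        ("and", "CONJ"), ("discussed", "VERB"), ("about", "PREP"), ("prices", "NOUN"),
        ("We", "PRO"), ("look", "VERB"), ("at", "PREP"), ("data", "NOUN"),
    ]
    for pair, freq in prepositional_verb_counts(sample).most_common():
        print(pair, freq)
```

The sketch counts inflected forms separately; grouping them under one lemma would need an additional lemmatization step.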
This section provides details on the methods of analysis and then the results
obtained from the analysis of the three features in question.
Having 4
Knowing 1
Wanting 0
Liking 0
Hearing 1
Looking 2
Given the low occurrence of these verbs in the progressive, the present analysis
of written Indian English (including fiction, which contains a fair amount of
dialog) suggests that stative verbs do not occur in the progressive any more
frequently than they do in British or American English. An analysis of spoken
Indian English might, however, reveal interesting differences.
(per 100 instances) counts of “Indian” patterns of past and present perfect.
Given the counts in the table above, a few points should be noted:
Out of the three “Indian” past perfect instances in the Travel section, two
came from the same text by the same author.
Most of the occurrences of “Indian” past and present perfect in fiction
occurred in direct speech. This is interesting in light of the fact that in fiction,
direct speech represents spoken language.
Emails had a higher proportion of “Indian” patterns of past and present
perfect relative to the other registers. This is also interesting given that Email is
a register that approximates some registers of spoken language in its degree of
informality.
What can be concluded from these results is that overall, the frequency of
the “Indian” present and past perfect is not sufficient to say that any pattern is
characteristic of written Indian English. More importantly, however, the results
of the current study direct my next inquiry to a thorough analysis of spoken
Indian English. Also, the present results point out the need to perform a
thorough register analysis of Indian English and determine whether certain
features are characteristic of certain registers.
Examples of “Indian” instances of the present and past perfect include the
following sentences. The source of each sentence is given in parentheses
after it.
6. a. I had tried to get two tickets but could not. (Fiction)
b. I had thought you were sleeping since then. (Fiction)
5. Conclusion
Based on the results of this study, we can conclude that
there seem to be no differences between British and American English on the
one hand, and written Indian English on the other, with regard to the patterns
of occurrence of the present and past perfect. This conclusion can also be
extended to the occurrence of stative verbs in the progressive. My preliminary
analysis of Indian English fiction, however, has raised important questions
about the frequency of occurrence of the progressive verb form in general (as
opposed to the progressive form of stative verbs in particular). An analysis of
the proportion of progressives relative to simple tenses in Indian
English is therefore a logical next step.
This study has also confirmed that with regard to prepositional verbs, and
perhaps the use of prepositions in general, written Indian English does differ
from British and American English. Further research is needed to confirm the
present conclusion, and also to determine whether spoken Indian English
shares these differences.
Appendix A
Regional News
Included in this register are news items from all over India. Articles were chosen if they
were written by Indians in India. Articles were not chosen if they were written by foreign
correspondents to the newspapers. Further, articles identified as having been
written by people in Pakistan were not chosen. This is because of my inability to
recognize if the authors were Indian or Pakistani, and the only language included in this
corpus is that produced by Indians.
Business
This section includes business news from all the newspapers and the magazine India
Today. Business news ranged from news regarding the status of the Reserve Bank of India to
the price of washing machines. Articles included in this section came from the business
sections of the newspapers and magazines examined.
Entertainment
This is a large section containing mostly news from the film world in India. The section
contains articles on films being produced, reviews of films already produced (which can be
compared to fiction), and also any articles on any film personalities. The language tends to
be more informal than that used in sections such as Regional News or Business.
Features
This is a section that contains articles both from magazines and newspapers. The articles
are on diverse subjects, from child-bearing issues to gardening, from religion to tips on doing
laundry. While a lot of these articles come from Femina, several newspapers (particularly the
Sunday editions) do have sections labeled “Features”, too.
Interviews
This is an interesting section providing the corpus with some spoken English (the
transcribed version). The interviews come mainly from the Rediff collection, but several
come from the film magazine, Filmfare. I made sure the interviewees and the interviewers
were both Indian (mostly names I recognize) before selecting a particular interview to be a
part of the corpus.
Letters to the editor
This section tends to contain more informal language than does a register like “Regional
News.” The letters are by Indians all over India to the editors of the various newspapers, and
deal with various issues of relevance to the common person. If the letters were chosen from
one of the online-only publications, I made sure that the person writing the letter lived in India,
and not anywhere abroad. As far as possible, all names were deleted.
Editorials
This section contains articles by the editors of the various newspapers examined. The
nature of the writing is such that an issue is raised by the editor, followed by reactions or
responses by other people, who know something about the issue. Once again, editorials were
chosen only if the participants lived in India, and all names were deleted.
Sports
This section contained sports news from all the newspapers chosen. Once again, I made
sure that the news items concerned sports events in India. Even if the article dealt with the
performance of an Indian team abroad, that article was not included. This is because I didn’t
want to include anything by a foreign correspondent.
Travel
This section includes travel information and information about restaurants and Indian
food mainly from two online publications — the Restaurant Guide and the India Travel
Guide. I made sure that the articles from both these online publications were by Indians who
currently lived in India.
Email
This section contains personal email messages. This register was included because it
makes an interesting connection between written and spoken language, and given the
informality of the register, I am curious to see if it contains any features that a more formal
register does not. In order to get as large a collection of email messages as possible, and
messages on a wide variety of subjects, I requested friends and family to forward me their
own messages. Any names present were deleted. While several of the friends and family the
messages were collected from live in the US at the moment, they have lived here for less than
a year.
Fiction
This section included approximately 96,000 words of Indian English fiction — short
stories. All the stories appeared in a journal named “Indian Fiction,” published in India. The
authors of all the stories are Indians who live in India. Like Emails, fiction, too, makes an
interesting addition to the current corpus, as it contains a lot of dialog. It therefore has some
representation of spoken language, and it would be interesting to see if it contains any
features that the more formal written registers do not. The table below provides details on
the individual stories in this collection of fiction.
The Profession | Ismat Chughtai | 4336 | Urdu | Abul Farooque | Urdu short story writer
The News | Salam Bin Razak | 2216 | Urdu | Author | Short story writer
Notes
1. Traditionally, the distinction between “native” and “non-native” was used to differentiate
people who speak English as a first language (such as people from Britain and America, for
example) from people who do not. Thus, this distinction tended to exclude people from
countries like India. However, this distinction has lost its popularity because it has become
clear that it is extremely difficult to define what “non-native” is. For countries like India and
Singapore today, it is difficult to draw boundaries between “native” and “non-native”
because, as Graddol, Leith & Swann put it, “some (notionally) non-native speakers become
familiar with English from an early age and use the language routinely” (p. 13). For detailed
discussions on the native/non-native distinction, see D’Souza (1997), Crystal (1995), and
Graddol, Leith, & Swann (1996).
2. Since this study was conducted, another study of fiction originally written in
English by Indians has been carried out. The latter study analyzed the second set of Indian
fiction for the same three features under examination in the present study. It
revealed that the original English fiction did not differ from the translated fiction in any
substantial way with respect to the three grammatical features examined. This suggests that,
perhaps because the translation was done by Indians, it did not affect the results.
References
Chapter 11
Eniko Csomay
San Diego State University
1. Background
2. Methodology
This section describes the design of the study and the analytical procedures. I
used a corpus-based analysis, which required decisions about the corpus of
texts, operational definitions of the text categories, and a careful selection of
linguistic features for analysis.
2.1 Corpus
A total of 176 academic lectures were selected from the T2K-SWAL Corpus,2 a
2.7 million word corpus of spoken and written academic discourse collected at
four universities in the United States (Biber, et al., 2001). All lectures were
audio-recorded with a tape-recorder placed at the front of the classroom near
the lecturer. Following the recording, tapes were transcribed based on pre-
defined transcribing conventions, and the texts were tagged for grammatical
features using Biber’s grammatical tagger.
Texts were classified according to two situational parameters: level of
instruction (lower division, upper division, graduate) and interactivity (low,
medium, high). These categories are described further in 2.2. Table 1 shows the
distribution of texts according to the two categories.
Table 1. Distribution of lectures and number of words based on level of instruction and
interactivity
Interactivity   Low level (1–200)     Upper level (3–400)   Graduate level (500-up)   Total
                Lectures  Words       Lectures  Words       Lectures  Words           Lectures  Words
High            15        108,108     23        144,232     23        220,324         61        472,664
Medium          20        133,824     25        177,715     16        126,804         61        438,343
Low             19        113,586     24        141,577     11        82,641          54        337,804
2.2 Definitions
“Academic lectures” in the present study is a cover term for various kinds of
teaching taking place in university classrooms. Two parameters — interactivity
and level of instruction — were used to classify academic lectures and investi-
gate linguistic variability within them.
The analysis was based on a preliminary categorization of interactivity using
the normed counts of turns in a lecture. Accordingly, lectures
containing fewer than 10 turns per thousand words were classified as low
interactivity lectures, those with more than 25 turns per thousand words were
labeled high interactivity lectures, and those in between were placed
in a medium interactivity group. Although this is satisfactory as an operational
definition for the present purposes, a more precise way to define interactivity is
needed for future research. I return to this need in the conclusion.
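The operational definition above can be expressed as a short sketch. The thresholds (10 and 25 turns per thousand words) come from the text; the function names, the treatment of lectures falling exactly on a threshold as medium, and the sample figures are assumptions made for illustration.

```python
def turns_per_thousand(n_turns, n_words):
    """Normed turn count: number of turns per 1,000 words of a lecture."""
    if n_words <= 0:
        raise ValueError("n_words must be positive")
    return n_turns * 1000 / n_words


def classify_interactivity(n_turns, n_words):
    """Assign a lecture to the low, medium, or high interactivity group."""
    rate = turns_per_thousand(n_turns, n_words)
    if rate < 10:
        return "low"
    if rate > 25:
        return "high"
    # Values from 10 to 25 inclusive are treated as medium (an assumption;
    # the text only says "the ones in between").
    return "medium"


if __name__ == "__main__":
    # Invented lecture sizes, not taken from the corpus.
    for turns, words in [(40, 8000), (120, 8000), (260, 8000)]:
        print(turns, words, classify_interactivity(turns, words))
```

Norming by lecture length keeps a long lecture with many turns from being counted as more interactive than a short lecture with the same density of turns.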
The level of instruction was defined by the course number available for each
lecture. Accordingly, 100 and 200 level courses (generally taken by first and
second year students) were considered lower division; 300 and 400 level
courses (generally taken by third and fourth year students) were considered
upper division; and classes with course numbers of 500 and above were
classified as graduate level courses.
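The mapping from course number to level of instruction can be sketched in the same way. The cut-off points follow the definition above; the function name and the decision to reject course numbers below 100 are assumptions made for illustration.

```python
def classify_level(course_number):
    """Map a numeric course number to a level of instruction."""
    if course_number >= 500:
        return "graduate"
    if course_number >= 300:
        return "upper division"
    if course_number >= 100:
        return "lower division"
    # Course numbers below 100 are not covered by the definition in the text.
    raise ValueError(f"unexpected course number: {course_number}")


if __name__ == "__main__":
    # Invented course numbers, for illustration only.
    for number in (101, 250, 300, 499, 500, 612):
        print(number, classify_level(number))
```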
As outlined earlier, the primary goal of the present study is to investigate the
major patterns of variation within lectures by focusing on level of instruction
and interactivity. A two-way analysis of variance tests the significance of the two
main effects — interactivity and level of instruction — as well as the interaction
between these two variables as predictor variables for the combined scores of
the various linguistic features in the five feature groups.
As Table 2 shows, four of the five feature groups showed statistically
significant differences. In three feature groups the differences were related to
the degree of interactivity while in one feature group, they were related to the
level of instruction. On the other hand, no statistically significant differences
were found in the way the grammatical features appear in the different levels of
Table 3. Mean scores for informational focus by degree of interactivity and level of
instruction
OK, I have uh, well, I’ll just have to refine, my true limits to whatever degree of
precision that I need. And I can do this, to infinity if I have to. And so, and then
we change. See one, just a little bit. [WOB]
2: So you can fill it in however you want to?
1: No. You have to follow these directions.
2: You just you can change them to infinity.
1: And it’s not, we can, not if we want to, we can if we have to.
2: Oh.
(Solus51)
Informational focus — low interactivity lecture
Extract 2
1: So that’s our our final formula and then this uh VIJ, this Einstein coefficient,
this actual work here was done by Einstein originally, a very clever way of
research [a few unclear syllables] the way Einstein presented this. Um, it’s it’s just
a constant and we’ll come back to what that number is or what it should be in a
a moment or two. OK, so that’s uh that’s the (reduction) rate going up. Now once
the electron is in the uh upper state it can fall back down. The first we’ll say is the
spontaneous return which is just left on its own the electron will probably decay
back down to a lower state and the energy rate of return of [incomplete word]
photons back into the beam from this uh drop down will go by MJ which is the
number of particles that are in the upper state J.. uh each uh energy jump of
course is (H. Nu) and uh we’ll say that there’s a constant…
(Aslgg44)
The varying font styles show specific examples of the grammatical features
associated with this feature group. In the second extract (Extract 2) there are
clearly more nouns, prepositions, and attributive adjectives. A very dense
informational package is reflected, for example, in prepositional phrases with
(multiple) ‘of’ constructions, or in noun-noun constructions (nouns premodified
by nouns). While both constructions can be found
in low interactivity lectures (Extract 2: “the energy rate of … return of
[incomplete word] photons”, or “actual work here was done by Einstein” or
“energy rate” or “energy jump”), neither of them is present in high interactivity
lectures (Extract 1).
Classes with low interactivity have fewer turns than classes with high
interactivity. One speaker tends to hold the floor in these situations while, as
the extract shows, the text is denser in informational features. The focus in
these less interactive lectures seems to be information transmission, most
probably from the lecturer, while in more interactive classes knowledge may be
Table 4. Mean scores for involved production by degree of interactivity and level of
instruction
Interestingly enough, this difference is the exact reverse of the ones reported
in the previous section. That is, classes with high interactivity display more
of the grammatical features in this feature group than classes with low
interactivity. Given that both this feature group (involved production)
and the one discussed earlier (informational focus) originate from Biber’s
(1988, 1995) single dimension (Dimension 1), the results are not surprising. In
fact, these results support the ones reported in Biber (1988, 1995) in that there
is a strong relationship between these two sets of grammatical features in
Dimension 1. The findings of the present study support the
idea that lectures vary, providing strong evidence not only for variation in
methods of delivery (degree of interactivity) but also for variation in the way
information is conveyed in these varied settings.
Looking at involved production in more detail, it is noteworthy that both
lower division and graduate lectures show a progressive change in the way the
grammatical features are present. That is, low interactivity lectures exhibit the
fewest grammatical features associated with this feature group,
followed by medium and then high interactivity lectures. In upper division
classes, the distribution of the grammatical features associated with this feature
group is noticeably different. While there is a large difference between classes
with medium and high interactivity, almost no difference is present between
classes with medium and low interactivity. What is more interesting is that in
the upper level, classes with medium interactivity have fewer of the involved
production features than those with low interactivity.
The three extracts below illustrate the differences in the way the grammatical
features are present in lectures with varying interactivity patterns.
Extract 3 is from a high interactivity graduate class, Extract 4 is from a medium
interactivity graduate class, and Extract 5 is from a low interactivity
graduate class. Bold italicized words show present tense, bold regular
(non-italicized) words show private verbs, italicized words underlined with a single
line show first and second person singular pronouns, italicized words underlined
with a double line show contractions, capitalized non-italic words show
DEMONSTRATIVE PRONOUNS, capitalized italic words show BE AS COPULA,
and words in parentheses denote [that0] deletion.
Graduate — high interactivity class
Extract 3
2: How open ARE those two conferences to student papers? When they, do the call
for papers do they
1: very. They never know you’re a student.
2: Right.
1: And, um, I presented papers when I was getting my Master’s degree. Um, and,
if you write a good abstract, it, yeah.
2: And, if you’re presenting a paper you can get funding, to go. But just to go for
yourself you don’t. But if you can BE on a panel, present a paper.
1: Not much, but it helps. Yeah. Even if you don’t go this year, it’s something to
keep in mind for next year, the call for papers usually comes out in, April or,
March actually. And then it’s due in, May or something really early, you have to
plan almost a year ahead.
2: yeah.
1: But
2: But it’s worth it. And you could prob-I mean, I don’t know what this (dinners)
ARE like but if it’s like other things in the disciplines that I’ve been involved in, the
papers that have been known in this class I think could probably be made into
conference presentations. And THAT’s a great (bee-line).
2: What journal is the main triple A. L., (in)?
1: You don’t get a particular journal with your membership. Applied linguistics,
is, somewhat affiliated with it. And then there’s an international, applied
linguistics, organizational journal. But there’s no journal, there’s not like a, triple
A. L. journal. OK.
Graduate — medium interactivity class
Extract 4
1: There’s this, end, but, it’s just, I guess, along the way, it was like, it was
just reaching a level of absurdity. You can’t, you can’t replicate, the experience of
the other.
2: Ever. You can’t. [4 sylls], but you can never, you can never experience it, so, I
guess just taking it kind of, appreciation IS wrong, the wrong word but that.
1: Some criticism [3 sylls] from, came to from, a number of the African American
natives and others as well, who felt that you know here you could go.
2: And sometimes that has to be good enough. You know. It has. You can’t recreate
it and you don’t want to recreate it. And then I just kind of want to go back further
and say, when they interviewed her, …
(Polgn203)
Graduate — low interactivity class
Extract 5
1: …the story goes like this, when Duke recruited McD., they presumably offered
him something other than just a mere position because he had been at Harvard
and nonetheless he left Harvard and went to Duke and according to this story the
faculty remaining at Harvard made a joke about this. and the joke was that McD.
thought he was getting a University and Duke thought they were getting a
professor and they were both disappointed
2: nasty
1: I had anticipated uh having doctor K. here given that he has some connection
with Duke so that he would give this story but such IS life. he doesn’t he isn’t here
today. uh McD. was a very unusual psychologist uh he had been educated in
England but spent most of his career in this country. he was very interested in
strange kinds of behavior and in particular he was very interested in extra sensory
perception and things such as that and continued that line of work.
(Pslgg116)
in this feature group. The mean scores for each division are in Table 5. The post
hoc test results (Table 7, Appendix) show a statistically significant difference in
abstract style between medium and high interactivity classes, where medium
interactivity classes have a higher number of the grammatical features associated
with this feature group.
Extract 6 is taken from a medium interactivity lecture and displays a high
number of passive constructions associated with abstract style. In the extract
below, underlined are the agentless passives in this feature group, bold italicized
are the post-nominal passives and finally non-italicized are the by-passives.
Table 5. Mean scores for abstract style by degree of interactivity and level of instruction
equally high if one lexical item is repeated several times or many different
lexical items of the same grammatical type are present in texts. Alternatively, a
lexical distribution plot, which I will return to in the conclusion, could over-
come this problem.
Table 6. Mean scores for on-line elaboration by degree of interactivity and level of
instruction
and let’s say that uh thirty-eight percent of college graduates make more than fifty
thousand a year. and of people who are not college graduates let’s say that twenty-
two percent make more than fifty thousand a year. again, then between college
graduates and making more than fifty thousand dollars a year there is a positive
correlation. why? because percentage wise, more college graduates than non-college
graduates belong to this group. [unclear words] notice I’m not comparing college
graduates with each other, I’m not comparing this thirty-eight with the other
sixty-two percent, so it’s going to have positive correlation. this doesn’t have to be
more than fifty percent. it just has to be higher than people who don’t belong to
this group or things that don’t belong to this group or (whatever). OK? if that’s
positive correlation what would negative correlation be? it would be the opposite.
it’s where this number is smaller than this number.
(Humplleldhg119)
(Extract 9), they are mainly demonstrative pronouns rather than demonstrative
determiners, denoting a different type of discourse. Demonstrative pronouns
characterize conversations while a demonstrative determiner denotes singular
nouns following them.
Even more striking is the high number of verbs taking that complement
clauses in the graduate class versus in the undergraduate class. As can be seen in
Extract 9, they are extensively used in the graduate classes. The most commonly
used verbs in spoken registers with that deletion are also the ones that mark
stance. Whether being able to take and state your stance in graduate classes is
due to generally smaller class-sizes of this division remains an area to investigate
further. At the same time, the difference in language use may relate to a diverse
atmosphere triggered by power relations between teacher and student in
undergraduate versus graduate divisions. Students in the graduate classes may
experience a more collegial atmosphere. Alternatively, pedagogical goals and
approaches to transmitting information may differ in the two settings. This may
be reflected in the way students may be exposed to different types of questions,
triggering more or less elaborate responses. The difference in language use, in
fact, seems to support this claim.
The extract from a lower division class shows the way the teacher talks.
While explaining some concept to the students, the teacher asks some (rhetori-
cal) questions that s/he immediately gives the answer to without waiting for
the students to work out the answers. In contrast, the student is talking in the
graduate class (Extract 9), hence using the grammatical structures mentioned
above. The prompt to which the student is responding in the graduate business
management class is “How did you like the poem?”. Through this question, the
students are invited to give an opinion, an evaluation of the poem read. In
contrast, the prompts in the upper division class are, “What needs to be
done?”, and “What guests, add what for what purpose?”. The questions posed
in the upper division class require phrasal or simple clause answers whereas the
question posed in the graduate class seems to trigger careful thought expressed
as a more elaborated answer. The two types of prompts seem to generate
qualitatively different answers reflecting contrasting attitudes to displaying
knowledge.
From the point of view of the quality of the “discussion”, the lower division
class seems to be very similar to the upper division class. The answer in the
upper division class simply responds to a display question, generating a pseudo-
interactional pattern, while in the graduate class, a genuine discussion seems to
4. Conclusion
The present study was carried out to investigate major patterns of variation
within lectures by focusing on level of instruction and interactivity. Patterns
Appendix
Notes
1. In this study ‘oral’ refers to stereotypical speaking such as conversation, and ‘literate’ refers
to stereotypical writing as in academic prose (Biber, 1988, 1995).
2. I would like to thank the Educational Testing Service (ETS) for permission to use
the corpus for research purposes.
3. For example, two lectures with identical turn-taking patterns (e.g., five turns each) can
display varying turn-length measures (number of words in each turn).
References
Biber, D. 1988. Variation across Speech and Writing. New York: Cambridge University Press.
Biber, D. 1995. Dimensions of Register Variation. New York: Cambridge University Press.
Biber, D., S. Johansson, G. Leech, S. Conrad, E. Finegan. 1999. Longman Grammar of Spoken
and Written English. New York: Longman.
Biber, D., R. Reppen, V. Clark, J. Walter. 2001. Representing spoken language in university
settings: The design and construction of the spoken component of the T2K-SWAL
Corpus. In R. Simpson and J. Swales (eds.) Corpus Linguistics in North America.
pp. 48–57. Ann Arbor, MI: University of Michigan Press.
Chaudron, C. and J. C. Richards. 1986. The effect of discourse markers on the comprehen-
sion of lectures. Applied Linguistics 7/2:113–127.
Conrad, S. 1996. Academic discourse in two disciplines: professional writing and student
development in biology and history. Unpublished Doctoral Dissertation: Northern
Arizona University.
Csomay, E. 2000. Academic lectures: An interface of an oral/literate continuum. NovELTy,
7/3: 30–47.
DeCarrico, J., and J. R. Nattinger. 1988. Lexical phrases for the comprehension of academic
lectures. ESP Journal, 7:91–102.
Dudley-Evans, T. 1994. Variations in the discourse patterns favored by different disciplines
and their pedagogical implications. In J. Flowerdew (ed.) Academic Listening: Research
Perspectives. pp. 146–158. New York: Cambridge University Press.
Flowerdew, J. 1994. Academic Listening: Research Perspectives. New York: Cambridge
University Press.
Grabe, W. 1987. Contrastive rhetoric and text-type research. In U. Connor & R. B. Kaplan
(eds.) Writing across Languages: Analysis of L2 Texts. pp. 115–138. Reading, MA:
Addison-Wesley Publishing Co.
Grabe, W. and R. B. Kaplan. 1996. The Theory and Practice of Writing. New York: Longman.
Hansen, C. 1994. Topic identification in lecture discourse. In J. Flowerdew (ed.) Academic
Listening: Research Perspectives. pp. 131–145. New York: Cambridge University Press.
Johns, A. 1997. Text, Role, and Context: Developing Academic Literacies. New York:
Cambridge University Press.
Leki, I. 1991. Twenty-five years of contrastive rhetoric: Text analysis and writing pedagogies.
TESOL Quarterly 25:123–43.
Nattinger, J.R. and J. DeCarrico. 1992. Lexical Phrases. New York: Cambridge University Press.
Olsen, L. A. and T. N. Huckin. 1990. Point-driven understanding in engineering lecture
comprehension. ESP Journal, 9/1:33–47.
Ruetten, M. K. 1986. Comprehending Academic Lectures. New York: Macmillan.
Swales, J. 1990. Genre Analysis. New York: Cambridge University Press.
Waters, A. 1996. A Review of Research into Needs in English for Academic Purposes of
Relevance to the North American Higher Education Context [TOEFL Monograph Series
MS-6]. Princeton, NJ: Educational Testing Service.
Young, L. 1994. University lectures — macro-structure and micro-features. In J. Flowerdew
(ed.) Academic Listening: Research Perspectives. pp. 159–176. New York: Cambridge
University Press.
Young, L. and B. Fitzgerald. 1982. Listening and Learning: Lectures. Rowley, MA: Newbury
House.
Part III
Chapter 12
Susan M. Fitzmaurice
Northern Arizona University
of the second half of the eighteenth century. The corpus consists of sizeable
samples (about 40,000 words per genre) of the personal letters, essays, fiction
and drama written by Addison and his cohort, as well as figures who were
peripheral to this group and figures who were complete outsiders though they
were contemporaries of one or more members of the group. The corpus
includes both women and men in order to afford the examination of sex
differences in language in a range of registers practiced at the time. Every
attempt has been made to include examples of every genre or style produced by
a writer to facilitate the study of idiolect — the extent to which a writer’s
individual style shows up across registers — and to allow the in-depth study of
registers that are part of the literary and linguistic practice of the period. Thus
the writing of Matthew Prior, diplomat and poet, includes satirical dialogues —
a highly popular form of philosophical discourse in the seventeenth century —
in addition to more usual registers represented by letters and essays. Some
figures have very little variety in their writing; Sarah Churchill, Duchess of
Marlborough, is represented by her letters only while Lady Mary Wortley
Montagu’s writing is register-rich, including letters, essays, drama, and fiction.
By including in the corpus the writing of figures peripheral to and outside the
central Addison network, it is possible to compare the writing of the central
group with that of people not associated with them, and to assess the extent to
which the corpus as a whole represents a cluster of idiolects rather than a
homogeneous and continuous variety.
The sample searched for the present
study consists of almost 500,000 words of prose. The prose belongs to a single
genre — the private letters of a central figure in the publication of the Spectator
periodical, Joseph Addison, and members of his circle. The letters are all from
attested correspondences. Some were transcribed from existing edited collections
while others were transcribed from manuscript sources.1 The letters of
central figures in the network were readily available in good edited collections
— the letters of Alexander Pope, Jonathan Swift, Richard Steele, Joseph
Addison and Lady Mary Wortley Montagu have been the subject of editorial
interest since their own lifetimes. Letters of figures just as central to the network
but whose literary work received less attention in the twentieth century — poets
George Stepney and Matthew Prior — exist only in manuscript. Included in the
sample are the letters of figures more peripheral to the circle, like John Dryden
(peripheral because he died in 1700) and Daniel Defoe (who had no connection
to what we might call the inner circle dominated by Addison and Steele).2 In
addition, I have included the letters of complete outsiders for the purposes of
comparison; poet and playwright Aphra Behn was known to Dryden but died
Structural ambiguity in eighteenth-century English 229
in 1688 before most of the circle reached maturity. Sarah Churchill, Duchess of
Marlborough, although a contemporary of many of the key figures, was not
connected to any of them as far as I am aware.3 The letters selected for study
were written across a period of a little more than 100 years, between 1650 and 1762. Table 1 lists the
writers, their dates of birth and death, and the number of words collected for
this study. The birthdates of the writers place them into three time zones or
generations, a sizeable central generation that clusters around Addison, and two
smaller, peripheral generations on either side. Thus Dryden and Behn, both
born before 1650, belong to the first time zone, and Pope and Lady Mary
Wortley Montagu, both born in the late 1680s, occupy the third zone. Addison
and his cohort, all born in the 1660s and 1670s, inhabit the central time zone.
marker to could be isolated from its verb (Fischer, 2000; Fitzmaurice, 2000a). It
has become clear that infinitives split by negatives are appearing in a range of
spoken contexts in American English. The utterances in (1), transcribed from
recent television broadcasts, illustrate one such context (Fitzmaurice, 2000b):
(1) a. We will send enough troops to not let Macedonia shut down its
borders. (William Cohen, NBC Today, April 5, 1999)
b. You have to learn to not let it start. (‘The Puzzle Place’, PBS TV,
March 16, 1999)
(2) a. We will send enough troops not to let Macedonia shut down its
borders
b. You have to learn not to let it start
In the attested utterances in (1), the placement of not after the infinitive marker
to directly adjacent to the verb that falls within its scope results in an emphatic
effect. The source of this effect is arguably the purposive force invested in the
infinitive marker as it is split from the main verb by not. The alternative and
much more usual construction of negative infinitives is illustrated by the
(unattested) examples in (2). Because the anchor of the modern English verbal
system is the auxiliary, and because the auxiliary is central to the construction
of sentence forms like negatives and interrogatives, the variability in the
placement of not in negative infinitives does not result in a change of meaning.
However, in a system in which the main verb rather than an auxiliary might be
used to construct a negative sentence, and which thus allows negation to be
expressed phrasally (and locally) rather than clausally (and globally), it is
possible for the placement of not to give rise to ambiguity of reference.
Given that the construction of negative infinitives with to preceding not and
the verb being negated in modern English results in the impression of the
emphatic, unmistakable expression of negation, it is tempting to consider
whether this split negative infinitive pattern has ever been used as a means to
avoid ambiguity of scope in historical varieties in which mixed systems such as
the one mentioned were productive. Work on the history of infinitive to
(Fischer (1997, 2000), Van Gelderen (1998)) has examined the occurrence in
earlier stages of English, particularly Middle English, of this kind of negative
split infinitive, but this work has not examined the discourse function of or
pragmatic motivation for this construction. In addition, the question of the
continuity or otherwise of the construction in English has not been considered
in any detail (though see Fischer, 1992; and Rissanen 1999 for brief comment).
This study therefore addresses the question of whether, and if so, to what
extent, the to not V pattern is attested in earlier periods in the history of the
English language. Our candidate period for this investigation is the late
seventeenth and early eighteenth centuries.
In (3a), not is anchored by its attachment to dummy auxiliary do with the result
that it has scope not only over the verb know, but over the entire complement
of the verb. By contrast, in (3b), not has scope only over the verb it follows,
namely embellish. The effect of these different rules is basically the same: both
sentences are negative. There may not appear to be any practical or communicative
consequences of the difference between the two systems. However, the
negation of infinitive verb complements requires that the negator be understood
to have scope over the infinitive verb complement, rather than the matrix
verb. Consider (4a) as a case in which the reader may not be able to decide
immediately whether not negates the matrix verb stayed (as in 4ai) or whether
it negates the infinitive complement governed by stayed (as in 4aii):
(4) a. Leonora stayed not to make him any reply, only tipped him upon
the arm, and bid him follow her at a convenient distance to avoid
The analysis offered in (4ai) has the entire verb phrase (including the embedded
infinitive) within the scope of not. By contrast, (4aii) restricts the scope of not to
the infinitive complement. The insertion of the purposive phrase ‘in order’
immediately before the infinitive marker to helps to resolve the ambiguity
encountered by the modern reader; the infinitive complement is adverbial in
structure and clearly purposive in force. This example attests the practice of
applying the do-less negative rule at the phrasal level and reading not as if it
were placed in the matrix clause (i.e. notionally raising it in order to read its
scope correctly). The example in (4b) looks very similar to that in (4a), except
that the object of the matrix verb admonished — the pronoun him — is explicit.
(4) b. His Wife admonished him not to think of Revenge, but to take care
of his Stock and his Soul: (Pope, A Full and True Account of A Horrid
and Barbarous Revenge by Poison on the Body of MR.
EDMUND CURLL, Bookseller [c 1716])
(4bi) = his wife did not admonish him [to think of revenge…]
(4bii) = his wife admonished him [to not think of revenge…] = V [not to V]
The analyses offered in (4bi) and (4bii) present the matrix verb either as being
negated or as not being negated. These interpretations do not differ except in
degree of force; it is possible to understand the scope of not in (4bi) to include
both the matrix verb admonish as well as the infinitive complement. This reading
would apply the do-less negative rule at the same time as allowing for not to
govern the infinitive. The analysis in (4bii) may be marginally preferable merely
because of the rhetorical balance created by the second infinitive introduced by
the adversative conjunction but. This example is different from (4a) because
both readings admit the negation of the infinitive complement, and because the
second infinitive clarifies the relation of object to the complement.
These examples illustrate the nature of the ambiguity or vagueness that
arises when two systems are in use in a variety at the same time. Interpreting
these constructions requires in part a practical understanding of which system
of negation is operating in a particular instance: the auxiliary-based,
sentential (global) system of present-day English, or the phrasal-based (local) one.
The difference between the two pivots on the development and regularization
of modals and dummy do as full-blown auxiliary verbs. To explore the extent to
which speakers compensate for the co-existence of the two systems, this study
examined the relationship of the recessive do-less and the regularizing
do-support negation rules in the late seventeenth and early eighteenth centuries
as a context for the syntax of negative infinitive VPs.
3. The study
The question for investigation is: Does the use of negation with infinitive VPs
in late seventeenth and early eighteenth century English appear to exhibit any
signs of pressure from the flux in the auxiliary system? I approach the answer to
this question by examining first the extent to which the recessive do-less
negative construction (as illustrated in (3b) above) co-occurs with the rule
involving dummy auxiliary do (as illustrated in (3a) above). The present study
provides a salient context for the examination of the grammatical patterns used
in constructing negative infinitives. One motivation for this investigation is the
speculation that writers adopt split negative infinitives (as illustrated in (1a)
above) in order to avoid the possible ambiguity presented by the mixture of
phrasal and clausal rules of negation. The overarching concern of this study
then is to ask to what extent if any the retreating do-less negation rule interacts
with the syntax of infinitive VPs.4
The constructions for investigation include a baseline set and a secondary
set. The baseline set consists of main and subordinate clause negative
declaratives, with two major variants described below. The first consists of an
auxiliary verb plus not or the contraction n’t. The second consists of a main verb
plus not without auxiliary support.
auxiliary [do/modals/perfect/progressive] + not/n’t (e.g. 3a)
main verb + not (e.g. 3b)
The secondary set consists of post-predicate negative infinitive constructions.
The two salient variants are captured by the structural descriptions below. Note
that the first consists of not immediately followed by the infinitive marker to
and the infinitive verb. The second consists of the split negative infinitive, that
is, it consists of the infinitive marker to followed by not and the verb.
(V NP) not to V e.g. she begged him not to scold the children
(V NP) to not V e.g. she begged him to not scold the children
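The two structural descriptions above can be approximated by a pair of regular expressions. The following is only a minimal sketch in Python, not the tagging program used in this study; the function name classify_negative_infinitives is hypothetical. Because plain text carries no part-of-speech information, the word captured after to or not is merely a candidate verb, and every hit would still need manual checking.

```python
import re

# Candidate patterns only: without part-of-speech tags the captured word
# may not be a verb, so each hit still has to be checked by hand.
NOT_TO = re.compile(r"\bnot\s+to\s+(\w+)", re.IGNORECASE)
TO_NOT = re.compile(r"\bto\s+not\s+(\w+)", re.IGNORECASE)

def classify_negative_infinitives(text):
    """Return the candidate verbs found for each variant of the secondary set."""
    return {
        "not to V": NOT_TO.findall(text),
        "to not V": TO_NOT.findall(text),
    }

examples = [
    "she begged him not to scold the children",
    "she begged him to not scold the children",
]
for sentence in examples:
    print(sentence, classify_negative_infinitives(sentence))
```

Keeping the two variants in separate expressions means a single pass over a text yields separate candidate lists for the unsplit and the split negative infinitive.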
This study deployed the techniques of corpus linguistics in order to address the
question. The techniques and materials of corpus linguistics allow the investigation
of constructions that are complementary alongside those that
potentially interact with one another. The texts were tagged using a program
developed by Douglas Biber, and key constructions were searched to produce
KWIC (Key Word in Context) concordance files for each search conducted.
The files were checked and erroneous instances discarded. The instances were
counted and the frequency counts normalized to occurrences per 10,000 words
for ease of presentation and interpretation.
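The concordancing and normalization steps just described can be sketched as follows. This is an illustrative approximation in Python under stated assumptions, not the software used for the study: the helper names kwic and per_10k, the context width, and the search pattern are all hypothetical, and the short sample string simply reuses two sentences quoted earlier in the chapter.

```python
import re

def kwic(text, pattern, width=30):
    """Yield (left context, match, right context) for each regex hit."""
    for m in re.finditer(pattern, text, flags=re.IGNORECASE):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        yield left, m.group(0), right

def per_10k(count, total_words):
    """Normalize a raw count to occurrences per 10,000 words."""
    return count * 10_000 / total_words

sample = ("You have to learn not to let it start. "
          "His Wife admonished him not to think of Revenge.")

# Concordance lines for candidate 'not to V' strings, keyword in the middle.
hits = list(kwic(sample, r"\bnot\s+to\s+\w+"))
for left, match, right in hits:
    print(f"{left:>30} [{match}] {right}")

# Whitespace tokenization is a simplifying assumption for the word count.
total_words = len(sample.split())
print(round(per_10k(len(hits), total_words), 2))
```

Normalizing to a common base is what makes counts comparable across writers whose letters differ greatly in total length.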
4. Findings
[Figure 1: bar chart. y-axis: Frequency per 10,000 words (0–30); x-axis: Writers; series: V not, do not]
Figure 1. Do- and do-less negatives
system. Her use of do-support (11.54 per 10,000 words) compares well with
Steele (13.58), Stepney (10.94), Prior (13.86), and Congreve (13.45), and is
notably ahead of younger men like Swift (7.72), Pope (8.4) and Defoe (9.52
instances per 10,000 words). The most vigorous adherents to the regularizing
rule with do-support are not necessarily the youngest members of the sample;
Sarah Churchill, Duchess of Marlborough (1660–1744) and the youngest
member of the group, Lady Mary Wortley Montagu, are both frequent users,
but the most regular exponent of the new rule is Edward Wortley Montagu, an
exact contemporary of Addison and Steele. Sarah Churchill and Edward
Wortley are not literary figures at all, and the only texts that represent them
(their letters) are for the most part the work of communication rather than the
occasion for literary expression. Their frequencies suggest that these speakers
are both modern and that they tend to the vernacular (rather than to the
formal) in their practice.
Figure 2 presents an opportunity to compare the syntax of negation in main
and subordinate clauses. Dryden exhibits a greater preference for main clause
do-less negatives than for subordinate clause ones, and for do-supported
negatives in subordinate clauses than for the same constructions in main
clauses. This pattern is also evident in the language of Defoe, Swift and Pope.
While Addison, Behn and Wortley also seem to prefer do-supported negatives
in subordinate clauses over do-supported negatives in main clauses, they differ
from the writers mentioned in using do-less negatives more frequently in
subordinate clauses than in main clauses. The others (Prior, Congreve,
Montagu, Stepney and Churchill) clearly prefer do-supported negatives in
main clauses to the same constructions in subordinate clauses, and all (except
for Sarah Churchill) use the recessive pattern more frequently in main clauses
than in subordinate clauses.
The interpretation of the trends presented in Figures 1 and 2 is aided by a
closer examination of the guise assumed by the do-less pattern, that is, by
considering the choice and range of lexical verbs that speakers select for use
with the negative without do-support. After all, it seems clear that the do-less
negative system varies with respect to whether it is productive or merely
residual when we look at the verbs that appear to attract the auxiliary-less
negative, from the most frequent to the least frequent (Table 2).
[Figure 2: bar chart. y-axis: Frequency per 10,000 words (0–20); x-axis: Writers; series: main cl do + not, sub cl do + not, main cl V + not, sub cl V + not]
[Table 2: counts surviving from the original table, most to least frequent: 93 53 34 8 5 4 3 2 2 1]
The verb have is by far the most frequently occurring main verb in the
negative. Have and know occur in the do-less negative pattern in the letters of
every speaker in the group, and doubt appears in the texts of all except for Sarah
Churchill, Edward Wortley and Lady Mary Wortley Montagu. Indeed, Lady
Mary Wortley Montagu’s choice of verb is restricted to the pair have and know,
while her husband adds to this pair only seem and come. George Stepney’s verb
choice is limited to the three most frequently occurring verbs, have, doubt and
know; Steele adds to these central verbs a single instance of inquire; Prior adds
stay to the central trio, and Addison adds to the three core verbs his personal
favorite, question. In fact, Addison accounts for the total number of occurrences
of question in the do-less pattern. The low users of the do-less pattern are also
the most stereotypical users. That is, they do not use the pattern productively,
but restrict their choice of verbs to a very small set which they use repeatedly.
Some of the major contributors to the range and variety of verbs used in the
V + not pattern are not surprising; the most senior writer, Dryden, and Daniel
Defoe, one of the oldest members of Addison’s generation, are low do-negative
users and so we might expect them to demonstrate more variety and range in
their control of the recessive system. And indeed, as might be expected, the
most prolific user of the recessive rule, Dryden, is also the most productive. He
uses 27 different verbs at least once each in the V + not pattern, including
pretend, remember, flatter, go, and merit, and Defoe uses thirteen different verbs
in the pattern. Dryden’s near contemporary, Aphra Behn, follows the trend
characteristic of the younger speakers. She uses only five different verbs,
including have and doubt; she also has follow, come and suffer.5 Other verbs that
occur once in the do-less pattern include converse, give, live, value, find, please,
like, desire, discover, remember, pay, despair, say, continue, answer, deal, render,
drink and break.
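The notion of productivity used here (how many different verbs a writer uses in the V + not pattern) amounts to a count of verb types per writer. The following is a minimal sketch in Python, assuming only the inventories spelled out in the prose above for six of the writers; the dictionary and function names are hypothetical, and token frequencies are not represented because they are not reported per writer.

```python
from collections import Counter

# Verb types in the V + not pattern, as listed in the prose for six writers.
# Type inventories only; per-writer token frequencies are not given.
DO_LESS_VERBS = {
    "Montagu": {"have", "know"},
    "Wortley": {"have", "know", "seem", "come"},
    "Stepney": {"have", "doubt", "know"},
    "Steele": {"have", "doubt", "know", "inquire"},
    "Prior": {"have", "doubt", "know", "stay"},
    "Addison": {"have", "doubt", "know", "question"},
}

def type_counts(inventories):
    """Number of different verbs each writer uses in the pattern."""
    return {writer: len(verbs) for writer, verbs in inventories.items()}

def shared_verbs(inventories):
    """Verbs that occur in every writer's inventory."""
    return set.intersection(*inventories.values())

def writer_specific_verbs(inventories):
    """Verbs that occur in exactly one writer's inventory."""
    spread = Counter(v for verbs in inventories.values() for v in verbs)
    return {v for v, n in spread.items() if n == 1}

print(type_counts(DO_LESS_VERBS))
print(sorted(shared_verbs(DO_LESS_VERBS)))
print(sorted(writer_specific_verbs(DO_LESS_VERBS)))
```

A small inventory that overlaps almost entirely with the shared set corresponds to what the text calls stereotypical use, whereas a large inventory with many writer-specific verbs would indicate productive use.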
The analysis of the baseline set of constructions indicates that the do-less
negative rule is indeed recessive and unproductive, and that the formation of
negative sentences with do-support is well established in the language of this
sample.
ones. Negative passive infinitives occur more frequently than variants of the NP
and adjective constructions with the additional apparatus of for. In addition, the
construction in which a governing verb precedes an optional NP and a conjunction
occurs sufficiently frequently for me to note it. Below I give examples of
each type, and discuss them in more detail, both in terms of their relation to
one another, and with respect to whether the scope of the negative in these
patterns is ambiguous.7 Figure 3 illustrates the relative frequency with which
these patterns occur in the sample in graphic form.
4.2.1 V NP not to
The following examples illustrate some ways in which the pattern occurs in the
sample.
(5) a. for yt reason ye Elr had injoynd Count Harrack not to converse wth
ym upon these matters, but to conferr only with himself & his Feld
Marshall, (Stepney)
b. and I hope there will not a day appear to our lives end wherein there
will not appear some instance of an Affection not to be excelled but
in the Mansions of Eternity to which We may recommend Our
selves by our behaviour to each other Here. (Steele)
[Figure 3: bar chart. y-axis: frequencies per 100,000 words (0–350); x-axis: infinitive patterns]
The controlling verbs in (5a) — enjoin — and (5c) — conjure, beg — are speech
act verbs that are typical controlling verbs of post-predicate infinitives. Beg
occurs no fewer than seven times in this pattern in the sample as a whole;
conjure and advise occur three times, and bid occurs twice. Other speech act
verbs used in this pattern are address, beseech, relate, and charge. Steele’s
extremely complex string of negatives in (5b) includes a pair of clauses in each
of which the subject is existential there and the main verb is a verb of probability,
appear; in the first (main) clause as a verb governing the prepositional
phrase, and in the second (relative) clause as the controlling verb (5b) without
subject raising. The sentence may be construed thus: ‘I hope that the day when
an affection that can only be surpassed in heaven (to which our behaviour on
earth guarantees our entry) appears before the end of our life’. Note that the
negatives cancel one another out to result in a markedly tentative expression of
regard for the addressee.
Every speaker uses this pattern more frequently than the others (except for
Pope who seems to prefer the intensive/comparative negative infinitive). Note
that the examples in (5) are cases in which the NP may be analysed as a subject
in the subordinate clause raised to object in the matrix clause. The presence of
the NP provides a block to interpreting the scope of the negator as including the
matrix verb. The NP provides an effective barrier, ensuring that the domain of
not includes the following infinitive. This pattern would seem to be a candidate
for alternative formulation as ‘V NP to not’, that is, in which not might split the
infinitive marker to from its verb. Such a variant in the historical sample would
provide for the unambiguous expression of emphatic negation. As already
noted, however, this variant does not occur at all.
4.2.2 V so X as (not) to Y
(6) a. This I should have acquainted you with about a fort night ago had I
not bin so much taken up as not to have had a spare-moment, and
that I knew my Intelligence of this kind would have bin of no Use to
You. (Addison)
b. Tis not unlike the happy friendship of a stay’d man and his wife,
who are seldom so fond as to hinder the business of the house from
going on all day, or so indolent as not to find consolation in each
other every evening. (Pope)
Characteristic of this pattern is the use of be as the verb preceding the adjectival
subject complement that governs the infinitive clause. The most remarkable
feature of the pattern is the use of the intensifier so (with its variant so much),
and the comparative as which requires a second instance immediately before
the negator. This pattern appears to be characteristic of the historical and social
variety of English encountered in the sample as it occurs in the language of
every speaker except for Dryden. In the same way that the first pattern (V NP
not to) has a built-in barrier to the extension of not’s scope to the matrix verb,
the intensifier/comparative construction also has material (X) that serves to
block the application of negation to the controlling verb.
4.2.4 V not to
This pattern is the most vulnerable to difficulty of interpretation in a variety in
which two different systems are in use. In this pattern the controlling verb is
immediately adjacent to not and is thus in danger of being analyzed as falling
within the scope of not if the system of negation understood to be operating is
the recessive system which allows post-verbal not. Consider the following
examples from the sample:
(8) a. And that you will please not to make so much a stranger of me
another time. (Dryden)
b. I come before a couple of Gentlemen who have greater fortunes than
myself and have endeavoured not to fall short of either of them in
my friendship to Your self: (Addison)
c. I don’t much like a Thing I heard of the Gentleman that R. begins
not to think well of, that Hee professes much to bee for the Protestant
Succession, but is not att all persuaded there is any Danger of
the P. of W., (Churchill)
In most cases it is not difficult to make a judgment about the most likely
interpretation of the combination of verb and not. Indeed it is possible to make
probabilistic statements about the role of the auxiliary system in the construction
of negative infinitives on the basis of the writer’s preference for do-supported
or do-less negation. So the extent to which a writer uses do or
another auxiliary in constructing the negative will shape judgment of the likely
interpretation of a negative infinitive. For instance, Dryden is notable for using
do-less negatives most often, but the fact that he uses auxiliaries other than do
(much more frequently) suggests that negative infinitive constructions might
well not be ambiguous in his language. Indeed, the cotext in example (8a)
offers additional guidance, so that not belongs with to make rather than with
the matrix verb please. In this case, the controlling predicate, please, is sufficiently
conventional for Dryden’s expression not to be understood as occurring
in the negative.
A regular do-supported negative user like Sarah Churchill or Edward
Wortley is even less likely to construct infinitives governed by matrix verbs that
are not negated using do. So Sarah Churchill’s (8c), in which we have a do
negative (don’t much like) co-occurring with a negative infinitive (not to think),
should not present difficulty. Similarly, the fact that Addison uses markedly few
instances of the do-less negative is useful information as we negotiate the
meaning of (8b), which is straightforwardly a negative infinitive rather than a
negated controlling verb.
It is interesting that not all the speakers in the sample use this pattern; it is
absent from Defoe’s corpus and from Behn’s corpus. Because they are two of
the earliest figures whose language is represented, it is tempting to suppose that
the absence of this pattern indicates some awareness of the interpretative
difficulties that it poses. The problem with jumping to such a conclusion is that
although Defoe might justifiably be judged thus on the basis of the extent to
which he uses the recessive rule for the formation of negative declaratives,
Behn’s evident preference for the regularizing rule with do-support does not
endorse such an interpretation. Jonathan Swift’s language also exhibits no
instances of this construction; since he is a sparse user of negative infinitives
compared with the other people in the sample, the lack of this pattern is not
surprising. The distribution of the pattern across the sample does not indicate
then that speakers appear aware of possible structural ambiguity in this pattern.
4.2.5.1 Be verb-ed not to. The passive form of the first pattern (without agent)
seems somewhat formulaic in its appearance. Defoe favors the construction,
possibly because the avoidance of the agent expresses negative politeness. The
result is an obsequious tone. Addison also uses the construction more
frequently than his contemporaries.
(9) a. yet I humbly refer to your Lordship my former entreaty that your
Lordship will be pleased not to communicate to my Lord the favors I
have received from your Lordship, least perhaps it may cool the
inclination my Lord T-----r has been pleased to express of doeing
something for me. (Defoe)
b. besides, as a father, I hope I may be allowed not to Love in a less
Exalted and Sublime Manner, but a greater; (Defoe)
4.2.5.2 X Conj not to. This construction occurs very seldom; it is worth noting
for its role in creating rhetorically measured balance at the level of the clause.
(10) Only to give the Audience some light into the Character of Maskwell, before
his appearance; and not to convince Mellefont of his Treachery; (Congreve)
4.2.5.3 for NP not to. This is a variant of the pattern that occurs most frequently
in the sample, namely, NP not to. The variant itself appears only in the idiolect
of one person, Jonathan Swift.
(11) a. It is a point of wisdom too hard for me, not to look back with
vexation upon past management. (Swift)
5. Conclusion
Notes
1. I acknowledge the work of Sinthya Solera for transcribing George Stepney’s letters from
microfilm of the British Library manuscript collection, and the work of Sheila Williams, who
transcribed Matthew Prior’s letters from the microfilm of the Longleat manuscript collection
of his private letters and state papers. I am also very grateful to Jeanne Arete, my research
assistant, for her construction of significant sections of the corpus including the letters of
Addison, Defoe, Dryden, Swift, Pope, Lady Mary Wortley Montagu, Congreve, Aphra Behn,
and Sarah Churchill, Duchess of Marlborough. I am grateful to the Earl of Harrowby for
permission to use the unpublished letters of Edward Wortley. Last, but not least, I am
grateful to Randi Reppen for tagging the texts.
2. For an exploration of the nature of Joseph Addison’s inner circle, see Fitzmaurice (2000c).
3. In fact, in September 1704, Prior wrote a congratulatory letter to the Duchess of
Marlborough on the occasion of her husband’s victory at Blenheim over the ‘French and
Bavarians’, offering her a celebratory poem to commemorate the victory, entitled ‘A letter to
Monsieur Boileau Despreaux; Occasion’d by the Victory at Blenheim, 1704’. Prior asked the
Duchess to show the poem to Queen Anne. The Duchess was evidently unimpressed; Prior
wrote on the letter: ‘She sent back the letter unopen’d and said she was sure yt Mr Prior write
but what he would, He could not wish well to Her and her family’. [Prior Papers (Marquess
of Bath) vol 13: folio 55].
4. Rissanen (1999: 290) notes, ‘Somewhat surprisingly, this construction is rare in Early
Modern English and gains ground again only at the end of the eighteenth century. The most
common elements appearing between the to-particle and the infinitive are the negative
particle and adverbs of manner and degree’. The examples Rissanen offers are from More’s
Confutation of Tyndale (16th century) and Stapleton (17th century).
5. Confirmation of this claim for Behn’s syntactic modernity needs to be further tested, for
instance, by looking at her use of the progressive, perhaps relative clause markers, as well as
pronoun usage. It may be that she is conventional and thus old-fashioned in her lexis while
being quite modern in her syntax.
6. In addition to the searches reported here, I conducted a rapid search of the sample for
infinitives split by adverbs of manner such as well, heartily, and fast, and of degree such as
almost, entirely, and perfectly. I could find no occurrences of these adverbs in split infinitives.
7. The following table illustrates the difference in the occurrence of infinitive patterns in
present-day English and in the historical sample. NB. The comparison considers variants
rather than equivalents; Biber et al. do not consider negative infinitives. It remains to
compare the incidence of positive and negative infinitives in the historical sample. The table
shows the most frequent to the least frequent in each sample.
PDE (Biber et al., 698) Historical sample
V + to
V + NP + to, V NP not to
[V so X as (not) to Y]
[Adj not to]
V not to
246 Susan M. Fitzmaurice
References
Biber, D., Johansson, S., Leech, G., Conrad, S. & Finegan, E. 1999. The Longman Grammar
of Spoken and Written English. London: Pearson Longman.
Denison, D. 1993. Historical English Syntax: Verbal Constructions. London: Longman.
Ellegård, A. 1953. The Auxiliary Do: The Establishment and Regulation of its Use in English.
Stockholm: Almqvist & Wiksell.
Fischer, O. 1992. Syntax. In The Cambridge History of the English Language Volume II:
1066–1476. N. Blake (ed.). Cambridge: Cambridge University Press. Pp. 207–408.
Fischer, O. 1997. The grammaticalisation of infinitival to in English compared with German
and Dutch. In Language History and Linguistic Modelling. A Festschrift for Jacek Fisiak on
his 60th Birthday. R. Hickey and S. Puppel (eds.). Berlin: Mouton. Pp. 265–80.
Fischer, O. 2000. Grammaticalisation: unidirectional, non-reversible? The case of to before
the infinitive in English. In Pathways of Change: Grammaticalization Processes in older
English. O. Fischer, A. Rosenbach and D. Stein (eds.). Berlin: Mouton. Pp. 149–170.
Fitzmaurice, S. 2000a. Remarks on the de-grammaticalization of infinitival to in present-day
American English. In Pathways of Change: Grammaticalization Processes in older English.
O. Fischer, A. Rosenbach and D. Stein (eds.). Berlin: Mouton. Pp. 171–186.
Fitzmaurice, S. 2000b. The Great Leveler: the role of the spoken media in stylistic shift from
the colloquial to the conventional. American Speech 75.1: 54–68.
Fitzmaurice, S. 2000c. The Spectator, the politics of social networks, and language
standardisation in eighteenth-century England. In The Development of Standard English
1300–1800. L. Wright (ed.). Cambridge: Cambridge University Press. Pp. 195–218.
Gelderen, E. van. 1998. For to in the History of English. American Journal of Germanic
Language and Literature 10.1: 45–72.
</TARGET "fit">
Rissanen, M. 1999. Syntax. In The Cambridge History of the English Language, Volume III:
1476–1776. R. Lass (ed.). Cambridge: Cambridge University Press. Pp. 187–331.
Tieken-Boon van Ostade, I. 1987. The Auxiliary Do in Eighteenth-century English: A
Sociohistorical-linguistic Approach. Dordrecht: Foris.
Chapter 13
Christer Geisler
Uppsala University
1. Introduction
In this study, the dimension score analysis uses the sets of co-occurring gram-
matical features in Biber (1988: 102–103, algorithms on pp. 221–245). The
calculation of the dimension scores (or factor scores) follows the methodology
as outlined in Biber (1988: 93–97) and Biber (1995: 117–119). This involves the
following steps: (a) tagging the corpus (see below); (b) extracting feature
counts for each text sample in the corpus (these features are given in Table 1a);
(c) normalizing the feature counts per 1000 words; (d) standardizing the
normalized scores in (c) to a mean of 0.0 and a standard deviation of 1.0; (e)
computing dimension scores for each text sample by adding up the standard-
ized frequencies in (d) that have salient positive loadings and subtracting
salient negative loadings (if any) on a dimension. For example, a dimension
score for each sample on Dimension 3 is calculated by adding up the standard-
ized scores for the features wh-relatives, pied piping constructions and
nominalizations and subtracting the sum of the three features place adverbials,
time adverbials and other adverbs (see Table 1a). The scores for periods 1 to 3
in Tables 3 through 6 represent the means of the dimension scores across each
time period and register.
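Steps (c) to (e) can be illustrated with a minimal Python sketch. It is not the program used in this study: the function names, the invented toy counts and the use of the sample standard deviation are assumptions made purely for illustration, and only two of the six Dimension 3 features are included.

```python
from statistics import mean, stdev


def normalize_per_1000(count, n_words):
    """Step (c): frequency of a feature per 1000 words of text."""
    return 1000.0 * count / n_words


def standardize(values):
    """Step (d): rescale values to mean 0.0 and standard deviation 1.0.

    Assumption: the sample standard deviation is used.
    """
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def dimension_scores(samples, positive, negative):
    """Step (e): sum of standardized positive features minus negative ones.

    samples: list of dicts with "words" (text length) and
             "counts" (feature name -> raw count).
    """
    z = {}
    for feature in positive + negative:
        rates = [normalize_per_1000(s["counts"][feature], s["words"])
                 for s in samples]
        z[feature] = standardize(rates)
    return [
        sum(z[f][i] for f in positive) - sum(z[f][i] for f in negative)
        for i in range(len(samples))
    ]


# Invented toy data: three text samples, two Dimension 3 features.
samples = [
    {"words": 2000, "counts": {"wh_relatives": 10, "time_adverbials": 4}},
    {"words": 1000, "counts": {"wh_relatives": 2, "time_adverbials": 6}},
    {"words": 4000, "counts": {"wh_relatives": 12, "time_adverbials": 16}},
]
scores = dimension_scores(samples, ["wh_relatives"], ["time_adverbials"])
```

In this toy example the first sample, with many wh-relatives and few time adverbials, receives the highest Dimension 3 score. Averaging such scores over the samples of one register in one period would give the kind of mean reported per period and register.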
The interpretive labels of the four dimensions in the present study are taken
over from Biber’s previous studies:
Dimension 1: Involved versus informational production
Dimension 2: Narrative versus non-narrative concerns
Dimension 3: Elaborated reference versus situation-dependent reference
Dimension 5: Impersonal versus non-impersonal style
They reflect the situational, social, and cognitive functions shared by each set of
co-occurring linguistic features. The relations among registers, or their
relative placement on a dimension, support these labels. In short, Biber &
Finegan (1989) identify three parameters of a textual dimension: Linguistic —
in being defined by sets of statistically co-occurring linguistic features;
Functional — in representing situational, social and cognitive functions;
Relative — in characterizing relations among different registers within a
dimension.
The CONCE corpus was tagged using Conexor’s EngCG-2 tagger
(http://www.conexor.fi). Words given multiple tags by the EngCG-2 tagger,
indicating that the tagger could not decide on a particular tag, were excluded
from the algorithms. There are approximately 19,000 ambiguous tags in the
Investigating register variation in nineteenth-century English 251
pronoun it
main verb be
causative subordination
discourse particles
indefinite pronouns
Hedges
Amplifiers
wh-questions
possibility modals
stranded prepositions
Negative features:
Nouns
word length
Prepositions
type/token ratio
attributive adjectives
Period 1 (1800–1830)  Number of text samples      11       9       7       6      10       7      15      65
                      Average text size         1774    3507    5972    5133   12102    5399    4136    5300
Period 2 (1850–1870)  Number of text samples      11       5       7       7      13       5      11      59
                      Average text size         1751    6122    5557    4332   10060    6268    5489    5790
Period 3 (1870–1900)  Number of text samples      11       6       7       6       9       7      17      63
                      Average text size         1772    4997    4299    5065   10027    4329    3950    4725
Total text samples                                33      20      21      19      32      19      43     187
Total word-count                               58256   92161  110798   91510  342037   99435  189570  983767
Average text size                               1765    4608    5276    4816   10689    5233    4409    5261
significance with large sample sizes). As in Biber (1988: 127), we also give
R-square values for each dimension showing how much statistical variability is
due to register differences: for Dimension 1, R-square equals 0.788, which
means that 78.8% of the variation in Dimension 1 scores is accounted for by register. Table 2 also
provides the statistical groupings suggested by the Newman-Keuls multiple
comparison test, which is a post-hoc test comparing combinations of various
group means. For example, for Dimension 3 the Newman-Keuls test indicates
that there are statistically three major groups of registers (these are discussed
further in Section 3.3). In addition, in order to test statistical variation of a
particular register across the three time periods, Tables 3 to 6 give F-values, with
their associated probabilities and R-square values, for each register on a
dimension. A significant probability, with p < 0.05 (for a 5% threshold),
indicates that changes in a register across the three time periods are statistically
significant. R-square values above 0.20 are regarded as indicating an important
relationship although they may not be statistically significant (cf. Biber &
Finegan 1989: 498). Statistically significant probabilities of an F-value or
important R-square values are italicized in Tables 3 through 6.
In this study, the dimension score analysis is also used to place a particular
register relative to other registers in CONCE. For instance, CONCE contains
Parliamentary debates and Trials which were originally included because they
represent aspects of speech-related discourse; Parliamentary Debates were also
included as specimens of political language of the time. As it turns out, only
Trials emerge as typically speech-based, falling on the “oral pole” of the
three major dimensions (Dimensions 1, 3, and 5). The Parliamentary debates
are similar to the expository Science and History registers in several respects.
This indicates that the present dimension score analysis has the additional
important function of locating a particular register relative to other registers in
the corpus. Register comparability is another problem when reviewing previous
studies. Biber (1995) makes a distinction between specialist expository registers
and others, whereas Biber & Finegan (1992) separate speech-based versus
written registers, and Biber & Finegan (1989) investigate drift in essays, fiction
and letters across several centuries.
Debates and Trials are the only two registers that contain statistically
significant changes across the three time periods: Table 3 shows that the
probability of the F-value for Debates equals 0.0082 and for Trials the probabili-
ty is less than 0.001. As a text example illustrating characteristic features of this
dimension, extract (1) is from the text with the highest Dimension 1 scores,
indicating a high degree of involvement.
(1) The right knee was bent in considerably, and he stood so as to rest him-
self upon his left leg?
— Yes, his left leg.
When did you see the claimant again?
— I think it was two years ago last August.
How came you to do so?
— I was driving to Alresford. I drove into the Swan Yard — the Swan
Hotel — and just as I got out of the trap the claimant was coming
across the yard, and he sang out, “Halloa, Powell, is that you?” and I
said yes.
He called out “Halloa, Powell, is that you?”
— Yes.
Had you any notice of his coming, or had he any notice of your com-
ing?
— No; I had no notice myself in the morning that he was coming.
And he had no notice of your coming?
— No.
And it was quite accidental meeting?
— Yes. [Text: T3tritic]
The significant scores for Debates are due to the large increase in period 3. The
main reason for the large increase is that the later Parliamentary debates are in
the first person with speaker assignment, whereas the earlier ones are third person
narrative accounts in the form of reported speech, as in extracts (2) and (3).
(2) (4.15.) COLONEL WARING (Down, N.): I had not the slightest inten-
tion to intervene in this Debate a few minutes ago, but when the hon.
Member for West Belfast, who has just sat down, gave such a direct
challenge to Irish Members to get up and support the Amendment of
the hon. Member for South Tyrone, I could not refuse to accept the
challenge. …I have the utmost respect for the opinion of the Marquess
of Salisbury; but does the hon. Gentleman believe that any remarks of
Lord Salisbury on the question would have the slightest effect either in
These two extracts are different in that the first text in (2) is in the first person
(I had not…I could not…), whereas the second text in (3) is entirely in the third
person (Mr. W. Pole then rose…He observed that…).
Both Trials and Debates become more involved throughout the century. It
is clear from the dimension scores that only Trials comes out as a speech-based
register: it is close to Drama on several dimensions (Dimensions 1, 3, and 5).
Since the Debates in periods 1 and 2 are mainly third person accounts of past
events in Parliament, the Debates register is in fact most closely associated with
the specialist/expository registers Science and History.
Among specialist/expository texts, the drift towards more informational
production noted by Biber & Finegan (1997) was not confirmed statistically;
all registers moved, although only slightly, towards more involved production,
as in Figure 1.
Figure 1. Mean dimension scores for Dimension 1 across three time periods (Period 1, Period 2, Period 3), by register: Debates, Drama, Fiction, History, Letters, Science, Trials.
Figure 2. Mean dimension scores for Dimension 2 across three time periods (Period 1, Period 2, Period 3), by register: Debates, Drama, Fiction, History, Letters, Science, Trials.
[Q.] You have heard the examination of the other witnesses, have you?
— [A.] Yes.
[Q.] He told you the same dreams, did he, as they have stated?
[A.] Yes.
[Q.] Did he not say he was afraid of his wife objecting to it?
[A.] O, yes; he said he was afraid of her objecting to it, and he took a
very particular way of convincing her; he stated that he took her
wedding ring off her finger, and when she awoke, she made sad
work about it, and he suffered her to work on three or four days,
and he then told her that if she would keep the secret he was
going to entrust her with, he would restore her ring.
[Q.] Then he told her, did he?
— [A.] He forced her to vow to keep it secret, and then told her his
intention, and then she cried.
[Text: T1trimar]
Example (4) contains numerous past tense verb forms and third person
pronouns (italicized), and no-negation (no objection). Witness depositions and
cross-examinations clearly contain a good deal of narrative. Fiction is the
prototypical register with narrative concerns, as in (5) with past tense verb
forms italicized.
(5) It was the beginning of October when she met Miss Wells, children, and
luggage at the station, and fairly was on her way to her home. She tried to
call it so, as a duty to Humfrey, but it gave her a pang every time, and in
effect she felt far less at home than when he and Sarah had stood in the
doorway to greet the arrivals. She had purposely fixed an hour when it
would be dark, so that she might receive no painful welcome; she wished
no one to greet her, she had rather they were mourning for their master.
[Text: T2ficyon]
Among the registers there are three (marginally four) that show statistically
significant changes in dimension scores across the three time periods: Debates,
History, and Trials. For History an increase in narrative scores between periods
1 and 2 may be noticed in Figure 2; for Debates there is a clear decrease
throughout all three time periods. The Trials register is more
complicated, with a substantial drop in period 2 which actually accounts for the
statistically significant result for this register. The dimension scores for periods
1 and 3 are roughly identical (4.20 for period 1 versus 4.43 for period 3 in
Table 4). The Trials register contains a similar radical shift in period 2 on
Dimension 3 (see Table 5 and Figure 3). Drama also changes towards less
narrative concerns; the probability of the F-value in Table 4 is above the 0.05
threshold, but the R-square value is fairly high at 0.26.
Figure 3. Mean dimension scores for Dimension 3 across three time periods (Period 1, Period 2, Period 3), by register: Debates, Drama, Fiction, History, Letters, Science, Trials.
In the CONCE data, Figure 3 shows that the registers on Dimension 3 form a
clear dichotomy between expository registers (History, Debates and Science
with overall positive dimension scores) and nonexpository registers (Drama,
Fiction, Letters and Trials with overall negative dimension scores). The
Newman-Keuls groupings in Table 2 suggest three subgroups of registers on
Dimension 3: (A) Debates and History, (B) History and Science, and (C) a large
group consisting of Drama, Fiction, Letters, and Trials. History is not
statistically different from Debates or Science. Debates have the highest dimen-
sion score in period 1 (4.96), indicating that this register contains numerous
postmodifying clauses (wh-relative clauses and pied piping constructions).
However, although this register is marked by a statistically significant drop in
referential elaboration across time, it nevertheless remains on the positive pole
of the dimension with dimension scores close to those of Science in period 3, as
in extract (7).
(7) For, without entering into the question whether the exact representation
of that House was altogether satisfactory, or discussing the inequalities
which Gentlemen continually urged who thought an equal representa-
tion would be a far preferable thing, he would say that the people of this
country were deeply and thoroughly attached to the present form of
government under which they lived; and he had always considered it a
condition in every reform, a condition which he thought had been happily
complied with hitherto, that the representation of that House, the mode
in which it was constituted, the mode in which the people elected their
representatives, should be compatible and consistent with a monarchy
and House of Lords, which, along with the House of Commons, were
fundamental and essential parts of our form of government.
[Text: T2debhan]
The dimension score analysis of the Parliamentary Debates shows that the texts in
period 3 are much more involved, less narrative and referentially more situated.
The only other important change on Dimension 3 concerns History, where
there is a significant increase in elaboration (from a mean dimension score of 2.66
in period 1 to 4.23 in period 3). This increase indicates that History texts have
more elaborated noun phrases toward the end of the century, as in example (8).
(8) And the course which he took to restore the authority of the British was,
even in the opinion of his detractors, the very best which he could have
taken. He telegraphed to Bombay for the soldiers which the peace with
Persia had freed from duty; he telegraphed to Madras and Ceylon for any
troops which could be spared from presidency or colony; he sent to
Singapore for the regiments which were on their way to China to punish
an aggression of the Chinese. [Text: T2hiswal]
None of the nonexpository registers vary significantly across the time periods.
As on Dimension 2, Trials jump in period 2 (with a fairly important R-square
value of 0.21 in Table 5). Biber & Finegan (1989 and 1992) and Biber
(1995: 294) found significant changes for Fiction towards more situation-
dependent reference. In CONCE as well, the mean dimension scores for Fiction
drop from -0.33 in period 1 to -2.87 in period 3, but this change is not
statistically significant (see Table 5).
Figure 4. Mean dimension scores for Dimension 5 across three time periods (Period 1, Period 2, Period 3), by register: Debates, Drama, Fiction, History, Letters, Science, Trials.
different. All three expository registers (Science, History, and Debates) are
statistically distinct from each other. Biber & Finegan (1997: 269) also found the
expository registers to be fairly heterogeneous with regard to the impersonal
pole on Dimension 5 (cf. Kytö et al. 2000: 91–92; Biber 1995: 133). The Science
texts have the highest scores (4.70 in Table 6), representing a register that
clearly has an abstract/technical focus, as in (9).
(9) The loan will certainly have ultimately to be repaid; but, at the time when
it is contracted, it acts with the same force as an export upon the country
which receives it, and with that of an import to the country which gives
it. In fact, the borrowing country exports its securities, which are import-
ed by the capitalists who lend. [Text: T2scigos]
The increases attested for Debates (from 1.64 in period 1 to 2.43 in period 3)
and History (from 2.00 in period 1 to 4.18 in period 3) are not statistically
significant (see Table 6 and the probabilities for Debates, 0.307, and History,
0.337). Three nonexpository registers, Drama, Fiction, and Letters, change
across the three time periods (see F-values and their associated probabilities in
Table 6). This finding partly corroborates the results in Biber & Finegan (1989
and 1992) of a “drift” in English registers. In CONCE, a change occurs towards
a more personal/nonabstract style among several nonexpository registers.
Biber & Finegan (1989: 512) found relatively little change across the
centuries among letters, and the change was not statistically significant; Fiction,
on the other hand, changed significantly (1989: 499).
Note
* I thank Merja Kytö for valuable comments on previous versions of this paper. The research
reported here has been generously supported by travel grants from the Language Division,
Uppsala University.
References
Biber, D. 1988. Variation across speech and writing. Cambridge: Cambridge University Press.
Biber, D. 1995. Dimensions of register variation: A cross-linguistic comparison. Cambridge:
Cambridge University Press.
Biber, D. and Finegan, E. 1989. Drift and the evolution of English style: A history of three
genres. Language 65: 487–517.
Biber, D. and Finegan, E. 1992. The linguistic evolution of five written and speech-based
English genres from the 17th to the 20th centuries. In History of Englishes: New methods
and interpretations in historical linguistics, M. Rissanen, O. Ihalainen, T. Nevalainen and
I. Taavitsainen (eds), 688–704. Berlin: Mouton de Gruyter.
Biber, D. and Finegan, E. 1997. Diachronic relations among speech-based and written
registers in English. In To explain the present: Studies in the changing English language in
</TARGET "gei">
Index
A D
academic discipline 4, 5–14, 20, 22, dialect 28–29, 147–165, 187–188
111–127 see also American English, British
academic division 4, 6 English, ICE components, Indian
see ‘academic discipline’ English, Irish English
academic writing 111–127, 132–144, 168 dimensions of variation 204, 206–207,
American English 28, 49–70, 74, 147–148, 208–221, 250–270
155–162, 192–197, 229–231 downtoning 39–40, 43
appositives 148–163
E
B elaboration 204, 206, 207, 209, 217–218,
back channel 52–55, 262–265
BNC 57–58, 75–90, 115
British English 28, 49–70, 75–90, 155–164, F
192–197 female/male speech 3–4, 8–9
formulaic language 111–127
C freshman composition see student writing
Cambridge North American Spoken
Corpus 55–56 G
CANCODE 28, 45, 55–56 gender 3–4, 8–9, 27, 32, 270
CONCE 249, 252 genre 114–115
conversation 19, 28ff, 49–70, 74–75, see also register
132–135, 137–139, 168, 204
see interaction H
conversational analysis 50–54 hedging 3–4, 8–21, 25–45, 251
corpora historical variation 74–75, 227–246,
see: BNC, Cambridge North American 249–270
Spoken Corpus, CANCODE, humanities 5–14
CONCE, ICE,
Limerick, LOB, London-Lund, I
Longman Corpus of Spoken and ICE 115, 152–153
Written English, MICASE, impersonal 99–101, 265–268
T2K-SWAL indeterminacy 92, 94–96
L Q
language teaching 27–28, 112–115, questions 33–35, 38, 41–42, 138
173–181
lectures 6–7, 203–223 R
letters 195, 227–246, 252, 255–270 radio phone-in 28–44
lexical bundle 132–144 register 29–36, 79–81, 134–135, 189–191
see also lexical phrase, pattern grammar responses in conversation 56–60
lexical phrase 111–115 Russian 91–108
see also lexical bundle
Limerick corpus 28, 30, 45 S
LOB 74–75, 115 science 5–14, 79–81, 88–89, 252, 255–270
London-Lund 74–75 social science 5–14, 79–81, 88–89,
Longman Corpus of Spoken and Written 111–127
English 74–76, 134–135 solidarity 17, 37–39
sort of 8–21
M stance 15, 19, 35, 41, 88, 220–222, 241
medical prose 111–127 stative verbs 187, 192–193
MICASE 5–8 student writing 132–144
modal verbs 30–44, 73–90, 93, 95–99,
101–104 T
modality 91–108 T2K-SWAL Corpus 205
technical prose 111–127
N see also science
narrative 30, 34, 193, 250, 258–262
negation 65–66, 79, 88, 91, 93, 99–101, V
104–108, 230–246, 261 verbs 99–101, 117–122, 168, 169–171,
172, 175, 178–181, 187, 192–194,
P 213–215, 233, 235–243, 251
passive voice 79, 81–88, 138, 207, see also modal verbs
215–217, 238, 242–243, 265 verb phrase 35–36, 81–86, 135, 138, 143
pattern grammar 167–181
perfect aspect 81–88, 117–118, 187, W
194–196 would 30–44, 77–87
In the series STUDIES IN CORPUS LINGUISTICS (SCL) the following titles have been
published thus far:
1. PEARSON, Jennifer: Terms in Context. 1998.
2. PARTINGTON, Alan: Patterns and Meanings. Using corpora for English language re-
search and teaching. 1998.
3. BOTLEY, Simon and Anthony Mark McENERY (eds.): Corpus-based and Computa-
tional Approaches to Discourse Anaphora. 2000.
4. HUNSTON, Susan and Gill FRANCIS: Pattern Grammar. A corpus-driven approach to
the lexical grammar of English. 2000.
5. GHADESSY, Mohsen, Alex HENRY and Robert L. ROSEBERRY (eds.): Small Corpus
Studies and ELT. Theory and practice. 2001.
6. TOGNINI-BONELLI, Elena: Corpus Linguistics at Work. 2001.
7. ALTENBERG, Bengt and Sylviane GRANGER (eds.): Lexis in Contrast. Corpus-based
approaches. 2002.
8. STENSTRÖM, Anna-Brita, Gisle ANDERSEN and Ingrid Kristine HASUND: Trends in
Teenage Talk. Corpus compilation, analysis and findings. 2002.
9. REPPEN, Randi, Susan M. FITZMAURICE and Douglas BIBER (eds.): Using Corpora
to Explore Linguistic Variation. 2002.
10. AIJMER, Karin: English Discourse Particles. Evidence from a corpus. 2002.
11. BARNBROOK, Geoff: Defining Language. A local grammar of definition sentences. 2002.