BERGER-testing Speaking Developing A Rating Sca
CONTENTS
LETTER FROM THE EDITORS ................................................................... 1
ARMIN BERGER
Testing speaking: developing a rating scale for advanced learners of
English ................................................................................................ 3
GUNTHER KALTENBÖCK
Is that a filler? On complementizer use in spoken object clauses ...... 28
RUTH OSIMK
Decoding sounds: an experimental approach to intelligibility in ELF . 64
Impressum ........................................................................................ 92
THE EDITORS
18(1) 3
1. Introduction
Information about a student's L2 speaking ability is not only useful but often
necessary. Without this information it is difficult to see how rational
educational decisions, such as planning a speaking lesson, placing students in
ability groups or measuring achievement, can be reached. Assessing speaking
is a challenge, however, because it involves a rater making judgements about
a person's speaking performance. Unlike reading or listening skills, which can
be assessed by discrete items scored dichotomously as correct or incorrect,
speaking skills are usually assessed in a communicative situation, in which an
extended sample of speech is elicited from the test taker and judged by one or
more raters. It is easy to see how factors other than the test taker's speaking
ability can influence the judgements. However, the problem of subjectivity in
the rating process can be minimized by establishing a clear rating procedure
and a framework for making judgements. Depending on the purpose, such a
framework may take the form of an analytic rating scale. This article outlines
the design process of such a rating scale intended for the assessment of
spoken English at tertiary level.
With the present article I address the issue of assessing advanced oral
communication skills at the Department of English and American Studies at
the University of Vienna, thereby satisfying the need to establish clear exam
specifications. At the same time I aim to narrow the gap between the well-
established assessment procedures for writing and reading skills at this
department1 and the assessment of speaking, which has generally received
less attention.
2. Institutional background
The Klagenfurt Language Testing Centre's website states that “[a]lthough
language competence is being assessed in Austria at secondary and tertiary
level in the educational system, professionalism in the current practice is
largely missing” (LTC 2009). Indeed, language testing still has a long way to
go in Austria. Until recently, testing in the language programmes for students
majoring in English at Austrian English departments was a largely
independent and isolated endeavour of each individual teacher. Although
there are standardized language programmes with common course curricula,
lecturers generally design their own instruments to test students' achievement
of specific course-related objectives. Test content, format and assessment
criteria vary among teachers, who seem to rely almost exclusively on their
own testing experience. At best, there are some common guidelines regarding
examination procedures, such as a double marking policy, but specifications
for these examinations rarely exist. The test constructs are vaguely defined, if
defined at all; the tasks do not always elicit the information required to make
sound inferences about a student's ability; and rating scales are often poorly
constructed. In short, current testing practice at Austrian schools and
university departments lacks validity in many respects.
With a general movement towards more transparency in educational
systems, the demand for international comparability of language proficiency
and the resulting advent of the Common European Framework of Reference
for Languages (CEFR), the testing scene has begun to change over the past
decade. The ideas and resources set out in the CEFR have sensitized teachers,
course designers, curriculum developers, and language testers to the lack of
professionalism in the field of language testing and the critical need for
action. Remedial measures in Austria include, for example, the foundation of
the Language Testing Centre at Klagenfurt University and its activities,
notably the Austrian University English Language Teaching and Testing
(ELTT) initiative, which promotes concerted action to professionalize
language assessment and certification practices at Austrian university English
departments.2 With such measures, language testing ceases to be an isolated,
solitary activity of individual teachers and becomes a group endeavour in
which language teachers cooperate.
Structural changes have also led to a growing demand for professionalism
in language testing. Many curricula at Austrian universities have already been
converted into separate bachelor and master programmes. Such major
restructuring of the system raises the question of what a BA graduate in
English language and literature should know and be able to do in terms of
language competence. While there has been some work to answer this
question with respect to listening, reading and writing skills,3 the nature of
speaking ability at tertiary level and the question of measuring it have yet to
be addressed in theory and practice. The present study can be seen in this
light. It is part of an ongoing process to consider issues surrounding the nature
of speaking and testing a foreign language at tertiary level, attempting to fill
some serious gaps in language testing practice with respect to validity and
thus contributing to professionalism at Austrian language departments.
2 Cf. http://www.uni-klu.ac.at/ltc/inhalt/430.htm.
3 For example, the Austrian University ELTT Group, a working group consisting of applied linguists and
university language teachers from the Universities of Graz, Klagenfurt, Salzburg and Vienna, has
established an analytic rating scale for Writing as well as a set of benchmarked performances.
3. Assessing spoken language
The ability to speak proficiently in a foreign language and to perform
different tasks for various purposes in a number of communicative situations
is highly valued; yet speaking is the skill that has long been neglected in
language testing research and practice. Consequently, the theory and practice
of testing speaking in a second language is the youngest sub-field of language
testing (Fulcher 2003: 1). There are several reasons why Lado's (1961:
239) observation that “testing the ability to speak a foreign language is
perhaps the least developed and the least practised in the language testing
field” still holds true today. Firstly, speaking is a skill that is difficult to
assess reliably. In the test situation spoken discourse elicited by some test task
is heard by a human judge who then refers to a rating scale in order to select a
score that represents the candidate's ability. It is easy to see that such
performance testing brings with it “potential variability in tasks and rater
judgements, as sources of measurement error” (Bachman et al. 1995: 239).
Research into rater performance investigating such variability was carried out,
for example, by McNamara (1996), McNamara and Lumley (1997),
O'Sullivan (2000) and Wigglesworth (1993). Secondly, construct-irrelevant
facets might have an impact on the candidate's speaking performance and
scores to a greater extent than in test situations assessing other skills. The
nature of the interaction, the test methods, the topics, the interlocutor effect,
and test taker characteristics account for some of the variability in speaking
test scores (Berry 2007; Brindley 1991; Brown 2003, 2005; Kunnan 1995;
O'Sullivan 2006; Shohamy 1988, 1994). Thirdly, many difficulties of
assessing speaking boil down to the question 'What is speaking?'. Lado
(1961: 239) argued that speaking was neglected because of “a clear lack of
understanding of what constitutes speaking ability or oral production”. What
exactly does it mean to be able to speak, what exactly is being measured in
speaking tests, or how can the construct of speaking be defined? Indeed, much
of the research in language testing is concerned with the ongoing challenge of
construct definition. In addition to theoretical issues, complex logistics and
practical constraints make tests of spoken language more difficult to
administer and research than tests of other skills. In summary, one might
argue with Fulcher (1997: 75) that speaking tests are particularly problematic
in terms of reliability, validity, practicality and generalisability.
All these difficulties have led Hughes (2002: 75) to raise the “question of
the extent to which the characteristics of natural spoken discourse can ever
lend themselves to existing assessment paradigms”. There seem to be so many
complexities and competing factors influencing the test scores that the
assessment of speaking has tended to focus on the more quantifiable aspects
of speaking, such as pronunciation or the number of grammatical errors. The
question then arises whether the test construct adequately reflects the nature
of oral proficiency – whether the test is really a test of speaking, or rather a
test of more general language proficiency measured by structural complexity
and accuracy. Put differently, is 'speaking' still the real focus of the test?
Issues surrounding the nature of oral proficiency, questions about how best
to elicit it, and attempts to find effective ways to optimise the evaluation of
oral performances have motivated much research in this area. While many
aspects of testing speaking remain obscure, “it is important to recognise the
great improvements in the area that have been made over the last few
decades” (O'Sullivan 2008: 1). Despite the problems surrounding the testing
of speaking, there seems to be agreement that there are ways of overcoming
or at least addressing some of these problems by careful development of the
testing procedures, including the careful construction of the tasks to elicit and
the tools to evaluate speech, as well as continuous training of raters to ensure
the quality of their ratings. The following sections describe in more detail the
development process of a rating scale used to assess the speaking ability of
students after a two-semester speaking course at the Department of English at
the University of Vienna.
5 According to Alderson (1991) and Pollitt and Murray (1996), other purposes of rating scales include user-
orientation, with a reporting function, constructor-orientation guiding the construction of tests at
appropriate levels, and diagnosis-orientation for feedback purposes. Alderson argues that these different
purposes should not be confused, because one rating scale is rarely appropriate for several functions.
Therefore, it is important to determine the primary aim of the rating scale and develop it according to its
specific context, rather than adopt available rating scales designed for some other purpose in some other
context.
The PPOCS 2 speaking scale can be seen as resting on three major
assumptions. The first one classifies the scale as an ability-based scale and
refers to the notion that speaking ability is not a single unitary ability, but
consists of multiple components. Anyone who wants to speak a second
language must be able to use some of the grammar and vocabulary of the
language, and master its specific sound system. Learners must conceptualize,
formulate, articulate, monitor and, if necessary, repair their speech. They need
to be able to speak with some degree of accuracy and fluency if they want
their utterances to be considered acceptable. Furthermore, in interactive
activities the language user acts as both listener and speaker in order to
negotiate meaning and construct discourse jointly. During interaction,
reception and production strategies are constantly employed. Yet another
class of abilities
concerns the use of various cognitive and collaborative strategies to manage
co-operation and interaction. All these components make speaking a complex
multi-ability matter. A major issue of the scale development process was to
address the question of which of these abilities were to be included in the
construct definition that forms the basis of the scale. The second assumption,
which follows from the first, refers to the analytic nature of the scale. The
different components of speaking ability require separate analytic ratings for
each of the specific components in the construct definition as opposed to one
overall score of a holistic scale. In other words, an analytic rating scale
contains a number of criteria, each of which has descriptors at the different
levels of the scale. The third assumption is that the scale is criterion-
referenced: the scale is defined operationally in terms of criterion levels of
ability. Whereas norm-referenced assessment ranks test takers in relation to
their peers, criterion-referencing assesses learners purely in terms of their
ability, irrespective of other test takers. Such criterion-referenced scales allow
the tester to make inferences about a learner's ability, and not just about the
quality of an individual's performance relative to other individuals. Summing
up, these assumptions qualify the PPOCS 2 speaking scale as a criterion-
referenced, ability-based analytic scale.
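The distinction between norm-referenced and criterion-referenced interpretation can be made concrete with a small sketch. The function names, scores and cut-offs below are invented for illustration and are not part of the PPOCS 2 procedure:

```python
def percentile_rank(score, peer_scores):
    """Norm-referenced view: where a candidate stands relative to peers,
    expressed as the percentage of peers scoring below them."""
    below = sum(1 for s in peer_scores if s < score)
    return 100.0 * below / len(peer_scores)

def criterion_band(score, cut_scores):
    """Criterion-referenced view: the band is fixed by ascending cut
    scores alone, regardless of how other candidates performed."""
    return sum(1 for c in cut_scores if score >= c)
```

On this sketch, a candidate's criterion band stays the same whether the cohort is strong or weak, whereas the percentile rank shifts with the peer group; this is precisely why criterion-referenced scales license inferences about ability rather than about relative standing.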
6. Validity
Such scales offering descriptions of a learner's proficiency at successive
levels of ability have become very popular in language testing. Raters quickly
embrace these scales and learn to use them quite successfully for their
specific purposes. However, just because a scale is used with some efficiency
does not automatically mean that the inferences drawn from it are valid –
“there is no guarantee that the description of proficiency offered in a scale is
accurate, valid or balanced” (North and Schneider 1998: 219). Since the
effective use of a scale does not necessarily entail validity, the need for
validity studies is well established and recognized (Butler and Stevens 1998;
Matthews 1990; McKay 2000; McNamara 1996; Shohamy 1995). By
providing validity evidence, testers can make sure that the inferences drawn
from the scale offer an accurate picture of the underlying abilities or
constructs they want to measure.
However, while the need for validation is undisputed, interpretations of the
concept of validity are diverse. Earlier notions of validity were concerned
with the question of “whether a test really measures what it purports to
measure” (Kelley 1927: 14; cf. also Cronbach 1971; Henning 1987; Lado
1961). From this perspective, validity is regarded as a
characteristic of the actual test. While some writers find such a general
approach still useful (Davies 1990, Hatch and Lazaraton 1997), Messick
(1989, 1996) argues that the traditional conception of validity is incomplete
especially because it does not take into account evidence of the implications
of score meaning or the social consequences of score use. Validity is not a
quality of tests or test scores, but a quality of interpretations and uses of
assessment results. Instead of speaking of the validity of a particular test or of
the scores of a particular test it is more accurate to speak of the validity of the
uses of a test score, or of test scores as valid indications of a specific ability.
Messick sees validity as multifaceted and calls for different types of evidence
to substantiate any inferences drawn from the scores on a test:
Validity is broadly defined as nothing less than an evaluative summary of both the
evidence for and the actual – as well as the potential – consequences of score
interpretation and use (i.e., construct validity conceived comprehensively). This
comprehensive view of validity integrates considerations of content, criteria and
consequences into a comprehensive framework for empirically testing rational
hypotheses about score meaning and utility. (Messick 1995: 742)
For the purposes of this study, a restricted definition of validity is used.
According to this definition, there is evidence for the construct validity of the
variable in question, when Rasch analysis as detailed below shows little
misfit. Such an understanding of validity, which will be explicated in section
eight, is common in the context of Rasch analysis (Tyndall and Kenyon
1996).
Extensive validation studies for the PPOCS 2 scale have yet to be carried
out. The following sections summarize a few preliminary measures taken to
look for minimum validity evidence. At a minimum level, all test
development activities need to be documented. This documentation, however,
would not withstand scrutiny without further analysis of the scores obtained
from the rating scale. The remaining sections of this report provide this
documentation and data analysis for the PPOCS 2 scale in search of validity
evidence.
6 I would like to thank Thomas Martinek and all other members of the PPOCS lecturer team, Harriet
Anderson, Meta Gartner-Schwarz, Katharina Jurovsky, Gunther Kaltenböck, Sophie-Francis Kidd, Amy
Krois-Lindner, Christina Laurer, Karin Richter, and Andreas Weißenbäck, for their valuable contributions
to the scale development process.
interaction with spontaneous, short turns. In a series of meetings the PPOCS
lecturer team compiled and discussed various components characteristic of
speaking proficiency at the intended level, which were then reorganized under
the following three labels: lexico-grammatical resources, fluency, and
delivery. Whereas these three criteria are used to assess both the presentations
and the interactions, two more criteria, one labelled relevance, development
and organisation of ideas and the other one interaction, refer to presentations
and interactions, respectively. Eventually, the first intuitive phase resulted in
two sets of analytic rating criteria, each of which consists of four categories,
which in turn comprise five level bands. While the first draft version
contained four bands, it was later agreed to add one more level to the scale,
mainly because teachers in Austria are used to a five-point grading system
and computing an overall grade between one and five would be more
straightforward. Moreover, the PPOCS team agreed that the targeted level
would range between C1 and C2 of the CEFR. Thus, the rating scale was
designed to stretch the CEFR levels C1 and C2 into four pass levels (C2, C2-,
C1+, C1), whereas band five would indicate a level below C1 and failure in
the PPOCS 2 exam. The two extreme points of the pass range, i.e. bands C2
and C1, were put into words; the two middle bands of the scale and band five
indicating non-pass were left unworded (North 2003: 83).
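Schematically, each of the two scales can be thought of as four criteria rated on the same five bands. The aggregation rule sketched below, averaging band ratings with any 'below C1' rating forcing a fail, is a hypothetical illustration, not the official PPOCS 2 grade computation:

```python
# Five bands: 1 = C2, 2 = C2-, 3 = C1+, 4 = C1, 5 = below C1 (fail).
BANDS = {1: "C2", 2: "C2-", 3: "C1+", 4: "C1", 5: "below C1"}

# Criteria of the presentation scale; the interaction scale swaps the
# last criterion for "interaction".
CRITERIA = [
    "lexico-grammatical resources",
    "fluency",
    "delivery",
    "relevance, development and organisation of ideas",
]

def overall_grade(ratings):
    """Hypothetical aggregation rule: any 'below C1' rating fails the
    exam; otherwise the overall grade is the rounded mean of the
    per-criterion band ratings."""
    if any(r == 5 for r in ratings.values()):
        return 5
    return round(sum(ratings.values()) / len(ratings))
```

The five-band structure makes such an overall one-to-five grade straightforward to compute, which is the practical motivation the text gives for adding the fifth level.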
A smaller team of PPOCS lecturers then developed a first draft of the
wording of the descriptors. The starting point was to consider the main
contents of the course syllabus and descriptors of existing scales, most
notably the CEFR speaking scales. The most relevant CEFR descriptors used
to formulate the PPOCS scale were taken from the bank of 'illustrative
descriptors'7. In addition, other holistic as well as analytic rating scales
informed the wording process.8 The information obtained from the CEFR,
other scales, the course syllabus, and the group discussions on rating criteria
was distilled into draft descriptors, which were subsequently submitted to the
lecturer team for a process referred to as vetting. Some preliminary decisions
needed to be taken as to which descriptors should go forward for detailed
7 These include the 'Common Reference Level: Qualitative Aspects of Spoken Language Use', 'Overall Oral
Production', 'Sustained Monologue', 'Addressing Audiences', 'Overall Spoken Interaction', 'Formal
Discussions and Meetings', 'Vocabulary Range', 'Vocabulary Control', 'Grammatical Accuracy', and
'Phonological Control'.
8 The Revised ACTFL Guidelines 1999, the Test of Spoken English scale and the analytic rating scale of the
Examinations Reform Teacher Support Project of the British Council Hungary were used for
consultation.
editing and trialling, and which ones should be revised or redrafted
immediately.
The following phase of trialling involved trying out the draft scale on a
representative sample of the test-taking group to gather information about the
usefulness of the scale. The presentation scale and the interaction scale were
trialled separately on two different occasions. A number of students were
asked to volunteer in mock exams, which were videotaped and rated
independently. Since the number of ratings obtained in these trial runs was
too small to carry out any quantitative analyses, it was the feedback from
raters that provided the most relevant information about the usefulness of the
scales at this stage. This feedback was used in a further attempt at honing the
descriptors, and final adjustments and corrections were made in the light of it.
The result of this phase was a final set of descriptors, which, by way of
illustration, stipulate that candidates at the C2 level “show great flexibility
formulating ideas in differing linguistic forms to convey finer shades of
meaning precisely” (lexico-grammatical resources), “elaborate all salient
points in the prompt in adequate detail with examples and ideas of relevance”
(relevance, development and organisation of ideas), and “contribute ideas of
relevance to the joint discourse and display great flexibility in responding to
others, e.g. by framing the issue, establishing a line of approach, proposing
and evaluating, recapping, summarising, etc.” (interaction).
9 The interested reader is referred to Baker (1997), who provides an accessible introduction to IRT and
points out the advantages over Classical Test Theory (CTT).
10 See McNamara (1996) for more on multi-faceted Rasch measurement.
11 All performances were video-recorded for rater training and research purposes.
analytic rating criteria? That is, do raters use all level bands for the test
population? b) How well do the facets examined in this analysis fit a
multi-faceted Rasch model of speaking performance? Put differently, can
the band level descriptors be distinguished adequately and performances
thus be rated systematically? This question is particularly interesting since
the middle bands, C2- and C1+, were left unworded.
The most general Rasch analysis output, conventionally referred to as the
'facet map', 'all-facet vertical summary' or 'vertical ruler' (Linacre 2008), is
given in Figure 1 below. It compares estimates of rater severity, person
ability, difficulty of discourse type, and rating criteria difficulty on one scale.
That is, the facet map shows rater harshness in terms of the probability of the
rater awarding a given score to a test taker of a given ability. Similarly, the
map displays the ability of candidates in terms of the probability of their
being awarded a given score, considering what is known about the severity of
the rater and the difficulty of the discourse type and rating criteria. The more
able candidates are placed at the top end of the 'ruler' whereas the less able
are positioned at the bottom, i.e. candidates 22 and 32 are the most able and
candidates 6 and 8 are the least able in this analysis. Moreover, the map
displays rating category difficulty in terms of the probability of a candidate of
a given ability receiving a given score from a rater of a given severity. The
most difficult category appears towards the top, i.e. relevance, development
and organisation of ideas in this case. The five columns on the right display
difficulty estimates of all scale steps in each rating criterion. For example,
candidate 15 is likely to be assigned C2- for lexico-grammatical resources,
fluency and delivery, and C2 for relevance, development and organisation of
ideas. The measure for all probability estimates is the 'logit' (log odds unit).12
12 Figure 1 is a visual representation of the relative harshness of raters, the relative abilities of candidates,
and the relative difficulty of discourse type as well as rating criteria. There are ten columns in this figure:
one for the scale of measurement used, and one for each of the facets raters, candidates, discourse type,
and rating criteria. Within the rating criteria facet, difficulty estimates of all scale levels are given in
more detail on the right side of the facet map: alphanumeric strings representing level steps are positioned
at integer expected scores; dashed lines (---) are positioned at expected half-score points. Raters are
identified by alphabetic strings, candidates by their ID numbers. The all-facet vertical summary acts as a
'ruler' that enables us to locate and compare the facet estimates.
+-----------------------------------------------------------------------------------------+
|Measr|+Raters |-Candidates|+Discourse type|+Rating criteria|lexgr|fluen|deliv|relor|inter|
|-----+--------+-----------+---------------+----------------+-----+-----+-----+-----+-----|
| 7 + + 22 32 + + + + + + + |
| | | | | | C2| C2| C2| C2| C2|
| | | | | | | | | | |
| | | 7 23 | | | | | | | |
| 6 + + + + + + + + + |
| | | | | | | | | | |
| | | | | | | | | | |
| | | 10 | | | | | | | |
| 5 + + 26 + + + + --- + + + |
| | | 18 | | | | | --- | | |
| | | 30 36 | | | --- | | | | |
| | | 15 24 34 | | | | | | | --- |
| 4 + + + + + + + + + |
| | | 17 | | | | | | --- | |
| | | | | | | | | | |
| | | 28 | | | | | | | |
| 3 + + + + + + + + + |
| | | 12 33 | | | C2-| C2-| | | |
| | | | | | | | C2-| | C2-|
| | | 27 29 | | | | | | C2-| |
| 2 + + 13 14 + + + + + + + |
| | | 21 25 35 | | | | | | | |
| | | 2 | | | --- | | | | --- |
| | | 3 31 | | | | --- | | --- | |
| 1 + + + + + + + --- + + |
| | | | | | | | | | |
| | | 5 | | relorg | | | | | |
| | | 4 19 | | deliv lexgr | | | | | C1+|
* 0 * AA BB * * inter pres * * * * * * *
| | | 16 | | | C1+| | | | |
| | | 9 | | fluen inter | | C1+| C1+| C1+| |
| | | | | | | | | | --- |
| -1 + + + + + + + + + |
| | | | | | | | | | |
| | | 1 | | | --- | | --- | --- | |
| | | 11 20 | | | | --- | | | |
| -2 + + + + + + + + + |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | C1| |
| -3 + + + + + + + C1+ + C1|
| | | | | | C1| C1| | | |
| | | 6 8 | | | | | | --- | |
| | | | | | | | | | |
| -4 + + + + +below+below+below+below+below|
|-----+--------+-----------+---------------+----------------+-----+-----+-----+-----+-----|
|Measr|+Raters |-Candidates|+Discourse type|+Rating criteria|lexgr|fluen|deliv|relor|inter|
+-----------------------------------------------------------------------------------------+
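For readers less familiar with the model behind such a map, the probability structure of a many-facet rating scale model can be sketched as follows. This is a minimal illustration of the log-odds relation (ability minus rater severity minus criterion difficulty minus step threshold), with invented threshold values; it is not the software used for the actual analysis:

```python
import math

def category_probs(ability, severity, difficulty, thresholds):
    """Return the probability of each score category. The log-odds of
    category k over category k-1 are modelled as
    ability - rater severity - criterion difficulty - step threshold k,
    with all quantities expressed in logits."""
    cumulative = [0.0]  # log-numerator for the lowest category
    total = 0.0
    for tau in thresholds:
        total += ability - severity - difficulty - tau
        cumulative.append(total)
    exps = [math.exp(c) for c in cumulative]
    denom = sum(exps)
    return [e / denom for e in exps]
```

Under this model a more able candidate, a more lenient rater, or an easier criterion all shift probability mass towards the higher score categories, which is what aligning the four facets on one logit 'ruler' expresses.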
13 Since the estimates of rater severity, candidate ability and item difficulty are extrapolations from the data
available, they are subject to error. Accordingly, estimates of the likely error are provided for each
measure. Ideally, the size of error should be small.
14 There are no hard-and-fast rules for the interpretation of fit statistics. According to McNamara
(1996:173), values in the range of 0.75 to 1.3 are generally acceptable. Linacre (2008:191) suggests that a
range from 0.5-1.5 is productive for measurement.
is advisable to go back to the video-taped responses of the candidates and find
an explanation for the ones causing the disturbance. McNamara (1996) points
out that explanations should be considered in terms of failure of mastery of a
particular area (diagnostic feedback), failure of attention in the test-taking
situation, anxiety and the like. In general, misfitting responses suggest that the
individual's abilities are not being measured appropriately by this particular
test instrument.15
15 A re-examination of the video-taped performances is recommended, but goes beyond the scope of this
analysis.
16 The separation value is 5.44, which means that the candidates can be separated into five levels. The
person reliability is .97, which is the Rasch equivalent to the KR-20 Alpha statistic.
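The separation and reliability values reported in footnote 16 are connected by a standard Rasch relationship, R = G²/(1 + G²), which can be checked directly (a sketch under the usual definitions):

```python
import math

def reliability_from_separation(G):
    """Person reliability implied by a person separation index G."""
    return G * G / (1 + G * G)

def separation_from_reliability(R):
    """Inverse relationship: G = sqrt(R / (1 - R))."""
    return math.sqrt(R / (1 - R))
```

A separation of 5.44 implies a person reliability of about .967, consistent with the reported value of .97.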
Rating criteria                                 Measure (difficulty)   S.E.   Infit MnSq
Relevance, development, organisation of ideas           .51            .20      1.49
Delivery                                                .37            .15       .83
Lexico-grammatical resources                            .13            .15       .88
Fluency                                                -.44            .16       .82
Interaction                                            -.57            .24      1.42
Mean (Count: 5)                                         .00            .18      1.09
S.D.                                                    .43            .03       .30
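The infit mean-square column is an information-weighted fit statistic. As a rough sketch (an illustrative function, not the output of the analysis software), it divides the sum of squared score residuals by the sum of the model variances:

```python
def infit_mean_square(observed, expected, variances):
    """Information-weighted mean square fit statistic: the sum of squared
    residuals (observed minus model-expected scores) divided by the sum
    of model variances. Values near 1 indicate data that fit the model;
    values well above 1 flag noisy, unpredictable response patterns."""
    residual_sq = sum((x - e) ** 2 for x, e in zip(observed, expected))
    return residual_sq / sum(variances)
```

Read against the ranges cited in footnote 14, the values of 1.49 and 1.42 exceed McNamara's stricter 0.75 to 1.3 guideline while remaining within Linacre's more permissive 0.5 to 1.5 range.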
9. Discussion
Although the scope of this study was somewhat limited in terms of sample
size, and more observations, particularly ratings from different raters, would
be needed to produce estimates of greater stability, the results of the multi-
facet Rasch analysis have shed some light on how the new PPOCS 2 rating
scale functions. The research questions posed earlier in this report addressed
(a) the effectiveness of the analytic rating scale and (b) the degree of model fit
of the facets. From the results it seems that most level steps in the five
categories function well. Generally, all level steps are associated with a
progression of candidate ability, and the raters used all level bands for the test
population. Band five (below C1) was used as well, albeit infrequently, which
was expected since most candidates taking PPOCS 2 can be assumed to have
achieved a proficiency level beyond C1 according to the CEFR.
Category statistics show good step calibrations for most rating criteria.
The scale for relevance, development and organisation of ideas, however,
might need further investigation, because the steps of this scale are not
consistent with those of other scales. A horizontal comparison of the scales
presented in the all-facet summary in Figure 1 shows that the distance
between the threshold estimates of band 2 and the neighbouring band 3 is
rather small. The step calibrations increase by only 1.18 logits, which is
considerably smaller than the distances between other threshold estimates and
not in accordance with the recommended guideline of a minimum increase of
1.4 logits. In other words, the descriptor of band C1 for relevance,
development and organisation of ideas is more demanding than the
descriptors of other categories. It might be worth considering why this is the
case. For
example, the two raters might have interpreted this category as more
demanding than necessary. One possible remedy would be to find explicit
descriptors for the in-between category steps C1+ and C2- for greater clarity.
However, these results should be interpreted cautiously since there is a
standard error of the step calibration of .4 and more observations would be
needed for more stable estimates.
While the difficulty level of the relevance, development and organisation
of ideas criterion may warrant further investigation, the fit statistics of each
facet suggest no problematic misfit for raters and the five rating criteria. Too
many candidates, however, show person misfit. This may indicate that their
abilities are not being measured appropriately. Closer inspection of these
candidates‟ responses might be needed to find explanations for the
disturbance.
In summary, it can be said that each analytic criterion generally seems to
function well, defining distinct points on the variable and making meaningful
steps in progression. Although the central level bands C2- and C1+ were left
unworded, it seems that raters were able to distinguish meaningfully between
them. This suggests that the CEFR levels C1 and C2 for speaking can be
further divided into more subtle yet distinguishable levels. Research will have
to show whether greater explicitness and exact wording of the in-between
levels would change the results significantly.
10. Conclusion
The present study aimed to describe the four-stage development process of
the new PPOCS 2 analytic rating scales and investigate the data obtained from
the first live administration in February 2009, thereby gathering validity
evidence for the scales. Firstly, rating criteria were intuitively selected and
drafted into two analytic five-point rating scales, one for formal presentations
and one for interaction. Secondly, the draft descriptors were refined according
to the feedback obtained from a number of informants. After trial runs with a
representative student sample in the third stage, the scales were used live and
the data obtained from this administration were analysed quantitatively. If
the validity of a scoring procedure is defined fairly narrowly as good
model fit, the multi-faceted Rasch analysis offers some validity
evidence. The analysis demonstrated appropriate reliability of the
rating scale, which can be seen as a suitable instrument to test speaking in this
context.
What follows from Messick’s approach, however, is that validity is “an
evolving property and the validation is a continuing process” (Messick 1989:
13) so that the development of this rating instrument should not end at this
point. If scale validation is considered an ongoing endeavour, it becomes clear
that this study can only be a small part of the scale validation process. It must
be acknowledged that many aspects of validity, including theory-based or
consequential features, are under-represented in this study and that this falls
short of Messick’s unified concept of validity. Consequently, the results
should be interpreted cautiously in the light of their limitations. Even if the
findings indicate that the inferences from the scales about speaking
proficiency are valid enough and that the scales can be used as a framework
for language assessment, further investigation is needed.
In fact, one of the major caveats concerns the limited scope of this
study. The validity evidence presented here is based on
only one major source of information – PPOCS 2 teachers and their
interpretations of the scales. That is, other potentially relevant informants
such as students were ignored. It goes without saying that the number of
observations included in this analysis was limited and so the study should be
replicated, involving more candidates and, in particular, more ratings from
different raters. Ideally, the video-recorded performances would be
rated by all the other PPOCS lecturers and these ratings included in the
analysis. A larger number of raters would clearly produce more stable
estimates. Moreover, it would be intriguing to see how raters not directly
involved in the design process interpret the rating scale categories. The fact
that the two raters here were native speakers of English also raises the
question of whether ratings by non-native speakers of English would yield
different results.
Not only the limited number of observations included in this study but
also methodological constraints warrant further research. Although the scale
development process included intuitive, qualitative and quantitative elements,
more systematic triangulation of methods would be desirable to produce
stronger results. Weir and Roberts (1994) advocate such triangulation of
methods where possible. For instance, qualitative methods such as concurrent
verbal protocols that focus on the raters’ perception of the descriptors while
rating performances or retrospective feedback questionnaires may provide
further insights into the validity of the assessment process. Even more
importantly, however, the scale needs an empirical underpinning to show that
the descriptors of each rating category really match the candidates’
performance on a given level. This approach requires a discourse-based
analysis of performance and the description of key features of that
performance. In this way any significant mismatch between the level band
descriptors and representative samples of performances can be discovered.
The advantage of this approach is that the present rating scale can be revised
and refined by very concrete descriptions based on data.
Much as further research is required and desired, this study is a major step
towards more professional testing at Austrian language departments. Rarely
before has the development of a testing instrument at the Department of
English and American Studies at the University of Vienna received this much
attention, let alone psychometric analysis. It is hoped that this study has
produced some valuable results that serve as a good starting point for further
scale modifications. As explained above, the development of a testing
instrument is an ongoing process that is cyclical and iterative in nature and
requires continuous re-assessment. The findings presented here help to identify
what improvements need to be made to the rating scale or the administrative
processes surrounding it.
In terms of language teaching, it is hoped that this study leads to a better
understanding of what is involved in speaking, teaching and testing a foreign
language at tertiary level, which in turn might result in a reconsideration of
current instructional practices. Indeed, a better understanding of the nature of
oral language ability and a clear idea of the construct to be measured can help
teachers guide and redirect their teaching, bringing essential communication
skills into sharper focus and enabling them to give more specific feedback on
learning progress. While students are awarded potentially fairer ratings for
their performances, teachers can have more confidence in their testing
practice and arrive at more informed judgements about their students’
abilities. Overall, teachers and students alike might find language testing,
which is an inherent part of learning and teaching, more rewarding, since
greater accountability for decisions about individuals based on test results
leads to more accuracy and fairness. Ultimately, it is hoped that this study can
bring classroom testing practice into closer contact with current issues in
language testing and thus contribute to professionalism in this field.
References
Alderson, Charles. 1991. “Bands and scores”. In Alderson, Charles; North, Brian (eds.).
Language testing in the 1990s: the communicative legacy. London: Macmillan, 71-94.
Bachman, Lyle. 1990. Fundamental considerations in language testing. Oxford: OUP.
Bachman, Lyle; Lynch, Brian K.; Mason, Maureen. 1995. “Investigating variability in
tasks and rater judgements in a performance test of foreign language speaking”. Language
Testing 12, 238-252.
Bachman, Lyle; Palmer, Adrian. 1996. Language testing in practice: designing and
developing useful language tests. Oxford: OUP.
Baker, Rosemary. 1997. Classical test theory and item response theory in test analysis.
LTU Special Report No 2. Lancaster: Centre for Research in Language Education,
Lancaster University.
Berry, Vivien. 2007. Personality differences and oral test performance. Frankfurt: Peter
Lang.
Bond, Trevor G.; Fox, Christine M. 2007. Applying the Rasch model: fundamental
measurement in the human sciences. London: Lawrence Erlbaum Associates.
Brindley, Geoff. 1991. “Defining language ability: the criteria for criteria”. In Anivan,
Sarinee (ed.). Current developments in language testing. Singapore: Regional Language
Centre, 139-164.
Brown, Annie. 2003. “Interviewer variation and the co-construction of speaking
proficiency”. Language Testing 20, 1-25.
Brown, Annie. 2005. Interviewer variability in oral proficiency interviews. Frankfurt: Peter
Lang.
Butler, Frances; Stevens, Robin. 1998. Initial steps in the validation of the second
language proficiency descriptors for public high schools, colleges, and universities in
California: writing. Los Angeles, CA: National Center for Research on Evaluation,
Standards, and Student Testing.
CEFR = Council of Europe. 2001. Common European framework of reference for
languages: learning, teaching, assessment. Cambridge: CUP. Available online at:
http://www.coe.int/t/dg4/linguistic/Source/Framework_EN.pdf.
Cronbach, Lee J. 1971. “Test validation”. In Thorndike, Robert L. (ed.). Educational
measurement. Washington, D.C.: American Council on Education, 443-507.
Davies, Alan. 1990. Principles of language testing. Oxford: Blackwell.
Davies, Alan; Brown, Annie; Elder, Cathie; Hill, Kathryn; Lumley, Tom; McNamara, Tim.
1999. Dictionary of language testing. Cambridge: CUP.
Fulcher, Glenn. 1993. “The construction and validation of rating scales for oral tests in
English as a foreign language”. PhD thesis, Department of Linguistics and English
Language, Lancaster University.
Fulcher, Glenn. 1997. “The testing of L2 speaking”. In Clapham, Caroline; Corson, David
(eds.). Language testing and assessment (Vol. 7). Dordrecht: Kluwer Academic Publishers,
75-85.
Fulcher, Glenn. 2003. Testing second language speaking. London: Pearson Longman.
Hatch, Evelyn; Lazaraton, Anne. 1997. The research manual: design and statistics for
applied linguistics. Boston: Heinle and Heinle.
Henning, Grant. 1987. A guide to language testing: development, evaluation, research.
New York: Newbury House.
Hughes, Rebecca. 2002. Teaching and researching speaking. London: Longman.
Kelley, Truman L. 1927. Interpretation of educational measurement. New York: Macmillan.
Kunnan, Anthony J. 1995. Test taker characteristics and test performance: a structural
modelling approach. Cambridge: CUP.
Lado, Robert. 1961. Language testing. London: Longman.
Linacre, Mike. 2008. A user’s guide to FACETS: Rasch model computer programs.
Chicago, IL: MESA Press.
LTC = Language Testing Centre/Sprachtestcentrum, Universität Klagenfurt. 2009. “About
the LTC”. Available online at http://www.uni-klu.ac.at/ltc/inhalt/145.htm. (10 June 2009).
Matthews, Margaret. 1990. “The measurement of productive skills: doubts concerning the
assessment criteria of certain public examinations”. English Language Teaching Journal
44, 117-121.
McKay, Penny. 2000. “On ESL standards for school-age learners”. Language Testing 17,
185-214.
McNamara, Tim. 1996. Measuring second language performance. London: Longman.
McNamara, Tim; Lumley, Tom. 1997. “The effect of interlocutor and assessment mode
variables in overseas assessments of speaking skills in occupational settings”. Language
Testing 14, 140-156.
Messick, Samuel. 1989. “Validity”. In Linn, Robert (ed.). Educational measurement (3rd
ed.). New York: Macmillan, 13-103.
Messick, Samuel. 1995. “Validity of psychological assessment: validation of inferences
from persons’ responses and performances as scientific inquiry into score meaning”.
American Psychologist 50, 741-749.
Messick, Samuel. 1996. “Validity and washback in language testing”. Language Testing
13, 241-256.
Milanovic, Michael; Saville, Nick; Pollitt, Alastair; Cook, Annette. 1996. “Developing
ratings scales for CASE: theoretical concerns and analyses”. In Cumming, Alister;
Berwick, Richard (eds.). Validation in language testing. Clevedon: Multilingual Matters,
15-38.
North, Brian. 1995. “The development of a common framework scale of descriptors of
language proficiency based on a theory of measurement”. System 23, 445-465.
North, Brian. 2000. The development of a common framework scale of language
proficiency. New York: Peter Lang.
North, Brian. 2003. “Relating assessments, examinations, and courses to the CEF”. In Morrow,
Keith (ed.). Insights from the Common European Framework. Oxford: OUP, 77-90.
North, Brian; Schneider, Günther. 1998. “Scaling descriptors for language proficiency
scales”. Language Testing 10, 217-262.
O’Sullivan, Barry. 2000. “Exploring gender and oral proficiency interview performance”.
System 28, 373-386.
O’Sullivan, Barry. 2006. Modelling performance in oral language tests: language testing
and evaluation. Frankfurt: Peter Lang.
O’Sullivan, Barry. 2008. “Notes on assessing speaking”. Cornell University Language
Research Centre. http://www.lrc.cornell.edu/events/past/2008-2009/papers08/osull1.pdf
(15 May 2009).
Pollitt, Alastair; Murray, Neil L. 1996. “What raters really pay attention to”. In Milanovic,
Michael; Saville, Nick (eds.). Performance testing, cognition and assessment. Cambridge:
CUP, 74-91.
Shohamy, Elana. 1988. “A proposed framework for testing the oral language of
second/foreign language learners”. Studies in Second Language Acquisition 10, 165-179.
Shohamy, Elana. 1994. “The validity of direct versus semi-direct oral tests”. Language
Testing 11, 99-123.
Shohamy, Elana. 1995. “Performance assessment in language testing”. Annual Review of
Applied Linguistics, 188-211.
Stansfield, Charles; Kenyon, Dorry Mann. 1992. “Examining the validity of a scale used in
a performance assessment from many angles using the many-faceted Rasch model”.
Education Resources Information Center:
http://www.eric.ed.gov/ERICDocs/data/ericdocs2sql/content_storage_01/0000019b/80
/23/f1/7d.pdf (20 April 2009).
Tyndall, Belle; Kenyon, Dorry Mann. 1996. “Validation of a new holistic rating scale
using Rasch multi-faceted analysis”. In Cumming, Alister; Berwick, Richard (eds.).
Validation in language testing. Clevedon: Multilingual Matters, 39-57.
Weir, Cyril J.; Roberts, Jon. 1994. Evaluation in ELT. Oxford: Blackwell.
Wigglesworth, Gillian. 1993. “Exploring bias analysis as a tool for improving rater
consistency in assessing oral interaction”. Language Testing 10, 305-336.
1. Introduction
This paper investigates the use of the that-complementizer in object noun
clauses following complement-taking predicates (CTPs for short) such as I
think. Typical examples are given in (1).
1 I would like to thank the VIEWS team for their constructive feedback and Lotte Sommerer for pointing out
Table 1: Complement-taking predicates in the spoken section of ICE-GB with and without
that-complementizer (raw figures in brackets)
2 In an attempt to increase semantic and syntactic homogeneity of the group and consequently comparability
of the individual predicates I have excluded those containing negation (I don’t think, I don’t know), those
expressing certainty (I’m sure, I know), and those followed by an extraposed complement (it seems, it
appears).
3 Similar results have been found e.g. by Thompson & Mulac (1991) for think in spoken American English
(91%), and Tagliamonte & Smith (2005) for I think in British dialects (91%).
specific semantic content, e.g. positive expectations in the case of hope.4 I
believe and I’m afraid, however, require some further explanation.
Clause initial I believe may show different degrees of assertion depending
on the type of proposition it introduces. Compare the examples in (2).
4 In the case of I suspect the higher frequency of that may be attributed to more frequent occurrence in the
formal text type Public dialogue, although traces of historically earlier uses, e.g. ‘to expect with
apprehension’ (OED s.v. suspect v., 5) or a tinge of ‘imagining something undesirable/being suspicious’
(ibid., 1) cannot be excluded.
5 Of course the type of proposition also plays a part, with (i) triggering a matrix clause reading more strongly:
(i) I’m afraid that John will lose his job
Crucially, however, it is the tense of the ‘object’ clause that shapes the reading. Past tense, as in (ii), is
less compatible with the meaning ‘I fear’ for pragmatic reasons and therefore makes such a reading less
likely.
(ii) I’m afraid that John lost his job
6 Diessel and Tomasello’s (1999) study of the use of the that-complementizer in early child language seems
to point in a similar direction. They found that the complementizer is generally absent with evidential
markers such as I think, I guess, I bet, I mean, I know. Only three verbs are commonly used with a
complementizer: say, tell and pretend. As an explanation, Diessel and Tomasello (1999: 96) point out that
“[t]hese three verbs have a more concrete meaning than all other verbs in our sample”.
the presence or absence of that may be a signal for suggesting either one or
the other interpretation. In other words, the corpus results in Table 1 can be
taken as an indication that that may be used to mark the difference between a
matrix- or a comment clause reading of the CTP, i.e. acting as a genuine
subordinator. However, the low figures for most of the predicates do not
warrant far-reaching conclusions. Moreover, this view fails to explain why an
almost fully routinised and formulaic hedge such as I think still takes a fair
number of that-complementizers. In what follows I will therefore focus
mainly on I think, not only because of the larger amount of data available for
analysis, but also because, as an ‘extreme case’ of a semantically weakened
predicate, it represents a particular challenge for any explanation of the use
of the complementizer.
However, the same seems to be true for clauses with that, as in (5) (cf. Aijmer
1997: 8), although Hand (1993: 501) marks its acceptability as questionable.
7 I am using the term grammaticalised here, as defined for instance by Hopper & Traugott (2003) or Brinton
& Traugott (2005), although the term pragmaticalised, suggested by Erman & Kotsinas (1993), might be
more appropriate for CTP-phrases owing to their pragmatic function.
subordinate clause as “one whose profile is overridden by that of the main
clause … I know she left designates the process of knowing, not of leaving”,
where “profile” refers to the “relative prominence accorded to various
substructures” (Langacker 1991: 4). Thompson (2002: 131) interprets the
notions of “profile” and “relative prominence” in terms of the interactional
actions that an utterance is performing in a particular context (cf. Goodwin &
Goodwin 1992, Linell 1998, Pomerantz & Fehr 1997, Schegloff 1990). From
the analysis of her corpus examples she concludes that “the talk doing the
actions that the participants are jointly engaged in doing is either in a main
clause turn or in a finite indicative complement” (op. cit.: 134), while “the
CTP-phrases do not constitute the speakers’ interactional agenda” (ibid.). The
“action” or “interactional agenda” is thus roughly equivalent to the “‘issue’
around which the talk centers” (op. cit.: 133) or, presumably, the discourse
topic. Example (10) illustrates her point with the CTP I don’t care and an if-
clause complement, which in her view is functionally equivalent to a that-
clause (= Thompson’s ex. 13; boldface indicates the talk accomplishing the
action).
(10) [Frank and his young son Brett have noticed that Brett’s sister Melissa appears to
be about to mark on Brett’s art project]
1 MELISSA: are you gonna add like the little lines that jut out of [these]?
2 FRANK: [get your pen] back from that
3 BRETT: …yeah
4 MELISSA: it’s erasable,
5 and I am not marking on it.
6 BRETT: …I don’t care if it’s erasable.
7 don’t touch it.
8 MELISSA: (HI I didn’t HI)
9 BRETT: …I know
10 …don’t
Although Thompson’s analysis is a compelling one, her identification of
discourse prominence does not provide a satisfactory explanation for all data.
A case in point is example (10) above, where prominence could also be
analysed in terms of information structure. This would yield an entirely
different result for the construction in question: the complement clause, which
is entirely retrievable from the preceding co-text, has to be seen as
informationally backgrounded, while the CTP represents the communicatively
salient, i.e. new (irretrievable) bit of information, which contributes to the
further development of the communication (cf. e.g. Firbas’ 1992 notion of
communicative dynamism). Contrary to Thompson’s analysis, it is therefore the
assertion of the CTP I don’t care that is the main point of the utterance. This
problem has also been noted by Boye and Harder (2007: 576f), who conclude
that epistemic stance cannot automatically be equated with secondary
discourse function. In their model they consequently distinguish between
“stance-marking as an aspect of lexical meaning and stance-marking as an
inherently secondary, „parenthetical‟ discourse or usage function” (Boye &
Harder 2007: 577) (cf. below for further discussion).
It seems, however, that information structure cannot always provide a
clear-cut answer either. Compare, for instance, the following example from
Thompson (2002: 132), which contradicts a simple equation of informational
retrievability and non-assertion.8
(11) [at a birthday party, after Kevin was discovered to have lettuce on his tooth,
everyone has jokingly commented on it, and Kendra has asked for a toothpick]
WENDY: …everybody’s getting uh, tooth obsessed
KEN: I guess we a=re.
Here the complement we are represents given (retrievable) information,
which is clearly reflected in its elliptical form. Nonetheless, the main point of
the utterance is not the CTP I guess but the complement. The reason is that
what is at issue here (the ‘action’ in Thompson’s terms) is the act of agreeing
or affirming the previous utterance, for which the CTP, owing to its semantic
vagueness, is not a suitable candidate. The example illustrates that
establishing the communicatively salient part of the construction is not always
a straightforward matter of equating givenness with backgrounding but
has to take into account a variety of factors: „interactional action‟, information
structure, and the semantic value of the predicate.
In spoken language, however, there is an additional means available to the
speaker to signal prominence: prosody. And it is prosodic highlighting which
in example (11) above will help to identify the complement as the main point
of the utterance rather than the CTP I guess. As noted for instance by Halliday
(1985: 277), new or ‘newsworthy’ information is information that is presented
by the speaker as such. Prosodic prominence is such a means of presentation
(I will return to prosodic prominence in the subsequent section).
The examples above show that from a cognitive-functional perspective the
distinction between main- and comment clause may depend on a variety of
factors and is far from clear-cut: instead of a neat binary distinction, a
functional view suggests a gradient link between main- and comment clause.
8 Although Boye and Harder seem to argue for a view of prominence in terms of newness (cf. Boye &
Harder 2007: 576) in contradistinction to Thompson, they, somewhat surprisingly, fully accept
Thompson’s analysis of this example.
Such a scale has been proposed by Boye and Harder (2007), who identify the
following three categories, which are seen as three different stages in the
development of CTPs such as I think: (1) primary lexical CTPs, (2) secondary
lexical CTPs, and (3) secondary grammatical CTPs. This classification takes
into account both the structural and the usage status, each of which is
described by a binary set of values: lexical vs. grammatical structural status
and primary vs. secondary usage status. While the first stage is easily
identifiable as matrix clause and the last stage as (clause internal and final)
comment clause, the second stage is a hybrid category, which exhibits a
discrepancy between usage status and structural status and as such is
descriptively ambiguous (cf. Boye & Harder 2007: 586). I think in clause-
initial position seems to qualify for precisely this intermediate stage:
structurally, its morphosyntactic form is that of a lexical clause (Boye &
Harder 2007: 591) and its syntactic position that of a prototypical matrix
clause. However, in terms of its discourse function initial I think is typically
secondary.9
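Boye and Harder's classification lends itself to a compact schematic representation. The sketch below simply encodes the cross of structural and usage status described above; the function itself is my illustration, not part of their account:

```python
# A sketch of Boye and Harder's (2007) cline as the cross of structural
# status (lexical vs. grammatical) and usage status (primary vs.
# secondary). Stage labels follow the text; the mapping is illustrative.

def ctp_stage(structural, usage):
    """Map the two binary statuses onto the three stages of the cline."""
    stages = {
        ("lexical", "primary"): "stage 1: primary lexical CTP (matrix clause)",
        ("lexical", "secondary"): "stage 2: secondary lexical CTP (hybrid)",
        ("grammatical", "secondary"):
            "stage 3: secondary grammatical CTP (comment clause)",
    }
    return stages[(structural, usage)]

# Clause-initial I think: lexical in structure, typically secondary in usage.
print(ctp_stage("lexical", "secondary"))
```

The missing fourth cell (grammatical structure with primary usage) has no place on the cline, which is precisely why stage 2 is the descriptively ambiguous, hybrid category.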
A similar view is presented by Nuyts (2000: 122ff), who sees epistemic
modal expressions such as I think as a “battleground” where two conflicting
functional forces are at work: an information structural force and an iconic (or
conceptual semantic) force. From the perspective of iconicity the status of the
epistemic evaluation is that of an operator (i.e. a meta-representational
element) over a state of affairs, which suggests main clause status for the
epistemic expression “since it directly reflects the meta-status of the
qualification relative to the state of affairs” (Nuyts 2000: 123). In terms of
information structure, on the other hand, the epistemic qualification is
backgrounded and the state of affairs foregrounded, i.e. it carries the focal
information. The information structural force therefore works against a main
clause interpretation for the epistemic expression, since main clauses
prototypically carry foregrounded information and embedded clauses
backgrounded information (cf. Brandt 1984, Givón 1984, Mackenzie 1984,
Sadock 1984, Tomlin 1985).
Taking up Nuyts’ metaphor, we can think of clause-initial epistemic
markers such as I think as ‘undecided battles’ where the different forces
outbalance each other and allow for different interpretations of the status of I
think. In other words, the result of these conflicting forces is one of
9 Cases where the epistemic qualification of I think is the main point of the utterance are possible but
extremely rare. Cf.
(i) A: So you’re telling me John is in Paris
B: I THINK John is in Paris (but I’m not sure)
neutralisation and indeterminacy in the sense of Boye and Harder’s hybrid
category ‘stage 2’. In spoken language, as pointed out above, an additional
force enters the ‘battleground’ and may ‘tip the scales’, viz. prosody. As an
iconic reflection of prominence (cf. Bolinger 1985), prosodic signals may be
seen as a decisive factor in the interpretation of the syntactic status of the CTP.
The following section therefore takes a closer look at the prosody of initial I
think.
4. Prosodic analysis
This section provides a detailed prosodic analysis of initial I think with a view
to establishing whether its prosodic realisation suggests foregrounding and
therefore a matrix clause interpretation. This is interesting, as a matrix clause
analysis of I think would entail a classification of that as a subordinator.
Section 4.1 first analyses the prosody of I think + zero, which is then
compared in Section 4.2 with the results for I think + that.
(i) I think with an independent tone unit is exemplified in (12), which has a
nuclear accent on think and is followed by a tone unit boundary, indicated by
a change in pitch level (cf. Cruttenden 1997: 35 on boundary markers). As a
possible alternative the nucleus may also be on the pronoun I rather than on
the predicate think (cf. Simon-Vandenbergen 2000: 50; Kaltenböck 2009
forthc. for the function of such uses).
(12) Yes I I think it’s infinitely more entertaining (s1b-024-12)
[PRAAT pitch trace of (12): F0 in Hz (40-300) over c. 2.35 s]
(ii) An example of I think integrated into the head is given in (13), where
think represents the first accented syllable in the tone unit, the so-called onset
(e.g. Wells 2006: 207) but is less prominent than the nuclear accent on silly
(cf. Cruttenden 1997: 54 for a definition of head).10
10 To distinguish between heads and nuclei the following criteria were applied:
a. Onset syllables are generally on a higher pitch level than the nucleus owing to declination within a
tone unit, i.e. the fact that pitch tends to be lower at the end of a tone unit than at the beginning (e.g.
Couper-Kuhlen 1986: 82-83, Wichmann 2000: 103-105).
b. If at the beginning of a tone unit, i.e. not preceded by a pre-head, the onset will often be anacrustic,
i.e. produced with greater speed (cf. Cruttenden 1997: 32).
c. Only in case of a separate nucleus is I think followed by a tone unit boundary, as indicated by
features such as anacrusis, final syllable lengthening, change of pitch level or pitch direction of
unaccented syllables (cf. Cruttenden 1997: 35).
d. Onsets are less prominent than nuclear accents, which is reflected phonetically in a smaller range of
pitch movement and/or weaker energy pulses.
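Criteria (a)-(d) can be thought of as a coarse decision rule. The toy sketch below is my illustration only: the feature names and the relative-range cue are invented, and the study applied these criteria auditorily and instrumentally, not with code:

```python
# A toy decision rule reflecting criteria (a)-(d) in footnote 10 for
# deciding whether an accented "I think" carries its own nucleus
# (separate tone unit) or is an onset within the head. Features and
# thresholds are invented for illustration.

def classify_accent(boundary_follows, range_vs_nucleus, anacrustic):
    """Return 'nucleus' or 'onset'.

    boundary_follows: tone-unit boundary cues after I think (criterion c)
    range_vs_nucleus: pitch-movement range relative to the clause nucleus,
        e.g. 0.5 = half as wide (criterion d)
    anacrustic: produced with greater speed at tone-unit start (criterion b)
    """
    if boundary_follows and range_vs_nucleus >= 1.0 and not anacrustic:
        return "nucleus"  # I think forms an independent tone unit
    return "onset"        # I think is integrated into the head

print(classify_accent(True, 1.2, False))   # separate tone unit
print(classify_accent(False, 0.5, True))   # head-internal onset
```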
(13) I think it would be silly just to sling mud around (s1b-022-19)
[PRAAT pitch trace of (13): F0 in Hz (50-250) over c. 2.27 s]
Typically in such cases the accent will be on the predicate think, as in example
(13) above. As an unstressed element, I represents the pre-head but may be
suppressed altogether, as in (14) (where <,> indicates a short pause).
(14) <,> Think the tutorials are helpful
[PRAAT pitch trace of (14): F0 in Hz (45-300) over c. 1.78 s]
Occasionally, however, the accent occurs on the I (rather than on think), which
then starts the head and gives the I an implicit contrastive interpretation (I as
opposed to someone else), as in example (15).
(15) I think it’s all jolly good fun
[PRAAT pitch trace of (15): F0 in Hz (70-400) over c. 1.15 s]
(iii) The third prosodic pattern is that of integration in the form of a pre-head,
i.e. an unaccented (typically unstressed and anacrustic) syllable preceding the
head (cf. Wells 2006: 214-15).11 This pattern is exemplified in (16), where
the string I think it’s forms the pre-head, followed by an accented syllable
some, which starts the head, and the nucleus on quarter.
11 The term ‘stress’ is used here for rhythmic stress, while ‘accent’ refers to a syllable made prominent
by rhythmic stress and pitch prominence, i.e. by a change in pitch, movement in pitch, or the start of a
pitch movement (cf. Wells 2006: 93).
(16) I think it’s something like a quarter (s1b-030-29)
[PRAAT pitch trace of (16): F0 in Hz (50-300) over c. 1.93 s]
For the corpus analysis only subsection Public dialogue (s1b) in ICE-GB was
taken into account, which is the only text category that has a sufficiently large
number of that-clauses (viz. 52, cf. Table A in the Appendix). The prosodic
analysis of I think + zero is based on 148 random instances (of a total of 434
in Public dialogue), which were analysed both auditorily and instrumentally
with the help of the acoustical analysis programme PRAAT (Boersma &
Weenink 2008). The results are summarised in Table 2 below.
                                  n                                 %
Prosodically independent          7  (of which nucleus on I: 3)    4.7%
Right-bound: part of head       112  (of which accent on I: 9)    75.7%
Right-bound: part of pre-head    29                               19.6%
Total                           148                              100%
Table 2. Prosodic patterns of initial I think in Public dialogue followed by a zero that-
clause
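As a quick arithmetic check, the percentages in Table 2 follow directly from the raw counts:

```python
# Reproducing the percentages in Table 2 from the raw counts
# (148 instances of initial I think + zero in Public dialogue).

counts = {
    "prosodically independent": 7,
    "right-bound: part of head": 112,
    "right-bound: part of pre-head": 29,
}
total = sum(counts.values())  # 148

for pattern, n in counts.items():
    print(f"{pattern}: {100 * n / total:.1f}%")
# prints 4.7%, 75.7% and 19.6%, matching Table 2
```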
We can see that the dominating pattern is that of I think being realised as part
of the head (75.7%), followed by its realisation as pre-head (19.6%). An
independent tone unit for I think is extremely rare (4.7%). This lack of nuclear
prominence is, however, not really surprising, as I think is clearly the most
grammaticalised (pragmaticalised) of all comment clauses and has therefore
been subject to a high degree of semantic bleaching (e.g. Mindt 2003). This
semantic reduction makes I think an unlikely candidate for nuclear
highlighting.
In previous studies I have identified various functions of comment clauses
and I think in particular (Kaltenböck 2008, 2009 forthc.), showing that
comment clauses can be further grammaticalised from epistemic markers into
pleonastic structuring devices (cf. also Van Bogaert 2006). These uses tend to
be phonetically reduced and lack prosodic prominence. Initial I think realised
as pre-head can be equated with this structural function.
What about the remaining prosodic realisations for I think, nuclear tone
and head accent? If, as noted above, we take prosodic prominence as an
iconic reflection of the syntactic status of I think, it is tempting to correlate
prosodic prominence in the form of a separate nuclear tone (tone unit) with
matrix clause status and reduced prominence in the form of a head with
comment clause status. Such a correlation of syntactic status with prosodic
prominence would also seem to fit in with the presumed diachronic
development of comment clauses, which are seen by Thompson and Mulac
(1991) and Traugott (1995) to have started out as matrix clauses which have
grammaticalised into epistemic markers/comment clauses (and further into
discourse markers with filler function; cf. Kaltenböck 2008, 2009b). Although
Thompson and Mulac’s matrix clause hypothesis has been dismissed by
Brinton (1996) and Fischer (2007a, b), who suggest a derivation from
adverbial clauses (cf. as I think), it may still be assumed that the starting point
was a fully lexical item, i.e. Boye and Harder’s (2007) primary lexical CTPs
(cf. Aijmer’s 1997 full lexical meaning ‘cogitation’).
However, while a simple correlation of prosodic prominence with the
syntactic status of I think may be intuitively appealing and may have indeed
some theoretical value, it falls short of providing a complete explanation for
the corpus data. Simply correlating hierarchical status, i.e. main- vs. comment
clause, with degrees of prosodic prominence ignores the fact that prosody not
only has a ‘vertical’ function in the sense of foregrounding/backgrounding or
mise en relief, but may also have a linear or ‘horizontal’ function by linking and
rhythmically structuring elements of speech. A closer look at the corpus data
shows that there are indeed cases where prosodic prominence seems to have
been prompted by rhythmic considerations. Compare, for instance, example
(17), where the separate chunking of uh I think as an independent tone unit
with nucleus on I may have been triggered by an implicit desire to conform to
a rhythmical pattern which involves chunks of roughly 0.6 seconds: / uh I
think / they have every / authority /
(17) Uh I think they have every authority both from their governments and from the UN
resolutions to do that (s1b-027-103)
[F0 trace of (17): uh I think they have every authority; 0-1.85 s]
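The near-isochrony claimed above can be checked directly against the tone-unit boundary times read off the time axis of the pitch track of (17) (0, 0.6, 1.23 and 1.85 s); a minimal sketch:

```python
# Tone-unit boundaries in (17), from the time axis of the pitch track:
# / uh I think / they have every / authority /
boundaries = [0.0, 0.6, 1.23, 1.85]
intervals = [round(b - a, 2) for a, b in zip(boundaries, boundaries[1:])]
print(intervals)  # [0.6, 0.63, 0.62] -- near-isochronous chunks of ~0.6 s
```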
This temporal or linear aspect may be more prominent where I think is used in
a hesitation phase as a staller, whose function is to ‘buy time’. Giving I think
more prominence (e.g. a nuclear rather than an onset accent) may allow the
speaker to do precisely that. It may also be assumed that rhythmic
considerations come more into play in public speaking with experienced
speakers (i.e. the text category under investigation). Note also that the
insertion of a that-complementizer in the above example would disrupt the
regularity of the rhythm. I will discuss this issue in more detail in the
following section.
[F0 trace of (18): I think that any woman who wanted to join the MCC; 0-2.74 s]
[F0 trace of (19): And I think that they must be encouraged; 0-1.76 s]
(20) I mean I think that if you take as it were a theological attitude (s1b-039-93)
[F0 trace of (20): I mean I think that if you take as it were <,> a theological attitude; 0-3.33 s]
In example (18) think takes a nuclear tone with a tone unit boundary after the
complementizer, as indicated by the pitch change on any. In example (19), on
the other hand, think represents the onset of the head, which leads up to (and
includes) the initial syllable of encouraged. Think is preceded by the
unstressed syllables and + I, which represent the pre-head. In example (20)
the pre-head includes both I mean and I think, with the head starting on that.
As noted for zero that-clauses, the accent (both nuclear and non-nuclear) may
shift away from think to the pronoun I, as for instance in example (21) below.
If we compare, as a next step, the distribution of the three prosodic
patterns for I think + that-clause with that of I think + zero, we find that they
closely correspond. Table 3 shows that the most frequent pattern by far is
again that of heads (75%), followed by pre-heads (13.5%) and independent
tone units (11.5%).
n %
Prosodically independent 6 (of which nucleus on I: 2) 11.5%
Right-bound: part of head 39 (of which accent on I: 5) 75.0%
Right-bound: part of pre-head 7 13.5%
Total 52 100%
Table 3. Prosodic patterns of initial I think in Public dialogue followed by a that-clause
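The closeness of the two distributions can be quantified with a chi-square test of homogeneity. The following pure-Python sketch is not part of the original analysis (the independent-tone-unit count of 7 for the zero variant is inferred from the 4.7% figure in Table 2); it indicates that the two distributions do not differ significantly at the .05 level:

```python
# Chi-square test of homogeneity: prosodic patterns of "I think" + zero
# (Table 2) vs "I think" + that (Table 3).
# Counts per pattern: [independent tone unit, part of head, part of pre-head].
zero_variant = [7, 112, 29]   # n = 148
that_variant = [6, 39, 7]     # n = 52

rows = [zero_variant, that_variant]
n = sum(map(sum, rows))
col_totals = [sum(col) for col in zip(*rows)]

chi2 = 0.0
for row in rows:
    for obs, col_total in zip(row, col_totals):
        exp = sum(row) * col_total / n   # expected count under homogeneity
        chi2 += (obs - exp) ** 2 / exp

print(round(chi2, 2))  # ~3.55, below the 5.99 cutoff (df = 2, alpha = .05):
                       # no significant difference between the distributions
```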
12 This view is also implied, but not overtly expressed, in Thompson (2002) and Boye and Harder (2007).
(21) I think / that he is the most neglected of that uh number of composers around the
turn of the century (s1b-032-103)
[F0 trace of (21): <,> I think that he is the most neglected; 0-3.21 s]
(22) I think that uh she takes a far too hard line (s1b-035-20)
[F0 trace of (22): I think that uh she takes a far too hard line; 0-4.07 s]
The prosodic realisation of that therefore does not necessarily reflect the
syntactic analysis of the construction, which identifies the complementizer as
part of the subordinate clause.13 Such a mismatch between syntax and
prosody is not really surprising and has been noted before for various other
constructions (e.g. Brazil 1997, Wichmann 2001). It is interesting, however,
that there is a tendency for that to be prosodically grouped with I think rather
than the following clause.14
How can we explain this lack of correspondence between syntax and
prosody? Associating the complementizer on the usage level with I think (and
indeed inserting it in the first place) seems to result from the speaker wanting
to add weight to the CTP in the form of an extra syllable.15 The reason for
this may be twofold:
First, adding an extra syllable to the initial clause makes it longer and
therefore more effective as a clause-initial staller used for bridging a hesitation
phase, which is one of the main functions of initial I think (cf. e.g. Stenström
1994, 1995). Compare, for instance, the following example where that has a
staller function similar to that of uhm and and (cf. also example 22 above).16
13 Cf. however the semantic analyses by Davidson (2001), Lepore & Loewer (1989) and Hand (1993) for a
different view.
14 This is also reflected in the fact that in the corpus I think and that are never separated by intervening
material (e.g. hesitation sound, filler), whereas that is frequently separated from the clause of which it is
the head. The level of performance therefore seems to suggest a closer association of that with the main
clause rather than the subordinate clause.
15 With two syllables I think is one of the shortest of all comment clauses (cf. Kaltenböck 2006b, 2008),
which incidentally also seems to have contributed to its advanced stage of grammaticalisation.
16 The nuclear accent on think in this example may in fact have resulted from the stalling function of I think:
giving it more emphasis allows the speaker to gain time and extend the hesitation phase. As noted in
Section 4.1, prosodic prominence not only has a hierarchical/foregrounding function but also a
temporal/linear one.
(23) Uhm I I think that uhm once you’ve spent your money on that the thing to spend
your money on is a subscription to the local horticultural society (s1b-025-133)
[F0 trace of (23): uhm I I think that uhm once you’ve spent your money on that; 0-2.76 s]
(24) then I think that we ought to ask Rabbi Sacks t uh uh to uh uh to say more because
of course he has said two important things (s1b-028-63)
[F0 trace of (24): <,> then I think that we ought to ask Rabbi Sacks; 0-4.07 s]
(25) But I think that we haven’t in the sense that we have just classification still (s1b-012-104)
[F0 trace of (25): <,> but I think that we haven’t; 0-1.69 s]
(26) I think that in the Labour Party we believe that uh <,> one year of sanctions would
be preferable to one day of war (s1b-035-29)
[F0 trace of (26): I think that in the Labour Party we believe that uh <,> one year of sanctions would be preferable to one day of war; 0-8.03 s]
While it is clear that the principle of rhythmic harmony cannot be pressed too
far, it seems that the text type of public conversation is particularly
susceptible to it, especially the text categories broadcast discussions and
broadcast interviews, which typically involve highly experienced public
speakers and incidentally have the highest proportion of that in the corpus
(6.6 and 3.2 occurrences per 10,000 words respectively as opposed to 1.2
occurrences for Private dialogue).
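Rates ‘per 10,000 words’ are simply raw counts normalised by subcorpus size. A minimal sketch, illustrated with the Private dialogue figures from Table A in the Appendix (the word counts for the broadcast subcategories are not given in the text, so only the Private dialogue rate is reproduced here):

```python
# Normalising a raw frequency to occurrences per 10,000 words.
def per_10k(count: int, corpus_words: int) -> float:
    return round(count / corpus_words * 10_000, 1)

# Private dialogue s1a: 25 tokens of "I think that" in 205,627 words (Table A).
print(per_10k(25, 205_627))  # 1.2, matching the rate cited for Private dialogue
```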
Note, however, that the function of adding extra weight to initial I think
and making it conform to an overall rhythmical pattern is by no means
restricted to that. Other items may fulfil a similar function, such as then in
example (27).
(27) Then I think little nations round the world will sleep easier (s1b-027-69)
[F0 trace of (27): then I think little nations round the world will sleep easier; 0-2.86 s]
The rhythm of the construction can also be linked to the type of subject in the
that-clause. Consider, for instance, example (28), where the subject of the
second clause consists of an unstressed syllable (there), which is followed by
two further unstressed syllables (is a). The resulting rhythmic pattern of the
entire construction I think there is a certain arrogance (which has an accent
on I) is thus: [stress pattern not reproduced in source]. Note that a that-complementizer
would add an extra unstressed syllable to the row of three unstressed
syllables, which is not desirable for rhythmical reasons.
(28) I think there is a certain arrogance on the part particularly of the extreme left in
Britain on this matter (s1b-027-136)
[F0 trace of (28): I think there is a certain arrogance; 0-1.90 s]
A closer analysis of the corpus data shows that unstressed subjects in the form
of existential there + be or the pronoun it + be strongly prefer omission of
that: with these subjects that is used in only 3.1 percent (3/91) in Public
dialogue (s1b), in contrast to all other subjects where that occurs 12.6 percent
of the time (49/389). This preference for that-omission in fact proves to be
statistically significant.
Similar results are obtained if we take into account all pronouns that are
typically unstressed, i.e. existential there, anticipatory it, and all personal
pronouns, but disregard all other types of subject, such as full NPs, clauses,
stressed pronouns (e.g. this, that, mine) as well as pronouns preceded by
intervening adverbials or hesitation markers, i.e. pronouns not immediately
following I think (that). The statistical analysis shows that for initial I think in
Public dialogue a preference for zero that is highly significantly affected by
the presence of these unstressed subjects (cf. Table 4).
Table 4. Occurrence of that with unstressed pronominal subjects in the ‘object’ clause
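The effect reported for unstressed there/it + be subjects can be checked with a chi-square test on the 2×2 table implied by the counts 3/91 and 49/389. A pure-Python sketch (without continuity correction; not the author’s original computation):

```python
# Chi-square test: do unstressed there/it + be subjects disfavour that?
# From the text: that in 3 of 91 cases with these subjects, vs 49 of 389
# with all other subjects (Public dialogue s1b).
table = [
    [3, 91 - 3],     # unstressed there/it + be subjects: [+that, -that]
    [49, 389 - 49],  # all other subjects
]
n = sum(map(sum, table))
col_totals = [sum(col) for col in zip(*table)]

chi2 = 0.0
for row in table:
    for obs, col_total in zip(row, col_totals):
        exp = sum(row) * col_total / n   # expected count under independence
        chi2 += (obs - exp) ** 2 / exp

print(round(chi2, 2))  # ~6.6, above the 3.84 cutoff (df = 1, alpha = .05):
                       # the association is statistically significant
```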
This finding ties in with Elsness‟ (1984) observation that complex subjects
correlate with that-retention. As a possible explanation for this he notes that
“[a]lthough there is no risk of ambiguity in such constructions, one may see
the selection of that connective as a contribution to greater syntactic clarity”
(Elsness 1984: 532). This may be true for written texts. For spoken language,
however, it is necessary to take into account rhythmic considerations, viz.
unstressed subjects favouring that-omission and, closely associated with this,
memory constraints in online production: production of a syntactically
complex subject, which can also be expected to have high informational
value, will normally require extra ‘thinking time’ (cf. Rohdenburg’s 1998
complexity principle), which is provided for by the that-complementizer.17
Elsness (1984) also mentions adverbials occurring at the boundary
between matrix verb and object clause subject as a factor favouring that
insertion. He attributes this to “a (conscious or unconscious) desire on the part
of the writer to avert ambiguity” (Elsness 1984: 532). In other words, that
insertion identifies the adverbial as belonging to either the matrix- or the
object clause. In the case of spoken I think (in Public dialogue), however, all
adverbials preceding the subject of the object clause are clearly part of the
that clause. There are no instances where that insertion would indicate
association of an adverbial with the CTP. That always immediately follows I
think. With that omission, on the other hand, all adverbials in pre-subject
position (adverbs, PPs, clauses) are unambiguously identifiable as part of the
‘object’ clause on semantic (and grammatical) grounds, cf. for instance (29).
(29) a. I think [according to your evidence] Ferndale Business Services got in touch
with you … (s1b-064-97)
b. Uhm <,> eh uh I think [when I was younger] I was more self-confident and
arrogant and perhaps ruthless you know ... (s1b-041-204)
Disambiguation therefore can be excluded as a conditioning factor for that
insertion with spoken I think. The high proportion of that omission (80.9%)
with pre-subject adverbials also attests to this. Nonetheless, that insertion is
still significantly affected by adverbials preceding the ‘object’ clause subject
(cf. Table 5). Compare, for instance, example (30).
17 Elsness (1984) notes coreferentiality of the pronominal object clause subject with the matrix clause
subject as a further conditioning factor for that omission in written texts. For spoken I think, however,
coreference of the two subjects does not play a major role: only 4.6 percent (20/434) of all zero that-clauses
in Public dialogue have I as their subject, compared to 1.9 percent (1/52) of Is in that-clauses.
(30) And I think that that [perhaps in the lectures] there was there wasn’t really a a a an
appreciation of the positive benefit to religious traditions of the cultural
engagements which took place (s1b-028-37)
Table 5. Occurrence of that with adverbials preceding the ‘object’ clause subject
The reason why pre-subject adverbials favour that insertion again seems to lie
in the greater syntactic complexity of the ‘object’ clause. Just like complex
subjects, adverbials in pre-subject position increase the syntactic weight of the
‘object’ clause in unusual, i.e. initial, position (cf. end-weight principle),
making it ‘nose-heavy’, as it were. This, in turn, increases production effort
and favours the insertion of a filler in the form of that.
The view of that functioning as a filler rather than a genuine subordinator
marking the boundary between main- and subordinate clause is also supported
by the following example from the corpus, where the position of that (if
understood as a subordinator) would suggest a clause boundary after the
adverbial (in the present climate linked with her disability). Semantically,
however, the adverbial can only be understood as being part of the ‘object’
clause, which disqualifies that as a marker of subordination.
(31) I think in the present climate linked with her disability that finding a full-time
tenured post will be (s1b-062-49)
The that-complementizer, in other words, has an important temporal function,
like typical fillers, which allow the speaker to ‘buy time’. This, in turn, can
help alleviate production difficulties, as noted for instance by Jaeger (2005)
(cf. also Clark 2004). Close analysis of the corpus data shows that there is
indeed a trade-off between the use of that and production difficulties, with
insertion of that correlating with fewer instances of repetition and/or restarts
immediately preceding or following I think that. More precisely, with that
omission we find such disfluencies in 16.7 percent (27 instances) of all cases,
such as example (32). With that insertion, on the other hand, such disfluencies
occur in only 3.4 percent (1 instance) of all cases.
5. Conclusion
In this paper I have tried to show that in spoken language the that-complementizer
in object clauses no longer functions as a genuine marker of
subordination (i.e. indicating syntactic hierarchy) but rather as a filler (i.e.
functioning on the linear plane). This erosion of the grammatical meaning of that
can be linked to the semantic erosion of the CTP-phrase: more fully
grammaticalised (pragmaticalised) CTP-phrases, involving high-frequency
weak assertives such as I think and I suppose, are no longer syntactically
interpreted as main clauses, which consequently reduces the need for an overt
marker of subordination. If that is still used, it is usually simply a ‘filling’
device inserted for rhythmical purposes or to alleviate production difficulties.
Since the role of that is closely linked to the syntactic status of the CTP-phrase,
I have first tried to show that syntactic tests intended to demonstrate a
difference between CTPs followed by that and zero do not provide conclusive
evidence (Section 3.1).18 I have then turned to a discussion of cognitive-functional
arguments, which indicate that, although epistemic CTPs are
frequently downgraded, their status is largely indeterminate and depends on
actual contextual realisation (Section 3.2). I have subsequently suggested
prosodic realisation as a possible decisive factor for signalling foregrounding
and backgrounding (i.e. main and comment clause status) of the CTP. The
prosodic analysis in Section 4 has focussed on I think as an extreme case of
grammaticalisation and shows that presence or absence of the that-complementizer
does not correspond with different prosodic behaviour.
Although I think + that reveals a slightly higher propensity to occur with a
separate nuclear tone than I think + zero, both constructional types exhibit a
similar distribution of the three prosodic patterns identified. This means that
the two formal signals available for indicating relative prominence of I think,
prosody and an explicit marker of subordination, do not match. If we take
relative pitch prominence as an indication of matrix clause status, we have to
conclude that both constructional variants may qualify for main clause status
but at the same time very rarely do. This equivalence in actual use of the two
syntactic types casts additional doubt on the subordinator function of the
that-complementizer in spoken language, which is corroborated by prosodic
evidence and co-occurrence facts (subject type, adverbials, disfluency
features) suggesting that the that-complementizer is mainly used as a filler
inserted for rhythmical reasons or to alleviate production difficulties,
especially if followed by a syntactically complex ‘object’ clause.
18 A similar view has recently also been expressed by Kearns (2007: 501), who argues that “[t]he modifier
sense of an epistemic verb and its subject in matrix position promotes zero in the complement clause”.
Appendix
Text type (number of words) - that + that Total
Private dialogue s1a (205,627) 94.9% (466) 5.1% (25) 100% (491)
Public dialogue s1b (171,062) 89.3% (434) 10.7% (52) 100% (486)
Public monologue s2a (152,829) 87.9% (80) 12.1% (11) 100% (91)
Scripted speech s2b (108,164) 81.4% (57) 18.6% (13) 100% (70)
Total 91.0% (1036) 9.0% (102) 100% (1138)
Table A. Clause-initial I think followed by that- and zero in ICE-GB (raw figures in brackets)
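The row percentages of Table A can be recomputed from the raw figures given in brackets; a minimal sketch:

```python
# Recomputing Table A row percentages: clause-initial "I think" followed by
# zero vs that in ICE-GB, per text type, from the raw counts.
raw = {
    "Private dialogue s1a": (466, 25),   # (- that, + that)
    "Public dialogue s1b":  (434, 52),
    "Public monologue s2a": (80, 11),
    "Scripted speech s2b":  (57, 13),
}
pct = {}
for text_type, (zero, that) in raw.items():
    total = zero + that
    pct[text_type] = (round(100 * zero / total, 1), round(100 * that / total, 1))

print(pct["Private dialogue s1a"])  # (94.9, 5.1)
print(pct["Scripted speech s2b"])   # (81.4, 18.6)
```

Note the monotonic rise of + that from Private dialogue to Scripted speech, in line with the text's point that the complementizer is rarest in the most informal spoken category.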
Table B. Clause-initial I suppose followed by that- and zero in ICE-GB (raw figures in brackets)
Table C. Clause-initial I hope followed by that- and zero in ICE-GB (raw figures in brackets)
Text type (number of words) - that + that Total
Private dialogue s1a (205,627) 100% (1) 0% (0) 100% (1)
Public dialogue s1b (171,062) 55.6% (15) 44.4% (12) 100% (27)
Public monologue s2a (152,829) 0% (0) 100% (5) 100% (5)
Scripted speech s2b (108,164) 46.2% (6) 53.8% (7) 100% (13)
Total 47.8% (22) 52.2% (24) 100% (46)
Table D. Clause-initial I believe followed by that- and zero in ICE-GB (raw figures in brackets)
Table E. Clause-initial I guess followed by that- and zero in ICE-GB (raw figures in brackets)
Table F. Clause-initial I’m afraid followed by that- and zero in ICE-GB (raw figures in brackets)
Table G. Clause-initial I suspect followed by that- and zero in ICE-GB (raw figures in brackets)
References
Aijmer, Karin. 1972. Some aspects of psychological predicates in English. Stockholm:
Almqvist & Wiksell.
Aijmer, Karin. 1997. “I think – an English modal particle”. In Swan, T.; Westvik, O. J.
(eds.). Modality in Germanic languages. Historical and comparative perspectives. Berlin:
Mouton de Gruyter, 1-47.
Asher, Nicholas. 2000. “Truth conditional discourse semantics for parentheticals”. Journal
of Semantics 17 (1), 31-50.
Biber, Douglas. 1999. “A register perspective on grammar and discourse: variability in the
form and use of English complement clauses”. Discourse Studies 1 (2), 131-150.
Biber, Douglas; Johansson, Stig; Leech, Geoffrey; Conrad, Susan and Finegan, Edward.
1999. Longman grammar of spoken and written English. Harlow: Longman.
Boersma, Paul; Weenink, David. 2008. Praat: doing phonetics by computer (Version
4.4.33) [Computer programme]. Retrieved December 2005, from http://www.praat.org
Bolinger, Dwight. 1985. “The inherent iconism of intonation”. In Haiman, John (ed.).
Iconicity in syntax. Amsterdam: Benjamins, 97-108.
Boye, Kaspar; Harder, Peter. 2007. “Complement-taking predicates: usage and linguistic
structure”. Studies in Language 31 (3), 569-606.
Brandt, Margareta. 1984. “Subordination und Parenthese als Mittel der
Informationsstrukturierung in Texten“. Sprache und Pragmatik 32, 1-37.
Brazil, David. 1997. The communicative value of intonation in English. Cambridge: CUP.
Brinton, Laurel J. 1996. Pragmatic markers in English. Grammaticalization and discourse
functions. Berlin: Mouton de Gruyter.
Brinton, Laurel J.; Traugott, Elizabeth Closs. 2005. Lexicalization and language change.
Cambridge: CUP.
Clark, H. 2004. “Pragmatics of language performance”. In Horn, L.R.; Ward, G. (eds.).
Handbook of pragmatics. Oxford: Blackwell, 365-382.
Couper-Kuhlen, Elizabeth. 1986. An introduction to English prosody. London: Edward
Arnold.
Cruttenden, Alan. 1997. Intonation (2nd ed.). Cambridge: CUP.
Davidson, D. 2001. Inquiries into truth and interpretation (2nd ed.). Oxford: Clarendon.
Diessel, Holger; Tomasello, Michael. 1999. “Why complement clauses do not include a
that-complementizer in early child language”. Proceedings of the 25th Annual Meeting,
Berkeley Linguistics Society, 86-97.
Elsness, J. 1984. “That or zero? A look at the choice of object clause connective in a
corpus of American English”. English Studies 65, 519-33.
Erman, Britt; Kotsinas, Ulla-Britt. 1993. “Pragmaticalization: the case of ba and you
know”. Studier i modern språkvetenskap (New series 10, Acta Universitatis
Stockholmiensis), 76-93.
Erteschik-Shir, N.; Lappin, S. 1979. “Dominance and the functional explanation of island
phenomena”. Theoretical Linguistics 6 (1), 41-86.
Firbas, Jan. 1992. Functional sentence perspective in written and spoken communication.
Cambridge: CUP.
Fischer, Olga. 2007a. Morphosyntactic change. Functional and formal perspectives.
Oxford: OUP.
Fischer, Olga. 2007b. “The development of English parentheticals: a case of
grammaticalization?”. In Smit, Ute et al. (eds.) Tracing English through time. Wien:
Braumüller, 99-114.
Givón, Talmy. 1984. Syntax: a functional-typological introduction. Vol. 1. Amsterdam:
Benjamins.
Goodwin, Charles; Goodwin, Marjorie Harness. 1992. “Assessments and the construction
of context”. In Goodwin, Charles; Duranti, Alessandro (eds.). Rethinking context.
Cambridge: CUP, 147-189.
Green, Georgia M. 1976. “Main clause phenomena in subordinate clauses”. Language 52,
382-397.
Halliday, M.A.K. 1985. An introduction to functional grammar. London: Edward Arnold.
Hand, Michael. 1993. “Parataxis and parentheticals”. Linguistics and Philosophy 16, 495-507.
Hooper, Joan B.; Thompson, Sandra A. 1973. “On the applicability of root transformations”.
Linguistic Inquiry 4 (4), 465-497.
Hooper, Joan B. 1975. “On assertive predicates”. In Kimball, J.P. (ed.). Syntax and
semantics. Vol 4. New York: Academic Press, 91-124.
Hopper, Paul J.; Traugott, Elizabeth Closs. 2003. Grammaticalization (2nd edition).
Cambridge: CUP.
Huddleston, Rodney; Pullum, Geoffrey K. 2002. The Cambridge grammar of the English
language. Cambridge: CUP.
Jaeger, Florian T. 2005. “Optional that indicates production difficulty: evidence from
disfluencies”. Proceedings of DiSS’05 Disfluency in Spontaneous Speech Workshop. 10-12
September 2005, Aix-en-Provence, France, 103-109.
Kaltenböck, Gunther. 2006a. “‘...That is the question’: complementizer omission in
extraposed that-clauses”. English Language and Linguistics 10 (2), 371-396.
Kaltenböck, Gunther. 2006b. “Some comments on comment clauses: a semantic
classification”. In Povolná, Renata; Dontcheva-Navratilova, Olga (eds.). Discourse and
interaction. Brno: Masarykova Univerzita, 71-87.
Kaltenböck, Gunther. 2008. “Prosody and function of English comment clauses”. Folia
Linguistica 42 (1), 83-134.
Kaltenböck, Gunther. 2009a. “Initial I think: main or comment clause?”. Discourse and
Interaction 2 (1), 49-70.
Kaltenböck, Gunther. 2009b. “English comment clauses: position, prosody, and scope”.
Arbeiten aus Anglistik und Amerikanistik 34 (1), 49-75.
Kaltenböck, Gunther. 2009 forthc. “Pragmatic functions of parenthetical I think”. In
Kaltenböck, Gunther; Mihatsch, Wiltrud; Schneider, Stefan. (eds.). New approaches to
hedging. Amsterdam: Elsevier.
Kärkkäinen, Elise. 2003. Epistemic stance in English conversation. Amsterdam: Benjamins.
Kärkkäinen, Elise. 2009 forthc. “Position and scope of epistemic phrases in planned and
unplanned American English”. In Kaltenböck, Gunther; Mihatsch, Wiltrud; Schneider,
Stefan. (eds.). New approaches to hedging. Amsterdam: Elsevier.
Kearns, Kate. 2007. “Epistemic verbs and zero complementizer”. English Language and
Linguistics 11 (3), 475-505.
Knowles, John. 1980. “The tag as a parenthetical”. Studies in Language 4, 370-409.
Kruisinga, E. 1932. A handbook of present-day English. Part II. Groningen: Noordhoff.
Langacker, Ronald W. 1991. Foundations of cognitive grammar. Vol II: Descriptive
applications. Stanford: Stanford University Press.
Lepore, E.; Loewer, B. 1989. “You can say that again”. In French, P. et al. (eds.). Midwest
studies in philosophy. Vol XIV. Notre Dame, IN: University of Notre Dame Press, 338-56.
Linell, Per. 1998. Approaching dialogue: talk, interaction and contexts in dialogical
perspectives. Amsterdam: Benjamins.
Mackenzie, J. Lachlan. 1984. “Communicative functions of subordination”. In Mackenzie,
J.L.; Wekker, H. (eds.). English language research: the Dutch contribution I. Amsterdam:
Free University Press, 67-84.
Mindt, Ilka. 2003. “Is I think a discourse marker?”. In Mengel, Ewald et al. (eds.)
Proceedings Anglistentag 2002 Bayreuth. Trier: WVT, 473-483.
Nelson, Gerald; Wallis, Sean; Aarts, Bas. 2002. Exploring natural language. Working with
the British Component of the International Corpus of English. Amsterdam, Philadelphia:
Benjamins.
Nuyts, Jan. 2000. “Tensions between discourse structure and conceptual semantics: the
syntax of epistemic modal expressions”. Studies in Language 23 (1), 103-135.
Peterson, Peter. 1999. “On the boundaries of syntax: non-syntagmatic relations”. In
Collins, Peter; Lee, David (eds.). The clause in English. Amsterdam, Philadelphia:
Benjamins, 229-250.
Pomerantz, Anita; Fehr, B. J. 1997. “Conversation analysis: an approach to the study of
social action as sense making practices”. In Dijk, Teun A. van (ed.). Discourse as social
interaction. London: Sage, 65-91.
Quirk, Randolph; Greenbaum, Sidney; Leech, Geoffrey; Svartvik, Jan. 1985. A
Comprehensive grammar of the English language. Harlow: Longman.
Rohdenburg, Günther. 1998. “Clausal complementation and cognitive complexity in
English”. In Neumann, Fritz-Wilhelm; Schülting, Sabine (eds.). Anglistentag 1998 Erfurt
Proceedings. Trier: Wissenschaftlicher Verlag, 101-111.
Ross, John R. 1973. “Slifting”. In Gross, Maurice; Halle, Morris; Schützenberger, Marcel-Paul
(eds.). The formal analysis of natural languages. The Hague: Mouton, 133-169.
Sadock, Jerrold M. 1984. “The pragmatics of subordination”. In Geest, Wim de; Putseys,
Y. (eds.). Sentential complementation. Dordrecht: Foris, 205-213.
Schegloff, Emanuel. 1990. “On the organization of sequences as a source of ‘coherence’ in
talk-in-interaction”. In Dorval, B. (ed.). Conversational organization and its development.
Norwood, NJ: Ablex, 51-77.
Simon-Vandenbergen, Anne-Marie. 2000. “The functions of I think in political discourse”.
International Journal of Applied Linguistics 10 (1), 41-63.
Stenström, Anna-Brita. 1994. An introduction to spoken interaction. London: Longman.
Stenström, Anna-Brita. 1995. “Some remarks on comment clauses”. In Aarts, Bas; Meyer,
Charles F. (eds.). The verb in contemporary English. Cambridge: CUP, 290-299.
Svensson, Jan. 1976. “Reportindicators and other parentheticals”. In Karlsson, F. (ed.).
Papers from the Third Scandinavian Conference of Linguistics. Turku: Textlinguistics
Research Group, Academy of Finland, 369-380.
Tagliamonte, Sali; Smith, Jennifer. 2005. “No momentary fancy! The zero
‘complementizer’ in English dialects”. English Language and Linguistics 9 (2), 289-309.
Thompson, Sandra A. 2002. “‘Object complements’ and conversation. Towards a realistic
account”. Studies in Language 26 (1), 125-164.
Thompson, Sandra A.; Mulac, Anthony. 1991. “The discourse conditions for the use of the
complementizer that in conversational English”. Journal of Pragmatics 15, 237-251.
Tomlin, Russell. 1985. “Foreground-background information and the syntax of
subordination”. Text 5, 85-122.
Traugott, Elizabeth Closs. 1995. “Subjectification in grammaticalisation”. In Stein, Dieter;
Wright, Susan (eds.). Subjectivity and subjectivisation. Cambridge: CUP, 31-54.
Urmson, J. O. 1952. “Parenthetical verbs”. Mind 61, 480-496.
Van Bogaert, Julie. 2006. “I guess, I suppose and I believe as pragmatic markers:
grammaticalization and functions”. Belgian Journal of English Language and Literatures
4, 129-149.
Wells, John C. 2006. English intonation: an introduction. Cambridge: CUP.
Wichmann, Anne. 2000. Intonation in text and discourse. Beginnings, middles and ends.
Harlow: Longman.
Wichmann, Anne. 2001. “Spoken parentheticals”. In Aijmer, K. (ed.). A wealth of English.
Studies in honour of Göran Kjellmer. Göteborg: Acta Universitatis Gothoburgiensis, 177-193.
1. Introduction
The notion of intelligibility is a highly complex one, thought to involve a
great number of factors. This holds particularly true for situations
in which the language of communication is a non-native language for all
participants, in other words a lingua franca (Seidlhofer 2001: 146). English
has come to be the world’s global language, with an estimated 300-400 million
second language users and approximately 500-700 million foreign language
users (Crystal 2000: 10). The number of speakers who use English as a
second or foreign language therefore clearly exceeds the estimated 350-450
million first language users (Crystal 2000: 9). Moreover, it is assumed that
approximately 80% of all communication occurs in the absence of native
speakers (cf. Carter 1998). For this reason, the question of what hinders or
promotes intelligibility in such communicative situations is a crucial one.
One ground-breaking approach to this issue was Jenkins’ (2000) empirical
study of communication breakdown in naturally occurring conversations
between non-native speakers (NNS) of English. Furthermore, a considerable
number of studies exploring factors involved in intelligibility have been
carried out in psycholinguistics and also acoustic-phonetics (henceforth
referred to as intelligibility studies). Naturally, these approaches have differed
considerably from Jenkins’ approach, not only with regard to methodology
but also to their various underlying assumptions (cf. 3.3.). Moreover,
intelligibility studies have rarely concerned themselves with the intelligibility
of NNS to non-native listeners (NNL)1 and, to the author’s knowledge, no
extensive psycholinguistic studies on the role of segmentals for intelligibility
in ELF have been carried out.
2. Defining intelligibility
2 This paper is based on the author’s MA thesis Aspiration, [θ]/[ð] und /r/ in Englisch als Lingua Franca –
eine psycholinguistische Studie zu drei Vorschlägen des Lingua Franca Core, written at the Department
of General and Applied Linguistics under the supervision of Prof. Dr. Wolfgang U. Dressler.
Derwing & Munro (1997), Munro & Derwing (1995a), Munro, Derwing &
Morton (2006) also clearly differentiate intelligibility and comprehensibility
from accentedness, which is defined as “the degree to which the pronunciation
of an utterance sounds different from an expected production pattern” (Munro,
Derwing & Morton 2006: 112). These three terms are seen as “related but
partially independent dimensions” (Derwing & Munro 1997: 2). It was found
that “although some features of accent may be highly salient, they do not
necessarily interfere with intelligibility” (Derwing & Munro 1997: 11). Their
results showed that accent sometimes had a negative effect on intelligibility but
that this effect did not correlate with the degree of accent and that even strong
accents did not necessarily result in poor intelligibility (cf. Munro & Derwing
1995a: 301, 1995b: 74; Munro 1998: 139ff). It therefore seems necessary to
differentiate between these three terms, especially when applied to an ELF
context, as in the pilot study in this paper.
3 Cf. also Niedzielski & Preston (2000) for the influence of language attitudes on perception.
4 For an extensive literature review cf. Osimk (2007: 33-44).
to which aspects to prioritise in pronunciation teaching. Regarding the pilot
study, a high consensus for listeners with similar language backgrounds and
experiences would indicate a stronger influence of listener-related factors.
However, if listener groups – regardless of their language experience –
concurred as to which language properties are easy to understand, this would
point to a greater importance of item-related factors (Hazan & Markham
2004: 3109; Munro, Derwing & Morton 2006: 114).
The importance of item-related factors over listener-related factors has
been suggested by a number of studies. In an early study, Flege (1988)
noticed that listeners made similar judgements concerning accentedness.
Similarly, Smith & Bisazza (1982) observed that the listeners agreed as to
which speakers were most difficult to understand and reported that “89% of
the subjects responded that the Indian speaker was the most difficult to
comprehend” (1982: 267). Later studies gave similar findings: Munro,
Derwing & Morton, for example, showed that two listener groups, English
native listeners (ENL) and NNL with different first languages, largely agreed
on “which of the 48 speakers were the easiest and most difficult to
understand; between-group effect sizes were generally small” (2006: 111).
They conclude that the listeners' listening experience contributed less to
understanding than item-related factors (2006: 125). Hazan and Markham
(2004) describe the agreement between NL children and adults as to which of
the NS were easiest or most difficult to understand as a "striking fact"
(2004: 3112). Finally, Major et al.'s (2002) findings show a strong tendency
for listeners to agree on which accents were relatively intelligible or
unintelligible.
5 Hirschfeld's and Jenkins' findings were similar with regard to the importance of aspiration, vowel quantity
over quality and the problematic nature of alternating pronunciation of the central vowel (cf. Osimk 2007
for a more detailed discussion of the two investigations).
conducted intelligibility studies.6 Most importantly, Jenkins considers ELF as
a phenomenon in its own right, and not merely as an inferior variant of ENL.
Even though the LFC is based on L1-varieties, Jenkins does not contrast ENL
and ELF in a judgemental way and views non-native utterances, if intelligible,
as “perfectly acceptable instances of L2 sociolinguistic variation” (Seidlhofer
2004: 217). Contrary to a large number of other intelligibility studies, "a genuine
difference perspective” as opposed to “a deficit [...] perspective” (Seidlhofer
2004: 217) is assumed, on the basis of demonstrable irrelevance of certain
features.
The deficit perspective addressed by Seidlhofer (2004) is apparent in the
majority of experimental intelligibility studies with NNL. It manifests itself in
four underlying assumptions, which are problematic regarding intelligibility
from the position of the NNL and especially for ELF communication. These
four assumptions are: that NS are better suited than NNS to judge which
factors influence intelligibility; that approximation to NS norms is the
primary goal of pronunciation teaching; that accentedness equals poor
intelligibility; and that NNS communication is, per se, less successful than
"purely 'native' speech communication" (van Wijngaarden, Steeneken &
Houtgast 2002: 1906).7 However, in the light of the expansion of English as a
global language these assumptions are in urgent need of reconsideration and
re-evaluation, especially as the growing body of ELF-research shows that
NNS do not, by any means, communicate unsuccessfully but rather highly
effectively in ELF situations (e.g. Firth 1996, Meierkord 1996).
Unfortunately, a detailed, critical discussion of the application of these four
underlying assumptions is beyond the scope of this paper, but can be found in
Osimk (2007) or Rajadurai (2007).
4. Pilot study
This section introduces the pilot study, which was conducted to assess
intelligibility from a segmental point of view, examining the importance of
aspiration and of the different realisations of the interdental fricative
and /r/ for intelligibility.
8 The author is aware that standard language ideology is a highly complex and controversial issue. However,
for the purpose of this paper, the term standard was chosen to refer to Received Pronunciation (RP) and
General American (GA). This paper is directed, to a large degree, towards pedagogical applications and
RP and GA are the varieties most commonly taught in ESL/ESP (Jenkins 2003: 31). One of the aims of
the pilot study was to compare GA/RP standard pronunciations to alternating realisations to determine
which realisations were most intelligible. For this reason, the issues of standardisation and ownership of
English are not discussed in detail.
9 The SAA was used for this study with the kind permission of Steven H. Weinberger.
inventory. The large number of speakers and a search function for specific
realisations (e.g. non-aspiration) made the SAA a suitable tool for stimuli
selection for this pilot study. However, the shortness of the text, and thus the
limited number of test words available, are an unavoidable disadvantage for
the reliability of the study. The chosen realisations were segmented with the
programme STx10 and saved as mono files. In order to gain a larger number
of stimulus words, two sets of data were created, whereby the same
realisations occurred in both sets of data and only the speakers and words
varied.
4.3. Methods
The study was conducted in two parts using the dictation method. In test 1,
the stimulus words were tested in isolation, while test 2 assessed the same
stimuli in their phrasal co-text or with their syntactic constituents. To prevent
effects of familiarity, the listeners only participated either in test 1 or in test 2
of the study.
4.4. Participants
4.4.1. Speakers
Readings of the text by 13 speakers (8 male, 5 female, aged 18 to 66)
were chosen from the SAA. The distribution of first languages was 3 Spanish,
2 Italian, 4 French and 4 German (3 German, 1 Swiss German). For the pilot
study, the speakers were chosen according to their first languages and
according to how many of the tested features (aspiration and different
realisations of the interdental fricative and /r/) were produced while reading
the text. The distribution of countries of origin was highly diverse; for
example, the three Spanish speakers came from Venezuela, Nicaragua and
Spain. It cannot be excluded that these accent variations influenced the
intelligibility scores, e.g. through differences in the listeners'
experience with these accents.
10 The programme STx (Version 3.7.5) was provided by the Vienna Acoustics Research Institute and used
with their permission.
4.4.2. Listeners
In total, 56 listeners aged 19-31 (23 male, 33 female) with the L1s French,
German, Italian and Spanish participated in the study. Most of the participants
were exchange students at the University of Vienna at the time the study was
conducted, and therefore, their level of education can be assumed to be fairly
similar. All but one participant (sp13) came from European countries, i.e.
depending on L1, from Spain, Italy, France or Austria. The participants
indicated that they had grown up monolingually, with the exception of the 5
Spanish-speaking participants (4 bilingual in Spanish/Catalan, 1 in
Spanish/Galician) and one participant who was bilingual in
French/Hungarian.
From the years of language instruction and experience which the listeners
had indicated in the questionnaire, all listeners can be regarded as advanced
learners of English. The exchange students were considered suitable for this
study, as they were assumed to be regular ELF users in their daily lives (and
this was also confirmed by the questionnaire answers). At the same time,
however, the participants represent a specific target group with similar
ages and educational backgrounds, and may not be representative of other
segments of the population.
4.5. Procedure
The participants listened to the words in isolation (test 1) or in their syntactic
and phrasal co-text (test 2) and were asked to transliterate what they
understood. If needed, it was possible for the participants to listen to the
stimuli a second time. Furthermore, the listeners were requested to note down
any comment they wished to add. For every L1, 8 listeners participated in
test 1 (total n = 32). The order in which the stimuli were played varied.
For test 1, the listeners heard 38 words, distributed as follows: 7 words
for aspiration, 9 words with variations of the interdental fricative and 11
words with variations of /r/. Additionally, 10 'dummy' words, which
contained none of the tested features, were played in between to minimise an
effect of familiarity with regard to the tested features.
In test 2, the words from test 1 were examined in context. As the main
focus was on investigating the intelligibility of the words in isolation, only 24
listeners were tested, 6 for each of the 4 first languages. For each feature, the
text was divided into 12 parts (cf. 8.2.), leaving gaps for the words containing
this feature. Each feature was tested separately and the listeners were only
tested on one of the features to avoid familiarity effects. In testing the
interdental fricative, for example, the listeners heard phrase (1) and were
asked to fill in the missing words (snack – dummy word and brother –
interdental fricative).
4.6. Aspiration
11 VOT can be seen as an indicator for aspiration rather than voicing, as there are languages with voiceless,
unaspirated plosives, e.g. Austrian German, where lenis plosives are not voiced but have a shorter VOT
than fortis plosives. Moreover, VOT is only an indicator for languages with aspirated, voiced plosives
(e.g. Hindi), which have a long VOT but also vocal fold vibration (comment by Sylvia Moosmüller,
personal communication, 2007).
differences in the context in which the words were tested and the tested
variety of English, as well as the fact that the border between voiced and
voiceless plosives is assumed to be a continuum. For the purposes of this
study, it was useful to consider the variability of the VOT-measures and to
divide the plosives into 0-20ms (unaspirated) and 40-60ms (aspirated) and to
assume a transition area between 20-40ms.
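Schematically, this banding amounts to a simple threshold rule. The sketch below is illustrative only; the function name and the handling of values falling exactly on the 20ms and 40ms boundaries are assumptions of this illustration, not part of the study:

```python
def vot_band(vot_ms):
    """Classify a fortis plosive by its voice onset time (VOT) in ms,
    following the bands assumed in this study: 0-20ms unaspirated,
    20-40ms transition area, 40-60ms aspirated. Treating 20ms as
    unaspirated and 40ms as aspirated is an assumption made here
    purely for illustration."""
    if vot_ms <= 20:
        return "unaspirated"
    if vot_ms < 40:
        return "transition"
    return "aspirated"
```

For instance, a plosive measured at 15ms would fall into the unaspirated band, and one at 50ms into the aspirated band.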
The listeners' first languages differ in the way they subcategorise their
plosives. Germanic languages, such as German, contrast aspirated with
unaspirated voiceless plosives; in both cases there is no vocal fold
vibration during the closure. Romance languages (French, Italian and Spanish
in this study), however, differentiate between voiceless-unaspirated and
voiced plosives (Ladefoged 2005: 137). While the English velar plosive /k/ has a
VOT of about 50-60ms, the VOT for the same sound in Spanish is only about
20ms. It could, therefore, be expected that listeners with a Romance L1 will
recognise fortis plosives with a shorter VOT more easily than listeners with
the L1 German.
For the study, VOT was measured with spectrograms and waveforms,
using the programme STx. The measurement was taken from the plosive
release until the start of the vocal fold vibration of the following vowel, the
first positive zero-crossing. The eleven plosives were first analysed according
to recognised words and then according to recognised feature. The feature /k/,
for example, was counted as recognised when the plosive was identified by
the listener, e.g. when call instead of the target word car was transliterated.
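The two-level scoring described above (recognised word vs. recognised feature) can be sketched as follows. This is a simplified illustration rather than the study's actual analysis procedure; the function name, its arguments and the example words are hypothetical:

```python
def score_responses(target, feature_bearing, responses):
    """Score dictation responses at two levels: word level (the exact
    target word was transliterated) and feature level (a word containing
    the tested feature was transliterated, e.g. 'call' for the target
    'car' still counts as recognition of /k/). `feature_bearing` is a
    hypothetical set of transcriptions that count as containing the
    tested feature; it must include the target itself."""
    n = len(responses)
    words = sum(1 for r in responses if r == target)
    features = sum(1 for r in responses if r in feature_bearing)
    return {"word_pct": 100.0 * words / n, "feature_pct": 100.0 * features / n}

# Hypothetical example for the target 'car' and the tested feature /k/:
scores = score_responses("car", {"car", "call", "card"},
                         ["car", "call", "cat", "car"])
```

With the hypothetical responses above, the target word car is recognised in 2 of 4 responses (50.0%), while the feature /k/ is identified in 3 of 4 (75.0%).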
4.6.2. Results
The analysis shows a clear tendency: words containing fortis plosives
with a longer VOT were recognised more often than those with a shorter
VOT. Words with a VOT of 0-20ms were recognised 5.7 times on average
(n=16), words with a VOT of 40-60ms 12.7 times (p=0.05) (cf. Figure 1).
Despite the margin of error involved, a clear enhancement in intelligibility for
plosives with a longer VOT could be shown.
No advantage was observed when the fortis plosives were realised with the
VOT which is standard for the listeners' L1. The plosives with a shorter VOT
were not recognised any more often by the Spanish, Italian and French
listener groups than by the listeners with L1 German, where the boundary
between voiceless and voiced plosives is similar to that of English (cf.
Ladefoged 2005: 137).
4.7.2. Results
For both the lenis and the fortis interdental fricative, the lowest
intelligibility scores were attained when they were substituted by /s/ and
/z/. This was observed both for the percentage of recognised words and for
the percentage of recognised features (cf. Table 1).
Variant         Word(s)                   % correctly        % correctly       Total number
                                          recognised words   recognised F      of words
[ð] as /s z/    these                     0.0%               18.8%             16
[ð] as /t d/    the, brother, these (2)   34.8%              60.7%             112
[ð] as /f v/    with (2)                  43.8%              43.8%             32

Table 1: Percentages of correctly recognised words and features (F) of [ð] and [θ]. (2) = this
word was produced by two speakers, spoken with the same variant. Total number of words
= the total number of times words containing this variant were listened to.
For all listener groups, [θ] was understood well in all realisations except as an
alveolar fricative (cf. Table 2). [ð] realised as a labiodental fricative reached
slightly lower intelligibility scores with the Spanish listeners than with the
other listener groups. Apart from the French listeners, 50% of whom
recognised /z/ as [ð] in these, all listener groups reached low intelligibility
scores for this realisation. [θ] realised as /s z/ in the words things and three
was better understood by the French and German-speaking listener groups
than by the Spanish and Italian listener groups. In addition, the
intelligibility scores for these (with [ð] realised as /t/) were low for all
listener groups. The reason for this may lie in the fact that the dental
fricative was pronounced fortis rather than lenis, as is common in standard
English.
Feature        No. of words with    Word(s)         Spanish   Italian   French   German
               this realisation
[θ] as /s z/   3                    things, three   25.0%     41.7%     50.0%    75.0%
[ð] as /z/     1                    these           25.0%     0.0%      50.0%    0.0%
[θ] as [θ]     4                    thick, things   75.0%     75.0%     50.0%    75.0%

Table 2: Percentage of recognised features [θ] and [ð] of the test words according to
listener groups.
4.8.2. Results
The total number of recognised words showed a marked difference between
rhotic and non-rhotic realisation, whereby non-rhotic /r/ produced higher
intelligibility scores (rhotic 11.5%, n=5 vs. non-rhotic 58.3%, n=2). This
advantage held true for all listener groups (see Table 3). Compared to the total
score of recognised feature /r/, however, this was not the case (rhotic 64.6%
vs. non-rhotic 70.8%). A difference was only observed for the L1 German
listener group. The recognition of the feature /r/ for this group was 53.1% for
rhotic realisation and 83.3% for non-rhotic realisation. For all other listener
groups, the difference in recognition of the two realisations was less than
10% (cf. Table 4).
Table 3: Comparison of recognised words with rhotic and non-rhotic variants, according to
listeners' L1.
Table 4: Comparison of recognised feature with rhotic and non-rhotic variants, according
to listeners' L1.
For all listeners taken together, no significant difference was observed for the
realisation of /r/ as an alveolar, uvular or standard variant. This held true both
for the number of correctly recognised words, as well as the recognition of the
feature /r/. There was, however, a visible tendency for the uvular
realisation to receive slightly lower intelligibility scores than the
alveolar and standard pronunciations. For recognition of the feature /r/, there
was only an insignificant difference between alveolar (70.8%) and standard
pronunciation (76.3%). For the uvular realisation, feature /r/ was recognised
only 51.1% of the time.
Figure 2: Comparison of correctly recognised words and feature /r/ for all listener groups,
according to place of articulation. /r/ uvular: words 32.8%, feature 51.6%; /r/ apical trill:
words 37.5%, feature 70.8%; /r/ norm: words 42.5%, feature 76.3%.
An advantage was observed for all groups except the German-speaking group
for the realisations which occur commonly in the listeners' L1s: alveolar
/r/ is common in Spanish and Italian, while the common realisation in French
and German is uvular /r/. The Spanish-speaking listeners attained higher
intelligibility scores for an alveolar /r/ produced by Spanish speakers than
for one produced by Italian speakers; the same held true, in reverse, for
the Italian listeners. Equally, the French listeners benefited slightly when
uvular /r/ was produced by French speakers compared to when it was produced
by German speakers. Only the German-speaking group did not benefit when
hearing uvular /r/ produced by other German speakers; they attained higher
scores for uvular /r/ produced by the French speakers (cf. Table 5). This
advantage was, however, merely a tendency and would need to be investigated
further in order to draw more meaningful conclusions.
Feature recognised            Spanish     Italian     French      German
according to L1               listeners   listeners   listeners   listeners
alv. /r/, Spanish speakers    75.0%       65.0%       –           –
alv. /r/, Italian speakers    66.7%       75.0%       –           –
uvl. /r/, French speakers     –           –           75.0%       45.0%
uvl. /r/, German speakers     –           –           68.8%       37.5%
5. Discussion
The analysis above shows that the results regarding aspiration and realisations
of the interdental fricative conform to Jenkins' (2000) observations. There
was a strong tendency for plosives with a longer VOT (40-60ms) to be
recognised more easily than those with a shorter VOT (0-20ms). Regarding
[θ]/[ð], the realisations /s z/ gained lower scores for both the lenis and
the fortis variant. Additionally, the L1-German and L1-French groups
displayed a slight advantage in recognising the interdental fricative when
it was realised as an alveolar fricative. This might indicate an advantage
of familiarity, since EFL learners with German or French as an L1 commonly
substitute the interdental fricative with an alveolar fricative.
Regarding different realisations of /r/, the results differ from Jenkins'
assumption that rhotic pronunciation aids intelligibility more than non-rhotic.
All listener groups recognised the words which contained a non-rhotic
realisation of /r/ more often than those which contained a rhotic pronunciation
of /r/. Additionally, it needs to be mentioned that all words with non-rhotic
realisation were pronounced with standard pronunciation. Therefore, it is not
clear if the results point towards a high intelligibility of non-rhotic /r/, as
standard pronunciation was generally highly intelligible. Moreover, the
considerable difference in preference for non-rhotic pronunciation was,
apart from the German-speaking group, only visible for the correct
recognition of the words, not the feature. Moreover, there was a tendency
for the uvular realisation of /r/ to reach lower intelligibility rates with
all listener groups but the French. This could indicate that uvular /r/ is
less intelligible than other realisations of /r/. These results regarding
rhotic and non-rhotic variants as well as different places of articulation
of /r/ add to Jenkins' findings and point towards an interesting tendency
worth further investigation.
Regarding the relevance of listener- and item-related aspects, the results
indicate that both could influence intelligibility, although the effect of
listener-related aspects seemed rather inconsistent. For listener-related
aspects such as previous language experience, no correlation was found
between the overall familiarity with other accents which listeners had
indicated in the questionnaires and the intelligibility scores. The
German-speaking group, who were not exchange students, had stated less
experience with other accents, but this had no visible effect on the
intelligibility scores (cf. Table 6). However, the familiarity with accents
could not be measured objectively and was subject to the participants' own
estimations.
For the correctly identified features in the words, effects of both
item-related and listener-related aspects were observed. All listener groups
had low intelligibility scores for the interdental fricative produced as
/s z/ and, apart from the French, for the uvular realisation of /r/. This
poor decoding of particular realisations, regardless of the listeners' L1,
points towards the importance of item-related factors for intelligibility.
It would appear that familiarity with particular L1-based substitutions
sometimes, but not consistently, eases intelligibility. In the case of
uvular /r/, which is common in French and German, the French-speaking group
apparently benefited from familiarity with this realisation, whereas for the
German-speaking group it was not advantageous. These results are in line
with the studies of Smith & Bisazza (1982), Major et al. (2002) and Munro,
Derwing & Morton (2006), in which it was observed that listening to one's
own accent was not consistently beneficial for all tested listener groups.
As to the effect of phrasal and syntactic co-text on intelligibility, the
results showed a considerably higher intelligibility for words in their co-text
than in isolation. This implies that the listeners can benefit from the syntactic
context given. This is not in line with e.g. Bond, Moore & Gable (1996) and
Jenkins (2000) who propose that L2 listeners are primarily dependent on the
acoustic signal and do not benefit much from knowledge of the context. At
least for phrasal and syntactic co-text, this cannot be confirmed by this
study, as the listeners often adapted their interpretations of the stimulus
words to their environment. This indicates that co-text might play a major
role for NNL in the interpretation of utterances.
Finally, it was found that standard pronunciation was relatively well
understood for all three tested aspects, i.e. plosives with common VOT-length
for English, the interdental fricative realised as [ð]/[θ] and /r/ realised as
alveolar or retroflex approximant. It is important to add that most other
realisations (apart from the ones mentioned) did not impair intelligibility to a
large degree. However, the experience of the listeners with NS language and
context might have shaped these results. Bearing in mind the limitations of
this dataset, two suggestions may be made for the teaching of English
phonetics and phonology for an ELF-speaking target group. Firstly, for the
aspects tested in this study, the standard pronunciation, as it is widely
taught, is a variety of English which is intelligible to speakers of
different first languages. Secondly, other variants of [ð]/[θ] and /r/ can
be tolerated and possibly even taught, especially if this eases teachability
and learnability.
6. Conclusion
Although no definite conclusions can be drawn, due to the framework of this
study with its aforementioned limitations, some clear tendencies have been
illustrated. Firstly, Jenkins' observations about two of the three features,
namely aspiration and different realisations of the interdental fricative,
could be confirmed with regard to mutual intelligibility in ELF when tested
with the dictation method. For the third aspect, realisations of /r/, some
tendencies could be shown which might be interesting for future research. In
order to draw further conclusions for ELF research and language teaching,
the research would need to be extended to a larger dataset, a larger number
of participants with a greater variety of first languages, and elements on
the segmental and suprasegmental levels which were not considered in this
study. Additionally, a comprehensive investigation that sub-categorises the
phonemes according to syllable position, and the realisations of /r/
according to manner of articulation, would be necessary. This could also be
extended to investigate the role of aspects such as vowel quality and
quantity.
As ELF plays an increasingly important role around the world, there is a
need for a greater number of studies on intelligibility in a NNS-NNL context.
Such studies would also contribute substantially to answering other
psycholinguistic questions, such as the differences between L1 and L2 speech
perception, e.g. the degree to which top-down and bottom-up processes are
involved in each.
Appendix
Text
Please call Stella. Ask her to bring these things with her from the store: Six
spoons of fresh snow peas, five thick slabs of blue cheese, and maybe a snack
for her brother Bob. We also need a small plastic snake and a big toy frog for
the kids. She can scoop these things into three red bags, and we will go meet
her Wednesday at the train station.
References
Anderson-Hsieh, Janet; Johnson, Ruth; Koehler, Kenneth. 1992. “The relationship between
native speaker judgements of nonnative pronunciation and deviance in segmentals,
prosody and syllable structure”. Language Learning 42, 529-555.
Bent, Tessa; Bradlow, Ann. 2003. “The interlanguage speech intelligibility benefit”.
Journal of the Acoustical Society of America 114, 1600-1610.
Bond, Z. S.; Moore, Thomas J.; Gable, Beverly. 1996. “Listening in a second language”.
Proceedings of the fourth international conference on spoken language 4, 2510-2513.
www.asel.udel.edu/icslp/cdrom/vol4/038/a038.pdf (26 April 2007).
Bradlow, Ann R.; Pisoni, David M. 1999. “Recognition of spoken words by native and
non-native listeners: Talker-, listener-, and item-related factors”. Journal of the Acoustical
Society of America 106, 2074-2085.
Brodkey, Dean. 1972. “Dictation as a measure of mutual intelligibility: A pilot study”.
Language Learning 22, 203-220.
Burda, Angela N.; Scherz, Julie A.; Hagemann, Carlin F.; Edwards, Harold T. 2003. “Age
and understanding of speakers with Spanish or Taiwanese accents”. Perceptual and Motor
Skills 97, 11-20.
Carter, Ronald. 1998. “Orders of reality: CANCODE, communication, and culture”. ELT
Journal 52, 43-56.
Cho, Taehong; Ladefoged, Peter. 1999. “Variations and universals in VOT: evidence from
18 languages”. Journal of Phonetics 27, 207-229.
Crystal, David. 2000. “The future of English”. In Lynch, Donal; Pilbeam, Adrian (eds.).
Heritage and Progress. Proceedings of the SIETAR Europa Congress 1998. Bath: LTS
Training and Consulting, 6-16.
Derwing, Tracey M.; Munro, Murray J. 1997. “Accent, intelligibility and comprehensibility
– evidence from four L1s”. Studies in Second Language Acquisition 19, 1-16.
Docherty, Gerald J. 1992. The timing of voicing in British English obstruents. (Netherlands
Phonetic Archives). Berlin/New York: Foris Publications.
Eisenstein, Miriam; Verdi, Gail. 1985. “The intelligibility of social dialects for working class
adult learners of English”. Language Learning 35, 287-298.
Fayer, Joan M.; Krasinski, Emily. 1987. “Native and non-native judgments of
intelligibility and irritation”. Language Learning 37, 313-326.
Field, John. 2005. “Intelligibility and the listener: The role of lexical stress”. TESOL
Quarterly 39, 399-423.
Firth, Alan. 1996. “The discursive accomplishment of normality. On ‘lingua franca’
English and conversation analysis”. Journal of Pragmatics 26, 237-259.
Flege, James E. 1988. “Factors affecting degree of perceived foreign accent in English
sentences”. Journal of the Acoustical Society of America 84, 70–79.
Florentine, Mary. 1985. “Non-native listeners' perception of American English in noise”.
Proceedings of Inter-noise 85, 1021-1024.
Foulkes, Paul; Docherty, Gerry. 2001. “Variation and change in British English /r/”. In
Van de Velde, Hans; van Hout, Roeland (eds.). ‘r-atics: Sociolinguistic, phonetic and
phonological characteristics of /r/. (Etudes & Travaux 4). Brussels: Free University of
Brussels, 27-44.
Hazan, Valerie L.; Markham, Duncan. 2004. “Acoustic-phonetic correlates of talker
intelligibility for adults and children”. Journal of the Acoustical Society of America 116,
3108-3118. http://scitation.aip.org/getpdf/servlet/GetPDFServlet?filetype=pdf&id=JASMAN000116000005003108000001&idtype=cvips&prog=normal (18 May 2007).
Hirschfeld, Ursula. 1994. Untersuchungen zur phonetischen Verständlichkeit
Deutschlernender. Frankfurt am Main: Forum Phoneticum.
Jenkins, Jennifer. 2000. The Phonology of English as an international language: new
models, new norms, new goals. Oxford: Oxford University Press.
Jenkins, Jennifer. 2002. “A sociolinguistically based, empirically researched pronunciation
syllabus for English as an international language”. Applied Linguistics 23, 83-103.
Jenkins, Jennifer. 2003. World Englishes. A research book for students. London: Routledge.
Khattab, Ghada. 2000. “VOT Production in English and Arabic bilingual and monolingual
children”. Leeds working papers in linguistics 8, 95-122.
http://www.leeds.ac.uk/linguistics/WPL/WP2000/Khattab.pdf (25 May 2009).
Ladefoged, Peter; Maddieson, Ian. 1996. The sounds of the world's languages. Oxford:
Blackwell.
Ladefoged, Peter. 2005. Vowels and consonants: an introduction to the sounds of
languages. (2nd edition). Malden: Blackwell.
Magen, Harriet S. 1998. “The perception of foreign-accented speech”. Journal of
Phonetics 26, 381-400.
Major, Roy C.; Fitzmaurice, Susan F.; Bunta, Ferenc; Balasubramanian, Chandrika.
2002. “The effects of nonnative accents on listening comprehension: implications for ESL
assessment”. TESOL Quarterly 36, 173-190.
Meierkord, Christiane. 1996. Englisch als Medium der interkulturellen Kommunikation.
Untersuchungen zum non-native/non-native speaker Diskurs. Frankfurt am Main: Lang.
Munro, Murray J. 1998. “The effects of noise on the intelligibility of foreign-accented
Speech”. Studies in Second Language Acquisition 20, 139-154.
Munro, Murray J.; Derwing, Tracey M. 1995a. “Foreign accent, comprehensibility, and
intelligibility in the speech of second language learners”. Language Learning 45, 73-97.
Munro, Murray J.; Derwing, Tracey M. 1995b. “Processing time, accent, and
comprehensibility in the perception of native and foreign-accented speech”. Language and
Speech 38, 289-306.
Munro, Murray J.; Derwing, Tracey M.; Morton, Susan L. 2006. “The mutual intelligibility
of L2 speech”. Studies in Second Language Acquisition 28, 111-131.
Niedzielski, Nancy A.; Preston, Dennis R. 2000. Folk linguistics. Berlin [etc.]: Mouton de
Gruyter.
Osimk, Ruth. 2007. Aspiration, [θ]/[ð] und /r/ in Englisch als Lingua Franca – eine
psycholinguistische Studie zu drei Vorschlägen des Lingua Franca Core. Unpublished MA
thesis, University of Vienna.
Rajadurai, Joanne. 2007. "Intelligibility studies: a consideration of empirical and
ideological issues". World Englishes 26, 87-98.
Seidlhofer, Barbara. 2001. “Closing a conceptual gap: the case for a description of English
as a lingua franca”. International Journal of Applied Linguistics 11, 133-158.
Seidlhofer, Barbara. 2004. "Research perspectives on teaching English as a lingua
franca". Annual Review of Applied Linguistics 24, 209-239.
Smith, Larry E.; Bisazza, John A. 1982. “The comprehensibility of three varieties of
English for college students in seven countries”. Language Learning 32, 259-269.
van Wijngaarden, Sander J.; Steeneken, Herman J. M.; Houtgast, Tammo. 2002.
"Quantifying the intelligibility of speech in noise for non-native listeners". Journal of the
Acoustical Society of America 111, 1906-1916.
Widdowson, Henry G. 2004. Text, context, pretext: critical issues in discourse analysis.
Oxford: Blackwell.
Yao, Yao. 2007. "Closure duration and VOT of word-initial voiceless plosives in English
in spontaneous speech". UC Berkeley Phonology Lab Annual Report, 183-225.
Zielinski, Beth. 2004. "Measurement of speech intelligibility: what are we actually
measuring?" Paper presented at the annual meeting of the American Association for
Applied Linguistics, Portland, OR.
Online Resources:
Weinberger, Steven H. "Speech Accent Archive". http://accent.gmu.edu/ (2 January 2007).
How to contact us:
c/o
Institut für Anglistik & Amerikanistik der Universität Wien
Universitätscampus AAKH, Spitalgasse 2-4, Hof 8.3
A – 1090 Vienna; Austria
fax (intern.) 43 1 4277 9424
eMail [email protected]
[email protected]
W3 http://anglistik.univie.ac.at/views/
(all issues available online)
IMPRESSUM:
OWNER, EDITOR & PUBLISHER: VIEWS, C/O INSTITUT FÜR
ANGLISTIK & AMERIKANISTIK DER UNIVERSITÄT WIEN, UNIVERSITÄTSCAMPUS
AAKH, SPITALGASSE 2, A - 1090 WIEN, AUSTRIA. RESPONSIBLE FOR
CONTENT: THERESA-SUSANNA ILLES, MARIE-LUISE PITZL.
WEBMASTER: MONIKA WITTMANN. EDITORIAL TEAM: HEIKE BÖHRINGER,
ANGELIKA BREITENEDER, CHRISTIANE DALTON-PUFFER, OLGA FISCHER,
CORNELIA HÜLMBAUER, JULIA HÜTTNER, THERESA-SUSANNA ILLES, BRYAN
JENNER, GUNTHER KALTENBÖCK, THERESA KLIMPFINGER, URSULA LUTZKY,
BARBARA MEHLMAUER-LARCHER, MARIE-LUISE PITZL,
ANGELIKA RIEDER-BÜNEMANN, NIKOLAUS RITT, HERBERT SCHENDL,
BARBARA SCHIFTNER, BARBARA SEIDLHOFER, UTE SMIT, LOTTE SOMMERER,
BARBARA SOUKUP, JOHANN UNGER, H.G. WIDDOWSON.
ALL: C/O INSTITUT FÜR ANGLISTIK &
AMERIKANISTIK DER UNIVERSITÄT WIEN, UNIVERSITÄTSCAMPUS AAKH,
SPITALGASSE 2, A - 1090 WIEN. PRODUCTION: VIEWS