Topic and Focus
VOLUME 82
Managing Editors
GENNARO CHIERCHIA, University of Milan
KAI VON FINTEL, M.I.T., Cambridge
F. JEFFREY PELLETIER, Simon Fraser University
Editorial Board
JOHAN VAN BENTHEM, University of Amsterdam
GREGORY N. CARLSON, University of Rochester
DAVID DOWTY, Ohio State University, Columbus
GERALD GAZDAR, University of Sussex, Brighton
IRENE HEIM, M.I.T., Cambridge
EWAN KLEIN, University of Edinburgh
BILL LADUSAW, University of California, Santa Cruz
TERRENCE PARSONS, University of California, Irvine
The titles published in this series are listed at the end of this volume.
TOPIC AND FOCUS
CROSS-LINGUISTIC PERSPECTIVES ON MEANING
AND INTONATION
edited by
CHUNGMIN LEE
Seoul National University
Seoul, Republic of Korea
MATTHEW GORDON
University of California
Santa Barbara, CA, USA
and
DANIEL BÜRING
University of California
Los Angeles, CA, USA
A C.I.P. Catalogue record for this book is available from the Library of Congress.
Published by Springer,
P.O. Box 17, 3300 AA Dordrecht, The Netherlands.
www.springer.com
Preface
Gorka Elordieta
Constraints on Intonational Prominence of Focalized Constituents
Ardis Eschenberg
Polish Narrow Focus Constructions
David Gil
Intonation and Thematic Roles in Riau Indonesian
Matthew Gordon
The Intonational Realization of Contrastive Focus in Chickasaw
Carlos Gussenhoven
Types of Focus in English
Nancy Hedberg and Juan M. Sosa
The Prosody of Topic and Focus in Spontaneous English Dialogue
Emiel Krahmer and Marc Swerts
Perceiving Focus
Manfred Krifka
The Semantics of Questions and the Focusation of Answers
Chungmin Lee
Contrastive (Predicate) Topic, Intonation, and Scalar Meanings
Kimiko Nakanishi
Prosody and Scope Interpretations of the Topic Marker ‘wa’ in Japanese
Ho-Hsien Pan
Focus and Taiwanese Unchecked Tones
Elisabeth Selkirk
Bengali Intonation Revisited: An Optimality Theoretic Analysis in which FOCUS Stress Prominence Drives FOCUS Phrasing
Mark Steedman
Information-Structural Semantics for English Intonation
Klaus von Heusinger
Discourse Structure and Intonational Phrasing
PREFACE
these results, Gordon suggests that focus may be marked phonetically even in a
language in which focus has an overt morphological realization.
Carlos Gussenhoven’s work provides an overview of how various types of focus
are expressed syntactically and prosodically. Basing his classification on data from
several languages, Gussenhoven suggests that focus may differ along a number of
pragmatically conditioned dimensions. He finds that different categories of focus are
expressed through different intonational contours, with identificational focus
seeming to occupy a special status in its reliance on morphological as opposed to
prosodic cues.
Nancy Hedberg and Juan Sosa investigate the evidence for a prosodic distinction
between topic accents and focus accents in their paper. In an analysis of naturally
occurring English speech, they do not find any differences in pitch accent type
pointing to separate categories of topic and focus accent. On the other hand, they
find extensive marking of information structure categories with high pitch accents.
In their paper, Emiel Krahmer and Marc Swerts discuss a dialogue reconstructing
experiment designed to examine the role of pitch accents in perceiving focus in two
languages, Dutch and Italian, differing in the importance of pitch accents as a marker
of focus. Krahmer and Swerts find that Dutch listeners rely more on pitch accent
cues to reconstruct focus than Italian listeners, in keeping with the greater role of
pitch in signalling focus in Dutch. Results of an audiovisual experiment employing
talking heads suggest that visual cues can also play a role in the perception of focus,
though primarily when pitch cues are indecisive.
Manfred Krifka’s paper explores the proper semantic treatment of focus patterns
in response to constituent questions. He finds that neither the framework of
Alternative Semantics nor a theory that works with givenness rather than semantic
focus as a basic concept offers an adequate analysis of focus arising in answers to
questions. On the other hand, Krifka argues that the theory of Structured Meaning
provides a superior account of this type of focus.
In his paper, Chungmin Lee characterizes Contrastive Topic and Contrastive
Predicate Topic, particularly in connection with their ‘conventional’ scalar implicatures.
He distinguishes a typical kind that evokes a ‘conventional’ implicature from list
contrastive topics, which lack any implicature. The Contrastive Topic marker in
Korean gets a high tone responsible for focality, analogously to the fall-rise contour
in English. Lee’s paper explores the scalar meaning of type-subtype scalarity and
subtype, arguing for the inherent tendency of subtype scalarity even in entities. It
also explores scope relations between scope bearers and Contrastive Topic and CT’s
narrow-scope nature. The apparent non-narrow-scope of CT is claimed to be a
topicalization effect. Predicates are claimed to be inherently subtype-scalar when
CT-marked just like numerals and quantifiers. In conclusion, the uttered part is a
concessive admission with the intent of conveying a forceful implicature in the
unuttered part.
In her paper, Kimiko Nakanishi examines the prosodic and semantic properties
associated with the Japanese topic marker wa. She shows that the two pragmatic
functions of wa, as a marker of theme and contrast, are distinguished prosodically.
She further claims that the theme vs. contrast distinction is accounted for by an
Chungmin Lee
Matthew Gordon
Daniel Büring
January 2006
GORKA ELORDIETA
1. INTRODUCTION
2. BACKGROUND
It is well known that languages differ in the overt cues they use to make the hearer
identify clearly the focalized constituent. On the one hand, there are languages
which signal focalized elements intonationally, without overt syntactic or
morphological cues. These are languages of the so-called English type, in which
focalized elements receive main prosodic prominence in-situ, with no movement
from their base position.1
Other Germanic languages such as Dutch and German also use this English-type
strategy for signaling narrow focus. However, in some cases these languages may
also resort to syntactic movement operations to cue focus. When the verb is the
focus of the sentence and a definite object is used, scrambling of the object may take
place so that the verb is interpreted as narrow focus (Reinhart and Neeleman 1998).
The verb receives main prosodic prominence by virtue of being in clause-final
position.2
C. Lee et al. Topic and Focus: Cross-linguistic Perspectives on Meaning and Intonation, 1–22.
© 2007 Springer.
Some languages display two kinds of strategies for narrow focus manifestation:
one in which words are assigned main prominence in their base-generated syntactic
position (a strategy of the English-type), and another one in which syntactic
displacement operations are produced such that these words or constituents end up
occupying a syntactically specified position for narrow focus, by means of
scrambling or fronting, or some other means. Unlike Dutch or German, the latter
option is available for all constituents and is not subject to definiteness constraints,
and perhaps most importantly, in these languages focalized words which are
syntactically displaced are also assigned main prosodic prominence in the sentence
(cf. Bolinger 1954, Ladd 1980, Culicover and Rochemont 1983, Vallduví 1990,
Cinque 1993, Reinhart 1995, Selkirk 1995, Zubizarreta 1998, Frota 1998, among
others). Spanish and Italian constitute examples of this type of language. But in
Spanish focalized words can also occur in non-in-situ positions, such as sentence-
end position or also a fronted position, in both cases accompanied by main sentence
stress (cf. Bolinger 1972, Contreras 1978, 1980, Uriagereka 1995, Zubizarreta 1998
among others, for discussion of the different options).
Then, there are languages which signal focus morphologically, by the addition
of a suffix, a prefix or some other overt marker that indicates focalization. This
strategy can be combined with syntactic displacement, intonational marking, or a
combination of both. In Wolof, for instance, a so-called ‘emphatic marker’ inserted
before the verb indicates which constituent is being focalized, whether it is the
subject, a complement, or the verb (cf. Rialland and Robert 2001). Narrow focus is
also cued syntactically in Wolof, as focalized constituents have to appear in
sentence-initial position.3 It is important to point out that in this language no
intonational prominence or phrasing effects are manifested on the focalized element.
English Creoles could be similar to Wolof in this respect, as focus is marked
morphologically and also syntactically, by fronting the focalized constituent, and
prosodic marking may be absent (cf. Bickerton 1993).
In Japanese, morphological marking combines with prosodic marking to
indicate focus: prosodic prominence and phrasing effects make clear which
constituent(s) have to be interpreted as narrow focus, and the focus particle -ga
follows a focalized subject (cf. Pierrehumbert and Beckman 1988, Haraguchi 1991,
Kubozono 1993, among others).
In other languages, focus is both syntactically and intonationally identified.
That is, narrowly focalized elements not only receive main prosodic prominence
and/or are accompanied by intonational phrasing boundaries, but they also occupy a
syntactic position structurally defined for focalized expressions, be it Spec-CP,
Spec-FocusP, the most embedded position in the sentence, the position immediately
preceding the verb, or some other position. Hungarian, Turkish, Quechua, Basque
and Hausa are examples of this type of language (cf. among others Horvath 1986,
Kiss 1995, 1998 for Hungarian; Vogel and Kenesei 1987, 1990 for Turkish; Ortiz de
Urbina 1989, 1999, Hualde et al. 1994, Elordieta 2001, Arregi 2001, Etxepare and
Ortiz de Urbina 2003 for Basque; Inkelas and Leben 1990 for Hausa). The following
paradigm from Basque illustrates this pattern in which focalized elements must
appear immediately preceding the verb. Thus, examples (1e-g) are ill-formed
because they contain focalized constituents which are either postverbal or not
immediately preverbal. Sentence (1a) represents a neutral declarative sentence, and
the rest are sentences with focalized constituents (capitalized):4
(1) a. Jonek Mireni liburua eman dio
John-erg Miren-dat book-abs give aux
‘John has given the book to Miren’
These examples show that, although Basque is a language with flexible word
order, there is a syntactic constraint in this language on the relative word order
between focus constituents and the verb, namely that they must be left-adjacent to it
(cf. the references mentioned in the previous paragraph for details on syntactic
analyses that could explain this constraint).5 But apart from this syntactic restriction,
in Basque the focalized expression receives main prominence in the sentence, that is,
focus is cued both syntactically and intonationally.
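The word-order constraint just described can be stated as a simple adjacency check. The following Python sketch is only illustrative: the function name `focus_is_left_adjacent`, the flat list of constituents and the index arguments are assumptions introduced here, not part of the syntactic analyses cited above, and the sketch ignores the postverbal options mentioned in note 5.

```python
def focus_is_left_adjacent(constituents, focus_index, verb_index):
    """Return True if the focalized constituent immediately precedes the verb.

    `constituents` is a flat list of clause constituents in surface order;
    this flat representation is a simplifying assumption.
    """
    if not (0 <= focus_index < len(constituents)):
        raise IndexError("focus_index out of range")
    if not (0 <= verb_index < len(constituents)):
        raise IndexError("verb_index out of range")
    return focus_index == verb_index - 1


# Word order of (1a), with the verb and auxiliary treated as one unit.
clause = ["Jonek", "Mireni", "liburua", "eman dio"]
verb = clause.index("eman dio")

print(focus_is_left_adjacent(clause, clause.index("liburua"), verb))  # True
print(focus_is_left_adjacent(clause, clause.index("Jonek"), verb))    # False
```

With the order of (1a), only the constituent standing immediately before the verb passes the check; focalizing any other constituent would require a different word order.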
Serbo-Croatian offers a particularly rich case in focus marking possibilities
(Godjevac 2000, Frota 2002). As in English, prosodic phrasing and prominence
with canonical word order serve to cue narrow focus. Another strategy to signal
narrow focus is to produce a marked word order by scrambling operations, assigning
at the same time prosodic phrasing and prominence cues to the constituent that is
focalized (i.e., the Hungarian-Basque type). Finally, it is also possible to mark
narrow focus by scrambling operations under a neutral intonation, leaving the
focalized constituent in sentence-final position, so that it receives default sentence
stress (like in Dutch or German for verb focus). Thus, three different strategies or
options are available in Serbo-Croatian to signal narrow focus.
The different possibilities for signaling narrow focus discussed above might not
constitute an exhaustive typology, although they might suffice for expository
purposes. The table in (2) summarizes this typology of the different possibilities for
signaling focus by means of syntax, morphology or prosody, or a combination of
more than one of these strategies. A few representative languages are also included.
Slots with a ‘?’ are those that to my knowledge do not have representatives.
(2)
Strategy for focus marking            Sample languages
(a) Prosody alone                     - Only strategy: English, European Portuguese
                                      - One of the strategies: Dutch, German, Spanish, Italian
(b) Morphology alone                  ?
(c) Syntactic displacement alone      - Only strategy: ?
                                      - One of the strategies: Serbo-Croatian, Dutch, German
(d) Prosody and morphology            - Only strategy: Japanese
                                      - One of the strategies: ?
(e) Morphology and syntactic          - Only strategy: Wolof
    displacement                      - One of the strategies: ?
(f) Prosody and syntactic             - Only strategy: Hungarian, Basque, Turkish
    displacement                      - One of the strategies: Serbo-Croatian, Spanish, Italian
(g) Prosody, syntactic displacement   ?
    and morphology
Despite all these possibilities of marking focus, I will show that in pitch-
accent dialects of Basque (i.e., Northern Bizkaian Basque, NBB) there are cases in
which words which constitute the narrow focus of the utterance are not singled out
by syntactic, morphological or intonational means. In these dialects, intonational
highlighting of narrow focus is restricted to words which bear a lexical or derived
accent, or for some speakers, to words that constitute Accentual Phrases (APs) by
themselves. That is, not any independent word can bear intonational prominence
even though it may be the pragmatic focus of the utterance. I discuss these cases
in the following section.
lexical distinction between unaccented and accented roots, stems and affixes, like in
Japanese (cf. Poser 1984, Pierrehumbert and Beckman 1988, Haraguchi 1991,
Kubozono 1993 among others for details on Japanese tone and intonation structure).
An accented root or affix is sufficient to render an accented word, which surfaces
with prominence on a non-final syllable in all contexts. In most NBB varieties, the
syllable preceding the leftmost accented morpheme surfaces with main prominence,
as illustrated in (3) below for the Gernika variety (accented morphemes are indicated
by an apostrophe). In a few varieties, it is always the penultimate syllable that is
accented (as in the Lekeitio variety, cf. Hualde et al. 1994, Hualde 1997, 1999,
Elordieta 1997, 1998):
As explained in section 2 above and illustrated in (1), in NBB only words contained
in an immediately preverbal syntactic constituent can be focalized. The focalized
word does not have to immediately precede the verb, but it has to be contained in a
syntactic constituent that immediately precedes the verb. Thus, in the following
examples, (6a) and (6b) are both grammatical. (6c) is ungrammatical, however, as
the syntactic constituent containing the focalized word is not immediately
preverbal (syntactic constituents are separated by square brackets):
However, in cases of utterances where one of the words constitutes the narrow
focus of the utterance, even if that word is contained in the immediately preverbal
constituent, there is a further constraint it must obey in order to be intonationally
singled out. In the variety of NBB I have investigated, Lekeitio Basque (LB), a focalized word can
be the most prominent intonationally if it has a lexical pitch accent (i.e., if it is a
lexically accented word) or if it has a derived accent (i.e., it is an unaccented word
immediately preceding the verb). Let us illustrate this constraint with sentence (7)
(repeated from (5b)), containing only one preverbal constituent with two accented
words, amúmen ‘grandmother’s’ and liburúak ‘books’. The intonational structure
corresponding to this constituent is thus the following:
That is, in the immediately preverbal syntactic constituent there are two APs,
each of them containing one accented word. Let us now describe the main patterns
observed in contexts of narrow focus, that is, in cases in which the focalized word
replaces the variable introduced by a wh-word in a previous question. The two
words in (7) would become the narrow focus of an utterance if they formed part of a
response to the questions in (8a,b), respectively:
Since amúmen and liburúak have lexical H*+L pitch accents, they can be
pronounced standing out as the most prominent words in the utterance. An interesting
aspect worth mentioning is that in narrow focus cases in which the first word
is focalized, the pronunciation of such utterances is not usually distinguished from
cases of broad focus. That is, the first word will not necessarily show a boosted pitch
level and/or a following decreased pitch level. In the data I have analyzed from
five female speakers of LB, only one speaker produced some utterances in which the
first word was pronounced with a higher pitch followed by a lower level on the
following word. This might be due to the fact that in broad focus cases the
difference in pitch between the first peak and the following peaks is already quite
big (cf. Fig. 2). However, when the second word is focalized, there are more
instances in which the word is made more prominent intonationally and perceptually
distinguishable from broad focus cases. The focalized word may present a higher
pitch level (although the peak is still lower than the first peak, due to downstep),
followed by a decreased pitch level. Quite often there may also be a displacement of
the peak of the first word to the posttonic syllable. This strategy signals old
information or topic status for that word.8 For sentence (9), which would be an
answer to (8b), Figure 3 illustrates a case without peak delay at the end of the
preceding word, and Figure 4 illustrates a case with peak displacement, indicated in
the tone tier with a ‘>’ sign:
If the first word were the narrow focus of the sentence, most commonly it
would not receive more prominence than in broad focus cases. If the second word
were the narrow focus, however, it would be made more prominent by presenting a
higher pitch level than in broad focus cases, accompanied or not by peak delay in the
first word (interestingly, when there is peak delay in the previous word, a higher
pitch level on the focalized word is not necessary). An example with peak delay in
the first word is illustrated below in Figure 5, corresponding to (11). As described
above, however, this pattern is not obligatory, and it is also quite normal to find
cases which are intonationally very similar to broad focus utterances.9
As for the second word in sentences such as (12)-(13), we do not find a uniform
pattern across speakers. However, such interspeaker variation reveals important
facts about constraints on the intonational realization of main prominence in
contexts of narrow focus. For two of the five speakers recorded, the second words in
those cases would be able to receive main prominence if they were the narrow focus
of the utterance, as in (14b), responding to a question such as (14a). An observed
strategy in these cases is a continuation rise at the end of the preceding word,
signaling old or known information. This rise cannot be due to an accent in the first
word, so it must be due to H- (cf. Fig. 7). Another possibility is to have a sustained
pitch at the end of the preceding word followed by a rise in pitch level on the
focused word (other non-intonational features such as higher intensity may also
be present). In both cases, a decrease in pitch level follows the focalized word. The
same pattern is observed in cases in which the second word is lexically accented, as
in (15):
Importantly, three of our five speakers did not produce utterances like (14b), or
could not pronounce the second word in (15) with main intonational prominence.
That is, they cannot highlight a word intonationally if it is preceded by an
unaccented word. For these speakers, neither the leftmost word nor the second
word can be prosodically highlighted. Regardless of which word is the corrective
focus of the utterance, the whole AP (i.e., the two words) would have to be
pronounced together. The explanation for this pattern is that these speakers have a
stricter constraint on the intonational highlighting of focalized words. This
constraint states that only words which constitute APs by themselves can be made
intonationally prominent. In cases of two words with accent, such as the ones in (7)/
(10), each word constitutes its own AP, and can thus be singled out intonationally.
But in cases in which the first word is unaccented, the second word does not
constitute an AP by itself. Rather, it continues the AP that the first word started. As
the intonational schemas in (12)-(13) show, the unaccented word starts an AP, with
the initial %L H- tone sequence, but since it does not have a pitch accent, the phrasal
H- tone spreads onto the next word, until the H*+L accent (lexical or derived) of the
following word ends the AP (cf. Jun and Elordieta 1997; Elordieta 1998). There is thus
only one AP before the verb, containing the two words. Since neither word forms an
independent AP, they cannot be made intonationally prominent on their own. The two
words have to be pronounced at the same pitch level, in the same AP. The contour
observed in these instances is similar to the one illustrated in Figure 6, which showed
the impossibility of having the leftmost word as the most prominent word in the
utterance. The important issue at work here is that no pitch accent is specially
inserted on the first unaccented word, even if it is the narrow focus of the sentence
from a pragmatic or information-structure point of view, as already mentioned
above. Hence no AP boundary can be inserted at the right edge of the first word.
That is, the lexical association of pitch accents is respected by focus in NBB.
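The AP formation described above, in which every pitch accent closes an AP and an unaccented word is grouped with the material that follows it, can be sketched in Python. This is a minimal illustration rather than the author's formalization: the `Word` class, the function names and the placeholder forms `UNACC` and `ACC` are assumptions introduced here. The sketch also assumes that the last preverbal word always carries an accent (lexical or derived) and ignores the optional AP breaks in long unaccented sequences discussed in note 7.

```python
from dataclasses import dataclass


@dataclass
class Word:
    form: str
    lexically_accented: bool


def build_aps(preverbal_words):
    """Group the words of the preverbal constituent into Accentual Phrases.

    Each H*+L accent closes an AP. An unaccented word that immediately
    precedes the verb is treated as bearing a derived accent.
    """
    aps, current = [], []
    last = len(preverbal_words) - 1
    for i, word in enumerate(preverbal_words):
        current.append(word.form)
        has_accent = word.lexically_accented or i == last  # derived accent
        if has_accent:
            aps.append(current)
            current = []
    return aps


def words_forming_own_ap(preverbal_words):
    """Words that constitute an AP by themselves (the stricter constraint)."""
    return [ap[0] for ap in build_aps(preverbal_words) if len(ap) == 1]


# Two lexically accented words, as in (7): each forms its own AP.
both_accented = [Word("amúmen", True), Word("liburúak", True)]
print(build_aps(both_accented))             # [['amúmen'], ['liburúak']]
print(words_forming_own_ap(both_accented))  # ['amúmen', 'liburúak']

# Hypothetical unaccented word followed by an accented word: a single AP.
mixed = [Word("UNACC", False), Word("ACC", True)]
print(build_aps(mixed))                     # [['UNACC', 'ACC']]
print(words_forming_own_ap(mixed))          # []
```

Under the stricter constraint, the empty result for the second case corresponds to the situation in which neither word can be singled out intonationally.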
Thus, a mismatch between semantics and intonation arises in cases where a
word which does not constitute an AP by itself is the corrective focus of an
utterance. No intonational cues are used within the utterance containing the
contrastively focalized word alone in order to convey the intended meaning. There is
no way to single out the focalized word syntactically, as the word occurs with other
words in the preverbal constituent. Disambiguation can only come from the
preceding linguistic context. This mismatch situation between semantics and
prosody does not arise in languages surrounding NBB (Spanish and French) or in
other Indo-European languages. Moreover, an insufficiency of syntax and/or morphology to
mark focalized words is unattested in the languages for which there are descriptions
of focus realization, a summary of which was provided in section 2. Thus, this
property of NBB is interesting from a typological point of view as well.
The patterns of realization of intonational highlighting change slightly when
corrective focus is considered. Corrective focus refers to those instances in which
the speaker corrects one of the words or syntactic phrases that her interlocutor has
stated incorrectly. For instance:
In (16b) above, the first accented word Amáien can be made more prominent,
usually by having a boosted pitch level followed by a decreased pitch level in the
rest of the material in the sentence. Thus, in corrective focus the first word is
distinguishable from cases of broad focus, unlike in narrow non-corrective focus.
The second word in (16b) would also be made more prominent, by means of a
delayed peak in the preceding word, signaling the character of topic or old
information of that word. This type of contour is illustrated in Figure 8, for a
sentence such as Es, Amáien ALABIÁ topa dot ‘No, I came across Amaia’s
DAUGHTER’. Another option is to have simply a higher pitch level on the
focalized word, without a preceding peak displacement. Quite often, the focalized
word is accompanied by higher intensity levels and longer duration.10 As already
described above, the same options would be available for sentences in which the
second word were lexically accented.
But the interesting cases are those in which the first word is unaccented,
forming an AP with the following word. As described above, in narrow non-
corrective focus some speakers could not intonationally highlight either of the two
words, due to a constraint that a word has to constitute an AP by itself in order to be
the most prominent word in the utterance, rather than simply having a pitch accent.
In corrective focus, however, these speakers can place main intonational prominence
in a word even if it does not constitute an AP by itself. The sufficient condition is
that the word has an accent, lexical or derived, like in narrow non-contrastive focus
for the other speakers. Words bearing an accent and following an unaccented word
may surface with main prominence, cued by a rise in pitch on the focalized word
coming from a sustained pitch of the unaccented word, or by a rise at the end of the
prefocal unaccented word. In both cases, usually the focalized word displays higher
intensity and duration (cf. Elordieta and Hualde 2001, 2003). It is important to
bear in mind, however, that this type of prosodic realization is scarce in the
production of the most restrictive speakers, that is, those for whom a word has to
constitute an AP by itself in order to stand out as the most prominent word.11 Figure
9 illustrates an F0 contour for a sentence such as (17b), in which the first option is
realized, and Figure 10 illustrates the second possibility, with a rise at the end of the
first word.
a. Neither word can be highlighted; they       a. Neither word can be highlighted; they
   are uttered in the same AP                     are uttered in the same AP
b. Only the word with an accent can be         b. Only the word with an accent can be
   highlighted                                    highlighted (more frequent than in
                                                  non-corrective focus)

                 H*L                                            H*L
                  |                                              |
AP[Unaccented–Unaccented] – Verb               AP[Unaccented–Unaccented] – Verb
a. Neither word can be highlighted; they       a. Neither word can be highlighted; they
   are uttered in the same AP                     are uttered in the same AP
b. Only the word with an accent can be         b. Only the word with an accent can be
   highlighted                                    highlighted (more frequent than in
                                                  non-corrective focus)
In this paper I have described the main constraints on the realization of prosodic
prominence on focalized words in a pitch accent dialect of Basque. It has been
shown that the minimum condition a word has to satisfy to receive main prosodic
prominence if pragmatically focalized is that it has an accent, whether lexical or
derived. However, in cases of narrow non-corrective focus some speakers reveal the
existence of a more restrictive constraint, which demands that a word must
constitute an AP by itself in order to surface with main prominence. In corrective
focus the sufficient condition for the five speakers recorded is that a word has an
accent. In either case, the interesting fact is that a lexically unaccented word which
does not acquire a derived accent cannot receive an accent even if it is pragmatically
focalized. The context seems to prevent possible ambiguities between neutral and narrow
focus readings of unaccented words without a derived accent. To my knowledge, these are
crosslinguistically unattested constraints, and in this regard NBB is different even
from a language like Tokyo Japanese, which also has a lexical distinction between
accented and unaccented words, but which allows any unaccented word to be
prosodically highlighted (cf. Pierrehumbert and Beckman 1988).
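The conditions summarized in this section amount to a small decision procedure. The Python sketch below restates them under explicit assumptions: the function name, its boolean parameters and the labels "narrow" and "corrective" are introduced here for illustration only, and the sketch abstracts away from frequency differences and from the rare exceptions reported in the notes.

```python
def can_receive_main_prominence(has_accent, forms_own_ap, focus_type,
                                strict_speaker):
    """Whether a focalized word can be intonationally highlighted.

    has_accent:     the word bears a lexical or derived accent
    forms_own_ap:   the word constitutes an Accentual Phrase by itself
    focus_type:     "narrow" (non-corrective) or "corrective"
    strict_speaker: the speaker has the stricter AP-based constraint
    """
    if focus_type not in ("narrow", "corrective"):
        raise ValueError("focus_type must be 'narrow' or 'corrective'")
    if not has_accent:
        # A word without an accent cannot be highlighted in either focus type.
        return False
    if focus_type == "narrow" and strict_speaker:
        # Stricter constraint: the word must be an AP by itself.
        return forms_own_ap
    # Otherwise, bearing an accent is sufficient.
    return True


# An accented word that follows an unaccented word shares its AP with it.
print(can_receive_main_prominence(True, False, "narrow", True))      # False
print(can_receive_main_prominence(True, False, "narrow", False))     # True
print(can_receive_main_prominence(True, False, "corrective", True))  # True
```

The three calls correspond to the same word type evaluated for a stricter speaker in narrow focus, a less strict speaker in narrow focus, and a stricter speaker in corrective focus.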
NOTES
*
Many thanks are due to Matthew Gordon and José Ignacio Hualde for comments on earlier versions of
this article, as well as to Sónia Frota, Carlos Gussenhoven and Kiwa Ito for help with section 2. Of course,
this article would not have been made possible without my native informants, to whom I am indebted
immensely. This work was funded by research grants from the Department of Education, Universities and
Research of the Basque Government (PI-1998-127), the University of the Basque Country (UPV-HA-
8025/20 and 9/UPV 00033.130-13888/2001) and the Ministry of Science and Technology of Spain
(BFF2002-04238-C02-01/FEDER).
1
For the sake of expository purposes, we exclude cleft and pseudo-cleft sentences from the discussion, as
we will compare this type of language with another type of language that marks focus constituents
syntactically without clefting, by having focalized constituents occupy a certain syntactic position below
in the text. Thus, we want to distinguish languages which have a structural position for focus from
languages such as English that do not, although they may make use of cleft sentences to mark focus.
2
Scrambling is disfavored or does not apply with indefinite objects. In such cases, there is simply main
prosodic prominence on the verb.
3
However, when an object is focalized and there is a nonpronominal subject, the focalized object has to
follow the subject, which obligatorily appears thematized (i.e., topicalized, cf. Rialland and Robert
2001:897-898).
4
The following abbreviations will be used: abl = ablative, abs = absolutive, all = allative, aux = auxiliary,
dat = dative, erg = ergative, gen = genitive, ines = inessive, loc = locative, pl = plural, sg = singular.
5
It is possible for focalized constituents to appear after the verb, but they are usually uttered as separate
intermediate or intonational phrases. They are usually preceded by pauses, fillers such as e ‘err…/um…’,
or final lengthening of the verb ending in a rising intonation. It appears that copulas can be followed by
focalized constituents even without a pause (Hualde et al. 1994). In central and eastern dialects it is
possible to have focalized elements postverbally without a pause (cf. Hidalgo 1994, Elordieta 2003), apart
from the usual preverbal position, but the speakers I have consulted cannot have postverbal focus as an
answer to a wh-word. In that case preverbal focus is the only option. Perhaps only informational, non-
contrastive focus (Kiss 1998) presented by the speaker in her own discourse can appear postverbally in
these dialects, but more research is needed on this topic before making any generalizations.
6
Jun and Elordieta (1997) found that in APs up to four syllables long the peak of H- is reached on the
second syllable, and in APs more than four syllables long it was reached on the third syllable. This H- is
not phonetically realized when the second syllable is associated to a pitch accent.
7
For some speakers, in sequences of four or more unaccented words certain dips in pitch can be observed
between two unaccented words. Jun and Elordieta (1997) and Elordieta (1998) take these to be AP-
boundaries, in the absence of H*+L pitch accents. However, the dips were difficult to perceive and were
much smaller than regular drops after H*+L pitch accents (see relevant pitch tracks in the mentioned
articles). Also, the factors conditioning these breaks were not very well established; desire for heaviness
reduction and slower rate of speech were suggested as factors involved in the insertion of these breaks,
but no systematic study was carried out to prove these claims. Moreover, these facts were subject to speaker
dependence; some speakers always produce plateaus in sequences of four or more unaccented words,
without breaks. This issue deserves a more systematic study, which I plan to undertake in future
research.
8
The delayed peaks at the end of prefocal words were already observed for some speakers of LB by Ito
et al. (2003). However, their data involved cases of corrective focus, which we also discuss below. The
patterns presented in this paper show that it is possible to find such delayed peaks in non-corrective
narrow focus as well. Other strategies of main prominence that can be observed in these contexts and
which are not intonational in nature are higher intensity and duration on the focalized word.
9
Indeed, the speakers of LB on whom Elordieta (2003) based his findings did not produce utterances in
which the second word was most prominent intonationally, and this led to positing the absence of such a
possibility. That conclusion must now be corrected to capture the facts presented in this article.
10
Although the results in Elordieta and Hualde (2001, 2003) showed that lengthening applied to words
in corrective focus, it must be pointed out that in those utterances speakers were instructed to put special
emphasis on those words. In other recordings in which speakers were not told to put emphasis on the
correction, I have observed that lengthening did not occur significantly. It seems that a specific
experiment (left for future research) is needed to clarify the role of lengthening as a cue to corrective
focus.
11
Thus, highlighting words following an unaccented word without an accent is possible, but not frequent
in LB. Its frequency is speaker dependent, but as stated in note 10, the possibility of finding such patterns
has to be incorporated into the intonational grammar of LB, contra what was assumed in Elordieta (2003).
12
Interestingly, the two speakers that patterned differently from the other three speakers in contexts of
narrow non-corrective focus in being able to highlight a word following an unaccented word also
patterned differently in other respects. For contexts in which the first unaccented word was correctively
focalized, they produced contours in which this word was prosodically set apart, by having a higher pitch
level followed by a fall in pitch for the following word, or by being pronounced with greater intensity and
duration. However, such cases were few in number, compared to the majority of cases in which the
unaccented word did not surface with main prominence, thus patterning with the other three speakers. At
this point I consider it premature to conclude that highlighting the unaccented word in these contexts is
a solid possibility in LB, and leave the issue open for further research based on data from more speakers
and based on more tokens of each type of context.
REFERENCES
Arregi, Karlos. “Focus and Word Order in Basque.” Manuscript, Massachusetts Institute of Technology,
2001.
Bickerton, Derek. “Subject Focus Pronouns.” In Francis Byrne and Donald Winford (eds.), Focus and
Grammatical Relations in Creole Languages, pp. 189-212. Amsterdam: John Benjamins, 1993.
Bolinger, Dwight. “English Prosodic Stress and Spanish Sentence Order.” Hispania 37 (1954): 152-156.
Bolinger, Dwight. “Accent is Predictable (If You’re a Mind-reader).” Language 48 (1972): 633-644.
Cinque, Guglielmo. “A Null Theory of Phrase and Compound Stress.” Linguistic Inquiry 24 (1993):
239-297.
Contreras, Heles. El Orden de Palabras en Español. Madrid: Cátedra, 1978.
Contreras, Heles. “Sentential Stress, Word Order, and the Notion of Subject in Spanish.” In Linda Waugh
and C.H. van Schooneveld (eds.), The Melody of Language, pp. 45-53. Baltimore: University Park
Press, 1980.
Culicover, Peter, and Michael Rochemont. “Stress and Focus in English.” Language 59 (1983): 123-165.
Elordieta, Arantzazu. Verb Movement and Constituent Permutation in Basque. Utrecht: LOT, 2001.
Elordieta, Gorka. “Accent, Tone and Intonation in Lekeitio Basque.” In Fernando Martínez-Gil and
Alfonso Morales-Front (eds.), Issues in the Phonology and Morphology of the Iberian Languages,
pp. 4-78. Washington, DC: Georgetown University Press, 1997.
Elordieta, Gorka. “Intonation in a Pitch-Accent Dialect of Basque.” International Journal of Basque
Linguistics and Philology 32 (1998): 511-569.
Elordieta, Gorka. “Intonation.” In José I. Hualde and Jon Ortiz de Urbina (eds.), A Grammar of Basque,
pp. 72-113. Berlin: Mouton de Gruyter, 2003.
Elordieta, Gorka, and José I. Hualde. “The Role of Duration as a Correlate of Accent in Lekeitio Basque.”
In Proceedings of Eurospeech 2001 - Scandinavia, 105-108, 2001.
Elordieta, Gorka, and José I. Hualde. “Tonal and Durational Correlates of Accent in Contexts of
Downstep in Northern Bizkaian Basque.” Journal of the International Phonetic Association, 33
(2003): 195-209.
Etxepare, Ricardo, and Jon Ortiz de Urbina. “Focalization.” In José I. Hualde and Jon Ortiz de Urbina
(eds.), A Grammar of Basque, pp. 459-515. Berlin: Mouton de Gruyter, 2003.
Frota, Sónia. Prosody and Focus in European Portuguese. University of Lisbon: Doctoral dissertation,
1998 [Published by Garland in 2000].
Frota, Sónia. Review of Intonation, Word Order and Focus Projection in Serbo-Croatian (Godjevac
2000). Glot International 6 (2002): 251-256.
Godjevac, Svetlana. Intonation, Word Order and Focus Projection in Serbo-Croatian. Doctoral
Dissertation, Ohio State University, 2000.
Haraguchi, Shosuke. A Theory of Stress and Accent. Dordrecht: Foris, 1991.
Hidalgo, Bittor. Hitz Ordenaren Estatistikak Euskaraz. Doctoral dissertation, University of the Basque
Country, 1994.
Horvath, Julia. Focus in the Theory of Grammar and the Syntax of Hungarian. Dordrecht: Foris, 1986.
Hualde, José I. Euskararen Azentuerak. Bilbao: Servicio Editorial de la Universidad del País Vasco,
1997.
Hualde, José I. “Basque Accentuation.” In Harry van der Hulst (ed.), Word Prosodic Systems in the
Languages of Europe, pp. 947-993. Berlin: Mouton de Gruyter, 1999.
Hualde, José I. “On System-Driven Sound Change: Accent Shift in Markina Basque.” Lingua 110 (2000):
99-129.
Hualde, José I., Gorka Elordieta and Arantzazu Elordieta. The Basque Dialect of Lekeitio. Bilbao and San
Sebastián: Servicio Editorial de la Universidad del País Vasco, 1994.
Hualde, José I., Gorka Elordieta, Iñaki Gaminde and Rajka Smiljanic. “From Pitch-Accent to Stress-
Accent in Basque.” In Carlos Gussenhoven and Natasha Warner (eds.), Papers in Laboratory
Phonology VII, pp. 557-584. Berlin: Mouton de Gruyter, 2002.
Inkelas, Sharon, and William Leben. “Where Phonology and Phonetics Intersect: The case of Hausa
Intonation.” In John Kingston and Mary Beckman (eds.), Papers in Laboratory Phonology I, pp.
17-34. Cambridge: Cambridge University Press, 1990.
Ito, Kiwako, Gorka Elordieta, and José I. Hualde. “Peak Alignment and Intonational Change in Basque.”
In Proceedings of the 15th International Congress of Phonetic Sciences, pp. 2929-2932.
Barcelona, Spain, 2003.
Jun, Sun-Ah, and Gorka Elordieta. “Intonational Structure of Lekeitio Basque.” In Antonis Botinis,
Georgios Kouroupetroglou and George Carayiannis (eds.), Intonation: Theory, Models and
Applications, pp. 193-196. Proceedings of an ESCA Workshop. Athens, Greece, 1997.
Kiss, Katalin É. “Introduction.” In Katalin É. Kiss (ed.), Discourse Configurational Languages, pp. 3-27.
New York, Oxford: Oxford University Press, 1995.
Kiss, Katalin É. “Identificational Focus Versus Information Focus.” Language 74 (1998): 245-273.
Kubozono, Haruo. The Organization of Japanese Prosody. Tokyo: Kurosio, 1993.
Ladd, Robert D. The Structure of Intonational Meaning: Evidence from English. Bloomington, Indiana:
Indiana University Linguistics Club, 1980.
Ortiz de Urbina, Jon. Parameters in the Grammar of Basque. Dordrecht: Foris, 1989.
Ortiz de Urbina, Jon. “Focus in Basque.” In Georges Rebuschi and Laurice Tuller (eds.), The Grammar of
Focus, pp. 311-333. Amsterdam and Philadelphia: John Benjamins, 1999.
Pierrehumbert, Janet, and Mary Beckman. Japanese Tone Structure. Cambridge, Mass.: MIT Press, 1988.
Poser, William. The Phonetics and Phonology of Tone and Intonation in Japanese. Doctoral Dissertation,
MIT, 1984.
Reinhart, Tanya. “Interface Strategies.” Manuscript, Utrecht University, 1995.
Reinhart, Tanya, and Ad Neeleman. “Scrambling and the PF Interface.” In W. Gueder and Myriam Butt
(eds.), Projecting from the Lexicon. Stanford: CSLI Publications, 1998.
Rialland, Annie, and Stéphanie Robert. “The Intonational System of Wolof.” Linguistics 39 (2001):
893-939.
Selkirk, Elisabeth. “Sentence Prosody: Intonation, Stress, and Phrasing.” In John Goldsmith (ed.), The
Handbook of Phonological Theory, pp. 550-569. Cambridge: Blackwell Publishers, 1995.
Uriagereka, Juan. “An F Position in Western Romance.” In Katalin É. Kiss (ed.), Discourse
Configurational Languages, pp. 153-175. Oxford: Oxford University Press, 1995.
Vallduví, Enric. The Informational Component. University of Pennsylvania: Doctoral dissertation, 1990.
Vogel, Irene, and István Kenesei. “The Interface between Phonology and Other Components of
Grammar: The Case of Hungarian.” Phonology Yearbook 4 (1987): 243-263.
Vogel, Irene, and István Kenesei. “Syntax and Semantics in Phonology.” In Sharon Inkelas and Draga
Zec (eds.), The Phonology-Syntax Connection, pp. 365-378. Chicago: University of Chicago Press,
1990.
Zubizarreta, María Luisa. Prosody, Focus and Word Order. Cambridge, Mass.: MIT Press, 1998.
ARDIS ESCHENBERG
POLISH NARROW FOCUS CONSTRUCTIONS
1. INTRODUCTION¹
Polish, a western Slavic language, is a so-called ‘free word order’ or ‘scrambling’
language. SVO ordering has been posited to be basic for Polish (Szober 1963), and a
study by Klemensiewicz (1949) found the majority of isolated sentences to conform
to this ordering. However, other constituent orders are still common.
Variations in word order have often been explained in terms of information
structure (Szwedek 1976; Willim 1989), as well as constituent length (Siewerska
1993). However, a single word order can occur with various types of information
structure (Eschenberg 1999). In such cases, prosody may provide a way to
distinguish between the differing information structure types. Analyses which rely
on textual data or fail to consider prosody will be unable to account for cases where
one word order is used for differing information structures.
This paper explores Polish constructions involving focus on a single constituent,
narrow focus constructions. Not only word order but also intonation, particularly
sentence stress, is considered. First, declarative sentences are examined. Then,
wh-questions are considered. Word order alone cannot be used to account for narrow
focus in Polish; prosody is crucial. Failure to consider prosody will be seen to cause
confusion between construction types. Differences in word order will be shown to
be motivated by different types of presupposition, as proposed by Dryer (1996). A
more restricted definition of focus type offered by Kiss (1998) will be seen to apply
in this situation.
C. Lee et al. Topic and Focus: Cross-linguistic Perspectives on Meaning and Intonation, 23–40.
© 2007 Springer.
This paper explores argument focus, which has the communicative function of
identifying a referent. Argument focus has also been called ‘narrow focus’ (Van
Valin & LaPolla 1997) and occurs when one constituent is focal. The term narrow
focus captures the fact that this constituent may be not only an actual argument
(subject, object, indirect object), but also an oblique NP or PP or a nucleus (V).
Narrow focus can be further divided into marked and unmarked narrow focus where
unmarked narrow focus occurs when the focal constituent occurs in the unmarked
focus position in the sentence for the given language. For example, the final
position² is the unmarked focus position for English. Thus, as English is SVO,
objects, which occur finally, are unmarked for narrow focus.
Similarly, Polish has a final focus position which is unmarked. The following
section will examine narrow focus in Polish, beginning with marked narrow focus
on the subject and continuing to unmarked narrow focus on the object.
Both SV (1a) and VS (1b) ordered sentences are felicitous replies placing narrow
focus on the subject.³ In each case, the subject is prosodically marked, receiving
intonational prominence.
Similarly, an answer containing (unmarked) narrow focus on an object can be
ordered in two ways, where in each case the object is intonationally prominent.
In (2a) the object is sentence initial and prominent, and in (2b) the object is sentence
final and prominent.
Although the above example (2) did not involve an overt subject, a similar
situation arises when an overt subject is present (3).
Again, the object can occur in its canonical position, sentence finally (a). It can also
occur pre-verbally after the subject (b), but is less felicitous sentence initially (c,d).
Note that in each of the above, while the word order changes, the pitch accent
placed upon the focal constituent is similar. This can be seen in a pitch curve, such
as in Figure 1.
Figure 1. Comparison of pitch curves for (a) SVO and (b) SOV ordered sentences.
In Figure 1(a), the final focal object begins at 6.6 seconds as the pitch curve rises
and continues until the end of the sentence. In (b) the medial focal object again
begins on the ascent of the curve and continues through its peak and descent. The
final constituent in each case is lengthened. Therefore, the curve associated with the
final object is lengthened compared to the medial object's curve. However, the
general shape and range in hertz associated with the focal object is similar for both
the final and medial focal objects.
Error correction paradigms provide another way to elicit narrow focus
constructions, yielding similar results to wh-question elicitation (4).
The error correction paradigm in (4) provides the same grammaticality judgments
and intonational contours as the similar wh-question paradigm in (3). This can be
seen in a comparison of the plots of the pitch curves as well (Figure 2).
Figure 2. Pitch curves of (a) wh-question and (b) error correction paradigm responses:
Jan kocha MARIĘ.
Thus, for both error corrections and replies to wh-questions, variable word
orderings exist in Polish. Subjects can occur initially or finally, and objects can
occur pre-verbally or finally. In all cases the focal argument receives prosodic
prominence.
3. PREVIOUS ANALYSES
Variability in Polish word order is not a newly discovered phenomenon. Indeed, as
with many Slavic languages, Polish has been studied extensively by Prague School
linguists, who call the principles underlying the flexibility in word order the
“functional sentence perspective (FSP).” To describe how information is distributed
in a sentence, that is, to give the information structure of sentences, Mathesius (1929:
127) divides the parts of an utterance into “theme” and “rheme.” The theme is what
“one is talking about, the topic,” and the rheme is “what one says about it, the
comment” (Daneš 1970: 134). These have also been explained as a distinction
between new information, rheme, and given information, theme.
Using the latter interpretation, Szwedek (1976: 51) states that it is “not true that
order of sentence elements in Polish is free or is a matter of style,” but that it is
“strictly determined” and “reflects the organization of the utterance according to the
new/given information distribution which, of course, is dependent on the context
and situation.” Thus, for the above, both the focal object and focal subject are
predicted to come last due to this organization. Szwedek notes that canonically
ordered, in-situ focus (SVO) is more colloquial or conversational for focal subjects
than focus final placement (VOS). He does not discuss SOV constructions.
Variation in word-ordering has also been studied by linguists from other schools
of thought. Willim (1989: 38) notes that subjects are often introduced into discourse
in final position. She calls these VOS ordered sentences ‘presentational.’ However,
she does not note which argument in these presentational sentences is prosodically
prominent, and, thus, it is difficult to apply her analysis to the above. In her analysis,
OSV ordering with a prosodically prominent object is a case of ‘topicalization,’
where the object is focal and non-presupposed (122-3). Like Szwedek, she also does
not discuss SOV constructions. Neither of these analyses completely accounts for all
of the variation seen in (1-4).
4. EFFECTS OF PRESUPPOSITION
Both the simple focus sentence (5a) and the cleft sentence (5b) felicitously
answer the wh-question. The speaker of the wh-question believes that someone saw
John and is asking who that person is. Lambrecht (1994: 283) notes that the speakers
of wh-questions typically presuppose that there is an answer which fulfills the
question. He states that one does not normally ask questions one does not expect
answers to. However, the replies to the wh-question do not necessarily contain this
presupposition. Dryer (1996: 188), following Rochemont (1986: 130), claims that
cleft constructions necessarily contain this pragmatic presupposition but simple
focus sentences do not. In the above, (5b) necessarily presupposes that someone saw
John but (5a) does not. The following provides an overview of Dryer’s arguments as
relevant to this paper.
The presuppositional content of the replies becomes apparent in situations where
the question does not contain such a pragmatic presupposition. Example (6), adapted
from Dryer (1996: 510), provides an example of a question where the speaker does
not assume that someone did in fact see John.
Note that only the simple focus sentence (6a) is a felicitous reply to the question
when there is no presupposition that someone saw John. The cleft cannot be used as
a felicitous reply (6b) because it inherently presupposes that someone did see John,
and the question does not presuppose this.
Although the questions in (5) and (6) differ for presupposition, both activate the
proposition ‘someone saw John.’ In (5), the first speaker believes this proposition to
be true; s/he presupposes it. In (6), the speaker does not have such a belief.
Therefore, the presupposition cannot be part of the common ground between the two
speakers.
When the presupposition of the answer is negated in the reply, the cleft cannot
occur (7).
The answer (7a) does not presuppose that someone saw John. In fact, it asserts just
the opposite, that no one saw John. The cleft cannot felicitously assert this because
the cleft contains a presupposition that someone saw John (Rochemont
1986: 130, Dryer 1996: 188). Thus, while clefts inherently contain pragmatic
presupposition, simple focus sentence answers do not.
Similar to (6), the question in (8) does not contain the presupposition that someone
actually sang. The speaker of the question has activated the proposition ‘someone
sang,’ but does not necessarily believe it to be true. The SV ordered sentence with
focus on the subject is felicitous (8a). It contains but does not presuppose the
proposition that someone sang. However, VS ordering is not grammatical if
prosodic prominence is placed on the subject (8b). Behaving analogously to an
English cleft construction, the VS construction presupposes that someone sang and
cannot felicitously answer a question which does not contain such a presupposition.
To use the VS construction would entail that the presupposition is part of the
common ground between speakers, but the question shows that it is not. The VS
ordering is felicitous if the sentential stress is perceived to be on the verb (8c). This,
however, is not a case of narrow focus on just the subject, but rather the entire
sentence is in focus. Indeed, in a spectrogram, the pitch curve actually shows stress
on both the verb and the subject in such a construction. While (8a) places narrow
focus on Jan, (8c) places focus on the entire proposition. Both necessarily assert that
someone sang as the question does not presuppose this. However, one focuses on the
actor, entailing the event, and the other focuses on the entire event.
Narrow focus on subjects can occur with either sentence initial or sentence final
subjects (1). However, sentence final subjects contain pragmatic presupposition
which their sentence initially placed counterparts do not (8). Focal objects have also
been seen to occur both initially and finally (2). The following explores the effects
of presupposition on object word order (9).
In example (9), the reply to a question with object focus but no pragmatic
presupposition felicitously occurs only with canonical ordering (SVO), as in (9a).
SOV ordering, similar to a cleft construction in English, cannot felicitously answer
the question.
Thus, it can be seen that non-canonical word orderings with prosodic
prominence on an argument entail pragmatic presupposition. Canonically ordered
SVO sentences with prosodic prominence on an argument do not entail such a
presupposition. Without pragmatic presupposition, focus must occur in-situ, that is,
the word ordering must be SVO. In all constructions, the focal constituent receives
prosodic prominence.
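The generalization just stated can be restated as a small decision procedure. The following Python sketch is offered only as an illustration: the function names and the encoding of word orders as strings are assumptions introduced here, and the sketch covers only the four orders with a prosodically prominent argument discussed in (8) and (9).

```python
# Illustrative sketch only; all names and encodings are hypothetical.
# Orders are those with a prosodically prominent (focal) argument
# discussed in section 4: SV/VS for focal subjects, SVO/SOV for focal objects.
CANONICAL = {"SV", "SVO"}      # focus in situ
NON_CANONICAL = {"VS", "SOV"}  # focus in non-canonical position


def carries_presupposition(order: str) -> bool:
    """Non-canonical order entails a pragmatic presupposition."""
    if order not in CANONICAL | NON_CANONICAL:
        raise ValueError(f"order not covered by this sketch: {order}")
    return order in NON_CANONICAL


def reply_is_felicitous(order: str, question_presupposes: bool) -> bool:
    """A reply is infelicitous only if it presupposes what the question does not."""
    presupposes = carries_presupposition(order)
    return question_presupposes or not presupposes


if __name__ == "__main__":
    for order in ("SV", "VS", "SVO", "SOV"):
        print(order, reply_is_felicitous(order, question_presupposes=False))
```

On this encoding, `reply_is_felicitous("VS", False)` is false, matching (8b), while both SV and VS come out felicitous after a presupposing wh-question, as in (1).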
Examples such as (8) and (9) necessarily lead to a revision of Lambrecht’s
formulations of assertion, presupposition and focus. Presupposition is not simply the
set of lexico-grammatically evoked propositions the speaker assumes the hearer
knows, believes, or will take for granted at the time of the utterance (Lambrecht
1994: 52). Rather, it is only the set of propositions that the speaker assumes the
hearer believes at the time of the utterance. His definition of assertion as the
proposition expressed by a sentence which the hearer is expected to know or believe
or take for granted as a result of hearing the sentence uttered still holds true (52).
However, the focus can no longer be defined as the semantic component of a
pragmatically structured proposition whereby the assertion differs from the
presupposition (213). Both (8a) and (8c) assert that someone sang and that Jan is the
person who sang. Neither contains a presupposition about the beliefs of the hearer.
However, the focus in these two constructions is not the same. In (8a), the focus is
the argument ‘Jan’ and in (8c) it is the entire sentence. Focus is determined not by
subtracting presupposition from assertion but rather by prosody.
Using tests developed by Szabolcsi (1981) and Farkas (p.c. to Kiss 1998), Kiss
demonstrates that identificational focus expresses exhaustive identification in
Hungarian pre-verbal focus constructions and in English cleft sentences. One test
involves a pair of sentences where the first contains two coordinated objects and the
second contains only one of the two objects. If the second sentence involves
exhaustive identification, it cannot be a logical entailment of the first. That is, if the
second sentence expresses exhaustive identification, it contradicts the first. The
following provides such a test in Polish using both canonical and non-canonical
word order.
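Independently of the Polish examples, the logic of the entailment test can be sketched schematically. The Python fragment below is a minimal illustration under stated assumptions: a sentence is reduced to the set of referents named by its focal object, a situation is the set of referents that actually satisfy the predicate, and the names `informational`, `identificational`, and `entails`, as well as the three-person domain, are hypothetical and introduced here only for exposition.

```python
from itertools import combinations


def situations(domain):
    """Enumerate every possible set of referents satisfying the predicate."""
    items = sorted(domain)
    for size in range(len(items) + 1):
        for combo in combinations(items, size):
            yield set(combo)


def informational(focus, actual):
    """Non-exhaustive reading: the focused referents are among the actual ones."""
    return focus <= actual


def identificational(focus, actual):
    """Exhaustive reading: the focused referents are all and only the actual ones."""
    return focus == actual


def entails(first, second, reading, domain):
    """True if the second sentence holds in every situation where the first holds."""
    return all(reading(second, s) for s in situations(domain) if reading(first, s))


domain = {"Maria", "Ewa", "Anna"}   # hypothetical referents
first = {"Maria", "Ewa"}            # sentence with two coordinated objects
second = {"Maria"}                  # sentence with only one of them

print(entails(first, second, informational, domain))     # True
print(entails(first, second, identificational, domain))  # False
```

Under the exhaustive reading the second sentence is false in the only situation that verifies the first, which is the contradiction the test relies on.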
The preverbal focal object placement is not felicitous for ‘also’ phrases (12a),
‘even’ phrases (12b) and an existential quantifier (12c). All of these constructions
are possible for final objects (13).
Whereas focal objects placed non-canonically were not felicitous for such phrases,
focal objects in-situ (clause final) are felicitous for ‘also’ phrases (13a), ‘even’
phrases (13b), and an existential quantifier (13c). Thus, the identificational focus
constructions are not felicitous, but the informational focus constructions, which are
not associated with movement, are felicitous in these examples.
In the analyses in sections 2 and 4, both focal subjects and objects were found to
behave in similar ways based on in-situ versus non-canonical word ordering and
focus. Although Kiss does not explore subjects, a thorough investigation of the
Polish phenomena presented thus far requires such an examination. The following
presents sentences similar to (12, 13) involving focal subjects rather than focal
objects.
‘Also’ phrases (14a, 14a’) and ‘even’ phrases (14b, 14b’) are felicitous for focal
subjects regardless of whether the subject is placed initially or finally. Although in
such constructions focal objects could only occur in the canonical position of
informational focus (13), focal subjects can occur in canonical or non-canonical
positions. However, focal existential quantifiers are not felicitous in initial position
(14c) but are felicitous in final position (14c’).
Although the felicity judgements of (14c, 14c’) seem very odd considering the
results seen earlier, Kiss notes that existential quantifiers cannot function as either
identificational or informational focus. Thus, (14c’) must be a different type of
construction; it cannot be an identificationally focused final subject as in (1b).
Indeed, it is a presentative with a pitch accent on the introduced element ktoś. This is
an example of the VOS ordered sentences Willim (1989: 38) refers to. These
constructions introduce a new element rather than providing the contrastive reading
(section 3) of identificational focus due to exhaustive identification. Here, rather
than exhaustive identification, a constituent is introduced.
Similarly, the non-canonically ordered subjects in (14a’) and (b’) are not
examples of identificational focus, but rather presentatives. In the earlier examples
(1, 2, 3, 4) informational focus and identificational focus constituents have similar
pitch accents but different word orderings. This is confirmed by both native speaker
judgment and spectrographic analysis (Figure 1). However, speakers do not judge the
SV ordered and VS ordered sentences in (14) to have the same pitch accents.
Whereas speakers state that in (14a) and (14b) the strongest pitch accent is on the
adverb (and a lesser pitch accent occurs on the noun⁴), they consistently judge
(14a’) and (14b’) to place the strongest pitch accent on the noun (and a lesser pitch
accent on the adverb). Spectrographic analysis confirms speaker judgments of
prosodic prominence (Figure 3).
Figure 3. Pitch curves of (14a) and (14a’), ‘also’ phrases with prosodically prominent
subjects.
In Figure 3, the highest points in the pitch curve differ for (a) and (b). In (a), the
highest point is over ‘also,’ but in (b) it is over ‘Maria.’ This confirms native
speaker judgements. Informational focus with a subject noun phrase results in
prosodic prominence on the adverb in ‘also’ and ‘even’ phrases. Speakers also judge
the strongest pitch accent to be on the adverb in such constructions when the object
is focal (13).
That (14a’) and (14b’) are not identificational focus is further supported by the
fact that their pitch curves differ from clear examples of subject identificational focus
(Figure 4).
Figure 4. Comparison of pitch curves for a final focal subject (a) and presentational final
subject (b).
Whereas Maria begins when the pitch curve is already mid-ascent (4.5 sec.) in the
focal subject construction (a), it begins on the lowest point of the pitch curve (7.96
sec.) in the presentative construction. That is, a local minimum occurs in the pitch
curve well before the subject in (a) but coincides with the subject in (b). The fact
that the pitch curves are not identical is due to the fact that the VS sentences in
(14) are not instances of identificational focus, but rather are presentational
constructions. Thus, careful analysis of prosody can distinguish between sentence
final identificational focus subjects and sentence final presentational subjects.
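The alignment criterion used in this comparison, namely whether the pitch minimum preceding the peak coincides with the onset of the subject, can be sketched as follows. The f0 values and the 50 ms tolerance below are invented for illustration and are not measurements from Figure 4; only the two onset times (4.5 and 7.96 seconds) come from the text, and the function names are hypothetical.

```python
def minimum_before_peak(times, f0):
    """Return the time of the lowest f0 sample at or before the pitch peak."""
    peak = max(range(len(f0)), key=lambda i: f0[i])
    low = min(range(peak + 1), key=lambda i: f0[i])
    return times[low]


def minimum_coincides_with_onset(times, f0, onset, tolerance=0.05):
    """True if the pre-peak minimum lies within `tolerance` seconds of `onset`."""
    return abs(minimum_before_peak(times, f0) - onset) <= tolerance


# Invented contours (time in seconds, f0 in Hz), shaped after the prose description.
focal_times = [4.0, 4.1, 4.2, 4.3, 4.4, 4.5, 4.6, 4.7, 4.8]
focal_f0 = [180, 170, 165, 175, 190, 205, 225, 240, 210]
present_times = [7.6, 7.7, 7.8, 7.9, 7.96, 8.1, 8.2, 8.3]
present_f0 = [200, 190, 180, 172, 165, 190, 230, 205]

# Focal final subject: minimum well before the onset of the subject.
print(minimum_coincides_with_onset(focal_times, focal_f0, onset=4.5))       # False
# Presentational final subject: minimum at the onset of the subject.
print(minimum_coincides_with_onset(present_times, present_f0, onset=7.96))  # True
```

A check of this kind would have to be run on measured contours from several speakers before it could be taken as diagnostic of the two constructions.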
Thus, in Kiss’ terms, SVO sentences with an in-situ focal subject or focal object are examples of
informational focus, while SOV (focal object) and VOS (focal subject) sentences are instances of
identificational focus.
Additionally, VOS sentences can occur as sentences involving introduction of a
constituent.
6. RELATED PHENOMENA
Accordingly, use of ciebie coincides with the structure and intonation used for
identificational focus. It is placed in non-canonical position, pre-verbally, and given
prosodic stress (15b). It is less felicitous in the canonical (final) object position
reserved for informational focus (15d). Conversely, the non-presupposed cię occurs
most felicitously in canonical object position (15a) and less felicitously pre-verbally
(15c). This phenomenon further supports the above analysis of identificational
versus informational focus in Polish.
6.2. Wh-questions
In the literature, wh-questions are often assumed to be a type of narrow focus with
properties similar to non-wh focus. For example, Kiss (1998: 249) states that for
Hungarian, a wh-phrase other than ‘why’ is ‘always placed in the preverbal
identificational focus position…’ However, she notes that wh-questions can be
answered by identificational or informational focus. This leads to an ambiguity as to
whether wh-question words are a type of identificational focus or not.
Polish, however, provides clear evidence that wh-focus is not the same as
identificational focus in a declarative (16).
In (16a), the felicitous wh-question, the subject is both initial and focal. This is
similar to the informational focus position of a subject (14a,b). It is unlike
identificational focus subjects, which have been seen to occur finally (8). In (16b)
the focal subject is final and the resulting sentence is ungrammatical. Example (16c)
shows that a ‘wh’ subject can occur finally, but only when it is not prosodically
prominent, or focal. In such a case, it also does not receive a wh-reading. Unlike in
Hungarian, Polish focal wh-subjects are clearly not in the identificational focus
position.
The fact that (16c) does not have a wh-reading can be seen by looking at its
felicitous answers:
Only answers which do not presuppose that someone did indeed die are felicitous,
such as (A) with canonical order and prosodic prominence on the subject
(informational focus). (A”), an example of identificational focus, has the pragmatic
presupposition that someone died and is not grammatical. The answer ‘no’ (A””) is
a felicitous reply here but would not be for the wh-question ‘who sang?’ This
paradigm proves different from an actual wh-question, such as (1), and, rather, is
similar to a question involving an indefinite pronoun (8). This focal wh-question/
non-focal indefinite pronoun patterning can also be seen in Siouan languages such
as Omaha and Lakhota where words which function as wh-words when focal act as
indefinites when non-focal.
Just as focal wh-subjects occur initially (16), focal wh-objects also occur initially
(18):
differ in word order but not prosodically prominent constituent (for example, 1a and
1b), Dryer’s notion of presupposition proves valuable (8, 9). Kiss’ definition of
identificational focus proves equally applicable (10,11). In both cases, a stipulation
that the construction provides exhaustive identification needs to be integrated.
In addition to refining the concept of narrow focus to include presupposition,
Kiss’ analysis holds that movement is not associated with
informational focus. Supporting this, in Polish informational focus occurs in-situ
(10, 14), while identificational focus is associated with non-canonical position (11,
13). Use of Polish clitics versus full pronouns provides additional evidence for the
distinction between informational and identificational focus (15).
However, Kiss’ observation that wh-words in Hungarian tend to occur in the
identificational focus position does not hold for Polish. A different type of focus,
wh-word focus, behaves differently from focus in declaratives. Wh-word focus in
wh-questions entails placing the wh-word in initial position and giving it prosodic
prominence. This is true regardless of the argument type of the wh-word. Again,
accounting for prosody proved crucial in that a non-prosodically prominent
wh-word can occur sentence finally. However, in this case, an indefinite and not a
wh-reading is attained.
Thus, Polish, as a flexible word order language, provides an ideal testing ground
for theories of focus. Just examining prosodic accent on single constituents leads to
evidence for identificational focus, informational focus, wh-question focus, and
presentatives. Word order and/or prosody can distinguish each; there are no overlaps
where two constructions are homophonous and only distinguishable through
context. Positing a focus position applicable regardless of semanticosyntactic roles
proves valid for wh-words, but not for other forms of narrow focus. The position of
constituents involved in presentatives, informational focus and identificational focus
is best explained as in-situ versus non-canonical position, rather than as fixed
positions. Table 1 provides a summary of the word orders and prosody involved for
the constructions examined in this paper.
Ardis Eschenberg
University at Buffalo
Nebraska Indian Community College
8. NOTES
1 I would like to thank Janina Aniszewska, Jolanta Łapat, Małgorzata Łapat, Czesław Prokopczyk, and
Piotr Szewczyk for their patience, teaching and insight into the Polish language. Any mistakes here
are the responsibility of the author, but all the truth obtained is due to the kindness of these consultants. I
would also like to thank Daniel Büring for his insightful comments.
2 Final position in the core, not the clause, where the core consists of the predicate and its arguments.
3 Bold underline represents prosodically accented constituent. Small caps are used to indicate sentence
stress in sample sentences.
4 The stronger pitch accent is indicated by bold small caps, while the lesser is in small caps.
9. REFERENCES
Daneš, František. “One instance of Prague school methodology: functional analysis of utterance and
text.” In Paul L. Garvin (ed.), Method and Theory in Linguistics. Paris: Mouton & Co, 1970.
Dryer, Matthew. “Focus, pragmatic presupposition, and activated propositions.” Journal of Pragmatics
26 (1996): 475-523.
Eschenberg, Ardis. Focus in Polish. M.A. thesis. University at Buffalo, 1999.
Kiss, Katalin. “Identificational versus information focus.” Language 74.2 (June 1998): 245-273.
Klemensiewicz, Zbigniew. Lokalizacja podmiotu i orzeczenia w zdaniach izolowanych. Biuletyn PTJ 9
(1949): 8-19.
Lambrecht, Knud. Information Structure and Sentence Form: a theory of topic, focus, and the mental
representations of discourse referents. New York: Cambridge University Press, 1994.
Mathesius, Vilem. Functional linguistics. In M. Mayenova, ed., O spojnosci tekstu, pp. 121-42. Warsaw:
1987.
Siewierska, Anna. “Syntactic weight vs. information structure and word order variation in Polish.”
Journal of Linguistics 29.2 (1993): 233-266.
Szabolcsi, Anna. “The semantics of topic-focus articulation.” In Jan Groenendijk, Theo Janssen, and
Martin Stokhof (eds.), Formal methods in the study of language, pp. 513-41. Amsterdam:
Mathematisch Centrum, 1981.
Szober, Stanislaw. Gramatyka języka polskiego. Warsaw: PWN, 1963.
Szwedek, Aleksander. Word Order, Sentence Stress and Reference in English and Polish. Edmonton:
Linguistic Research, Inc, 1976.
Van Valin, Robert and Randy LaPolla. Syntax: Structure, meaning and function. New York:
Cambridge University Press, 1997.
Willim, Ewa. On word order: a government binding study of English and Polish. Krakow: Uniwersytet
Jagellonski, 1989.
DAVID GIL
1. INTRODUCTION
What kinds of meanings may be expressed by intonation? There is general
agreement that intonation may convey emotions, and, related to this, speakers’
attitudes towards the propositional content of utterances. It is also well-known that
certain intonation contours may be associated with specific speech acts such as
questions. Moreover, as reflected by the title of this volume, intonation may encode
various pragmatic functions such as topic and focus.
Another, rather more indirect way in which intonation may express meanings is
via its relationship to syntactic structure. In general, intonation contours parse an
utterance into intonation groups, which correspond closely, albeit not always
perfectly, to syntactic constituents. However, in many cases, a given string of words
may be associated with two or more different constituent structures, each of which
in turn is associated with a different meaning. In such cases, the different syntactic
structures and corresponding meanings may be reflected by different intonation
groups.
Nevertheless, the range of meanings expressible by intonation is highly
constrained. For example, no language has intonation contours which, when applied
to any sentence, add meanings such as past tense, ‘in the rain’, or ‘because John
came to the party’. Thus, a major goal of any theory of intonation must be to
determine the set of meanings potentially encodable by intonation in one or more
human languages.
This paper contributes to the above goal through the examination of one specific
semantic domain, namely thematic roles: actor, undergoer, goal and the like. Most
commonly, thematic roles are encoded with various morphosyntactic features,
typically some combination of word order, case marking and verbal agreement. One
might wonder whether there are any languages in which thematic roles can also be
expressed by means of intonation. This paper addresses the question through an
empirical examination of intonation and thematic roles in one particular language,
namely the Riau dialect of Indonesian. The results of the study are negative: no
evidence is found that might point towards any correlation between intonation and
thematic roles in Riau Indonesian. This, in turn, is suggested to lend greater cogency
to the question whether in fact it is possible in any language for thematic roles to be
encoded by intonation.
C. Lee et al. Topic and Focus: Cross-linguistic Perspectives on Meaning and Intonation, 41–68.
© 2007 Springer.
Speakers of Hebrew occasionally claim that the two meanings can be distinguished
by intonation. But when asked how, they do not provide systematic answers. In
general, the most readily available interpretation is that in which the actor precedes
the undergoer, as in (1/i) above. In order to obtain the less readily available
interpretation, that in (1/ii), speakers of Hebrew sometimes offer a distinctive
intonation contour, involving greater pitch variation and greater duration for certain
syllables. However, when questioned, they will generally concede that even with the
distinctive intonation contour, the sentence can also be understood as in (1/i); and
then they will often admit that even with an ordinary intonation contour, the
sentence can also be understood as in (1/ii). Similar facts are reported also for Persian
and other Middle-Eastern languages by Stilo (1984, personal communication).
As suggested by the above, there would seem to be a rather striking mismatch
between the widespread conviction that intonation can be used to differentiate
between thematic roles, and the absence of any detailed empirical studies testing the
veracity of such claims. To the best of my knowledge, then, this paper represents the
first attempt to subject the possible relationship between intonation and thematic
roles to systematic empirical investigation.
3. RIAU INDONESIAN
Riau Indonesian is the variety of Indonesian spoken in informal situations by the
inhabitants of Riau province in east-central Sumatra. Riau Indonesian is quite
INTONATION AND THEMATIC ROLES IN RIAU INDONESIAN 43
usir
send.away
[Complaining about his younger brother Pai, who won’t have
anything to do with him]
‘His very own brother he sent away’
In each of the above examples, a word denoting an activity is in boldface, and its
two associated participants are in italics. In (2) the activity word occurs before its
two participants, in (3) it occurs between them, and in (4) it occurs after them both.
Within each of the three sentence pairs, the activity word is the same; however, the
actor precedes the undergoer in the first sentence while following it in the second
sentence. Thus, in (2a) actor aku ‘I’ precedes undergoer laser ‘laser’ while in (2b)
actor aku ‘I’ follows undergoer nasi goreng ‘fried rice’; in (3a) actor saya ‘I’
precedes undergoer kaca mata ‘glasses’ while in (3b) actor abang Elly ‘Elly’
follows undergoer Honda ‘motorcycle’; and in (4a) actor si Pai ‘Pai’ precedes
undergoer aku ‘I’ while in (4b) actor dia ‘he’ follows undergoer abang dia sendiri
‘his very own brother’. Thus, each of the three sentence pairs constitutes a near
minimal pair illustrating the indeterminacy of thematic role assignment. Together,
sentences (2) - (4) show that in a basic sentence consisting of activity, actor and
undergoer, these three items may occur in any of the six possible orders. Similar
facts obtain also with respect to other thematic roles. Examples such as the above
occur frequently in the corpus; other similar examples are cited in Gil (1994:181,
1999:191-193, 2002b:246-249). Thus, sentences such as these point towards the
conclusion that in Riau Indonesian, grammar does not provide any obligatory
grammatical means for distinguishing between thematic roles.3
Given the kind of indeterminacy present in examples such as the above, it is only
natural to wonder whether intonation might play a role in differentiating between
various interpretations. In fact, practically every time I have presented examples
such as the above in lectures, somebody in the audience has asked whether it isn’t
perhaps the case that different interpretations involving different assignments of
thematic roles might be distinguishable by means of different intonation contours.
However, the answer to this question is a simple, straightforward ‘no’: intonation
does not and cannot differentiate between different assignments of thematic roles in
Riau Indonesian. Thus, for example, in sentences such as those in (2) - (4), there are
no systematic differences between the intonation contours of the (a) sentences, in
which the actor precedes the undergoer, and the (b) sentences, in which the actor
follows the undergoer.
Here the matter should rest, but unfortunately it does not always do so. Rather,
many scholars continue to hold steadfast to the belief that intonation must
distinguish thematic roles in Riau Indonesian, and in other varieties of Malay/
Indonesian. (Some of the possible reasons behind the persistence of this belief are
discussed in Gil 2003). However, not a single one of these scholars, when
challenged, has been able to formulate an explicit description of exactly how
intonation can be used to distinguish thematic roles, and to the best of my knowledge,
no such account appears anywhere in the linguistic literature on Malay/Indonesian.
The closest to an explicit proposal that I have come across is perhaps the
following. (The claim is stated in my own words, and constitutes my interpretation
of one or two suggestions made by colleagues in informal discussions.) In general,
in Riau Indonesian, there is a significant tendency for undergoers to follow
activities, as in (2) and (3a) above. Accordingly, when undergoers precede
activities, as in (3b) and (4), this unusual word order is signalled by a pause
occurring right after the undergoer. Within a generative framework involving
movement, this generalization might be restated as follows: when an undergoer is
fronted to a higher position in the clause, a pause occurs between it and the clause
from which it was extracted. This “pause proposal” at least constitutes an explicit
hypothesis which can be examined in the face of the facts. But as shown in Section 6
below, it is clearly false.4
4. TWO HYPOTHESES
So what needs to be done in order to finally put such claims to rest? Three methods
suggest themselves. First, one might use elicitation, and ask native speakers for their
judgements of sentences exhibiting various possible pairings of intonation contours
and thematic roles. Secondly, one might construct experiments, which would
present native speakers with various tasks requiring them to make use of
intonational cues in order to distinguish thematic roles. Thirdly, one might study
naturalistic corpora, and search for possible correlations between intonation
contours and thematic roles. While each of these three methods is in principle
equally valid, this study chooses to make use of the third method, involving
naturalistic corpora. The reasons for this choice are entirely practical. On the one
hand, elicitation and experiments are particularly problematical in the study of Riau
Indonesian. As a regional colloquial language variety, Riau Indonesian stands in a
basilect-to-acrolect relationship with Standard Indonesian. Put a speaker of Riau
Indonesian in what is perceived to be a learnèd setting such as an elicitation session
or a controlled experiment, and he or she is likely to switch to Standard Indonesian,
no matter how clearly and repeatedly the investigator has asked the speaker to use
“ordinary language”, that is to say, Riau Indonesian. On the other hand, in Riau
Indonesian an extensive naturalistic corpus is available, containing recordings of
speech from many different speakers in a variety of settings, including narrative and
conversational. Accordingly, the present study makes use of the third method,
examining a naturalistic corpus for possible correlations between intonation
contours and thematic roles.
Two specific hypotheses are examined:
Both of the above hypotheses negate the claim that intonation distinguishes
between thematic roles in Riau Indonesian. However, the second hypothesis is
stronger than the first: one can envisage a state of affairs in which the first
hypothesis holds but the second one fails, but not vice versa. As we shall see in
Section 6 below, the naturalistic corpus provides overwhelming support for the
weaker Hypothesis A, and substantial support for the stronger Hypothesis B.
Accordingly, the results of this study lead to the conclusion that intonation does not
differentiate thematic roles in Riau Indonesian.
The bisyllabic nature of the Riau Indonesian word raises the issue of word
stress. As observed by Tadmor (1999, 2000), word stress in Malay and Indonesian
presents a thorny problem, with different scholars often providing conflicting
descriptions. Thus, for example, van Ophuijsen (1915) claims that stress is on the
final syllable, Amran (1984:60) maintains that it is on the penultimate, while Kähler
(1956:37) asserts that it is either on the final syllable (if the penultimate is a schwa)
or on the penultimate (in all other cases). One possible source for these
discrepancies might be that different scholars are unwittingly describing different
regional and/or social varieties of Malay/Indonesian. Thus, Tadmor (1999, 2000)
shows a tendency for word stress in Malay/Indonesian to progress from final, in
the western parts of the archipelago, towards penultimate, in the eastern regions,
reflecting a similar progression in the local languages, which often constitute
substrates for the regional varieties of Malay/Indonesian. Another possible source
for these inconsistencies could well be that Malay/Indonesian has no word stress.
In such a case, the patterns that are being described may be present in the
investigator’s ear but not in the language itself, as is suggested by Goedemans and
van Zanten (to appear). Alternatively, the patterns described may be phonetically
real, but pertaining not to word stress but rather to intonational prominence, as is in
fact suggested in the continuation of this section. Indeed, for Riau Indonesian, I am
not familiar with any positive evidence supporting the existence of a privileged
syllable which could be characterized as the locus of word stress. In this sense, then,
Riau Indonesian may be appropriately characterized as lacking word stress.
Nevertheless, while Riau Indonesian words lack a privileged syllable, there is
strong evidence for the presence of a privileged bisyllabic unit, which may be
referred to as the core foot. As represented in (6) below, the core foot (F) consists
of two syllables (S), each of which consists in turn of an onset (O) plus a rhyme (R):
(6)          S          S
           O   R      O   R
           m   a      k   an           ‘eat’
                      m   i            ‘noodles’
     ke    p   i      t   ing          ‘crab’
           b   e      l   i     kan    ‘buy’
     di               c   at           ‘paint’
Most words, such as makan ‘eat’, are bisyllabic and thus coextensive with most
or all of the core foot. A few shorter monosyllabic words, such as mi ‘noodles’,
occupy only the second syllable of the foot, while a small number of longer words,
such as kepiting ‘crab’, occupy the entirety of the core foot plus additional space
preceding it. Clitics, when present, invariably occur outside of the core foot, either
after it, for example the end-point marker -kan in belikan ‘buy’, or before it, for
example the undergoer marker di- in dicat ‘paint’. The core foot is thus what
underlies the basic bisyllabic nature of Riau Indonesian words. However, the
existence of the core foot is also supported by a number of additional independent
phenomena.
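The template in (6) can be restated as a small data structure. The following Python sketch is only an illustration: the class name `FootParse`, its field names, and the hand segmentation of the five words are assumptions introduced here (read off the diagram in (6)), not part of the original analysis.

```python
from typing import NamedTuple


class FootParse(NamedTuple):
    """A hand-segmented word, following the layout of (6) (hypothetical encoding)."""
    pre: str      # material before the core foot (extra syllable or proclitic)
    onset1: str   # onset of the first core-foot syllable
    rhyme1: str   # rhyme of the first core-foot syllable
    onset2: str   # onset of the second core-foot syllable
    rhyme2: str   # rhyme of the second core-foot syllable
    post: str     # material after the core foot (enclitic)

    def surface(self) -> str:
        # Concatenate all six slots to recover the written word.
        return "".join(self)

    def core_foot(self) -> str:
        # Only the four slots that belong to the core foot.
        return self.onset1 + self.rhyme1 + self.onset2 + self.rhyme2

    def fills_foot(self) -> bool:
        # True when both syllables of the core foot contain some material.
        first = bool(self.onset1 or self.rhyme1)
        second = bool(self.onset2 or self.rhyme2)
        return first and second


# Segmentations assumed from the diagram in (6).
EXAMPLES = {
    "makan": FootParse("", "m", "a", "k", "an", ""),
    "mi": FootParse("", "", "", "m", "i", ""),
    "kepiting": FootParse("ke", "p", "i", "t", "ing", ""),
    "belikan": FootParse("", "b", "e", "l", "i", "kan"),
    "dicat": FootParse("di", "", "", "c", "at", ""),
}

for word, parse in EXAMPLES.items():
    print(f"{word:10} core foot: {parse.core_foot():8} fills foot: {parse.fills_foot()}")
```

On this encoding, monosyllabic stems such as mi and cat leave the first syllable of the core foot empty, which is the configuration that the expansion phenomena discussed in this section are taken to repair.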
One such phenomenon involves patterns of reduction in fast connected speech.
Typically, as shown in (7) below, material belonging to the core foot is retained,
while preceding material may undergo partial or complete deletion:
(7)          S          S
           O   R      O   R
     p     s   a      w   at           → [psawat] ~ [sawat] ‘airplane’
     tang  k   e      r   ang          → [taNkeraN] ~ [NkeraN] ~ [keraN] ‘[place name]’
Whereas the above phenomenon involves the contraction of overly long words, a
number of others involve the expansion of words that are too short to fill the core
foot.
One such phenomenon pertains to the personal marker si, which marks
expressions as constituting names of people:
(8)          S          S
           O   R      O   R
     si    t   o      p   an           → [sitopan] ~ [stopan] ~ [topan] ‘[name]’
           s   i      p   an           → [sipan], *[span], *[pan] ‘[name]’
Before bisyllabic names, such as Topan, the personal marker si is optional, and,
when present, it may undergo reduction of the kind exemplified in (7); this is shown
in the first line underneath the tree diagram in (8) above. However, names also
possess a monosyllabic familiar form derived by truncation; for example Pan from
Topan.5 Often, this form is used vocatively; however, it is also used in non-vocative
functions, in which case the use of the personal marker si is obligatory; this is
shown in the second line in (8). Thus, one of the functions of the personal marker si
is to expand the monosyllabic familiar form of the name to fill the core foot.
A similar phenomenon involves words with what might be characterized as a
defective penultimate rhyme. For this purpose it is necessary to acknowledge the
existence of two subdialects of Riau Indonesian, which may be referred to as the
schwa dialect and the schwaless dialect respectively. In the former dialect,
the schwa ə is part of the phonemic inventory, though even in this dialect, it never
occurs in the final syllable. Of interest here however is the second, or schwaless
dialect, in which there is no phonemic schwa. Consider the way in which a word
containing a schwa in the schwa dialect, [bəsar] ‘big’, is realized in the schwaless
dialect:
(9)          S          S
           O   R      O   R
           b          s   ar           → [bs`ar] ~ [bəsar] ~ [besar] ‘big’
phonetic schwa [ə], or a full mid-high front vowel [e] (phonetically identical to the
mid-high front vowel phoneme). This range of possibilities can be most
appropriately accounted for by positing a segmental melody bsar occupying the
core foot as per (9) above, with an empty penultimate rhyme position which is
subsequently filled either by backward spreading of the sibilant s or by epenthesis
of a schwa or full vowel. Thus, these phonological processes, spreading and
epenthesis, beef up an impoverished segmental melody, thereby enabling the word
to extend across the entire core foot.
An analogous though somewhat less systematic phenomenon involves loan
words which, in the source language, are monosyllabic:
(10)         S          S
           O   R      O   R
               o          om           < Dutch oom ‘uncle’
           g   o      l   op           < English golf
As suggested by the above examples, such monosyllabic words are often expanded
to form bisyllabic words in Riau Indonesian, though the strategies by which such
expansion is achieved are idiosyncratic and unpredictable. However, a particular
subclass of such cases, in the schwaless subdialect, make use of the same processes
of spreading and epenthesis that apply, as in (9) above, to native words:
(11)         S          S
           O   R      O   R
           s   (n)    tr  um           → [strum] ~ [sətrum] ~ [setrum] ~
                                         [s n`trum] ~ [sətrum] ~ [sentrum]
                                         < Dutch stroom ‘electric current’
           s          m   ek           → [sm`ek] ~ [səmek] ~ [semek]
                                         < English smack
In the first example, the borrowing of Dutch stroom involves the optional
introduction of a nasal stop n, followed by various combinations of spreading and
epenthesis. In the second example, the borrowing of English smack involves either
the spreading of the nasal stop m or epenthesis. In general, evidence from borrowing
may be open to alternative interpretations, since the path from source to target
language could potentially involve any number of intermediate way stations, with
the word in question actually entering Riau Indonesian from another variety of
Indonesian, already in bisyllabic form. However, in at least one case, smek < smack,
it may be safely surmised that the word entered Riau Indonesian directly from
English. This is because the borrowing was actually observed to take place, in the
late 1990’s, via television, immediately following the introduction into US
professional wrestling (hugely popular throughout Indonesia) of the brand name
Smack Down. Accordingly, this latter example provides clear-cut evidence for the
relevance of the core foot as a factor governing the incorporation of loan words into
Riau Indonesian.
The final phenomenon supporting the core foot comes from the Warasa ludling,
a secret language in which the sequence war- is inserted at the beginning of each
word.6 In (12) below the results are shown of applying the ludling to the words
represented in (6) above:
(12)         S          S
           O   R      O   R
     wa    r   a      k   an           makan → warakan ‘eat’
     wa    r          m   i            mi → waremi ‘noodles’
     wa    r   i      t   ing          kepiting → wariting ‘crab’
     wa    r   e      l   i     kan    belikan → warelikan ‘buy’
     wa    r          c   at           dicat → warecat ‘paint’
As shown in (12) above, the sequence war- is inserted into a position that is defined
structurally, with reference to the core foot: r occupies the first onset of the core
foot with wa immediately preceding it. The effect of adding war- to a word thus
depends crucially on the size of the original word. For most words, which are
bisyllabic, adding war- involves deletion of the first consonant, if the word begins
with a consonant, for example makan → warakan. However, for monosyllabic
words, adding war- involves not deletion but rather the further insertion of an
epenthetic vowel, for example mi → waremi. Conversely, for polysyllabic words,
adding war- involves the deletion not just of the first consonant of the penultimate
syllable, but of any and all preceding material, for example kepiting → wariting.
For stems combined with an enclitic, the ludling ignores the enclitic and treats the
stem as though it constituted the entire word, for example belikan → warelikan. In
contrast, for stems combined with a proclitic, adding war- involves the deletion of
the proclitic, and treats the remainder of the word as though the clitic were absent;
for example dicat → warecat, with the further insertion of an epenthetic vowel.
Thus, as shown in (12) above, the application of the Warasa ludling relies crucially
on the core foot, thereby providing yet additional evidence for its central role in the
structure of the Riau Indonesian word.
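The behaviour of the ludling described above can be summarized procedurally. The sketch below is a hedged illustration rather than a formal statement of the analysis: the `FootParse` record, the function name `warasa`, and the hand segmentation of each word are assumptions introduced here, and the epenthetic vowel is taken to be e on the basis of the forms waremi and warecat.

```python
from typing import NamedTuple


class FootParse(NamedTuple):
    """Hand-segmented word (hypothetical encoding of the core-foot template)."""
    pre: str     # material before the core foot (extra syllable or proclitic)
    onset1: str  # first onset of the core foot
    rhyme1: str  # first rhyme of the core foot
    onset2: str  # second onset of the core foot
    rhyme2: str  # second rhyme of the core foot
    post: str    # enclitic following the core foot


def warasa(word: FootParse, epenthetic_vowel: str = "e") -> str:
    """Apply the ludling as described in the text.

    - r replaces whatever occupies the first onset of the core foot;
    - wa immediately precedes it, so all pre-foot material is dropped;
    - an empty first rhyme is filled by an epenthetic vowel;
    - the rest of the core foot and any enclitic are kept unchanged.
    """
    rhyme1 = word.rhyme1 or epenthetic_vowel
    return "wa" + "r" + rhyme1 + word.onset2 + word.rhyme2 + word.post


# Segmentations assumed from the core-foot diagrams.
EXAMPLES = {
    "makan": FootParse("", "m", "a", "k", "an", ""),
    "mi": FootParse("", "", "", "m", "i", ""),
    "kepiting": FootParse("ke", "p", "i", "t", "ing", ""),
    "belikan": FootParse("", "b", "e", "l", "i", "kan"),
    "dicat": FootParse("di", "", "", "c", "at", ""),
}

for source, parse in EXAMPLES.items():
    print(f"{source} -> {warasa(parse)}")
```

Note that the function never inspects the length of the word: the different effects on monosyllabic, bisyllabic and longer words all fall out from referring to the slots of the core foot alone.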
Thus, a number of independent phenomena support the existence of a core foot
underlying the structure of the word in Riau Indonesian. Although, as noted in the
beginning of this section, Riau Indonesian has no privileged syllable which could be
characterized as the locus of word stress, the core foot does constitute a privileged
unit, albeit of a larger size. As such, Riau Indonesian may be characterized as being
endowed with a somewhat more abstract variety of word stress, whose locus is not
the syllable, as in most typical instances of stress, but rather the bisyllabic core foot.
As we shall see in Section 5.3 below, the characterization of the core foot as bearing
word stress may account also for properties of focus intonation.
track pad, and the player often ends up under the table; the first time
this happened, I jokingly asked him whether he was looking at the
mice; when this happened once again, speaker joked]
‘I’m looking at the mice’
In order to facilitate the intended interpretation, the above sentence was associated
with an intonation contour which effected the grouping [Tengok tikus] aku.
However, in a different context, a different intonation contour could have been used
to effect a different grouping, Tengok [tikus aku], which would have a quite
different meaning, ‘Looking at my mice’. It should be acknowledged, however, that
the above sentence may also be uttered with intonation contours that do not reflect
any internal constituent structure and hence do not disambiguate between the two
potentially available meanings.
Perhaps the most noticeable characteristic of intonation groups is final
prominence. Within each intonation group, the final syllable is accented, thereby
providing a salient marker of intonation phrase boundaries. Thus, for example, in
(13) above, the grouping [Tengok tikus] aku was effected by accent on the final
syllable of the intonation group, namely kus. As in many other languages, accent is
realized by a combination of phonetic features including greater pitch variation,
greater intensity and greater duration. However, compared to some other languages,
the contribution of greater duration would appear to be relatively larger. Examples
(14) and (15) illustrate the phenomenon of phrase-final lengthening, with durations
indicated in milliseconds:
370 700
760 1230
the above examples are quite typical of the way in which final lengthening may be
exaggerated in order to increase the affective expressiveness of the utterance.
For the unwary investigator, one of the consequences of final prominence in
intonation groups is that it gives rise to the illusion of final word stress. For
example, in a situation involving elicitation, where the researcher asks what the
word for such-and-such is, the speaker typically responds with a one-word utterance
bearing the final-prominent intonation contour. This sounds like final word stress;
however, it is important to keep in mind that the suprasegmental pattern is not a
property of the word, but rather of the entire utterance, which just happens to consist
of a single word. Mistaken analyses of final-prominent intonation contours as word
stress are apparently responsible for the probably erroneous characterization of
many related Malayic language varieties of Sumatra as possessing final lexical
stress, for example Nurzuir et al (1985:32-33) for Jambi, Umar et al (1986:28) for
Muko-Muko, and Suwarni et al (1989:80) for Lintang.
(16)         S          S
           O   R      O   R
           M   A      K   AN           ‘eat’
                      M   I            ‘noodles’
     ke    P   I      T   ING          ‘crab’
           B   E      L   I     kan    ‘buy’
     di               C   AT           ‘paint’
Example (16) above illustrates the domain of focus intonation for the words previously
illustrated in (6) and (12); in this and subsequent examples, the domain of
focus intonation is indicated with upper case letters. As shown in (16), the domain
of focus intonation coincides precisely with the core foot, as supported by the
various phenomena discussed in Section 5.1 above.
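The upper-case notation can be reproduced mechanically once a word has been divided into pre-foot material, core foot, and post-foot material. The following sketch assumes such a three-way division is supplied by hand; the function name `mark_focus_domain` and the segmentations are illustrative assumptions, not part of the original description.

```python
def mark_focus_domain(pre: str, core_foot: str, post: str = "") -> str:
    """Upper-case the core foot only, leaving surrounding material unchanged."""
    return pre + core_foot.upper() + post


# (pre-foot material, core foot, post-foot material), segmented by hand.
WORDS = [
    ("", "makan", ""),      # 'eat'
    ("", "mi", ""),         # 'noodles'
    ("ke", "piting", ""),   # 'crab'
    ("", "beli", "kan"),    # 'buy'
    ("di", "cat", ""),      # 'paint'
]

for pre, foot, post in WORDS:
    print(mark_focus_domain(pre, foot, post))
```

Clitics and pre-foot syllables stay in lower case because, on the analysis given here, they lie outside the domain over which focus intonation is realized.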
The phonetic realizations of focus intonation are distributed unevenly over the
two syllables of the core foot. The most salient feature of focus intonation involves
the lengthening of the rhyme of the first syllable of the core foot, and sometimes
also the onset of the second syllable. (In some varieties of Riau Indonesian, the
onset of the second syllable may be lengthened if and only if it is other than an oral
stop, while for other varieties, more influenced by a Minangkabau substrate, the
onset of the second syllable may be lengthened no matter what its contents are.)
This lengthening is generally associated with a level pitch contour. At the same
time, focus intonation is also reflected by pitch prominence and secondary
lengthening on the rhyme of the second syllable of the core foot. Some examples of
focus intonation are given in (17) and (18) below, with durations again indicated in
milliseconds:
(18) Rekam LA GI
record again
[Seeing me turn the laptop computer recorder on]
‘Recording again’
Each of the above examples consists of a single intonation group. In (17), focus
intonation falls on the first word, payah. In this example, focus intonation is
reflected primarily by the length of the first syllable plus second onset, pay, totalling
780 msec. The second rhyme, ah, is also relatively long, and in addition bears
salient pitch prominence. The remainder of the intonation group follows the usual
pattern of final prominence, with three short syllables followed by a final much
longer one, ni. In (18), focus intonation falls on the second word, lagi. Here, once
more, focus intonation is reflected by the length of the first syllable, la, totalling 750
msec., but in this case the second syllable gi is even longer, showing the combined
effect of secondary lengthening due to focus plus the regular final prominence of
the intonation group.7
This particular constellation of features, involving lengthening of a penultimate
syllable followed by some kind of pitch accent on the final syllable, is not peculiar
to Riau Indonesian. In the Jakarta dialect of Indonesian, focus intonation occurs
more frequently than in Riau Indonesian, and its phonetic realization is more
pronounced; so much so that when speakers from Riau attempt to imitate a Jakarta
accent, one of the things that they do is exaggerate the frequency and the phonetic
properties of focus intonation. Outside of Malay/Indonesian, penultimate
lengthening coupled with some kind of final accentuation has been reported, among
others, for the Formosan language Amis (Edmundson, Huang and Pahalaan 2001),
for various Micronesian languages (Rehg 1993), and for the Polynesian language
Marquesan (Margaret Mutu, personal communication), thereby suggesting that the
feature may be of considerable antiquity within the Austronesian language family.
Just as final prominence in intonation groups sometimes creates the illusion of
final word stress, so focus intonation and concomitant penultimate lengthening may
occasionally give rise to an unwarranted impression of penultimate word stress,
at least in those cases where penultimate lengthening is more salient to the
investigator’s ear than final pitch accent. For example, such a misanalysis is what
underlies some descriptions of Minangkabau, for example Zarbaliev (1987:23) and
Adelaar (1992:12), as having penultimate word stress, even though in reality the
suprasegmental patterns of Minangkabau are largely identical to those of Riau
Indonesian. In some other dialects, such as Jakarta Indonesian, focus intonation and
penultimate lengthening are often used in place of the final-prominent intonation
contour in the context discussed earlier, where, in response to being asked what the
word for such-and-such is, the speaker responds with a one-word utterance. This
use of focus intonation thus contributes further to a characterization of Malay/Indonesian
as having penultimate word stress. However, in actual fact, focus
intonation and the way in which duration and pitch prominence split across the two
syllables of the core foot provide additional support for the claim that in Riau
Indonesian, as in many other related varieties, word stress is present not at the
domain of the syllable but rather at the level of the entire foot, with respect to which
it occurs in fixed position, falling invariably on the core foot.
INTONATION AND THEMATIC ROLES IN RIAU INDONESIAN 57
The above four contours span much of the variety that is in evidence in the
intonational patterns of Riau Indonesian, though of course they do not exhaust it.
For declarative statements and imperatives, additional intonation contours may
involve more complex configurations containing two or more intonation groups,
focus, and pauses; however, as complexity increases, these intonation contours
become less and less frequent. Alternatively, other intonation contours of a
qualitatively different nature include those associated with certain specific sentence-
final particles, and also with other kinds of speech acts such as polar and
information questions, and direct quotation. Nevertheless, the above four basic
intonation contours suffice to give the proponents of a correlation between
intonation and thematic roles a good run for their money: if such a correlation did
exist, it is most likely that it would involve at least one of the above four contours.
The four basic intonation contours are examined with respect to a set of basic
sentence patterns defined in terms of an activity in construction with a single
associated participant. The participant in question may either precede or follow the
activity, and it may be associated with the thematic roles of either actor or
undergoer. Resulting from these two binary choices are the following four basic
sentence patterns:
58 DAVID GIL
Again, the above four basic sentence patterns do not exhaust the inventory of
sentence patterns in Riau Indonesian. However, it is reasonable to suppose that if
intonation did distinguish thematic roles, its effect would be observable with respect
to at least some of the above basic sentence patterns.
The four basic intonation contours in (19) and the four basic sentence patterns in
(20) may be combined to yield sixteen potentially possible pairings of intonation
contours and sentence patterns. These sixteen pairings are represented in the sixteen
cells of Table 1. (In Table 1, letters a, p and v stand for actor, undergoer and
activity respectively, upper case letters denote focus intonation, while ø represents a
pause between intonation groups.)
Intonation contour             Participant before activity    Participant after activity
A: Pause, no focus             aØv (21a) ↔ pØv (21b)          vØa (22a) ↔ vØp (22b)
B: No pause, no focus          av (23a) ↔ pv (23b)            va (24a) ↔ vp (24b)
C: No pause, initial focus     Av (25a) ↔ Pv (25b)            Va (26a) ↔ Vp (26b)
D: No pause, final focus       aV (27a) ↔ pV (27b)            vA (28a) ↔ vP (28b)
If intonation did differentiate thematic roles, then one would expect an uneven
distribution of utterances across the table, with, crucially, some empty cells,
reflecting impossible pairings of intonation contours and sentence patterns.
Conversely, if intonation does not differentiate thematic roles, then one would
expect to find utterances exemplifying all of the potential pairings of intonation
contours and thematic roles, with no empty cells in the table.
The facts are quite clear. Even a cursory examination of a small subset of the
naturalistic corpus turns up examples of all sixteen potential pairings of intonation
contours and sentence patterns: there are no empty cells in the table. Thus, there is
no correlation between the intonation contours defined in (19) and the sentence
patterns represented in (20): intonation does not differentiate thematic roles in Riau
Indonesian.
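
The logic of this argument can be sketched in a few lines of Python. The sketch is illustrative only: the names `CONTOURS`, `PATTERNS`, `ATTESTED` and `empty_cells` are hypothetical, and the attested cells are simply the example numbers listed in Table 1, not the corpus itself.

```python
from itertools import product

# The four contours and four sentence patterns cross-classified in Table 1.
CONTOURS = ["A", "B", "C", "D"]
PATTERNS = ["av", "pv", "va", "vp"]  # a = actor, p = undergoer, v = activity

# Example numbers as given in Table 1: contour A is (21)-(22), B is (23)-(24), etc.
ATTESTED = {}
for i, contour in enumerate(CONTOURS):
    before, after = 21 + 2 * i, 22 + 2 * i
    ATTESTED[(contour, "av")] = f"{before}a"
    ATTESTED[(contour, "pv")] = f"{before}b"
    ATTESTED[(contour, "va")] = f"{after}a"
    ATTESTED[(contour, "vp")] = f"{after}b"


def empty_cells(attested):
    """Return the (contour, pattern) pairings that have no attested example."""
    return [cell for cell in product(CONTOURS, PATTERNS) if cell not in attested]


print(len(list(product(CONTOURS, PATTERNS))))  # 16 potential pairings
print(empty_cells(ATTESTED))                   # [] : no empty cells
```

Had any pairing been unattested, `empty_cells` would return it, which is the kind of gap that a correlation between intonation and thematic roles would predict.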
In examples (21)-(28) below, each of the sixteen pairings of intonation contours
and sentence patterns is illustrated with an utterance from the naturalistic corpus; for
easy cross-referencing, the number of each example is shown in the appropriate cell
in the table. As in examples (2)-(4) previously, the activity word is in boldface,
while the relevant associated participant is in italics. (In some of the examples, the
pairing of intonation contour and sentence pattern extends over just part of a larger
utterance; in such cases, the remaining parts of the utterance are enclosed in
parentheses. Breaks between intonation groups, either within the relevant part of the
utterance or outside of it, are represented with commas.)
dah Vid
PFCT FAM|David
[From narrative about village boy and sparrowhawk; boy has fallen
off a bridge into a mangrove tree]
‘He was caught there, he was safe, he fell asleep, the boy Yung’
jauh ‘kan )
far Q
(Intonation contour D)
[Playing billiards on laptop computer]
‘The white ball’s gone in’
dia bilang)
3 say
[From horror story about ungrateful son who tries to rob his mother’s
tomb; at the end of the story, the mother’s ghost tries to snatch her
son’s hand]
‘“I want your hand” she said’
Each of the above eight numbered examples presents a near minimal pair, as close a
contrast as one is likely to find in a naturalistic corpus. Within each pair, the
intonation contours are the same, the relative orders of activity and participant are
the same, but the thematic role of the participant is different: whereas in the first, or
(a) example, the participant is an actor, in the second, or (b) example, it is an
undergoer. Thus, each of these minimal pairs shows that for a particular intonation
contour and a particular sentence pattern, the intonation contour in question fails to
differentiate between thematic roles, allowing a certain participant to be understood
either as an actor, in the first member of the pair, or as an undergoer, in the second.
For example, (21) shows that Intonation Contour A does not differentiate
between actors and undergoers when these occur in a position preceding an activity.
Similarly, (23) shows that Intonation Contour B does not distinguish between actors
and undergoers when these come before an activity. Thus, examples (21) and (23)
refute the “pause proposal”, discussed in Section 3 above, which suggests that when
an undergoer precedes an activity, it must be followed by a pause. Such, indeed, is
the case in (21b); however, in (23b), an undergoer also precedes an activity and here,
contrary to the pause proposal, there is no pause (and there are many more examples
like this in the corpus). Moreover, in (21a) there is a pause, even though here it is an
actor rather than an undergoer that precedes the activity. Thus, examples such as
these show that when the participant in question occurs before the activity, the
presence or absence of a pause plays no role whatsoever in distinguishing actors
from undergoers.
In conjunction, then, examples (21) - (28), and many others like them in the
corpus, show quite clearly that intonation plays no role in the differentiation of
thematic roles in Riau Indonesian. To the extent that the four basic sentence patterns
in (20) are representative of the variety of sentence patterns in the language, the
above examples provide overwhelming support for Hypothesis A, as formulated in
(5a), suggesting that for each sentence there is at least one intonation contour which
7. CONCLUSION
The results of this paper underscore the need for linguistic descriptions to avoid
Eurocentric assumptions with regard to the expressive power of languages. Just
because thematic roles are central to the grammatical organization of many familiar
languages does not mean that they are of equal importance in all of the world’s
languages. Riau Indonesian shows how a language can manage just fine, fulfilling a
wide range of communicative functions, without any obligatory grammatical means
for distinguishing between thematic roles, whether through word order, case marking,
agreement, or intonation.
More specifically, the absence of any relationship between intonation and
thematic roles in Riau Indonesian provides reinforcement for previous descriptions
of the language which have argued that it is lacking in many of the categories that
are considered to be central to the grammatical organization of most other languages.
The reader may have noted that no mention was made at any point in this paper of
parts of speech (such as noun and verb), syntactic categories (such as noun phrase
and verb phrase), or grammatical relations (such as subject, direct object, indirect
object, and so forth). Indeed, in Gil (1994, 1999, 2000, 2001a,b, 2002b, 2005)
it is argued that such categories are absent in Riau Indonesian. As statements of
non-existence, such claims can be readily refuted, by showing how a single
grammatical generalization makes reference to the category in question. Conversely,
such claims can be supported only in gradual incremental fashion, through the
examination, one after the other, of a wider and wider range of phenomena, each of
which can in turn be accounted for without reference to the categories in question.
In the case at hand, the absence of any correlation between intonation and thematic
roles adds further to the plausibility of the claim that Riau Indonesian does not
possess any categories whose definitions make reference to thematic roles, such as
grammatical relations, or whose prototypical characteristics involve thematic roles
in any way, such as parts of speech and syntactic categories.
How would the grammar of Riau Indonesian work in the absence of so many
commonplace grammatical categories? Following are syntactic and semantic
representations for a typical Riau Indonesian sentence, example (2a) above, ‘I’ll buy
a laser’. (For ease of exposition, the final particle ’kan in (2a) is ignored.)
(29) syntactic representation:   [S [S beli] [S aku] [S laser]]
                                     BUY      1:SG    LASER
     semantic representation:    A (BUY, 1:SG, LASER)
8. NOTES
*
I would like to thank all my colleagues who asked whether intonation differentiates thematic roles in
Riau Indonesian, and/or insisted and perhaps still insist that it does, for providing me with the impetus to
write this paper. In particular, I am indebted to Peter Cole, Gabriella Hermon and Uri Tadmor for
numerous discussions on the issues dealt with in this paper, and to Matt Gordon for constructive
comments on an earlier draft. I am especially grateful to the many speakers of Riau Indonesian who
provided the naturalistic data on which this paper is based: Arief, Benny, Danzha Selpas, Desrul,
Ellyanto, Dwiarpianto, Fuad, Jumbro, Junaidi, Muchlis, Pai, Per, Riki, Rudy Chandra,
Septianbudiwibowo, Wira, Zainudin. Versions of this paper were presented at the Fifth International
Symposium on Malay/Indonesian Linguistics, Leipzig, Germany, 17 June 2001; at Topic and Focus: A
Workshop on Intonation and Meaning, University of California, Santa Barbara, CA, USA, 21 July 2001;
and at the Ninth Annual Meeting of the Austronesian Formal Linguistics Association, Cornell University,
Ithaca, NY, USA, 26 April 2002; I would like to thank participants at all three events for their helpful
comments and suggestions.
1
In addition to Riau Indonesian, some of the data cited in this paper show evidence for interference from
Siak Malay, the dialect of Malay spoken in the lower part of the Siak river basin, in Riau province. Riau
Indonesian and Siak Malay share a considerable degree of mutual intelligibility; in fact, in some cases it
is difficult to determine whether a given utterance is in one dialect or the other. Although this paper
focuses on Riau Indonesian, all of its main points are equally germane also for Siak Malay.
2
The interlinear glosses in this paper make use of the following abbreviations: AG ‘agent’; ASSOC
‘associative’; DEIC ‘deictic’; DEM ‘demonstrative’; DIST ‘distal’; DISTR ‘distributive’; EP ‘end point’; EXCL
‘exclamation’; FAM ‘familiar’; M ‘masculine’; NEG ‘negative’; PERS ‘personal’; PFCT ‘perfect’; PROX
‘proximal’; PST ‘past’; Q ‘question’; SG ‘singular’; 1 ‘first person’; 3 ‘third person’.
3
Readers familiar with Malay / Indonesian may be wondering about the well-known “voice markers” and
whether they might perhaps be involved in the differentiation of thematic roles. In Riau Indonesian, the
relevant forms di- and N- are indeed present; however, their use is optional, and, crucially, they do not
help to differentiate thematic roles: sentences with di- or N- (or even both) remain indeterminate with
respect to thematic roles (see Gil 1999, 2002b for examples and detailed discussion). Perhaps the most
productive means for differentiating thematic roles in Riau Indonesian is provided by the form sama,
which can mark participants in any thematic role except that of absolutive, thereby discriminating
between roles such as, for example, actor and undergoer, by overtly marking the former. However, even
this form is optional; moreover, it is only very weakly grammaticalized, and is actually more
appropriately considered as an ordinary “content” word with a very broad and abstract meaning centered
around the notion of togetherness (see Gil 2004, for examples and argumentation).
4
Another proposal occasionally mentioned in discussions of intonation and clause structure in Malay /
Indonesian is that of Chung (1978), pertaining to a language variety that she refers to as “informal
Indonesian”, but which is actually closer to Standard Indonesian than to any of the regional colloquial
varieties (including those of Jakarta and Bandung, from where her speakers hailed). Chung is concerned
with a particular sentence pattern of the form AVP (Agent - Activity - Patient), where the V is devoid of
any morphological voice marking. For a subset of such sentences, those in which the A is a pronoun or a
proper noun, she maintains that two distinct intonation contours are available, which she calls “normal
declarative” and “subject shifting”. She then claims that these two intonation contours correspond to two
different syntactic analyses of the sentence in question, as “active” and “passive” respectively. In the
latter case, her suggestion involves the following derivation. First, an active sentence with AVP order
undergoes passivization (of the variety known in Indonesian studies as the pasif semu, or “second
passive”), resulting in a structure of the form PAV, where the P assumes some subjecthood properties,
and the A is cliticized to the V. Next, the P undergoes subject shifting, a process which moves subjects to
the end of the sentence, in this case restoring the original AVP order. Although it may seem as though
we’re back where we started, Chung asserts that such sentences are passive, and cites as evidence the
purported “subject shifting” intonation contour associated with such constructions. Whether or not the
facts are as described, and whether or not the analysis provided is the most appropriate one to account for
such facts, Chung’s proposal does not involve any suggestion to the effect that intonation may
differentiate thematic roles, since both intonation contours are associated with the same assignment of
thematic roles. Indeed, this could hardly be otherwise, since, in the variety of Indonesian described by
Chung, there is no thematic role indeterminacy of the kind illustrated in (2) - (4), and in particular no
sentences of the form PVA such as in (3b).
5
In general, in the derivation of such monosyllabic forms, the lighter of the two syllables is omitted,
while the heavier one is retained – where the weight of the respective syllables is defined in terms of the
number of segments they contain and their position on the sonority hierarchy, greater sonority
corresponding to lesser weight. Thus, in the above example, pan is heavier than to by dint of the
additional coda segment n; hence the familiar form of Topan is Pan, not To.
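
The truncation rule in this note can be sketched as follows. The note does not say how segment count and sonority are combined, so the sketch assumes that segment count is compared first and that summed sonority breaks ties, with greater sonority meaning lesser weight. The numeric sonority scale, the one-letter-per-segment spelling, and the names `weight` and `familiar_form` are likewise illustrative assumptions.

```python
# Assumed coarse sonority scale (higher = more sonorous); illustrative only.
_CLASSES = [("ptkbdgcj", 1), ("fvszh", 2), ("mn", 3), ("lr", 4), ("wy", 5), ("aeiou", 6)]
SONORITY = {seg: rank for segs, rank in _CLASSES for seg in segs}


def weight(syllable):
    """Weight as (segment count, negated summed sonority): an assumed ordering."""
    segs = syllable.lower()
    return (len(segs), -sum(SONORITY[s] for s in segs))


def familiar_form(syllables):
    """Keep the heavier of two syllables and drop the lighter one."""
    first, second = syllables
    return max(first, second, key=weight).capitalize()


print(familiar_form(("to", "pan")))  # Pan
```

Under these assumptions pan outweighs to because of its additional coda segment, so Topan yields Pan, as stated in the note.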
6
The name of the ludling, Warasa, is derived by application of the ludling in question to the Malay /
Indonesian word bahasa ‘language’. This and other Riau Indonesian ludlings are described in detail in
Gil (2002a).
7
Occasionally, focus intonation occurs in a variant form, which might appropriately be referred to as
super-focus. Phonetically, super-focus has all the special features of ordinary focus, plus an additional
one, lip rounding on the lengthened penultimate syllable. Semantically, super-focus adds emphasis and
affective force; one common usage of super-focus is with scalar adjectives, where it lends itself to
translation into English with an accented intensifier such as “very”.
8
As far as I can tell, there are no systematic differences between the intonation contours of declarative
statements and imperatives. In fact, there would seem to be no grammatical differences whatsoever
distinguishing between sentences used to perform these two particular speech acts.
9. REFERENCES
Adelaar, K. Alexander. Proto-Malayic: The Reconstruction of Its Phonology and Parts of Its Lexicon and
Morphology, Pacific Linguistics Series C – 119. Canberra: The Australian National University, 1992.
Amran Halim. Intonasi dalam Hubungannya dengan Sintaksis Bahasa Indonesia, Seri ILDEP di bawah
Redaksi W.A.L. Stokhof. Jakarta: Penerbit Djambatan, 1984.
Chung, Sandra. “Stem Sentences in Indonesian.” In S.A. Wurm and L. Carrington (eds.), Second
International Conference on Austronesian Linguistics: Proceedings, Fascicle 1, Western
Austronesian, Pacific Linguistics Series C - No. 61, pp. 335-365. Canberra: Australian National
University, 1978.
Edmundson, Jerold A., Tung-Chiou Huang and Akiyo Pahalaan. “Phonological Strengthening in
Hsiukuluan Amis of Taiwan”, Paper presented at the Eleventh Annual Meeting of the Southeast
Asian Linguistics Society, Mahidol University, Bangkok, Thailand, 17 May 2001.
Gil, David. “The Structure of Riau Indonesian.” Nordic Journal of Linguistics 17 (1994): 179-200.
Gil, David. “Riau Indonesian as a Pivotless Language.” In E.V. Raxilina and Y.G. Testelec (eds.),
Tipologija i Teorija Jazyka, Ot Opisanija k Objasneniju, K 60-Letiju Aleksandra Evgen’evicha
Kibrika (Typology and Linguistic Theory, From Description to Explanation, For the 60th Birthday of
Aleksandr E. Kibrik), pp. 187-211. Moscow: Jazyki Russkoj Kul’tury, 1999.
Gil, David. “Syntactic Categories, Cross-Linguistic Variation and Universal Grammar.” In P. M. Vogel
and B. Comrie (eds.), Approaches to the Typology of Word Classes, Empirical Approaches to
Language Typology, pp. 173-216. New York: Mouton, 2000.
Gil, David. “Creoles, Complexity and Riau Indonesian.” Linguistic Typology 5 (2001a): 325-371.
Gil, David. “Escaping Eurocentrism: Fieldwork as a Process of Unlearning.” In P. Newman and M.
Ratliff (eds.), Linguistic Fieldwork, pp. 102-132. Cambridge: Cambridge University Press, 2001b.
Gil, David. “Ludlings in Malayic Languages: An Introduction.” In Bambang Kaswanti Purwo (ed.),
PELBBA 15, Pertemuan Linguistik (Pusat Kajian) Bahasa dan Budaya Atma Jaya: Kelima Belas,
Jakarta: Unika Atma Jaya, 2002a.
Gil, David. “The Prefixes di- and N- in Malay / Indonesian Dialects.” In F. Wouk and M. Ross (eds.), The
History and Typology of Western Austronesian Voice Systems, pp. 241-283. Canberra: Pacific
Linguistics, 2002b.
Gil, David. “Intonation Does Not Differentiate Thematic Roles in Riau Indonesian.” In A. Riehl and T.
Savella (eds.), Proceedings of the Ninth Annual Meeting of the Austronesian Formal Linguistics
Association (AFLA9), Cornell Working Papers in Linguistics 19 (2003): 64-78.
Gil, David. “Riau Indonesian sama, Explorations in Macrofunctionality.” In M. Haspelmath (ed.),
Coordinating Constructions (Typological Studies in Language 58), pp. 371-424. Amsterdam:
John Benjamins, 2004.
Gil, David. “Word Order Without Syntactic Categories: How Riau Indonesian Does It.” In A. Carnie, H.
Harley and S.A. Dooley (eds.), Verb First: On the Syntax of Verb-Initial Languages,
pp. 243-263. Amsterdam: John Benjamins, 2005.
Goedemans, Rob and Ellen van Zanten. “Stress and Accent in Indonesian.” In D. Gil (ed.), Studies in
Malay and Indonesian Linguistics. London: Curzon Press, to appear.
Kähler, Hans. Grammatik der Bahasa Indonesia. Wiesbaden: Otto Harrassowitz, 1956.
Nespor, Marina and Irene Vogel. Prosodic Phonology. Dordrecht: Reidel, 1986.
Nurzuir Husin, Zailoet, M. Atar Semi, Isma Nasrul Karim, Desmawati Radjab and Djurip. Struktur
Bahasa Melayu Jambi. Jakarta: Pusat Pembinaan dan Pengembangan Bahasa, 1985.
Rehg, Kenneth L. “Proto-Micronesian Prosody.” In J.A. Edmondson and K.J. Gregerson (eds.), Tonality
in Austronesian Languages, pp. 25-46. Oceanic Linguistics Special Publication No. 24. Honolulu:
University of Hawaii Press, 1993.
Stilo, Don. “Alternative Devices for Object Marking in Middle Eastern SOV Languages”, Paper
presented at the Middle East Studies Association of North America, San Francisco, CA, USA,
29 November - 1 December 1984.
Suwarni Nursato, Sutari Harifin, Zainin Wahab, Nangsari Ahmad and Homsen Nanung. Fonologi dan
Morfologi Bahasa Lintang. Jakarta: Pusat Pembinaan dan Pengembangan Bahasa, 1989.
Tadmor, Uri. “Can Word Accent Be Reconstructed in Malay?”, Paper presented at Third International
Symposium on Malay / Indonesian Linguistics, Amsterdam, The Netherlands, 24 August 1999.
Tadmor, Uri. “Rekonstruksi Aksen Kata Bahasa Melayu.” In Yassir Nasanius and Bambang Kaswanti
Purwo (eds.), PELBBA 13, Pertemuan Linguistik (Pusat Kajian) Bahasa dan Budaya Atma Jaya:
Ketiga Belas, Pusat Kajian Bahasa dan Budaya, pp. 153-167. Jakarta: Unika Atma Jaya, 2000.
Umar Manan, Zainuddin Amir, Nasroel Malano, Anas Syafei and Agustar Surin. Struktur Bahasa Muko-
Muko. Jakarta: Pusat Pembinaan dan Pengembangan Bahasa, 1986.
Van Ophuijsen, Ch. A. Maleische Spraakkunst. Leiden: van Doesburgh, 1915.
Zarbaliev, X.M. Jazyk Minangkabau. Moscow: Nauka, 1987.
MATTHEW GORDON
1. INTRODUCTION
While the realization of focus in languages which express focus either syntactically
or prosodically or through a combination of both prosody and syntax has been
studied relatively extensively, e.g. English (Beckman and Pierrehumbert 1986),
Korean (Cho 1990, Jun 1993), Chichewa (Kanerva 1990), Bengali (Hayes and
Lahiri 1991, Lahiri and Fitzpatrick-Cole 1999), Shanghai Chinese (Selkirk and Shen
1990), Hungarian (Horvath 1986, Kiss 1998), Hausa (Inkelas and Leben 1990),
there is very little work on languages which mark focus morphologically through
affixes or particles attached to or adjacent to focused elements. Of particular
interest is the question of whether languages with morphological marking of focus
also utilize prosodic cues to signal focus, much as languages with special word
orders associated with focus may redundantly use prosody to cue focus. In their
study of Wolof, a language which marks focus morphologically, Rialland and
Robert (2001) claim that Wolof does not use intonation to signal focus redundantly.
Beyond this study of Wolof, however, there is little phonetic literature dealing with
the prosodic manifestation of focus in languages with morphological expression of
focus. It is thus unclear to what extent languages that mark focus morphologically
tend to also employ prosodic cues to focus.1
This study attempts to broaden our understanding of the phonetics of focus by
examining prosodic cues to focus in Chickasaw, a language like Wolof with
morphological marking of focus. A number of potential pitch and duration cues to
contrastive focus are examined to determine whether Chickasaw redundantly uses
both prosody and morphology to mark focus.
2. BACKGROUND ON CHICKASAW
Chickasaw is a Western Muskogean language spoken by no more than a few
hundred predominantly elderly speakers in south-central Oklahoma. Chickasaw has
been the subject of extensive work by Pamela Munro and colleagues. Munro (2005)
provides a grammatical overview of Chickasaw and includes an analyzed text of a
traditional Chickasaw story. Munro and Willmond (1994) is a dictionary that also
contains a thorough description of Chickasaw grammar. Gordon et al. (2000)
provides a quantitative phonetic description of Chickasaw and Gordon (1999, 2005)
C. Lee et al. Topic and Focus: Cross-linguistic Perspectives on Meaning and Intonation, 69–82.
© 2007 Springer.
70 MATTHEW GORDON
2.1. Intonation
Chickasaw utterances may be divided into a hierarchically ordered set of prosodic
constituents (Gordon 1999, 2005). The largest clearly defined intonational unit is the
Intonation Phrase, which is marked by an f0 excursion at its right edge, typically an f0
rise in statements and an f0 fall in questions. An Intonation Phrase consists of one or
more Accentual Phrases (APs), which are canonically associated with a [LHHL] tone
sequence when there is sufficient material in the phrase. The initial L tone is aligned with
the left edge of the Accentual Phrase, and the first H tone occurs early in the
Accentual Phrase, typically falling on or near the second sonorant mora, with
considerable gradience in its alignment. The final two tones usually associate with
the final syllable, yielding an f0 fall on the final syllable. Stressed final syllables, those
containing a coda consonant or a long vowel (see Gordon 2002 on stress in Chickasaw),
may not realize the final low tone, however. A short Accentual Phrase, one with fewer
than three sonorant moras, may also not realize all the tones of a canonical AP, with
deletion of the initial or final L being the typical strategy for truncating the AP. An
AP with three sonorant moras is usually sufficient to realize all tones, though a
two-syllable AP with three sonorant moras may not realize all its tones. Schematic
examples of the realization of tones in an AP appear in (1).
(1)
a. Monomoraic first syllable    b. Bimoraic first syllable    c. Short AP
   L H H L                         L H H L                       H L
   [ µ µµ σ ]AP                    [ µµ σ σ σ ]AP                [ µ µ ]AP
   naSoÚba…t                       nambilaÚma/                   fala
2.2. Focus
Chickasaw has at least two types of focus markers (Munro and
Willmond 1994) which are suffixed to focused nouns and differ according to
whether the focused element is a syntactic subject or an object. The first focus
suffix, -ho…t when attached to subjects and –ho when affixed to objects, is termed
a “focus/inferential case ending” by Munro and Willmond (1994:liv) and will not be
discussed further in this paper. The focus of this paper is the contrastive focus
CONTRASTIVE FOCUS IN CHICKASAW 71
suffix, which is realized as -akot with subjects and as -ako)… with objects (Munro
and Willmond 1994:liii). Although the precise semantic conditions that give rise to the
contrastive focus are not completely understood, one of its primary functions is to
attract narrow focus to the noun which it modifies. There is no comparable suffix
affixed to verbs to signal narrow focus on the verb. Sentences exemplifying the
contrastive focus suffixes and their counterparts lacking contrastive focus marking
appear in (2).
As the sentences in (2) indicate, non-focused subjects are marked with the suffix
-at, while non-focused objects may either have no overt suffix or be marked with the
suffix -a)…. The unmarked word order in Chickasaw is SOV, though other orders
are possible under certain as yet not well-understood semantic conditions, including
focus, which may be associated with fronting of the focused element. For example,
sentence (2c) could appear with a fronted object, i.e. koniako)… hat…akat pisa ‘The
man sees THE SKUNK’.
3. PRESENT STUDY
3.1. Methodology
The present study examines the prosodic realization of sentences involving
contrastive focus on subjects and objects. Data were collected during elicitation
sessions with individual speakers. Subjects were presented with English sentences
containing a subject, object, and verb and instructed to give the Chickasaw
equivalent. Focus was elicited by offering English translations emphasizing the
focused element. Three different focus conditions were elicited: one involving
broad focus, i.e. no special focus on any particular element, one with narrow focus
on the subject and one with narrow focus on the object. Subjects repeated each
sentence between three and five times. The corpus used in the experiment appears
in Table 1.
NO FOCUS
Speakers 1-4
hat…akat naSo…bai pisa The man sees the wolf.
hat…akat ampaska pisa The man sees my bread.
hat…akat wa…ka/ pisa The man sees the cow.
hat…akat hopa…ji/ pisa The man sees the fortune teller.
Speaker 5
na…hol…a…t naSo…ba pisa…tok The white man saw the wolf.
na…hol…a…t ampaska pisa…tok The white man saw my bread.
na…hol…a…t wa…ka/ pisa…tok The white man saw the cow.
na…hol…a…t hopa…ji/ pisa…tok The white man saw the fortune teller.
SUBJECT FOCUS
na…hol…a…kot a)…nampaka)…li/ pisa…tok THE WHITE MAN saw my flower.
na…hol…a…kot minko/ pisa(…tok) THE WHITE MAN sees (saw) the chief.
na…hol…a…kot ofo)…lo pisa…tok THE WHITE MAN saw the owl.
OBJECT FOCUS
na…hol…a…t minka…ko)… pisa…tok The white man saw THE CHIEF.
na…hol…a…t amofo)…la…ko)… pisa…tok The white man saw MY OWL.
na…hol…a…t sat…iba…piSiako)… pisa…tok The white man saw MY BROTHER.
Data were collected and analyzed for a total of five female speakers. Four of the
speakers were recorded in Oklahoma in 1996, while the remaining speaker was
recorded in Los Angeles in 2002. Subjects were recorded on DAT tape while
wearing a high-quality noise-cancelling head-mounted microphone. Data were then
transferred onto computer using Scicon MacQuirer at a sampling rate of 22.5 kHz.
Two measurements that could potentially distinguish different focus conditions
prosodically were made using the MacQuirer software. First, the average
fundamental frequency for each of the three words comprising each sentence was
calculated to determine whether focused words are produced with heightened pitch
relative to postfocus elements, a common prosodic realization of focus cross-
linguistically. Second, the duration of the pause between the subject and object and
between the object and verb was measured to ascertain the degree of juncture
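
The two measurements described above can be sketched as follows. This is a minimal illustration rather than the procedure actually implemented in MacQuirer: the word boundaries and f0 samples are hypothetical, the function names `mean_f0` and `pauses` are invented for the example, and unvoiced frames are assumed to be coded as 0 Hz.

```python
from statistics import mean


def mean_f0(track, start, end):
    """Mean f0 (Hz) over voiced frames in [start, end); None if there are none."""
    values = [hz for t, hz in track if start <= t < end and hz > 0]
    return mean(values) if values else None


def pauses(words):
    """Gaps (s) between consecutive words, given (label, start, end) tuples."""
    return [nxt[1] - cur[2] for cur, nxt in zip(words, words[1:])]


# Hypothetical annotation of one subject-object-verb utterance (not measured data).
words = [("subject", 0.0, 0.5), ("object", 0.75, 1.25), ("verb", 1.25, 1.75)]
# Hypothetical f0 track as (time in s, f0 in Hz); 0.0 marks an unvoiced frame.
track = [(0.1, 200.0), (0.3, 210.0), (0.8, 190.0), (1.0, 0.0), (1.5, 180.0)]

for label, start, end in words:
    print(label, mean_f0(track, start, end))
print(pauses(words))  # subject-object gap, then object-verb gap
```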
4. RESULTS
p = .0369. Finally, verbs were found to have lower f0 values in sentences with
narrow subject focus than broad focus sentences: t(1,4) = 3.033, p = .0387. The
data recorded from this speaker did not allow for measurement of f0 values for verbs
in sentences with narrow object focus. Interestingly, a tendency to lower f0 of verbs
in sentences with narrow focus also was observed in speaker 1, though this effect
did not reach significance for this speaker.
Speaker 4 also raised f0 values for subjects in narrow focus sentences:
t(2,21) = 2.748, p = .0120. Sentences with narrow focus on the object were not
recorded from this speaker. Focus did not impact f0 values for either objects or
verbs for speaker 4.
Speaker 5 was the only speaker for whom subject narrow focus and object
narrow focus were differentiated both from each other and from broad focus along
the f0 dimension. Interestingly, for this speaker, f0 values for subjects were highest
in object focus sentences (184Hz on average), and lowest in broad focus sentences
(158Hz on average), with intermediate values obtaining in subject focus sentences
(165Hz on average). Values differed significantly from each other between the
three focus conditions: broad focus vs. narrow subject focus, t(2,27) = 2.056,
p = .0495; broad focus vs. narrow object focus, t(2,22) = 3.919, p = .0007; narrow
subject focus vs. narrow object focus, t(2,21) = 2.811, p = .0105. Speaker 5 also
raised f0 for objects under focus relative to unfocused objects in both broad focus
sentences, t(2,23) = 3.176, p = .0042 and sentences with narrow focus on the
subject, t(2,23) = 2.456, p = .0220. Objects did not differ reliably in f0 between
broad focus and narrow subject focus sentences. Differences in focus condition did
not significantly affect f0 values for verbs.
Figures 1-3 illustrate sentences uttered by speaker 5 with three different focus
conditions. The sentence in Figure 1 is realized with broad focus, that in Figure 2
with narrow focus on the subject, and that in Figure 3 with narrow focus on the object.
As the figures show, the sentence with object focus (Figure 3) is associated with a
blanket raising of f0 for the subject and object (and to a lesser extent, the verb,
though this is not a consistent property of object focus). Subject focus (Figure 2)
triggers a raising of f0 in the subject relative to the subject in the broad focus
sentence (Figure 1) but not relative to the subject in the object focus sentence. It
may also be observed that the broad focus sentence in Figure 1 differs in prosodic
constituency from the two sentences with a narrowly focused element. The subject and
object together form a single Accentual Phrase when neither is focused but belong to
different Accentual Phrases when either one is focused.
Figure 1. Pitch track for broad focus sentence naːholːaːt minkoʔ pisaːtok ‘The white man saw
the chief.’
Figure 2. Pitch track for subject focus sentence naːholːaːkot minkoʔ pisaːtok ‘THE WHITE
MAN saw the chief.’
Figure 3. Pitch track for object focus sentence naːholːaːt minkaːkõː pisaːtok ‘The white man
saw THE CHIEF.’
Table 2. Average f0 results for individual speakers (in Hertz, N=narrow focus)
                        Speaker
Word      Focus       1     2     3     4     5
Subject   Broad     191   192   160   192   158
Subject   N-subj    205   201   183   210   165
Subject   N-obj     204   204   188  ----   184
Object    Broad     199   189   166   202   159
Object    N-subj    195   196   177   203   164
Object    N-obj     205   198   180  ----   181
Verb      Broad     216   199   187   187   164
Verb      N-subj    203   206   149   191   166
Verb      N-obj    ----  ----  ----  ----   174
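The averages in Table 2 can be turned into per-speaker differences between each narrow focus condition and broad focus. The following is a minimal sketch of that arithmetic; the names `F0` and `diffs` are introduced here for illustration only, and the resulting differences are raw averages that say nothing about the significance tests reported in the text.

```python
# Average f0 (Hz) for speakers 1-5, copied from Table 2; None marks a missing cell.
F0 = {
    "subject": {"broad":  [191, 192, 160, 192, 158],
                "n_subj": [205, 201, 183, 210, 165],
                "n_obj":  [204, 204, 188, None, 184]},
    "object":  {"broad":  [199, 189, 166, 202, 159],
                "n_subj": [195, 196, 177, 203, 164],
                "n_obj":  [205, 198, 180, None, 181]},
    "verb":    {"broad":  [216, 199, 187, 187, 164],
                "n_subj": [203, 206, 149, 191, 166],
                "n_obj":  [None, None, None, None, 174]},
}

def diffs(word, condition):
    """Per-speaker f0 change (Hz) from broad focus to the given narrow focus condition."""
    broad = F0[word]["broad"]
    narrow = F0[word][condition]
    return [None if n is None else n - b for b, n in zip(broad, narrow)]

for word in F0:
    for condition in ("n_subj", "n_obj"):
        print(word, condition, diffs(word, condition))
```

On these averages, subject f0 is higher under both narrow focus conditions than under broad focus for every speaker with data.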
In summary, both subject and object narrow focus consistently triggered raising
of f0 in subjects. One speaker also displayed raising of f0 in objects under both object narrow
focus and subject narrow focus sentences. In addition, f0 for verbs was lowered
in sentences involving narrow focus for two speakers. Somewhat surprisingly,
object narrow focus and subject narrow focus were only differentiated for one
speaker in terms of average f0 values. For this speaker, object focus triggered
raising of f0 for the focused object, as one might expect. However, this speaker also
curiously displayed higher f0 values for subjects in sentences with object focus than
for subjects under narrow focus themselves.
4.2. Duration
A two factor (syntactic category and focus condition) ANOVA pooling results from
all five speakers indicated a significant effect of both syntactic category and focus
on the pause duration between words in sentences: for syntactic category, F(1,284) =
6.200, p = .0133; for focus condition, F(2,284) = 11.242, p<.0001. There was also
a significant interaction between the two factors: F(2,284) = 23.029, p<.0001.
Overall, the pause between subject and object was shortest in broad focus sentences
and longest in sentences with narrow focus on the object. In contrast, the pause
between object and verb was shortest in object focus sentences and longest in
sentences with broad focus. Results averaged across speakers appear in Figure 4.
Figure 4. Pause durations in milliseconds for post-subject and post-object pauses under three
different focus conditions (all speakers pooled together, bars represent one standard deviation
from mean)
Table 3. Pause duration results for individual speakers (in milliseconds, N=narrow focus)
                         Speaker
Pause      Focus       1     2     3     4     5
Post-subj  Broad      44    13     0     8    93
Post-subj  N-subj    316   313    41   132    75
Post-subj  N-obj     235   244    88  ----   143
Post-obj   Broad      66   144    97   125    64
Post-obj   N-subj     25   123   104   104    69
Post-obj   N-obj      96    61    90  ----    56
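The ordering of the focus conditions described above can be checked against Table 3 by averaging the per-speaker values for each condition. This is only a sketch: the names `PAUSE_MS` and `condition_means` are introduced here, and an unweighted mean of speaker averages merely approximates the pooled results plotted in Figure 4.

```python
from statistics import mean

# Pause durations (ms) for speakers 1-5, copied from Table 3; None marks a missing cell.
PAUSE_MS = {
    "post_subject": {"broad":  [44, 13, 0, 8, 93],
                     "n_subj": [316, 313, 41, 132, 75],
                     "n_obj":  [235, 244, 88, None, 143]},
    "post_object":  {"broad":  [66, 144, 97, 125, 64],
                     "n_subj": [25, 123, 104, 104, 69],
                     "n_obj":  [96, 61, 90, None, 56]},
}

def condition_means(position):
    """Unweighted mean pause per focus condition, skipping missing cells."""
    return {cond: mean(v for v in values if v is not None)
            for cond, values in PAUSE_MS[position].items()}

for position in PAUSE_MS:
    means = condition_means(position)
    shortest = min(means, key=means.get)
    longest = max(means, key=means.get)
    print(position, means, "shortest:", shortest, "longest:", longest)
```

Under this approximation the post-subject pause is shortest under broad focus and longest under object focus, while the post-object pause shows the reverse ordering.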
In summary, broad focus was typically associated with a very close degree of
temporal juncture between subjects and objects (with zero or nearly zero pause after
the subject for speakers 2, 3, 4), while the two narrow focus conditions were not
consistently differentiated in terms of their effect on the duration of pauses after the
subject. The two narrow focus sentence types were, however, differentiated in their
effect on the level of juncture between object and verb. Objects carrying narrow
focus were followed by very short pauses relative to unfocused objects both in
sentences with broad focus and sentences with narrow focus on the subject. These
patterns, though dominant, were not entirely consistent across speakers.
Speaker 5 differed from the other speakers in terms of pause durations after the
subject, whereas speaker 1 differed from the other speakers in her results for post-
object pauses. It should also be noted that the increased temporal proximity between
a focused object and verb observed for most speakers is not associated with
elimination of the Accentual Phrase boundary typically separating most lexical
items greater than two syllables in sentences lacking any narrow focused element.
As Figures 1-3 show, the first syllable of the verb is realized with low tone, the initial
tone of a Chickasaw Accentual Phrase, which characteristically has the tonal pattern
[LHHL] (Gordon 1999, 2005).
4. DISCUSSION
This paper has shown that Chickasaw marks contrastive focus not only
morphologically but also through prosody. The strategies employed by Chickasaw
to mark focus prosodically resemble those exploited by other languages in some
respects but differ from them in others. Both narrow object
focus and narrow subject focus were characteristically associated with raised
f0 values for subjects, and, for one speaker, objects as well. Only one speaker
differentiated narrow object focus and narrow subject focus, however: for this
speaker, f0 values were higher for focused objects than non-focused objects. The
raising of f0 of subjects in both sentences with narrow subject focus and sentences
with narrow object focus is an unusual feature of Chickasaw, as increased f0 is
characteristically associated with only the focused element in most languages,
including English (Beckman and Pierrehumbert 1986), Korean (Jun 1993), and Hausa
(Inkelas and Leben 1990). The dominant cross-linguistic pattern entailing localized
raising of f0 under focus was found only for a single Chickasaw speaker. Even this
speaker, however, displayed higher f0 values for subjects in sentences with narrow
object focus than in sentences with narrow subject focus. It thus seems that raising
of f0 is a general strategy for signalling any type of focus in Chickasaw and is not a
reliable cue to picking out which element is being focused. It is also worth noting
that two speakers displayed lowering of f0 in verbs in sentences with narrow focus
on either the subject or object. This pattern may be viewed as similar to the
deaccenting of words in the same intermediate phrase following a focused element
in English (Beckman and Pierrehumbert 1986), though focus leads only to a blanket
lowering of f0 in verbs in Chickasaw and does not actually lead to suppression of
the nuclear pitch accent in an IP final verb.
Chickasaw’s use of duration to signal focus follows, in some respects, a pattern
typical of other languages. A focused object increases the temporal proximity of the
object and following verb, a pattern similar to that found in Korean (Cho 1990, Jun
1993). It is important to note, however, that while a focused object triggers deletion
of the Accentual Phrase boundary between an object and following verb in Korean,
the change in temporal proximity of object and verb in Chickasaw is not necessarily
associated with a change in prosodic constituency. An Accentual Phrase boundary
may still separate a focused object from the following verb, just as it typically
separates an unfocused object from the following verb. It is conceivable, however, that examination of
more data will reveal a statistically greater likelihood for focused objects to be
grouped together in an Accentual Phrase with the following verb. Thus, it is as yet
unknown whether the temporal effects induced by placing narrow focus on the
object in Chickasaw are purely phonetic or whether the increased temporal closeness
of a focused object and verb has ramifications for prosodic constituency.
Another temporal phonetic effect triggered by narrow focus is increased
separation between the subject and object. For all but one speaker, this enhanced
level of disjuncture is associated with narrow focus on either the subject or the object
and often has phonological ramifications for Accentual Phrase formation: the
subject and object are more likely to be grouped in the same Accentual Phrase when
neither carries narrow focus than when either one does. Although the symmetry of
this effect under both narrow focus conditions, subject focus and object focus, is
atypical from a cross-linguistic standpoint, it serves to set off the focused element
from adjacent words, perhaps increasing its prominence. In the case of a focused
object, the increased pause before the object complements the decreased pause
following the object. For two speakers, the pause preceding a focused object is
greater than the pause preceding an unfocused object, both in sentences without
narrow focus and in sentences with a focused subject. The increased disjuncture
before a focused element for these speakers accords with other languages in which a
phonological phrase boundary is obligatory before a focused constituent, e.g.
Korean (Jun 1993), Hausa (Inkelas and Leben 1990), Japanese (Pierrehumbert and
Beckman 1988), and Greek (Condoravdi 1990).
5. SUMMARY
Results of this study suggest considerable diversity among Chickasaw speakers in
their prosodic realization of focus. More generally, the examined data suggest that
Chickasaw is less reliant on prosody to signal focus than other languages in which
focus is not signalled through morphology. While broad focus sentences are
characteristically differentiated from narrow focus sentences through their lower f0 in nouns and,
for certain speakers, higher f0 in verbs, f0 does not, with the exception of one
speaker, distinguish sentences with narrow focus on the subject from those with
narrow focus on the object. Interword pause durations appear more reliable in
cueing focus, with both narrow focus conditions triggering increased temporal
disjuncture between subject and object, presumably a strategy for increasing the
salience of focused elements. For three speakers, narrow object focus was
associated with increased temporal proximity of the object and verb relative to the
other two focus conditions, broad focus and narrow focus on the subject, a trend
which parallels the dephrasing of post-focus elements in other languages. For one
speaker, narrow focused objects were preceded by a longer pause than objects not
under narrow focus.
The results for Chickasaw may be contrasted with the results of Rialland and
Robert’s (2001) study of Wolof, another language with morphological expression of focus.
Rialland and Robert do not report any use of prosodic cues to focus for Wolof,
though it should be noted that their study focused on intonation, i.e. f0, the
parameter which less reliably differentiated various focus conditions in Chickasaw.
It is thus conceivable that durational cues to focus are also present in Wolof. The
present study of Chickasaw suggests that, although the role of prosodic cues to focus
may be less consistently exploited in Chickasaw than in languages without overt
focus morphology, measurable phonetic cues to focus are potentially present even in
languages in which morphology carries the primary burden in signalling focus.
6. NOTES
1
A sincere thanks to the Chickasaw speakers, who so generously provided the data discussed in this
paper, and to Pam Munro for her assistance in preparing the corpus examined in this paper, and more
generally, for her insights and suggestions related to Chickasaw prosody. Portions of the data discussed
here were collected as part of an NSF grant awarded to Peter Ladefoged and Ian Maddieson to document
the phonetic properties of endangered languages.
2
Note that the long vowels in naʃoːba ‘wolf’, hopaːjiʔ ‘fortune teller’ and pisaːtok ‘saw’ are not phonemic
long vowels but are the result of a process of rhythmic lengthening targeting a non-final vowel in an
open syllable immediately preceded by a short vowel in an open syllable (see Munro and Willmond
1994, Munro 2005 for discussion of rhythmic lengthening). Rhythmically lengthened vowels behave
parallel to phonemic long vowels phonologically and are either, depending on the speaker, identical in
length or nearly identical in length to phonemic long vowels (see Gordon et al. 2000 for phonetic data).
7. REFERENCES
Beckman, Mary and Janet Pierrehumbert. “Intonational structure in Japanese and English.” Phonology
Yearbook 3 (1986): 255-310.
Cho, Young-Mee. “Syntax and Phrasing in Korean.” In Sharon Inkelas and Draga Zec (eds.), The
Phonology-Syntax Connection, pp. 47-62. Chicago: University of Chicago Press, 1990.
Condoravdi, Cleo. “Sandhi Rules of Greek and Prosodic Theory.” In Sharon Inkelas & Draga Zec (eds.),
The Phonology-Syntax Connection, pp. 63-84. Chicago: University of Chicago Press, 1990.
Gordon, Matthew. “The Intonational Structure of Chickasaw.” Proceedings of the 14th International
Congress of Phonetic Sciences (1999): 1993-1996.
Gordon, Matthew. “Intonational phonology of Chickasaw.” In Sun-Ah Jun (ed.), Prosodic Models and
Transcription: Towards Prosodic Typology, pp. 301-330. Oxford: Oxford University Press, 2005.
Gordon, Matthew, Pamela Munro and Peter Ladefoged. “Some Phonetic Structures of Chickasaw.”
Anthropological Linguistics 42 (2000): 366-400.
Hayes, Bruce and Aditi Lahiri. “Bengali Intonational Phonology.” Natural Language and Linguistic
Theory 9 (1991): 47-96.
Horvath, Julia. Focus in the Theory of Grammar and the Syntax of Hungarian. Dordrecht: Foris, 1986.
Inkelas, Sharon and William Leben. “Where Phonology and Phonetics Intersect: the Case of Hausa
Intonation.” In John Kingston and Mary Beckman (eds.), Papers in Laboratory Phonology I:
Between the Grammar and Physics of Speech, pp. 17-34. New York: Cambridge University Press,
1990.
Jun, Sun-Ah. The Phonetics and Phonology of Korean Prosody. The Ohio State University: Doctoral
dissertation, 1993.
Kanerva, Jonni. “Focusing on Phonological Phrases in Chichewa.” In Sharon Inkelas and Draga Zec
(eds.), The Phonology-Syntax Connection, pp. 145-162. Chicago: University of Chicago Press, 1990.
Kiss, Katalin. “Identificational Focus Versus Information Focus.” Language 74 (1998): 245-273.
Lahiri, Aditi and Jennifer Fitzpatrick-Cole. “Emphatic Clitics and Focus Intonation in Bengali.” In Kager,
René and Wim Zonneveld (eds.), Phrasal Phonology, pp. 119-144. Nijmegen: University of
Nijmegen Press, 1999.
Munro, Pamela. “Chickasaw.” In H. Hardy and J. Scancarelli (eds.) Native Languages of the
Southeastern United States, pp. 114-156. Lincoln: University of Nebraska Press, 2005.
Munro, Pamela and Catherine Willmond. Chickasaw: an Analytical Dictionary. Norman: University of
Oklahoma Press, 1994.
Pierrehumbert, Janet and Mary Beckman. Japanese Tone Structure. Cambridge, Mass.: MIT Press, 1988.
Rialland, Annie and Stéphane Robert. “The intonational system of Wolof.” Linguistics 39 (2001):
893-939.
Selkirk, Elisabeth and Tong Shen. “Prosodic Domains in Shanghai Chinese.” In Sharon Inkelas and
Draga Zec (eds.), The Phonology-Syntax Connection, pp. 313-338. Chicago: University of Chicago
Press, 1990.
CARLOS GUSSENHOVEN
1. INTRODUCTION
This chapter has two aims. First, section 1.0 intends to show that the way pitch
accents express information structure in English is subject to structural constraints.
This view is contrasted with one in which the pitch accent directly signals the
information status of the word it occurs on. The second aim, pursued in section 2.0,
is to show that there isn’t just a single semantic contrast between ‘old’ and ‘new’
information: languages express various kinds of focus meanings, like reactivating
focus, contingency focus and corrective focus.
83
C. Lee et al. Topic and Focus: Cross-linguistic Perspectives on Meaning and Intonation, 83–100.
© 2007 Springer.
classics, while the focus would be narrowed down further to just classics if
the discussion was more specifically about pedantry among nineteenth-century
professors.
The variation between ‘broad’ and ‘narrow’ focus which this example shows was
earlier discussed under the rubrics of ‘normal stress’, for the ‘broadest’ case, and
‘contrastive stress’, for other cases (where ‘stress’ is equivalent to ‘accent’)
(Newman 1946; Chomsky & Halle 1968; Bresnan 1971; Bresnan 1972). This older
view held that ‘normal’ accent patterns (which were never defined, but were
assumed to be the most natural pattern when reading out an isolated sentence) were
determined by syntactic factors, but that ‘contrastive’ accent patterns arose from
independently meaningful considerations. Thus, ‘normal’ stress was believed to
yield to formal linguistic rules, while ‘contrastive’ stress was not. This position
came under attack by Bolinger (1972) and Schmerling (1974). Bolinger stressed that
all accent placements are meaningful, and that it is impossible to draw a dividing
line between ‘normal’ and ‘contrastive’ accents. In this view, all new information
implicitly contrasts with other information: the sentence I’ll give you a BOOK does
not change its structure if the implication changes from ‘I won’t give you a cd-rom’
to ‘I won’t give you anything else’, or even to ‘I won’t behave in any other way’.
These differences are gradient and non-structural, Bolinger argued (1972), and in all
three cases the accent location is determined by the speaker’s informational bias
towards the concept ‘book’.
Bolinger’s point that ‘neutral’ and ‘contrastive’ accentuations should be
explained within a single conception of information structure was welcomed by later
researchers (Schmerling 1974; Ladd 1980; Gussenhoven 1983a; Selkirk 1984). His
position was otherwise vulnerable on two counts, however. One is that ‘contrastive’
focus may actually be expressed differently from ‘neutral’ focus. In fact, when
looking at languages other than English, these differences turn out to be of two
kinds. First, ‘contrastive’ may refer to a type of focus, to be discussed as ‘corrective’
focus in section 2.0. Even if English does not always distinguish between
‘presentational focus’ (Zubizarreta 1998; Selkirk 2002) or ‘information focus’ (Kiss
1998) and corrective focus, some languages, like European Portuguese, consistently
use different forms (Frota 1998). In such cases, the equivalents of (1a) and (1b) are
not homophonous.
(1) a. (A: Has she driven any other cars besides Fords and Chevrolets?)
B: She used to drive [a RENault CLIO]FOC
1995).1 Importantly, Schmerling observed that by the side of the SV ‘news sentence’
with its unaccented verb, (3), an unaccented predicate also appears after a non-
subject argument, as in (4). Accordingly, she formulated a principle stating that, in
news sentences, accents go to the argument (the subject and the object), but not to
the predicate. Thus, the lack of accent on died and grow has the same explanation, as
has the lack of accent on hit in (5). (All examples are from Schmerling (1974).)
If we assume that ‘topic’ means ‘outside the focus’, things fall into place.
Comments are accented, topics are not; the topic in (6a) is accented
because of its position before the focus, where accents are optional. Not only do we
now account for (6a) and (6b), we can also generalize the instruction ‘accent the
comment’. Subjects and objects are ‘arguments’, as noted by Schmerling. That is,
they represent necessary elements in the semantics of the predicate, and as such
contrast with constituents that express circumstantial conditions on the predicate,
like time, space and manner adjuncts, henceforth ‘modifiers’. Schmerling’s principle
amounts to the generalization that any focused argument, predicate or modifier
forms its own comment, except the special case of the single comment formed by a
predicate that is adjacent to one of its arguments. The argument-predicate
connection seems especially clear from cases like (7a,b). Since direction adjuncts are
arguments of verbs of motion, as in (7a), no accent appears on the verb taken, but in
(7b), where the verb bury appears in combination with a place adjunct, there are two
‘comments’, and two accents appear. The fact that Independence occurs in a
Prepositional Phrase in both cases is immaterial. (Truman, again, is a topic.)
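The generalization just stated can be sketched procedurally. The code below is an illustration under stated assumptions, not a formalization taken from this chapter: the class `Constituent`, the function `focus_domains` and the role labels are introduced here, the toy inputs are invented sentences modelled only loosely on the description of (7a,b), and the optional accents on topics before the focus are not modelled.

```python
from dataclasses import dataclass

@dataclass
class Constituent:
    text: str
    role: str             # "A" = argument, "P" = predicate, "M" = modifier
    focused: bool = True  # False = outside the focus, i.e. a topic

def focus_domains(sentence):
    """Return (domains, accented) for a list of Constituent objects."""
    merged = {}  # predicate index -> index of the argument it shares a domain with
    for i, c in enumerate(sentence):
        if c.role != "P" or not c.focused:
            continue
        # Assumption: look right first, then left (an arbitrary choice for this sketch).
        for j in (i + 1, i - 1):
            if (0 <= j < len(sentence) and sentence[j].role == "A"
                    and sentence[j].focused and j not in merged.values()):
                merged[i] = j
                break
    domains, accented = [], []
    for i, c in enumerate(sentence):
        if not c.focused or i in merged:
            continue  # topics form no domain; a merged predicate stays unaccented
        members = sorted([i] + [p for p, a in merged.items() if a == i])
        domains.append([sentence[k].text for k in members])
        accented.append(c.text)  # accent on the argument, or on a lone predicate or modifier
    return domains, accented

# Invented toy inputs; "Truman" is treated as a topic in both.
taken = [Constituent("Truman", "A", focused=False),
         Constituent("was taken", "P"),
         Constituent("to Independence", "A")]   # direction adjunct counted as an argument
buried = [Constituent("Truman", "A", focused=False),
          Constituent("was buried", "P"),
          Constituent("in Independence", "M")]  # place adjunct counted as a modifier

print(focus_domains(taken))
print(focus_domains(buried))
```

In the first input the predicate joins the domain of its adjacent argument and stays unaccented; in the second the predicate and the modifier each form a domain of their own, so two accents result.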
Following Schmerling’s strategy of employing German to demonstrate the same
regularities in a language with a different word order, I give Dutch translations to
bring out the difference more clearly (Gussenhoven 1978).
Second, the semantically similar (9a) and (9b), from Gussenhoven
(1985), have accentuations that follow SAAR, not the semantics.
Third, the argument must have its head in focus. The focus constituent in (10a)
is black, and since the noun bird is outside it, the predicate cannot be deaccented. By
contrast, in (10b) the noun blackbird is included in the focus constituent, and the
pattern goes through. Both examples could be answers to a question about the well-
being of a group of birds, but only (10a) can count as a straightforward answer.
Example (10b) carries an implication of some awkward downplaying of the fact that
one of the birds has escaped (Gussenhoven 1985).
In (13a), a nonfinite clause a clock tick functions as the argument of heard in the
main clause. SAAR requires that within the nonfinite clause, the argument a clock is
accented if both it and its predicate tick are included in the focus constituent. This is
shown in (14). In the main clause, the requirement is that the argument a clock tick
is accented and its predicate heard unaccented, if both constituents are included in
the focus constituent. Since the argument is accented, on clock, the condition has
been met. The accent on clock thus functions at two levels of structure, once at the
level of a clock (tick) and once at the level of (heard) a clock tick. In the same way,
lion and a lamb are arguments of the predicate devour in (15). At the higher level,
the requirement that a lion devour a lamb, the argument of saw, be accented here is
met through the presence of two accents.
The structure of (13b) differs from (13a) in that the predicate (e.g. force,
promise, teach, tell ) takes three arguments rather than two. In (13b), there is an
object to tick and an indirect object a clock, in addition to the subject. The latter
licenses the unaccented predicate forced, while to tick forms its own focus domain.
Therefore, two accents appear in (16). When the direct object is a clause, as in (17),
SAAR applies within it: the argument devour a lamb is a clause, which has an
accent on the argument a lamb and leaves the predicate devour unaccented.
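The way a single accent can satisfy SAAR at more than one level of structure can be illustrated with a small recursive sketch. It rests on simplifying assumptions that are not part of the account above: everything is taken to be inside the focus, the subject of the main clause is left out, adjacency is not checked, and the names `Clause` and `accents` are introduced here for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Union

@dataclass
class Clause:
    predicate: str
    arguments: List[Union[str, "Clause"]] = field(default_factory=list)

def accents(clause):
    """Accented constituents of a clause that lies entirely within the focus."""
    if not clause.arguments:
        # A predicate with no argument to lean on forms its own domain and is accented.
        return [clause.predicate]
    result = []
    for arg in clause.arguments:
        if isinstance(arg, Clause):
            # A clausal argument counts as accented through the accents inside it.
            result.extend(accents(arg))
        else:
            result.append(arg)
    return result  # the predicate itself stays unaccented

heard = Clause("heard", [Clause("tick", ["a clock"])])            # cf. (13a)
saw = Clause("saw", [Clause("devour", ["a lion", "a lamb"])])     # cf. (15)
forced = Clause("forced", ["a clock", Clause("to tick")])         # cf. (13b)

for example in (heard, saw, forced):
    print(example.predicate, accents(example))
```

The accent on a clock serves both the embedded clause and the main clause in the first case, whereas in the third case to tick has no argument of its own and carries a second accent.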
Selkirk (1995) offers an alternative explanation for the difference between (13a)
and (13b), which relies on the presence of a subject trace for the verb to tick in (13a),
as shown in (18a). Assuming that a pitch accent in any event licenses focus on the
word it occurs on, her syntactic theory of focus projection, which also builds on
Rochemont (1984), postulates three projection relations that license focus for higher
constituents. First, heads license focus on phrases; second, objects (i.e. internal
arguments) license focus of the head; and third, moved constituents license their
trace (Selkirk 1995). Because subjects are assumed to be raised from their clause,
they leave a trace which is focus if the subject is focus, and the trace then projects
focus to the VP and ultimately to the whole clause. In effect, because a subject trace
is now treated as an internal argument, this procedure equates subjects with objects
for the purposes of the second projection relation. It has the additional effect of
explaining why to tick in (18a) can be unaccented, and yet be focus, since (18a) has
a trace, but (13b) has not, as shown in (18b), after Selkirk (1995). Her theory is
considerably less restrictive than the one defended here. The restriction to internal
arguments in the second projection clause would appear to be rendered vacuous by
the addition of the third clause. While in the original two-clause version subjects
were incapable of projecting focus at all, in the three-clause version subjects can
project focus to the entire clause, even in a sentence like JOHNson died of natural
causes. This seems incorrect; for discussion see Gussenhoven (1999a).
2. TYPES OF FOCUS
As Dik (1980, 1997) makes clear, languages not only express information packaging
in different ways, they also express different focus meanings, or ‘focus types’.
Unlike Culicover & Rochemont (1983), I take formal characteristics rather than
contextual differences to be the criterion for recognizing a focus type. This section
lists a number of focus types that have been distinguished in English. In each case,
the form is described, and the meaning informally characterized.
The eventive vs. definitional distinction is akin, but not identical, to the distinction
between ‘individual level’ and ‘stage level’ predicates (Kratzer 1996). Stage level
predicates involve transient qualities, as in (27a), where the redness is due to swollen
eyelids, and individual level predicates involve permanent qualities, as in (27b), where
blue refers to the colour of the iris. However, (28) shows that the eventive
interpretation may combine with the inherent colour interpretation. Genericity of the
subject, as suggested by Diesing (1992), does not explain the pattern either. In (29a),
an existential subject licenses focus on the verb, but the generic subject in (29b) does
not. However, generic subjects may occur in eventive sentences, as shown by (30),
which the keeper of the last dodo might have used to announce its demise, leaving
his listeners to infer the death of the last dodo from his communication that none in
fact survive (Gussenhoven 1983c).
Having ruled out equations between ‘eventive’ and ‘stage level’ and between
‘eventive’ and ‘non-generic’, we would of course like to have a semantic definition
of ‘eventive’ that will cover all instances of this pattern. An eventive sentence reports
a change in the world. However, there are two caveats. First, the pattern would appear
to carry some additional semantic feature of ‘non-agentive’ or ‘non-volitional’
(Faber 1987). Thus, The BAby’s crying is the expected accentuation in a reply to
‘Why are you getting up?’, but so is GRANDmother’s CRYing, where a volitional
involvement of the subject is somehow conveyed by the accent on the verb. Second,
in a case like (28), there is no change in the world to report, as observed by Daniel
Bühring (personal communication, 2003). Here, the change would appear to lie in
the announcement of the relevance of blue eyes for mate selection, or for this
particular case of mate selection. Neither of these aspects seems at all easy to
incorporate in a definition of ‘eventive’.
Under definitional focus, objects retain their power to license definitional focus
on the predicate. Definitional focus thus differs from eventive focus in disallowing
subject-predicate focus domains. The accentuation of a broad-focus SVO sentence
like JOURnalists report the NEWS therefore corresponds to that of an eventive A
JOURnalist was reporting the NEWS. The next focus type, contingency focus, bans not
only Subject-Predicate focus domains but also Predicate-Object ones, i.e. it also
requires focused predicates with adjacent accented objects to be accented. The
situation can be summarized as in (31).
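The restrictions described in this paragraph can be laid out as a small lookup table with a function that derives the accent pattern of an all-focus sentence. This is a sketch of the prose generalization only, not a reproduction of (31); the names `PERMITTED` and `accented` are introduced here for illustration.

```python
# Which predicate-argument focus domains each focus type permits, per the paragraph above.
PERMITTED = {
    "eventive":     {"subject_predicate": True,  "predicate_object": True},
    "definitional": {"subject_predicate": False, "predicate_object": True},
    "contingency":  {"subject_predicate": False, "predicate_object": False},
}

def accented(focus_type, has_object):
    """Accented constituents of an all-focus sentence with a subject, a predicate
    and, optionally, an object."""
    allowed = PERMITTED[focus_type]
    # The predicate stays unaccented only if it can share a domain with an argument.
    merged = (has_object and allowed["predicate_object"]) or allowed["subject_predicate"]
    result = ["subject"]
    if not merged:
        result.append("predicate")
    if has_object:
        result.append("object")
    return result

for focus_type in PERMITTED:
    print(focus_type, accented(focus_type, has_object=False), accented(focus_type, has_object=True))
```

With an object present, eventive and definitional focus yield the same pattern, as in JOURnalists report the NEWS, while contingency focus accents all three constituents.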
Clefting, therefore, presents a somewhat complex picture when viewed from the
perspective of information status. Since no ready generalization arises, its meaning
may not really be concerned with legitimacy or recency of information in the
background. Rather, the meaning is to exhaustively identify a constituent (Szabolcsi
1981; Kiss 1998). In (41a), the focus constituent is egy kalapot ‘a hat+ACC’. The
sentence differs from that in (41b), which also has egy kalapot in focus, in that (41a)
entails that Mary bought nothing but a hat. By contrast, in (41b) the hat may be one
of a number of items that were bought by Mary. In other words, clefting expresses
identificational focus (Kiss 1998).
The difference between (41a) and (41b) is brought out by a test attributed by Kiss to
Szabolcsi (1981). Compare (42) with (43): (42b) is semantically incompatible with
(42a), since it claims that the hat in question is the only item bought by Mary, thus
denying (42a). By contrast, no such conflict arises in the case of (43a,b), even
though the speaker of (43b) may be accused of being parsimonious with the truth.
This is true regardless of the information status of the clefted constituent. All
examples could be answers to ‘What did Mary buy?’, so that the non-clefted
constituents (that Mary bought) are unaccented, but they can also be placed in a
context in which Mary has presentational focus and the clefted constituents are old
information (in which case the examples could be answers to I wonder why no one
bought a hat or a coat or a similar item of clothing).
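The logic of the exhaustivity test can be made concrete with a toy set-based model. The sketch assumes, purely for illustration, that a situation is represented by the set of items Mary bought; the function names are introduced here and are not part of Kiss's or Szabolcsi's proposals.

```python
def information_focus_true(bought, item):
    """Non-exhaustive reading: the focused item is among the things bought."""
    return item in bought

def identificational_focus_true(bought, item):
    """Exhaustive reading: the focused item is the only thing bought."""
    return bought == {item}

# A situation in which Mary bought both a hat and a coat.
situation = {"hat", "coat"}

print(information_focus_true(situation, "hat"))       # True: no conflict arises
print(identificational_focus_true(situation, "hat"))  # False: the exhaustive claim is denied
```

The exhaustive reading fails in a situation where a second item was bought, which is the incompatibility the test relies on, while the non-exhaustive reading remains true, if uninformative about the coat.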
3. CONCLUSION
One dimension of meaning expressed by sentence-level pitch accents in languages
like English concerns the size of the focus constituent, which is expressed through
deaccentuation of constituents after the focus. Beginning with Schmerling (1974),
researchers have found that the relation between the pitch accent and the focus is
mediated through the predicate-argument structure of the sentence, which is evident
from the fact that predicates remain unaccented when they abut a focused argument.
In many cases, therefore, the accent on the argument is properly to be seen as an
accent on the predicate-argument combination, a regularity which obtains as many
times as there are clauses in the sentence.
A second dimension of meaning concerns the meaning of ‘information
packaging’ itself. The semantics would appear to involve a number of distinctions.
The above summary suggests that the speaker indicates how the information in
his expression is to be related to the hearer’s information about the mini-world about
which they are together trying to reach a state of mutual understanding. The
meanings of the melodic aspects of the pitch accent in English proposed in Brazil
(1975) and Gussenhoven (1983b) as well as those proposed by Pierrehumbert &
Hirschberg (1990) fit this type of meaning well. The former include ‘Addition’
(Brazil’s ‘Proclaiming’), used for the commitment of information to the discourse
model and signalled by falling contours, and ‘Selection’ (Brazil’s ‘Referring’), used
for reference to information in the background and signalled by falling-rising
contours. A third meaning, ‘Testing’, signals the speaker’s inability or refusal to
commit information to the discourse model and is signalled by rising contours
(Gussenhoven 1983b). ‘Identificational’ focus somehow doesn’t quite match these
other meanings. The information that John is the only person who caught a fish, as
conveyed by It’s John who caught a fish, concerns information content rather than
information status. Possibly, therefore, intonation can only be used for the
expression of information structure, implying that identificational focus can only be
expressed through the morphology or syntax.4
4. NOTES
1
One difference between my account and Selkirk’s theory is that the latter contains two indirectness
relations rather than one. First, there is a relation between ‘focus interpretation’ and ‘F-marked
constituent’ (the focus constituent), and there is a second relation between the F-marked constituent
and accent distribution. While in my account the first relation is trivial in the sense that the
interpretation of each focus constituent is that it is focused, and thus ‘new’, in Selkirk’s theory,
focus interpretation principles are applied to the focus constituent so as to establish which parts in it
are interpreted as ‘new’ and which as ‘given’. See also Gussenhoven (1999a).
2
I incorrectly analyzed (25) as having focus on the verb in Gussenhoven (1983a, note 5). The latter
would indeed have the same form, but is only appropriate in some context like ‘What doesn’t John
do with books?’. Ladd (1980) himself analyzed his example as having ‘default accentuation’ on the
verb, his point being that the accentuation signals that books is outside the focus, rather than that
read is included in it.
3
A class of ‘event’ sentences was independently identified by Cruttenden (1984) in connection
with the accentuation pattern SUBJECT-predicate. My definition referred to a focus type regardless
of syntax.
4
Recent treatments which have not been covered in this survey include Lambrecht (1994), Vallduví
& Engdahl (1996), Erteschik-Shir (1997), and Zubizarreta (1998).
5. REFERENCES
Bolinger, D. Intonation: Selected Readings. Harmondsworth: Penguin, 1972.
Bolinger, D. Review of Schmerling (1974). American Journal of Computational Linguistics (1978), 1-23.
Microfiche.
Bolinger, D. “Two views of accent.” Journal of Linguistics 21 (1985): 79-123.
Bolinger, D. More views on ‘Two views on Accent’. In On Accent, pp. 124-146. Bloomington, IN:
Reproduced by the Indiana University Linguistics Club, 1987.
Bolinger, D. Intonation and its Uses. Stanford, CA: Stanford University Press, 1989.
Brazil, D. Discourse Intonation I. Birmingham UK: English Language Research, Birmingham University,
1975.
Bresnan, J. “Sentence Stress and Syntactic Transformations.” Language 47 (1971): 257-281.
Bresnan, J. “Stress and Syntax: a Reply.” Language 48 (1972): 326-342.
Burzio, L. Principles of English Stress. Cambridge: Cambridge University Press, 1994.
Chafe, W. L. “Language and Consciousness.” Language 50 (1974): 111-113.
Chomsky, N. “Deep Structure, Surface Structure and Semantic Interpretation.” In D. D. Steinberg and L.
A. Jakobovits (eds.), Semantics: an Interdisciplinary Reader in Philosophy, Linguistics and
Psychology, pp. 183-216. Cambridge, UK: Cambridge University Press, 1971.
Chomsky, N. and M. Halle. The Sound Pattern of English. New York: Harper and Row, 1986.
Cruttenden, A. “The Relevance of Intonational Misfits.” In D. Gibbon and H. Richter (eds.), Intonation,
Accent and Rhythm. Studies in Discourse Phonology, pp. 67-76. Berlin: de Gruyter, 1984.
Culicover, P.W. and M. Rochemont. “Stress and Focus in English.” Language 59 (1983): 123-165.
De Jong, J. “On the Treatment of Focus in Functional Grammar.” GLOT, Leids Taalkundig Bulletin 3
(1980): 89-115.
Di Sciullo, A. and E. Williams. On the Definition of Word. Cambridge, MA: MIT Press, 1987.
Diesing, M. Indefinites. Cambridge University: Doctoral dissertation, 1992.
Dik, S. C. The Theory of Functional Grammar. Part 1: The Structure of the Clause. New York: Mouton
de Gruyter. Edited by Kees Hengeveld, 1997.
Dik, S. C. “On the Typology of Focus Phenomena.” GLOT, Leids taalkundig bulletin 3 (1980): 41-74.
Erteschik-Shir, N. The Dynamics of Focus Structure. Cambridge: Cambridge University Press, 1997.
Faber, D. “The Accentuation of Intransitive Sentences in English.” Journal of Linguistics 23 (1987):
341-358.
Frota, S. Prosody and Focus in European Portuguese. University of Lisbon: Doctoral dissertation, 1998.
[Published by Garland, New York, 2000.]
Gussenhoven, C. Review of Schmerling (1974). Dutch Quarterly Review of Anglo-American Letters
(DQR) 8 (1978): 233-240.
Gussenhoven, C. “Focus, Mode and the Nucleus.” Journal of Linguistics 19 (1983a): 377-417.
Gussenhoven, C. A Semantic Analysis of the Nuclear Tones of English. Distributed by Indiana University
Linguistics Club (IULC). Bloomington, Indiana, 1983b.
Gussenhoven, C. “Van focus naar zinsaccent: Een regel voor de plaats van het zinsaccent in het
Nederlands.” GLOT 6 (1983c): 131-155.
Gussenhoven, C. “Two views of accent: A reply.” Journal of Linguistics 21 (1985): 125-138.
Gussenhoven, C. “Sentence Accents and Argument Structure.” In I. M. Roca (ed.), Thematic Structure:
Its Role in Grammar, pp. 79-106. Berlin/New York: Foris, 1992.
Gussenhoven, C. “Discreteness and Gradience in Intonational Contrasts.” Language and Speech 42
(1999a): 281-305.
Gussenhoven, C. “On the Limits of Focus Projection in English.” In P. Bosch and R. van der Sandt (eds.),
Focus: Linguistic, Cognitive, and Computational Perspectives, pp. 43-55. Cambridge, UK:
Cambridge University Press, 1999b.
Halliday, M. A. Intonation and Grammar in British English. The Hague: Mouton, 1967.
Hayes, B. Metrical Theory: Principles and Case Studies. Chicago: Chicago University Press, 1995.
Hayes, B. and A. Lahiri. “Bengali Intonational Phonology.” Natural Language and Linguistic Theory 9
(1991): 47-96.
Jackendoff, R. S. Semantic Interpretation in Generative Grammar. Cambridge, Mass.: MIT Press, 1972.
Kanerva, J. M. Focus and Phrasing in Chichewa Phonology. Stanford University: Doctoral dissertation,
1989.
Kiss, K. E. “Identificational Focus and Information Focus.” Language 74 (1998): 245-273.
Kraak, R. “Zinsaccent en syntaxis.” Studia Neerlandica 4 (1970): 41-62.
Kratzer, A. “Stage-level and Individual-level Predicates.” In G. N. Carlson and F. J. Pelletier (eds.), The
Generic Book, pp. 125-175. Chicago: Chicago University Press, 1996.
Ladd, D. R. The Structure of Intonational Meaning: Evidence from English. Bloomington: Indiana
University Press, 1980.
Ladd, D. R. “Phonological Features of Intonational Peaks.” Language 59 (1983): 721-759.
Ladd, D. R. Intonational Phonology. Cambridge: Cambridge University Press, 1996.
Lambrecht, K. Information Structure and Sentence Form. Topic, Focus, and the Mental Representation of
Discourse Referents. Cambridge: Cambridge University Press, 1994.
Newman, S. “On the Stress System of English.” Word 2 (1946): 171-187.
Pierrehumbert, J. B. and J. Hirschberg. “The Meaning of Intonational Contours in the Interpretation of
Discourse.” In P. Cohen, J. Morgan, and M. Pollack (eds.), Intentions in Communication, pp. 271-
311. Cambridge MA: MIT Press, 1990.
Rochemont, M. Focus in Generative Grammar. Amsterdam: John Benjamins, 1984.
Schauber, E. “A Comparison of English Intonation and Navajo Particle Placement.” In D. J. Napoli (ed.),
Elements of Tone, Stress, and Intonation, pp. 144-173. Washington, DC: Georgetown University
Press, 1978.
Schmerling, S. F. Aspects of English Sentence Stress. Austin: Texas University Press, 1974.
Selkirk, E. Phonology and Syntax: The Relation between Sound and Structure. Cambridge, Mass.: MIT
Press, 1984.
Selkirk, E. “Sentence Prosody: Intonation, Stress and Phrasing.” In J. Goldsmith (ed.), The Handbook of
Phonological Theory, pp. 550-569. Oxford: Blackwell, 1995.
Selkirk, E. “Contrastive FOCUS vs. Presentational Focus: Prosodic Evidence from English.” In B. Bel
and I. Marlien (eds.), Speech Prosody 2002. An International Conference, Aix-en-Provence.
Laboratoire Parole et Langage, CNRS and Université de Provence, 2002.
Szabolcsi, A. “The Semantics of Topic-Focus Articulation.” In J. Groenendijk, T. Janssen, and M.
Stokhof (eds.), Formal Methods in the Study of Language, pp. 513-514. Amsterdam: University of
Amsterdam, Mathematisch Centrum, 1981.
Vallduví, E. and E. Engdahl. “The Linguistic Realization of Information Packaging.” Linguistics 34
(1996): 459-519.
Winkler, S. “Focus and Secondary Predication.” Berlin: Mouton de Gruyter, 1996.
Zonneveld, W., M. Trommelen, M. Jessen, C. Rice, G. Bruce, and K. Árnason. “Wordstress in West-
Germanic and North-Germanic languages.” In H. van der Hulst (ed.), Word Prosodic Systems in the
Languages of Europe, pp. 477-603. Berlin: Mouton de Gruyter, 1999.
Zubizarreta, M. L. Prosody, Focus, and Word Order. Cambridge, MA: MIT Press, 1998.
NANCY HEDBERG AND JUAN M. SOSA
THE PROSODY OF TOPIC AND FOCUS IN SPONTANEOUS ENGLISH DIALOGUE
1. INTRODUCTION
Our research addresses the interface between meaning and prosody. In particular, it
concerns the way intonation plays a part in the interpretation of an utterance. For
example, we are concerned with the extent to which a falling versus a falling-rising
intonation at the end of an utterance or an extra tonal height on a specific word or
phrase affects the way the utterance is interpreted.
Information structure categories such as topic and focus have been correlated
with specific types of contours. Many authors have stated that there is a peak
associated with focus, while others have stated that there is also a peak associated
with topic. Claims have been made as to the specific sequence of underlying tones
associated with these categories, at least for constructed examples; for instance, that
focus will be marked with H* and topic will be marked with L+H*. Here, we test
these claims by analyzing the intonation and information structure of a sample of
spontaneous dialogue in English.
2. DATA
The data were taken from six half-hour episodes of the PBS political discussion
television show, The McLaughlin Group, videotaped in April and May 2001. The
host, John McLaughlin, discusses current issues of the day with four journalist
guests. The journalists have widely differing political beliefs and therefore the
discussions get heated and the speakers produce speech that we believe to be quite
spontaneous. The guests vary somewhat from week to week. Each half-hour episode
consists of four issues discussed. For the first five episodes, we selected the first
issue because it was the longest. For the sixth episode, we analyzed a combination of
issue two and three. Each issue is introduced by John McLaughlin in a monologue.
We didn’t analyze these portions of the videotapes. All participants are native
speakers of American English.
An advantage to analyzing the McLaughlin Group as a source of data is that
transcripts of the sessions are available on the World-Wide Web. In the few cases
where we found discrepancies between the transcript and the videotape in the
portions of the transcript we were analyzing, we corrected the transcript.
C. Lee et al. Topic and Focus: Cross-linguistic Perspectives on Meaning and Intonation, 101–120.
© 2007 Springer.
topic and ratified topic. We follow Gundel (1988) in defining topic, comment, and
focus.
Topic
An entity, E, is the topic of a sentence, S, iff, in using S, the speaker intends to
increase the addressee’s knowledge about, request information about or otherwise get
the addressee to act with respect to E.
Comment
A predication, P, is the comment of a sentence, S, iff, in using S the speaker intends
P to be assessed relative to the topic of S.
Focus
That part of the linguistic expression that realizes the comment.
The focus is very long in the majority of cases, and consists of multiple pitch
accents and sometimes multiple intonational phrases. For that reason, Hedberg
picked the final pitch-accented phrase to annotate, except in the case of it-clefts
where she picked the clefted constituent since all three it-clefts in the data were either
topic-clause it-clefts or all-comment it-clefts (Hedberg, 2000). To explain the five
categories, we’ll illustrate with examples from the passage shown in (1). Topics are
italicized and foci are bold-faced. Contrastive elements receive double underlines,
and unratified topics receive a single underline.
(1) Ms. Clift: Look, John McCain would be the first one to say this
doesn’t improve the system to perfection; it makes it
marginally better. And there’s still a possibility that
Tom DeLay, who is an enemy of the bill, will forge an
unholy alliance with Democrats in the House. Because
Democrats have figured out, they do worse under this
bill than the Republicans do. But the big thing that
comes out of this, to me, is that it’s John McCain who
gets the big legislative triumph so far in this first 100-
day period, while President Bush is looking rather
passive on a number of issues across the board,
especially foreign policy. (3/31/01)
Ratified Topic
Contrastive Topic
Unratified Topic
Contrastive Focus
Plain Focus
The topic of the entire issue is the McCain-Feingold bill on campaign finance
reform. John McCain has just gotten it passed through the Senate and the question is
how it will do in the House. John McCain is an unratified topic because Eleanor
Clift is re-establishing him as the topic here and thus he is not already established as
a topic. The bill itself is already established as the topic and thus references to it with
‘this’ and ‘it’ are coded as ratified topics. The terms ‘ratified’ and ‘unratified’ topic
come from Lambrecht and Michaelis (1998). Both ‘John McCain’ and ‘this’ are
marked as topics here because John McCain is the topic of the matrix clause and the
referent of ‘this’ is the topic of the embedded clause. The focus of both the matrix
clause and the embedded clause falls on ‘perfection.’ Plain foci are marked in bold-
face. Tom DeLay is a Republican representative and is the topic of the next sentence.
Here ‘Democrats in the House’ is marked as a contrastive focus because there is an
implicit contrast with ‘Republicans in the House’. Likewise ‘John McCain’ is a
contrastive focus because it explicitly contrasts with ‘President Bush.’ The whole it-
cleft expresses a comment here, and thus there is no topic indicated for this sentence.
In the next sentence, President Bush contrasts with John McCain and is a topic, and
hence the phrase denoting him is marked as a contrastive topic.
To help identify the topic, Hedberg used Gundel’s (1974) ‘as for’ test and
Reinhart’s (1981) ‘said about’ test. For example, in (2), ‘you’ is identified as the
topic because the sentence can be paraphrased, ‘As for you, what do you think?’.
We decided to select seven examples of each of the five categories from each
transcript for prosodic coding. For each transcript, Hedberg counted the total number
of each category and divided by seven. For example, there were 142 plain foci in
transcript 1. Division by 7 yields 20.3, so she selected every 20th example for
prosodic analysis. In this way, we acquired seven examples of each category spread
evenly across the transcript. She then printed a new copy of the transcript and
identified the 35 phrases to be analyzed with a highlighting pen, with no indication of
information structure category. This transcript was given to Sosa, along with the
videotape, for prosodic coding. Because there were 6 transcripts, we subjected 210
phrases to prosodic coding. There were a total of 42 examples of each of the five
information-structure categories.
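The selection procedure just described amounts to systematic sampling. A minimal Python sketch follows, assuming that the sampling interval is the category total divided by seven and rounded down, and that counting starts at the interval itself (the 20th, 40th, ... token). The text does not say which token was taken first, and the function name `systematic_sample` is ours, used purely for illustration.

```python
def systematic_sample(n_items: int, n_wanted: int = 7) -> list[int]:
    """Return 1-based positions of every k-th item, with k = n_items // n_wanted."""
    if n_items < n_wanted:
        # Assumption (not stated in the text): with too few tokens, take them all.
        return list(range(1, n_items + 1))
    step = n_items // n_wanted  # e.g. 142 // 7 == 20
    return [step * i for i in range(1, n_wanted + 1)]


# Plain foci in transcript 1: 142 tokens, seven wanted.
positions = systematic_sample(142)
print(positions)  # [20, 40, 60, 80, 100, 120, 140]

# Size of the whole sample: 6 transcripts x 5 categories x 7 tokens each.
TRANSCRIPTS, CATEGORIES, PER_CELL = 6, 5, 7
total_phrases = TRANSCRIPTS * CATEGORIES * PER_CELL  # 210
per_category = TRANSCRIPTS * PER_CELL                # 42
print(total_phrases, per_category)
```

Spreading the picks at a fixed interval is what keeps the seven tokens of each category evenly distributed across a transcript.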
Sosa then listened to the videotapes and digitized each of the 210 phrases along
with some of their surrounding context. Using the Kay Computerized Speech Lab
(CSL 4300), he then analyzed the target phrases prosodically and assigned an
autosegmental sequence of tones to each phrase. He used annotations for pitch
accents (H*, L*, L+H*, H*+!H, H*+L, L*+H and H+L*), boundary tones (L%,
H%), intermediate phrase tones (L, H), downstep (!H), upstep (¡H), and increased
range (↑H). Again, in future work on this project, we plan to have two prosodic
coders, so that we can calculate an intercoder reliability statistic, to be surer that the
prosodic coding is accurate.
4. INTONATIONAL CODINGS
The intonational analysis and annotation of all digitized utterances was performed
following closely the Guidelines for ToBI Labelling (Beckman and Ayers Elam,
1997) and taking into consideration other published materials on the intonational
structure of English, notably Pierrehumbert and Hirschberg (1990) as well as other
autosegmental-metrical approaches to the phonology of intonation. The ToBI
conventions and assumptions were followed, although we introduced two additional
pitch accents that we felt were necessary in order to account for certain distinct
patterns. For example, we rescued the H*+L pitch accent (which was originally
designed to trigger downstep) to generate a dip between two H* pitch accents, which
is not captured by the notation H* ... H* alone.
Our independent feature downstep !H allowed us to free the H*+L notation and use it
for this effect. An example of this distinction is shown in figure 1 versus figure 2.¹
Figure 2. Thirty years [of serious anthropological consideration] (plain focus, 3.19)
H*+L H* LL%
We noted that the sequence H* ...H* (equivalent to the high head in the British
tradition) is quite scarce in the data since the great majority of the utterances show
some kind of downdrifting pattern. The scarcity of straight H* sequences
may point to a contrast with British English, which is said to typically
have this recurring high-pitched pre-nuclear pattern.
As already mentioned, the rest of the pitch accents used in this paper were H*,
L*, L+H*, L*+H, H+L*, and H*+!H, all of them with the value assigned to them in
the ToBI notation and previous work on English intonation. Given the emphasis
on L+H* in this paper, we present two instances of this pitch accent in figures 3
and 4.
The feature ‘increased range’ as well as the ‘upstep’ pitch accent ¡H* were added
to the tonal analysis, to specify high pitch excursions. Increased range is characterized by
higher peaks and lower valleys, as shown in figure 5.
Upstep, on the other hand, is mostly an H* that is higher than any previous H*,
reversing any downdrift or declination effect, as shown in figure 6.
Figure 7. Mr. McLaughlin: Can you handle that last question? Where do you think the
international community is? (2.14)
Figure 8. Ms. Clift: It requires a leap of faith, however, to believe that the historical
Jesus was, in fact, the son of God. (3.27)
This macro-unit doesn’t necessarily coincide with Nespor and Vogel’s (1986)
phonological utterance, and is certainly perceptible in oral discourse and visible as
such in pitch tracks.
After the intonational coding was completed, it was entered on the data
spreadsheet and we proceeded with correlating the intonational coding with the
information-structure coding.
Steedman (1991) states that foci (‘rhemes’) receive the H*LL% accent and tune
and that topics (‘themes’) receive a L+H*LH% (the so-called ‘scooped fall-rise’)
accent and tune. Vallduví and Engdahl (1996) state that noncontrastive links
(Gundel’s (1978) ‘unactivated topics’ or Lambrecht and Michaelis’s (1998)
‘unratified topics’) receive an L+H* pitch accent, that contrastive links are
obligatorily so marked, and that foci are marked with the pitch accent H*. Gundel
(1999) claims that topics, both new and contrastive, are marked with L+H*, and that
her category of ‘semantic focus’ (contrastive or noncontrastive) is marked with H*.
Steedman (2000a, 2000b) and Gundel and Fretheim (2004) also claim that topics
are marked with L+H* and foci with H* pitch accents. Lambrecht and Michaelis
(1998) distinguish topic accents from focus accents but don’t claim that there is any
prosodic difference between them; however, they mention in a footnote that H% may
mark topics and L% mark foci.
Pierrehumbert and Hirschberg (1990) suggest that L+H* is used to mark contrast,
or in their terms, to mark the selection of an item on a contextually-evoked salient
scale. They don’t specify whether this contrastiveness is associated with the
information structures of topic and focus. Presumably either a topic or a focus can be
marked by L+H*, according to them, so long as the category is contrastive in
their sense. We speculate that Gussenhoven’s (1983) fall-rise tone, which he says is
used to ‘select’ an entity from the background, corresponds to a topic accent, and that
his fall tone, which he says is used to introduce an entity into the ‘background’,
corresponds to a focus accent.
The major goal of our research was to put these hypotheses to the test.
6. PITCH ACCENTS
6.1. Does L+H* mark contrast, or topic?
With regard to L+H* marking information structure and/or contrast, we came up
with the results in Table 3:
                     L+H*    % out of 42
Ratified Topic         1         2%
Contrastive Topic     10        24%
Unratified Topic      13        31%
Contrastive Focus     11        26%
Plain Focus            6        14%
As can be seen from the table, we did find a significant number of L+H* pitch
accents marking contrastive topics or contrastive foci, e.g. the examples shown in (3)
and (4):
110 NANCY HEDBERG AND JUAN M. SOSA
(3) Mr. Kudlow: And we need to drill oil and gas in the Rockies. And
Jeb Bush is wrong and George Bush is right; we need
L+H* !H* L+H* !H*
to drill in the Gulf of Mexico.
(contrastive topic, 6.27, 28)
(4) Mr. McLaughlin: This exit question may be superfluous, but I’m
going to hit you with it anyway. Tito cracked the
space barrier between civilians and professionals.
For the most part, was his way the right way, or
for the most part was his way the wrong way, as
L+H* LH%
Goldin would lead you to believe, Michael
Barone? (contrastive focus, 5.32)
However, Pierrehumbert and Hirschberg’s (1990) proposal that L+H* is
associated particularly with contrast does not seem to be borne out by the number of
noncontrastive topics (14) and noncontrastive foci (6) marked by this tone. Examples
of noncontrastive topics are shown in (5) and (6):
(5) Ms. Clift: A good working-class guy may well be what Jesus was.
And in fact, this is discussed in a documentary that was
produced in England. And there they can talk about
these kinds of things. I think in this country we’re still a
little nervous about suggesting that Jesus may not fit the
Westernized, romanticized ideal. In Britain, in fact, the
archbishop of Canterbury there has called Britain a
H* L+H* L* HH%
nation of atheists. In a country of 60 million people, only
a million people go to church. (unratified topic, 3.9)
(6) Mr. Barone: I used to be an editorial writer, and I’ll tell you
something, there’s a temptation to harumph when
you’re an editorial writer – (laughter) – and I’m
afraid that that was the New York Times
harumphing.
Mr. McLaughlin: Well, they could have pointed out that $20
L+H*
million given to Russia probably wound up with
Russian scientists, and that might keep them
from making Iranian nuclear bombs.
(unratified topic, 5.26)
Similarly, examples of noncontrastive, plain foci marked by L+H* are shown in
(7) and (8):
(7) Mr. McLaughlin: Well, what is – do you think that NASA has egg
on its face? (plain focus, 5.29) L+H*
!H* HL%
(8) Mr. Kudlow: I have a different view, with all respect. I think it
turns this guy into a celebrity, and I think that
L+H*LL%
actually encourages more of these heinous
actions. (plain focus, 6.5)
Example (7) is about NASA’s unwillingness to allow Mr. Tito to pay $20 million to
go up to the space station. Example (8) is about the pending execution of Timothy
McVeigh.
It is clear from Table 3 that L+H* is not significantly correlated with topic as
opposed to focus, since there are 11 contrastive foci marked by this pitch accent and
6 examples of plain foci, although the raw number of 17 foci versus 24 topics
represents a trend in this direction. Example (4) shows an L+H*-marked contrastive
focus, and (7) and (8) show L+H*-marked plain foci. We present in figure 9 a pitch
track for example (8):
Figure 9. I think it turns this guy into a celebrity. (plain focus 6.5)
L+H*LL%
The Information Structure category from the literature that seems to best fit the
data concerning L+H* is Gundel’s (1999) category of ‘Contrastive Focus’. Her
category of ‘Contrastive Focus’ encompasses our ‘Contrastive Topic’, ‘Unratified
Topic’ and ‘Contrastive Focus’. This composite category accounts for 83% of our
L+H* marked phrases (34 out of 41).
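The percentages in Table 3 and the 83% figure for the composite category can be rechecked with a few lines of arithmetic. The Python sketch below is only an illustration: the variable names are ours, and the counts are copied from Table 3.

```python
# L+H* counts per information-structure category (Table 3); 42 phrases per category.
l_h_star = {
    "Ratified Topic": 1,
    "Contrastive Topic": 10,
    "Unratified Topic": 13,
    "Contrastive Focus": 11,
    "Plain Focus": 6,
}
PER_CATEGORY = 42

total = sum(l_h_star.values())  # all L+H*-marked phrases

# Share of each category that carries L+H*, rounded to whole percent.
pct_of_category = {k: round(100 * v / PER_CATEGORY) for k, v in l_h_star.items()}

# Composite category: contrastive topics, unratified topics, contrastive foci.
composite_keys = ("Contrastive Topic", "Unratified Topic", "Contrastive Focus")
composite = sum(l_h_star[k] for k in composite_keys)
composite_share = round(100 * composite / total)

# Raw topic versus focus counts.
topics = sum(v for k, v in l_h_star.items() if k.endswith("Topic"))
foci = sum(v for k, v in l_h_star.items() if k.endswith("Focus"))

print(total, pct_of_category)
print(composite, composite_share, topics, foci)
```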
Except for ratified topics, which tend to be unaccented, most phrases in each
information structure category are marked by H*. Except for H*+!H, we abstracted
away here from high tones further marked with increased range, upstep or downstep.
It is interesting that L+H* is the second most frequent pitch accent in the data, after
H*. This shows that the attention to this pitch accent exhibited in the literature has
not been misplaced.
Ratified topics, unsurprisingly, tend to be unaccented. 34 out of 42 ratified topics
were encoded as personal pronouns. Four ratified topics were coded as L*. In the
case of two of these, we were unsure as to whether they really received an L* pitch
accent, or simply exhibited an unaccented rhythmic beat.
Except for the four cases of unratified topics, L* tends to mark focus, either
contrastive or plain. The five cases of L*+H all mark topics. The other pitch accents,
except for L+H*, do not exhibit any particular pattern.
We were especially curious about the phrases coded as contrastive focus,
contrastive topic or unratified topic that did not receive the L+H* pitch accent. Is this
an error of our information structure coding, or does it represent the actual prosodic
marking system of English?
One interesting class of examples to check in this regard is cleft sentences, of
which there were three in our data. We coded the clefted constituent in each case as a
contrastive focus since the meaning of the cleft sentence involves an exhaustiveness
condition on the clefted constituent. For example, in (9), it is asserted that nobody
other than the Communist Chinese are behaving as a Cold War power right now; in
particular not the United States. The proposition that the United States has been
behaving as a Cold War power has been previously evoked.
(9) Mr. Buchanan: What the United States should do, John, is pull
the ambassador home right now. The president of
the United States should say, ‘I understand why
Americans are boycotting Chinese goods, and I
believe that if this thing is not resolved
satisfactorily, it will be time to suspend PNTR for
exactly one year.’ It is the Communist Chinese
↑H* !H* HL%
who are behaving as a Cold War power right
now. (contrastive focus, 2.23)
As with the other two it-clefts, the clefted constituent here is marked by some variant of
the H* pitch accent even though it is contrastive. It is interesting that the three it-clefts are the
only examples in the data of a subject receiving narrow focus. All three are subject
clefts.
Some narrow foci were coded as contrastive, but perhaps were not treated as
contrastive by the prosodic system. For example, at the end of transcript 4,
participants were asked to grade President Bush on style and substance during his
first 100 days. Because there was a limited set of possible answers (the grades A, B,
C, D, and F), we coded the resulting narrow focus answer as a contrastive focus.
Perhaps a more refined definition of contrastive focus, one that requires the explicit
ruling out of alternatives, would exclude these cases. An example is shown in (10):
(10) Mr. McLaughlin: Yeah, what about substance?
Ms. Clift: Substance, C-minus.
H* !H* LL% (contrastive focus, 4.25)
There nevertheless are several cases of focus phrases coded as contrastive which
do rule out alternatives but are not marked L+H*. The examples shown in (11) and
(12) are explicitly contrastive in this way:
In general, it seems best to conclude that contrastive topics are only sometimes
marked L+H*. The same goes for non-contrastive topics, as examples (15) and (16)
show:
(15) Mr. McLaughlin: Can you handle that last question? Where do you
think the international community is, especially
the Third World?
Mr. O’Donnell: The international community is very
H* !H*
sympathetic to the Chinese. They’re wondering
what are we doing with the reflexive old Cold
War mentality of flying these missions in the
first place. (Unratified topic, 2.15)
(16) Mr. McLaughlin: Tony, what was his best move?
Mr. Blankley: I think there were two. One, coming off the
Florida event, establishing his legitimacy as
president….On a policy basis, his biggest success
is taxes….
Mr. McLaughlin: Do you see his best move as the tax cut’s
tenacity?
Mr. O’Donnell: Yes, I do. I agree with Eleanor it’s not a good tax
cut, it’s not a good policy; but it is an amazing
accomplishment to come from where it’s come
from….
Mr. McLaughlin: Actually, his best move was the handling of the
H* !H* !H*
China spy plane. He kept his cool; he kept the
country cool, he was measured and moderate.
And it worked. (unratified topic, 4.7)
In (15), ‘the international community’ expresses the topic, as it is repeated from the
question; similarly in (16), ‘his best move’ clearly expresses the topic. Indeed these
two phrases are so topical in their contexts that perhaps they should be considered
ratified topics. However, both are marked with H* (or !H*) instead of L+H*. We
present in figure 10 a pitch track for example (16):
Figure 10. Actually his best move was the handling of the China spy plane. (4.7)
H* !H* !H*
Here the entire event of Eisenhower’s refusal is being put forth as the ‘new
information’ in the discourse. The entire clause answers the question ‘What
happened?’ Nevertheless, we believe that the bold-faced constituents in (13)-(16) do
express topics, and are marked H*, contrary to predictions in the literature.
It is clear from the table that downstep is distributed across the four substantive
information structure categories approximately equally, as is increased range.
Upstep, however, seems to mark focus, either contrastive or plain, although the data
are few. It might be worth following up on this latter tentative conclusion in a more
detailed study.
8. BOUNDARY TONES
Some of the claims and suggestions in the literature concerning topic and focus
accents have involved boundary tones. For example, Lambrecht and Michaelis
(1998) suggest in a footnote that H% might mark topic and L% mark focus. Table 6
shows the distribution in our data of intermediate phrase + boundary tone relative to
information structure type.
                                          Rise from
                     Fall   Level   Rise   Bottom
                     LL%    HL%     HH%    LH%      TOTAL
Ratified Topic         2      0       0      0          2
Contrastive Topic      7      4       1      1         13
Unratified Topic      12      2       6      0         20
Contrastive Focus     29      1       4      5         39
Plain Focus           26      4       4      5         39
TOTAL                 76     11      15     11        113
It can be seen from Table 6 that LL% is associated primarily with foci, whether
contrastive or plain, and foci are most likely to be marked by this boundary tone. It is
not surprising that foci as opposed to topics are marked by LL% since this sequence
tends to come at the end of the sentence, and topics tend to precede foci in the
sentences of the data.
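The two directions of association mentioned here (how many LL% tokens fall on foci, and how many foci carry LL%) can be read off Table 6. The Python sketch below is only an illustrative recomputation: the dictionary layout and variable names are ours, and the counts are copied from the table.

```python
# Phrase-final tone sequences by information-structure category (Table 6).
table6 = {
    "Ratified Topic":    {"LL%": 2,  "HL%": 0, "HH%": 0, "LH%": 0},
    "Contrastive Topic": {"LL%": 7,  "HL%": 4, "HH%": 1, "LH%": 1},
    "Unratified Topic":  {"LL%": 12, "HL%": 2, "HH%": 6, "LH%": 0},
    "Contrastive Focus": {"LL%": 29, "HL%": 1, "HH%": 4, "LH%": 5},
    "Plain Focus":       {"LL%": 26, "HL%": 4, "HH%": 4, "LH%": 5},
}
FOCI = ("Contrastive Focus", "Plain Focus")

ll_total = sum(row["LL%"] for row in table6.values())       # all LL% tokens
ll_on_foci = sum(table6[c]["LL%"] for c in FOCI)             # LL% tokens on foci
foci_total = sum(sum(table6[c].values()) for c in FOCI)      # foci with any boundary tone

share_of_ll_on_foci = ll_on_foci / ll_total      # proportion of LL% that marks a focus
share_of_foci_with_ll = ll_on_foci / foci_total  # proportion of foci that end in LL%

print(f"{share_of_ll_on_foci:.0%} of LL% tokens fall on foci")
print(f"{share_of_foci_with_ll:.0%} of boundary-marked foci end in LL%")
```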
Some non-final topics are, nonetheless, marked by LL%, as shown in example
(18):
(18) Mr. Barone: … we’re going to reconsider this decision that Clinton
made that would apply in six years from now, or 2006.
So nobody’s putting any extra arsenic in the water, but
Bush has given the Democrats a good talking point.
H*LL% (Unratified topic, 4.8)
There were three wh-questions and four yes-no questions that ended in phrases
we examined. Interestingly, none of them received H% boundary tones: two wh-
questions and two yes-no questions ended in LL%, and one wh-question and two
yes-no questions ended in HL%. The one alternative question in our data did end in
LH%; see example (4).
          L%          H%          TOTAL
Topic     27 (77%)     8 (23%)      35
Focus     60 (77%)    18 (23%)      78
TOTAL     87          26           113
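The counts in this table follow from Table 6 once the phrase tone is ignored and only the final boundary tone is kept (LL% and HL% count as L%; HH% and LH% count as H%), with the three topic categories and the two focus categories pooled. A minimal Python sketch of that collapse, with names of our own choosing, is:

```python
# Rows of Table 6 as (LL%, HL%, HH%, LH%) counts.
table6 = {
    "Ratified Topic":    (2, 0, 0, 0),
    "Contrastive Topic": (7, 4, 1, 1),
    "Unratified Topic":  (12, 2, 6, 0),
    "Contrastive Focus": (29, 1, 4, 5),
    "Plain Focus":       (26, 4, 4, 5),
}


def collapse(categories):
    """Pool the given categories and return (L% count, H% count)."""
    low = sum(table6[c][0] + table6[c][1] for c in categories)   # LL% + HL%
    high = sum(table6[c][2] + table6[c][3] for c in categories)  # HH% + LH%
    return low, high


topic_l, topic_h = collapse(["Ratified Topic", "Contrastive Topic", "Unratified Topic"])
focus_l, focus_h = collapse(["Contrastive Focus", "Plain Focus"])

for label, low, high in (("Topic", topic_l, topic_h), ("Focus", focus_l, focus_h)):
    n = low + high
    print(f"{label}: L% {low} ({low / n:.0%}), H% {high} ({high / n:.0%}), total {n}")
```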
9. ENTIRE TUNES
Finally, Pierrehumbert (1980) and Steedman (1991) proposed that topics are
associated with entire tunes, H*LH% and L+H*LH%, respectively. Let us first look
at H*LH%.
H*LH%
Plain Focus 4
All four of these mark plain focus, and all seem to mark continuation. For example,
(19) is a rejection of a previous participant’s contribution. It is continued with a
correction:
(19) Mr. McLaughlin: Lawrence and ah two other members are correct.
His style rating is probably a B, but your analysis
L+H*LH%
Contrastive Topic 1
Contrastive Focus 5
Plain Focus 1
It is interesting that four of the five contrastive focus examples of this tune
function as contradictions. See, for instance, examples (20) and (21):
(21) Mr. McLaughlin: Well, he’s been a successful politician, and he’s
been a successful statesman, has he not?
Mr. O’Donnell: He’s done – the only thing – he was in a box with
China. He did the only thing you could do. He
hasn’t done anything extraordinary.
L+H* LH%
(contrastive focus, 4.20)
10. CONCLUSION
We conclude that while there are systematic correlations between intonation and
information structure categories, these correlations are not as straightforward as is
suggested in the literature. In particular we deny that there is any prosodic category
as distinctive as a ‘topic accent’ as opposed to a ‘focus accent.’
With regard to L+H*, we found that it falls on contrastive topics and unratified
topics and contrastive foci 24-31% of the time and on plain foci 14% of the time. It
doesn’t just fall on topics. L+H* occurred in 41 of our analyzed phrases, or
approximately 20%, which is a significant number. This shows that this accent
deserves the reputation it has received in the literature.
Minor conclusions, given the relative lack of data, are that L* tends to mark focus
and that L*+H tends to mark topic. Upstep also seems to mark focus, although again
the data are few.
Except for ratified topics which tend to be unaccented, all information structure
categories were extensively marked with H*, including unratified and contrastive
topics. The fact that pitch accents with some kind of H* occur nearly six times as often
as L* (150 versus 26) shows that American English is an H* language, as opposed
to other languages such as Spanish in which L* predominates, at least in prenuclear
positions (Sosa, 1999).
Finally, given that our results qualify the conclusions assumed in the
literature, it is clear that investigations into intonation should be carried out on
naturally-occurring spontaneous dialogue as well as on constructed examples and
experimentally induced speech.
11. NOTES
* Part of this research was funded by an SSHRC Small Grant from Simon Fraser University, 2001.
1 For the contour in figure 2 the ToBI Guidelines would prescribe the notation H* L+H*. We decided to
use H*+L because the salient fall is completely realized during the word ‘thirty’. The point here is that
there is an important descent during this word, not that there is a rise for H* on the word ‘years’.
2 We thank Jeanette Gundel for pointing out this general problem to us.
3 Mark Steedman and Chungmin Lee have suggested to us that an alternative information structure
analysis of (20) and (21) would treat the marked phrase as topic. Note that this alternative analysis
can be justified by the ‘as for’ test as follows: ‘As for whether he is unattractive, I don’t find him so’ and
‘As for whether he has done anything extraordinary, he hasn’t.’ The point here is that the questions of
whether or not the Christ image is attractive and whether or not Bush has done something extraordinary
are relevant in their contexts and to some extent are already under discussion. Büring (2003) would also
analyze the accents in (20) and (21) as contrastive topic accents, since (20) and (21) can be seen as
answers to implied subquestions in the discourse; e.g., (21), in the context of the explicit question ‘Has
Bush been a successful politician?’, negatively answers the subquestion ‘Has he done anything
extraordinary?’.
12. REFERENCES
Beckman, Mary E. and Gayle Ayers Elam. Guidelines for ToBI Labelling. Version 3. Columbus: Ohio
State University, Department of Linguistics, 1997.
Büring, Daniel. “On D-Trees, Beans, and B-accents.” Linguistics and Philosophy 26.5 (2003): 511-545.
Gundel, Jeanette. The Role of Topic and Comment in Linguistic Theory. Ph.D. Dissertation, University of
Texas, Austin, 1974.
Gundel, Jeanette. “Stress, Pronominalization and the Given-New Distinction.” University of Hawaii
Working Papers in Linguistics 10.2 (1978): 1-13.
Gundel, Jeanette. “Universals of Topic-Comment Structure.” In Michael Hammond, Edith A. Moravcsik
and Jessica R. Wirth (eds.), Syntactic Universals and Typology, pp. 209-242. Amsterdam and
Philadelphia: John Benjamins, 1988.
Gundel, Jeanette K. “On Different Kinds of Focus.” In Peter Bosch and Rob van der Sandt (eds.), Focus:
Linguistic, Cognitive, and Computational Perspectives, pp. 293-305. Cambridge: Cambridge University
Press, 1999.
Gundel, Jeanette K. and Thorstein Fretheim. “Topic and Focus.” In Laurence Horn and Gregory Ward
(eds.), The Handbook of Contemporary Pragmatic Theory. Oxford: Blackwell, (2004): 175-196.
Gussenhoven, Carlos. “Focus, Mode and the Nucleus.” Journal of Linguistics 19 (1983): 377-417.
Hedberg, Nancy. “The Referential Status of Clefts.” Language 76 (2000): 891-920.
Jackendoff, Ray. Semantic Interpretation in Generative Grammar. Cambridge, Mass.: MIT Press, 1972.
Lambrecht, Knud and Laura Michaelis. “Sentence Accent in Information Questions: Default and Projection.”
Linguistics and Philosophy 21 (1998): 477-544.
Nespor, Marina and Irene Vogel. Prosodic Phonology. Dordrecht: Foris, 1986.
Pierrehumbert, Janet. The Phonology and Phonetics of English Intonation. MIT: Ph.D. Dissertation, 1980.
[Bloomington, IN: Indiana University Linguistics Club, 1987].
Pierrehumbert, Janet and Julia Hirschberg. “The Meaning of Intonational Contours in the Interpretation of
Discourse.” In Philip R. Cohen, Jerry Morgan, and Martha E. Pollack (eds.), Intentions in Communication,
pp. 271-311. Cambridge: MIT Press, 1990.
Reinhart, Tanya. “Pragmatics and Linguistics: an Analysis of Sentence Topics.” Philosophica 27 (1981):
53-94.
Sosa, Juan M. La Entonación del Español. Madrid: Cátedra, 1999.
Steedman, Mark. “Structure and Intonation.” Language 67 (1991): 260-296.
Steedman, Mark. The Syntactic Process. Cambridge, Mass.: MIT Press, 2000a.
Steedman, Mark. “Information Structure and the Syntax-Phonology Interface.” Linguistic Inquiry 31 (2000b):
649-689.
Vallduvi, Enric and Elisabet Engdahl. “The Linguistic Realization of Information Packaging.” Linguistics
34 (1996): 459-510.
EMIEL KRAHMER (1) AND MARC SWERTS (1,2)
PERCEIVING FOCUS
1. INTRODUCTION
Many linguists approach intonational matters from a purely speaker-oriented
perspective.¹ For instance, in different studies, insofar as these are empirical in
nature, evidence for particular tonal distinctions is often solely based on acoustic
analyses of fundamental frequency (F0) traces. However, if one wants to gain full
insight into how intonation ‘functions’, such an approach is arguably incomplete.
That is, a prosodic feature, as any other linguistic feature, can only be said to be
communicatively relevant if it is not only encoded in the speech signal by a speaker,
but if it also has an impact on how an utterance is processed by a listener. In other
words, claims about important intonational categories and their respective meanings
are somewhat premature if they are not backed up with results that show that these
are also relevant at the receiving end of the communication chain. Ideally, such an
analysis should be more than an individual linguist’s interpretation of a prosodic
phenomenon.
Unfortunately, one cannot simply take it for granted that all prosodic detail really
matters to a listener. One obvious, but sometimes neglected, condition is that tonal
variation clearly needs to be above a perceptual threshold to be functionally
relevant. In that respect, it is striking to see that many researchers attach functional
load to particular tonal distinctions, which, from a purely phonetic point of view, are
only minimally separable or even highly overlapping in “tonal space”. For instance,
the difference between H* and L+H*, as defined in the ToBI framework, has been
claimed to indicate semantically distinct categories such as rheme and theme
(Steedman 2000) or new and contrastive information (Pierrehumbert & Hirschberg
1990). Yet these two intonational categories are often confused by labellers who are
instructed to transcribe intonation (e.g., Pitrelli et al. 1994), even to the extent that
some investigators simply give up on the distinction. In comparison, many vowel
systems of the world obey a contrast principle, which states that any two vowels
need to be optimally distinct in order to be appropriately applicable in speech
communication (the idea of vowel dispersion, see e.g., ten Bosch 1991). Also,
linguistic systems are highly redundant in that speakers have various strategies at
their disposal to signal particular meanings. Since tonal markers of semantic events
often covary with morpho-syntactic, lexical or other prosodic cues, it is theoretically
possible that their communicative function is ‘overruled’ by that of other resources,
or by the situational or linguistic context in which they occur.
In this chapter, we argue that controlled perceptual studies allow us to investigate
the communicative importance of intonational features. Rather than concentrating on
C. Lee et al. Topic and Focus: Cross-linguistic Perspectives on Meaning and Intonation, 121–137.
© 2007 Springer.
English
TOTTENHAM ONE - LIVERPOOL one
Italian
UDINESE UNO - ROMA UNO
between accent structures in Italian and Dutch, as two cases of a non-plastic and a
plastic language, respectively, have implications on the way listeners perceive focus.
Second, apart from variation between languages, the importance of pitch accents
may also depend on the communicative setting in which they are used, in particular
if we compare communicative settings in which dialogue participants can or cannot
see each other during a spoken interaction. Different studies have suggested that
there exist specific visual cues to focus structure as well. In particular, like pitch
accents, rapid eyebrow movements have been claimed to play an accentuation role
(e.g., Birdwhistell 1970, Condon 1976). It has even been argued that there is a one-
to-one connection between the two; see, for instance, the so-called Metaphor of Up
and Down (Morgan 1953, Bolinger 1985:202ff): when the pitch rises or falls, the
eyebrows follow the same pattern. In fact, to see that there is indeed a close
connection between pitch and eyebrows, one may try to utter a two word phrase, say
“blue square”, with a pitch accent (but no corresponding eyebrow movement) on the
word “blue” and a rapid eyebrow movement (but no corresponding pitch accent) on
the word “square”. Most people find this a difficult exercise.
One of the few empirical studies devoted to the relation between pitch accents
and eyebrow movements is Cavé et al. (1996), who report on a significant
correlation between the two (in particular, and surprisingly, for the left eyebrow). It
appears that rapid eyebrow movements often co-occur with pitch accents. The
opposite is not the case: people do more with their pitch than with their eyebrows.
Cavé and co-workers suggest that eyebrow movements and pitch do not link
automatically (e.g., due to muscular synergy), but coincide for communicative
reasons. Naturally, this raises the question of what these communicative reasons might
be. In the literature on Talking Heads (i.e., combinations of computer animations
with speech), there is no consensus on the timing and placement of eyebrow
movements. Pelachaud et al. (1996) note that the decision to raise the eyebrows is
affect dependent, but in the examples they discuss, pitch accents and eyebrows
coincide. Thus to the question I know that Harry prefers POTATO chips, but what
does JULIA prefer? the Talking Head of Pelachaud et al. (1996:19) would respond
with the following utterance, in which capitalized words again indicate an accent,
whereas overlined words are accompanied by a rapid eyebrow movement:
Cassell et al. (2001) use eyebrow raising (or “flashes” as they call them) more
sparingly. The eyebrows are raised when an object in the “rheme” is described. So in
reply to the question above, the algorithm of Cassell et al. would not produce a
‘flash’ on “Julia”. It should be noted that neither Pelachaud et al. (1996) nor Cassell
et al. (2001) report an evaluation, so it is not known whether the animations affect
the way human listeners process the information. We therefore gain no insight into
the contribution of the eyebrow movement, and its function remains unclear. To
learn more about the relative importance of pitch accents and eyebrow movements,
the current study tackles this issue from a perceptual point of view, testing how
listeners detect focus in audiovisual stimuli.
2. MATERIALS
2.1. Speech
For all three studies, utterances were used which were obtained in a semi-
spontaneous way via a simple dialogue game. The game was played each time by
two subjects, call them A and B, separated from each other by a screen. Figures 1
and 2 visualize the experimental set-up with a bird’s-eye perspective on the starting
situation of the game and the situation after the first turn in the game. In each game,
both players have an identical set of eight cards at their disposal, each card showing
a geometrical figure in a particular colour. Four of these cards are put on a stack in
front of them, the four other cards are in a row before them. The four cards in the
stack of A are the same as the four cards in the row of B, and vice versa. The game
consists of a series of turns in which one participant gives instructions to select a
card with a particular geometrical figure and the other follows these instructions. In
each consecutive turn, the participants switch roles so that the original instruction-
giver becomes the instruction-follower, and the other way around. In turn 1, the
instruction giver, say A, begins with describing the figure on the top of his stack (“a
blue square”). After he has described this figure, he removes it from his stack and
puts it behind number 1 on his list. The instruction follower, B, listens to the
description of A and removes that figure from his row of figures, and also puts it
behind number 1 on his list. Now, the participants switch roles, so that B describes
the figure that is on top of his stack (“a black triangle”), and A follows the
instructions of B which will prompt both A and B to place the card with this object
on the second place in the row with figures, and so on. The game is over when both
players have no cards left. Each pair of subjects played a sequence of eight games,
each time separated by a break of at least two minutes. Note that the players are
given the instruction to describe the figure on top of their stack in terms of its colour
and figure property. Speakers generally found it a very easy game to play, and as a
consequence there are no faulty descriptions in the respective data sets.
Figure 1. Visualization of the initial set-up of the experiment to elicit different referring
expressions. A and B represent the two participants in the dialogue game. In the actual
experiment, the different figures were given different colours. Further explanations in the text
mentioned in the first turn of the current dialogue game, it is given (G) if it was
mentioned in the previous turn and finally a property is contrastive (C) if the object
described in the previous turn had a different value for the relevant property. We
define a property to be in focus, if it is not given. (In the three studies described
below, we will ignore newness and hence in these studies a property will be in focus
if, and only if, it is contrastive.) By systematically varying the order of the cards in
the stack, target descriptions (Dutch: “blauw vierkant” (blue square); Italian:
“triangolo nero” (black triangle)) could be collected in all contexts of interest: no
contrast (all new, NN), contrast in the prefinal word (CG), contrast in the final word
(GC), all contrast (CC). Notice that in the 2-letter abbreviations, the first letter
corresponds with the contextual status of the first word, and the second letter with
the contextual status of the second word. Table 1 summarizes the situation. It is
worth noting that in the Dutch elicited utterances the adjective always precedes the
noun, whereas in the Italian data it follows the noun. In other words, if we refer to
the first word in the elicited NPs, we mean the adjective in case of the Dutch data,
and we mean the noun in the case of the Italian data.
Context   Preceding turn         Target turn
NN        (beginning of game)    B: “blue square”
CC        A: “red circle”        B: “blue square”
CG        A: “yellow square”     B: “blue square”
GC        A: “blue triangle”     B: “blue square”
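The labelling scheme just described can be made concrete with a minimal sketch. It is not part of the original study: the function name `context_label` and the representation of a description as a pair of property values in spoken word order are illustrative assumptions. Following the simplification adopted in the text, a property counts as given (G) when the previous turn used the same value and as contrastive (C) otherwise, while the first turn of a game is labelled NN.

```python
from typing import Optional, Tuple

# A description is a pair of property values in spoken word order,
# e.g. ("blue", "square") for Dutch or ("triangolo", "nero") for Italian.
Description = Tuple[str, str]


def context_label(previous: Optional[Description], current: Description) -> str:
    """Return the 2-letter context label of `current` given the previous turn.

    Illustrative assumption: only the immediately preceding turn is consulted.
    """
    if previous is None:
        # First turn of a game: nothing has been mentioned yet (all new).
        return "NN"
    # Same value as in the previous turn -> given (G); different -> contrastive (C).
    return "".join("G" if prev == cur else "C" for prev, cur in zip(previous, current))


examples = [
    (None, ("blue", "square")),
    (("red", "circle"), ("blue", "square")),
    (("yellow", "square"), ("blue", "square")),
    (("blue", "triangle"), ("blue", "square")),
]
for prev, cur in examples:
    print(context_label(prev, cur), prev, "->", cur)
```

Because the first letter always refers to the first spoken word, the same function covers the Italian noun-adjective order when the pair is given as (noun, adjective).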
Figure 2. Visualization of the set-up of the experiment after A’s first move (“blue square”)
Eight Dutch speakers were recruited from students and colleagues from IPO,
speaking the variant of standard Dutch as spoken in the Netherlands; the eight
Italian speakers we recorded were all living in Italy, and were native speakers of the
Tuscan variety of Italian. The Dutch speech materials are used in studies 1 and 3, the
Italian ones in study 2.
2.2. Animations
For study 3 we combined the Dutch speech materials with an animated talking head.
Since this was a male head, we only used the four male voices collected for Dutch.
In addition, two synthetic male voices were used, copying the intonation contours of
two of the human voices. We use both synthetic and natural voices in order to see to
what extent the naturalness of the voice influences the perception of focus. A human
voice has more natural and better sounding prosody, but a synthetic voice might be
better suited to accompany the visual counterpart of a synthetic character. A Dutch
diphone speech synthesizer was used for the generation of the two synthetic
versions. The animations were produced with the CharToon environment (Ruttkay
et al. 1999). A 2D head of a male person formed the basis of the animations. Visual
speech is generated on the basis of a set of 48 visemes (elementary mouth positions).
Phonemes from the input are matched to corresponding visemes at a sampling
interval of 100 ms, while intermediate stages are computed using linear interpolation.
Rapid eyebrow movements coincide with the stressed syllable of either the first
(“blauw”) or the second word (“vierkant”). Notice that these are the eyebrow
counterparts of focus on the adjective and focus on the noun respectively. We did
not include an eyebrow counterpart to “all focus”, since this would involve either a
raised eyebrow for a longer stretch of time or two rapid eyebrow movements in
succession. For Dutch subjects both of these primarily have a non-focus signalling
interpretation. It is worth stressing that in certain stimuli eyebrow movements are
associated with words which are not accented. Eyebrow movements always had the
following pattern: first, a 100 ms dynamic raising part, then a static raised part of
100 ms, and finally a 100 ms dynamic lowering part. The overall length of the
movement is comparable to the average duration of rapid eyebrow movements of
human speakers (± 375 ms, Cavé et al. 1996). We opted for slightly shorter
movements due to the overall short duration of the stimuli. Figure 3 shows two stills
from a typical animation used in the experiment.
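The timing described above can be illustrated with a short sketch of linear interpolation between keyframes spaced 100 ms apart. This is an illustration under stated assumptions, not the CharToon implementation: the function `interpolate`, the scalar encoding of eyebrow height (0 = neutral, 1 = fully raised) and the keyframe list are all hypothetical. With keyframes 0, 1, 1, 0 the same interpolation yields the raise, hold and lowering phases of the eyebrow movement.

```python
from typing import Sequence

STEP_MS = 100.0  # spacing between keyframes, as stated in the text


def interpolate(keyframes: Sequence[float], t_ms: float, step_ms: float = STEP_MS) -> float:
    """Linearly interpolate between equally spaced keyframes.

    Times before the first or after the last keyframe are clamped.
    """
    last = len(keyframes) - 1
    if t_ms <= 0.0:
        return keyframes[0]
    if t_ms >= last * step_ms:
        return keyframes[last]
    index = int(t_ms // step_ms)                      # keyframe at or before t_ms
    fraction = (t_ms - index * step_ms) / step_ms     # position within the segment
    return keyframes[index] + (keyframes[index + 1] - keyframes[index]) * fraction


# Hypothetical encoding: 100 ms raising, 100 ms static, 100 ms lowering.
EYEBROW_KEYFRAMES = [0.0, 1.0, 1.0, 0.0]
TOTAL_MS = (len(EYEBROW_KEYFRAMES) - 1) * STEP_MS

for t in (0, 50, 100, 150, 200, 250, 300):
    print(t, interpolate(EYEBROW_KEYFRAMES, t))
```

Under these assumptions the whole movement lasts 300 ms, which is the "slightly shorter" duration relative to the roughly 375 ms reported for human speakers.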
Figure 3. Two stills from the Talking Head uttering “blauw vierkant” (blue square) with a
raised eyebrow on the first word (left) and no eyebrow action on the second word (right)
3.1. Preliminaries
The first study tests to what extent Dutch listeners are able to determine the main
focus in an utterance by means of pitch accent distribution. For this purpose, we
used data collected via the game described above. Before performing the dialogue
reconstruction experiment, a distributive analysis of the target utterance “blauw
vierkant” (blue square) was carried out. A consensus labelling was done by three
independent intonation experts. The results of the labelling can be summarized as
follows: in most cases, a property which is in focus receives a pitch accent.
Interestingly the only exceptions to this general rule can be attributed to speaker
differences among the eight speakers. One group of four speakers always end their
utterance on a low boundary tone and always associate focused properties with a
pitch accent. The four remaining speakers uniformly employ high boundary tones,
and they associate the CC utterances with a single accent on the noun.
3.2. Procedure
Dialogue reconstruction data were obtained from 25 native speakers of Dutch
(different from the eight speakers). The experiment was performed on an individual
basis and was self-paced. All three versions (CG, GC and CC) of the target utterance
(“blauw vierkant”) produced by the eight speakers were used, making a total of 24
stimuli. In studies 1 and 3, Dutch subjects are presented with speech realizations of
“blauw vierkant” taken from their original context, and the task is to determine by
forced choice whether the preceding utterance would be: (1) “rood vierkant” (red
square), (2) “blauwe driehoek” (blue triangle) or (3) “rode driehoek” (red triangle).
The corresponding contexts are (1) CG (focus on the first word), (2) GC (focus on
the second word) and (3) CC (all focus), respectively. The stimuli were presented in
two random orders, to compensate for potential learning effects. Before the actual
experiment started subjects entered a brief training session (consisting of three
stimuli) to make them acquainted with the materials and the setting of the
experiment. No feedback was given on the correctness of their answers, and there
was no communication with the experimenters. Notice that the all new situation
(NN) is not incorporated in the experiment, because there are no utterances
preceding the NN so that subjects cannot reconstruct the preceding utterance. The
NN utterances have been studied extensively in Krahmer & Swerts (2001), to find
out whether there are prosodic differences between newness and contrastive accents
in this setting.²
3.3. Results
Table 2 contains the results for all eight speakers taken together. The overall
distribution is significantly different from chance (χ² = 395.3, df = 4, p < 0.001). The
first thing to note is that for each line the highest numbers are on the diagonal. This
means that each context is most likely to be classified correctly. However, these
chances are much higher in the case of a single focus on a contrastive item (CG and
GC) than in the all focus case (CC). Subjects are particularly good at reconstructing
the dialogue history when the adjective is the single focused item (note that these are
the classic cases of narrow scope), which stands out prosodically due to the
occurrence of a nuclear accent in non-default position. When the noun is the single
item in focus, subjects are also generally capable of reconstructing the context.
Interestingly, the number of confusions with the all focus (double
contrast) context increases. This seems to imply that there is at least some amount of
broad focus / narrow focus ambiguity (but see below), although the narrow focus
interpretation is still prevalent. This result is compatible with earlier findings from
Gussenhoven (1983) and Rump & Collier (1996) that these ambiguous cases are
more confusable than the CG case, which only allows a narrow focus interpretation.
In the case of double contrast there appears to be a very substantial broad vs. narrow
focus confusion.
However, looking at the results for each speaker separately (all significantly
different from chance as well) reveals an interesting difference between high and
low boundary speakers. The main difference between speakers is found for the
Table 2. Summary of the results of Study 1: classification of all 24 stimuli, for all 25 listeners
(n=600). The vertical axis indicates the actual CONTEXT of the target utterance “blauw
vierkant” (blue square). The horizontal axis indicates how many subjects CLASSIFIED the
utterance in each of the three contexts
                    CLASSIFIED as
CONTEXT      CC      GC      CG    Total
CC           95      83      22      200
GC           60     119      21      200
CG           10       6     184      200
double contrast (CC) case. For low boundary speakers, utterances made in a CC
context are predominantly classified as CC. Strikingly, this is not the case for high
ending speakers, whose CC utterances are very frequently classified as GC
utterances, which matches the earlier observation that these speakers tend to produce
all-contrast utterances with a single accent on the noun. Thus, the fact that in Table 2
CC utterances are often misclassified as GC utterances is essentially due to the
difference between low and high ending speakers rather than broad vs. narrow focus
interpretations.
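The chance comparison reported for Table 2 can be approximated with a Pearson χ² test of independence on the 3 × 3 classification counts. The sketch below is an assumption about how such a statistic may be computed, not a record of the authors' procedure; the function name `chi_square_independence` is illustrative. Expected counts are derived from the row and column totals, and the degrees of freedom are (rows − 1) × (columns − 1).

```python
from typing import List, Tuple


def chi_square_independence(table: List[List[int]]) -> Tuple[float, int]:
    """Pearson chi-square statistic and degrees of freedom for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for row, row_total in zip(table, row_totals):
        for observed, col_total in zip(row, col_totals):
            # Count expected if classification were independent of context.
            expected = row_total * col_total / grand_total
            statistic += (observed - expected) ** 2 / expected

    df = (len(table) - 1) * (len(col_totals) - 1)
    return statistic, df


# Rows: actual context CC, GC, CG; columns: classified as CC, GC, CG (Table 2).
TABLE_2 = [
    [95, 83, 22],
    [60, 119, 21],
    [10, 6, 184],
]

statistic, df = chi_square_independence(TABLE_2)
print(f"chi-square = {statistic:.1f}, df = {df}")
```

With the counts of Table 2 this computation gives four degrees of freedom and a statistic close to the reported value; small differences may stem from rounding or from a slightly different procedure.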
4.1 Preliminaries
The second study tests to what extent Italian listeners are able to determine the
main focus and reconstruct the dialogue history of an utterance using prosodic cues.
Before performing the dialogue reconstruction experiment, a distributive analysis of
the target utterances “triangolo nero” (black triangle) was performed. Three
independent intonation experts listened to all realizations of “triangolo nero”
produced by the eight speakers in the various contexts of interest, and decided on
which words they perceived an accent. The three judges were in full agreement:
every word is always accented, irrespective of context. All speakers produce the
same contour, namely a flat hat shape with the second accent downstepped with
respect to the first. Of course, it might be that different kinds of accents are realized
in different contexts. However, an analysis of the fundamental frequency did not
reveal any differences between contexts (see Swerts et al. 2002). In addition, we
found no evidence for a clear correlation between information status and the
perceived prominence of accents for the Italian data. Therefore, it seems a
reasonable hypothesis that, contrary to the Dutch subjects, Italian subjects will not
be able to reconstruct the dialogue history on the basis of prosodic cues.
4.2 Procedure
Subjects of the second dialogue reconstruction experiment were 25 native speakers
of Italian (different from the eight speakers), mostly from Tuscany. The experiment
was performed on an individual basis and was self-paced. All three versions (CG,
GC and CC) of the target utterance (“triangolo nero”) produced by the eight
speakers were used, making a total of 24 stimuli. In this study, Italian subjects hear
versions of “triangolo nero” (black triangle), and have to guess whether the
preceding utterance was (1) “rettangolo nero” (black rectangle), (2) “triangolo viola”
(violet triangle) or (3) “rettangolo viola” (violet rectangle), again representing the
following contexts: (1) CG (focus on the first word), (2) GC (focus on the second
word) and (3) CC (all focus) respectively. The stimuli were again presented in two
random orders, to compensate for potential learning effects. Before the actual
experiment started subjects entered a brief training session (consisting of three
stimuli) to make them acquainted with the materials and the setting of the
experiment. No feedback was given on the correctness of their answers, and there
was no communication with the experimenters.
4.3 Results
The results of the Italian reconstruction experiment on the basis of all eight speakers
are displayed in table 3. A χ² analysis reveals that the distribution is not
significantly different from chance. Looking at the results of the eight individual
speakers, we see that the results for seven of them are not significant.³ The picture is
significantly different from the one obtained for the Dutch data (Pearson χ² = 223.8,
df = 8, p < 0.001). Thus, as expected, Italian listeners are not able to reconstruct the
prior dialogue context on the basis of prosodic properties of the current utterance, in
contrast to Dutch listeners.
Table 3. Summary of the results of Study 2: classification of all 24 stimuli, for all 25 listeners
(n=600). The vertical axis indicates the actual CONTEXT of the target utterance “triangolo
nero” (black triangle). The horizontal axis indicates how many subjects CLASSIFIED the
utterance in each of the three contexts
                    CLASSIFIED as
CONTEXT      CC      GC      CG    Total
CC           52      70      78      200
GC           53      82      65      200
CG           61      73      66      200
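A simple way to read the Italian and Dutch classification tables side by side is to compute the proportion of responses on the diagonal, overall and per context, and compare it with the one third expected from guessing among three options. The following sketch is illustrative only; the helper names are assumptions and the counts are copied from Tables 2 and 3.

```python
from typing import List

CONTEXTS = ["CC", "GC", "CG"]

# Rows: actual context; columns: classified as CC, GC, CG.
TABLE_2_DUTCH = [[95, 83, 22], [60, 119, 21], [10, 6, 184]]
TABLE_3_ITALIAN = [[52, 70, 78], [53, 82, 65], [61, 73, 66]]

CHANCE = 1 / 3  # three response options


def proportion_correct(table: List[List[int]]) -> float:
    """Share of all responses that fall on the diagonal."""
    correct = sum(table[i][i] for i in range(len(table)))
    total = sum(sum(row) for row in table)
    return correct / total


def per_context_correct(table: List[List[int]]) -> List[float]:
    """Share of correct responses for each actual context (row)."""
    return [row[i] / sum(row) for i, row in enumerate(table)]


for name, table in (("Dutch", TABLE_2_DUTCH), ("Italian", TABLE_3_ITALIAN)):
    overall = proportion_correct(table)
    by_context = dict(zip(CONTEXTS, per_context_correct(table)))
    print(name, f"overall={overall:.3f}", f"chance={CHANCE:.3f}", by_context)
```

For the Italian counts the diagonal sums to 200 of 600 responses, exactly the share expected from guessing, whereas the Dutch diagonal sums to 398 of 600.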
5.1 Preliminaries
In the third study we investigate the relative contributions of pitch accents and
eyebrow movements to the perception of focus in Dutch. For this purpose, we use an
animated male Talking Head and six different male voices. Four of these voices are
human, and have also been used in study 1. The two remaining voices are synthetic,
with the respective intonation contours copied from two of the human speakers. This
makes it possible to compare the results of study 3 with those of study 1. The rapid
eyebrow movements have been shown to be clearly perceivable. A further test
indicated that the eyebrow movements boost the perceived prominence of words that
also receive a pitch accent, and downscale the prominence of unaccented words in
the direct context of the accented word (see Krahmer et al. 2002b). The question of
interest to us here is whether this also has functional ramifications.
5.2 Procedure
A total of 25 native speakers of Dutch participated in the audio-visual dialogue
history reconstruction experiment (different from the eight speakers, and also
different from the 25 listeners from study 1). The experiment was individually
performed and self-paced. Subjects watched and listened to the Talking Head
uttering the two-word phrase “blauw vierkant” (blue square), with a particular
intonation contour (taken from its original context; CG, GC or CC) and a rapid
eyebrow movement on either the first or the second word. Eyebrow movements are
indicated with a hat on the relevant item; the resulting six contexts are ĈG, CĜ, ĜC,
GĈ, ĈC and CĈ. Since six voices are used the total number of stimuli is 36. The
stimuli were displayed on a high-resolution color PC screen, sound came over the
loudspeakers to the left and the right of the screen. Dutch subjects had to perform
the same task as those of study 1, except that they were now presented with
audiovisual stimuli. The stimuli were presented in two different random orders, to
compensate for possible learning effect. Before the experiment started, subjects
entered a brief training session (consisting of three stimuli) to make them acquainted
with the material and the setting of the experiment. No feedback was given on the
‘correctness’ of their answers and there was no communication with the conductor
of the experiment.
Table 4. Summary of the results of Study 3: classification of all 36 stimuli, for all 25 listeners
(n= 900). The vertical axis indicates the actual CONTEXT of the target utterance “blauw
vierkant” (blue square) plus the word which is associated with a rapid eyebrow movement.
The horizontal axis indicates how many subjects CLASSIFIED the utterance in each of the
three contexts
                    CLASSIFIED as
CONTEXT      CC      GC      CG    Total
ĈC           64      41      45      150
CĈ           59      70      21      150
ĜC           34      91      25      150
GĈ           33      90      27      150
ĈG           16      22     112      150
CĜ           16      30     104      150
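To gauge how much the eyebrow placement matters in the table above, one can compare, for each intonation context, the two rows that differ only in which word carries the hat (the eyebrow movement). The sketch below is an illustrative reading aid, not an analysis from the study: the dictionary layout and the function `shift_to_first_word_focus` are assumptions. It measures how much the share of "CG" responses (focus on the first word) rises when the eyebrow movement sits on the first rather than the second word.

```python
from typing import Dict, Tuple

ROW_TOTAL = 150  # responses per row in Table 4

# Key: (intonation context, word carrying the eyebrow movement).
# Value: responses classified as (CC, GC, CG).
TABLE_4: Dict[Tuple[str, str], Tuple[int, int, int]] = {
    ("CC", "first"): (64, 41, 45),
    ("CC", "second"): (59, 70, 21),
    ("GC", "first"): (34, 91, 25),
    ("GC", "second"): (33, 90, 27),
    ("CG", "first"): (16, 22, 112),
    ("CG", "second"): (16, 30, 104),
}

CG_RESPONSE = 2  # column index of the "CG" (first-word focus) response


def shift_to_first_word_focus(context: str) -> float:
    """Change in the share of CG responses when the eyebrow moves to the first word."""
    on_first = TABLE_4[(context, "first")][CG_RESPONSE]
    on_second = TABLE_4[(context, "second")][CG_RESPONSE]
    return (on_first - on_second) / ROW_TOTAL


for context in ("CC", "GC", "CG"):
    print(context, f"{shift_to_first_word_focus(context):+.3f}")
```

On these counts the shift is largest for the CC intonation (24 of 150 responses) and small for the two single-accent intonations, which is one way of seeing that the visual cue matters most where the auditory cue is least conclusive.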
5.3 Results
Table 4 summarizes the results. The total distribution is significantly different from
chance: χ² = 292.2, df = 10, p < 0.001. First consider the cases with single pitch
accents, i.e., the cases with a single prosodic focus on either the adjective or the
noun. Notice that in these cases the majority of subjects indeed perceived the focus
on the adjective or the noun respectively, no matter which of the words is
accompanied by an eyebrow movement. Subjects are somewhat more likely to
classify the cases with the prosodic focus on the adjective correctly than those with
prosodic focus on the noun. Certainly for these single prosodic focus cases, the
distribution of pitch accents is more important for the perception of focus than the
placement of eyebrow movements. This is also reflected by the fact that in the post-
experiment interview, all subjects indicated that they paid most (if not all) attention
distribution is not a significant factor in this language, since within the elicited NPs
both adjective and noun are always accented, regardless of the information status.
As a result, it is not surprising that the Italian listeners fail completely to interpret
the target utterances in terms of the dialogue history (study 2). As noted in the
introduction, Italian, being a non-plastic language, has other means besides prosody
of marking information status. For instance, it has a freer word-order than plastic
languages such as Dutch, and it is known to exploit this freedom to mark
information status. However, the constraints of the experimental paradigm did not
offer any room for Italian speakers to use word-order as an indicator of information
status. Therefore it would be interesting to look for an experimental set-up in which
speakers have more freedom to describe a particular state of affairs. This might also
shed a different light on the deaccentuation debate, given that Ladd claims that
deaccentuation of complete NPs within a sentence is quite possible in languages like
Italian, which is supported by data from previous studies (Avesani, Hirschberg &
Prieto 1995, D’Imperio 1997, Hirschberg & Avesani 1997).
Regarding the outcome of the audiovisual test (study 3), we have found that both
auditory (accent distribution) and visual (eyebrow movement) cues can have a
significant effect on the perception of focus. However, the effect clearly differs in
magnitude; the impact of pitch accents is large, that of rapid eyebrow movements
comparatively small. The visual cues contribute more when the auditory cues are
inconclusive. Thus, for the condition which caused most confusion in study 1,
eyebrows contribute the most in study 3. One consequence of the overall dominance
of speech is that inconsistent cues go largely unnoticed (although a recent
experiment indicates that subjects prefer animations in which
eyebrow movements coincide with pitch accents; see Krahmer et al. 2002b). That the
auditory cues appear to be more important for focus perception may, with
hindsight, be explained as follows: since human speakers do more with their pitch
than with their eyebrows, it is not unnatural that human listeners have learned to pay
more attention to changes in pitch than to eyebrow movements. It is interesting to
compare the result of study 3 with those of study 1. Since the auditory cues
dominate the visual ones, it is no surprise that the results basically confirm the
speech-only results of study 1. Nevertheless, there is clearly more confusion in the
audio-visual case. In part, the increase in confusion can be ascribed to the presence
of the eyebrow movements. Certainly, they account for much of the “confusion” in
the cases with a double pitch accent. However, eyebrows cannot account for the
slight increase in confusion for the cases with a single pitch accent. It might be that
the mere addition of a visual channel leads to more confusion (compare Doherty-
Sneddon et al. 2001).
As possible follow-up studies, it is useful to investigate real speaker behaviour in
natural interactions to gain more insight into possible visual cues. For study 3, use
was made of an analysis-by-synthesis technique, creating stimuli whose visual
properties were systematically varied to learn more about the relative effect of this
parameter on focus perception. While the manipulations were inspired by claims in
the literature, it would be valuable to supplement the current results with
observations of real speakers to see whether they indeed use eyebrow movements
for signaling focus as suggested here, or whether these mainly signal other types of
information, if any. It would also be highly interesting to see what happens with
Talking Heads for non-Germanic languages such as Italian. As shown above, the
results of study 2 reveal that Italian listeners systematically fail to correctly classify
the Italian utterances in terms of dialogue history when confronted with speech-only
stimuli. We are currently planning to do the dialogue reconstruction experiment with
an Italian Talking Head lifting its eyebrows on either the first (“triangolo”) or the
second word (“nero”). We would expect that rapid eyebrow movements have more
impact for the Italian head than for the Dutch one, since the auditory cues are less
informative for Italian than for Dutch. This would be in line with one of the findings
of study 3, that eyebrow movements become more important when pitch cues are
less clear.4
7. NOTES
1 This chapter presents an overview of our work on the perception of focus, a research topic that we have
been involved with since 1998. The studies focusing on the dialogue reconstruction for Dutch and Italian
are presented in more detail in Swerts et al. (2002). A preliminary version of the third, audiovisual
study is described in Krahmer et al. (2002a). Thanks are due to our colleagues Cinzia Avesani, Zsófia
Ruttkay and Wieger Wesselink for their help in carrying out these studies.
2 Superficially, newness accents and contrastive accents appear to differ in our data, but a closer look
reveals that this is not the case. In particular, at first sight it seems that (1) single contrastive items on the
adjective (CG) have a different shape from newness accents in the same position and (2) contrastive items
are judged to be more prominent than newness accents. However, (1) the difference in accent type is not
so much associated with a contrast-specific prosodic shape but with the occurrence of a nuclear accent in
a non-default position. And (2) the perceived prominence is not so much the result of inherent melodic
properties of contrastive accents but seems due to the fact that the prosodic context does not contain other
intonationally comparable pitch peaks. When the words are presented in isolation, contrastive accents are
not perceived as more prominent than newness accents.
3 The results for the eighth speaker were just above the significance threshold. This was due to the fact that
his CC utterance was often classified as CG. There is no obvious reason for this. Anyway, it is hard to see
how this can be related to information status.
4 POSTSCRIPT (2004) Since the first version of this chapter was written (2002), both follow-up studies
mentioned in the discussion have been carried out. Swerts and Krahmer (2004) report on a production
experiment in which subjects were asked to pronounce short utterances with one syllable marked for
focus. When the audio-visual recordings were analysed, it was indeed found that subjects may use
eyebrow movements to signal focus, but various other cues were found, of which head movement and
visual articulatory emphasis were the strongest. Krahmer and Swerts (2004) describe a series of
experiments with an Italian Talking Head. Contrary to our expectations, Italian subjects made less
functional use of eyebrow movements than Dutch subjects. In general, we found a number of interesting
differences between subjects’ evaluation of Dutch and Italian Talking Heads, but all of these could be
reduced to prosodic differences between the two languages.
8. REFERENCES
Avesani, C. “I Toni della RAI. Un Esercizio di Lettura Intonativa.” In Gli Italiani Trasmessi: la Radio,
pp. 659-727. Firenze: Accademia della Crusca, 1997.
Avesani, C., J. Hirschberg, and P. Prieto. “The Intonational Disambiguation of Potentially Ambiguous
Utterances in English, Italian and Spanish.” Proceedings of the 13th International Congress of
Phonetic Sciences, pp.174-177, 1995.
Birdwhistell, R. Kinesics and Context. Philadelphia: University of Pennsylvania Press, 1970.
Bolinger, D. Intonation and its Parts, London: Edward Arnold, 1986.
ten Bosch, L. On the Structure of Vowel Systems. Aspects of an Extended Vowel Model Using Effort and
Contrast. University of Amsterdam: Doctoral dissertation, 1991.
Cassell, J., H. Vilhjálmsson, and T. Bickmore. “BEAT: the Behavior Expression Animation Toolkit.”
Proceedings of SIGGRAPH'01, Los Angeles, CA, pp.477-486, 2001.
Cavé, C., I. Guaïtella, R. Bertrand, S. Santi, F. Harlay, and R. Espesser. “About the Relationship between
Eyebrow Movements and F0 Variations.” Proceedings of the International Conference on Spoken
Language Processing (ICSLP), Philadelphia, pp. 2175-2179, 1996.
Condon, W. “An Analysis of Behavioral Organization.” Sign Language Studies 13 (1976): 285-318.
Cruttenden, A. “The De-accenting and Re-accenting of Repeated Lexical Items.” Proceedings of the
ESCA Workshop on Prosody, Lund, pp. 16-19, 1993.
Doherty-Sneddon, G., L. Bonner, and V. Bruce. “Cognitive Demands of Face Monitoring: Evidence for
Visuospatial Overload.” Memory and Cognition 29.7 (2001): 909-919.
Gussenhoven, C. “Testing the Reality of Focus Domains.” Language and Speech 26 (1983): 61-80.
’t Hart, J., R. Collier and A. Cohen. A Perceptual Study of Intonation: An Experimental-Phonetic
Approach to Speech Melody. Cambridge: Cambridge University Press, 1990.
Hirschberg, J. and C. Avesani. “The Role of Prosody in Disambiguating Potentially Ambiguous
Utterances in English and Italian.” Proceedings of the ESCA Workshop on Intonation, Athens, pp.
189-192, 1997.
D’Imperio, M. “Narrow Focus and Focal Accent in the Neapolitan Variety of Italian.” Proceedings of the
ESCA Workshop on Intonation, Athens, pp. 87-90, 1997.
Krahmer, E. and M. Swerts. “On the Alleged Existence of Contrastive Accents.” Speech Communication
34 (2001): 391-405.
Krahmer, E. and M. Swerts. “More about Brows.” In Zs. Ruttkay and C. Pelachaud (eds.), Evaluating
ECAs. Dordrecht: Kluwer Academic Publishers, 2004.
Krahmer, E., Zs. Ruttkay, M. Swerts, and W. Wesselink. “Pitch, Eyebrows, and the Perception of Focus.”
Proceedings of Speech Prosody, Aix-en-Provence, pp. 443-446, 2002a.
Krahmer, E., Zs. Ruttkay, M. Swerts, and W. Wesselink. “Perceptual Evaluation of Audiovisual Cues for
Prominence.” Proceedings of the International Conference on Spoken Language Processing
(ICSLP), Denver, CO, pp. 1933-1936, 2002b.
Ladd, D. Intonational Phonology. Cambridge: Cambridge University Press, 1996.
Morgan, B. “Question Melodies in American English.” American Speech 2 (1953): 181-191.
Pelachaud, C., N. Badler, and M. Steedman. “Generating facial expressions for speech.” Cognitive
Science 20 (1996): 1-46.
Pierrehumbert, J. and J. Hirschberg. “The Meaning of Intonational Contours in the Interpretation of
Discourse.” In P. Cohen, J. Morgan, and M. Pollack (eds.), Intentions in Communication, pp.
342-365. Cambridge MA: MIT Press, 1990.
Pitrelli, J.F., M. Beckman, and J. Hirschberg. “Evaluation of Prosodic Transcription Labeling Reliability
in the ToBI Framework.” Proceedings of the International Conference on Spoken Language
Processing (ICSLP), Yokohama, Japan, pp.123-126, 1994.
Rump, H.H. and R. Collier. “Focus Conditions and the Prominence of Pitch-Accented Syllables.”
Language and Speech 39 (1996): 1-15.
Ruttkay, Zs., P. ten Hagen, and H. Noot. “CharToon; a system to Animate 2D Cartoon Faces.”
Proceedings Eurographics, 1999.
Steedman, M. “Information Structure and the Syntax Phonology Interface.” Linguistic Inquiry 31.4
(2000): 649-689.
Swerts, M. and E. Krahmer. “Congruent and Incongruent Audiovisual Cues to Prominence.” Proceedings
of Speech Prosody, Nara, Japan, 2004.
Swerts, M., E. Krahmer, and C. Avesani. “Prosodic Marking of Information Status in Dutch and Italian:
A Comparative Analysis.” Journal of Phonetics 30.4 (2002): 629-654.
Vallduví, E. The Informational Component. University of Pennsylvania: Doctoral dissertation, 1990.
MANFRED KRIFKA
1. INTRODUCTION
In Krifka (2001) I argued that three distinct phenomena of question semantics –
alternative questions like Did it rain or not?, multiple constituent questions with
pair-list readings like Who bought what? and the focus patterns of answers to con-
stituent questions – cannot be dealt with adequately within the framework of Alter-
native Semantics. In Krifka (to appear) I argue that Alternative Semantics also is
problematic as a framework for focus semantics in general; in particular, it makes
wrong predictions in case focus occurs in syntactic islands.
In this paper I will take up an issue of Krifka (2001) again, concentrating spe-
cifically on focus patterns in answers to constituent questions. Büring (2002) argued
that the discussion of phenomena in Krifka (2001) was inconclusive, and that Alter-
native Semantics actually does not have problems with the data put forward there. I
agree with the first point, but I will also show that on closer inspection, Alternative
Semantics does not predict the correct patterns of answer focus. I will also show that
the same holds for the theory of Schwarzschild (1999) which works with Givenness
instead of a semantic notion of Focus. The Structured Meaning theory, on the other
hand, does not have these problems.
C. Lee et al. Topic and Focus: Cross-linguistic Perspectives on Meaning and Intonation, 139–150.
© 2007 Springer.
The criterion that the meaning of the answer must be an element of the meaning
of the question is too crude to exclude answers that may express the right proposition
but whose prosody does not fit the question. The generalization is that the
position of the main accent must correspond to the wh-element of the question (see
Paul 1891 [1880]). With Jackendoff (1972) and many others, I assume that the main
accent is determined by a focus feature F in syntax. In modern terminology, we can
rephrase Paul’s observation as: The F-feature of the answer must correspond to the
wh-constituent of the question. This is illustrated by the following question-answer
pairs.
The second clause of this congruence criterion, (10.ii), excludes answers with
focus in the wrong place, like the infelicitous answers of (6) and (7). To see this,
consider example (6):
The question meaning and the answer meaning in (12) share one proposition,
namely the proposition λi[READ(i)(ULYSSES)(JOHN)], but the question meaning is not
a subset of the answer meaning.
The congruence criterion also predicts that answers must be focus marked, as
otherwise the alternative meaning is reduced to a singleton set, and the subset
requirement cannot be satisfied:
In general, the congruence criterion (10) ensures that there is enough focus
marking. For example, it rules out question-answer pairs like (14) but allows for
question-answer pairs like (15):
But it is evident that the congruence criterion, as it stands, does not rule out too much
focus marking. For example, it allows for infelicitous question-answer relations as
in (16):
This was the major point of criticism in Krifka (2001). In that paper, I also consi-
dered other possible congruence criteria within Alternative Semantics that assume
additional restrictions of the alternatives introduced by focus and by wh-elements,
but I concluded that they could not systematically exclude overfocused or under-
focused answers.
With this congruence criterion, the problematic example (16) can be ruled out. To
see this, consider the following three potential answers to the question Which
student read Ulysses? and their alternative sets.
All answers satisfy clause (i) of the congruence criterion. Answers (a) and (b) also
satisfy clause (ii), as [[(18)]] ⊆ [[(18.a)]]A and [[(18)]] ⊆ [[(18.b)]]A, but answer (c) is
ruled out by it, as [[(18)]] ⊄ [[(18.c)]]A. Clause (iii) rules out answer (a), as it has
more focus marking than (b): Where (a) has two F markings, (b) only has one.
The underlying idea is that focus marking has to be used sparingly, to achieve
the required purpose of ensuring that the question meaning is a subset of the alter-
native meanings of the answer. This could plausibly be modelled within optimality
theory by two constraints: A higher ranked one that requires the focus marking to
capture the meaning of the question (that is, [[Q]] ⊆ [[A]]A), and a lower ranked one
that prefers minimal focus marking.
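The interplay of the three clauses can be illustrated with a small computation. The following is only a sketch under stated assumptions: a toy domain of four entities with hypothetical names, propositions encoded as tuples, and the three candidate answers taken to carry focus on both subject and object (a), on the subject only (b), and on the object only (c), as the discussion of (18) suggests. Clause (i) is rendered as membership of the answer meaning in the question meaning, clause (ii) as the subset requirement, and clause (iii) as the absence of a competitor with fewer F-markings that also satisfies (i) and (ii).

```python
from itertools import product

# Hypothetical toy model; all names below are illustrative assumptions.
ENTITIES = ("John", "Mary", "Ulysses", "Dubliners")
STUDENTS = ("John", "Mary")

def question_meaning(book):
    """[[Which student read <book>?]] as a set of propositions."""
    return {("read", student, book) for student in STUDENTS}

def alternatives(subject, obj, foci):
    """Alternative meaning: every F-marked position ranges over all entities."""
    subjects = ENTITIES if "subject" in foci else (subject,)
    objects = ENTITIES if "object" in foci else (obj,)
    return {("read", s, o) for s, o in product(subjects, objects)}

def satisfies_i_and_ii(question, subject, obj, foci):
    in_question = ("read", subject, obj) in question                 # clause (i)
    covers_question = question <= alternatives(subject, obj, foci)   # clause (ii)
    return in_question and covers_question

def congruent(question, subject, obj, foci, competitors):
    if not satisfies_i_and_ii(question, subject, obj, foci):
        return False
    # Clause (iii): no competitor with fewer F-markings passes (i) and (ii).
    return not any(
        len(other) < len(foci)
        and satisfies_i_and_ii(question, subject, obj, other)
        for other in competitors
    )

QUESTION = question_meaning("Ulysses")
CANDIDATES = {
    "a": frozenset({"subject", "object"}),
    "b": frozenset({"subject"}),
    "c": frozenset({"object"}),
}
VERDICTS = {
    label: congruent(QUESTION, "John", "Ulysses", foci, CANDIDATES.values())
    for label, foci in CANDIDATES.items()
}
print(VERDICTS)
```

In this toy model only candidate (b) comes out as congruent: (c) fails the subset requirement, and (a) is blocked by (b). Counting F-markings is the simplest possible reading of "less focus marking" and is adopted here purely for illustration.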
Extending the congruence criterion by clause (iii) is a promising move, but no-
tice that (iii) contains a notion that is undefined so far, namely “less focus marking”.
It is clear what less focus marking means when comparing sentences like (18.a) and
(b): In (a), there is an additional focus feature that (b) lacks, and in this sense (b)
shows less focus marking. But there are cases in which it is not so clear what less
focus marking should mean. In particular, we should consider cases of broad and
narrow focus, and compare them with cases of more or less focus.
Let us start with the following case, in which the question asks for an activity,
indicated by the verb do. I again specify the meaning of the question and the alter-
native meanings of potential answers.
The VP question (19) asks for any property of John that is an activity. Here, Dset
is the domain of meanings that are functions from indices to functions from entities
to truth values, that is, the domain of properties, type set (or, in another notation, ⟨s,
⟨e, t⟩⟩), and Dseet is the domain of relations-in-intension, type seet. If the answer is
formed with a transitive verb, as in (19.a), the accent on the object NP marks focus
on the whole VP, a case of so-called focus projection or accent percolation. The
answer (b) with object NP focus, which happens to be homophonous with (a), is
infelicitous. The same holds for answers like (c), with focus on the transitive verb.
Also, answer (d) is infelicitous; it would be felicitous in the context of a question
like what happened. Again, the marking is similar to (a) by focus projection, with
the main accent on the direct object.
Obviously, all answers satisfy clause (i) of the congruence criterion (17). Answers
(b) and (c) are ruled out by clause (ii), as we have [[(19)]] ⊄ [[(19.b)]]A,
[[(19.c)]]A. The question asks for activities of John in general; the alternatives of the
answer are restricted to reading activities by John and to relations of John to Ulysses,
respectively. Answers (a) and (d) satisfy clause (ii), as we have [[(19)]] ⊆
[[(19.a)]]A, [[(19.d)]]A. Answer (d) should then be excluded by clause (iii) if we interpret
“less” focus marking as meaning “narrower” focus marking whenever two expressions
are compared that differ only insofar as one has a broader focus marking than
the other.
Consider now the following multiple constituent question and two potential
answers.
Multiple wh-questions are often supposed to be answered by a list answer, a fact that I
will disregard here. In the appropriate answer, each wh-element of the question corresponds
to a focus of the answer, cf. (20.a). This satisfies clause (ii); we have [[(20)]]
⊆ [[(20.a)]]A. Answer (20.b) is not felicitous, even though we have [[(20)]] ⊆
[[(20.b)]]A. Can (20.b) be ruled out by clause (iii) of the congruence criterion? We
have to decide what counts as less focusation: While (20.a) has more focus features,
(20.b) has a broader focus. If we want to keep up our general hypothesis, then we
must assume that broad focus is worse than having more foci:
(21) When two answers A and A′ compete, where both expressions are equal
except that A has more but smaller foci, and A′ has fewer but broader
foci, A is to be preferred over A′.
Consider now again question (19), repeated here, and two potential answers:
Notice that (22.a) is a good answer but (22.b) is infelicitous. Both answers satisfy
clause (ii) of the congruence criterion. In particular, answer (22.b) does, as we have
[[(22)]] ⊆ [[(22.b)]]A. To see this, we have to prove that each element of [[(22)]] is also
an element of [[(22.b)]]A. Take p to be an arbitrary element of [[(22)]]. This means
that p can be expressed as λi[P1(i)(JOHN)], where P1 is some constant of type set.
Now we can take an arbitrary constant y2 of type e and define a constant R2 of
type seet as follows: R2 := λiλyλx[P1(i)(x)]. Then we can express p as
λi[R2(i)(y2)(JOHN)], and hence we have p ∈ [[(22.b)]]A. As the choice of p was arbitrary,
we have [[(22)]] ⊆ [[(22.b)]]A, q. e. d.1
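The subset claim can also be checked by brute force in a very small finite model. The sketch below assumes two indices and two entities (the names are hypothetical), models a proposition as the set of indices at which it is true, and takes the alternatives of (22.b) to vary both the relation and the object, as the proof suggests. The restriction of the question to activities is ignored here.

```python
from itertools import product

# Hypothetical miniature model: two indices, two entities.
INDICES = ("i1", "i2")
ENTITIES = ("john", "ulysses")

def all_functions(domain):
    """Yield every function from `domain` to {False, True}, as a dict."""
    domain = list(domain)
    for values in product((False, True), repeat=len(domain)):
        yield dict(zip(domain, values))

def proposition(truth_at):
    """Model a proposition as the set of indices at which it is true."""
    return frozenset(i for i in INDICES if truth_at(i))

# [[What did John do?]] = { λi[P(i)(JOHN)] | P in D_set }
question = {
    proposition(lambda i, P=P: P[(i, "john")])
    for P in all_functions(product(INDICES, ENTITIES))
}

# Alternatives of the answer with focus on verb and object:
# { λi[R(i)(y)(JOHN)] | R in D_seet, y in D_e }
two_foci_alternatives = {
    proposition(lambda i, R=R, y=y: R[(i, y, "john")])
    for R in all_functions(product(INDICES, ENTITIES, ENTITIES))
    for y in ENTITIES
}

print(question <= two_foci_alternatives)   # the subset requirement of clause (ii)
print(question == two_foci_alternatives)   # the stronger claim of note 1
```

With an unrestricted domain of relations, both checks succeed in this model: every proposition of the form "John has property P" is also of the form "John stands in relation R to y".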
The proof goes through if the choice of R2 is totally unrestricted, that is, R2 is an
arbitrary element of Dseet. This might be criticized; we might only allow “natural”
relations. But, first, it is difficult to determine what “natural” relations are. And
secondly, restricting the domain of focus alternatives easily leads to situations in
which it is no longer guaranteed that the question meaning is a subset of the
alternatives of the answer; it might be just the other way round. See Krifka (2001)
for a discussion of alternative congruence criteria and their problems with excluding
over- and underfocused answers.
Can clause (iii) of the congruence criterion (17) decide between the two answers? Yes,
it can, but if we follow the preference rule (21) then it selects, incorrectly, (22.b)
over (22.a). And if we change the preference rule so that more foci are dispreferred
over broader foci, then clause (iii) would select, incorrectly, (20.b) over (20.a). This
means that the preference rule for less focusation cannot be spelled out in a general
way so that it always identifies the felicitous answer.
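The dilemma can be made concrete with a short tabulation. In the sketch below, each case is reduced to the shape of focus marking of its felicitous answer and of its competitor; the labels are illustrative shorthand, not part of the theory. A preference rule is identified with the shape it prefers, and it makes the right prediction for a case exactly when the felicitous answer has that shape.

```python
# Each case pairs the felicitous answer with its infelicitous competitor,
# described only by the shape of its focus marking (illustrative labels).
CASES = {
    "(20)": {"felicitous": "more_smaller_foci", "competitor": "fewer_broader_foci"},
    "(22)": {"felicitous": "fewer_broader_foci", "competitor": "more_smaller_foci"},
}

# A preference rule is represented by the shape it prefers;
# rule (21) corresponds to "more_smaller_foci".
RULES = ("more_smaller_foci", "fewer_broader_foci")

def rule_predictions(preferred_shape):
    """Map each case to True if the rule selects the felicitous answer."""
    return {
        case: shapes["felicitous"] == preferred_shape
        for case, shapes in CASES.items()
    }

for rule in RULES:
    print(rule, rule_predictions(rule))
```

Neither rule yields the right prediction for both cases, which is the point made above: whichever way the notion of less focusation is fixed, one of (20) and (22) is mispredicted.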
4. GIVENNESS AS AN ALTERNATIVE?
Büring (2002) also suggested switching to the theory of Schwarzschild (1999) as a
generally more adequate theory of the distribution of sentence accents. In particular,
Schwarzschild assumes a rule of focus avoidance that is, in essence, the same as the
preference rule for minimal focusation expressed by (17.iii).
Schwarzschild (1999) follows Selkirk (1984) in assuming that focus on the lar-
ger constituent is licensed by focus projection. The general rule is that focus on an
argument licenses focus on the head, and focus on the head licenses focus on the
whole constituent. This is how VP focus is generated, step by step:
According to this theory, VP focus in John read Ulýsses contains three focus
features. In contrast, multiple focus on the transitive verb and the object NP only
contains two focus features:
Hence this theory makes a clear prediction for cases in which VP focus and V focus
+ NP focus are to be compared. VP focus as in (23.c) contains more focus marking
than multiple focus on the verb and on the object NP as in (24). Consequently, every-
thing else being equal, (24) should be preferred over (23.c), and in general having
more foci should be preferred over having broader foci. This gives us the correct
prediction for (20) but the false one for (22).
Schwarzschild’s theory adds to Selkirk’s rule of recursive F-marking the follow-
ing assumptions:
F-marking on him is allowed, even though the pronoun has a salient antecedent,
John. Why is this so? Existential type shifting of the question Q gives us the propo-
sition
∃x[PERSON(x) ∧ PRAISE(x)(MOTHER(JOHN))],
for which I will write ∃Q, for short. The existential F-closure of the answer A is
what we get when we replace the focus, if there is any, by a variable which is bound
by an existential quantifier with wide scope. In the case at hand, this is
∃x[PRAISE(x)(MOTHER(JOHN))]. Note that this is entailed by ∃Q. This means that the
sentence She praised HIMF is Given. Similarly, the VP praised HIMF is Given, as its
existential F-closure, ∃y∃x[PRAISE(x)(y)], is also entailed by ∃Q. The object noun
phrase HIMF is also Given, as it has an antecedent, John’s. Now, (25) allows for F-
marked constituents that are given, and so it allows that HIMF is F-marked. But (26)
says that F-marking should be avoided if possible. Can F-marking on HIMF be
avoided? No, because then the existential F-closure of the sentence without focus
marking, She praised him, is PRAISE(JOHN)(MOTHER(JOHN)) (notice that there is no
existential closure because there is no F-marking), and this is not entailed by
∃Q. But the projection of F-marking as in She [praisedF HIMF]F can be avoided,
as it is not necessary to ensure that the resulting existential F-closure
∃P[P(MOTHER(JOHN))] is entailed by ∃Q. This is already achieved by less focus
marking, on HIMF. For similar reasons, additional focus marking as in SHEF praised
HIMF, is not necessary, and hence avoided.
Schwarzschild’s account generally prefers narrow foci over broad foci, and few
foci over many. But as we have already seen, this makes wrong predictions. Con-
sider the following case again:
The existential closure of the question is ∃P[P(JOHN)].2 This entails all the possible
focus closures of (29.a), which are ∃P[P(JOHN)], ∃R[R(ULYSSES)(JOHN)],
∃x[READ(x)(JOHN)] and ∃R∃x[R(x)(JOHN)]. But it also entails the focus closure of
(29.b), which is ∃R∃x[R(x)(JOHN)]. As (29.b) has less focus marking according to
Selkirk, it should be preferred, but contrary to the theory, it is not.
Clearly, the question-answer pairs (31.a) – (32.a), (31.b) – (32.b) and (31.c) –
(32.c) are congruent. We also find that the problematic cases considered above are
treated in the expected way. First, consider the following two questions:
Now, consider the following two answers. For VP focus in (36) I do not assume
focus projection in the style of Selkirk; rather, I assume that focus is assigned
directly to the VP and expressed by accent on the object NP.
Clearly, only the combinations (33) – (35) and (34) – (36) satisfy the congruence
criterion (32); no other combinations do. No rule of minimization of focus is called
for; wrong focusation leads to a direct violation of clause (i) of the congruence crite-
rion.
In conclusion, it appears that the careful consideration of focus in answers to
constituent questions argues against the alternative semantics account, and for the
structured meaning account, of questions and answers.
6. NOTES
* Thanks to Regine Eckardt, Andreas Haida and Kerstin Schwabe for discussion of the points of this
paper, and to Daniel Büring for pointing out problems in the argumentation in Krifka (2001).
1 As a matter of fact, we can also prove that [[(22)]] ⊇ [[(22.b)]]A, that is, the two sets are equal.
2 Or rather, ∃P[P(JOHN) ∧ P: activity], as the question asks for an activity. Then it is actually unclear
whether the existential closure of the question entails the existential F-closure of the answer because this
does not have to be restricted to activities.
7. REFERENCES
Büring, Daniel. Question-Answer Congruence - Unstructured Comments on Krifka (2001). Berlin: ZAS,
2002.
Groenendijk, Jeroen and Martin Stokhof. Studies on the semantics of questions and the pragmatics of
answers, Department of Philosophy, University of Amsterdam: Doctoral Dissertation, 1984.
Hamblin, C. L. “Questions.” The Australasian Journal of Philosophy 36 (1958): 159-168.
Hamblin, C. L. “Questions in Montague English.” Foundations of Language 10 (1973): 41-53.
Jackendoff, Ray. Semantic Interpretation in Generative Grammar. Cambridge, Mass.: MIT Press, 1972.
Karttunen, Lauri. “Syntax and Semantics of Questions.” Linguistics and Philosophy 1 (1977): 3-44.
Krifka, Manfred. “For a Structured Account of Questions and Answers.” In Audiatur Vox Sapientiae. A
Festschrift for Arnim von Stechow, eds. Caroline Féry and Wolfgang Sternefeld, 287-319. Berlin:
Akademie-Verlag, 2001.
Krifka, Manfred. “Association with Focus Phrases.” In Valerie Molnar and Susanne Winkler, eds., The
architecture of focus. Berlin: Mouton de Gruyter, to appear.
Paul, Hermann. Principles of the History of Language [Prinzipien der Sprachgeschichte]. Translated
from the second edition of the original by H. A. Strong. London: Longmans, Green, and Co., 1891
[Leipzig, 1880].
Rooth, Mats. “A Theory of Focus Interpretation.” Natural Language Semantics 1 (1992): 75-116.
Schwarzschild, Roger. “GIVENness, AvoidF and other Constraints on the Placement of Accent.” Natural
Language Semantics 7 (1999): 141-177.
Selkirk, Elisabeth O. Phonology and Syntax: The Relation between Sound and Structure: Current studies
in Linguistics. Cambridge, Mass.: MIT Press, 1984.
von Stechow, Arnim. “Focusing and Backgrounding Operators.” In Werner Abraham, ed., Discourse
Particles, 37-84. Amsterdam: John Benjamins, 1990.
CHUNGMIN LEE
1. INTRODUCTION
In this chapter I will consider Contrastive Topic (CT), Contrastive Predicate Topic
(CPT) and Focus in information structure and their relations to intonation and
meaning, which I have attempted to account for in a series of papers on related topics1.
Particularly, I will try to see the conventional scalar implicature meanings triggered
by CPT and CT in connection with their intonation. In dealing with those phenomena,
I will use data extensively from Korean, where CT is surprisingly clearly marked
morphologically and intonationally, in comparison with data from English.
Information structure, claimed to constitute a separate component from
phonological, syntactic and semantic components (Vallduví 1992), consists basically
of Topic – Comment or Background – Focus information. Apart from whether it
constitutes a separate component in grammar, no one can deny that it is closely
interwoven with morphological structure (particularly in Korean and Japanese),
syntactic linear and hierarchical structure, semantic structure, and prosodic
phonological structure. That is why we came to organize the present workshop and
create a volume on Topic and Focus in connection with their meaning and
intonation. Recently the phenomenon of CT in particular has been well characterised.
Through common efforts of this kind we believe we can deepen our understanding
of the underlying principles governing related issues cross-linguistically.
The organization of the chapter is as follows: In Section 2, Contrastive Topic is
distinguished from non-contrastive Topic and from list contrastive topics, which do
not leave an implicature; CT is examined in a dialogue model and the notion of sum
is considered; Korean CT is shown on pitch tracks. In Section 3, scalar meanings are analyzed;
type-subtype scalarity and subtype scalarity are distinguished, and CT’s inherent
tendency toward subtype scalarity even in entities is advocated. In Section 4, scope relations
between scope bearers and CT and CT’s narrow-scope nature are discussed, together
with the non-narrow-scope topicalization effect. In Section 5, Contrastive Predicate Topic and
the scope relation between CT and REASON clause are explored. Section 6 concludes the
chapter.
C. Lee et al. Topic and Focus: Cross-linguistic Perspectives on Meaning and Intonation, 151–175.
© 2007 Springer.
2.1. Topic
We can view an utterance from a Topic perspective and get a Topic – Comment
structure, as follows (Topic here being a non-contrastive Topic):
peak comes on a novel at the end of the corresponding SVO English S Sam bought a
novel. Observe the intonation pattern of a Topic sentence in Korean in Fig. 1:
We will shortly see how the above Topic intonation is sharply distinct from the CT
intonation shown in Figure 2.
Here HE must be marked CT (or Topic), not F, however its intonation may be
modified in the English question sentence (the fall-rise accent remains in an echo
question (O’Connor et al. 1973, Hetland 2003); in Hungarian a CT in a question is
reported in Molnar 1998). It is one of those people in the context and was mentioned
or accommodated in the previous question sentence, thus being in the background as
given. If Focus is assigned, because of the preceding focal wh-word, the sentence
becomes a reclamatory question such as (5):
Similarly, MARY in (6), with alternative individuals in the speaker’s mind, i.e. CT-alternatives,
not Focus alternatives, must be marked CT, not F, contra Krifka (2003).
A multi-wh question (such as Who ate what? or Who kissed who?), appearing at
the top of discourse tree structures (Carlson 1983, Roberts 1995, Büring 2003),
typically requires a multi-narrow focus answer such as ‘FREDA ate the BEANSA’ or
‘LarryA kissed NinaA’ (often a reciprocal alternative question), as an exhaustive
answer, a pair-list answer, etc. (cf. Krifka 2002). This will get the following dual
focal value, which Büring himself employed to criticize Roberts’ (1995)
characterization of CT as a set of propositions:
In other words, immediate daughters of the top multi-wh question are not warranted
to get a person or food in them. CT utterances cannot felicitously occur at the beginning
of a discourse and they cannot be felicitously preceded by a multi-wh question
abruptly. There must be an appropriate way of introducing a topical element in the
question (Kadmon 2001 also criticized this point; see Krifka in this volume for a
structural account) and at least a D-linked wh-question may have to be given such as
Which person ate what for a subject CT question-answer (What did Fred (and Sam)
eat?-FredCT ate the beans) and Who ate which food for an object CT question-
answer daughters for real congruence in the tree. Otherwise, the derivation is
arbitrary and unpredictable, ignoring which element is previously given. Thus, a CT
CONTRASTIVE TOPIC, INTONATION, AND SCALAR IMPLICATURES 155
is ‘about’ a given part in the previous discourse and locally ‘about’ the rest of the
CT sentence. Hence it is topical. A CT is a selection of one or part of the potential
sum Topic denotations and is focal in this local sense within the given potential Topic. In
the multi-foci case in Korean, the Nominative marker –ka and the Accusative
marker –rul, but not the Topic marker –nun, are employed (Lee 1999). The given or
accommodated part as a potential Topic of the previous discourse context must be
present to represent an appropriate CT (below in the tree), as something like
FRED/HE in (4A). In Korean, a CT occurring in a question sentence has a tone
lower than a CT in a declarative S. The most natural and relevant question that
precedes a CT answer should include a potential Topic of a sum of individuals of
<e> type or properties of <e, t> type.
Buring’s claim, on the other hand, that his proposed CT-value is rather a set of
sets of propositions against Roberts’ (1995) ‘a set of propositions’ (Kadmon 2001
also criticizes this) is surely an improvement. The CT-value of (4B), then, should be:
(8) {{x ate y | y ∈ D_e} | x ∈ D_e} = {{Fred ate the beans, Fred ate the peanuts,
Fred ate the eggplant}, {Sam ate the beans, Sam ate the peanuts, Sam ate the
eggplant}, {Mary ate the beans, Mary ate the peanuts, Mary ate the
eggplant}} (The variables can be equivalently bound by the λ operator).
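The nested structure of the CT-value in (8) can be made concrete with a small sketch. It only illustrates the "set of sets of propositions" idea under simplifying assumptions: propositions are represented as plain strings, the domain is restricted to the three persons and three foods listed in (8), and the function name ct_value is hypothetical rather than taken from Buring or from this chapter.

```python
# Illustrative sketch only: propositions are modelled as strings, and the
# domain is limited to the individuals and foods listed in (8).
# `ct_value` is a hypothetical helper name, not an established API.

def ct_value(topics, foci, frame="{x} ate {y}"):
    """Return a set of sets of propositions.

    The outer set varies the CT-marked element (x); each inner set keeps x
    fixed and varies the Focus-marked element (y).
    """
    return frozenset(
        frozenset(frame.format(x=x, y=y) for y in foci)
        for x in topics
    )

people = ["Fred", "Sam", "Mary"]
foods = ["the beans", "the peanuts", "the eggplant"]

value = ct_value(people, foods)

# Print each inner set: the subject is invariant within a subset.
for subset in sorted(value, key=sorted):
    print(sorted(subset))
```

Within each inner set the subject stays fixed while the food varies, which is the invariance that the text takes as a sign of topic-hood.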
In each subset above, the subject happens to be fixed and functions as Topic for
alternative objects – foods. The choice of one of the alternative foods, i.e. the
beans here, is marked Focus at the outset because it is not relativized any further,
being exhaustive. The choice of one Topic from the alternative Topics – persons, i.e.
Fred here, is focal. The would-be Topic is relativized to become a CT, involving a
focal process. In this sense, CT is both topical and focal, but because of its Topic
base, the head of the term Contrastive Topic is Topic, not Focus, as in Contrastive
Focus. Focus does not have a Topic base. Furthermore, Contrastive Topic is more
marked than Topic in its term and content. Kadmon (2000) rightly criticized this
CT-value approach for relying too much on Focus-value approach. The invariance
of an element in one subset, however, suggests its topic-hood. If it did not have a superset,
it would be a non-contrastive Topic, and there would be no choice involved.
(9) *Fred ate the BEANS but Sam ate the PEANUTS.
L+H*LH% L+H*LH%
(10) *Fred ate the BEANS but he did not eat the PEANUTS.
L+H*LH% L+H*LH%
(11) FRED ate the beans but MARY ate the peanuts.
L+H*LH% L+H*LH%
(12) The BEANS, he doesn’t like; the EGGPLANT, he doesn’t
L+H*LH% L+H*LH%
like; and the PEANUTS, he doesn’t like, either.
In (12), many people do not like the last item having a CT contour of L+H*LH%
because they are aware that it exhausts the list of items of the identical predicates
with ‘either.’ Brown (1980) noted that a high boundary signals that there is more to come
on the current topic. If we consider topicalized CTs as special cases of CT requiring
a special syntactic position, the most natural and typical situation in which CT
occurs is a single-sentence utterance with a CT in situ like (4B). Such an utterance
unmistakably involves a conventional implicature of ‘but Sam did not eat the beans’ (or
‘but I don’t know about the rest of the people’). The implicature is conventional because
it is evoked by the contrastive contour in English or by a morpheme plus a high tone in
Korean, although even without these linguistic devices the same implicature can be
evoked conversationally, purely from context. Steedman (in this volume) largely came to
take this position, but Buring (2003) views it as conversational. This denial is the first
evoked implicature even when ‘Sam ate the peanuts’ follows, but it is then somewhat
redundant and trivial because the alternative that entails the denial is rather explicitly
asserted.
This listing effect (with no implicature) occurs in a discourse even across speakers
or sentence boundaries. Consider Kadmon’s interesting observation in (13). The
only potentially relevant kissers are Larry and Bill.
(14) (After hearing that Inho didn’t come, regarding his friend Yengswu)
Yengswu-nun w-ass-e
-CT come-PAST-DEC
‘YengswuCT came.’
There is a sharp difference in pitch height between the Topic –nun (Fig. 1) (150 Hz)
and the CT –nun (Fig. 2) (over 200 Hz). This is why I described the CT -nun phrase
as (L)H*(%). There occurs a direct rise from L on the final syllable of the nominal
or other lexical constituent (CT target) to the CT marker –nun, a non-lexical
function element, unlike in Indo-European languages (C. Lee 2000). This implies
that contrastive accent and contour in Korean and English are different from other
focus accents. In Japanese, according to Nakanishi (in this volume), a CT marker wa
from Subject in initial position does not seem to be high, but mid-sentential CT wa
is high in tone according to my fieldwork. The marker -nun shows phrasal
boundaries, those of Intonational Phrase (IntP) or Accentual Phrase (AP)2. In
naturally occurring speech, non-contrastive Topic and list Topic are so low in
pitch that marking H indiscriminately on their S-initial –nun in Jun’s (1998) K-ToBI
may have to be reconsidered, despite the tendency toward an LHLH AP in Korean. Because
of the phrase-final rise, CT has nothing to do with the dephrasing effect witnessed in
(non-phrase-final) Focus elements (Jun 1993). Therefore, Focus may follow it.
Dephrasing is analogous to de-accenting in English (Pierrehumbert 1980), e.g. Q: Who
did Anna marry? A: (Anna married) MANNY (H*LL%). Because of the following Focus,
backward deaccenting occurs and no pitch accent or boundary is marked on the
string of the non-contrastive Topic and the verb in the background (a non-contrastive
Topic given in Korean is similar, as in Fig. 1). Typologically, in Italian
and Romanian given information is not de-accented, contrastively focused elements
already lacking accent (Ladd 1996). CT –nun is also the longest in duration among
different phrase-final elements. In contrast to the high pitch of the above typical CT,
observe the low pitches of the list contrastive topics in Fig. 3.
In a join semilattice, a (local) top type is entailed by its lower types in the
ontological type/sort hierarchy, and is thus ‘given’ (Schwarzschild 1999) by the latter
if a lower type element occurs first, e.g. male/female→gendered, gorilla/
monkey→animal. Likewise, daughter/son→offspring (baby), but we cannot get the
idea of sum in the situation of ‘giving birth to a baby’ in (16A). Therefore, a
stronger ‘daughter’ is informative and can be not CT-marked but F-marked or CF-
marked (to be discussed shortly), because an assumed intervening direct question is
an alternative disjunctive question, ‘If yes, is it a daughter or son?’ If the question is
(17A), we can get the notion of sum in children (or babies) and hence B.
If B’s answer is ‘Yes, I have sonsF,’ then it is exhaustive (but it can still have the
conversational implicature of ‘but I don’t have daughters’ from the context). Once
(17B) is uttered, it by default evokes a scalar implicature, and I say it is conventional
because it has a special fall-rise pitch contour and is not readily cancellable without
epistemic contradiction. Even an explicitly asserted proposition may at times be
cancelled in a very roundabout way, with hedges and corrections. A conventional
implicature may not be an exception to this kind of roundabout situation. The
implicature of (17B) may initially be scalar, with something like ‘But I don’t have
daughters and I am not totally satisfied with this,’ tending to give more weight to
‘daughters’ on a pragmatically evoked scale. In a boy-preference society, B’s answer
‘I have daughtersCT’ may evoke a reversed scale of {daughter < son}.
Often a question is used indirectly to induce the hearer’s response on his/her
possible involvement in the event in question. For instance, consider ‘Who hit Mary?’
Here ‘someone hit Mary’ is derived as a presupposition via existential closure of the
interrogative (Karttunen 1977), such that λp∃x[p & p=hit(x, m)]. Next, a question,
“Did you and other people hit Mary?” is accommodated and ‘I_CT didn’t hit her’ is
naturally interpreted; here, ‘I’ has more weight than ‘other people’ (Lee 2003).
3. SCALAR MEANINGS
typical answer can be (17a) on a contextual scale of <coins, bills> (bills with greater
weight) (in this situation (17b) is infelicitous), but in a very special context, e.g.
getting on a bus, (17b) is possible, in an opposite scale <bills, coins> (coins with
greater weight).
My claim, then, is stronger than previous accounts in that scales are dually evoked in
my account, first by the semantic relations of atom – sum, member – set, subset –
superset, and subtype – type, and secondly by pragmatic ordering relations between
alternative parts, i.e. atoms, members, subsets, and subtype elements, of larger units
or wholes in the query, when individuals are discussed, as exemplified above
({coins < bills}, {daughter < son}). In other words, it is not a simple ordering of
money – coin, baby – daughter as values in a basic scale ordered by a relation
between type in the query and subtype in the reply. When the query is by sum and
the reply is by subset or atom, the reply is not enough and generates the implicature
of ‘not sum,’ but the reply has affirmed the subset or atom already and it leads to ‘not
the rest or its relevant part’ even conversationally without fall-rise. This kind of
relation has been well explored by Ward and Hirschberg (1985), although they
characterised fall-rise as implicating “uncertainty,” which is general and somewhat
vague but was called “conventional implicature.” They defined scale by poset
(partially ordered set) and included in it hierarchical and linear orderings such as
spatial or temporal orderings, stages of a process, and relationships of type/subtype
or part-whole, in addition to Ladd’s (1980) hierarchical sets ordered from root to
leaf. They illustrate an ‘is a part of’ relation by dissertation - first chapter - first half.
They also provide a symmetric relation ‘cousin of,’ which creates oddness in fall-rise. In
my account, one conjunct cannot be denied, with the other being affirmed, in ‘I am John’s
cousin and he is mine.’ Consider their example:
My further claim is that the lower line sister alternatives in hierarchies may typically
form scales in CT. A typical CT with an appropriate contour evokes a scalar
implicature conventionally by default but a list alternatives reading may be forced
by certain nominals in certain contexts. Consider further examples by them:
Van Rooy does not distinguish between a semantic scale arising from the hierarchy
of the sum of Beatles’ autographs (this must be posited in the assumed query
preceding (23Q)) and the individual Beatles’ autographs and a pragmatic scale
arising from different weights among different alternative Beatles. He addresses the
latter type of scale. Without any CT contour on (23A), it may have an exhaustive
interpretation with “standard” partition and list reading, evoking no particular scale
among alternative Beatles. Herburger (2000) also indicates that “When a fall contour
Then, its conventional implicature is polarity-reversed, i.e. affirmative, with the value
of weight not higher than the given value but lower than it. Therefore, the
implicature in the given context turns out to be “But I have other Beatles’ (weaker
than John Lennon in the scale of prestige) autographs.” Often the context is more limited
than this, e.g. the speaker knows whether the hearer has Lennon’s and McCartney’s,
and he/she knows that the hearer knows of the speaker’s knowledge of the fact, and asks,
“Do you have Harrison’s autograph?” The reply is “I don’t have Harrison’sCT.” Then
the relevant value element is the lower one: Harrison’s, generating the implicature of
“I don’t have Starr’s.” This is the opposite of what happened in (23), where an
affirmative CT reply is uttered.
Now a generalization follows: if a sentence with a CT is uttered (as a reply),
then contrastively (“but”) a polarity-reversed proposition is conveyed, with an alternative
value greater than the CT denotation in the pragmatic scale if the reply is positive, and
less than it if the reply is negative.
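The generalization can be sketched as a small procedure. This is a minimal illustration, not the chapter's formalism: it assumes the pragmatic scale is given as a list ordered from the weakest to the strongest value, treats every stronger (or weaker) alternative as a candidate for the implicature, and uses the hypothetical function name ct_implicature. The two scales are the ones the chapter itself uses for predicates, {arrive < go on the stage} and {pass < ace the exam}.

```python
# Illustrative sketch of the CT generalization. Assumptions (not from the
# source): a scale is a list ordered weakest -> strongest, and all
# alternatives on the relevant side of the CT value are returned.

def ct_implicature(scale, ct_item, positive):
    """Return (alternative, polarity) pairs conveyed by a CT utterance.

    positive=True  : 'CT(p)' is affirmed, so stronger alternatives are denied.
    positive=False : 'CT(not-q)' is uttered, so weaker alternatives are affirmed.
    The polarity of each returned pair is the reverse of the reply's polarity.
    """
    idx = scale.index(ct_item)
    alternatives = scale[idx + 1:] if positive else scale[:idx]
    return [(alt, not positive) for alt in alternatives]

# 'She ARRIVED(CT)' conveys 'but she did not go on the stage'.
print(ct_implicature(["arrive", "go on the stage"], "arrive", positive=True))

# 'She did not ACE(CT) the exam' conveys 'but she passed'.
print(ct_implicature(["pass", "ace the exam"], "ace the exam", positive=False))
```

A CT on the top value of a scale with a positive reply yields an empty list in this sketch, which corresponds to there being no stronger alternative left to deny.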
Next, let us turn to what kinds of categories can be marked CT. In Korean (and
presumably crosslinguistically), basically most categories may be marked CT
including adverbs. In Korean, however, prenominal quantifying Determiners such as
motun ‘all’ cannot be marked CT, unlike in English. Instead, their adverbial forms
(motu, ta ‘all’) can. In (25), an adverb cal ‘well’ has been marked CT and a very
high tone far over 200 Hz is noticed in Fig. 5. (25) is negative and an affirmative
proposition with a weaker value than ‘well’ in the scale is implicated, such as ‘but I
know a little bit.’ This is sharply distinguished from an utterance without CT-
marking: cal molla ‘I don’t know it well,’ ‘I am not quite sure,’ which can be used
when the speaker knows (almost) nothing about it. Chierchia (2002) discusses a
similar, interesting point but does not have the idea of CT at all when it is required.
Observe:
Figure 5. Adverb CT
Figure 6. Object CT
Nominals with the Possessive marker –uy following cannot take the CT marker
either after the nominals or after –uy. Only predicatively used categories can take
CT (introducing the Nominalizer –ki in the prenominal modifier position, e.g.
yeyppu-ki-nun ha-n sonye ‘A prettyCT girl’). A postpositional phrase of DP + P takes
the CT marker after P but not after DP: Ku ai-nun [cip ‘house’-eyse ‘at’-nun] nul
wu-n-ta ‘That child always cries at home.’ Contrastive Predicate Topic will be discussed
shortly. Hedberg’s (2003) example He hasn’t (H*) done anything (L+H*)
extraordinary (L+H* LH%) [4/27/01] shows a modifier CT in a negative sentence
and evokes an affirmative implicature with a lower value, such as ‘he may have done
something ordinary.’ Its counterpart in Korean gets CT-marking with –nun on
the nominal kes ‘thing,’ but the CT-marking is associated with the modifier
thekpyeha-n, which triggers its alternatives. This is a CT, and it seems that she departed
from assigning a “Contrastive Focus” to this fall-rise case (Hedberg et al. in this
volume).
Let us further consider what types of sentences license CT in general. A simple
declarative sentence is a typical type and an interrogative sentence in Korean is
another. I demonstrated elsewhere (Lee 2002, etc) that in most languages CT is
licensed in relative and subordinate clauses, though restrictively crosslinguistically,
but that occurrence of non-contrastive Topic is impossible in Korean because the
relative clause head nominal comes through Topic in the relative clause during
relativization (Lee 1973) (and in Japanese as well). Complement clauses license CT
in them easily crosslinguistically, as in (27b).
(27) a. John knows a song that MARYCT sings well. (from Subject)
b. John knows that MARYCT sings the song well.
In Korean, a whole complement clause can take CT before a main clause attitude or
communication verb and it can be focally associated with either the predicate
The contrastively implicated proposition may be ‘But Yumi thinks that he’s got a
point.’
Crosslinguistically, in English, German, and Korean, the pitch accent for
(information) Focus, H*(L), is distinct from the one for CT, roughly (L(+))H*(-),
whereas in Finnish and Norwegian, Focus and CT are not so distinct prosodically
(Vallduví and Vilkuna (1998:89), Fretheim (1992), Gundel (2002)).
Ladd (1980) and Jackendoff (1972) claim that fall-rise forces a narrow-scope
reading in (31) and (32) also in English.
Suppose (31) is interpreted as ∀¬; then all is exhaustive and ¬ go, and there is no
continuation to a contrasted proposition with weaker affirmation (see (30) above),
‘but some men went,’ etc. The same applies to (32). Therefore, there is no scope
ambiguity in (31) and (32). Consider, however, the ‘ambiguity’ between the narrow-
scope CT and wide-scope CT reading in (32) in English advocated by Buring (1999) and
Kadmon (2001).
(35) The Government did not fire two thirdsCT of the doctors. (With contrastive
fall-rise contour on ‘two thirds’)
(36) Two thirdsCT of the doctors the Government did not fire. (With contrastive
fall-rise contour on ‘two thirds’)
In this position, both a partition reading with topicalization effect (with constituent
negation possibilities as in Korean) and a scalar non- partition reading seem to be
available.
We can now see that fall-rise (in CT) in fact forces a narrow scope reading,
which is scalar, both in Korean and in English. A non-scalar partition reading is a
consequence of the topicalization effect.
When CT follows a scope-bearing element such as a quantified, focal expression,
it shows narrow scope with respect to the scope-bearing element. Observe:
The CT expression has narrow scope with respect to the preceding universal
quantifier in (37) with the meaning of ‘at least three but not more than three apples.’
It has the same effect of having a distributive marker –ssik ‘each’ attached to the
numeral classifier (sey kay-ssik-un). When the CT phrase is scrambled to the initial
position of the sentence, it still predominantly keeps narrow scope but opens the
possibility of wide scope rather marginally. Even when it comes to have wide scope
reading, ‘three apples as a whole’ is contrasted with other alternatives. Consider:
(39) evokes a scale of {arrive < go on the stage} in context and (40) readily evokes
{pass < ace the exam}. Interestingly, the former scale is not semantic but pragmatic;
in other words, the larger value ‘go on the stage’ does not entail the lower one. But
if we consider a specific context in which ‘go on the stage’ requires ‘arrive’ as a
precondition, the former entails the latter in that context and we can call it a
pragmatic entailment. The latter scale is semantic; ‘ace the exam’ entails ‘pass the
exam.’ (Conventional) scalar implicatures are evoked by both pragmatic and
semantic entailments. On the predicate part we can have a CT such as “All the
abstracts DID get accepted” ~> ‘but there may be withdrawals.’ Rooth’s (1996) simple
alternatives by F-marking cannot explain why fall-rise requires the relevant type of
scalar implicatures. See Lee (2000) for further examples of scalar Contrastive
Predicate Topic.
Then, a big question arises: Is a single CT sentence without Focus [Topic + CT]
possible, as in (39) and (40)? On the surface at least, it is a fact (Steedman 2000 agrees
on this, while some others claim there must be a Focus on the surface). If we consider,
however, why we would talk without giving new information by focusing something, we
may want to ponder possible explanations: (1) There is a silent Focus in the
scalar implicature part. This phenomenon is not independent; identification focus is
silent with a rising Topic marker (-nun (Korean), wa (Japanese), shi (Chinese)) in a
question such as ney irum-un? or “Your name?”; (2) The yes/no (or verum)
question demands an answer with respect to whether or not, i.e. arrived or not,
passed or not. So, it may include a (Contrastive) Focus (Lee 2003). A partial
affirmative answer to this yes/no question is the concessively admitted CT sentence;
(3) CT itself is partially focal and we may assume that the implicature part is also
partially focal. Thus, the totality may be fully focal; (4) There is nothing beyond the
surface form [Topic + CT]. Explanations (1) and (3) above consider the implicature part and
are preferable to (2) and (4).
Focus is even neurologically real: Some ERP experiment results (Yuki 2004)
show striking brain responses to the lack of expected intonational prominence (A2)
in Figure 7 for focused words in Japanese. For the Subject wh-Q “Who lost the
key?” (Da’re-ga kagi’-o nakushita’-no?), A1 is Match: MA’SAYA-ga kag’i-o
nakushita’-N-da-yo and A2 is Mismatch: Ma’saya-ga KAGI’-o nakushita’-N-da-yo.
The Subject that lacks the expected intonational prominence (A2) is more positive
in the waveform than the properly prominent subject (A1). Observe:
Figure 7. ERP waveforms for Subject-focus WH-Q-answer pairs (A1 vs. A2)
7. CONCLUDING REMARKS
Contrastive Topic is preceded by a question that includes a sum as a potential Topic
or a conjunctive question (or even if it is a disjunctive question, inclusive reading
must be possible). On the other hand, Contrastive Focus, which has not been treated
here, is preceded by an alternative disjunctive question which expects a choice of a
single answer (see Lee 2003). A typical CT, which necessarily evokes a
conventional implicature, must be distinguished from a type of list contrastive topics.
Not only type-subtype scalarity (based on poset) but also subtype scalarity must
be incorporated in any model of Contrastive Topic, although some entities in some
contexts are allowed to receive list reading.
Contrastive Topic basically behaves as a narrow-scope-bearer in interaction with
other scope bearers including a REASON clause. A Contrastive Predicate Topic
analysis is proposed for the wide-scope negation reading of the scope ambiguous
sentences.
Predicates are necessarily subtype-scalar when CT-marked and numerals and
quantifiers, which are semantically ordered, have the same nature when CT-marked.
We cannot miss the real intent of using a CT: it is to convey a conventionally
implicated proposition. If ‘CT(p)’ is given, then contrastively (‘but’) ‘not q’ (q: a
contextually higher stronger predicate) is conveyed and if ‘CT(not-q)’ is given, then
contrastively ‘p’ (contextually a lower weaker predicate) is conveyed (Lee 2002).
The rhetorical force of CT is placing more weight on the unuttered implicature
proposition. The CT utterance is a concessive admission and its concessivity can be
shown by the near-paraphrase relation of (39) to (41):
(41) Even though/Even if/Although she ARRIVED, she didn’t go on the stage.
Although ‘even if’ is possible, it is not like a normal conditional, in that it does not license
contraposition. The truth of the consequent is urged, whatever the antecedent may
turn out to be in truth. The implicature of (39), i.e. the consequent of (41), is thus
forceful in rhetorical structure.
Steedman (2000) incorporates a CT tone (L+H*) in the specification of ‘married’
in the lexicon (from Anna MARRIED (L+H* LH%) MANNY (H*LL%)) but claims
that its implicature is “conversational” (this volume). But he emphasizes that
“kontrast, thematicity, and hearer responsibility are all elements of literal meaning,
and hence in your terms conventional implicature” (p.c.). Scalar implicatures
generated by CT marking, though their higher values are determined by context, are
not cancelable and are conventional. It may be better to treat the intonational device as
conventionally tied to its meaning. Further scrutiny of information structure should
show the relation between intonation and meaning more closely.
8. NOTES
1
I would like to express my gratitude to Klaus von Heusinger, Mark Steedman and Julia Hirschberg and
other audiences of the Workshop on Topic and Focus: Meaning and Intonation at the 2001 LSA
Linguistic Institute (UCSB) for their questions and encouragement. I am also grateful to my co-editors
Matt Gordon and Daniel Buring for their patience in organizing the workshop and leading it to this
volume eventually. For part of this research Sun-Ah Jun’s comments on intonation, Hyunkyung Hwang’s
assistance on pitch tracks from subjects, KRF grants and the SNU leave of absence for my staying at
UCLA were all helpful.
2
Mira Oh, in her recent experiments (in preparation), ‘Phonetic Realizations of Focus and Topic in
Korean’, observes that the Cheonnam dialect shows an IntP boundary in contrast with the Seoul dialect.
3
Steedman’s (2000) example (1) can be given a similar scalar interpretation. A theatrical musical
performance is assumed in the previous query and under it a pragmatic scale <musical, opera> can be set
up.
(1) Q: Does Marcel love opera?
A: Marcel likes MUSICALS.
L+H* LH%
Therefore, if opera and musicals are substituted by each other, the answer Marcel likes OPERACT would
not be appropriate on the scalar reading. On a non-scalar reading, the implicature may be open to a list
alternatives reading and even roundabout affirmation.
9. REFERENCES
Bach, K. “The Myth of Conventional Implicature.” Linguistics and Philosophy 22.4 (1999): 327-366.
Brentano, Franz. Psychology from an Empirical Point of View, trans. A. C. Rancurello et al. London:
Routledge and Kegan Paul, 1973.
Buring, Daniel. “Topic.” In P. Bosch and R. van der Sandt (eds.), Focus and Natural Language
Processing 2, pp. 271-280. Cambridge: MIT Press, 1994.
Buring, Daniel. “On D-trees, Beans and B-accents.” Linguistics and Philosophy 26 (2003): 511-545.
Carlson, Lauri. Dialogue Games: An Approach to Discourse Analysis, Reidel: Dordrecht, 1983.
Chierchia, Gennaro. “Scalar Phenomena and Polarity.” Manuscript, 2002.
Choi, Hye-won. Optimizing Structure in Context: Scrambling and Information Structure. Stanford: CSLI,
1999.
Diesing, Molly. Indefinites [Linguistic Inquiry Monograph 20]. Cambridge: MIT Press, 1992.
von Fintel, Kai. Restrictions on Quantifier Domains. University of Massachusetts, Amherst: Doctoral
dissertation, 1994.
Fery, Caroline. German Intonational Patterns. Tuebingen: Niemeyer, 1993.
Groenendijk, Jeroen and Martin Stokhof. Studies on the Semantics of Questions and the Pragmatics of
Answers. University of Amsterdam: Doctoral dissertation, 1984.
Hamblin, C. L. Fallacies. Bungay, Suffolk: Methuen, 1970.
Hedberg, Nancy and J. M. Sosa. “The Prosodic Structure of Topic and Focus in Spontaneous English
Dialogue.” This volume.
Hedberg, Nancy. “The Prosody of Contrastive Topic and Focus in Spoken English.” Talk presented at the
Workshop on Information Structure in Context, University of Stuttgart, 2002.
Hetland, Jorunn. “Contrast, the fall-rise accent, and Information Focus.” In Structures of Focus and
Grammatical Relations, pp. 1-39. Tubingen: Niemeyer Linguistische Arbeiten, 2003.
Horn, L. A Natural History of Negation. Chicago: Chicago University Press, 1989.
Ito, Kiwako and Susan M. Garnsey. “Brain Responses to Focus-Related Prosodic Mismatch in Japanese.”
at SP2004, Tokyo.
Jackendoff, R. Semantic Interpretation in Generative Grammar, Cambridge: MIT Press, 1972.
Jun, Sun-Ah. The Phonetics and Phonology of Korean Prosody. The Ohio State University: Doctoral
dissertation, 1993. [Published by Garland, 1996].
Kennedy, Chris. Projecting the Adjective: The Syntax and Semantics of Gradability and Comparison. UC
Santa Cruz: Doctoral dissertation, 1997.
Krifka, Manfred. “At Least Some Determiners aren’t Determiners.” In K. Turner (ed), The
Semantics/Pragmatics Interface from Different Points of View 1, pp. 257-91. London: Elsevier, 1999.
Krifka, Manfred. “The Semantics of Questions and Focusation of Answers.” This volume.
Ladd, D. R. The Structure of Intonational Meaning, Indiana University Press, 1980.
Ladusaw, William. “Thetic and categorical, stage and individual, weak and strong.” In Negation and
Polarity, L. Horn and Yasuhiro Kato (eds.), Oxford: Oxford University Press, 2000.
Lee, Chungmin. Abstract Syntax and Korean with Reference to English. Seoul: Thaehaksa, 1973.
Lee, Chungmin. “(In-)definites, Case Markers, Classifiers and Quantifiers in Korean.” In S. Kuno et al
(eds.), Harvard Studies in Korean Linguistics. Department of Linguistics, Harvard University, 1989.
Lee, Chungmin. “Definite/Specific and Case Marking in Korean.” In Y.-R. Kim (ed.), Theoretical Issues
in Korean Linguistics, CSLI, Stanford University, 1994.
Lee, Chungmin. “Generic Sentences are Topic Constructions.” In T. Fretheim and G. Gundel (eds.),
Reference and Referent Accessibility. Amsterdam/Philadelphia: John Benjamins, 1996.
Lee, Chungmin. “Contrastive topic: A locus of the interface.” In K. Turner (ed.), The
Semantics/Pragmatics Interface from Different Points of View 1, pp. 317-41. London: Elsevier, 1999.
Lee, Chungmin. “Types of NPIs and nonveridicality in Korean and other languages.” In G. Storto (ed.),
UCLA Working Papers in Linguistics 3: Syntax at Sunset 2, pp. 96-132. Department of Linguistics,
UCLA, 1999.
Lee, Chungmin. “Contrastive predicates and scales.” CLS 36 (2000): 243-257.
Lee, Chungmin. “Contrastive Topic and/or Contrastive Focus.” Japanese/Korean Linguistics, 2003.
Lee, Chungmin. “Contrastive Topic/Focus and Polarity in Discourse.” In K. von Heusinger and K. Turner
(eds.), Where Semantics Meets Pragmatics CRiSPI 16, pp. 381-420. London: Elsevier.
Marty, Anton. Gesammelte Schriften, II. Halle: Max Niemeyer, 1918.
Molnar, Valeria. “Topic in Focus: the Syntax, Phonology, Semantics, and Pragmatics of the So-called
‘Contrastive Topic’ in Hungarian and German.” Acta Linguistica Hungarica 45 (1998): 389-466.
Nakanishi, Kimiko. “Prosody and Scope Interpretations of the Topic Marker WA in Japanese.” This
volume.
Neale, S. “Coloring and composition.” In K. Murasugi and R. Stainton (eds.), Philosophy and
Linguistics. Westview Press, 1999.
O’Connor J. D. and G. F. Arnold (eds.). Intonation of Colloquial English 2nd edition. London: Longmans,
1973.
Pierrehumbert, J. and J. Hirschberg. “The meaning of intonational contours in the interpretation of
discourse.” In Cohen, J. Morgan, and M. Pollack (eds.), Intentions in Communication, pp. 271-311.
Cambridge: MIT Press, 1990.
Rooth, M. “Focus.” In S. Lappin (ed.), The Handbook of Contemporary Semantic Theory, London:
Blackwell, 1996.
Roberts, C. “Information Structure in Discourse: Towards an Integrated Formal Theory of Pragmatics.”
Manuscript. The Ohio State University, 1996.
Van Rooy, Robert. “Questions and Relevance.” NASSLLI 4 handout, 2004.
Steedman, Mark. The Syntactic Process, Cambridge: MIT Press, 2000.
Steedman, Mark. “Information Structural Semantics of English Intonation.” This volume.
Ward, Gregory and Julia Hirschberg. “Implicating Uncertainty: The Pragmatics of Fall-Rise Intonation.”
Language 61 (1985): 747-776.
Wee, Hae-Kyung. “Semantics and pragmatics of Contrastive Topic in Korean and English.” Manuscript.
Indiana University, 1997.
KIMIKO NAKANISHI
C. Lee et al. Topic and Focus: Cross-linguistic Perspectives on Meaning and Intonation, 177–193.
© 2007 Springer.
different contours. Section 3 shows that the two functions of wa which are realized
by different intonational patterns are tied to different scope interpretations. This
correlation between the pragmatic functions of wa and their scope interpretations
can be formalized by applying Büring’s (1997) Alternative Semantics Approach. In
the last section, accepting a widely taken view that scope is expressed in syntax at
LF, I claim that the thematic wa and the contrastive wa must be syntactically
different at least at LF.
(2) ʼn ňʼn ň
a. kaki-ga b. kaki-ga c. kaki-ga
oyster-NOM fence-NOM persimmon-NOM
(3)
(4)
wa
However, Poser did not distinguish between thematic wa and contrastive wa.6 The
question is whether the prosodic patterns of thematic wa and of contrastive wa are
the same, which I explore in the next subsection.
(5) a. Thematic Wa
ʼn ň ʼn
Naoya-wa nonbiri-si-teiru.8
Naoya-TOP relax-do-PROG
‘Naoya is relaxing.’
b. Contrastive Wa
ʼn ň ʼn ň ʼn ň ʼn
Naoya-wa nonbiri-si-teiru ga Maria-wa nonbiri-si-tei-nai.
Naoya-TOP relax-do-PROG but Maria-TOP relax-do-PROG-NEG
‘Naoya is relaxing, but Maria is not relaxing.’
(6)
The distribution of the thematic and the contrastive cases for the five participants
is given in Figure 2. The X-axis indicates the value of P1 (Hz), and the Y-axis
indicates the value of P2 (Hz). As can be seen in the figure, the thematic cases
cluster around or above the P1 = P2 line; that is, P1 and P2 are roughly equal
or P1 is lower than P2. The contrastive cases, on the other hand, fall mostly
below the P1 = P2 line, indicating that P1 is higher than P2.
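The comparison behind Figure 2 can be sketched in a few lines of Python. This is only an illustration, not the procedure used in the experiment: the function name classify_wa, the 10 Hz margin for "roughly equal", and the sample F0 values are all assumptions introduced here.

```python
def classify_wa(p1_hz: float, p2_hz: float, tolerance_hz: float = 10.0) -> str:
    """Label a token by the relation between the F0 values P1 and P2.

    tolerance_hz is a hypothetical margin for "roughly equal"; it is not
    a value reported in the study.
    """
    # Points on or above the P1 = P2 line (within the margin) pattern with
    # the thematic cases; points clearly below it pattern with the
    # contrastive cases.
    if p2_hz >= p1_hz - tolerance_hz:
        return "thematic-like"
    return "contrastive-like"


# Made-up (P1, P2) pairs in Hz, for illustration only.
tokens = [(180.0, 185.0), (200.0, 195.0), (220.0, 150.0)]
labels = [classify_wa(p1, p2) for p1, p2 in tokens]
print(labels)
```

A fixed margin is the simplest possible choice; a real analysis would more plausibly normalise by speaker range before comparing the two values.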
PROSODY AND SCOPE OF THE TOPIC MARKER WA IN JAPANESE 181
2.3. Summary
In sum, the difference between thematic wa and contrastive wa is reflected in
different F0 patterns. That is, intonation can distinguish two pragmatic functions of
wa. Thus, intonational patterns are used in a significant way to convey different
pragmatic information.
P1 wa P2 P1 wa P2
First, I read aloud the sentence in (7) using the two prosodic patterns in (8) and
tape-recorded it. Actual F0 contours are shown in Figure 3.
[Figure 3: F0 contours of the two recordings (1.37 s and 1.33 s long); horizontal axis: Time (s); vertical scale: 50–300; P1 and P2 are marked on each contour.]
Minna wa ne-nakat-ta
everyone TOP sleep-NEG-PAST
Second, four Japanese informants (2 males, 2 females) were asked to listen to the
recordings, and further asked whether there is any correspondence to the two scope
interpretations. They all agreed that the prosodic pattern of thematic wa corresponds
to the ∀>¬ reading, whereas the pattern of contrastive wa corresponds to the ¬>∀
reading.
Büring (1997) assumes that each sentence S derives three different semantic
objects, that is, the ordinary semantic value [[S]]o, the Focus value [[S]]f, and the Topic
value [[S]]t. The first two values are defined by Rooth (1985): According to Rooth,
the ordinary value is a proposition and the Focus value is a set of propositions. What
is new in Büring’s approach is that a Topic as well as a Focus evokes alternatives. In particular,
the Topic value is a set of sets of propositions. He claims that the Topic accent
marks a deviation from the original Discourse Topic: an element marked with the
Topic accent is interpreted as a sentence internal topic such as a contrastive topic.
Let us examine an actual example in (11), which includes a contrastive topic. In (11),
with the Topic accent, a topic is interpreted as contrastive, and thus evokes
alternatives. Following Büring, I represent Topic and Focus marking by using
subscripted brackets, [ ]T and [ ]F, respectively.
c. [[(11A)]]t = {{I would buy War and Peace, I would buy The
Hotel New Hampshire, I would buy Harry Potter,
…},
{John would buy War and Peace, John would buy
The Hotel New Hampshire, John would buy Harry
Potter, …},
{Tom would buy War and Peace, Tom would buy
The Hotel New Hampshire, Tom would buy Harry
Potter, …}, … }
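The way a Topic value is built up from Focus values can be illustrated with a small Python sketch. It is a simplification under stated assumptions: propositions are represented as plain strings, the function names focus_value and topic_value are hypothetical, and the alternative sets are cut off at the three buyers and three books that appear in the Topic value above.

```python
def focus_value(subject, focus_alternatives):
    """Set of propositions obtained by varying the Focus only."""
    return frozenset(f"{subject} would buy {book}" for book in focus_alternatives)


def topic_value(topic_alternatives, focus_alternatives):
    """Set of sets of propositions: one Focus value per Topic alternative."""
    return frozenset(
        focus_value(subject, focus_alternatives) for subject in topic_alternatives
    )


# Alternatives taken from the Topic value above; the ellipses there indicate
# that both lists are open-ended, so truncating them is an assumption.
buyers = ["I", "John", "Tom"]
books = ["War and Peace", "The Hotel New Hampshire", "Harry Potter"]

t_value = topic_value(buyers, books)
for question in t_value:
    print(sorted(question))
```

Each inner set corresponds to one question of the form "What would x buy?", which is why the Topic value can be read as a set of questions.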
Büring further introduces the notion of Residual Topic, which is a set of disputable
propositions induced by the Topic. The definitions are given in (12) and (13).
Büring assumes that the sentences are structurally ambiguous by LF at the latest.
The different intonational contour leads to certain implicatures that differ for both
LF representations. The unavailable reading is ruled out because its LF
representation does not yield reasonable implicatures. Thus, the LF for the ∀>¬
reading in (14) does not have reasonable implicatures, whereas the LF for the ¬>∀
reading does. His analysis for these two readings is summarized below.
First, let us discuss the ¬>∀ reading. As shown in (15), there are Residual
Topics: if not all politicians are corrupt, are there corrupt politicians at all? If so,
(16) a. [[(14)]]o = all politicians are such that they are not corrupt
b. [[(14)]]f = {all politicians are such that they are not corrupt, all
politicians are such that they are corrupt}
c. [[(14)]]t = {{all politicians are such that they are not corrupt, all
politicians are such that they are corrupt},
{most politicians are such that they are not corrupt,
most politicians are such that they are corrupt},
{some politicians are such that they are not corrupt,
some politicians are such that they are corrupt},
{no politicians are such that they are not corrupt, no
politicians are such that they are corrupt}}
marks topic and focus. Büring’s approach captures a direct relation between
pragmatics, which is expressed by a certain prosody, and semantics. I apply this
approach to the Japanese data: Rather than examining the relation between prosody
and semantics, I examine the relation between pragmatics and semantics. That is, in
the relevant Japanese data, the thematic wa corresponds to the ∀>¬ reading,
whereas the contrastive wa corresponds to the ¬>∀ reading. Indeed, this approach
can account for the Japanese data.
First, I consider the correspondence between the thematic wa and the ∀>¬
reading, which is shown in (17).
(17) Thematic wa
Minna-wa ne-nakat-ta.
everyone-TOP sleep-NEG-PAST
‘Everyone didn’t sleep.’
√∀>¬, *¬>∀
Kuno (1973) claims that, when the topic marker wa is interpreted as a theme, the
element to which wa attaches must be either anaphoric or generic. If an element is
‘anaphoric’, it should have an antecedent in a previous context. In this sense, an
anaphoric element is definite. A ‘generic’ element does not have an antecedent, and
it denotes something that holds regardless of time or place of the utterance. Minna
‘everyone’ in (17) cannot be generic, because it is a subject of an eventive predicate,
which does not hold for a general time or place. Thus, minna ‘everyone’ in (17)
must be anaphoric. It is independently known that anaphoric definite elements do
not enter into a scopal relation with other scope-bearing elements in a sentence
(Fodor and Sag 1982, for example). In other words, anaphoric definite elements are
said to take the widest scope reading not because they take scope over other
elements by syntactic mechanisms such as quantifier raising (May 1985), but
because they are scopeless. For this reason, in (17), the universal quantifier with the
thematic wa has a wide scope interpretation only. In this way, the scope
interpretation of the sentence can be accounted for by its pragmatic information.
Let us move on to the contrastive wa. The correspondence between the
contrastive wa and the ¬>∀ reading can be straightforwardly captured by applying
Büring’s (1997) framework. Following Büring, I assume that the contrastive wa
always evokes alternatives. The question is where a Focus falls in sentences with
contrastive wa. Consider a possible context for a sentence with contrastive wa given
in (18). Note that (18b) is uttered using the intonational pattern for contrastive wa,
where P1 is much higher than P2.
(20) Contrastive wa
[Minna-wa]T ne-[nakat]F-ta.
everyone-TOP sleep-NEG-PAST
‘Everyone didn’t sleep.’
*∀>¬, √¬>∀
We can see that the example in (20) and the German rise-fall example discussed in
(14) above have the same Topic-Focus assignments. It follows that Büring’s
analysis for German should apply to the Japanese example. The ¬>∀ reading in (20)
is available because there are Residual Topics: if not everyone slept, is there anyone
who slept at all? If so, how many?
The ∀>¬ reading in (20) is, on the other hand, unavailable because there is no
Residual Topic: if all people are such that they didn’t sleep, then, it is true that most
people are such that they didn’t sleep, and it is also true that some people are such
that they didn’t sleep. Other elements of the sets express a contradiction.
(22) a. [[(20)]]o = all people are such that they didn’t sleep
b. [[(20)]]f = {all people are such that they didn’t sleep,
all people are such that they slept}
c. [[(20)]]t = {{all people are such that they didn’t sleep, all people
are such that they slept},
{most people are such that they didn’t sleep, most
people are such that they slept},
{some people are such that they didn’t sleep, some
people are such that they slept},
{no people are such that they didn’t sleep, no people are
such that they slept}}
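The contrast between the two readings can be checked mechanically in a toy model. The sketch below is an illustration only. It assumes a domain of four people (N = 4), treats a world as the number of people who slept, starts from an empty common ground, and approximates "disputable" as "not yet settled by the common ground after the assertion"; the formal definitions of Residual Topic are not reproduced here, and all function names are hypothetical.

```python
N = 4  # hypothetical number of people; a world is the number who slept (0..N)
WORLDS = frozenset(range(N + 1))

QUANTIFIERS = {
    "all": lambda k: k == N,
    "most": lambda k: k > N / 2,
    "some": lambda k: k >= 1,
    "no": lambda k: k == 0,
}


def prop(test):
    """A proposition is the set of worlds in which it is true."""
    return frozenset(w for w in WORLDS if test(w))


def topic_value(reading):
    """One {negative, positive} pair of propositions per quantifier."""
    sets = []
    for q in QUANTIFIERS.values():
        positive = prop(q)  # Q people slept
        if reading == "all>not":
            # Q people are such that they didn't sleep
            negative = prop(lambda slept, q=q: q(N - slept))
        else:
            # "not>all": it is not the case that Q people slept
            negative = WORLDS - positive
        sets.append(frozenset({negative, positive}))
    return sets


def ordinary_value(reading):
    all_slept = prop(QUANTIFIERS["all"])
    nobody_slept = prop(lambda slept: QUANTIFIERS["all"](N - slept))
    return nobody_slept if reading == "all>not" else WORLDS - all_slept


def disputable(question, common_ground):
    """True if some proposition is neither entailed nor excluded."""
    return any(
        bool(common_ground & p) and bool(common_ground - p) for p in question
    )


def has_residual_topic(reading):
    common_ground = ordinary_value(reading)  # context after the assertion
    return any(disputable(q, common_ground) for q in topic_value(reading))


print(has_residual_topic("not>all"), has_residual_topic("all>not"))
```

Under these assumptions the ¬>∀ reading leaves at least one question open (for example, whether some people slept), whereas the ∀>¬ reading settles every proposition in the Topic value, which mirrors the reasoning in the text.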
3.3. Summary
In Japanese, sentences with negation and the topic marker wa are subject to scope
ambiguity. I first showed that the two different prosodic patterns correspond to
different scope readings. In other words, the two pragmatic functions of wa
expressed by different prosodic patterns correspond to different scope interpretations.
I examined a correspondence between pragmatic functions of wa and scope
interpretations based on Büring (1997). The thematic wa corresponds to one reading
and the contrastive wa to the other. In other words, the scope ambiguity arises
because wa has two pragmatic functions.
4. DISCUSSION
In this paper, I presented two sets of empirical data: First, the two pragmatic
functions of the topic marker, that is, theme and contrast, are realized by different F0
patterns. Second, these two prosodic patterns correspond to two different scope
interpretations. The relevant findings are summarized in Table 1 below.
Different prosodic patterns are used to make pragmatic distinctions between theme
and contrast. Those pragmatic distinctions, which are realized by distinct prosodic
patterns, are correlated with different scope readings. This correlation between
pragmatics and semantics is not arbitrary. As formalized in section 3, the correlation
between pragmatic functions of wa and scope readings can be captured by Büring’s
Alternative Semantics Approach, which uses a direct relation between pragmatics
and semantics. In this way, three properties of the topic marker, i.e., prosodic
patterns, pragmatic functions, and scope readings, are coherently related to each
other.
Finally, I would like to briefly address the question that many previous studies in
Japanese linguistics have discussed: Should the thematic wa and the contrastive wa
be distinguished in syntax? Some previous studies claim that they need not be
distinguished in syntax (Mihara 1996, for example). For these studies, theme and
contrast might be merely different in pragmatic interpretation, not syntactically
different. Others claim that they should (Hoji 1985, Saito 1985, Tateishi 1994, for
example). Their claim is based on the argument that two kinds of wa are base-
generated in different positions in a syntactic structure. For example, Tateishi (1994)
shows that the thematic wa violates Subjacency, whereas the contrastive wa obeys it.
This is because the thematic and the contrastive wa are base-generated in different
positions. The current study does not say anything about where the two kinds of wa
are base-generated. However, it shows that they have different syntax at least at LF,
since they correspond to different scope readings, which are expressed by syntactic
structures at LF. I interpret this fact as a piece of evidence that the thematic and the
contrastive wa should be distinguished in the syntax.
University of Pennsylvania
5. NOTES
*
I would like to thank Mark Liberman, Bill Poser, Satoshi Tomioka, and Jennifer Venditti for valuable
discussions and their insights. I am also grateful to Daniel Büring, Elsi Kaiser, and Kazuaki Maeda.
Thanks are also due to the audience at Topic, Focus and Intonation Workshop (University of California,
Santa Barbara, July, 2001).
1
To be precise, the distinction between the thematic and the contrastive wa is valid only when the topic
marker is attached to the subject in canonical word order, which is S-O-V. When the topic marker is
attached to the object in canonical position, the object is exclusively interpreted as a contrastive element,
as shown in (i).
(i) John-ga ringo-wa tabe-ta.
John-NOM apple-TOP eat-PAST
‘John ate apples (but there was some food that he didn’t eat)’
For this reason, I only consider the examples where wa is attached to a canonical subject.
2
An earlier version of this section is presented in Nakanishi (2002).
3
See Haraguchi (1999) for a recent survey of the Japanese pitch accent system.
4
‘ň’ marks a low-high sequence of pitch, and ‘ʼn’ marks a high-low sequence of pitch.
5
For how to determine Major Phrases, see Selkirk and Tateishi (1991).
6
Finn (1984) claimed that thematic wa and contrastive wa were differentiated by pauses as well as
fundamental frequency (F0). Her claim is based on experimental studies in which she measured the peak
of F0 contours before wa and the valley F0 of wa, and also the pause between wa and the following word.
Her experimental methods, however, are problematic; unfortunately, I do not have space to discuss them
here.
7
Voiced segments exhibit smoother F0 contours than other consonants, without being disturbed much by
segmental effects.
8
The predicate nonbiri-site-iru can describe either the current state ‘be relaxing’ or the permanent state
‘be laid-back’. In other words, it can be either a stage-level or an individual-level predicate (Carlson
1977). To avoid possible prosodic effects of this ambiguity, the participants were informed that the
sentences used in the experiment mean ‘be relaxing’, not ‘be laid-back’.
9
For the contrastive case, a question arises as to whether the low value of P2 is a result of Downstep or a
reduction of range for another reason. The result of the experiment suggests that the drop of P2 is not due
to Downstep, since the difference between P1 and P2 is much larger than the case of Downstep. I thank
Jennifer Venditti for discussions of this issue.
10
The first and second arrows in F0 contours indicate P1 and P2, respectively.
11
The negative morpheme is just -na, as we can see in forms such as -na-i ‘-NEG-PRES’. The status of a
suffix after negation -kat (or arguably, and certainly historically, -kar) is admittedly a problem. Bill Poser
(p.c.) pointed out to me that, synchronically -kat has to be analyzed as obligatorily affixed to adjectives
when certain suffixes, such as -ta ‘-PAST’, are added. Following Poser’s suggestion, for the purpose of
this study, I assume that -nakat is a suppletive form of the negative required by suffixes like -ta.
12
The first and second arrows in F0 contours indicate P1 and P2, respectively.
13
Related to this is Noda’s (1996) claim that, when a sentence with the contrastive wa is conjoined with
another sentence, the predicates of these two sentences tend to express opposite states. For example, in
(i) below, the predicate didn’t sleep is most naturally conjoined with the opposite predicate slept.
(i) John-wa ne-nakat-ta-ga Mary-wa ne-ta.
John-TOP sleep-NEG-PAST-but Mary-TOP sleep-PAST
‘John didn’t sleep, but Mary slept.’
14
In Japanese, the negation is a morpheme attached to a verb. For this reason, it seems impossible for the
negation alone to be accented. Thus, I assume that, although the negation is a Focus, it does not have a
special prosodic pattern as in German, where the focused negation is realized with a falling accent.
6. REFERENCES
Bolinger, Dwight. Forms of English: Accent, Morpheme, Order. Cambridge: Harvard University Press,
1965.
Büring, Daniel. “The Great Scope Inversion Conspiracy.” Linguistics and Philosophy 20 (1997):
175−194.
Carlson, Gregory. Reference to Kinds in English. Ph.D. dissertation, University of Massachusetts,
Amherst, 1977. [New York: Garland, 1980].
Féry, Caroline. German Intonational Patterns. Tübingen: Niemeyer, 1993.
Finn, A.N. “Intonational accompaniments of Japanese morphemes wa and ga.” Language and Speech
27:1 (1984): 47−57.
Fodor, Janet, and Ivan Sag. “Referential and Quantificational Indefinites.” Linguistics and Philosophy 5
(1982): 355−398.
Halliday, M.A.K. “Notes on Transitivity and Theme in English, Part II.” Journal of Linguistics 3 (1967):
199−244.
Haraguchi, Shosuke. The Tone Pattern of Japanese: An Autosegmental Theory of Tonology. Tokyo:
Kaitakusha, 1977.
Haraguchi, Shosuke. “Accent.” In N. Tsujimura (ed.), An Introduction to Japanese Linguistics, pp. 1−61.
Cambridge: Blackwell, 1999.
Hirst, Daniel, and A. Di Cristo. Intonation Systems: A Survey of Twenty Languages. Cambridge:
Cambridge University Press, 1998.
Hoji, Hajime. Logical Form Constraints and Configurational Structures in Japanese. University of
Washington: Doctoral dissertation, 1985.
Jackendoff, Ray. Semantic Interpretation in Generative Grammar. Cambridge: MIT Press, 1972.
Kato, Yasuhiko. “Negation and the Discourse-Dependent Property of Relative Scope in Japanese.”
Sophia Linguistica (1988): 23−24.
Krifka, Manfred. “Scope Inversion under Rise-Fall Contour in German.” Linguistic Inquiry 29:1 (1998):
75−112.
Kubozono, Haruo. The Organization of Japanese Prosody. Tokyo: Kuroshio, 1993.
Kuno, Susumu. The Structure of the Japanese Language. Cambridge: MIT Press, 1973.
Ladd, D. Robert. Intonational Phonology. Cambridge: Cambridge University Press, 1996.
Lambrecht, Knud. Information Structure and Sentence Form: Topic, Focus and the Mental
Representations of Discourse Referents. Cambridge: Cambridge University Press, 1994.
May, Robert. Logical Form: Its Structure and Derivation. Cambridge: MIT Press, 1985.
McCawley, James. The Phonological Component of a Grammar of Japanese. The Hague: Mouton, 1968.
Mihara, Ken-ichi. Nihongo-no Toogo Koozoo [Syntactic Structures in Japanese]. Tokyo: Syohakusya,
1996.
Nakanishi, Kimiko. “Prosody and Information Structure in Japanese: a Case Study of Topic Marker wa.”
Japanese/Korean Linguistics 10 (2002): 434−447. Stanford: CSLI.
Noda, Hisashi. Wa to Ga [Wa and Ga]. Tokyo: Kuroshio Syuppan, 1996.
Pierrehumbert, Janet, and Mary Beckman. Japanese Tone Structure. Cambridge: MIT Press, 1988.
Poser, William. The Phonetics and Phonology of Tone and Intonation in Japanese. MIT: Doctoral
dissertation, 1984.
Rooth, Mats. Association with Focus. University of Massachusetts, Amherst: Doctoral dissertation, 1985.
Saito, Mamoru. Some Asymmetries in Japanese and their Theoretical Implications. MIT: Doctoral
dissertation, 1985.
Selkirk, Elisabeth and Koichi Tateishi. “Syntax and Downstep in Japanese.” In C. Georgopoulos and R.
Ishihara (eds.), Interdisciplinary Approaches to Language: Essays in Honor of S.-Y. Kuroda, pp.
519−543. Dordrecht: Kluwer, 1991.
Stalnaker, Robert. “Assertion.” In P. Cole (ed.), Syntax and Semantics 9: Pragmatics, pp. 315−332. New
York: Academic Press, 1978.
Steedman, Mark. “Information Structure and the Syntax-Phonology Interface.” Linguistic Inquiry 31:4
(2000): 649−689.
Tateishi, Koichi. The Syntax of ‘Subjects’. Stanford: CSLI, 1994.
HO-HSIEN PAN
Abstract. This study investigated how focus influences f0 contour and duration of Taiwanese lexical
tones. F0 and duration values were taken from pitch tracks and spectrograms generated from SVO
sentences with different focus conditions. The four focus conditions included a broad focus condition
with focus on the entire sentence, and three narrow focus conditions with narrow focus falling on the first,
second, and third words. Results of the duration data revealed that (1) the duration of narrow focused
syllables was longer than that of syllables in other focus conditions and (2) the duration of narrow focused syllables
varied as a function of their position within the phrase; penultimate focused syllables were longest.
Analysis of f0 minimum and maximum indicated that (1) f0 range of narrow focused syllables was
expanded and (2) together, mean f0 value and expansion of f0 range distinguish focus conditions.
Comparison between f0 and duration data showed that duration was more consistently used to distinguish
focus condition than f0 range and mean f0 value in Taiwanese.
1. INTRODUCTION
Focus, tone, and intonation are all manifested through fundamental frequency (f0)
contours and duration in Taiwanese. There is no one-to-one correspondence between
the surface acoustical realization and the deeper structure, nor do surface f0 contours
and duration directly reflect underlying features. To improve our understanding of
surface f0 and duration formation, the contribution of underlying global or local
factors to surface f0 and duration patterns must be investigated. The global factors
that contribute to f0 modulation can be divided into two categories, i.e. declination
and final lowering. The gradual decline of f0 over the course of an utterance is
called declination, while the f0 decline at the end of an utterance or phrase is called
final lowering (Liberman & Pierrehumbert, 1984; Pierrehumbert & Beckman, 1988;
Shih, 1988). Global effects also affect duration. For example, the duration of a
syllable varies according to the syllable’s position relative to a prosodic boundary.
Studies have shown that phrase-medial segments are shorter than those in phrase-initial
and phrase-final positions (Lindblom & Rapp, 1973). In addition to global
effects, f0 and duration are also affected by local factors such as tone and focus (Ho,
1976; Lin, 1988).
The contribution of tone to the duration of a tone-bearing unit has been observed
in languages such as Taiwanese. In Taiwanese, the rising tone is longer, and the
duration of checked syllables (CVC structures with final voiceless stops) is shorter
than the duration of unchecked syllables (CV or CVN structures) (Cheng, 1968,
1973; Lin, 1988). Focus also influences syllable duration. It was found that
narrow focus syllables are longer than broad focus syllables, which in
turn are longer than post-focus syllables (Jin 1996; Xu 1999).
C. Lee et al. Topic and Focus: Cross-linguistic Perspectives on Meaning and Intonation, 195–213.
© 2007 Springer.
Turning to f0, it was observed that local factors such as tone and focus both
affect the surface f0 pattern. There are unique intrinsic tonal targets that each
Taiwanese lexical tone possesses. These tonal targets determine the f0 height
(register), and f0 shape (contour) of tone bearing syllables. For example, a high level
tone has an intrinsic high and level f0 contour which coarticulates with surrounding
tones (Lin, 1988; Shih, 1988; Gandour et al., 1994; Xu, 1993, 1997).
The contribution of focus to surface f0 patterns has been reported in various
languages (Pierrehumbert, 1980; Cooper, Eady & Mueller, 1985; Eady & Cooper,
1986; Eady, Cooper, Klouda, Mueller & Lotts, 1986; Jin, 1996; Xu, 1999). Jin
(1996) found that in Mandarin the f0 range of narrow focus syllables was expanded.
In that study he varied the four lexical tones of the first two syllables (or words) in
sentences with the following structure, /___ miŋ15 nien15 liau15 yaŋ15/, ‘X is going
to the sanitarium next year.’ Each sentence was produced under four different focus conditions:
broad focus, with focus on the entire sentence, and three narrow focus
conditions, with focus placed on the first, second, or third word. Results
showed that (1) the duration of narrow focus syllables was the longest, (2) the f0
range of the narrow focus syllable was expanded, and (3) the f0 contours of the final
word in broad focus sentences were perceptually indistinguishable from those of narrow
focus final syllables.
Xu (1999) investigated local factors including tone and focus. He varied the
lexical values of the first three words in a sentence. Each of the three words carried
four Mandarin lexical tones, i.e. level, rising, falling, and falling rising tones. For
the sentence /mao55 mi35 mai51 mao55 mi55/ ‘Cat fan sells kitty’, four questions were
asked to elicit production with broad focus, or narrow focus on word one, two, or
three. For example, when the question ‘What is kitty doing?’ was asked, the narrow
focus was appropriately produced on word three, /mai/, in the target sentence.
Results further confirmed that duration increased and f0 range was expanded for
narrow focus syllables in Mandarin.
A tone language with its clear specification of local tonal targets on each syllable
is suitable for studying the contribution of global and local effects on surface f0
realization and duration. This study followed the line of research on the influence of
local effects, i.e. tone and focus, on the surface f0 formation and syllable duration in
a tone language, by controlling the intonation of each utterance and its syntactic
composition, while varying lexical tone and focus condition (Jin, 1996; Xu, 1999).
Lexical tones in a tone language are contrastive in terms of f0 height, contour,
and duration. In Mandarin, there are four lexical tones, namely high level (55), low
rising (15), high falling (51), and falling rising tones (315). These four tones are
distinguished mainly through f0 shapes. Each Mandarin tone has its own distinctive
f0 contour not shared by other tones. However, little is known about a tone
language like Taiwanese, whose tones are distinguished not only by f0 contour
but also by f0 height. There are seven lexical tones in Taiwanese, i.e. high
level (55), low rising (24), high falling (51), mid falling (21), mid level (33), high
falling checked (51), and mid falling checked tones (21), as shown in Table 1.
There are pairs of lexical tones that differ in only tonal height. For example, high
falling and mid falling tones differ only in their relative f0 levels, as do high and mid
FOCUS AND TAIWANESE UNCHECKED TONES 197
level tones. Compared with Mandarin, Taiwanese has a richer tone inventory. This
study adds to the limited data on the realization of focus in a tone language.
The present study reports on how focus contributes to the realization of f0 and
syllable duration of lexical tones in Taiwanese. Words containing different lexical
tones were produced in short sentences that controlled for the global effects of
intonation and prosodic tonal grouping, while varying the local effects of tonal value
and focus pattern. The purpose of this study was to examine the surface realization
of f0 and duration of Taiwanese lexical tones under different focus conditions, with
attention drawn to the following issues: (1) the effect of narrow focus on duration, (2)
the effect of a narrow focus syllable’s position in an utterance on its duration, (3) the
effect of narrow focus on f0 range and (4) the influence of focus on tone height
between high and mid falling tones, and between high and mid level tones.
2. METHOD
2.1. Corpus
Each Taiwanese syllable has two different lexical tones, i.e. a juncture
(underlying) tone and a context (sandhi) tone. The surface realization of tonal values
depends on a syllable’s position in a tone group. When a syllable is located at the
end of a tone group, that is the juncture position and so the juncture (underlying)
tone surfaces. Any other syllables that are not last in a tone group carry a context
tone. The juncture and context tone values that each syllable possesses are recursive
in nature. For example, a syllable that surfaces with either the tones 55 or 24 at the
juncture position of a tone group has a context (sandhi) tone value 33 at non-
juncture positions. The context tone for a syllable with a juncture tone 33 is tone 21,
while the context tone for a syllable with a juncture tone 21 is tone 51. A syllable
with a juncture tone 51 would carry a tone 55 in a non-juncture position, as shown in
Table 2. It should be noted that tone 24 only surfaces at juncture positions, and not
in initial or medial positions of a tone group. The domain of the tone group
boundary is prosodically determined and closely related to syntactic structures in
Taiwanese (Cheng, 1968, 1973; Chan, 1987; Lin, 1988).
In the corpus, the sentence type was a statement with SVO structure. The tone
group boundary in these short sentences was located between the first and second
words. That is, the first word (first and second syllables) formed one tone group, while
the second word (third syllable) and third word (fourth and fifth syllables) formed
another tone group.
Unchecked      Checked
55  51  21     53
24  33         21
(1) [ σcntxt σjnctr ]tone group [ σcntxt σcntxt σjnctr ]tone group
According to tone sandhi patterns, the second and fifth syllables which are the
last syllables in these tone groups carried a juncture tone, while the first, third, and
fourth syllables carried context tones. Since a low rising tone is not a possible
context tone, it was not used in the third and fourth syllables, as shown in Table 3.
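The juncture/context alternation and the grouping in (1) can be sketched as a small Python function. This is an illustrative sketch rather than part of the study's method: the names CONTEXT_TONE and surface_tones are hypothetical, tones are written as strings, and only the unchecked tones described in the text are covered.

```python
# Juncture (underlying) tone -> context (sandhi) tone for unchecked
# syllables, following the correspondences described in the text.
CONTEXT_TONE = {"55": "33", "24": "33", "33": "21", "21": "51", "51": "55"}


def surface_tones(tone_groups):
    """Return surface tones for a list of tone groups.

    Each tone group is a list of underlying tones. Only the last syllable
    of a group (the juncture position) keeps its underlying tone; every
    other syllable surfaces with its context tone.
    """
    surface = []
    for group in tone_groups:
        *non_final, final = group
        surface.append([CONTEXT_TONE[t] for t in non_final] + [final])
    return surface


# Underlying tones of a five-syllable SVO sentence grouped as in (1):
# [subject] [verb object].
sentence = [["55", "51"], ["21", "21", "33"]]
print(surface_tones(sentence))
```

Because tone 24 never appears as a value in the mapping, the sketch also reflects the observation that a low rising tone cannot surface in a non-juncture position.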
In the corpus, only sonorants were used as initial consonants to minimize
perturbation in vocal fold vibration in order to ensure smooth pitch tracks, as shown
in Table 3.
The subject, comprising the first and second syllables of the sentence, was a
surname. The first syllable of the subject was a diminutive morpheme /a- 55/. The
second syllable carried one of five juncture tones: high level (55), low rising (24),
high falling (51), mid falling (21), or mid level (33). The third syllable carried one of
four context tones: high level (55), high falling (51), mid falling (21), or mid level
(33). Since a low rising tone was not a possible context tone, only these four tones
were used in the third and fourth syllables. The fourth and fifth syllables formed the
object. The fourth syllable carried one of the tones 55, 51, 21, and 33. The fifth
syllable was the diminutive affix /-a 51/. Since it was not possible to find an object
carrying a high falling tone for the fourth and fifth syllables (e.g., 51 51), the lexical
item /a 51 ´ŋ 33/ ‘duck egg’ was chosen for the object with a high falling tone in the
fourth syllable. Checked syllables were not investigated in this study.
Table 3. Tones and syllables used in the corpus. Tones are given in underlying form within / / and in surface form within [ ].

Tone  First word                Second word                 Third word
51    /55 51/ [33 51] [a mã]    /21/ [51] [liam] ‘pinch’    /21 33/ [51 33] [´׀ŋ] ‘duck egg’
21    /55 21/ [33 21] [a lun]   /33/ [21] [mã] ‘scold’      /33 51/ [21 51] [lua a] ‘comb’
33    /55 33/ [33 33] [a liaŋ]  /24/ [33] [law] ‘save’      /24 51/ [33 51] [n ĩ ũ a] ‘silkworm’
Nine hundred and sixty sentences (5 first words × 4 second words × 4 third words
× 4 focus conditions × 3 repetitions) were formed by combining the
five words in position 1 with the four alternating words in position 2 and the
four alternating words in position 3. There were four focus conditions. Narrow focus
was placed either on the first word (first and second syllables), the second word (third
syllable), or the third word (fourth and fifth syllables), while broad focus was placed
on the entire sentence. Each sentence was repeated three times. The order in which
the 960 sentences were produced was randomized. The sentences were written on a
list with no specification of the placement of focus. A question list corresponding to
the order of the corpus list was created to elicit focus on the desired part of the
sentence, as shown in (2). For example, to elicit broad focus on the sentence ‘A-mei
holds buttons’, the precursor question listed on the question list would be ‘What
happened?’ as shown in (2) d.
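The size of the corpus follows directly from crossing the factors, which can be verified with a short Python sketch. The labels below are placeholders introduced for illustration (tone values stand in for the actual words, and the focus labels and random seed are assumptions); only the factor sizes come from the text.

```python
import random
from itertools import product

first_words = ["55", "24", "51", "21", "33"]  # juncture tone of syllable 2
second_words = ["55", "51", "21", "33"]       # context tone of syllable 3
third_words = ["55", "51", "21", "33"]        # context tone of syllable 4
focus_conditions = ["broad", "narrow-1", "narrow-2", "narrow-3"]
repetitions = range(1, 4)

# Full crossing of words, focus conditions, and repetitions.
trials = list(
    product(first_words, second_words, third_words, focus_conditions, repetitions)
)

random.seed(0)  # hypothetical seed, only to make the shuffle repeatable
random.shuffle(trials)  # randomized presentation order
print(len(trials))
```

Keeping the focus condition inside each trial tuple makes it straightforward to derive the matching question list in the same randomized order.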
2.2. Speaker
Four male native Taiwanese speakers, CYS, LWS, LYK, and HYH, participated in the
experiment. They were all trilingual speakers of Taiwanese Min, Mandarin, and
English. HYH spoke a dialect in which the underlying low rising tone
changes into a mid falling surface tone. All speakers were students at National Chiao
Tung University at the time of the recordings. They were paid for their participation.
2.3. Instrumentation
Recordings were made in a sound-treated booth in The Department of Foreign
Languages and Literatures at National Chiao Tung University in Hsinchu, Taiwan.
A TEV TM-728II unidirectional dynamic microphone was placed 40 cm in front of
each speaker’s mouth and 1 m from the experimenter. A SONY MZS-R4ST Mini
Disk recorder captured the acoustic signal digitally. The digital signal was
transferred from the Mini Disk to a PC through an optical fiber at 22 kHz to the digital
input of a Creative Sound Blaster Live sound card, and saved in .wav format. The
ESPS xwaves program was used to generate fundamental frequency tracks for each
sentence.
2.4. Procedure
During the recording, a female experimenter and a speaker were present in the sound
booth. Short dialogues between the experimenter and the speaker were used to
ensure that each speaker produced the corpus in a conversational rather than a
citation manner and placed focus in the target position
naturally, as opposed to reading the sentence directly from the list. The
speakers read the sentences from a randomized corpus list that gave no indication of
the placement of focus. Each speaker waited until the experimenter had read a
precursor question from the question list and then responded by producing the
corresponding sentence from the corpus list, with focus on the part of the
sentence targeted by the question. Different questions elicited focus on different
parts of the sentence, as shown in Table 2. The experimenter judged whether each
utterance carried focus at the targeted position. If she decided that the desired
focus condition had not been produced, she repeated the precursor question and
asked for another production.
words, phones, tones, and the location of focused elements, another Emu program
(Emuquery) was used to obtain the times of the onset and offset of the second,
third, and fourth syllables. The duration of each syllable was calculated
by subtracting the time of the syllable onset from the time of the syllable offset.
Next, the fundamental frequency was extracted for each syllable using
get_track and the Emu pitch extraction program. Fundamental frequency values at the
5%, 20%, 40%, 60%, 80%, and 95% time points in the target syllables were obtained
from these pitch tracks. The average f0 and duration for the second, third, and fourth
syllables carrying the same tone in different focus conditions were compared.
One-way ANOVAs (focus position) were used to determine the effect of focus position
on peak f0, f0 range expansion, and duration.
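The measurements described above can be illustrated with a short sketch. It does not use the Emu tools named in the text; the function names are hypothetical, and linear interpolation between pitch-track samples is an assumption, since the text does not say how the values at the proportional time points were read off the tracks. The last function computes the F ratio of a one-way ANOVA (between-group mean square divided by within-group mean square) for groups of measurements, one group per focus condition.

```python
# Proportional time points at which f0 is sampled within a syllable.
TIME_POINTS = (0.05, 0.20, 0.40, 0.60, 0.80, 0.95)


def syllable_duration(onset, offset):
    """Duration = time of syllable offset minus time of syllable onset."""
    return offset - onset


def _interpolate(t, times, values):
    """Linear interpolation of a sampled track at time t (assumed method)."""
    if t <= times[0]:
        return values[0]
    if t >= times[-1]:
        return values[-1]
    for i in range(1, len(times)):
        if t <= times[i]:
            t0, t1 = times[i - 1], times[i]
            v0, v1 = values[i - 1], values[i]
            return v0 + (v1 - v0) * (t - t0) / (t1 - t0)


def f0_at_time_points(times, f0_values, onset, offset, points=TIME_POINTS):
    """Sample an f0 track at proportional time points between onset and offset."""
    duration = syllable_duration(onset, offset)
    return [_interpolate(onset + p * duration, times, f0_values) for p in points]


def _mean(xs):
    return sum(xs) / len(xs)


def one_way_anova_f(groups):
    """F ratio for a one-way ANOVA over a list of groups of measurements."""
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    ss_between = sum(len(g) * (_mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - _mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)
```

In use, each group passed to `one_way_anova_f` would hold the durations (or f0 values) of one syllable position and tone under one focus condition, so that four groups correspond to the four focus conditions.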
3. RESULTS
3.1. Duration
Table 4. One-way ANOVAs (4 focus conditions) on mean duration (ms), ** p < .001, * p < .05,
NF: Narrow focus, bold face: narrow focus syllable
Among the narrow focus syllables carrying the same tone in different syllable
positions, the duration of the narrow focus third syllables was the longest, compared
with the duration of the narrow focus syllables in the second and fourth syllable
position. The effect of position on syllable duration was confounded by the vowel
quality and the syllable structure (closed vs. open) which were not controlled in the
corpus.
Although the narrow focus second and third syllables were longer
than the same syllables in other focus conditions, the narrow focus fourth syllable
was not always the longest, as shown in Table 4. For HYH, the
narrow focus fourth syllables with tones 55 and 33 were not
the longest when compared to the same syllables produced in the other focus
conditions. The same held for fourth syllables with tones 55, 51, and 33
produced by LWS, and for fourth syllables with tones 55 and 21
produced by LYK. The duration of the narrow focus fourth
syllable was similar to that of the post-focus fourth syllable, as shown in Table 4.
In summary, increased duration for narrow focus syllables was most obvious in
second and third syllable position and least noticeable in the fourth syllable position.
3.2. F0
3.2.1. Tonal register (f0 level) contrast
The f0 contours were averaged across speakers to reveal a potential contrast in tonal
register between high level vs. mid level tones and between high falling vs. mid
falling tones, as shown in Figure 1. A comparison of the f0 onset and the f0 peak of
narrow focus tones 55 and 33 revealed that the f0 onset of
both tones was between 140 and 160 Hz. However, the f0 peak was
between 170 and 190 Hz for tone 55 and remained below 160 Hz for tone 33.
The only exceptions were the 20% point of tone 33 in the second syllable and the 95%
point of tone 33 in the fourth syllable, which were slightly above 160 Hz.
Turning to tones 51 and 21, we see that the f0 peak of the narrow focus tone 51
was between 180 and 200 Hz, while the f0 peak of the narrow focus tone 21 was
between 120 and 140 Hz. As for the lowest point of f0, it was between 150 and 170 Hz
for tone 51 and below 140 Hz for tone 21. The f0 level difference between tones 51
and 21 and between tones 55 and 33 was maintained for narrow focus syllables even
after the f0 range was expanded under the narrow focus condition.
[Figure 1 plots: f0 (Hz, axis range 100–220 Hz) by syllable position (syllable 2, syllable 3, syllable 4) in separate panels, one of them titled “HL tone f0 average”; data points are labelled by focus condition (0 = broad focus, 2 = narrow focus on syllable 2).]
Figure 1. F0 of five tones in the second, third, and fourth syllable position receiving four
different focus conditions
Lexical tones in the second syllable were preceded by tone 33, with a mid offset, at
the first syllable and followed by tones 55, 51, 21, and 33 at the third syllable, with
an averaged onset realized slightly above the mid average. The third syllable
was preceded by tones 55, 24, 51, 21, and 33 at the second syllable and produced
with an f0 onset slightly below the mid average. It was followed by tones
55, 51, 21, and 33 and produced with an f0 offset slightly above the mid average. The
fourth syllable was preceded by tones 55, 51, 21, and 33 and realized with a mid
average onset. The fourth syllable was followed by the suffix /a51/ in seventy-five
percent of the tokens and by the morpheme /ls´ŋ 33/ ‘egg’ in twenty-five
percent of the tokens. On average, the offset of the fourth syllable was
realized at an upper mid to high level.
Due to preservatory tonal coarticulation, tone 55 at the second, third, and fourth
syllables started around the mid tonal range, following the averaged mid offset of the
first, second, and third syllables. The f0 contours of tone 55 at the second and third
syllables then gradually rose to a higher offset target at 80% into the syllables and then
declined slightly, coarticulating anticipatorily with the following mid onset of the third
and fourth syllables. The f0 contours of tone 55 at the fourth syllable did not decline
at the end of the syllable, since they were followed by an upper mid to high onset at
the fifth syllable. Both preservatory and anticipatory tonal coarticulation were
observed on tone 55. The gradual decrease of the high offset of tone 55 from the
second to the third and fourth syllables was a sign of global declination.
The onset of tone 24 at the second syllable started from the mid offset of
preceding syllable then moved downward to the low onset target of rising tone 24.
The low onset target of tone 24 was reached around the 60% time point into the
syllable and then the f0 pattern began to take on the rising contour of tone 24.
Preservatory tonal coarticulation can be observed at the beginning of tone 24.
The onset of tone 51 at the second, third, and fourth syllables began around the
mid tonal range then began to rise toward the high onset target. The high target was
reached at the 60% time point in the second syllable and the 40% time point in the
third and fourth syllables. After this, the f0 pattern began to move downward toward
the low offset target of falling tone 51. Effects of declination can be observed by
comparing the f0 height of high onset targets that gradually decreased from the
second to the third and to the fourth syllable. Preservatory tonal coarticulation was
observed at the beginning of tone 51.
The onset of tone 21 began around the mid tonal range for the second and third
syllables. The onset of tone 21 in the fourth syllable was much lower due to global
declination and the lower averaged offset of the third syllable. F0 moved downward
toward the target and then began to rise at the 95% time point for the third syllable and the 60%
time point for the fourth syllable. Effects of declination were observed on the f0
height of the low offset target between the second and third syllables. The rising
contour of tone 21 at the fourth syllable was due to anticipatory tonal coarticulation
with the high to mid onset of following fifth syllable.
The onset of tone 33 gradually declined from the second to the third and to the fourth
syllable. The rising f0 of tone 33 at the fourth syllable was due to anticipatory tonal
coarticulation with the averaged upper mid to high onset of the following fifth syllable.
Both anticipatory and preservatory tonal coarticulation were observed here.
Table 5. One-way ANOVAs (4 focus conditions) on f0 range (Hz), ** p < .001, * p <.05,
NF: Narrow focus, bold face: narrow focus syllable
Table 6. One-way ANOVAs (4 focus conditions) on mean f0, ** p < .001, * p < .05,
NF: Narrow focus, bold face: narrow focus syllable
Table 7. Summary of significant effect of focus on duration (D), f0 range (R), and mean f0 (M)
51 DRM DR DRM DRM RM DRM DRM DRM DRM DRM DRM DRM
According to the Taiwanese acoustical data observed here, narrow focus final words
were distinguishable from final words in broad focus sentences for CYS and
LWS, but not for LYK and HYH. The discrepancy between production and
perceptual data in Mandarin can be further explored by comparing the results of
future production and perceptual studies in Taiwanese.
Table 8. Post-hoc Duncan tests on the mean duration and f0 range of the penultimate (fourth)
syllable. Means of the fourth syllable in different focus conditions produced by the same
speaker were significantly different from each other when followed by different letters.
Means followed by the same letter were not significantly different from each other.
p < .05.
4. DISCUSSION
The f0 and duration data produced by Taiwanese speakers in the present study
revealed five major results. First, narrow focus syllables were longer
than syllables under other focus conditions. Second, the degree of lengthening due to
narrow focus was affected by a syllable’s position in the sentence. Third, the f0 range
of the narrow focus syllable was expanded. Fourth, the tonal register (f0 level)
contrasts between narrow focus high falling and mid falling tones, and between
narrow focus high level and mid level tones, were maintained even when the f0 range was
expanded. Fifth, duration was a more consistent cue than either f0 range or mean f0
values in signaling focus condition in Taiwanese. F0 range and mean f0 value
complemented each other in distinguishing focus conditions.
In addition to the effect of focus, tonal coarticulation also influenced the f0
contour in Taiwanese. In Taiwanese the f0 offset target of a dynamic tone occurred
after the offset boundary of a tone bearing unit, while the f0 offset target of a level
tone occurred before the syllable boundary (Pan, 2002). Because only sonorants were
used at either the beginning or the end of a syllable, both anticipatory and preservatory
tonal coarticulation could be observed in this study. Preservatory tonal coarticulation was
observed in tones 55, 24, and 51, while anticipatory tonal coarticulation was found
in tones 55, 21, and 33. It was proposed that the preservatory tonal coarticulation
took place during the initial consonant of the syllable, as found in Mandarin (Xu,
1999). To support the claim that preservatory tonal coarticulation occurred during
the initial consonant of the syllable in Taiwanese, further studies with various
syllable structures are necessary.
Among narrow focus second, third, and fourth syllables, the duration of narrow
focus third syllable was the longest, while the duration of the fourth syllable was the
shortest. In Mandarin the duration of the narrow focus third syllable was also the
longest; however, the shortest syllable was the second syllable (Xu, 1999). The effect
of focus lengthening was the strongest on the third syllable in both Mandarin and
Taiwanese. According to global final lengthening rules, the duration of the narrow
focus fourth syllable should be longer than the duration of the narrow focus third
syllable; however, local focus lengthening interacts with final lengthening here to
determine the surface syllable duration. Focus lengthening exerts a strong effect on
the third syllable but not on the fourth syllable. Narrow focused fourth syllables
appeared to be shorter than narrow focused third syllables in both Taiwanese and
Mandarin data. Further investigations with more variable sentence structures are
needed to explore possible factors such as syllable position, part of speech, and
syntactic or prosodic structures that contribute to the longer duration of narrow focus
third syllable.
In Mandarin with four distinctive f0 contours for each lexical tone, f0 range
expansion was used as the major cue for signaling narrow focus. In Taiwanese,
duration lengthening is a more consistent cue for narrow focus. The fact that there
are two tonal pairs in Taiwanese contrasted mainly by f0 height and not by f0
contour may contribute to the limited manipulation of f0 range in different focus
conditions. To further explore this potential cause, studies on other tonal languages
with tonal pairs contrasting mainly by f0 height are needed.
The study here concentrated only on the effect of focus on Taiwanese unchecked
tones. Taiwanese checked tones are known for their shorter syllable duration and
glottalized voice quality in contrast with unchecked tones. To fully understand the
influence of focus on duration contrasts between checked and unchecked syllables in
Taiwanese and the influence of focus on voice quality in Taiwanese, further studies
are necessary. The interaction between focus conditions, final and initial lengthening
in different prosodic domains, and tonal coarticulation should also be investigated to
fully understand the interaction of prosodic effects on surface duration and f0
contour in tonal languages.
NOTES
This research was supported by grants from the National Science Council in Taiwan. Thanks to
Professor Anne Chao and Pi-chiang Li for assistance in statistical analysis.
REFERENCES
Beckman, Mary E., and Jan Edwards. (1990) “Lengthenings and Shortenings and the Nature of Prosodic
Constituency.” In Papers in Laboratory Phonology I: Between the Grammar and Physics of Speech
(J. Kingston and M. E. Beckman, editors): 152-178. Cambridge: Cambridge University Press.
Berkovits, Rochele. (1993) “Utterance-Final Lengthening and Duration of Final-Stop Closures.” Journal
of Phonetics 21 (4): 479-489.
Chao, Yuen Ren. (1968) A Grammar of Spoken Chinese, University of California Press.
Cheng, Robert. (1968) “Tone sandhi in Taiwanese.” Linguistics 41: 19-42.
Cheng, Robert. (1973) “Some notes on tone sandhi in Taiwanese.” Linguistics 100: 5-25.
Cooper, William E., Stephen J. Eady, and Pamela R. Mueller. (1985) “Acoustical Aspects of Contrastive
Stress in Question-Answer Contexts.” Journal of the Acoustical Society of America 77: 2142-2156.
Eady, Stephen J., and William E. Cooper. (1986) “Speech Intonation and Focus Location in Matched
Statements and Questions.” Journal of the Acoustical Society of America 80: 402-416.
Eady, Stephen J., William E. Cooper, Gayle V. Klouda, Pamela R. Mueller, and Dan W. Lotts. (1986)
“Acoustic Characteristics of Sentential Focus: Narrow vs. Broad and Single vs. Dual Focus
Environments.” Language and Speech 29: 233-251.
Fougeron, Cecile. (1999) “Articulatory Properties of Initial Segments in Several Prosodic Constituents in
French.” UCLA Working Papers in Phonetics 97: 74-99.
Gandour, Jack, Siripong Potisuk, and Sumalee Dechongkit. (1994) “Tonal Coarticulation in Thai.” Journal
of Phonetics 22: 477-492.
Ho, Aichen T. (1976) “Mandarin Tones in Relation to Sentence Intonation and Grammatical Structure.”
Journal of Chinese Linguistics 4: 1-13.
Jin, Shunde. (1996) An Acoustic Study of Sentence Stress in Mandarin Chinese. Ph.D. dissertation, The
Ohio State University.
Liberman, Mark, and Janet Pierrehumbert. (1984) “Intonational Invariance under Changes in Pitch Range
and Length.” In Language Sound Structure (M. Aronoff & R. T. Oehrle, editors): 157-233.
Cambridge, MA: MIT Press.
Lin, Hui-Bin. (1988) Contextual Stability of Taiwanese tones. Ph.D. dissertation, The University of
Connecticut.
Lindblom, Bjorn, and K. Rapp. (1973) “Some Temporal Regularities of Spoken Swedish.” Papers from
the Institute of Linguistics, University of Stockholm 21: 1-58.
Pan, Ho-hsien. (2002) “The location of F0 offset for Taiwanese Long Tones” In Speech Prosody 2002:
Proceedings of the first International Conference on Speech Prosody: 555-558.
Peng, Shu-hui. (1997) “Production and Perception of Taiwanese Tones in Different Tonal and Prosodic
Contexts.” Journal of Phonetics 25 (3): 371-400.
Pierrehumbert, Janet. (1980) The Phonology and phonetics of English Intonation. Ph.D. dissertation,
Massachusetts Institute of Technology.
Pierrehumbert, Janet, and Mary E. Beckman. (1988) Japanese Tone Structure. Cambridge, MA: MIT
Press.
Shen, Xiaonan Susan. (1992) “A Pilot Study on the Relation between the Temporal and Syntactic
Structures in Mandarin.” Journal of the International Phonetic Association 22 (1-2): 35-43.
Shih, Chi-Lin. (1988) “Tone and Intonation in Mandarin.” Working Papers Cornell Phonetics Laboratory
No. 3: 83-109.
Shih, Chi-Lin, and Benjamin Ao. (1994) “Duration Study for the AT&T Mandarin Text-to-Speech
System.” In Conference Proceedings of the second ESCA/IEEE Workshop on Speech Synthesis:
29-32.
Xu, Yi. (1999) “Effects of Tone and Focus on the Formation and Alignment of f0 Contours.” Journal of
Phonetics 27: 55-107.
Xu, Yi. (1997) “Contextual Tonal Variations in Mandarin.” Journal of Phonetics, 25: 61-83.
ELISABETH SELKIRK
1. INTRODUCTION
In this paper, I want to investigate the consequences of an idea about focus prosody
that was first put forward by Jackendoff 1972, namely the hypothesis that the focus-
phonology interface in grammar is expressed as a relation between focus-marked
syntactic constituents on the one hand, and prosodic stress prominence on the other.
A strong form of the hypothesis, advocated in Truckenbrodt’s 1995 thesis and
pursued here and in other recent work of mine (e.g. Selkirk 2002), is that the focus-
phonology interface consists only of interface constraints on the relation between
syntactic focus and prosodic prominence. All the other predictable, non-
morphological, phonological properties of focus are claimed to be derived as a
consequence of phonological markedness constraints on the relation between
prosodic prominence and other aspects of phonological representation. This
proposal can be called the Focus-Prominence theory of the focus-phonology
interface. I think this theory provides an insightful account of the array of
phonological properties that are associated with focus crosslinguistically, and at the
same time explains the observed generalizations about focus projection and the
distribution of focus-related prominence within the sentence. The question of focus
projection is not addressed in this paper (but see Selkirk 1999, 2000; Selkirk and
Katz, in preparation). What I want to show here is that Focus Prominence theory
provides the basis for an understanding of focus-related phonological phrasing. In
this I am following a path first charted out by Truckenbrodt 1995.
Focus constituents are claimed to display a variety of prosodic properties
crosslinguistically:
215
C. Lee et al. Topic and Focus: Cross-linguistic Perspectives on Meaning and Intonation, 215–244.
© 2007 Springer.
An example of the focus phrasing seen in Bengali appears in sentence (2) below.
(2) is a sentence with a sentence-medial contrastive focus appearing on a medial
constituent within a left branching object noun phrase. The surface syntactic
structure which we tentatively assume for this focus-marked sentence structure is as
in (1). The prosodic phrasing structure in (3), which is an all-new, out of the blue,
utterance of the same sentence structure, but minus the focus marking, should be
contrasted to that in (2)8.
(1) S
VP
NP
PP
NP
NP NP N-FOC P N V
ami raj‡a-r c‡hobi-r j‡onno ˇaka anlam
I king’s PICTURES for money gave
‘ I gave money for the king’s PICTURES.’
(2)
phrase-edge not prominence-related
L* HP L* HP LI
((ami raj‡ar) ( c‡hobir ) j‡onno ˇaka anlam )IP
I king’s PICTURES for money gave.
These are both declarative utterances. The phrasing of the neutral focus sentence (3)
puts the subject, the complex NP object, and the verb each in a separate
phonological phrase. (Nonfinal phonological phrases are in general marked by two
tonal events--the presence of a L* pitch accent on the main stressed syllable in the
phrase and the presence of a HP peripheral tone at the right edge of the phrase.) The
focus sentence (2) alters the otherwise default phrasing by flanking the focus
constituent, here a head noun internal to the complex noun phrase, with the left and
right edges of a phonological phrase. The arrow marks the problematic right
phonological phrase edge found at the right edge of the focus, the phrase edge that
the Focus Prominence hypothesis cannot account for.
Aside from the flanking of a contrastive focus constituent with phonological
phrase edges, there is another important property of sentences with focus in Bengali,
namely the absence of any phonological phrase following the focus constituent. This
is visible in (2) through the absence of any pitch accent or nonfinal peripheral tones
following the focus. We will see that this apparent “dephrasing” can also be given
In its definition of focus prominence this theory does not distinguish between types
of focus (e.g. contrastive vs. presentational) and their associated types of domain
constituent. Nor does the theory assure a regular prosodic level of prominence for
the different focus types. In other work, however, this simplicity is shown to be
problematic for the characterization of at least a certain range of focus phenomena
(see Selkirk 2000, 2002; Sugahara 2002, 2003). So in what follows I will assume a
paradigmatic theory of Focus Prominence, leaving open the question whether the
syntagmatic version above is also required in grammar.
The paradigmatic theory of Focus Prominence that I am entertaining posits a
family of Focus Prominence constraints of the general form in (5), according to
which a focused constituent of a particular morphosyntactic structure type must
contain a phonological prominence of a particular prosodic structure type:
(5) ƒ(Xⁿ) ⊂ ∆(π)
A presentational focus has the property of newness in the discourse, and its
semantics is characterizable in terms of the theory proposed by Schwarzschild
(1999). It will sometimes be notated with initial caps as Focus and nicknamed as
‘small’ focus. As for the prosodic category name ‘major phrase’, this is the level of
phrasing immediately below the intonational phrase, sometimes also referred to as
‘intermediate phrase.’ I have chosen the term ‘major phrase’ for its mnemonic value,
since the level of prosodic major phrase is identified by its alignment with the
morphosyntactic maximal projection phrase.
Notice that these hypothesized constraints of the paradigmatic Focus Prominence
theory make the felicitous prediction that the phonological properties of big,
contrastive, FOCUS are either a superset of those of small, presentational, Focus, or,
if different, then are characteristic of a higher level of prominence than those of
small focus. This is because, given the nature of prosodic structure, the ∆IP called
for in a big focus constituent is necessarily also a ∆MaP, and ∆MaP is what is called
for in a presentational focus phrase. That is, both contrastive and presentational
focus will be called on by constraints to show the properties of a ∆MaP, but only
contrastive focus will be called on to show the properties of the higher level ∆IP.
Call this prediction big focus-small focus containment. This point becomes clear
when we examine the definitions for designated terminal element and prosodic head
and apply them to an example.
(8) Def: A head of a prosodic constituent π is (i) the most prominent prosodic
constituent immediately dominated by π (the π-prom of π) or (ii) the most
prominent prosodic constituent immediately dominated by a head of π.
Note that the sample representation (10) satisfies the Focus Prominence constraint in
(5) which requires that the contrastively focused word Mississippi contain the
designated terminal element of an intonational phrase IP.
According to the recursive definition of head given above, the boldfaced head
constituents are all heads of IP. Assuming that moras are part of the terminal string,
the penultimate mora in Mississippi is the designated terminal element of IP. This is
because it is the head mora of the head syllable of the head foot of the head prosodic
word of the head minor phrase of the head major phrase of the intonational phrase.
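The recursive definition of head in (8) and the notion of designated terminal element can be made concrete with a small sketch. The tree below is a simplified stand-in for the representation of visit Mississippi, with one mora per syllable and the foot and syllable divisions chosen for illustration; the class and function names are hypothetical, and marking the most prominent daughter by its index is an assumption of the sketch rather than part of the theory.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    category: str                      # e.g. "IP", "MaP", "MiP", "PWd", "Ft", "σ", "µ"
    label: str = ""
    children: List["Node"] = field(default_factory=list)
    prominent: Optional[int] = None    # index of the most prominent daughter (the π-prom)


def heads(node):
    """All heads of node: its π-prom, the π-prom of that head, and so on down."""
    chain = []
    current = node
    while current.children:
        if current.prominent is None:
            raise ValueError(f"no prominent daughter marked in {current.category}")
        current = current.children[current.prominent]
        chain.append(current)
    return chain


def dte(node):
    """Designated terminal element: the terminal node that is a head of node."""
    chain = heads(node)
    return chain[-1] if chain else node


def contains(node, target):
    """True if target is node itself or is dominated by node."""
    return node is target or any(contains(child, target) for child in node.children)


def syll(label):
    """A syllable with a single mora (a simplification for this example)."""
    return Node("σ", label, [Node("µ", label)], 0)


# Illustrative structure: visit [Mississippi]FOC inside one MiP, MaP and IP.
visit = Node("PWd", "visit", [Node("Ft", "visit", [syll("vi"), syll("sit")], 0)], 0)
mississippi = Node("PWd", "Mississippi", [
    Node("Ft", "missi", [syll("mi"), syll("ssi_1")], 0),
    Node("Ft", "ssippi", [syll("ssi_2"), syll("ppi")], 0),
], 1)
ip = Node("IP", "", [Node("MaP", "", [Node("MiP", "", [visit, mississippi], 1)], 0)], 0)

print([h.category for h in heads(ip)])   # chain of heads from MaP down to the mora
print(dte(ip).label)                     # the penultimate mora of Mississippi
print(contains(mississippi, dte(ip)))    # the focused word contains the DTE of IP
```

On this encoding, checking the Focus Prominence requirement amounts to asking whether the focus-marked constituent dominates the terminal returned by `dte` for the relevant prosodic constituent.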
Turning to Bengali, we will assume that the focus type whose prosodic
properties are being described in the Hayes and Lahiri paper is big, contrastive,
focus. Their examples of focus involve cases of explicitly contrastive focus or
answers to wh-questions. So we will be investigating in Bengali the consequences of
assuming that a big focus (FOCUS) constituent contains the DTE of an Intonational
Phrase, as called for by the big FOCUS Prom constraint in (5). The properties of
presentational focus in Bengali have not yet been submitted to a systematic
investigation.
(10) IP
|
MaP π-prom of IP = MaP
|
MiP π-prom of MaP = MiP
Ft Ft Ft π-prom of PWd = Ft
σ σ σ σ σ σ π-prom of Ft = σ
| | | | | |
µ µ µ µ µ µ π-prom of σ = µ
vɪsɪt [Mɪssɪssɪppɪ]FOC
FOCUS-Prom ⇒ IP
|
MaP
|
PWd
|
Ft
|
σ
|
µ = ∆IP
[[ami] [[[[raj‡a-r] [c‡hobi-r]FOC] j‡onno] ˇaka] [anlam]]]
I king’s PICTURES for money gave
‘I gave money for the king’s PICTURES’
In meeting the requirements of the FOCUS-Prominence constraint, head
constituents are defined at all prosodic levels lower than IP. Now, the grammar
contains a class of prosodic markedness constraints that call for the alignment of
these prosodic head constituents with the right or left edge of their mother prosodic
constituents (McCarthy and Prince 1993) such as the well-attested Align R/L (Ft,
PWd). Hayes and Lahiri argue that a phonological phrase has its head at the left
edge of the phrase, giving a pattern of left edge phonological phrase prominence. I
will express this constraint as Align L (PWd, MaP), assuming that the phonological
phrase appealed to in the constraint is at the level of the major phrase and that it is a
prosodic word level head-constituent that is aligned with the MaP left edge. (This
analysis ignores for reasons of expository convenience the possibility that there may
be an additional level of phonological phrase (the Minor Phrase) intervening
between PWd and MaP, as does the analysis of Hayes and Lahiri.) Following
Truckenbrodt’s 1995 analysis of the left phonological phrase edge that appears with
FOCUS in Japanese, my analysis of Bengali gives this prosodic alignment constraint
the responsibility for the flanking of Bengali FOCUS with a left phonological phrase
edge, as shown in (12a). [Note that (12a) is only a partial prosodic tree and (12b) is a
partial prosodic labelled bracketing.]
b.
IP ((ami raj‡ar)MaP MaP(c‡hobir j‡onno ˇaka anlam)IP
I king’s PICTURES for money gave.
On this proposal, then, a constraint like AlignL (PWd, MaP) has in general two
functions. Here it induces the presence of a phonological phrase edge at the edge of
a prosodic prominence whose position with respect to the syntactic structure is fixed
by the FOCUS-Prom constraint. In cases where the location of the prominence is not
fixed by an interface constraint, the same constraint predicts that the prominence
will fall wherever the grammar determines that the left edge of a phonological
phrase might appear. This two-fold effect follows from the fact that the locus of
prosodic prominence may either be fixed independently in which case the edge
comes to align with it, or the locus may not be fixed independently, in which case
the prominence locates itself wherever the grammar may call for a phrase edge.
As for the appearance of the right edge of a phonological phrase seen in (13) at
the right edge of FOCUS, I argue in the following section that it is to be ascribed to
the presence of the tonal morpheme [H]FOC at the right edge of the FOCUS
constituent in morphosyntactic structure.
b. [H]FOC [L]DECL
IP(ami raj‡ar) MaP(c‡hobir)MaP j‡onno ˇaka anlam)IP
pitch accents, nor any phrase-edge-marking H peripheral tones. Since tones mark
these prosodic structure landmarks of a phonological phrase by default, the absence
of the tones is most straightforwardly explained by the post-FOCUS absence of the
phonological phrasing and prominence that trigger the presence of these tones. This
sort of post FOCUS “dephrasing” is argued by Truckenbrodt 1995 to result from a
constraint which calls for the prosodic head of an intonational phrase to align with
the right edge of the IP. Any phonological phrase intervening between the FOCUS
phrase and the right edge of the intonational phrase would be disaligning and so
produce a non-optimal prosodic representation for the sentence. In particular, after
FOCUS one never sees the appearance of the phrasing normally associated with the
matrix verb. So the provisional constraint “Verb-ϕ Align” (see footnote 8) must be
dominated by the IP-level prosodic alignment constraint. I will assume that the
“dephrasing” observed in the optimal candidate moreover constitutes a violation of
Exhaustivity (IP), hence:
(14) Align R (MaP, IP) >> Exhaustivity (IP), “Verb-ϕ Align” ⇒
IP
MaP MaP
The section below is devoted to establishing that the account I have proposed for
the appearance of a phonological phrase edge at the right of the FOCUS constituent
is well founded. It will rely on establishing the morpheme status of the H tone that
flanks the FOCUS constituent on the right as well as establishing the existence of a
morpheme-specific alignment constraint that may induce the presence of a
phonological phrase edge at the edge of the FOCUS tonal morpheme.
patterns in the language. Specifically, the aim is to provide a complete account of the
tonological differences between declarative and question utterances under both
“neutral” and contrastive focus conditions. We will see that the H tone that appears
at the right edge of a FOCUS constituent has a significantly different behavior from
the peripheral default H tone that is the regular marker of right edge of phonological
phrase.
(15) a. Align R (MaP, H)
“Align the R edge of a major phonological phrase with the R edge of a H tone.”
[= the source of the default High edge tone]
b. Associate (∆MaP, Tone)
The tableau in (16) illustrates the role for these constraints in deriving the tones
of the initial phonological phrase from the sentence in (2):
(16)
Input: …. [ ami ] [[ raj‡a-r]...

      Tones   Candidate               OCP   Align R    Assoc       *Tone
                                            (MaP, H)   (∆MaP, T)
      -------------------------------------------------------------------
   a.         … (ami raj‡ar)MaP …           *!         *
⇒ b.  L H    … (ami raj‡ar)MaP                                    **
   c.  H H    … (ami raj‡ar)MaP       *!                           **
The two constraints in (15), which call for the presence of tone in the representation,
crucially outrank the constraint *Tone, which minimizes the presence of tone in the
representation. The OCP adjudicates the choice of tone, and is not crucially ranked
with respect to the others. (The non-ranking among the higher constraints is
provisional.)
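The evaluation in tableau (16) can be sketched computationally. The following is an illustration only, not part of the original analysis. It assumes the violation counts shown for candidates a.-c. in (16), treats the three higher constraints as a single unranked stratum (their mutual ranking being provisional), and pools marks within a stratum. The names STRATA, PROFILES, stratum_marks and optimal are hypothetical.

```python
# Sketch of OT evaluation for tableau (16); all names are illustrative.
# Constraints in the same stratum are treated as unranked: their marks are pooled.
STRATA = [
    ("OCP", "AlignR(MaP,H)", "Assoc(dMaP,T)"),  # provisionally unranked among themselves
    ("*Tone",),                                  # dominated by the stratum above
]

# Violation counts read off the tableau for the initial phonological phrase.
PROFILES = {
    "a (no tones)": {"AlignR(MaP,H)": 1, "Assoc(dMaP,T)": 1},
    "b (L H)": {"*Tone": 2},
    "c (H H)": {"OCP": 1, "*Tone": 2},
}

def stratum_marks(profile):
    """Return one pooled violation count per stratum, highest stratum first."""
    return tuple(sum(profile.get(c, 0) for c in stratum) for stratum in STRATA)

def optimal(profiles):
    """Pick the candidate whose pooled marks are lexicographically smallest."""
    return min(profiles, key=lambda name: stratum_marks(profiles[name]))

winner = optimal(PROFILES)
print(winner)
```

Under these assumptions candidate b. is selected, since it alone incurs no marks in the higher stratum; its two *Tone marks are irrelevant once the competitors have been eliminated.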
There is another role for the OCP. In addition to assuring the non-identical
character of the tones introduced by default into the representation, as here, Hayes
and Lahiri also propose that it is responsible for the failure of the default H edge
tone to appear in the first place, when it is followed by another H tone in the
utterance. This effect is seen in (3), where the bracketed peripheral <H> tone is
actually not realized, because of the H* that follows in the next phrase. The absence
of that peripheral H will be analyzed below.
(19) PF PR
(morphosyntactic interface) (surface phonological representation)
(20)
The constraints Realize [L]DECL and Realize [HL]QUES mentioned in the tableau
assure that the tones of a tonal morpheme in the input are maintained in the output,
in the quality specified in the input; these constraints are members of the family of
constraints which require that a morpheme have some phonological realization in
the output. I will assume that the general character of these Realize constraints for
tonal morphemes is as in (21).
Together with the OCP, these faithfulness constraints assure that the default pitch
accent in the final phrase is the polar opposite of the following lexically specified
boundary tone morpheme. So just as the quality of the L* pitch accent in nonfinal
phrases is determined by constraint, so is the quality of the pitch accents in the final
phrase.
Note that the constraint MaxTone, which calls for an input tone to have a
corresponding tone in the output (McCarthy and Prince 1995), cannot be given the
function of maintaining the tonal morphemes in the output. Bengali is not a tone
language, in which tonal contrasts in morphemes which also have segmental content
are preserved on the surface. Rather, assuming Richness of the Base (Prince and
Smolensky 1993), *Tone must be ranked above MaxTone in order to ensure that
any nonmorphemic tones are eliminated in the output. But *Tone must be ranked
below the morpheme realization constraints of the form Realize [Tone]M. An
intonational language, which lacks lexical tone contrasts except for those found in
tonal morphemes, is thus characterized by the ranking Realize [Tone]M >> *Tone
>> MaxTone.
(22)
Input: [[raj‡a-r c‡hobi-r j‡onno ˇaka] [anlam]][L]DECL] α
Candidates a.-d. all have the phrasing (raj‡ar c‡hobir j‡onno ˇaka) (anlam))

      Tones                 Realize   OCP   Assoc          Align R    *Tone
                            [L]DECL         (∆MaP, Tone)   (MaP, H)
      -----------------------------------------------------------------------
   a. L* H  H* [L]DECL                *!                              ****
   b. L* H     [L]DECL                      *!                        ***
⇒ c. L* ø  H* [L]DECL                                     *          ***
   d. H* L  H* [L]DECL                                     *          ****!
The optimal candidate c. shows a violation of the constraint Align R (MaP, H);
the ranking of this constraint below the OCP and Assoc (∆MaP, Tone) allows for
this candidate to emerge as the winner. Candidate c. shares an Align R (MaP, H)
violation with the nonoptimal candidate d, because both the absence of a H tone and
the appearance of a L tone instead of H constitute violations of this constraint.
Candidate d. is therefore ruled out by its greater number of violations of the
structure-minimizing constraint *Tone. No higher ranked constraint calls for the
presence of a default peripheral L at the edge of major phrase, so *Tone rules it out.
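The effect of the ranking Realize [Tone]M >> *Tone >> MaxTone, introduced above, can be illustrated with a small sketch. It is an assumption-laden toy, not the author's formalization: GEN is simplified to tone deletion only, the input is a hypothetical "rich base" form containing two nonmorphemic tones alongside the declarative morpheme, and the function names are invented for the illustration.

```python
from itertools import combinations

def candidates(input_tones):
    """Simplified GEN: every way of deleting some subset of the input tones."""
    n = len(input_tones)
    for k in range(n + 1):
        for kept in combinations(range(n), k):
            yield [input_tones[i] for i in kept]

def marks(input_tones, output):
    """Violations ordered by the ranking Realize[Tone]M >> *Tone >> MaxTone."""
    realize = sum(1 for t in input_tones if t[1] and t not in output)  # morphemic tone lost
    star_tone = len(output)                                            # every output tone
    max_tone = len(input_tones) - len(output)                          # every deleted tone
    return (realize, star_tone, max_tone)

def optimal(input_tones):
    """Return the output with the lexicographically smallest violation profile."""
    return min(candidates(input_tones), key=lambda out: marks(input_tones, out))

# Hypothetical rich-base input: (label, is_morphemic).
rich_input = [("H-lexical", False), ("L-lexical", False), ("L-DECL", True)]
print(optimal(rich_input))
```

In this toy setting only the morphemic tone survives, which is the pattern the ranking is meant to characterize for an intonational language.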
(23) Neutral focus declaratives lack a phrase-peripheral High tone in the final
phrase:
                      H*   [L]
    a. ok   …….. ( ¨….. µ)MaP)IP
                      L*   H[L]
    b. *    …….. ( ¨….. µ )MaP)IP
If the default H were to surface, the tonal pattern to be predicted by the OCP would
be identical to that found in interrogatives, namely a L* pitch accent followed by a
HL boundary sequence, as in (23b). Homophony avoidance is transparently not a
factor in ruling out this candidate for the declarative pattern, however, since
homophony of distinct sentence types is not avoided in Bengali. As we will see
below, a declarative with a contrastive FOCUS in the final phrase has exactly the L*
HL pitch pattern found in the interrogative. Rather, the impossibility of the pattern
in (23b) is analyzable as a consequence of the constraint system. Basically, the
proposal is that the tonal alignment constraint Align R (MaP, H), which is violated
in the optimal candidate (17), is dominated by the morpheme-specific alignment
constraint Align R ([L]DECL, IP) and the well-known tonal markedness constraint
*Contour Tone. (24) gives the ranking that will derive the absence of the default
peripheral H tone in the final phrase and (25) is the tableau that shows it. (The pitch
accents of the final phrase are not shown in the schematic phrase-final
representations in (25).)
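(25)
      Tones     Candidate            Realize   Align R         *Contour   Align R    *Tone
                                     [L]DECL   ([L]DECL, IP)   Tone       (MaP, H)
      --------------------------------------------------------------------------------------
   a. H         …. (… µ µ)MaP)IP     *!                                              *
   b. [H]       …. (… µ µ)MaP)IP     *!                                              *
   c. [L] H     …. (… µ µ )MaP)IP              *!                                    **
   d. H[L]      …. (… µ µ )MaP)IP                              *!                    **
   e. H [L]     …. (… µ µ )MaP)IP                                         *          **!
⇒ f. [L]       … . (…µ µ )MaP)IP                                         *          *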
Candidate f., with its simple declarative [L] morpheme, is the optimal one. It
violates Align R (MaP, H), but does not show the violations of the higher ranked
constraints seen in candidates a.- d., and has fewer violations of *Tone than
candidate e. has. The constraint *Contour Tone introduced here is a tonal
markedness constraint familiar from much previous research. Its essential role is to
disallow the case where both the peripheral default H and the tonal morpheme
[L]DECL are associated to the same tone-bearing unit, i.e. the same mora. As for the
constraint Align R ([L]DECL, IP), it has the function of ruling out candidate c. in this
tableau, in which the declarative morpheme is associated to the penultimate mora of
the phrase rather than to the edge mora, to which the default H edge tone is
associated here. Morpheme-specific subcategorizational alignment constraints like
Align R ([L]DECL, IP) are made explicit or presupposed in the literature
(Gussenhoven 2000, Grice et al. 2000), where they are given the function of
linearizing tonal morphemes within the prosodic representation.
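The reasoning behind tableau (25) can be made concrete by computing the violations from schematic representations rather than stipulating them. The sketch below is illustrative only. It assumes that each candidate is a list of tones associated to the penultimate or final mora, that the constraint definitions can be simplified as in the comments, and that the three higher constraints may be placed in an arbitrary total order (the text establishes only that they dominate Align R (MaP, H)). All identifiers are hypothetical.

```python
FINAL, PENULT = "final", "penult"

# Each tone is (value, mora, belongs_to_declarative_morpheme).
CANDIDATES = {
    "a": [("H", FINAL, False)],
    "b": [("H", FINAL, True)],                        # morpheme surfaces with the wrong quality
    "c": [("L", PENULT, True), ("H", FINAL, False)],
    "d": [("H", FINAL, False), ("L", FINAL, True)],   # contour on the final mora
    "e": [("H", PENULT, False), ("L", FINAL, True)],
    "f": [("L", FINAL, True)],
}

def realize_l_decl(cand):
    """One mark unless the declarative morpheme surfaces as L."""
    return 0 if any(v == "L" and morphemic for v, _, morphemic in cand) else 1

def align_l_decl_ip(cand):
    """One mark for a declarative L that is not on the final mora."""
    return sum(1 for v, mora, morphemic in cand if morphemic and v == "L" and mora != FINAL)

def no_contour(cand):
    """One mark per mora bearing more than one tone."""
    moras = [mora for _, mora, _ in cand]
    return sum(1 for mora in set(moras) if moras.count(mora) > 1)

def align_map_h(cand):
    """One mark unless some H tone sits on the final mora of the phrase."""
    return 0 if any(v == "H" and mora == FINAL for v, mora, _ in cand) else 1

def star_tone(cand):
    """One mark per tone in the representation."""
    return len(cand)

RANKING = [realize_l_decl, align_l_decl_ip, no_contour, align_map_h, star_tone]

def marks(cand):
    return tuple(constraint(cand) for constraint in RANKING)

winner = min(CANDIDATES, key=lambda name: marks(CANDIDATES[name]))
print(winner, marks(CANDIDATES[winner]))
```

With these simplified definitions candidate f. is selected, and candidate e. loses to it only on *Tone, in line with the discussion of the tableau.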
(27)
Realize [L]DECL, AlignR ([L]DECL, IP)
AlignR (MaP, H)
*Tone
(28) FOCUS constituent is final in the declarative sentence (on the verb)
L* HP   L* HP                             L* H [L]
(ami)   (raj‡ar c‡hobir j‡onno ˇaka)      (anlam)
I       king’s pictures for money         GAVE
(29)
L* HP         L H                             [L]
(ami raj‡ar)  (c‡hobir)  j‡onno ˇaka anlam)
I king’s      PICTURES   for money gave.
The H peripheral tone of a FOCUS, marked in bold italics, always flanks the right
edge of the morphosyntactic FOCUS constituent and so differs in its distribution
from the declarative morpheme [L], which is confined to the right edge of the
sentence. If the FOCUS constituent is not final in the sentence, the H appears at its
non-final right edge, at a distance from the L tone at sentence end.
tone seen in nonfinal phonological phrases. Hayes and Lahiri base the argument on
the contrast between the final tonal patterns of nonFOCUS declaratives like (17) and
FOCUS declaratives like (28). The contrast is not in the final L tone, which is
common to both forms of the declarative. The contrast is also not in the tonal value
of the pitch accents, which are predictable on the basis of the quality of the
following peripheral tone. It is the presence of the peripheral H tone in final FOCUS
declaratives like (28) which is contrastive. That peripheral H in (28) must be
morphemic. As we saw above, it cannot be an instance of the default peripheral H
tone, which simply fails to appear in nonFocus declaratives like (17). So we must
posit a FOCUS tonal morpheme--[H]FOC, an entity whose presence in the
representation can be assured by a morpheme-realization constraint. As we will see,
it is this morphemic status which permits an explanation for the distribution of this
H tone in (28) and (29), and for the appearance of the right edge of phonological
phrase at the right edge of the FOCUS constituent.
A contour tone consisting of the [H]FOC morpheme and the [L]DECL morpheme is
formed at sentence edge in the final FOCUS case. The simple presence of these
tones in the representation is guaranteed by morpheme realization constraints, but
faithfulness does not guarantee the joint positioning of the tonal morphemes at the
right extreme of the utterance, in violation of *Contour Tone. The creation of the
contour tone must be forced by constraints requiring that these morphemes appear at
a phrase edge. Such an alignment constraint was proposed above for the declarative
morpheme, namely (26), Align R ([L]DECL, IP). For the FOCUS morpheme, the
constraint should be formulated as an alignment with the edge of a phonological
phrase:
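(32)
      Tones     Candidate            Realize   Realize   AlignR          AlignR          *Contour
                                     [H]FOC    [L]DECL   ([H]FOC, MaP)   ([L]DECL, IP)   Tone
      ----------------------------------------------------------------------------------------------
⇒ a. [H] [L]   …. (…µ µ)MaP)IP                                                          *
   b. [H]       …(… µ µ)MaP)IP                 *!
   c. [L]       …(…µ µ )MaP)IP       *!
   d. [H][L]    ...(… µ µ )MaP)IP                        *!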
Note that this analysis assumes that the alignment constraints for both [H] and [L]
tonal morphemes are satisfied by an association to the final tone-bearing unit of the
phrase, as seen in candidate a. In other words, the [H] in the optimal candidate a. is
considered to be right-aligned even if it precedes the [L] within the phrase.
Candidate d. shows a real misalignment of the [H], however, in being associated to
the penultimate mora. In candidates b. and c., it is the disappearance of the input
tonal morphemes, in violation of the morpheme realization constraints, which
accounts for the ungrammaticality of the forms. What we don’t yet have an
explanation for is the ungrammaticality of an additional candidate where the order
of the morphemes in the final contour tone is simply the opposite of what we see in
candidate a. Some additional principle would be required to account for the
optimality of candidate a over this alternative. In the spirit of Pierrehumbert and
Beckman 1988, one might assume that an IP-aligned edge tone must lie outside a
MaP-aligned edge tone. But there is also a possible explanation based on the
positioning of these tonal morphemes in the morphosyntactic structure, where the
sentence-peripheral [L] declarative tone lies higher up and to the right of the focus
[H] tone, which marks a constituent lower down in the sentence.
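The ranking argument implicit in tableau (32) can be checked by brute force. The sketch below is illustrative and rests on assumptions: each of candidates a.-d. is taken to incur exactly one violation, of the constraint the text associates with it, and the constraint labels are simply strings chosen for the example. It enumerates every total ranking of the four constraints and records those under which candidate a. wins.

```python
from itertools import permutations

CONSTRAINTS = ("Realize[H]FOC", "Realize[L]DECL", "AlignR([H]FOC,MaP)", "*ContourTone")

# One violation per candidate, following the prose discussion of tableau (32).
PROFILES = {
    "a": {"*ContourTone": 1},        # [H] and [L] share the final mora
    "b": {"Realize[L]DECL": 1},      # only [H] surfaces
    "c": {"Realize[H]FOC": 1},       # only [L] surfaces
    "d": {"AlignR([H]FOC,MaP)": 1},  # [H] sits on the penultimate mora
}

def winner(ranking):
    """Return the candidate with the lexicographically smallest marks under a ranking."""
    def marks(name):
        return tuple(PROFILES[name].get(c, 0) for c in ranking)
    return min(PROFILES, key=marks)

rankings_for_a = [r for r in permutations(CONSTRAINTS) if winner(r) == "a"]
print(len(rankings_for_a))
print(all(r[-1] == "*ContourTone" for r in rankings_for_a))
```

On these assumptions the contour-bearing candidate wins exactly when *Contour Tone is dominated by the realization and alignment constraints for the tonal morphemes, which is the ranking the analysis requires.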
To sum up, the two morpheme-specific constraints for the FOCUS morpheme--
Realize [H]FOC and AlignR ([H]FOC, MaP)-- have been brought into play in this
section and the constraint ranking has been refined. The full constraint ranking is
now as in (33).
(33)
Realize [L]DECL, AlignR ([L]DECL, IP), Realize [H]FOC, AlignR ([H]FOC, MaP)
Align R (MaP, H)
*Tone
MaxTone
In the next section we will see that the constraints motivated here will also enable us
to account for the characteristic tone and phrasing properties of nonfinal FOCUS in
Bengali.
(34) L* H L* H LI
((ami raj‡ar) (c‡hobir) j‡onno ˇaka anlam)IP
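(35)
[Syntactic tree not reproduced. Recoverable node labels: S, VP, PP, NP, N, P, V; the FOCUS noun (NFOC) c‡hobi-r branches into N and the suffixed tonal morpheme [H]FOC.]
ami raj‡a-r c‡hobi-r j‡onno ˇaka anlam
I king’s PICTURES for money gave
‘I gave money for the king’s pictures’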
What immediately meets the eye (and ear), is that the right edge of the
nonfinal FOCUS constituent is marked by a H tone. We must presume that this is the
same focus morpheme [H]FOC that is observed when the FOCUS is final in the
sentence. For explicitness, let’s take the FOCUS morpheme to be adjoined as a
suffix to a word (as in (35)) or a larger phrase, where it licenses the FOCUS
property on the dominating node, which in turn gets interpreted as FOCUS in the
semantics. Given the position of the [H]FOC morpheme as a suffix of the FOCUS
constituent in the syntax, the interface and markedness constraints in (33) will
guarantee that in the phonological representation of the sentence the right edge of
the FOCUS constituent will correspond to the right edge of a major phrase in the
declarative case given in (34). The constraint AlignR ([H]FOC, MaP) plays a crucial
role in deriving this result.
The analysis goes as follows. The FOCUS morpheme [H]FOC is forced by
faithfulness to the syntactic representation (call this “Syntax Faith” for short) to
remain in its syntactically specified position as a suffix at the right edge of the
FOCUS constituent. Confined to that position, the FOCUS morpheme is nonetheless
required to satisfy its own morpheme-specific interface alignment constraint,
AlignR ([H]FOC, MaP), which calls for the morpheme to appear at the right edge of a
major phrase in phonological representation. Since the position of the [H]FOC is fixed
by the syntax in a context in which the right edge of phonological phrase may not be
called for, satisfaction of the alignment constraint may require that the phrase edge
be introduced into the representation. In other words, AlignR ([H]FOC, MaP) may in
effect induce the presence of the phrase edge. This is the case in (34)/(35), as seen in
(36):
Candidate c. moves the [H]FOC to coincide with a MaP edge at the end of the
sentence, and so violates Syntax Faith. Candidate b. lacks a right edge of MaP at the
[H]FOC in situ position, and so violates AlignR([H]FOC,MaP). Candidate a., which
respects both these constraints, is the optimal one. As for candidate d., it contains
the phonological phrase edge that is otherwise always present at the left edge of the
verb, as well as the edge induced by the FOCUS morpheme, all organized into a
prosodic structure respecting Exhaustivity. But, as was proposed in section 2, this
post-FOCUS phrasing is ruled out by the markedness constraint which aligns the
head MaP with the right edge of IP. The optimal candidate a. lacks any Major
Phrase intervening between the head MaP of the FOCUS and the end of the
sentence, and so is not considered to count as a violation of AlignR (MaP, IP). It
does show a violation of the lower ranked constraint Exhaustivity (IP) (Selkirk
1995), which requires the Intonational Phrase to immediately dominate only major
phrases, i.e. constituents at the next level down in the prosodic hierarchy.
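The competition among the four candidates just described can be sketched by computing violations from prosodic phrasings. This is an illustration under stated assumptions, not a reproduction of the tableau: the candidate phrasings are reconstructed from the prose, major phrases are represented as inclusive word-index spans, Exhaustivity (IP) is counted once per contiguous unparsed stretch, and the three higher constraints are put in an arbitrary total order. All identifiers are hypothetical.

```python
WORDS = ["ami", "raj‡ar", "c‡hobir", "j‡onno", "ˇaka", "anlam"]
FOCUS = 2  # index of the FOCUS word c‡hobir

# Candidate = (major phrases as inclusive (start, end) spans, index of the word carrying [H]FOC).
CANDIDATES = {
    "a": ([(0, 1), (2, 2)], 2),                  # edge at FOCUS, post-FOCUS stretch unparsed
    "b": ([(0, 1), (2, 5)], 2),                  # no phrase edge at the in situ [H]FOC
    "c": ([(0, 1), (2, 5)], 5),                  # [H]FOC moved to the end of the sentence
    "d": ([(0, 1), (2, 2), (3, 4), (5, 5)], 2),  # exhaustive post-FOCUS phrasing
}

def syntax_faith(maps, h_pos):
    """One mark if [H]FOC is not suffixed to the FOCUS constituent."""
    return 0 if h_pos == FOCUS else 1

def align_hfoc_map(maps, h_pos):
    """One mark unless [H]FOC coincides with the right edge of some major phrase."""
    return 0 if any(end == h_pos for _, end in maps) else 1

def align_map_ip(maps, h_pos):
    """One mark per major phrase intervening between the head MaP and the end of IP."""
    head_end = next(end for start, end in maps if start <= FOCUS <= end)
    return sum(1 for start, _ in maps if start > head_end)

def exhaustivity_ip(maps, h_pos):
    """One mark per contiguous stretch of words not parsed into any major phrase."""
    parsed = {i for start, end in maps for i in range(start, end + 1)}
    unparsed = [i for i in range(len(WORDS)) if i not in parsed]
    return sum(1 for i in unparsed if i - 1 not in unparsed)

RANKING = [syntax_faith, align_hfoc_map, align_map_ip, exhaustivity_ip]

def marks(cand):
    maps, h_pos = cand
    return tuple(constraint(maps, h_pos) for constraint in RANKING)

winner = min(CANDIDATES, key=lambda name: marks(CANDIDATES[name]))
print(winner, marks(CANDIDATES[winner]))
```

Candidate a. wins because its only mark falls on the lowest-ranked constraint; each competitor violates one of the constraints that dominate Exhaustivity (IP).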
So this, then, is the explanation for the presence of the right edge of phonological
phrase at the right edge of a nonfinal FOCUS constituent in Bengali. The FOCUS
morpheme, through its own, independently motivated, subcategorizational prosodic
alignment constraint AlignR ([H]FOC, MaP), induces the presence of the phrase edge
observed. This means that there is no reason to follow Hayes and Lahiri in positing a
FOCUS-prosody interface constraint which aligns the right edge of a FOCUS
syntactic constituent with the right edge of phonological phrase. The Hayes and
Lahiri analysis is incompatible with the Focus Prominence theory of the interface of
focus and phonology, so it is a welcome result that there is an alternative to that
analysis, one which falls out from the independently motivated analysis of Bengali
intonation that has been proposed here.
While the current proposal might be preferable on the grounds of theoretical
economy, given that it successfully excludes the class of Focus-Phrasing interface
alignment constraints from universal grammar, it would be desirable to clinch the case
on the basis of empirical fact. Fortunately, the facts are in principle available,
though they have not yet been investigated. In a current collaborative project with
Aditi Lahiri, we hope to bring the facts to light.
The theory proposed here predicts that if the [H]FOC morpheme is for some
reason absent at the right edge of a FOCUS constituent in surface representation,
there should be no right edge of phonological phrase at that location. The Hayes and
Lahiri theory predicts on the other hand that, regardless of the presence or absence
of the FOCUS morpheme, a phonological phrase edge should appear at the right
edge of a syntactic FOCUS constituent. Now there happens to be a case of nonfinal
FOCUS in Bengali where the [H]FOC morpheme fails to be realized in the output.
This occurs in interrogatives, where <H> indicates the deleted FOCUS [H] tone:
The tonal morpheme for interrogatives is [HL]QUES, and, as Hayes and Lahiri point
out, the absence of the FOCUS morpheme [H]FOC at the right edge of the FOCUS
constituent could be attributed to the OCP. Given the Hayes and Lahiri analysis of
FOCUS phrasing, there are no implications of this tonal deletion for the phrasing.
But in the analysis of Bengali intonation that I have proposed, the loss of the tonal
morpheme implies an absence of phonological phrase edge at the right edge of the
FOCUS constituent, since there is no other constraint that would produce that
phrasing. Now it turns out that there is a way of probing this difference in phrasing
predictions in Bengali.
The optimal candidate d. lacks the FOCUS morpheme, and in so doing respects the
higher ranked OCP and Realize [HL]QUES, while incurring a violation of Realize
[H]FOC. In this optimal candidate, there is no phrase edge at the right of the FOCUS
since there is no [H]FOC to require it. Note that candidate e. has the same tones as the
optimal d. but differs in having a phrase edge present at the right edge of the
FOCUS. In this particular case a phrase edge in that medial position would be ruled
out by the constraint Exhaustivity (IP), since the stretch between the major phrase it
demarcates and the end of the intonation phrase is not itself parsed into a major
phrase.
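The interrogative case can be sketched in the same spirit. The following is a toy illustration with several assumptions: the tonal tier is a list of (value, source) pairs, the OCP counts adjacent identical tones on that tier, Exhaustivity (IP) is reduced to one mark whenever a medial phrase edge leaves the post-FOCUS stretch unparsed, and the candidates labeled x1 and x2 are hypothetical competitors added for the illustration (only d. and e. are described in the text).

```python
def ocp(tier):
    """Count adjacent identical tones on the tier (the pitch-accent star is ignored)."""
    plain = [value.rstrip("*") for value, _ in tier]
    return sum(1 for left, right in zip(plain, plain[1:]) if left == right)

def realize(tier, morpheme, shape):
    """One mark unless the morpheme surfaces with exactly the tones of its input shape."""
    surfaced = [value for value, source in tier if source == morpheme]
    return 0 if surfaced == shape else 1

def marks(cand):
    """Marks under OCP, Realize[HL]QUES >> Realize[H]FOC, plus Exhaustivity (IP)."""
    tier, medial_edge = cand
    return (
        ocp(tier),
        realize(tier, "QUES", ["H", "L"]),
        realize(tier, "FOC", ["H"]),
        1 if medial_edge else 0,
    )

PITCH_ACCENT = ("L*", "default")
CANDIDATES = {
    "x1": ([PITCH_ACCENT, ("H", "FOC"), ("H", "QUES"), ("L", "QUES")], True),
    "x2": ([PITCH_ACCENT, ("H", "FOC"), ("L", "QUES")], True),
    "d": ([PITCH_ACCENT, ("H", "QUES"), ("L", "QUES")], False),
    "e": ([PITCH_ACCENT, ("H", "QUES"), ("L", "QUES")], True),
}

winner = min(CANDIDATES, key=lambda name: marks(CANDIDATES[name]))
print(winner, marks(CANDIDATES[winner]))
```

In this sketch the winner lacks both the FOCUS tone and the medial phrase edge, which is the configuration the present account predicts and the Hayes and Lahiri account does not.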
Observe that the new ranking in (38) is consistent with the other rankings
motivated above for Bengali tonology. (40) is the summary ranking in (33),
modified in virtue of (38). Here the OCP is promoted from the lower rank it had
been given in (33) for want of any further evidence.
Realize [L]DECL, AlignR ([L]DECL, IP), Realize [H]FOC, AlignR ([H]FOC, MaP)
Align R (MaP, H)
*Tone
MaxTone
The claim embodied by exploiting a tonal grammar of this sort is that the
tonal/intonational patterns of sentences, in any language, must be seen as deriving
from the interaction of different types of constraints, including morpheme-specific
realization and alignment constraints, generic faithfulness constraints like MaxTone,
prosodic enhancement constraints calling for (default) pitch accent or edge tones,
and classic tonal markedness constraints like the OCP and *Contour Tone. Of
course these tonal constraints interact with the constraints of the grammar which
define the prosodic structure of sentences. They may either collaborate within a
prosodic structure that is independently defined, or, as in the case of the morpheme-
specific constraint AlignR ([H]FOC, MaP), may in fact be responsible for the
presence of some aspect of prosodic structure.
4. SUMMARY
In the early sections of the paper, I sketched out a theory of Bengali FOCUS-related
phrasing that would be consistent with the Focus Prominence hypothesis, and in the
last section this theory was further fleshed out, and shown to be viable. To
summarize, the constraints and rankings crucially involved in the analysis of
Bengali FOCUS phrasing patterns are:
assuming that the OCP governs possible output representations has permitted a
pared down theory of what the tonal morphemes of Bengali are in the first place,
restricting them in this language to sentence-final morphemes, as in the case of the
declarative [L] and interrogative [HL] illocutionary force morphemes, or to the
constituent-final [H] FOCUS morpheme. All other tones in Bengali intonation are
analyzable as default tones, whose presence, and quality, is determined by
phonological markedness constraints.
5. NOTES
* The research for this paper was supported in part by National Science Foundation grant BCS000438
The Reflexes of Focus in Phonology, Principal Investigator: Elisabeth Selkirk.
1
[H*+L]FOC pitch accent in European Portuguese (Frota 2000), [H]FOC phrase-edge tone in Bengali
(Hayes and Lahiri 1991), [H]FOC accent-tropic tone in Swedish (Bruce 1977)
2
Selkirk 1984, 1995 proposes that pitch accents are a default reflex of the presentational Focus status of a
word in English. Selkirk 2002 suggests that it is the L+H* which appears by default with contrastive
FOCUS.
3
Hungarian (Vogel and Kenesei 1987), Japanese (Pierrehumbert and Beckman 1988), Chichewa
(Kanerva 1989), Shanghai Chinese (Selkirk and Shen 1990), and others
4
Jackendoff 1972, Hayes and Lahiri 1991, Reinhart 1995, Roberts 1996
5
Pierrehumbert and Beckman 1988, Inkelas and Leben 1990
6
European Portuguese (Frota 2000)
7
Note that I am not saying that there are no alignment constraints at all which characterize the syntax-
phonology interface. Indeed, there is evidence that, independent of focus, you do need interface
constraints aligning the edges of syntactic constituents defined in X-bar level terms with prosodic
constituents at a designated level, e.g. Align R/L (XP, MaP) (see Selkirk 1986 et seq, Nespor and Vogel
1986, Chen 1987, Truckenbrodt 1998, Sugahara 2003, among others).
8
The position of the verb in the surface representation of these sentences is particularly in need of
clarification. Given the structure in (1), there can be no principled explanation for the systematic
appearance of a phonological phrase break at the left edge of the verb, seen in nonFOCUS sentences such
as (3). But since this aspect of Bengali phonological phrasing is not of immediate concern, I will
continue to assume the structure in (1). It at least shows the analysis in terms of noun phrases that will
survive regardless of the ultimate decision about their position in a higher order syntactic structure.
9
For example, in some languages with lexical pitch accent, words lacking pitch accents in their input
form receive a default pitch accent on the main stressed syllable of the output form. See Zec 1999 on
Serbo-Croatian, Lahiri 2002 on Swedish.
10
Beginning with the analysis of “initial lowering” in Japanese as an alignment of L and H peripheral
tones (Poser 1984, Pierrehumbert and Beckman 1988), there have been a variety of languages analyzed as
showing default, constraint-introduced edge tones, including the medial MaP-edge L phrase tone of
English (Selkirk 2000), the LH phrase edge tone of Korean (Jun 1993), etc.
6. REFERENCES
Bruce, Gösta. Swedish Word Accents in Sentence Perspective. Lund: Gleerup, 1977.
Chen, Matthew. “The syntax of Xiamen tone sandhi.” Phonology 4 (1987): 109-150.
Frota, Sonya. Prosody and Focus in European Portuguese: Phonological Phrasing and Intonation. New
York: Garland Publishing, 2000.
Grice, Martine, D.R. Ladd and Amalia Arvaniti. “On the Place of Phrase Accent in Intonational
Phonology.” Phonology 17 (2000): 143-185.
Gussenhoven, Carlos. “The Lexical Tone Contrast of Roermond Dutch in Optimality Theory.” In M. Horne
(ed.), Prosody: Theory and Experiment. Dordrecht: Kluwer Publishing, 2001.
Hayes, Bruce and Aditi Lahiri. “Bengali Intonational Phonology.” Natural Language and Linguistic
Theory 9 (1991): 47-96.
Inkelas, Sharon and W. R. Leben. “Where Phonology and Phonetics Intersect: The Case of Hausa
Intonation.” In J. Kingston and M. Beckman (eds.), Papers in Laboratory Phonology 1: Between the
Grammar and Physics of Speech, pp. 17-34. Cambridge: Cambridge University Press, 1990.
Jackendoff, Ray. Semantic Interpretation in Generative Grammar. Cambridge, Mass.: MIT Press, 1972.
Jun, Sun-Ah. The Phonetics and Phonology of Korean Prosody. New York: Garland Publishing, 1995.
Lahiri, Aditi, A. Wetterlin, and E. Steiner. “Unmarked Tone in Scandinavian.” Manuscript. Fachbereich
Allgemeine Sprachwissenschaft, University of Konstanz, 2002.
Kanerva, Jonni. Focus and Phrasing in Chichewa Phonology. Stanford University: Doctoral dissertation,
1989.
Kanerva, Jonni. “Focusing on Phonological Phrases in Chichewa.” In S. Inkelas and D. Zec (eds.), The
Phonology-Syntax Connection, pp. 145-162. Chicago: University of Chicago Press, 1990.
McCarthy, John and Alan Prince. “Generalized alignment.” In G. Booij and J. van Marle (eds.), Yearbook
of Morphology, pp. 79-153. Dordrecht: Kluwer, 1993.
Nespor, Marina and Irene Vogel. Prosodic Phonology. Dordrecht: Foris, 1986.
Pierrehumbert, Janet and Mary Beckman. Japanese Tone Structure. Cambridge, Mass.: MIT Press, 1988.
Poser, William. The Phonetics and Phonology of Tone and Intonation in Japanese. MIT: Doctoral
dissertation, 1984.
Prince, Alan and Paul Smolensky. Optimality theory: Constraint Interaction in Generative Grammar.
Manuscript, Rutgers University and Johns Hopkins University, 1993.
Reinhart, Tanya. “Interface Strategies.” OTS Working Papers, OTS-WP-TL-95-002, Utrecht University,
1995.
Rizzi, Luigi. “The Fine Structure of the Left Periphery.” In L. Haegemann (ed.), Elements of Grammar.
Handbook of Generative Syntax, pp. 281-337. Dordrecht: Kluwer, 1997.
Roberts, Craige. “Focus, Information Flow and Universal Grammar.” In P. Culicover and L. McNally
(eds.), The Limits of Syntax, pp. 109-160. New York, Academic Press, 1998.
Rooth, Mats. “A Theory of Focus Interpretation.” Natural Language Semantics 1 (1992): 75-116.
Rooth, Mats. “Focus.” In S. Lappin (ed.), The Handbook of Contemporary Semantic Theory. London:
Blackwell, 1996a.
Rooth, Mats. “On the Interface Principles for Intonational Focus.” Proceedings of SALT VI, pp. 202-226.
Ithaca, NY: Cornell University, 1996b.
Schwarzschild, Roger. “Givenness, Avoid F, and Other Constraints on the Placement of Accent.” Natural
Language Semantics 7 (1999): 141-177.
Selkirk, Elisabeth. Phonology and Syntax: The Relation between Sound and Structure. Cambridge, Mass.:
MIT Press, 1984.
Selkirk, Elisabeth. “Sentence Prosody: Intonation, Stress, and Phrasing.” In John Goldsmith (ed.), The
Handbook of Phonological Theory, pp. 550-569. Cambridge: Blackwell Publishers, 1995.
Selkirk, Elisabeth. “Interface Constraints on Focus.” Talk delivered at the Workshop on Syntax-
Phonology Interface, Linguistic Society of Japan, Tokyo, November 1999.
Selkirk, Elisabeth. “Focus Types and Tone.” Paper presented at the First North American Phonology
Conference, Concordia University, Montreal, 2000.
Selkirk, Elisabeth. “The Interaction of Constraints on Prosodic Phrasing.” In M. Horne (ed.), Prosody:
Theory and Experiment, Dordrecht: Kluwer Publishing, 2001.
Selkirk, Elisabeth. “Contrastive FOCUS vs. Presentational focus: Prosodic Evidence from Right Node
Raising in English.” In B. Bel and I. Marlin (eds.), Speech Prosody 2002: Proceedings of the First
International Speech Prosody Conference, pp. 643-646. Laboratoire Parole et Langage, Université de
Provence, Aix-en-Provence, 2002.
Selkirk, E. and J. Katz (in preparation) Phrasal stress and focus types. Ms. UMass Amherst and MIT.
Selkirk, Elisabeth and Tong Shen. “Prosodic domains in Shanghai Chinese.” In S. Inkelas and D. Zec
(eds.), The Phonology-Syntax Connection, pp. 313-338. Chicago, University of Chicago Press, 1990.
Selkirk, E. and K. Tateishi. “Syntax and downstep in Japanese.” In C. Georgopoulos and R. Ishihara
(eds.), Interdisciplinary Approaches to Language. Essays in Honor of S.-Y. Kuroda. Dordrecht,
Kluwer, 1991.
Sugahara, M. Downtrends and Post-FOCUS Intonation in Tokyo Japanese. University of Massachusetts,
Amherst: Doctoral dissertation, in preparation.
Truckenbrodt, Hubert. Phonological Phrases: Their Relation to Syntax, Focus and Prominence. MIT:
Doctoral dissertation, 1995.
Truckenbrodt, Hubert. “On the Relation between Syntactic Phrases and Phonological Phrases.” Linguistic
Inquiry 30 (1999): 219-255.
Vogel, Irene and István Kenesei. “Syntax and Semantics in Phonology.” In S. Inkelas and D. Zec (eds.),
The Phonology-Syntax Connection, pp. 339-364. Chicago, University of Chicago Press, 1990.
Zec, Draga. “Footed Tones and Tonal Feet: Rhythmic Constituency in a Pitch-Accent Language.”
Phonology 16 (1999): 225-264.
MARK STEEDMAN
INFORMATION-STRUCTURAL SEMANTICS FOR ENGLISH INTONATION*
1. INTRODUCTION
245
C. Lee et al. Topic and Focus: Cross-linguistic Perspectives on Meaning and Intonation, 245–264.
© 2007 Springer.
It is important to be clear from the start that the set of alternative utterances from
which the actual utterance is distinguished by the tune is in no sense the set of all
possible utterances appropriate to this context, a set which includes infinitely many
things like “Mind your own business,” “That was no finger,” “What are you talking
about?” and “Lovely weather we’re having.” Rather, the presupposed set of
(presumably, ten) alternative utterances is accommodated by the hearer in the sense
of Lewis (1979) and Thomason (1990), like any speaker presupposition that is not
actually inconsistent with their beliefs. This does not imply that such alternative sets
are confined to things that have been mentioned, or that they are mentally
enumerated by the participants—or indeed that they are even finite.
In terms of Halliday’s given/new distinction, pitch-accents are markers of “new”
information, although the words that receive pitch-accents may have been recently
mentioned, and it might be better to call them markers of “not given” information.
That seems a little cumbersome, so I will use the term “kontrast” from Vallduví and
Vilkuna 1998 for this property of English words bearing pitch-accents, spelling the
corresponding verb “k-contrast”.3
I’ll further attempt to argue that there are just two independent semantic binary-
valued dimensions along which the literal meanings of the various pitch-accent types
are further distinguished. The first of these dimensions has been identified in the
literature under various names, and distinguishes between what I’ll continue to call
“theme” and “rheme” components of the utterance, using these terms in the sense of
Bolinger (1958, 1961) rather than Halliday. Theme can be thought of informally as
the part of the sentence corresponding to a question or topic that is presupposed by
the speaker, and rheme is the part of the utterance that constitutes the speaker’s
novel contribution on that question or topic. However, it will become clear below
that the notion of theme differs from that of topic as defined by, for example, Gundel
(1974); Gundel and Fretheim (2001) in being speaker-defined rather than text-based.
A great deal of the huge and ramifying literature on information structure can be
summarized as distinguishing two dimensions corresponding to the given/kontrast
and theme/rheme distinctions, although the consensus has tended to be obscured by
the very different nomenclatures that have been applied. (See discussion by
Steedman and Kruijff-Korbayová (2001), which summarizes the terminology and
its lines of descent, along with some contiguous semantic influences.)
However, there is a further dimension of discourse meaning along which the
pitch-accent types are distinguished which has not usually been identified in this
literature. It concerns whether or not the particular theme or rheme to hand is
mutually agreed–that is, uncontentious. This notion is related to various notions of
Mutual Belief or Common Ground proposed by Lewis (1969), Cohen (1978), Clark
and Marshall (1981) and Clark (1996).4
        +               -
θ       L+H*            L*+H
ρ       H*, (H*+L)      L*, (H+L*)
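The classification in the table can be read as a simple lookup from pitch-accent type to a pair of binary features. The sketch below is only an illustration of that reading. It assumes that θ and ρ abbreviate theme and rheme and that the + and - columns correspond to whether the theme or rheme is mutually agreed; the dictionary and function names are hypothetical.

```python
# Pitch accent -> (information-structural component, mutually agreed?)
PITCH_ACCENTS = {
    "L+H*": ("theme", True),
    "L*+H": ("theme", False),
    "H*": ("rheme", True),
    "H*+L": ("rheme", True),
    "L*": ("rheme", False),
    "H+L*": ("rheme", False),
}

def accents_for(component, agreed):
    """List the accents the table assigns to one cell."""
    return [
        accent
        for accent, (comp, is_agreed) in PITCH_ACCENTS.items()
        if comp == component and is_agreed == agreed
    ]

print(accents_for("theme", True))
print(accents_for("rheme", False))
```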
At first glance, this proposal might appear to miss the point entirely. Where are
notions like “topic continuation” (Brown, Currie and Kenworthy 1980) and
“evaluation with respect to subsequent material” (Pierrehumbert and Hirschberg
1990), or the latter authors’ scales of commitment and belief? I’m going to argue
that many of the effects that have been associated with intonational tunes arise as
conversational implicatures from the interaction with context of literal meanings
made up of the above simple components. To consider this claim we need some
examples.
3. AN EXAMPLE: PITCH-ACCENTS
The first example commemorates Miles Davis’ response to Dave Brubeck’s question
concerning his reason for playing E♮ as the final note of In Your Own Sweet Way, in
place of E♭ as written by Brubeck:6
(2) [Example display not recoverable; its two information units are labeled theme and rheme.]
The LH% boundary splits the utterance into two intonational phrases and two
information units. The L+H* accent marks the first of these units as theme (L*+H
would also be appropriate). It falls on the word you because its referent (Brubeck) is
the element that distinguishes this theme from the other themes that are available.
(Lambrecht and Michaelis (1998) in a related approach call such “marked” or
contrastive themes “ratified topics”. Ratification certainly presupposes some
alternative. However, the example to hand suggests that ratification is only one of
many things that you can do with a contrastive theme or topic.)
The set of available themes, which we will call the “Theme Alternative Set”
(ThAS), is presupposed by Davis and accommodated by Brubeck as including just
two possible themes. These can be thought of informally as “Why did/didn’t Davis
do x?” and “Why did/didn’t Brubeck do x?” More formally we can think of the
Theme Alternative Set as a set of λ-terms, which for this context is as follows, in
which ± stands for polarity:
(3)
λvp.λreason.cause′ reason(± do′ vp brubeck′)
λvp.λreason.cause′ reason(± do′ vp davis′)
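As a minimal sketch, the two members of (3) can be modelled as curried Python functions that build nested tuples standing in for logical forms. The tuple representation and the names theme, leaves and contrast are assumptions made for illustration only, and the polarity ± is left as an unresolved symbol. The sketch shows that the two themes differ only in the agent, which is why the accent falls on the word you.

```python
def theme(agent):
    """λvp.λreason.cause' reason (± do' vp agent), with polarity left open."""
    def with_vp(vp):
        def with_reason(reason):
            return ("cause", reason, ("±", ("do", vp, agent)))
        return with_reason
    return with_vp


def leaves(term):
    """Yield the atomic symbols of a nested-tuple term, left to right."""
    if isinstance(term, tuple):
        for part in term:
            yield from leaves(part)
    else:
        yield term


def contrast(term_a, term_b):
    """Pairs of symbols at which two same-shaped terms differ."""
    return [(a, b) for a, b in zip(leaves(term_a), leaves(term_b)) if a != b]


# Both alternatives applied to the same (placeholder) VP and reason.
brubeck_theme = theme("brubeck")("vp")("reason")
davis_theme = theme("davis")("vp")("reason")
distinguishing = contrast(brubeck_theme, davis_theme)
```

Here distinguishing contains the single pair ("brubeck", "davis"): the agent is the only element whose interpretation separates the intended theme from its alternative.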
(It’s assumed here that the fragment Why didn’t you is assigned a meaning which is
a function from VP interpretations to why-question interpretations—the latter being
In both cases, words whose interpretation distinguishes the intended theme from
the others—which is how “k-contrasted” or “not given” is defined in the present
system—bear pitch-accents, while those that do not contribute to the distinction—
which is how we define “background” or “given”—do not. (See Prevost and
Steedman 1994; Prevost 1995 for further detail on the determination of pitch-accent
placement in sentence generation.)
We do not need to think of the Theme Alternative Sets as closed under terms that
are already in play in the conversation. A more general representation of the ThAS
for (2) reminiscent of the “Structured Meaning” approach of Cresswell (1973, 1985)
and von Stechow (1981) can be obtained by abstracting over the element(s)
corresponding to accented words, thus:
Of course, themes including this one may not, and in fact usually do not, bear
any pitch-accent at all, as in:
Thus according to the present theory, as Halliday and Brown insisted, what is
“new”, “not given,” or k-contrasted vs. what is “given” or background is in part
determined by the speaker, not a property of a text or context alone (Brown
1983:67). By the same token, the notion of theme is also partly speaker-determined,
not text-based as is the notion of topic of Gundel (1974); Gundel and Fretheim
(2001).
Similar considerations govern the effect of the rheme-tune in (2) and (4). The H*
accent marks the second information unit as a rheme, and it falls on the word write
because it is the interpretation of that word that distinguishes this rheme from the
others that the context affords. This set of available rhemes, which we will call
the “Rheme Alternative Set” (RhAS), is again presupposed/accommodated by the
participants to include only doing things to E♮. In this particular case we can think of
the RhAS as being closed under the things that have actually been mentioned, that
is, as
(10)
λx.play′ e′ x
λx.write′ e′ x
Again, we can think of the RhAS more generally by abstracting over the
transitive predicate in structured-meaning style:
(11) λtv.λx. tv e′ x
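The relation between (10) and (11) can be sketched in the same illustrative style: (11) is a function over transitive predicates, and the members of (10) are its instances for the predicates actually mentioned. The tuple encoding and the names abstract_rheme and RHEME_ALTERNATIVE_SET are assumptions made for this sketch.

```python
def abstract_rheme(tv):
    """(11): λtv.λx. tv e' x, with the note e' held fixed."""
    def with_subject(x):
        return (tv, "e", x)
    return with_subject


# (10): the alternative set closed under the predicates already mentioned.
RHEME_ALTERNATIVE_SET = [abstract_rheme("play"), abstract_rheme("write")]

# Applying each alternative to the same subject shows that only the
# transitive predicate distinguishes the alternatives.
instances = [rheme("brubeck") for rheme in RHEME_ALTERNATIVE_SET]
```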
We have so far passed over the role of the particular boundary tones in (2) and
(4). Earlier we identified this role as assigning responsibility for theme/rheme status
to either speaker or hearer. Thus the claim must be that in the above examples the
theme is marked by Miles Davis as Brubeck’s responsibility, whereas the rheme is
marked as his own. To see what this means, and to understand the implication of
table 1 that both are “agreed”, we must look more broadly at the function of the
boundary tones.
4. AN EXAMPLE: BOUNDARIES
Brown (1980:30) identifies the role of high boundaries as indicating that there is
more to come on the current topic from some participant. Pierrehumbert and
Hirschberg (1990:304-308), from whom the following example is adapted, make a
related claim concerning interpretation with respect to succeeding material (again,
this may come from either participant):
d. If you’re lucky,
L+H* LH%
Pierrehumbert and Hirschberg don’t actually specify the pitch-accent types for
this example, but L+H* seems appropriate for all accents except those in the last
clause — in fact, H* accents sound quite odd, for reasons we’ll come to. In present
terms, this means that the earlier clauses are all themes, and illustrates the fact that
multiple themes, and in fact isolated themes without any rheme, are all possible.
It is interesting to consider the effect of replacing the LH% boundaries by LL%
boundaries, retaining the L+H* accents. This manipulation does not affect the
coherence of the example very much. The main effect is to make the speaker’s
prescription seem somewhat abrupt and discouraging of any interruption, and to be
generally unconcerned with whether or not it is making any sense to the hearer. In
comparison, the original (12) seems more attentive, and to invite the hearer to take
control of the discourse if they want to.
I’m going to claim that in both cases the forward motion of the discourse is the
same, and is brought about, not by the inclusion of high boundaries as such, but by
the rheme-expectation stemming from the theme-marking L+H* pitch-accents. The
specific “kinder, gentler” effect of the version with LH% boundaries arises from
their primary meaning of marking hearer-commitment. By marking the themes as, in
the speaker’s view, the hearer’s responsibility (although in fact they may be
completely new to the hearer), the possibility of the latter taking control of the
discourse is maintained at every turn.
These claims are borne out by considering the effect of substituting H* rheme
accents for L+H* in both high- and low-boundary versions. With high boundaries,
the instructions become quite irritating, and seem to imply that the hearer knows all
this already. With low boundaries, the effect is again abrupt and not hearer-oriented.
In both cases, coherence (though inferable from world knowledge) is reduced.
I’m further going to claim that all the related effects of high boundaries, which
have been variously described in the descriptive literature as “other-directed”, “turn-
yielding”, “discourse-structuring,” or “continuation” are similarly indirect
implicatures that follow from the basic sense of high boundaries, which is to identify
the hearer as in the speaker’s view committed to the relevant information unit.
252 MARK STEEDMAN
The above four responses can be assumed to consist of a single rheme.7 The ones
involving an L* pitch-accent mark the rheme as being not agreed. However, the
pitch-accent itself does not distinguish who the opposition is coming from. This is
not an ambiguity in the pitch-accent itself. Rather, the identification of the source of
the conflict and the entire illocutionary force of the response depends on inference
on the basis of what else is known about the participants’ beliefs. Thus, in (14), the
one who appears to doubt the proposition in the second utterance is the hearer, but in
(16) it is the speaker. In different contexts, the difference could be reversed or
eliminated.
At first encounter, it may appear that these tunes must mark rhemes, like those in
(13) to (16). However, in Steedman 2000a I show that these are in fact isolated
themes, of the kind we have already noticed in connection with example (12). These
isolated themes achieve the effect of a response (as well as various other
implicatures of impatience, diffidence, incompleteness, etc.) via the indirect speech
act of leaving the hearer to generate the rheme for themselves.
As before, the tunes involving L*+H accents imply disagreement or absence
from mutual belief. Once again, the source of the disagreement can only be
identified from the full discourse context. In the case of (19) and (20), it is important
to remember that the speaker’s LH% boundary means only that the speaker views
the hearer as committed to these themes. As far as the hearer is concerned, that is
not the same as an actual commitment. Thus the L*+H in (20) simply has the effect
of correctly excluding from the mutual belief set AGREED this theme which the
boundary marks as in H, in spite of the fact that it can also be inferred to be in the
speaker’s own beliefs S. This is the possibility that was noticed in the discussion of
tables 1 and 2: it seems a fundamental property of the system that there is a
distinction between a proposition merely being in both S and H and it actually being
in AGREED. The former amounts to a claim by the speaker that both participants
ought to be committed to it. The latter is a claim by the speaker that both actually
are committed.
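The distinction between a proposition being in both S and H and its being in AGREED can be made concrete with three sets. This is only a sketch under simplifying assumptions: propositions are plain strings, H records the hearer's commitments as the speaker views them, and the class BeliefState and its status method are hypothetical names introduced here.

```python
from dataclasses import dataclass, field


@dataclass
class BeliefState:
    S: set = field(default_factory=set)       # speaker's commitments
    H: set = field(default_factory=set)       # hearer's commitments, in the speaker's view
    AGREED: set = field(default_factory=set)  # what is claimed to be mutually agreed

    def status(self, proposition: str) -> str:
        """Classify a proposition by the sets it belongs to."""
        if proposition in self.AGREED:
            return "claimed to be mutually agreed"
        if proposition in self.S and proposition in self.H:
            return "in both S and H, but not in AGREED"
        return "not shared"


# A theme that both parties are held to be committed to, without agreement.
state = BeliefState(S={"p"}, H={"p"})
before = state.status("p")
state.AGREED.add("p")
after = state.status("p")
```

Membership in both S and H does not by itself place "p" in AGREED; only the explicit addition changes its status, which mirrors the difference between the two kinds of claim described above.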
Example (20) is identical in information structural terms to the following
example, extensively discussed by Ward and Hirschberg (1985) (see Pierrehumbert
and Hirschberg 1990:295, (26)):
In terms of the present theory, the response is an isolated theme, which achieves its
effect of contradiction by: a) claiming via an LH% boundary that the hearer is
committed to the proposition (even though in fact they may not be); b) claiming via
the L*+H pitch-accent that the theme is not (yet) mutually agreed (even though the
hearer may in fact believe its content already); and c) leaving the hearer to infer for
themselves on the basis of their world knowledge about badminton players the
implicated rheme, that Harry is not in fact a total klutz. The contradiction is
particularly effective, because a and b between them further implicate that H’s
original remark was pretty stupid, and thereby force the hearer to infer this intended
further conclusion for themselves, without the speaker needing to utter it
explicitly. However, this effect of the utterance is an indirect speech-act or conversational
implicature, not part of the literal meaning of the words or the tones.
As an aside, it is striking that within the present theory, such conversational
implicatures can be analyzed solely in terms of knowledge and modality, without
appealing explicitly to notions of cooperation, flouting, or to speech-act types and
illocutionary force recognition. Many of the examples discussed by Grice (1975)
and Searle (1975) seem to be susceptible to similar knowledge-based analysis,
making speech-act-theoretic analyses merely emergent, as in Steedman and
Johnson-Laird (1980) and Cohen and Levesque (1990).
For example, consider Grice’s famous analysis of the sarcastic or ironic
conversational implicature achieved by saying “You’re a fine friend!” in a situation
where the hearer has actually done the speaker a disservice. His analysis requires the
hearer to detect that the speaker has flouted a conversational maxim (Quality), to
assume that the speaker is still cooperating and therefore (by a step that is not quite
clear), to infer that the speaker must mean the opposite of what they said. It is
interesting, however, to observe that one intonation contour with which such
sarcastic comments are characteristically uttered is the following:
(23)
( – fine′(friend′self′))
(+fine′(friend′self′))
At this point, the speaker has achieved their goal of making the hearer aware of
their own misdeed, and the indirect speech-act is complete, without any appeal to
cooperation, maxims, or rules explicitly associating maxim-violating utterances with
their negation. Indeed the effectiveness of the indirect accusation is greatly increased
by the fact that the speaker has, so to speak, got under the hearer’s guard, forcing
them into coming up with this thought for themselves, rather than stating it as a
speaker commitment, which the hearer might reject. We as linguists may identify
this as illocutionary uptake of an act of sarcasm, but the participants don’t need to
know about any of this.
Its interpretation is written as a λ-term associated with the syntactic category by the
operator “:”. The transitive verb admires has the category of a function from (object)
noun phrases (which the forward slash identifies as on the right) into predicates or
intransitive verbs:
In this case the syntactic type is simply the SVO directional form of the semantic
type. (Juxtaposition of function and argument symbols in logical forms as in
admire′x indicates function application. A convention of left association holds,
according to which admire′xy is equivalent to (admire′x)y).
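The convention of left association corresponds to ordinary currying. As a small illustration, assuming a tuple encoding of logical forms and the hypothetical helper apply_left, the two notations can be checked to coincide:

```python
from functools import reduce


def admire(x):
    """admire' as a curried function: first the object x, then the subject y."""
    return lambda y: ("admire", x, y)


def apply_left(function, *arguments):
    """Apply a curried function to its arguments one at a time, left to right."""
    return reduce(lambda f, a: f(a), arguments, function)


left_associated = apply_left(admire, "louise", "harry")  # admire' x y
explicit = (admire("louise"))("harry")                   # (admire' x) y
```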
In other cases categories may “wrap” arguments into the logical form, as in the
analysis of Bach (1979, 1980), Dowty (1982), and Jacobson (1992). For example,
the following is the category of the English ditransitive verb showed, which reverses
the dominance/command relation of indirect and direct object x and y between
syntactic derivation and the logical form:8
(The reason for doing this is to capture at the level of logical form the binding theory
and its dependence on the c-command hierarchy in which subject outscopes direct
object, which outscopes indirect (dative) object, which outscopes more oblique
arguments—see Steedman 1996 for discussion).
The syntactic operations of CCG by which such interpretations are assembled are
distinguished by being strictly type-dependent, rather than structure-dependent. For
present purposes they can be regarded as limited to operations of type-raising
(corresponding to the combinator T) and composition (corresponding to the
combinator B).
Type-raising turns argument categories such as NP into functions over the
functions that take them as arguments, such as the verbs above, into the results of
such functions. Thus NPs like Harry can take on categories such as the following:
For example, the simple transitive sentence of English has two equally valid
surface constituent derivations, each yielding the same logical form:
In the first of these, Harry and admires compose as indicated by the annotation > B
to form a non-standard constituent of type S/NP. In the second, there is a more
traditional derivation involving a verb phrase of type S\NP. Both yield identical
logical forms, and both are legal surface or derivational constituent structures. More
complex sentences may have many semantically equivalent derivations, a fact whose
implications for processing are discussed in SP.
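The semantic side of the two derivations can be sketched with the combinators themselves. The following is a simplification that models only the logical forms and ignores the directional syntactic categories; the tuple encoding and the variable names are assumptions made for illustration.

```python
def T(argument):
    """Type-raising: turn an argument into a function over functions of it."""
    return lambda function: function(argument)


def B(f, g):
    """Composition: B f g = λx. f (g x)."""
    return lambda x: f(g(x))


def admires(obj):
    """Transitive verb: takes the object first, then the subject."""
    return lambda subj: ("admire", obj, subj)


# First derivation: type-raised Harry composes with admires, giving a
# constituent that is still waiting for its object.
harry_admires = B(T("harry"), admires)
first_logical_form = harry_admires("louise")

# Second derivation: the verb first combines with its object to form a
# verb phrase, which then applies to the subject.
verb_phrase = admires("louise")
second_logical_form = verb_phrase("harry")
```

Both routes deliver the same tuple, which is the sense in which the alternative surface constituencies are semantically equivalent.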
This theory has been applied to the linguistic analysis of coordination,
relativization, and intonational structure in English and many other languages. For
example, since substrings like Harry admires are now fully interpreted derivational
constituents, they can undergo coordination via the schematised rule (31), allowing a
movement- and deletion-free account of right node raising, as in (32):
The way that CCG derivation is made sensitive to the presence of tones is as
follows (adapted from Steedman 1999). The presence of a pitch-accent on a word
infects its whole category with themehood or rhemehood, via a pair of feature-values
θ/ρ and ±AGREE, the latter here abbreviated as superscript +/-. For example, the
transitive verb admires bearing an H* pitch-accent has the following category:9
The feature ρ ensures that a verb so marked can only combine with arguments
that are compatible with rheme marking—that is, which do not bear the theme
marking feature value θ—and marks its result as rheme marked as well. The element
in the logical form corresponding to the accented word itself is marked for k-contrast
with the asterisk operator.
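The compatibility requirement can be pictured as a very simple check on markings. The sketch below is a deliberate simplification, not the category-based mechanism itself: markings are reduced to the strings "θ" and "ρ" or None for an unmarked constituent, and the function combine_marking is a hypothetical name introduced for illustration.

```python
def combine_marking(functor_mark, argument_mark):
    """Return the marking of the result of combining two constituents.

    A marked constituent may combine with an unmarked one or with one
    bearing the same marking, and the result inherits that marking.
    Combining θ with ρ is rejected.
    """
    if functor_mark is None:
        return argument_mark
    if argument_mark is None or argument_mark == functor_mark:
        return functor_mark
    raise ValueError(f"incompatible markings: {functor_mark} and {argument_mark}")


rheme_with_unmarked = combine_marking("ρ", None)  # result is rheme marked
rheme_with_rheme = combine_marking("ρ", "ρ")      # result is rheme marked

try:
    combine_marking("ρ", "θ")
    clash_detected = False
except ValueError:
    clash_detected = True
```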
Boundaries, by contrast, are not properties of words or phrases, but independent
string elements in their own right. They bear a category which, by mechanisms
parallel to those discussed in more detail in SP, “freezes” θ±/ρ±-marked
constituents as complete information-/intonation-structural units, making them
unable to combine further with anything except similarly complete prosodic units.
For example, the hearer-responsibility signaling LH% boundary bears the
following category:
S : admire′louise′harry
7. EMPIRICAL ISSUES
The present paper has laid a considerable burden of meaning on the distinction
between pitch-accent types, and in particular that between H* and L+H*, which
according to the present theory are respectively the most frequent rheme accent and
theme accent. It might therefore appear to be an embarrassment that there is
controversy in the literature over the reality of this distinction.
Part of this controversy stems from the fact that trained ToBI annotators show
quite low inter-annotator reliability in drawing this particular distinction (John
Pitrelli, p.c.). When the characteristics of the actual pitch-accents annotated by them
as H* and L+H* are plotted in terms of objective TILT parameters, there is very
considerable overlap between the two categories (Taylor 2000).
However, this seems to be a problem with the definitions of the relevant pitch
contours that are provided in the ToBI annotation conventions (Beckman and
Hirschberg 1999). The distinguishing characteristic of the L+H* accent is that the
rise to the pitch maximum is late, typically beginning no earlier than the onset of the
vowel in the accented syllable. H* accents typically begin to rise earlier, in many
cases much earlier. The definition of L+H* in the manual as “a high peak target on
the accented syllable which is immediately preceded by relatively sharp rise from a
valley in the lowest part of the speaker’s pitch range” does not make this entirely
clear. Indeed it is likely that the distinction can only be drawn reliably if syllable
boundary alignment is taken into account, and this information is not provided in the
ToBI annotation system.
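The alignment criterion can be stated as a toy rule. The sketch assumes that the time at which the rise begins and the time of the vowel onset in the accented syllable have been measured by hand, in seconds; the function classify_accent is hypothetical, and since the text describes only typical behaviour, the rule should not be read as a validated classifier.

```python
def classify_accent(rise_onset_s: float, vowel_onset_s: float) -> str:
    """Toy rule: a rise that starts at or after the vowel onset counts as late."""
    return "L+H*" if rise_onset_s >= vowel_onset_s else "H*"


late_rise = classify_accent(rise_onset_s=0.42, vowel_onset_s=0.40)
early_rise = classify_accent(rise_onset_s=0.25, vowel_onset_s=0.40)
```

The point of the sketch is only that the rule needs the vowel-onset time as an input, which is exactly the alignment information said above to be missing from the annotation.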
It is also important to recall in using ToBI-annotated material that the manual
explicitly instructs the annotator to use H* as the “default” accent type, explicitly
instancing L+H* accents as examples that when in doubt should be annotated as
H*.10
These characteristics of the ToBI annotation scheme mean that, useful though it
is for other purposes, extreme caution has to be exercised in drawing strong
conclusions concerning the reality of the H*/L+H* distinction from ToBI annotated
corpora. In particular, while Taylor’s conclusion is that the H*/L+H* distinction as
drawn in the annotation to the relevant section of the Boston News Corpus is not
phonetically real, it does not follow that the pitch-accent types themselves are not
distinct.
It is similarly unsafe to assess the present claim that L+H* is distinctively
associated with theme by applying text-based criteria for identifying topics in free
text such as those proposed by Gundel (1988). The only definition of a theme that is
possible under the present proposal is in terms of contextually established or
accommodated alternative sets. While the definitions in Steedman 2000a would
allow restricted contexts to be manipulated to control the available alternatives, and
allow the predictions concerning tune to be tested, identifying themes in free
discourse is not easy, because of the pervading involvement of accommodation and
inference in human discourse. For example, as Hedberg notes in her paper in the
present volume, some of the L+H* accents which she finds not to be associated with
topics in Gundel’s sense would be classified as isolated themes in the terms of the
present theory (see Hedberg and Sosa 2001, note 3; Hedberg and Sosa 2002).11
8. CONCLUSION
The system proposed here reduces the literal meaning of the tones to just three
semantically grounded binary oppositions. Crucially, it grammaticalizes a distinction
between the beliefs that the speaker claims by their utterance that the hearer is
committed to, and those that the hearer actually is committed to. It is only the latter
set that includes Mutual Beliefs. It is therefore consistent for the speaker to claim
and/or implicate that both they and the hearer are committed to a proposition, but
that it is not mutually believed. This is a move in the present theory that is forced by
examples like (21) and the minimal pairs in (13)-(20).
The theory places a correspondingly greater emphasis on the role of
speaker-presupposition (and its dual, hearer-accommodation), and of inference and
implicature. To that extent, the present theory follows the tradition of Halliday and
Brown, in claiming that it is the speaker who, within the constraints imposed by the
context and the participants’ beliefs and intentions, determines what is theme and
rheme, and what contrasts they embody, and not the text.
University of Edinburgh
9. NOTES
*
Thanks to Betina Braun, Daniel Büring, Klaus von Heusinger, Stephen Isard, Alex Lascarides, and
Bonnie Webber for comments on the draft. An earlier version of some parts of the paper appears as
Steedman (2002). The work was supported in part by EPSRC grants GR/M96889 and GR/R02450, and
EU FET grant MAGICSTER and EU IST grant PACO-PLUS.
1
The term “pitch-accent” is here restricted to what Ladd (1996) calls “primary” pitch-accents, sometimes
called “nuclear” pitch accents (although there may be more than one in a sentence). Ladd follows
Bolinger and many others in distinguishing primary accents from certain other accents that arise from the
interaction of lexical stress with the metrical grid. While there is still no objective measure to
distinguish the two varieties, it is the primary accents that are perceived as emphatic or contrastive.
2
The notation for tunes is Pierrehumbert’s, see Pierrehumbert and Hirschberg 1990 for details including
characteristic pitch-contours.
3
In Steedman 2000a and earlier work I called this property “focus”, following the “narrow” sense of
Selkirk (1984). However, this term invites confusion with the “broad” sense intended by Hajičová and
Sgall (1988) and Vallduví (1990), which is closer to the term “rheme” as used in the present system, and
in Steedman 2000a and Vallduví and Vilkuna 1998.
4
Hobbs (1990), who proposes a very different revision of Pierrehumbert and Hirschberg (1990) to the
present one, also gives a central role to Mutual Belief.
5
In Steedman 2000a, I called this dimension “ownership”.
6
The story comes from Dave Brubeck. Miles was of course absolutely right. The tones shown in the
example remain conjectural, however, given his complete lack of any trackable F0.
7
Under the proposal in Steedman 2000a, they could also be analyzed as an unmarked theme “I’m” and a
rheme “a millionaire”. In this particular context it makes very little difference, and we’ll ignore these
readings.
8
The present analysis differs from that of Bach and colleagues in making Wrap a lexical combinatory
operation, rather than a syntactic combinatory rule. One advantage of this analysis, which is discussed
further in Steedman 1996, is that phenomena depending on Wrap, such as anaphor binding and control,
are immediately predicted to be bounded phenomena.
9
Number agreement is suppressed in the interests of reducing formal clutter.
10
“Implicit in our discussion of the five pitch-accents is the notion that H* is the ‘default’ accent type. So,
if there is any uncertainty about how low the F0 is before the peak, as in some cases of possible L+H*
near the beginning of an utterance, the transcriber should mark ‘H*’ rather than ‘L+H*’.” (Beckman and
Hirschberg 1999).
11
Similarly, the fact that non-native speakers often obliterate pitch-accent type distinctions, and yet
manage to be understood, should no more lead one to conclude that the distinctions are not real than does
the possibility of written communication.
10. REFERENCES
Bach, Emmon. “Control in Montague Grammar.” Linguistic Inquiry 10 (1979): 513–531.
Bach, Emmon. “In Defense of Passive.” Linguistics and Philosophy 3 (1980): 297–341.
Beckman, Mary, and Julia Hirschberg. “The ToBI Annotation Conventions.” Manuscript, URL
http://ling.ohio-state.edu/~tobi/ame_tobi/annotation_conventions.html. Ohio State University, 1999.
Bolinger, Dwight. “A Theory of Pitch Accent in English.” Word 14 (1958): 109–149. Reprinted in
Bolinger (1965), pp. 17-56.
Bolinger, Dwight. “Contrastive Accent and Contrastive Stress.” Language 37 (1961): 83–96. Reprinted in
Bolinger (1965), pp. 101-117.
Bolinger, Dwight. Forms of English. Cambridge, Mass.: Harvard University Press, 1965.
Brown, Gillian. “Prosodic Structure and the Given/New Distinction.” In Anne Cutler, D. Robert Ladd,
and Gillian Brown (eds.), Prosody: Models and Measurements, pp. 67–77. Berlin: Springer-Verlag,
1983.
Brown, Gillian, Karen Currie, and Joanne Kenworthy. Questions of Intonation. London: Croom Helm,
1980.
Büring, Daniel. “The Great Scope Inversion Conspiracy.” Linguistics and Philosophy 20 (1997a): 175–
194.
Büring, Daniel. The Meaning of Topic and Focus: The 59th Street Bridge Accent. London: Routledge,
1997b.
Clark, Herbert. Using Language. Cambridge: Cambridge University Press, 1996.
Clark, Herbert, and Catherine Marshall. “Definite Reference and Mutual Knowledge.” In Aravind Joshi,
Bonnie Webber, and Ivan Sag (eds.), Elements of Discourse Understanding, pp. 10–63. Cambridge:
Cambridge University Press, 1981.
Cohen, Philip. On Knowing What to Say: Planning Speech Acts. University of Toronto: Doctoral
dissertation, 1978.
Cohen, Philip and Hector Levesque. “Rational Interaction as the Basis for Communication.” In Philip
Cohen, Jerry Morgan, and Martha Pollack (eds.), Intentions in Communication, pp. 221–255.
Cambridge, Mass.: MIT Press, 1990.
Cresswell, M.J. Logics and Languages. London: Methuen, 1973.
Cresswell, M.J. Structured Meanings. Cambridge, Mass.: MIT Press, 1985.
Dowty, David. “Grammatical Relations and Montague Grammar.” In Pauline Jacobson and Geoffrey K.
Pullum (eds.), The Nature of Syntactic Representation, pp. 79–130. Dordrecht: Reidel, 1982.
Grice, Herbert. “Logic and Conversation.” In Peter Cole and Jerry Morgan (eds.), Speech Acts, vol. 3 of
Syntax and Semantics, pp. 41–58. New York: Seminar Press, 1975 [Written in 1967].
Gundel, Janet. The Role of Topic and Comment in Linguistic Theory. University of Texas, Austin:
Doctoral dissertation, 1974.
Gundel, Janet. “Universals of Topic-Comment Structure.” In Michael Hammond, Edith Moravcsik, and
Jessica Wirth (eds.), Syntactic Universals and Typology, pp. 209–242. Amsterdam: John Benjamins,
1988.
Gundel, Janet, and Torsten Fretheim. “Topic and Focus.” In Laurence Horn and Gregory Ward (eds.),
Handbook of Pragmatic Theory. Oxford: Blackwell, 2001.
Gunlogson, Christine. True to Form: Rising and Falling Declaratives in English. University of California
at Santa Cruz: Doctoral dissertation, 2001.
Gunlogson, Christine. “Declarative Questions.” In Brendan Jackson (ed.), Proceedings of Semantics and
Linguistic Theory XII, pp. 144–163. Ithaca, NY: Cornell University, 2002.
Gussenhoven, Carlos. On the Grammar and Semantics of Sentence Accent. Dordrecht: Foris, 1983.
Hajičová, Eva and Petr Sgall. “Topic and Focus of a Sentence and the Patterning of a Text.” In János
Petőfi (ed.), Text and Discourse Constitution, pp. 70–96. Berlin: de Gruyter, 1988.
Halliday, Michael. “The Tones of English.” Archivum Linguisticum 15 (1963): 1.
Halliday, Michael. Intonation and Grammar in British English. The Hague: Mouton, 1967a.
Halliday, Michael. “Notes on Transitivity and Theme in English, Part II.” Journal of Linguistics 3
(1967b): 199–244.
Hedberg, Nancy and Juan Sosa. “The Prosodic Structure of Topic and Focus in Spontaneous English
Dialogue.” This volume.
Hedberg, Nancy and Juan Sosa. “The Prosody of Questions in Natural Discourse.” In Proceedings of
Speech Prosody, Aix en Provence, April. To appear.
Hirschberg, Julia and Janet Pierrehumbert. “Intonational Structuring of Discourse.” In Proceedings of the
24th Annual Meeting of the Association for Computational Linguistics, New York, pp. 136–144. San
Francisco, CA: Morgan Kaufmann, 1986.
Hobbs, Jerry. “The Pierrehumbert-Hirschberg Theory of Intonational Meaning Made Simple: Comments
on Pierrehumbert and Hirschberg.” In Philip Cohen, Jerry Morgan, and Martha Pollack (eds.),
Intentions in Communication, pp. 313–323. Cambridge, Mass.: MIT Press, 1990.
Jacobson, Pauline. “Flexible Categorial Grammars: Questions and Prospects.” In Robert Levine (ed.),
Formal Grammar, pp. 129–167. Oxford: Oxford University Press, 1992.
Karttunen, Lauri, and Stanley Peters. “Conventional Implicature.” In Choon-Kyu Oh and David Dinneen
(eds.), Syntax and Semantics 11: Presupposition, pp. 1–56. New York: Academic Press, 1979.
Karttunen, Lauri. “Discourse Referents.” In J. McCawley (ed.), Syntax and Semantics, vol. 7, pp. 363–
385. New York: Academic Press, 1976.
Ladd, D. Robert. Intonational Phonology. Cambridge: Cambridge University Press, 1996.
Lambrecht, Knud, and Laura Michaelis. “Sentence Accent in Information Questions: Default and
Projection.” Linguistics and Philosophy (1998): 477–544.
Lewis, David. Convention: a Philosophical Study. Cambridge, Mass.: Harvard University Press, 1969.
Lewis, David. “Scorekeeping in a Language Game.” Journal of Philosophical Logic 8 (1979): 339–359.
Montague, Richard. Formal Philosophy: Papers of Richard Montague. Richmond H. Thomason (ed.).
New Haven, CT: Yale University Press, 1974.
Pierrehumbert, Janet. The Phonology and Phonetics of English Intonation. MIT: Doctoral dissertation,
1980.
Pierrehumbert, Janet, and Mary Beckman. Japanese Tone Structure. Cambridge, Mass.: MIT Press, 1988.
Pierrehumbert, Janet, and Julia Hirschberg. “The Meaning of Intonational Contours in the Interpretation
of Discourse.” In Philip Cohen, Jerry Morgan, and Martha Pollack (eds.), Intentions in
Communication, pp. 271–312. Cambridge, Mass.: MIT Press, 1990.
Prevost, Scott. A Semantics of Contrast and Information Structure for Specifying Intonation in Spoken
Language Generation. University of Pennsylvania: Doctoral dissertation, 1995.
Prevost, Scott and Mark Steedman. “Specifying Intonation from Context for Speech Synthesis.” Speech
Communication 15 (1994): 139–153.
Rooth, Mats. Association with Focus. University of Massachusetts, Amherst: Doctoral
dissertation, 1985.
Rooth, Mats. “A Theory of Focus Interpretation.” Natural Language Semantics 1 (1992): 75–116.
Searle, John. “Indirect Speech Acts.” In Peter Cole and Jerry Morgan (eds.), Speech Acts, vol. 3 of Syntax
and Semantics, pp. 59–82. New York: Seminar Press, 1975.
Selkirk, Elisabeth. Phonology and Syntax. Cambridge, Mass.: MIT Press, 1984.
Silverman, Kim, Mary Beckman, John Pitrelli, Marie Ostendorf, Colin Wightman, Patti Price, Janet
Pierrehumbert, and Julia Hirschberg. “ToBI: A Standard for Labeling English Prosody.” In
Proceedings of the International Conference on Spoken Language Processing, Banff, Alberta, pp.
867–870. Edmonton: University of Alberta, 1992.
Steedman, Mark. “Structure and Intonation.” Language 67 (1991): 262–296.
Steedman, Mark. Surface Structure and Interpretation. Cambridge, Mass.: MIT Press, 1996.
Steedman, Mark. “Connectionist Sentence Processing in Perspective.” Cognitive Science 23 (1999): 615–
634.
Steedman, Mark. “Information Structure and the Syntax-Phonology Interface.” Linguistic Inquiry 31
(2000a): 649–689.
Steedman, Mark. The Syntactic Process. Cambridge, Mass.: MIT Press, 2000b.
Steedman, Mark. “Towards a Compositional Semantics for English Intonation.” Manuscript, URL
http://www.cogsci.ed.ac.uk/~steedman/papers.html. University of Edinburgh, 2002.
Steedman, Mark, and Philip Johnson-Laird. “Utterances, Sentences, and Speech-Acts: Have Computers
Anything to say?” In Brian Butterworth (ed.), Language Production 1: Speech and Talk, pp. 111–
141. London: Academic Press, 1980.
Steedman, Mark, and Ivana Kruijff-Korbayová. “Two Dimensions of Information Structure in Relation to
Discourse Semantics and Discourse Structure.” Journal of Logic, Language, and Information,
Introduction to the Special Issue on Information Structure, Discourse Semantics, and Discourse
Structure, to appear.
Stone, Matthew. Modality in Dialogue: Planning Pragmatics and Computation. University of
Pennsylvania: Doctoral dissertation, 1998.
Taylor, Paul. “Analysis and Synthesis of Intonation Using the Tilt Model.” Journal of the Acoustical
Society of America 107 (2000): 1697–1714.
Thomason, Richmond. “Accommodation, Meaning, and Implicature.” In Philip Cohen, Jerry Morgan, and
Martha Pollack (eds.), Intentions in Communication, pp. 325–363. Cambridge, Mass.: MIT Press,
1990.
Vallduví, Enric. The Information Component. University of Pennsylvania: Doctoral dissertation, 1990.
Vallduví, Enric, and Maria Vilkuna. “On Rheme and Kontrast.” In Peter Culicover and Louise McNally
(eds.), Syntax and Semantics, Vol. 29: The Limits of Syntax, pp. 79–108. San Diego, CA: Academic
Press, 1998.
Von Stechow, Arnim. “Topic, Focus and Local Relevance.” In Wolfgang Klein and Willem Levelt (eds.),
Crossing the Boundaries in Linguistics, pp. 95–130. Dordrecht: Reidel, 1981.
Ward, Gregory, and Julia Hirschberg. “Implicating Uncertainty: the Pragmatics of Fall-Rise Intonation.”
Language 61 (1985): 747–776.
KLAUS VON HEUSINGER
1. INTRODUCTION
Theories that relate discourse structure and intonational structure often concentrate
on the discourse functions of pitch accents and boundary tones. Intonational
phrasing, however, is less prominently investigated. This paper focuses on
intonational phrasing and its contribution to the construction of a discourse
representation. I argue that intonational phrasing determines minimal discourse units
which serve as the building blocks in a discourse representation. Even though
minimal discourse units often correspond to syntactic constituents, sometimes they
cross constituent boundaries. The problem can be illustrated by the very first
sentence from the novel Das Parfum by Patrick Süskind, in (1).
H* !H* H* L% H*
| | | | |
(1) [Im achtzehnten Jahrhundert | lebte in Frankreich] [ein Mann, |
‘In the eighteenth century lived in France a man
H* H* !H*
| | |
der zu den genialsten | und abscheulichsten Gestalten dieser an
who was one of the most gifted and abominable personages
(H*) (H*) !H* H* !H* L%
| | | | | |
genialen und abscheulichen Gestalten nicht armen Epoche gehörte.]
in an era that knew no lack of gifted and abominable personages.’
We analyzed a read version of the novel with respect to intonational clues. The
novel was professionally read by the artist Gert Westphal in 1995. The text was
analyzed and intonationally segmented by Braunschweiler et al. (1988ff) in a project
on spoken text in Konstanz. Parts of the text were then labeled for the following
intonational properties: pitch accents (H*, L* or bitonal versions of them), boundary
tones (H%, L%), and intonational phrasing (intonational phrases “[...]”, and
intermediate phrases: “|...|”). We checked part of the labeling with Jennifer
Fitzpatrick.1
(1) is phrased into two intonational phrases, each of which is further divided into intermediate
phrases. The phrases differ remarkably in length. For example,
the second intonational phrase consists of the three intermediate phrases | ein Mann |
der zu den genialsten | und abscheulichsten Gestalten dieser an genialen und
C. Lee et al. Topic and Focus: Cross-linguistic Perspectives on Meaning and Intonation, 265–290.
© 2007 Springer.
2. DISCOURSE STRUCTURE
Discourse structure is a cover term for different properties of a coherent text or
discourse. In the following I focus on (i) reference and anaphora, (ii) information
structure (topic-comment, or focus-background), and (iii) discourse relations between
different discourse units. There are different families of theories treating discourse
structure, each of which focuses on a different aspect. Discourse Representation Theory
(Kamp 1981, Kamp & Reyle 1993) concentrates on representing the conditions for
anaphoric reference. The discourse is incrementally (re)constructed. There is in
principle no difference between parts of sentences and whole sentences since the
construction algorithm does not recognize a special category of sentences (even though
such a category is determined by the syntactic categories of the input). A second family
of approaches (Klein & von Stutterheim 1987, Hobbs 1990, van Kuppevelt
1995, Roberts 1996, Büring 1997, 2003) understands a discourse structure as
DISCOURSE STRUCTURE AND INTONATIONAL PHRASING 267
(2b) {t,u,x | 18th cent(t) & France(u) & Man(x) & live(x,u,t)}
extension of the predicates that are ascribed to the discourse referents. For example,
the DRS (2a) or (2b) is true just in case f(t) is in the 18th century, f(u) is in
France, f(x) is a man and f(x) lives in f(u) at f(t).
The sequence or conjunction of two sentences as in (3) receives a DRS
incrementally. We start with the already established DRS for the first conjunct in
(2a), and build the new DRS (3b) by inserting the new discourse referents for the
pronoun er and the NP Jean-Baptiste Grenouille, and a condition for the predicate
hieß. The anaphoric link of the pronoun is graphically represented as y = ?,
indicating that the reference of the pronoun is still unresolved. The discourse
referent which stands for an anaphoric expression must be identified with another
accessible discourse referent in the universe. In the given context, y is identified
with x, as in (3c). This mini-discourse is true if there is an embedding function f onto
a model such that f(t) is in the 18th century, f(u) is in France, f(x) is a man, f(x) lives
in f(u) at f(t), f(y) = f(x), f(z) is Jean-Baptiste Grenouille, and f(y) was named f(z).
(3) Im achtzehnten Jahrhundert lebte in Frankreich ein Mann. Er hieß
Jean-Baptiste Grenouille.
‘In eighteenth-century France there lived a man. His name was
Jean-Baptiste Grenouille.’
(3a) {t, u, x | 18th cent(t) & France(u) & Man(x) & live(x,u,t)}
(3b) {t, u, x, y, z | 18th cent(t) & France(u) & Man(x) & live(x,u,t) & y = ? & z = J.B. Grenouille & name(y,z)}
(3c) {t, u, x, y, z | 18th cent(t) & France(u) & Man(x) & live(x,u,t) & y = x & z = J.B. Grenouille & name(y,z)}
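The incremental construction of (3a) to (3c) and the truth definition in terms of an embedding function can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the class `DRS`, the function `is_true`, the toy model, and the entity names `t0`, `fr`, `m`, and `n` are all hypothetical, and the condition z = J.B. Grenouille is encoded as a one-place predicate for simplicity.

```python
from dataclasses import dataclass, field


@dataclass
class DRS:
    """A DRS as a universe of discourse referents plus a list of conditions."""
    universe: list = field(default_factory=list)
    conditions: list = field(default_factory=list)  # tuples: (predicate, arg, ...)

    def extend(self, referents, conditions):
        """Incremental update: add new referents and conditions to the old DRS."""
        return DRS(self.universe + list(referents),
                   self.conditions + list(conditions))

    def resolve(self, pronoun, antecedent):
        """Turn the open condition 'pronoun = ?' into 'pronoun = antecedent'."""
        if antecedent not in self.universe:
            raise ValueError(f"{antecedent} is not in the universe")
        resolved = [("=", pronoun, antecedent) if c == ("=", pronoun, "?") else c
                    for c in self.conditions]
        return DRS(list(self.universe), resolved)


def is_true(drs, f, model):
    """True if the embedding f verifies every condition of drs in the model."""
    for pred, *args in drs.conditions:
        if pred == "=":
            # An unresolved anaphor ('y = ?') cannot be verified.
            if args[1] == "?" or f[args[0]] != f[args[1]]:
                return False
        elif tuple(f[a] for a in args) not in model[pred]:
            return False
    return True


# (3a): DRS for the first sentence.
drs_3a = DRS(["t", "u", "x"],
             [("18th cent", "t"), ("France", "u"), ("Man", "x"),
              ("live", "x", "u", "t")])
# (3b): add the referents and conditions of the second sentence.
drs_3b = drs_3a.extend(["y", "z"],
                       [("=", "y", "?"), ("J.B. Grenouille", "z"),
                        ("name", "y", "z")])
# (3c): identify y with the accessible referent x.
drs_3c = drs_3b.resolve("y", "x")

# Hypothetical toy model and embedding function (not from the source).
model = {"18th cent": {("t0",)}, "France": {("fr",)}, "Man": {("m",)},
         "live": {("m", "fr", "t0")}, "J.B. Grenouille": {("n",)},
         "name": {("m", "n")}}
f = {"t": "t0", "u": "fr", "x": "m", "y": "m", "z": "n"}
```

Under this sketch, `is_true(drs_3c, f, model)` holds, while `is_true(drs_3b, f, model)` fails because the condition y = ? is still open.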
The new discourse referent introduced by the pronoun must be linked with an
already established and accessible discourse referent. DRT defines accessibility in
terms of structural relations, i.e. the discourse referent must be in the same (or in a
higher) DRS. With this concept of accessibility, the contrast between (4) and (5) can
be described by the difference in the set of discourse referents that are accessible for
the discourse referent v of the pronoun er in (4) and (5). The construction rule for
the negation in (4) creates an embedded discourse universe with the discourse
referent u and the conditions scent(u) and x gave u to the world. The anaphoric
pronoun er in the third (hypothetical) sentence cannot find a suitable discourse
referent since it has no access to the embedded discourse universe with the only
fitting discourse referent u. In (5a), however, the pronoun er in the second sentence
is represented by the discourse referent v and the condition v = ?. This referent can
be linked to the accessible discourse referent x, licensing the anaphoric link.
(4) So ein Zeck war das Kind Grenouille. An die Welt gab es nichts ab
(...) nicht einmal einen Duft1 . (04-061) #Er1 war stark.
‘The young Grenouille was such a tick. He gave the world nothing (...)
not even his own scent. #It was strong.’
(4a) {x, y, z, v | Tick(x) & young Gr(y) & x is y & z = x & not {u | scent(u) & z gave u to the world} & v = ? & strong(v)}
(5) Ein anderes Parfum aus seinem Arsenal war ein mitleiderregender
Duft1 , der sich bei Frauen mittleren und höheren Alters bewährte.
Er1 roch nach dünner Milch und sauberem weichem Holz. (38-015)
‘Another perfume in his arsenal was a scent for arousing sympathy
that proved effective with middle-aged and elderly women. It smelled
of watery milk and fresh soft wood.’
(5a) {x, y, v | scent for arousing sympathy that proved effective with middle-aged and elderly women(x) & Another perfume in his arsenal(y) & x is y & v = x & v smelled of watery milk and fresh soft wood}
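The structural notion of accessibility behind the contrast between (4) and (5) can be sketched as follows. The sketch assumes a simplified picture in which each DRS ("box") knows only its own universe and the box that contains it. The names `Box`, `accessible`, and `can_link` are hypothetical, and the check is purely structural: it says nothing about whether an accessible referent is also a semantically fitting antecedent.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Box:
    """A DRS universe together with the DRS that embeds it (None for the main DRS)."""
    universe: list
    parent: Optional["Box"] = None

    def accessible(self):
        """Referents of this box and of every box that contains it."""
        referents, box = [], self
        while box is not None:
            referents.extend(box.universe)
            box = box.parent
        return referents


def can_link(pronoun_box, antecedent):
    """A pronoun may only be identified with a referent accessible from its box."""
    return antecedent in pronoun_box.accessible()


# (4a): the referent u for the scent lives in the box embedded under negation.
main_4 = Box(["x", "y", "z", "v"])
negated_4 = Box(["u"], parent=main_4)

# (5a): the referent x for the scent is in the main DRS.
main_5 = Box(["x", "y", "v"])
```

In (4a) the pronoun referent v sits in the main DRS, so `can_link(main_4, "u")` is false, whereas in (5a) `can_link(main_5, "x")` is true. Material inside the negated box can still look outward: `can_link(negated_4, "z")` is true.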
In general, theories assume that one unit is linked to the established discourse, while
the other is said to express the new information in the sentence. Because of space
limitations, I cannot present a full survey of the different approaches and a
general criticism (see von Heusinger 2004). I only want to stress the point that
information structure is often understood as a sentence structure and not as part of a
discourse structure. Therefore, it is not included in discourse representation theories.
(8) “Was ist das?” sagte Terrier und beugte sich über den Korb und
schnupperte daran, denn er vermutete Eßbares. (02-002)
‘“What’s that?” asked Terrier, bending down over the basket and
sniffing at it, in the hope that it was something edible.’
(8a) [Tree diagram. Relation nodes: Continuation, Conjunction. Leaves: “bending down over the basket”, “sniffing at it” (joined by Conjunction), “in the hope that it was something edible.”]
(9) Technische Einzelheiten waren ihm sehr zuwider, denn Einzelheiten
bedeuteten immer Schwierigkeiten, und Schwierigkeiten bedeuteten
eine Störung seiner Gemütsruhe, und das konnte er gar nicht
vertragen. (02-015)
‘He despised technical details, because details meant difficulties, and
difficulties meant ruffling his composure, and he simply would not put
up with that.’
(9a) Causation
       He despised technical details,
       Elaboration
         because details meant difficulties
         Elaboration
           and difficulties meant ruffling his composure
           and he simply would not put up with that.
Only Asher (1993, 2004) combines insights from DRT and discourse relations in his
theory of Segmented DRT (= SDRT), which is not confined to the incremental
composition of DRSs, but also captures discourse relations between the sentences in
the discourse. He revises the classical DRT of Kamp (1981) and Kamp & Reyle
(1993). The classical version describes the dynamic meaning of words or phrases
with respect to a discourse structure. There is, however, no means to compare the
dynamic potential of a full sentence with the discourse so far established. Asher
(1993, 256) notes that
the notion of semantic updating in the original DRT fragment of Kamp (1981) (...) is
extremely simple, except for the procedures for resolving pronouns and temporal
elements, which the original theory did not spell out. To build a DRS for the discourse
as a whole and thus to determine its truth conditions, one simply adds the DRS
constructed for each constituent sentence to what one already had. (...) This procedure is
hopelessly inadequate, if one wants to build a theory of discourse structure and
discourse segmentation.
(8) “Was ist das?” sagte Terrier und beugte sich über den Korb und
schnupperte daran, denn er vermutete Eßbares. (02-002)
(8b) Main DRS: {x, y, z, ... | basket(x) & Terrier(y) & .....}
     Cont: {u, p | What(u) = p & u = }
     Cont: {v | v = ? & Terrier(v) & asked(v,p)}
     Causation
(8c) Main DRS: {x, y, z, u, p, v | basket(x) & Terrier(y) & ..... & What(u) = p & u = x & v = y & Terrier(v) & asked(v,p)}
     Cont: {w | y bending down over the basket(w) & w = ?}
     Conj: {k | y sniffing at k & k = ?}
     Caus: {l | in the hope that l was something edible & l = ?}
To summarize this very short presentation of DRT, the discourse structure of DRT
not only provides a new structure but also introduces new semantic objects:
discourse referents, conditions, and discourse domains (“boxes”). DRT explains
semantic categories such as definiteness and anaphora in terms of interaction
between these representations. Furthermore, the extension to SDRT allows us to
express discourse relations between whole propositions, as well. These new tools,
objects, and representations form the basis for a new semantic analysis of
information structure. In the next section, this approach is sketched briefly.
3. INTONATIONAL STRUCTURE
Intonation contours are represented by phonologists as a sequence of abstract tones
consisting of pitch accents and two types of boundary tones. Pierrehumbert &
Hirschberg (1990, 308) assign discourse functions to the particular tones: “Pitch
accents convey information about the status of discourse referents (...). Phrase
accents [= boundary tones of intermediate phrases] convey information about the
relatedness of intermediate phrases (...). Boundary tones convey information about
the directionality of interpretation for the current intonational phrase (...).” The
status of discourse referents can be accounted for in terms of given vs. new; the
boundary tones of intonational phrases indicate how the proposition expressed by
the whole phrase is integrated into the discourse. Similarly, boundary tones of
intermediate (or phonological) phrases that correspond to a full proposition indicate
the way these propositions are interpreted with respect to the linguistic context, as
illustrated in (10) and (11). While in (10), the L-boundary tone indicates that the two
clauses have no relation to each other, the H-boundary tone in (11) indicates that the
first clause is related to the second, suggesting a discourse relation of causation.
L L L%
| | |
(10) [(George ate chicken soup) | (and got sick) ]
H L L%
| | |
(11) [(George ate chicken soup) | (and got sick)]
However, in this view there is no way of treating phrases that correspond to units
below the clause level, such as the modification im achtzehnten Jahrhundert (‘in the
eighteenth century’), the unsaturated phrase lebte in Frankreich (‘lived in France’)
or the first part of the complex noun der zu den genialsten (‘one of the most gifted’)
in example (1), repeated as (12).
(12) [Im achtzehnten Jahrhundert | lebte in Frankreich] [ein Mann, | der zu
den genialsten | und abscheulichsten Gestalten dieser an genialen und
abscheulichen Gestalten nicht armen Epoche gehörte.]
All these phrases can constitute intermediate phrases in German. Even though
English and many other languages mark their intermediate phrases by boundary
tones, in German there is no evidence for boundary tones for intermediate phrases
(Féry 1993, 59-79). Evidence for intermediate phrases in German must be taken
from other criteria. I argue on the basis of discourse structure and discourse relations
that intonational phrasing (intonational and intermediate phrases) can sufficiently be
defined by its function in building a discourse structure. Before I give a
characterization of intonational phrasing for intonational phrases and intermediate
phrases, I first present some approaches to the functions of pitch accents and
boundary tones.
1984, Ladd 1996). This can be illustrated by (13) and (14). (13) is the first sentence
of the novel and introduces the time, the place and the person by phrases marked
with an H* pitch accent. (14) is the first sentence of the second chapter. The wet
nurse Jeanne Bussie was already introduced in the first chapter; so the L* indicates
that she is discourse-old.
H* !H* H* L% H*
| | | | |
(13) [Im achtzehnten Jahrhundert | lebte in Frankreich] [ein Mann,
In the eighteenth century lived in France a man
L* H% H* L* LH* H%
| | | | | |
(14) [Einige Wochen später] [stand die Amme | Jeanne Bussie] ...(02-001)
Few weeks later stood the wet nurse Jeanne Bussie
The pitch accent can also indicate contrast between two referents or unexpected
relations between two referents, as illustrated in the often quoted example (15) and a
sentence from our novel (16):
(15) First HE called HIM a Republican and then HE offended HIM.
(16) Grenouille folgte ihm, mit bänglich pochendem Herzen, denn er ahnte,
daß nicht ER DEM DUFT folgte, sondern daß DER DUFT IHN
gefangengenommen hatte und nun unwiderstehlich zu sich zog. (08-036)
‘Grenouille followed it, his fearful heart pounding, for he suspected
that it was not he who followed the scent, but the scent that had
captured him and was drawing him irresistibly to it.’
that takes the theme as an argument to yield the utterance. Steedman now defines the
syntactic function of the pitch accent L+H* as a theme that lacks a boundary tone,
i.e. as a function that needs a boundary tone to yield a theme. Analogously, the pitch
accent H* indicates a function that needs a boundary tone in order to yield a rheme.
Thus in the description of tones, Steedman assumes the boundary tones and the
whole tune as the primary units, while the pitch accents define the informational
status as theme or rheme (cf. Hayes & Lahiri 1991 for a similar approach with
respect to sentence type).
(18) Categorial functions of tones for English (Steedman 1991)
     a  LH%        boundary tone   simple argument
     b  LL%        boundary tone   simple argument
     c  L+H*       pitch accent    function from boundary tones into theme
     d  H*         pitch accent    function from boundary tones into rheme
     e  L+H* LH%   contour         simple argument: theme
     f  H* LL%     contour         function from themes into utterance
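The functor-argument reading of (18) can be made concrete with a small sketch. It is an illustration only, not Steedman's formalization: the classes `Theme`, `Rheme`, and `Utterance` and the function `pitch_accent` are hypothetical names, and the categorial types are modeled as ordinary Python callables.

```python
from dataclasses import dataclass

BOUNDARY_TONES = {"LH%", "LL%"}  # (18a, b): simple arguments


@dataclass(frozen=True)
class Theme:
    """(18e): a complete theme contour, itself a simple argument."""
    contour: str


@dataclass(frozen=True)
class Utterance:
    theme: Theme
    rheme: "Rheme"


@dataclass(frozen=True)
class Rheme:
    """(18f): a complete rheme contour, a function from themes into utterances."""
    contour: str

    def __call__(self, theme):
        if not isinstance(theme, Theme):
            raise TypeError("a rheme applies to a theme")
        return Utterance(theme, self)


def pitch_accent(accent):
    """(18c, d): a pitch accent is a function from boundary tones into theme or rheme."""
    result_type = {"L+H*": Theme, "H*": Rheme}[accent]

    def apply_to(boundary_tone):
        if boundary_tone not in BOUNDARY_TONES:
            raise ValueError(f"{boundary_tone} is not a boundary tone")
        return result_type(f"{accent} {boundary_tone}")

    return apply_to


theme = pitch_accent("L+H*")("LH%")   # (18e)
rheme = pitch_accent("H*")("LL%")     # (18f)
utterance = rheme(theme)              # the rheme takes the theme as its argument
```

The design mirrors the table: boundary tones are plain values, pitch accents are unsaturated functions, and only a contour that combines both counts as a theme or a rheme.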
Steedman uses the terms theme and rheme as well as given and new. The first pair
can be defined with respect to the sentence under analysis. Yet the second pair can
only be defined by the discourse in which the sentence is embedded.
Even though the tones and their functions are different for German, the
following example from our novel may illustrate Steedman’s analysis. The first
phrase ends with an H% boundary tone representing the theme (with the global
contour of L*H%, cf. (18e)), while the second intonational phrase ends with L%
expressing the rheme (with the global contour ...H*L%, cf. (18f)).
L* H%
| |
(19) [Zu der Zeit, von der wir reden,] [herrschte in den Städten
‘In the period of which we speak, there reigned in the cities
H*L H* !H* L%
| | | |
ein für uns moderne Menschen | kaum vorstellbarer Gestank.]
to us modern men and women a stench barely conceivable’
However, not all sentences can be divided into one theme and one rheme, as in (20):
L* H% H* L* LH* H%
| | | | | |
(20a) [Einige Wochen später] [stand die Amme | Jeanne Bussie]
‘Few weeks later stood the wet nurse Jeanne Bussie
H* H%
| |
b [mit einem Henkelkorb in der Hand]
with a market basket in the hand
L* H* H%
| | |
c [vor der Pforte des Klosters von Saint-Merri]
at the gate of the cloister of Saint-Merri
H* !H* !H* L%
| | | |
d [und sagte dem öffnenden Pater Terrier,]
and said to the opening Father Terrier’
The first four intonational phrases end with an H% boundary tone, and only the last
phrase with an L% boundary tone. This is difficult to explain in terms of a view of
information structure that is sentence bound. In such a view we must assume several
themes before we get to the rheme, and the final sentence. The example suggests
that the boundary tones indicate the relation of the phrase to the already established
discourse on the one hand, and to the subsequent discourse on the other.
Pierrehumbert & Hirschberg (1990, 308) assign the following discourse functions to
the particular tones:
Pitch accents convey information about the status of discourse referents, modifiers,
predicates, and relationships specified by accented lexical items. Phrase accents convey
information about the relatedness of intermediate phrases–in particular, whether (the
propositional content of) one intermediate phrase is to form part of a larger
interpretative unit with another. Boundary tones convey information about the
directionality of interpretation for the current intonational phrase–whether it is
“forward-looking” or not.
Summarizing, pitch accents may indicate the discourse status of their respective
discourse referents. They can also form the nucleus of an informational unit, as in
Steedman’s approach, which is, however, limited to the sentence. Pierrehumbert &
Hirschberg define the function of boundary tones with respect to the relations
between clauses. However, they can only deal with phrases that are associated with
propositions. None of these approaches accounts for the discourse function of
subclausal units. Before I develop such an approach in section 5, I give a sketch of
the description of intonational phrasing in the next section.
4.1 Phrasing
The term intonational phrase (IP) is usually applied to spans of the utterance which
are delimited by boundary tones: “Like other researchers, we will take the melody
for an intonational phrase to be the ‘tune’ whose internal makeup is to be described.
As a rule of thumb, an intonational phrase boundary (transcribed here as %) can be
taken to occur where there is a non-hesitation pause or where a pause could be
felicitously inserted without perturbing the pitch contour” (Pierrehumbert 1980, 19).
In (24) from Selkirk (1995, 566), there are three intonational phrases, such that the
relative clause corresponds to one, while each part of the matrix sentence to the right
and to the left constitutes one. In (25) from the novel Das Parfum (02-125), one
intonational phrase marks the direct speech, while the two others are associated with
the two conjuncts of the assertion. The second conjunct is further divided into two
intermediate phrases.
(24) H% H% L%
| | |
[Fred,]IP [who’s a volunteer fireman,]IP [teaches third grade]IP
(25) H* L% L* H% (L*) H* L%
| | | | | | |
[“Na?”] [bellte Terrier] [und knipste ungeduldig | an seinen Fingernägeln.]
‘“Well?” barked Terrier, clicking his fingernails impatiently.’
The terms in which we can define an intonational phrase are not very clearly
understood. There are phonetic, syntactic and semantic criteria for forming an
intonational phrase:
The conflict between different criteria can be illustrated with the first sentence of
our novel (1), repeated as (27).
(27) H* !H* H* L% H*
| | | | |
[Im achtzehnten Jahrhundert |₅₈₀ lebte in Frankreich]₃₀₀ [ein Mann, |₅₉₀
In the eighteenth century lived in France a man
The subscripts indicate the duration of the pauses: the pause between the two
intonational phrases is shorter than the pauses inside either of them. We therefore take the
boundary tone to be a robust criterion for an intonational phrase. Unfortunately, German does not
show boundary tones for intermediate phrases (Féry 1993, 59-79). They can,
however, be detected by other criteria such as pauses, lengthening of the final
syllable and a pitch accent for each intermediate phrase. I argue that the discourse
function of the intermediate phrase is one of the most reliable criteria.
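That pause duration alone cannot separate the two boundary types in (27) can be checked with a small worked example. The sketch below is illustrative only: it assumes a hypothetical classifier that calls a boundary "intonational" whenever its pause reaches some threshold, and it uses the three pause values from (27), whose unit the source does not state.

```python
# Boundaries of (27): (context, boundary type in the labeling, pause duration).
# The durations are the subscripts in (27); their unit is not given in the source.
boundaries = [
    ("Jahrhundert | lebte", "intermediate", 580),
    ("Frankreich] [ein", "intonational", 300),
    ("Mann, | der", "intermediate", 590),
]


def classify_by_pause(pause, threshold):
    """Hypothetical rule: long pauses mark intonational phrase boundaries."""
    return "intonational" if pause >= threshold else "intermediate"


def accuracy(threshold):
    """Share of the boundaries in (27) that the pause rule labels correctly."""
    hits = sum(classify_by_pause(pause, threshold) == kind
               for _, kind, pause in boundaries)
    return hits / len(boundaries)


# Try every threshold from 0 to 1000 in steps of 10.
best = max(accuracy(threshold) for threshold in range(0, 1001, 10))
```

No threshold labels all three boundaries correctly, because the only intonational phrase boundary has the shortest pause. The best the rule can do is to call every boundary "intermediate".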
There are very short and very long intonational phrases, which means that the
phrases do not depend on length. They rather depend on their appropriateness for
building a coherent discourse. A discourse is coherent if at least the following two
requirements are met: (i) anaphoric relations can be established; (ii) discourse
relations hold between the discourse units, as argued in section 4.4.
Any text in spoken English is organized into what may be called ‘information units’.
(...) this is not determined (...) by constituent structure. Rather could it be said that the
distribution of information specifies a distinct structure on a different plane. (...)
Information structure is realized phonologically by ‘tonality’, the distribution of the text
into tone groups.
The utterance is divided into different tone groups, which are roughly equivalent to
intermediate phrases. These phrases exhibit an internal structure. Analogously,
Halliday assumes two structural aspects of information structure: the informational
partition of the utterance, and the internal organization of each informational unit.
He calls the former aspect the thematic structure (theme-rheme), and the latter
aspect is treated under the title givenness. The thematic structure organizes the linear
ordering of the informational units, which corresponds to the Praguian view of
theme-rheme (or topic-comment, or topic-focus, see section 2.2). The theme refers
to that informational unit that comprises the object the utterance is about, while the
rheme refers to what is said about it. Halliday (1967, 212) assumes that the theme
always precedes the rheme. Thus theme and rheme are closely connected with word
order, theme being used as a name for the first noun group in the sentence, and
rheme for the following: “The theme is what is being talked about, the point of
departure for the clause as a message; and the speaker has within certain limits the
option of selecting any element in the clause as thematic.”
The second aspect refers to the internal structure of an informational unit, where
elements are marked with respect to their discourse anchoring. Halliday (1967, 202)
writes: “At the same time the information unit is the point of origin for further
options regarding the status of its components: for the selection of point of
information focus which indicates what new information is being contributed.”
Halliday calls the center of informativeness of an information unit information
focus. The information focus contains new material that is not already available in
the discourse. The remainder of the intonational unit consists of given material, i.e.
material that is available in the discourse or in the shared knowledge of the discourse
participants. Halliday (1967, 202) illustrates the interaction of the two systems of
organization with the following example (using bold type to indicate information
focus; // to indicate phrasing). Sentence (28a) contrasts with (28b) only in the
placement of the information focus in the second phrase. The phrasing, and thus the
thematic structure, is the same. On the other hand, (28a) contrasts with (28c) in
phrasing, but not in the placement of the information focus. However, since the
information focus is defined with respect to the information unit, the effect of the
information focus is different.
(28)a //Mary//always goes to town on Sundays.//
b //Mary//always goes to town on Sundays.//
c //Mary always goes to //town on Sundays.//
Halliday does not connect the sentence perspective with the discourse perspective,
even though he makes some vague comments on it:
The difference can perhaps be best summarized by the observation that, while ‘given’
means ‘what you were talking about’ (or ‘what I was talking about before’), ‘theme’
means ‘what I am talking about’ (or ‘what I am talking about now’); and, as any student
of rhetoric knows, the two do not necessarily coincide. (Halliday 1967, 212)
Selkirk (1984, 286ff) defines the correlation between intonational phrase and the
sense unit in (29), and in (30) she determines the sense unit as a complex of
constituents that stand either in a modifier-head or argument-head relation:
(29) The Sense Unit Condition on intonational phrasing
The immediate constituents of an intonational phrase must together
form a sense unit.
(30) Two constituents Ci, Cj form a sense unit if (a) or (b) is true of the
semantic interpretation of the sentence:
(a) Ci modifies Cj (a head)
(b) Ci is an argument of Cj (a head)
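Conditions (29) and (30) can be read as a simple check over pairs of constituents. The sketch below is a minimal illustration under stated assumptions: the table `RELATIONS` is a hand-written, hypothetical record of the semantic relations for the first sentence of the novel, the function name `forms_sense_unit` is invented, and only the two-constituent case described in (30) is covered.

```python
# Hypothetical record of semantic relations: (Ci, Cj) -> relation of Ci to the head Cj.
RELATIONS = {
    ("im achtzehnten Jahrhundert", "lebte in Frankreich"): "modifier",
    ("ein Mann", "im achtzehnten Jahrhundert lebte in Frankreich"): "argument",
}


def forms_sense_unit(ci, cj, relations):
    """(30): two constituents form a sense unit if one modifies the other (a head)
    or is an argument of the other (a head). Both orders are checked, since the
    labels Ci and Cj are arbitrary."""
    licensed = {"modifier", "argument"}
    return (relations.get((ci, cj)) in licensed
            or relations.get((cj, ci)) in licensed)
```

For example, `forms_sense_unit("im achtzehnten Jahrhundert", "lebte in Frankreich", RELATIONS)` holds by clause (a), while the pair "im achtzehnten Jahrhundert" and "ein Mann" is not licensed, since neither is recorded as modifier or argument of the other.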
This can be illustrated with (31). The first intermediate phrase im achtzehnten
Jahrhundert modifies the head lebte in Frankreich, and ein Mann ... is
an argument of the complex predicate im achtzehnten Jahrhundert lebte in
Frankreich.
(31) licensed by (30a): Modifier [Im achtzehnten Jahrhundert] | Head [lebte in Frankreich]
     licensed by (30b): Head [Im achtzehnten Jahrhundert | lebte in Frankreich], Argument [ein Mann, ...
Selkirk herself (1984, 295f) notes that the Sense Unit Condition is very closely
related to argument structure, so it does not cover preposed material
or nonrestrictive modifiers such as nonrestrictive relative clauses. The latter is a
typical instance of backgrounding, which expresses a discourse relation rather than
an argument-head relation, as illustrated by (32):
(32) [und sagte dem öffnenden Pater Terrier,] [einem etwa fünfzigjährigen
| kahlköpfigen, | leicht nach Essig riechenden Mönch:] [“Da!” ]
‘... and the minute they were opened by Father Terrier, a bald
monk of about fifty, with a faint odour of vinegar about him, she
said “There!”’
While the background information about Father Terrier is “embedded” into an
independent intonational phrase, this phrase itself is divided into three intermediate
phrases that each give one characteristic property of the person. Thus, it is not the
argument structure that triggers the intonational phrasing, but rather the discourse
relation of backgrounding.
(32a) Backgrounding
        [und sagte dem öffnenden Pater Terrier,]
        Enumeration
We can assign different discourse relations to the discourse units associated with the
intonational phrasing. A discourse unit is defined by its appropriateness to serve as
an argument in a discourse relation, rather than by its content or some other intrinsic
property. This means that we can only define discourse units by defining discourse
relations that operate on them.
(33a) [Tree diagram. Relation nodes: Causation, Conjunction. Leaves: “What’s that? asked Terrier”, “bending down over the basket”, “sniffing at it” (joined by Conjunction), “in the hope that it was something edible.”]
The relation between the first two sentences can be described by Continuation,
while the relation between the last clauses is Causation. Approaches to discourse
or text structure that use this kind of discourse relation are fairly widespread (e.g.
Mann & Thompson 1987, 1988 for Rhetorical Structure Theory (RST) or Asher
1993, 2004 for segmented DRT).
None of these approaches allows for subclausal discourse units and relations between
them. However, we have seen in the last sections that intonational phrasing often
corresponds to subclausal units. We have also said that discourse units are defined
by the relations they establish. If we assume subclausal discourse units we must also
define discourse relations that hold between them. In the following I discuss five
discourse relations: (i) non-restrictive modification, (ii) backgrounding, (iii)
enumeration, (iv) topicalization, and (v) frame-setting. While the first four are
discussed in the literature, the relation of frame-setting is new.
(34) [ein Mann | der zu den genialsten | und abscheulichsten Gestalten ....
gehörte]
‘a man who was one of the most gifted and abominable personages’
(34a) Main DRS: {t, u, x | 18th cent(t) & France(u) & Man(x) & live(x,u,t)}
      non-restr. Mod: {y | y = x & y ∈ most gifted personages}
                      {y | y = x & y ∈ most abominable personages}
5.2 Backgrounding
In the example (35) below, a more general type of backgrounding can be found.
Actually, there are even two levels of backgrounding: First the phrase in contrast to
the names of other gifted abominations and second the actual names. The discourse
relation of backgrounding relates these discourse units directly to the already
established main DRS — there is no need to wait for the interpretation of the actual
sentence. This is informally represented in (35a).
(35) [Er hieß | Jean-Baptiste Grenouille,] [und wenn sein Name]
His name was Jean-Baptiste Grenouille, and if his name –
(35a) Main DRS: {t, u, x, y, z | 18th cent(t) & France(u) & Man(x) & live(x,u,t) & y = x & z = J.B. Grenouille & name(y,z)}
      {l, m | name of l(m) & l = x & ?}
      in contrast to the names of other gifted abominations
      {a, b, c, d | de Sade(a) & Saint-Just(b) & Fouché(c) & Bonaparte(d)}
5.3 Enumeration
A classical case of independent units is enumeration, which is here illustrated by
(36). The intonational phrasing suggests that the discourse structure is constructed
via independent representations for each predicate NP with goat's milk, with pap,
and with beet juice, as given in (36a).
(36) [Jetzt könnt Ihr ihn selber weiterfüttern]
‘Now can you him yourselves feed
[mit Ziegenmilch, | mit Brei, | mit Rübensaft.]
with goat's milk, with pap, with beet juice.’
(36a) Main DRS: [x, y | feed(x,y,z), y = a, x = you]
      Enumerated units: [goat's milk(z)], [pap(z)], [beet juice(z)]
(37) H* H* H* H* H*
| | | | |
[sie hatte doch schon Dutzende | genährt, | gepflegt, | geschaukelt, | geküsst...|
after all she had fed, tended, cradled and kissed dozens of them...
(37a) [x, y | nurse(x), dozen-babies(y)]
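The construction in (36a) can be made concrete with a small data structure. The following is only a minimal sketch, assuming that a DRS can be approximated by a list of referents and a list of condition strings; the class names DRS and Discourse, the methods attach and units, and the relation label "enumeration" are illustrative assumptions and not part of SDRT or of the analysis proposed here.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class DRS:
    """A discourse representation: discourse referents plus conditions."""
    referents: List[str] = field(default_factory=list)
    conditions: List[str] = field(default_factory=list)


@dataclass
class Discourse:
    """A main DRS with subclausal units attached by named relations (hypothetical)."""
    main: DRS
    attachments: List[Tuple[str, DRS]] = field(default_factory=list)

    def attach(self, relation: str, unit: DRS) -> None:
        # Record the unit together with the relation that links it to the main DRS.
        self.attachments.append((relation, unit))

    def units(self, relation: str) -> List[DRS]:
        # Return all units attached by the given relation, in order of attachment.
        return [unit for rel, unit in self.attachments if rel == relation]


# Main DRS of (36a); the condition strings are copied from the figure.
discourse = Discourse(DRS(["x", "y"], ["feed(x,y,z)", "y = a", "x = you"]))

# One independent unit per phrase: mit Ziegenmilch | mit Brei | mit Rübensaft.
for condition in ["goat's milk(z)", "pap(z)", "beet juice(z)"]:
    discourse.attach("enumeration", DRS(conditions=[condition]))

enumerated = discourse.units("enumeration")
```

Keeping each enumerated unit as a separate object, rather than merging its condition into the main DRS, mirrors the claim that every intermediate phrase contributes an independent representation.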
5.4 Topicalization
Topicalization or thematization is one of the central concepts of the functional
sentence perspective of the Prague School, which was later adapted by Halliday and
others (see section 4.2). Steedman's analysis of the thematic structure of a sentence
focuses exactly on this aspect (see section 2.2 for discussion). The fragment (38)
(02-126) illustrates this. The theme-rheme or topic-comment partition establishes a
functor-argument structure on a sentence that is independent of the grammatical
relations. Since this issue has been discussed repeatedly, I will continue to the next
subclausal discourse relation.
(38) [also an den Füßen zum Beispiel | da riechen sie wie ein glatter | warmer | Stein]
‘Their feet for instance, they smell like a smooth warm stone
[wie frische Butter riechen sie.] [Und am Körper] [riechen sie wie... ]
They smell like fresh butter. And their bodies smell like...’
5.5 Frame-setting
The discourse relation of “frame-setting” is illustrated by the first sentence of the
second chapter (14), repeated as (39). The phrase einige Wochen später cannot be
the topic, since the topic is the introduced person or the thing the sentence is about.
However, it forms its own phrase. I therefore assume the discourse relation of
frame-setting: the phrase “sets the frame” for what is to come. Here it shifts
the reference time. The phrase can be integrated into the already established
discourse before the rest of the sentence is interpreted, as illustrated in (39a) (see
Maienborn 2003 for a related concept with the same name):
(39) [Einige Wochen später] [stand die Amme | Jeanne Bussie] ...(02-001)
‘Few weeks later stood the wet nurse Jeanne Bussie...’
(39a) Main DRS: [x, y, z, t1, ... t2 | wet nurse(x), ...]
      [t2 = few weeks later as t1]
      [u | stood(u), u = x, the wet nurse Jeanne Bussie(u)]
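The order of integration described for (39) can be sketched in the same informal way. The code below is a sketch under stated assumptions, not a definition of the frame-setting relation: the class Context and the functions set_frame and add_unit are hypothetical names, conditions are plain strings copied from (39a), and the reference time is modelled as a single label.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Context:
    """The established discourse: referents, conditions, current reference time."""
    referents: List[str] = field(default_factory=list)
    conditions: List[str] = field(default_factory=list)
    reference_time: str = "t1"


def set_frame(ctx: Context, new_time: str, condition: str) -> Context:
    # Integrate the frame-setting phrase on its own: introduce the new time
    # referent, relate it to the old one, and shift the reference time.
    return Context(ctx.referents + [new_time],
                   ctx.conditions + [condition],
                   new_time)


def add_unit(ctx: Context, referents: List[str], conditions: List[str]) -> Context:
    # Integrate a later unit; the reference time is left unchanged.
    return Context(ctx.referents + referents,
                   ctx.conditions + conditions,
                   ctx.reference_time)


# Established discourse before (39); the contents follow the figure in (39a).
ctx = Context(["x", "y", "z", "t1"], ["wet nurse(x)"])

# [Einige Wochen später] is integrated first ...
ctx = set_frame(ctx, "t2", "t2 = few weeks later as t1")
time_after_frame = ctx.reference_time

# ... and only then the rest of the sentence.
ctx = add_unit(ctx, ["u"], ["stood(u)", "u = x", "the wet nurse Jeanne Bussie(u)"])
```

The point of the sketch is only the ordering: the reference time has already shifted to t2 when the unit for the rest of the sentence is added.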
6. SUMMARY
The analysis presented here associates intonational phrasing with discourse units. I
have proposed an extension of Asher's SDRT with smaller discourse representations and
new relations between subclausal discourse representations. This allows us to assign
discourse functions to intonational phrases, including phrases that do not correspond
to entire clauses. Many more discourse relations remain to be defined, and I am
convinced that they can be defined in terms of discourse construction rules.
Universität Stuttgart
7. NOTES
*
The paper is a revised version of a talk given at the Topic/Focus Workshop at UC Santa Barbara,
July 2001, and at the Linguistic Circle at the University of Edinburgh, October 2002. I would like to
thank the audiences for their comments. In particular I would like to thank Jennifer Fitzpatrick, Carlos
Gussenhoven, Bob Ladd, Aditi Lahiri, and Mark Steedman for discussion of earlier versions of this paper,
and Daniel Büring, Matthew Gordon, and Chungmin Lee for editing this volume and for the very helpful
and constructive review of this paper. The research was supported by a Heisenberg Fellowship of the
German Science Foundation and by a research grant (HE 2259/9-2).
1
An intonational phrase boundary always coincides with an intermediate phrase boundary; we therefore
shorten “[|...|...|]” to “[...|...]”. Even though English and many other languages mark their intermediate
phrases by boundary tones, it is highly controversial whether German provides evidence for boundary
tones at intermediate phrases (Féry 1993, 59-79).
2
A short summary of the novel: “In the slums of 18th-century Paris a baby is born and abandoned, passed
over to monks as a charity case. But the monks can find no one to care for the child—he is too
demanding, and he doesn't smell the way a baby should smell. In fact, he has no scent at all.
Jean-Baptiste Grenouille clings to life with an iron will, growing into a dark and sinister young man
who, although he has no scent of his own, possesses an incomparable sense of smell. Never having
known human kindness, Grenouille lives only to decipher the odors around him, the complex swirl of
smells—ashes and leather, rancid cheese and fresh-baked bread—that is Paris. He apprentices himself to
a perfumer, and quickly masters the ancient art of mixing flowers, herbs, and oils. Then one day he
catches a faint whiff of something so exquisite he is determined to capture it. Obsessed, Grenouille
follows the scent until he locates its source—a beautiful young virgin on the brink of womanhood. As his
demented quest to create the “ultimate perfume” leads him to murder, we are caught up in a rising storm
of terror until his final triumph explodes in all of its horrifying consequences.” (Short description of the
English translation of the novel, Süskind 1987)
8. REFERENCES
Asher, Nicholas. Reference to Abstract Objects in Discourse. Dordrecht: Kluwer Academic, 1993.
Asher, Nicholas. “From Discourse Macro-Structure to Micro-Structure and Back Again: Discourse
Semantics and the Focus/Background Distinction.” In H. Kamp, and B. Partee (eds.), Context
Dependence in the Analysis of Linguistic Meaning. Amsterdam: Elsevier, 2004.
Braunschweiler, Norbert, Jennifer Fitzpatrick and Aditi Lahiri. The Konstanz Intonation Database:
German, Swiss German, American English, British English, East Bengali, West Bengali. University
of Konstanz, 1988ff.
Büring, Daniel. The 59th Street Bridge Accent. On the Meaning of Topic and Focus. London: Routledge,
1997.
Büring, Daniel. “On D-Trees, Beans, and B-Accents.” Linguistics and Philosophy 26 (2003): 511-545.
Féry, Caroline. German Intonational Patterns. Tübingen: Niemeyer, 1993.
Gussenhoven, Carlos. On the Grammar and Semantics of Sentence Accents. Dordrecht: Foris, 1984.
Halliday, Michael. “Notes on transitivity and theme in English. Part 1 and 2.” Journal of Linguistics 3
(1967): 37-81, 199-244.
Hayes, Bruce, and Aditi Lahiri. “Bengali intonational phonology.” Natural Language and Linguistic
Theory 9 (1991): 47-96.
Heim, Irene. The Semantics of Definite and Indefinite Noun Phrases. University of Massachusetts,
Amherst. Ann Arbor: University Microfilms, 1982.
Hobbs, Jerry. “The Pierrehumbert-Hirschberg Theory of Intonational Meaning Made Simple. Comments
on Pierrehumbert and Hirschberg.” In P. R. Cohen, J. Morgan, and M. E. Pollack (eds.), Intentions in
Communication, 313-323. Cambridge, Mass.: MIT, 1990.
Kamp, Hans. “A theory of truth and semantic interpretation.” In J. Groenendijk, T. Janssen, and M.
Stokhof (eds.), Formal Methods in the Study of Language, 277-322. Amsterdam: Amsterdam
Center, 1981.
Kamp, Hans, and Uwe Reyle. From Discourse to Logic. Introduction to Modeltheoretic Semantics of
Natural Language, Formal Logic and Discourse Representation Theory. Dordrecht: Kluwer, 1993.
Klein, Wolfgang and Christiane von Stutterheim. “Quaestio und referentielle Bewegung in Erzählungen.”
Linguistische Berichte 109 (1987): 163-183.
Ladd, Robert. Intonational Phonology. Cambridge: Cambridge University Press, 1996.
Maienborn, Claudia. Die logische Form von Kopula-Sätzen. Berlin: Akademie Verlag, 2003.
Mann, William, and Sandra Thompson. “Rhetorical Structure Theory: Description and Construction of
Text Structures.” In G. Kempen (ed.), Natural Language Generation. New Results in Artificial
Intelligence, Psychology, and Linguistics, 85-95. Dordrecht: Nijhoff, 1987.
Mann, William, and Sandra Thompson. “Rhetorical Structure Theory: Towards a Functional Theory of
Text Organisation.” Text 8.3 (1988): 243-281.
Nespor, Marina, and Irene Vogel. Prosodic Phonology. Dordrecht: Foris, 1986.
Pierrehumbert, Janet. The Phonology and Phonetics of English Intonation. Ph.D. Dissertation.
Cambridge, Mass.: MIT, 1980.
Pierrehumbert, Janet, and Julia Hirschberg. “The Meaning of Intonational Contours in the Interpretation
of Discourse.” In P. R. Cohen, J. Morgan, and M. E. Pollack (eds.), Intentions in Communication,
271-311. Cambridge, Mass.: MIT, 1990.
Roberts, Craige. “Information Structure in Discourse. Towards an Integrated Formal Theory of
Pragmatics.” In J.-H. Yoon, and A. Kathol (eds.), Ohio State University [=OSU] Working Papers in
Linguistics. vol. 49, 91-136. Columbus, Ohio, 1996.
Selkirk, Elisabeth. Phonology and Syntax. The Relation between Sound and Structure. Cambridge, Mass.:
MIT, 1984.
Selkirk, Elisabeth. “Sentence Prosody: Intonation, Stress, and Phrasing.” In J. Goldsmith (ed.), The
Handbook of Phonological Theory, 550-569. Oxford: Blackwell, 1995.
Sgall, Petr, Eva Hajičová, and Eva Benešová. Topic, Focus and Generative Semantics.
Kronberg/Taunus: Scriptor, 1973.
Steedman, Mark. “Structure and Intonation.” Language 67 (1991): 260-296.
Steedman, Mark. The Syntactic Process. Cambridge, Mass.: MIT, 2000.
Süskind, Patrick. Das Parfum. Die Geschichte eines Mörders. Zürich: Diogenes, 1985.
Süskind, Patrick. Das Parfum. Die Geschichte eines Mörders. Gelesen von Gert Westphal. Hamburg:
Litraton, 1995.
Süskind, Patrick. Perfume: The Story of a Murderer. Translated from the German by John E. Woods.
New York: Vintage Books, 2001.
Van Kuppevelt, Jan. “Discourse Structure, Topicality and Questioning.” Linguistics 31 (1995): 109-147.
Von Heusinger, Klaus. “Focus particles, sentence meaning, and discourse structure.” In W. Abraham, and
A. ter Meulen (eds.), The Composition of Meaning. From Lexeme to Discourse, 167-193. Amsterdam:
Benjamins.
Studies in Linguistics and Philosophy
Volumes 1–26 formerly published under the Series Title: Synthese Language Library.
79. G. Grewendorf and G. Meggle (eds.): Speech Acts, Mind, and Social Reality. Discussions with
John R. Searle. 2002 ISBN 1-4020-0853-8; Pb: 1-4020-0861-9
80. G.-J.M. Kruijff and R.T. Oehrle (eds.): Resource-Sensitivity, Binding and Anaphora. 2003
ISBN 1-4020-1691-3; Pb: 1-4020-1692-1
81. R. Elugardo and R.J. Stainton (eds.): Ellipsis and Nonsentential Speech. 2005
ISBN 1-4020-2299-9; Pb: 1-4020-2300-6
82. C. Lee, M. Gordon and D. Büring (eds.): Topic and Focus: Cross-linguistic Perspectives on
Meaning and Intonation. 2006 ISBN 1-4020-4795-9