The Complexities of Morphology
Edited by
PETER ARKADIEV
and
FRANCESCO GARDANI
OUP CORRECTED AUTOPAGE PROOFS – FINAL, 24/8/2020, SPi
Great Clarendon Street, Oxford, OX2 6DP,
United Kingdom
Oxford University Press is a department of the University of Oxford.
It furthers the University’s objective of excellence in research, scholarship,
and education by publishing worldwide. Oxford is a registered trade mark of
Oxford University Press in the UK and in certain other countries.
© editorial matter and organization Peter Arkadiev and Francesco Gardani 2020
© the chapters their several authors 2020
The moral rights of the authors have been asserted
First Edition published in 2020
Impression: 1
All rights reserved. No part of this publication may be reproduced, stored in
a retrieval system, or transmitted, in any form or by any means, without the
prior permission in writing of Oxford University Press, or as expressly permitted
by law, by licence or under terms agreed with the appropriate reprographics
rights organization. Enquiries concerning reproduction outside the scope of the
above should be sent to the Rights Department, Oxford University Press, at the
address above.
You must not circulate this work in any other form
and you must impose this same condition on any acquirer.
Published in the United States of America by Oxford University Press
198 Madison Avenue, New York, NY 10016, United States of America
British Library Cataloguing in Publication Data
Data available
Library of Congress Control Number: 2020932944
ISBN 978-0-19-886128-7
Printed and bound in Great Britain by
Clays Ltd, Elcograf S.p.A.
Links to third party websites are provided by Oxford in good faith and
for information only. Oxford disclaims any responsibility for the materials
contained in any third party website referenced in this work.
Contents
IV. DISCUSSION
13. Morphological complexity and the minimum description length approach
Östen Dahl
References
Language Index
Subject Index
Figures
2.1. Word types per inflection class across different granularities
2.2. Complexity measures across granularities of Russian nouns
2.3. Conditional entropy of real and a hundred Monte Carlo simulations of Russian nouns across granularities
2.4. Effect of the irregularity of each layer on system complexity (entropy difference)
3.1. Ackerman & Malouf (2015) mechanism for predicting unknown inflectional forms
4.1. Traditional languages and Aboriginal communities of the Victoria River District
4.2. Fixed and random effects used to measure the use vs. non-use of subject marking in Gurindji Kriol
5.1. Degrees of complexity in the predictability of a base lexeme’s base stem in a particular derivational relation R
5.2. Degrees of complexity in the restrictedness of stem X in the morphology of lexeme L, where X serves as L’s base stem in a particular derivational relation
7.1. Mean CC ± 1 standard deviation for three areal breakdowns and selected families
7.2. Complexity × longitude
7.3. Complexity and altitude in Daghestan (eastern Caucasus) for the three complexity counts
8.1. The language sample
8.2. Patterns of change in the language sample
9.1. Western Amazonian languages sampled
9.2. Kernel distribution of densities across the languages of this study
11.1. The meaning space of the experimental languages with the corresponding sentences from an example generation 0 language
11.2. A schematic representation of the chains in the normal (a), temporarily interrupted (b), and permanently interrupted (c) conditions
11.3. Change of the overspecification of agreement, as measured by expressibility, over time
11.4. Relative frequency of the agreement marker which denoted the round animal in the initial language of the chain
Tables
1.1. Case paradigm of Turkish ev ‘house’ and Lithuanian miestas ‘city’
1.2. Sample paradigms of Lithuanian nouns
2.1. An example of morphosyntactically conditioned stress alternation in Russian nouns
2.2. Illustration of the four-class system, based on inflectional suffixes
2.3. Illustration of stress classes of Russian nouns
2.4. Number of nominal inflection classes of Russian nouns as a function of which paradigmatic layers are included
3.1. Warlpiri verb inflection classes
3.2. Examples of inflected classifier forms
3.3. Examples of classifier forms and their formative analyses
3.4. Inflectional exponence of na ‘(27)’
3.5. Variably inflected classifier stem forms
3.6. Allomorphs selected by Ackerman & Malouf (2015) simplification mechanism
3.7. Exponence probabilities of older and newer forms
3.8. Classifier stem paradigm for ma ‘(34)’
3.9. Classifier stem paradigm for ɾa ‘(28)’
4.1. Allomorphic reduction in subject marking in Gurindji Kriol
4.2. Comparison of case systems and allomorphy across three generations
4.3. Occurrence of subject marking in adult Gurindji Kriol speakers according to predictors
4.4. Output of generalized linear mixed model analysis on 3,575 tokens
4.5. Relative effect of the significant predictors according to dominance analysis
4.6. Occurrence of subject marking in child Gurindji Kriol speakers according to predictors
4.7. Output of generalized linear mixed model analysis on 2,975 tokens
4.8. Relative effect of the significant predictors according to dominance analysis
5.1. Patterns of syncretism in the French paradigm (Bonami et al. 2013)
5.2. Comparison of and .3 forms in French with long and short forms in Mauritian
5.3. Sample comparison of long and short forms in four French-based creoles
5.4. Stem space of ‘to form’, ‘to finish’, and ‘to defend’
5.5. Verb alternations in Mauritian
5.6. Reduplication in Mauritian
5.7. Deverbal nominalizations in Mauritian
5.8. Verb alternations in Guadeloupean
5.9. Deverbal nominalizations in Guadeloupean
5.10. Verb alternations in Haitian
5.11. Deverbal nominalizations in Haitian
5.12. Complexity of derivational relations in French, Mauritian, Guadeloupean, and Haitian
7.1. Gender unpredictability for some example languages
7.2. Areal and family breakdown
7.3. Complexity values for four historical groups of languages
8.1. Third person pronouns in standard Swedish
8.2. Clustering of patterns of change at language-family edges within Eurasia
8.3. Direction of change and asymmetries in the structure of the population and/or prestige dynamics
9.1. Anderson’s (2015a) schematization of morphological complexity
9.2. Similar classifier forms in Guaporé-Mamoré languages (van der Voort 2005: 397)
9.3. Evidentiality and tense in Matses (Panoan; Fleck 2007: 593)
9.4. Number of morphemes coded in this study by language and functional domain
9.5. Number of allomorphs per morpheme attested across the sample
9.6. Percentage of morphemes for each EC value across the languages sampled
9.7. Rank correlations between EC level and bound status values across languages
9.8. Rank correlations between EC level and contiguity value across languages
9.9. Rank correlations between EC level and prosodic dependence across languages
10.1. Wolof noun class markers
11.1. An example of a final language with a fully preserved agreement system
11.2. An example of a language with a fully lost agreement system
11.3. A language with a fully lost agreement system
11.4. A language with an irregular distribution of the agreement markers
13.1. Hypothetical noun inflection templates
List of Abbreviations
1 first person
2 second person
3 third person
A most agent-like or experiencer-like argument of transitive; A-class verb
ablative
abilitative
absolutive
accusative
ACLA Aboriginal Child Language (project)
grammatical agent
animate
. anaphoric pronoun
antipassive
appositional mood
applicative
. ‘article of noun’
aspect
associative
augmentative
auxiliary
BGW Bininj Gun-Wok; Gunwingguan, northern Australia
8 noun class 8 plural
causative
CAY Central Alaskan Yup’ik
CC canonical complexity
cislocative
classifier; class marker
completive
contrast
comitative
connector
conditional
continuative
contrastive
copula
direct case marker
declarative
definite; default (in Mansfield and Nordlinger, Chapter 3)
demonstrative
desiderative
determiner
indexical marker
different event
diminutive
. direct experience evidential
discourse marker
discontinuitive
dual
duplicative
dynamic
E-class verb
APL applicative
EC enumerative complexity; exponence complexity
E-complexity enumerative complexity
ELAP Endangered Languages Project
ergative
evidential
eyewitness
feminine
factual
focus
Fr. French
vowel frontness
frustrative
future
G more goal-like argument of ditransitive
geminate
genitive
GLMM Generalized Linear Mixed Models
GYN Gbe languages, Yoruba, and Nupe
habitual (aktionsart)
high vowel height
hearsay
IALL iterated artificial language learning
I-complexity Integrative complexity
IC inflectional class; inventory complexity
IE Indo-European
intransitive inanimate verb
immediate
imperative
imperfective
inanimate
inchoative
indicative
indefinite
infinitive
intentional
intransitive (subject orientation)
interactional
irrealis
joint agency
L lexeme
long form
linking particle
linker
locative
low vowel height
masculine
MDL Minimum Description Length
middle
middle marker
neuter
N noun
NC noun class
negation
non-feminine
non-future
nominative
non-eyewitness evidential
NP noun phrase
non-past
non-singular
nonvisual
object; object of monotransitive
object
oblique
optative
2 second position
passive
grammatical patient
paucal
PCFP Paradigm Cell Filling Problem
perfective
peripheral
plural
P.N. proper name
potential
POS parts of speech
possessive
Poss possessor
process verbalization
present
pronoun
progressive
proprietive
prothetic vowel
presentational
partitive
proximate
past
past irrealis
realis
recent
reciprocal
reduplication
referential focus
relative
remote
respect
reflexive
reportative
ɾ-alternation
S subject; sole argument of intransitive
same event
SD standard deviation
sequential
short form
singular
simultaneous
semelfactive
same subject
stative (aktionsart)
strong form
suppletive
SV subject-verb
T more theme-like argument of ditransitive
transitive animate verb
TAM tense/aspect/mood
topic advancing voice
temporal
thematic suffix
topic
transitive
translocative
UG Universal Grammar
V verb
venitive
locative verbalization
VN verb-noun
VS verb-subject
weak form
Y/N yes/no
The Contributors
Peter Arkadiev holds a PhD in theoretical, typological, and comparative linguistics from
the Russian State University for the Humanities and a habilitation degree from the Russian
Academy of Sciences. Currently he is Senior Researcher at the Institute of Slavic Studies of
the Russian Academy of Sciences and Assistant Professor at the Russian State University
for the Humanities. His fields of interest include language typology and areal linguistics,
morphology, case and alignment systems, tense-aspect, Baltic and Northwest Caucasian
languages. He has co-edited Contemporary Approaches to Baltic Linguistics (with Axel
Holvoet and Björn Wiemer) and Borrowed Morphology (with Francesco Gardani and Nino
Amiridze), both published by De Gruyter Mouton in 2015.
Aleksandrs Berdicevskis is a researcher in computational linguistics at the University of
Gothenburg, Sweden. At the time of writing he was Assistant Professor at Uppsala
University. He has worked on experimental and quantitative approaches to language
change and evolution with a focus on Slavonic languages. He has also participated in the
development of TOROT (Tromsø Old Russian and Old Church Slavonic Treebank) and
related resources. In his PhD dissertation (University of Bergen) he investigated linguistic
innovations in Russian computer-mediated communication.
Felicity Meakins is ARC Future Fellow in Linguistics at the University of Queensland and
Chief Investigator in the ARC Centre of Excellence for the Dynamics of Language. She is a
field linguist who specializes in the documentation of Australian Indigenous languages in
the Victoria River District of the Northern Territory and the effect of English on
Indigenous languages. She has worked as a community linguist as well as an academic over
the past twenty years, facilitating language revitalization programmes, consulting on
Native Title claims, and conducting research into Indigenous languages. She has compiled
a number of dictionaries and grammars of traditional Indigenous languages, and has
written numerous papers on language change in Australia.
Jeff Parker is Assistant Professor of Linguistics at Brigham Young University. His research
centres around better understanding inflectional structure from different methodological
perspectives, including investigations into how language-specific traits contribute to the
complexity of inflection class systems, how inflectional structure affects lexical access of
inflected forms, and how computational models of learning help explain typological
tendencies in inflection class systems. He has published in journals such as Morphology,
Word Structure, and The Mental Lexicon, as well as the Slavic-focused Slavic and East
European Journal. He is also co-editor of Morphological Typology
and Linguistic Cognition (forthcoming, with Andrea D. Sims, Adam Ussishkin, and
Samantha Wray).
Arturs Semenuks is a PhD student in the Department of Cognitive Science at the
University of California, San Diego. He uses experimental and computational methods to
investigate what sociocognitive pressures affect the structure of language, especially its
morphological complexity, as well as what constraints exist on how language can be
structured in principle, and how language affects human thought. His previous work at the
University of Essex focused on the relationship between sentence processing costs and
acceptability judgements.
Andrea D. Sims is Associate Professor at The Ohio State University, jointly appointed in
the Department of Linguistics and Department of Slavic and East European Languages
and Cultures. Much of her research focuses on the internal organization of inflection class
systems (defectiveness and irregularity, syncretism, inflection class complexity) and factors
influencing its emergence, reinforcement, and generalization. She is author of a research
monograph, Inflectional Defectiveness (2015), co-author of a morphology textbook,
Understanding Morphology (2nd edn, 2010, with Martin Haspelmath), and co-editor of
Morphological Typology and Linguistic Cognition (forthcoming, with Adam Ussishkin, Jeff
Parker, and Samantha Wray).
Sasha Wilmoth is a PhD candidate at the Centre of Excellence for the Dynamics of
Language at the University of Melbourne, Australia, working on intergenerational
variation and change in Pitjantjatjara. She completed her BA (Hons) degree at the
University of Melbourne. She was previously a Research Assistant at the University of
Queensland, and Linguistic Project Manager at Appen, a Sydney-based company which
provides specialized linguistic data and services for speech and language technologies. Her
research interests include morphology, syntax, and digital methods for language
documentation.
OUP CORRECTED AUTOPAGE PROOFS – FINAL, 23/8/2020, SPi
1
Introduction
Complexities in morphology
Peter Arkadiev and Francesco Gardani
Peter Arkadiev and Francesco Gardani, Introduction: Complexities in morphology. In: The Complexities of Morphology. Edited
by: Peter Arkadiev and Francesco Gardani, Oxford University Press (2020). © Peter Arkadiev and Francesco Gardani.
DOI: 10.1093/oso/9780198861287.003.0001
philosophers of the early nineteenth century about the ‘complex’ classic Indo-
European languages as opposed to the ‘primitive’ languages of ‘uncivilized people’
to explicit statements that all languages are equally complex. The latter view,
which is known under the label of the ‘equicomplexity hypothesis’, takes into account
obvious differences between languages in the mere degree of elaboration of
different structural subdomains (such as, e.g., vowels vs. consonants or nominal
vs. verbal morphology); it states that ‘these isolable properties may hang together
in such a way that the total complexity of a language is approximately the same for
all languages’ (Wells 1954: 104; see also Hockett 1958: 180). Such a position, which
is still commonly held by linguists of different backgrounds and theoretical
persuasions (see, again, Joseph & Newmeyer 2012: 348–9; and Miestamo 2017),
has been challenged by others, who have shown that ‘complexity in one area of
grammar [correlates] positively with complexity in another area’ (Sinnemäki
2014: 190).
With the development of contact linguistics and especially of pidgin and creole
studies in the second half of the twentieth century, claims started being made that
pidgins and creoles are structurally overall simpler than languages with a ‘regular’
sociolinguistic history (see, e.g., such work as Bickerton 1984; McWhorter 2001,
2005; Parkvall 2008; Bakker et al. 2011; Good 2012b, 2015), and, more generally, it
has been claimed that linguistic complexity is subject to diachronic change and the
effects of language contact (see Dahl 2004 and Trudgill 2011). As a matter of fact,
statements to the effect that sociolinguistic parameters such as the number of
speakers and degree of contact with other languages affect the complexity of
linguistic (sub)systems go back as early as Jakobson (1929) and Trudgill (1983).
Once it had been recognized that morphological complexity is a parameter of
crosslinguistic variation, the urge arose to develop non-impressionistic and cross-
linguistically applicable ways of measuring and quantifying the degree of mor-
phological complexity of individual languages. The most important proponent of
this line of thought is certainly Greenberg (1954), who developed a methodology
of quantitative measurement of different types of morphological structure, the
most famous of which is the ‘synthetic index’ (p. 185), that is, morpheme-to-
word¹ ratio in a sample of texts, which arranges languages into a continuum
spanning from radically isolating to polysynthetic. This simple metric, however, is
clearly insufficient for the assessment of morphological complexity, since morph-
ology is much more than the mere arrangement of morphemes into words. As a
simple illustration, consider the case-number paradigms of Turkish (Lewis 2001:
28) and Lithuanian (P.A.’s own knowledge) nouns in Table 1.1.
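Greenberg’s synthetic index is simple enough to state in a few lines of code. The sketch below rests on our own assumptions. The function name `synthetic_index` is hypothetical, as is the input convention: words are separated by whitespace and morpheme boundaries are marked with a hyphen. The segmentation of the Turkish forms of ev ‘house’ is also ours. The sketch does not reproduce Greenberg’s own text-sampling procedure.

```python
def synthetic_index(segmented_text: str, boundary: str = "-") -> float:
    """Return the morpheme-to-word ratio of a pre-segmented text.

    Assumed (hypothetical) input convention: words are separated by
    whitespace, and morpheme boundaries inside a word are marked with
    `boundary`.
    """
    words = segmented_text.split()
    if not words:
        raise ValueError("text contains no words")
    # Each word contributes as many morphemes as it has boundary-separated parts.
    morphemes = sum(len(word.split(boundary)) for word in words)
    return morphemes / len(words)


# Illustrative input: Turkish forms of ev 'house', segmented by hand.
sample = "ev ev-ler ev-de ev-ler-de"
print(synthetic_index(sample))  # 1 + 2 + 2 + 3 = 8 morphemes / 4 words = 2.0
```

The ratio counts segments only. It records nothing about cumulation, allomorphy, or the predictability of forms, which is the limitation the text goes on to illustrate.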
Both Turkish and Lithuanian have two number and six case values, yielding
twelve word forms. However, while in Turkish case and number are expressed
Table 1.1. Case paradigm of Turkish ev ‘house’ and Lithuanian miestas ‘city’
² In this connection, Haspelmath (2009) has shown that parameters traditionally attributed to
‘flexion’, as opposed to ‘agglutination’, such as cumulation, stem allomorphy, and affix allomorphy,
are logically and empirically independent of each other.
Table 1.2. Sample paradigms of Lithuanian nouns
I hard ‘man’ () I soft ‘horse’ () II hard ‘day’ () II soft ‘bee’ () III hard ‘son’ () IV (soft) ‘night’ ()
I a.p. III a.p. IV a.p. II a.p. III a.p. IV a.p.
These problems constitute the main research questions of this volume, which
aims to tackle them in a principled way, by presenting a collection of original
research papers on different aspects of morphological complexity. This introduc-
tory chapter is meant to outline the field and take the reader through the volume,
and it is organized as follows: section 1.2 pursues the question of the scope of
‘morphological complexity’; section 1.3 surveys several conceptions and meth-
odological approaches to morphological complexity distinguishing between two
main types: formal approaches (section 1.3.1) and psycholinguistic approaches
(section 1.3.2). Section 1.4 presents the structure of the volume and summarizes
the contributions to it.
works already mentioned, while recently, Johanna Nichols (2009) has hinted at a
possible metric of morphological complexity related to non-canonicity (a proposal
she fully develops in Chapter 7, this volume). Most studies of non-canonical
phenomena in morphology have focused on the paradigmatic axis; however,
nothing per se precludes the application of this notion to syntagmatic phenomena,
such as combinatorics and mutual order of affixes (here comes to mind the
distinction between semantically driven layered organization of morphology vs.
opaque templatic morphology; see Stump 2006b, Good 2016), concatenative vs.
non-concatenative exponence, morphophonological transparency vs. opacity and
other issues belonging to the domain of morphotactics. It remains an empirical as
well as a conceptual question, though, which kind of morphotactic organization
should be considered ‘canonical’ and ‘less complex’. For instance, in languages
where affix order directly reflects semantics, it is usually possible to permute
certain affixes depending on their mutual scope (Rice 2011; Mithun 2016);
whether such deviations from fixed ordering constitute additional complexity is
not at all obvious.
Although teleologically different, Natural Morphology (Dressler et al. 1987;
Dressler & Kilani-Schoch 2016; Dressler 2019) is likewise centred on the idea of deviation
from a core.³ Aiming to account for morphological preferences on the basis of
extralinguistic motivations, it theorizes a semiotically derived notion of natural-
ness, defined as the immediate, most unmarked, cognitively easiest, and thus
universally preferred option. Conversely, naturalness-defining criteria determine
deviation from the (most) natural option. This framework makes clear that other
factors come into play in the conception and interpretation of morphological
complexity, such as, for example, transparency vs. opacity of forms or morpho-
tactic rules. As Hengeveld & Leufkens (2018: 141) observe, ‘languages may be
complex, yet transparent, or simple, yet opaque’. To take a concrete case, the
Turkish vs. Lithuanian data in Table 1.1 show that Turkish morphology is more
complex in the sense that a single word form may potentially contain a high
number of morphemes. At the same time, however, it is transparent in that every
morpheme corresponds to one fixed meaning, while Lithuanian morphology is
more opaque. In the framework of Natural Morphology, Dressler (2011) views
unnaturalness as a source of complexity and morphological complexity as the sum
of all morphological categories, rules, and inflectional classes of a language,
including both productive and unproductive patterns. Distinguishing between
productive and unproductive patterns, he considers morphological complexity a
hyperonym of morphological richness, which is conceived only in terms of
productive patterns (Dressler 2003: 47; see also Dressler, Kononenko, et al.
³ Note that, while qualitatively oriented, both Natural Morphology and Canonical Typology are
implicitly able to quantify degrees of complexity by computing the degree of deviation from the natural
core or canon, respectively.
2019). This distinction between active and static parts of morphology is, in our
view, not only of crucial importance with respect to psycholinguistic approaches
to complexity but also foundational to approaches focused on predictability, as we
will see below.
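The contrast drawn above can be illustrated with a small check over glossed suffixes. In Turkish each morpheme has one fixed meaning, whereas the Lithuanian forms are more opaque. This is a minimal sketch and not a method from the literature cited here. The function name `exponence_profile` is hypothetical, and so is the input format of (suffix, gloss) pairs. So too is the convention that a ‘.’ inside a gloss marks a bundle of feature values. The check flags two of the properties that Haspelmath (2009) treats as independent: cumulation and affix allomorphy.

```python
from collections import defaultdict


def exponence_profile(pairs):
    """Flag cumulative exponence and affix allomorphy in glossed suffixes.

    `pairs` is an iterable of (suffix, gloss) tuples. By our own assumed
    convention, a gloss containing '.' (e.g. 'NOM.PL') is treated as a
    bundle of several feature values.
    Returns (cumulative_suffixes, allomorphic_glosses).
    """
    glosses_by_suffix = defaultdict(set)
    suffixes_by_gloss = defaultdict(set)
    for suffix, gloss in pairs:
        glosses_by_suffix[suffix].add(gloss)
        suffixes_by_gloss[gloss].add(suffix)
    # A suffix is cumulative if it expresses more than one feature value at once.
    cumulative = {s for s, gs in glosses_by_suffix.items() if any("." in g for g in gs)}
    # A gloss is allomorphic if more than one suffix realizes it.
    allomorphic = {g for g, ss in suffixes_by_gloss.items() if len(ss) > 1}
    return cumulative, allomorphic


# Illustrative data: ev-ler, ev-de, ev-in, at-lar 'horses' (Turkish);
# miest-as, miest-ai, miest-ams (Lithuanian).
turkish = [("ler", "PL"), ("de", "LOC"), ("in", "GEN"), ("lar", "PL")]
lithuanian = [("as", "NOM.SG"), ("ai", "NOM.PL"), ("ams", "DAT.PL")]

print(exponence_profile(turkish))     # no cumulation; PL has two allomorphs
print(exponence_profile(lithuanian))  # every suffix is cumulative
```

The Turkish plural allomorphs conditioned by vowel harmony show why the two properties are reported separately. A system can be non-cumulative and still display affix allomorphy.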
Finally, information-theoretic approaches play down the role of combinatorics
and construe morphological complexity in terms of predictability and entropy.
Their development is intimately related to word-and-paradigm models of morph-
ology, which consider inflectional systems as networks of implicative relations
holding between fully-inflected word forms. Consequently, they aim to under-
stand to what extent the choice of exponence for a given cell is predictable from
any other information available to the speaker, with complexity being in an
obvious inverse relation to predictability (cf. Finkel & Stump 2007, 2009; Stump
& Finkel 2013). Ackerman & Malouf (2013) propose the term ‘integrative com-
plexity’, based on the notion of entropy as ‘a measure of the reliability of guessing
unknown forms on the basis of known ones’, that is, a measure of predictability.
They start from the intuition that ‘speakers must generalize beyond their direct
and limited experience of particular words’ (p. 436) and posit a ‘Low Entropy
Conjecture’: morphological systems, such as paradigms, in which conditional
entropy among related word forms is low, are more efficient, as they ‘permit
these crucial inferences to be made easily’ (p. 436) (cf. ‘Paradigm Structure
Conditions’ of Wurzel 1989).⁴ In other words, complexity derives from opaque
intraparadigmatic relations, for opacity hampers the predictability and predictive-
ness among word forms in a lexeme’s paradigm. The ‘Low Entropy Conjecture’ is
supported by recent studies on inflection class systems clearly violating the
enumerative complexity-based constraints of the kind proposed by Carstairs-
McCarthy (see Baerman 2012, 2016; Sims 2015).⁵
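The following sketch makes the notion of conditional entropy among word forms concrete. For a toy inflection class system, it computes H(cell j | cell i) and averages it over all ordered pairs of distinct cells. It is a simplification in the spirit of Ackerman & Malouf (2013), not their implementation. The four classes, the three cells, and the exponents are invented. All classes are weighted equally, ignoring type frequency. Only exponents are compared, not full word forms.

```python
from collections import Counter, defaultdict
from itertools import permutations
from math import log2

# Hypothetical toy system: each class lists one exponent per paradigm cell.
CELLS = ("nom.sg", "gen.sg", "nom.pl")
CLASSES = {
    "I": ("-a", "-i", "-e"),
    "II": ("-a", "-u", "-e"),
    "III": ("-o", "-i", "-a"),
    "IV": ("-o", "-u", "-a"),
}


def entropy(counts):
    """Shannon entropy (in bits) of a distribution given as raw counts."""
    total = sum(counts)
    return -sum((c / total) * log2(c / total) for c in counts if c)


def conditional_entropy(classes, i, j):
    """H(cell j | cell i), assuming every inflection class is equally likely."""
    outcomes = defaultdict(Counter)  # exponent in cell i -> counts of exponents in cell j
    for forms in classes.values():
        outcomes[forms[i]][forms[j]] += 1
    n = len(classes)
    # Weight the entropy of each conditional distribution by P(exponent in cell i).
    return sum(
        (sum(ctr.values()) / n) * entropy(list(ctr.values()))
        for ctr in outcomes.values()
    )


def average_conditional_entropy(classes, n_cells):
    """Mean of H(j | i) over all ordered pairs of distinct cells."""
    pairs = list(permutations(range(n_cells), 2))
    return sum(conditional_entropy(classes, i, j) for i, j in pairs) / len(pairs)


print(conditional_entropy(CLASSES, 0, 2))  # 0.0: nom.pl is predictable from nom.sg
print(conditional_entropy(CLASSES, 0, 1))  # 1.0: gen.sg is a coin toss given nom.sg
print(round(average_conditional_entropy(CLASSES, len(CELLS)), 3))  # 0.667
```

In this toy system one cell fully predicts another, while the remaining pairs each leave one bit of uncertainty. A lower average corresponds to a system that, in Ackerman & Malouf’s terms, permits inferences about unknown forms to be made more easily.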
The approaches to formal morphological complexity surveyed thus far share
the potential to quantify the degree of complexity. However, some typological studies
have pursued the topic without a focus on metrics. One line of investigation, for
example, has concerned the relation of (certain aspects of) morphological com-
plexity to other typological parameters, such as phonological systems (Shosted
2006; Fenk-Oczlon & Fenk 2008, 2014) and word order (e.g., Sinnemäki 2008; Bentz
& Christiansen 2013), among others. Other studies have focused on the differen-
tial elaboration of nominal and verbal morphology (e.g., Nichols 1986, 1992;
Mithun 1988; Kibrik 2012). In this domain, there are still more open questions
than established answers, partly because of the lack of consensus as regards the
⁴ Morphomic stem distributions have also been interpreted in terms of predictive relations by
Blevins (2016b: 123), a view partly criticized by Maiden (2018: 23–4).
⁵ It is likely that a conception of complexity based on entropy applies better to inflection than word
formation because inter-word relations are generally much more complex in inflectional than in
derivational paradigms.
definition of the relevant aspects of complexity and the appropriate ways of
measuring them.
Still another line of research is concerned with the relation between morpho-
logical complexity and sociolinguistic typology. In section 1.1, we already men-
tioned the idea that pidgins and creoles are in general less complex than languages
with a long history and uninterrupted transmission. More generally, in recent
work (e.g., Trudgill 1997, 2009, 2011, 2017; Kusters 2003, 2008; McWhorter 2007,
2008; Lupyan & Dale 2010; Bentz & Winter 2013; Bentz et al. 2015; Bentz 2016),
claims have been advanced that the overall degree of complexity as well as certain
particular types of grammatical complexity correlate with such socioecological
conditions of language use as high vs. low degree of contact, number of adult
learners, size and geographic expansion of the speaker population, and some
others (see also Tinits 2014 for a behavioural experiment with a miniature
artificial language). Significantly, most such studies have focused on simplifi-
cation caused by language contact (see Dorian 1978; McWhorter 2001; among
many others), emphasizing that morphological complexity requires long-term
periods of socioecological stability to develop (Dahl 2004). Nevertheless, studies
exist showing that certain types of language contact (e.g., those involving stable
childhood multilingualism) can help to preserve complex patterns (Trudgill
2011; Mithun 2015) and even result in increase rather than loss of morphological
complexity due to borrowing and contact-induced grammaticalization (see
Vanhove 2001; Aikhenvald 2002, 2003a; de Groot 2008; Loporcaro 2018;
Loporcaro et al. forthcoming). Likewise, processes of language genesis brought about
by language contact are not necessarily accompanied by morphological simplifi-
cation. In a study on the rapid birth of a new mixed language in Australia,
Gurindji Kriol, from the admixture of Gurindji and Kriol, Meakins et al. (2019)
demonstrate that there was no preferential adoption into Gurindji Kriol of less
complex variants and that, in fact, complex Kriol variants were more likely to be
adopted than simpler Gurindji equivalents. Given that Gurindji Kriol is the
primary language of the younger generation in the Gurindji community,
Meakins et al. interpret these results in light of the fact that the acquisition of
morphology in morphologically complex languages is less challenging for children
than for adults (cf. also Miestamo 2008). The issue of ease vs. difficulty of process-
ing in language acquisition leads us to the second main type of morphological
complexity introduced in section 1.3, viz. psycholinguistic morphological
complexity.
As we have seen in the previous section, Natural Morphology and Ackerman
& Malouf’s (2013) integrative complexity also appeal to ease in processing and
In section 1.1, we identified four issues we deem among the most urgent to solve in
research on morphological complexity. In order to tackle these issues in a prin-
cipled way, we convened a dedicated workshop ‘Morphological Complexity:
Empirical and Cross-Linguistic Approaches’ at the 48th Societas Linguistica
Europaea (SLE) meeting in Leiden in 2015. The present volume is a collection
of original research papers consisting in equal measure of papers delivered at the
workshop and of invited contributions. (Each chapter was subject to a threefold
reviewing process consisting of an anonymous external review, a non-
anonymous internal review performed by a fellow contributor, and comments
by the editors.) The volume features: (a) various theoretical, methodological, and
typological perspectives on morphological complexity (from ‘classic’ morpho-
logical description to experimental and information-theoretic approaches); (b)
both detailed investigations of individual languages and wider crosslinguistic
studies; (c) synchronic and diachronic analyses; (d) a broad coverage of topics
including structural and sociolinguistic issues, such as the development of mor-
phological complexity under different sociohistorical conditions (prominently,
language contact); (e) empirical evidence drawn from languages from all contin-
ents and belonging to a number of typologically diverse language families.
Unfortunately, the volume does not cover the complexity of word formation
and the complexity of sign language morphology. We hope that future research
will take care of these issues.
The volume, introduced by the present chapter, consists of three parts organized according to the chapters’ main focus and scope, and is closed by a discussion in Chapter 13 by Östen Dahl on the volume’s contributions and on the minimum description length approach. Part I includes five chapters dealing with issues of morphological complexity from a language-specific perspective. Jeff Parker and Andrea Sims’s Chapter 2, ‘Irregularity, paradigmatic layers, and the complexity of inflection class systems: A study of Russian nouns’, follows Stump & Finkel’s (2013: 55) definition of the complexity of an inflection class system as ‘the extent to which the system inhibits motivated inferences about a lexeme’s full paradigm of realized cells [ . . . ]’. Using data from Russian, the authors explore the implications of gradient (ir)regularity for measuring and comparing the complexity of inflection class systems. They find that some, but not all, less regular inflectional patterns significantly increase the complexity of the system, but that the increased complexity is mitigated by structural and distributional properties of the inflectional system. In Chapter 3, ‘Demorphologization and deepening complexity in Murrinhpatha’, John Mansfield and Rachel Nordlinger investigate diachronic changes in the complexity of verb inflection in Murrinhpatha, a polysynthetic non-Pama-Nyungan language of northern Australia, which displays a high level of
measures such as the one developed by Nichols (Chapter 7, this volume) are also
possible, but they are not holistic, either, and in many cases are based on a
significant reduction of empirical data.
Finally (question 4), we wanted to investigate the role of such extra-morphological factors as diachronic development and (in)stability, susceptibility to loss vs. spread in situations of language contact and, more generally, sociolinguistic and socioecological parameters in affecting morphological complexity. As several chapters in this volume have demonstrated, despite occasionally diverging results, the study of the correlation between morphological complexity and extralinguistic factors, such as the role of language contact or speakers’ sociolinguistic attitudes, is fruitful and promising.
Of course, the answers we have provided here are perforce partial and far from definitive, as many more case studies and much more comparative evidence are necessary to arrive at a reliable picture of such complex phenomena as morphological complexities. We hope that future research will pursue these pathways.
Acknowledgements
The volume’s editors wish to thank the authors, the external reviewers, and our editors
at OUP. The support of the Swiss National Science Foundation (SNF CRSII1_160739)
is gratefully acknowledged. We also thank Aleksandrs Berdicevskis, Wolfgang
Dressler, Michele Loporcaro, and Franz Rainer for their insightful comments on a
preliminary version of this introductory chapter.
I. THE LANGUAGE-SPECIFIC PERSPECTIVE
2. Irregularity, paradigmatic layers, and the complexity of inflection class systems: A study of Russian nouns
Jeff Parker and Andrea D. Sims
Jeff Parker and Andrea D. Sims, Irregularity, paradigmatic layers, and the complexity of inflection class systems: A study of Russian nouns. In: The Complexities of Morphology. Edited by: Peter Arkadiev and Francesco Gardani, Oxford University Press (2020). © Jeff Parker and Andrea D. Sims.
DOI: 10.1093/oso/9780198861287.003.0002
2.1 Introduction
In this chapter, we explore the role that irregularity and non-affixal exponence play in the complexity of inflection class systems.² Recent typological studies of inflection class complexity have focused on the implicative structuring of inflection classes and the extent to which this structure is informative about the exponence of inflected forms (Ackerman et al. 2009; Ackerman & Malouf 2013; Blevins et al. 2017; Bonami & Beniamine 2015; Sims 2015; Sims & Parker 2016; Stump & Finkel 2013). This is reflected in the way that Stump & Finkel define the complexity of an inflection class system as ‘the extent to which the system inhibits motivated inferences about a lexeme’s full paradigm of realized cells from subsets of its cells’ (Stump & Finkel 2013: 55; emphasis ours). Throughout this chapter we will assume a similar definition; see (1).
(1) Complexity of an inflection class system: the average extent to which the system inhibits motivated inferences about the realized form of a lexeme, given one or more other realized forms of the same lexeme.
¹ As another example, even PARSLI (PARadigm Shape and Lexicon Interface), which is designed to explicitly represent non-canonical inflectional properties like stem change, defectiveness, overabundance, etc., does not include non-segmental information like stress as a possible deviation from canonicity (Walther 2017).
² Since inflection classes are an example of a purely morphological phenomenon, that is, not
syntactically relevant, this type of complexity seems to avoid the problematic questions about the
division between morphology and syntax (see discussion in Arkadiev & Gardani, Chapter 1, this
volume).
³ For a distinct but somewhat related notion, see the discussion of ‘relative’ and ‘absolute’ measures
of complexity in Miestamo (2008) inter alia. Miestamo’s discussion of relative approaches focuses on
psycholinguistic and acquisition-oriented approaches/evidence. While our information-theoretic
measures are not psycholinguistic in nature, they (and their use in previous work, for example,
Ackerman et al. 2009) could be classified as relative in terms of their focus on the potential ‘cost and
difficulty to language users’ (Miestamo 2008: 24). (See also discussion in Arkadiev & Gardani,
Chapter 1, this volume; Dahl, Chapter 13, this volume.)
⁴ See section 2.5 for some justification of this claim, and for defining inflection class complexity in
terms of the predictability of individual forms, rather than the lexeme’s class membership (i.e., its entire
paradigm of forms).
inflection class), with two other descriptions that split the burden of explanation between the inflection class system and lexical specification. As they observe (p. 42), it makes little sense to evaluate an inflectional system based only on the morphological description and not on what is lexically specified, since a morphological description can always be made simpler by positing more lexical specification. They thus evaluate the analyses in terms of description length, including both the morphological description and lexically specified information. Equating the degree of complexity of the system with the length of its description, they show that the complexity of the different analyses differs significantly; a description with twenty classes and up to twelve lexically specified suppletive stems for some lexemes results in the shortest length.⁵ The point here is that the degree of complexity is a property of a particular description of French verbs.⁶ This makes it particularly important to examine and justify the description itself.
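To make the trade-off concrete, here is a deliberately naive Python sketch. It is not Sagot & Walther’s actual encoding, which we do not reproduce: description length is simply approximated as the number of characters written in the grammar plus the number written in the lexicon. The four English-like verbs, the class label “A”, and both analyses are hypothetical toy data.

```python
from __future__ import annotations


def description_length(grammar: dict[str, list[str]],
                       lexicon: dict[str, list[str]]) -> int:
    """Naive description length: every character written down in the
    grammar (class labels and their exponents) plus every character in
    the lexicon (lexeme labels and whatever is listed for them)."""
    grammar_len = sum(len(label) + sum(len(e) for e in exponents)
                      for label, exponents in grammar.items())
    lexicon_len = sum(len(lexeme) + sum(len(s) for s in listed)
                      for lexeme, listed in lexicon.items())
    return grammar_len + lexicon_len


# Hypothetical analysis 1: one class "A" with exponents -s and -ed;
# regular verbs list only their class label, the irregular verb lists its forms.
rule_grammar = {"A": ["s", "ed"]}
rule_lexicon = {"walk": ["A"], "jump": ["A"], "play": ["A"],
                "go": ["goes", "went"]}

# Hypothetical analysis 2: no grammar at all; every form is listed.
list_grammar: dict[str, list[str]] = {}
list_lexicon = {"walk": ["walks", "walked"], "jump": ["jumps", "jumped"],
                "play": ["plays", "played"], "go": ["goes", "went"]}

print(description_length(rule_grammar, rule_lexicon))  # 29
print(description_length(list_grammar, list_lexicon))  # 55
```

The second analysis has a “simpler” morphological description (zero characters of grammar), yet its total description length is almost twice as large, which is why the grammar and the lexicon have to be evaluated together.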
Stump & Finkel (2015) make a similar point along a different dimension of
description. They contrast two potential representations of the same set of English
verbs, one based on acoustics alone (what they call ‘hearer-oriented’) and one
based on structure known to a speaker that does not surface in the production of
forms (‘speaker-oriented’). For example, the exponence of the past participle(s) of
and are identical in a hearer-oriented representation, that is, /εnt/, but
a speaker knows that they contain different structure, that is, /εn-t/ vs. /εnd-t/.
Stump & Finkel show that the two representations exhibit differences in their
complexity based on various information-theoretic and set-theoretic measures.
(See also Bonami 2013 for similar issues with French verbs.)
Mansfield & Nordlinger (Chapter 3, this volume) also draw attention to how systems are represented. Investigating Murrinhpatha (non-Pama-Nyungan, northern Australia), they show that speakers have made analogical changes to the verbal system which, surprisingly, do not lead to greater predictability among allomorphs. They suggest that using existing measures of conditional entropy to calculate the complexity of the system would misrepresent it, because verbs in the language are a closed class with largely idiosyncratic exponence. The exponence for the verbs is made up of intersecting formatives that are partially
⁵ See also Goldsmith (2001, 2011) for arguments for description length-based evaluation metrics in
morphological analysis.
⁶ In employing an evaluation metric based on description length, Sagot & Walther (2011) argue that
descriptions of shorter length (i.e., of less complexity in their sense) are more adequate. However, it is
not obvious to us that for a given inflectional system, the description with the shortest description
length should be taken to be the most adequate one. This is a question of the evaluation metric. For
instance, see Derwing (1990) for arguments against evaluation metrics based on economy of storage
(incl. minimum description length) and for metrics based on economy of processing speed, and Dahl (Chapter 13, this volume) for discussion of the relationship between Minimum Description Length
and other notions/metrics of complexity. It is not a foregone conclusion that a description that is most
cognitively realistic will be the description with the lowest estimated complexity in terms of either
description length or the implicative notion outlined in (1) above. This is a question for investigation,
but beyond the scope of the present work.
It has long been known that high type frequency inflection classes create ana-
logical pressure on irregular patterns. When irregular patterns resist regulariza-
tion, the most common argument for their persistence despite analogical pressure
is that they are lexically stored, leaving them relatively impervious to regulariza-
tion. The typically high token frequency of such lexemes also makes lexical
specification psycholinguistically plausible. This and other evidence of lexical
storage is sometimes taken as a basis for treating irregulars as falling outside of
the grammatical system—in this case, the inflectional system.
⁷ Cotterell et al.’s work was presented at the Society for Computation in Linguistics just as we were
completing final revisions to this chapter, so we did not have the opportunity to apply their joint entropy
metric to our data, nor to explore whether it produces estimates of system complexity that are less
dependent on the particular descriptive analysis that is made of an inflectional system. However, we see
this as a promising avenue for investigation.
⁸ Unlike Sagot & Walther (2011), we do not offer a formal analysis of Russian nouns, and make no
particular assumptions about what inflectional information is part of the grammatical system, and what
is lexically specified. However, like them, we include both regular, productive forms, and also ones that
analyses might treat as lexically specified. And of course, their paper and our chapter are similar in
investigating how different analytic assumptions affect assessments of the complexity of the inflection
class systems.
However, a categorical division into regular and irregular types has long been
recognized as problematic. First, the scope of a form’s irregularity can range from
having an exponent associated with a different class to having a fully suppletive
form. The extent to which a lexeme is irregular can also range from a single cell to
the majority of the paradigm. (See Corbett et al. 2001 for examples from Russian.)
Aside from the most extreme cases of suppletion, irregular lexemes exhibit
irregularity in only a subset of their paradigms’ cells. And even in suppletion,
stem distributions are often shared with regular patterns (Aski 1995; Bonami &
Boyé 2002; Hippisley et al. 2004; Boyé & Cabredo Hofherr 2006). Thus, even the
most irregular lexemes frequently overlap with regular ones and tend to exhibit at
least some degree of systematicity (Brown & Hippisley 2012). In fact, Brown &
Hippisley argue that ‘there is no hard-and-fast contrast between rules and lexical
specification. Rather, we must make a distinction between the rule on the one
hand and how the lexeme accesses that rule’ (p. 80). In their theory, Network
Morphology, rules are information held at nodes in an inheritance hierarchy. This
information is inherited ultimately by individual lexemes, defining their patterns
of inflectional exponence. However, lexemes may inherit information by default
or by direct specification of the node from which the lexeme should inherit. This
means that within their theory, regularity is defined in terms of how a lexeme
accesses a rule, and a single rule may represent regularity in some lexemes and
irregularity in others.
Second, speakers draw on their knowledge of irregular patterns when general-
izing to new lexemes (Bybee & Slobin 1982; Albright & Hayes 2002, 2003). Words
that are traditionally categorized as irregular play a crucial role in predicting how
speakers generalize morphological patterns to new words. Irregular inflectional
patterns can be more reliable in certain contexts (e.g., phonological neighbor-
hoods) than more regular patterns. Correspondingly, inflectional patterns that are
highly irregular can be extended. The athematic 1SG marker -m in Common Slavic spread from just a handful of verbs to become the dominant 1SG marker in some
West and South Slavic languages (Janda 1994). Thus, even highly irregular
patterns can exhibit a degree of productivity.
Third, it is now generally accepted that both irregularly and regularly inflected
words are stored in the mental lexicon and leave traces in memory (Alegre &
Gordon 1999a; Baayen 2007 inter alia). Baayen et al. (2007), among many others,
find a surface frequency effect for regularly inflected words in a lexical decision
task even with low frequency lexemes. Starting with Taft (1979), such a frequency
effect has been widely interpreted as reflecting direct lexical storage of the forms,
rather than storage via component morphemes.⁹ Thus, showing that irregulars are subject to lexical storage is not a sufficient basis on which to argue that irregular items are not part of the system of inflectional patterns.
Evidence of this sort blurs the binary classification of inflectional patterns into ‘regular’ and ‘irregular’ types and undermines any concomitant claim that there is a categorical distinction between patterns generated by the inflectional rule system (and thus appropriately described in terms of inflection classes) and those that are lexically stored exceptions. Yet given that the description of an inflection class system makes a large difference to calculations of its complexity, analytic assumptions that place irregulars outside of the inflectional system are pernicious, because they preclude even asking important questions about how irregulars interact with regulars and what the consequences of this are for the complexity of the system.
⁹ See Taft (2004) and Taft & Ardasinski (2006) for more recent, sceptical interpretations of surface and base frequency effects. Models with different primitive assumptions about representational structure also interpret surface frequency effects somewhat differently, for example connectionist models (Daugherty & Seidenberg 1994) and discriminative learning models (Baayen et al. 2011). However, the important thing in the present context is that none of these models posits that irregular and regular inflected forms are processed and stored in the mental lexicon in categorically different ways (an idea put forward most famously by Prasada & Pinker (1993) and advocated from a neurolinguistic perspective by Ullman (2001, 2004), but now widely rejected).
inflection actually leads to less complexity than would be expected given each
layer independently. The inclusion of stress information in Russian nouns neces-
sitates a second, distinctly structured inheritance hierarchy. Ultimately, paradig-
matic layers can reveal organizational properties of inflectional systems that are
otherwise hidden. Thus, as with irregularity, analytic assumptions that exclude
non-affixal paradigmatic layers from consideration preclude important questions
about how elements in an inflectional system interact to determine its overall
complexity.
Inflection classes are a layer of structure that mediates between form and meaning,
without bearing meaning directly (they are morphomic in Aronoff’s 1994 terms),
and some languages do not have inflection classes, showing that classes are not
‘needed’. These observations have led to the idea that inflection classes create
unnecessary complexity in morphological systems and have raised the question of
whether there are limits on that complexity.
As noted in the introduction, the focus of this question has shifted away from a
notion of complexity defined in terms of absolute number of inflection classes/
exponents/cells and towards one that is rooted in implicative paradigmatic struc-
ture. Stump & Finkel (2013) define the complexity of an inflection class system as
‘the extent to which the system inhibits motivated inferences about a lexeme’s full
paradigm of realized cells from subsets of its cells’ (2013: 55; emphasis ours). When
defined in this way, the complexity of an inflection class system may, but need not,
be related to the absolute size of the system. Systems with a large number of
inflection classes and/or in which lexemes have a large number of paradigm cells
can exhibit low complexity if there is strong implicative structure within the
paradigm. Likewise, small inflectional systems can be highly complex if inflected
forms are not held together by strong implicative relations (Sims 2015: ch. 5).
¹⁰ However, at least for the languages that we are most familiar with (Russian, Greek), they base
their analyses on grammatical descriptions that exclude irregularities and non-affixal layers of expo-
nence. See Sims (2015: ch. 5) for a comparison between their analysis of Greek nouns and one based on
a more robust representation of the nominal system.
¹¹ In a dynamic principal parts analysis, the principal parts need not reflect the same morphosyn-
tactic properties from one inflection class to another. Stump & Finkel primarily differentiate this from a
static principal parts analysis, in which the set of principal parts is required to correspond to the same
morphosyntactic properties for all lexemes in a given syntactic category, and thus all inflection classes
within that category.
At the same time, Stump & Finkel observe a difference in complexity between
predicting one inflected form and predicting class membership (i.e., all forms). In
contrast with the relatively uniform ease with which a single inflected form can be
deduced, ‘Languages vary widely in the number of dynamic principal parts they
require to distinguish a given I[nflection] C[lass]’ (Stump & Finkel 2013: 215).
Similarly, Ackerman & Malouf find greater crosslinguistic differences in average
declensional entropy (an unconditioned entropy measure of inflection class pre-
dictability) than in average conditional entropy (a conditional entropy measure of
inflected form predictability). This suggests that the complexity of an inflection
class system as a whole is not necessarily a direct product of the complexity of the
individual exponents. It is therefore important to investigate how the complexity
of the system as a whole relates to the complexity of the component elements of
the system.
A few steps have been taken in this direction. Sims & Parker (2016) find that
nine investigated inflection class systems show roughly similar degrees of overall
complexity, when calculated over pairs of forms using conditional entropy,
consistent with the Low Conditional Entropy Conjecture. Crucially, however,
they also show that implicative structure does very different amounts of ‘work’
in the languages to produce this result. In some languages, knowledge of one
inflected form is crucial to predicting another. In other languages, inflected forms
are independently fairly predictable, and knowledge of another form does little or
nothing to improve that predictability. Thus, paradigmatic implication is not
always an important determinant of the complexity of inflectional systems.
Additionally, based on data from Icelandic and French, Stump & Finkel (2013)
propose the Marginal Detraction Hypothesis: ‘[m]arginal I[nflection] C[lasse]s
tend to detract most strongly from the IC predictability of other ICs’ (p. 225).
Marginal classes here are defined as ones with few lexemes. The Marginal
Detraction Hypothesis thus asks whether the internal structure of inflection
class systems is homogeneous. The hypothesis is that the implicative structure
of low-type-frequency classes may differ from that of the most frequent classes. (See
also Sims & Parker 2016 for a similar idea.) Related to this, Blevins et al. (2017)
argue that the Zipfian distribution of morphological patterns helps balance two
opposing pressures: the importance of predicting forms and the importance of
discriminating forms. Frequently occurring patterns facilitate prediction.
Suppletive patterns, which are likely to belong to low type frequency classes,
may detract from predictability but at the same time have benefits like being
highly discriminative. Both types of patterns contribute, in different ways, to
ensuring that the patterns in the language are usable by speakers.
Together these studies explore the idea that competing pressures may lead
different components of inflectional systems to exhibit different properties. They
also suggest that if there is a strong crosslinguistic tendency for languages to
exhibit low inflection class complexity, this both results from and occurs despite
3. Nouns that belong to Class I except that they have a null genitive plural, for example, raz ‘time.GEN.PL’;
4. Nouns that belong to Class I except that they have -a in the nominative plural, for example, goroda ‘city.NOM.PL’;
5. Nouns that belong to Class IV, except for -ov in the genitive plural, for example, oblakov ‘cloud.GEN.PL’;
6. Nouns that belong to Class IV, but with nominative plural -i, for example, jabloki ‘apple.NOM.PL’;
7. Nouns that belong to Class II except that they have an overt genitive plural, for example, rasprej ‘strife.GEN.PL’;
8. Nouns that belong to Class IV, but have a nominative plural -i and genitive plural -ov, for example, očki ‘point.NOM.PL’ and očkov ‘point.GEN.PL’;
9. Nouns that belong to Class I, but have a nominative plural -e and a null genitive plural, for example, krest’jane ‘peasant.NOM.PL’ and krest’jan ‘peasant.GEN.PL’;
10. Nouns that belong to Class I but have a nominative plural in -a and a null genitive plural, for example, teljata ‘calf.NOM.PL’ and teljat ‘calf.GEN.PL’.¹²
Table 2.2. Illustration of the four-class system, based on inflectional suffixes
Class I: ‘law’ | Class II: ‘map’ | Class III: ‘bone’* | Class IV: ‘place’
Note:
* Here and throughout the chapter we use scientific transliteration, rather than transcription. This is
a convenience that accommodates Russian speakers and makes it easier to check the examples in
a dictionary (because the spelling is maintained). However, the transliteration is sometimes
misleading with regard to the phonological (or morphological) shape of words. Although it is not
clear in the transliteration of this example, the stem-final consonant cluster in kost’ ‘bone’ is [sjtj] throughout
the paradigm (e.g., nominative singular [kosjtj], genitive singular [kosjtj-i], instrumental singular
[kosjtj-ju]).
¹² Nouns of types 9 and 10 (‘peasant’, ‘calf’) also exhibit changes in their stems. See discussion below.
¹³ Russian nouns usually have zero exponence in either the nominative singular or genitive plural,
depending on class; see Table 2.2. When a form has no overt inflectional suffix in a given paradigm cell,
lexemes that otherwise would have stress on the suffix have stress on the last syllable of the stem instead
(see Table 2.3).
Table 2.3. Illustration of stress classes of Russian nouns
3. Fixed stress on the inflectional ending, but with stem-initial stress in both
nominative plural (and accusative plural when syncretic) and accusative
singular, for example, ‘beard’;
4. Two patterns that combine a shift according to number with retraction in
the nominative plural (and accusative plural when syncretic) and accusative
singular, for example, ‘portion’ and ‘soul’.¹⁴
¹⁴ Due to the stress shift between singular and plural, the distribution of the retraction of stress onto the stem is ambiguous. Nouns of the ‘portion’ type are consistent with stress shift in both nominative plural and accusative singular, but since there is stem stress throughout the singular, the accusative singular is ambiguous. Conversely, nouns of the ‘soul’ type are also consistent with both stress shifts, but since there is stem stress throughout the plural, the nominative plural is ambiguous. Except for ambiguous instances of this sort, accusative singular stress retraction never occurs unless nominative plural stress retraction also does, so it seems safe to analyse the ‘soul’ type as having both stress retractions, with the nominative plural one being opaque. The proper analysis of the ‘portion’ type is less clear. Stress retraction in the accusative singular happens (unambiguously) only in nouns with the Class II suffix pattern. While the ‘portion’ noun belongs to this class, other nouns with the same stress pattern do not (e.g., ‘tooth’ (Class I), ‘city square’ (Class III)). An alternative possibility is therefore to analyse these nouns as having only the nominative (and accusative) plural stress retraction, since it occurs in combination with a wider range of stem classes. We do not have a firm opinion about which analysis is ultimately the right one, or even whether speakers themselves make only one or the other analysis. But it also makes no difference in the present context. Since our analysis of implicative relations in the following section is based on surface patterns, all six patterns in Table 2.3 are treated as distinct in the analysis.
¹⁵ Walther (2017) distinguishes between ‘deficient’ and ‘defective’ lexemes where the former are
lexemes for which a speaker could determine what forms would fill the cells but does not use those
forms, and the latter are lexemes for which there is uncertainty about which form would fill missing
cells. We include both types of lexemes in our category of defectiveness.
the former is masculine while the latter is feminine. They are treated in our
analyses as belonging to the same class since gender is not expressed inflectionally
in nouns. We also abstract away from predictable phonologically-conditioned
variation and predictable semantically-conditioned variation. For example, vowels
reduce when not stressed, but given information about stress, vowel quality is fully
predictable and purely phonological. Thus, we abstract away from vowel reduc-
tion in our class representations. Some genitive plural forms have no overt
exponent, for example, kart ‘map.GEN.PL’, and others have an overt suffix, for example, zakon-ov ‘law-GEN.PL’ and učitel-ej ‘teacher-GEN.PL’. Whether a lexeme
has a zero genitive plural form or an overt ending is morpholexically conditioned
and thus depends on its inflection class, so we include this distribution in our
description. However, which of the two overt exponents will occur is fully
predictable from the phonology of the stem: -ej occurs with morphologically
soft stems and -ov occurs elsewhere (Timberlake 2004: 84–5).¹⁶ Thus, we represent
-ov and -ej as a single exponent. Similarly, we do not include differences in
accusative marking that are predictable based on animacy (see Corbett & Fraser
1993: 129–30 for justification).¹⁷ Thus, our analysis reflects only information
about exponence that is directly a property of inflection class membership.
See Parker (2016) for a more complete description of the patterns and para-
digmatic layers of Russian nouns.
(2) Complexity of an inflection class system: the average extent to which the
system inhibits motivated inferences about the realized form of a lexeme,
given one or more other realized forms of the same lexeme.
(3) Entropy
$$H(A) = -\sum_{a \in A} p(a)\,\log_2 p(a)$$
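As a worked illustration of (3), consider a hypothetical cell A (invented exponents, not Russian data) realized by -a in two of four inflection classes and by -b and -c in one class each, with every class counted once, so that p(-a) = 1/2 and p(-b) = p(-c) = 1/4:
$$H(A) = -\left(\tfrac{1}{2}\log_2\tfrac{1}{2} + 2 \cdot \tfrac{1}{4}\log_2\tfrac{1}{4}\right) = \tfrac{1}{2} + 1 = 1.5 \text{ bits}$$
If all four classes shared a single exponent, H(A) would be 0 bits; if each class had its own exponent, it would be log2 4 = 2 bits.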
¹⁸ We recognize that these measures do not capture all aspects of a system’s complexity, especially
because they are limited to comparisons between individual cells (as opposed to larger subsets of the
paradigm). See, for example, Stump & Finkel (2013) and Bonami & Beniamine (2015) for investigations
that consider complexity based on predictiveness/predictability of multiple paradigm cells. Expanding
the current work to take account of paradigm structuring would be valuable. However, our focus here is
on comparing across different descriptions of the Russian nominal system, and the importance of the
description for estimates of inflection class complexity. A simple measure gives us the best perspective
on this issue.
Conditional entropy H(A|B) represents the average surprisal associated with the
outcome of a random variable A, given knowledge of the outcome of another
random variable B. In the present context, A and B are paradigm cells in which
A ≠ B. Implicitly conditioned on the lexeme, the outcomes of A and B are two
inflected forms of the same lexeme. Conditional entropy thus represents the
average surprisal associated with the exponent that realizes a given morphosyn-
tactic property set, knowing the exponence of another inflected form of the same
lexeme.
$$H(A \mid B) = \sum_{a \in A,\, b \in B} p(b,a)\,\log_2 \frac{p(b)}{p(b,a)}$$
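For illustration, take four hypothetical inflection classes (invented exponents) in which cell A is realized as -a, -b, -c, -a and cell B as -p, -p, -q, -q, respectively. Each (b, a) combination occurs in exactly one class, so p(b, a) = 1/4, while p(-p) = p(-q) = 1/2:
$$H(A \mid B) = 4 \cdot \tfrac{1}{4}\log_2\frac{1/2}{1/4} = 1 \text{ bit}$$
By comparison, the unconditioned entropy of A over the same four classes is 1.5 bits, so knowing the exponent of B removes 0.5 bits of uncertainty about A.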
Averaging across the entropy values H(A) for all licensed morphosyntactic
property sets produces an estimate of the potential complexity of the system as a
whole. This mean entropy value represents the average uncertainty associated
with predicting the exponent of a paradigm cell knowing only the possible
exponents that realize that cell in different classes. Exponents of different mor-
phosyntactic property sets are thus treated as independent of each other. By
comparison, averaging across the conditional entropy values H(A|B) of all
licensed combinations of morphosyntactic property sets A and B produces an
estimate of the complexity of the inflectional system as a whole, taking into
account implicative relations holding between pairs of cells. This represents the
uncertainty associated with a given cell of a lexeme knowing the exponence of one
other cell of the same lexeme.
The conditional entropy H(A|B) will never be higher than the entropy H(A)
and will be lower whenever the exponent that realizes B is informative about the
exponent that realizes A. Knowing one form of a lexeme cannot increase the
surprisal associated with another form, but it can lower it. The extent to which
knowing one cell reduces the uncertainty associated with another cell (the differ-
ence between entropy and conditional entropy) represents how much ‘work’ is
being done by the implicative structure of the system.
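The averaging described in the last two paragraphs can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors’ implementation: the four-class, three-cell system, its cell names (C1 to C3), and its exponents are hypothetical; every class counts once (no type-frequency weighting); and the function names are our own.

```python
from collections import Counter
from itertools import permutations
from math import log2


def entropy(outcomes):
    """H(A) = -sum_a p(a) log2 p(a), with p(a) estimated from relative frequencies."""
    counts = Counter(outcomes)
    n = len(outcomes)
    return -sum((c / n) * log2(c / n) for c in counts.values())


def conditional_entropy(pairs):
    """H(A|B) = sum_{a,b} p(b,a) log2(p(b) / p(b,a)); `pairs` holds (b, a) outcomes."""
    joint = Counter(pairs)
    marginal_b = Counter(b for b, _ in pairs)
    n = len(pairs)
    # p(b) / p(b,a) reduces to count(b) / count(b,a) because both share the denominator n.
    return sum((c / n) * log2(marginal_b[b] / c) for (b, _), c in joint.items())


def system_means(system):
    """Mean H(A) over all cells and mean H(A|B) over all ordered pairs of distinct cells."""
    rows = list(system.values())
    cells = sorted(rows[0])
    mean_h = sum(entropy([r[c] for r in rows]) for c in cells) / len(cells)
    ordered = list(permutations(cells, 2))  # (A, B) with A != B
    mean_ch = sum(conditional_entropy([(r[b], r[a]) for r in rows])
                  for a, b in ordered) / len(ordered)
    return mean_h, mean_ch


# Hypothetical system: four inflection classes, three paradigm cells.
toy_system = {
    "I":   {"C1": "-a", "C2": "-p", "C3": "-x"},
    "II":  {"C1": "-b", "C2": "-p", "C3": "-y"},
    "III": {"C1": "-c", "C2": "-q", "C3": "-x"},
    "IV":  {"C1": "-a", "C2": "-q", "C3": "-y"},
}

mean_h, mean_ch = system_means(toy_system)
print(f"mean entropy: {mean_h:.3f} bits")                                   # 1.167
print(f"mean conditional entropy: {mean_ch:.3f} bits")                      # 0.833
print(f"work done by implicative structure: {mean_h - mean_ch:.3f} bits")  # 0.333
```

Because each cell serves as A in the same number of ordered pairs, the difference between the two means equals the average of H(A) - H(A|B) over all pairs, i.e., the average “work” done by implicative structure in this toy system.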
We now turn to the primary questions of this chapter, starting with: To what extent does including more paradigmatic layers in the system affect its complexity? Our approach is to develop multiple parallel descriptions of Russian
nominal inflectional structure based on the paradigmatic layers. Each description
is based on the same set of lexemes but the lexemes are distributed across classes
differently depending on which layers are included in the analysis. This allows us
to investigate how paradigmatic layers interact, and specifically, how those inter-
actions influence the complexity of the system as a whole.
Classes   Layers included (one + per paradigmatic layer)
14        +
21        + +
22        + +
33        + + +
42        + +
57        + + +
64        + + +
82        + + + +
Figure 2.1. Word types per inflection class across different granularities
natural languages, including word frequencies (see Baayen 2001 for detailed
discussion).
To assess how the complexity of the system changes with granularity, we calcu-
lated the mean entropy (= estimated potential complexity) and mean conditional
entropy (= estimated actual complexity) of each representation of the system
presented in Table 2.4. In light of the type frequency distribution of classes
shown in Figure 2.1, we calculated mean conditional entropy both with and
without type frequency weighting. In the weighted condition, the probabilities
of each exponent were weighted by the type frequency of the exponent. This
measure represents the complexity of the system when both implicative structure
and the uneven distribution of lexemes across classes are taken into account.
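Type-frequency weighting can be sketched by letting each class contribute in proportion to its number of lexemes, rather than counting once. The class table and the type frequencies below are hypothetical. We also assume that the weights apply to classes (as in note 19) when p(b, a) and p(b) are estimated. Both assumptions are made for illustration only.

```python
from collections import Counter
from itertools import permutations
from math import log2

# Hypothetical classes and type frequencies (lexemes per class); not real data.
CLASSES = {
    "c1": {"cell1": "-a", "cell2": "-i", "cell3": "-u"},
    "c2": {"cell1": "-a", "cell2": "-i", "cell3": "-e"},
    "c3": {"cell1": "-o", "cell2": "-e", "cell3": "-u"},
    "c4": {"cell1": "-o", "cell2": "-e", "cell3": "-e"},
}
TYPE_FREQ = {"c1": 97, "c2": 1, "c3": 1, "c4": 1}

def cond_entropy(classes, a, b, weights):
    """H(A|B) with p(b, a) and p(b) estimated from class weights."""
    total = sum(weights.values())
    joint, marg_b = Counter(), Counter()
    for name, par in classes.items():
        joint[(par[b], par[a])] += weights[name]
        marg_b[par[b]] += weights[name]
    return sum((n / total) * log2(marg_b[xb] / n) for (xb, _), n in joint.items())

def mean_cond_entropy(classes, weights=None):
    if weights is None:  # unweighted condition: every class counts once
        weights = {name: 1 for name in classes}
    cells = list(next(iter(classes.values())))
    pairs = list(permutations(cells, 2))
    return sum(cond_entropy(classes, a, b, weights) for a, b in pairs) / len(pairs)

print(f"unweighted: {mean_cond_entropy(CLASSES):.3f} bits")
print(f"weighted:   {mean_cond_entropy(CLASSES, TYPE_FREQ):.3f} bits")
```

When one class holds most of the lexemes, most conditional distributions become sharply peaked. The weighted value therefore falls well below the unweighted one, even though the inventory of classes and exponents does not change.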
Figure 2.2 shows that as granularity increases, and more paradigmatic layers are
included in the system, the entropy and unweighted conditional entropy of the
system tend to increase. This is unsurprising from the perspective of information
theory—as more elements are present in the system, there will be greater surprisal
associated with those elements on average. More interestingly, the weighted
conditional entropy values remain low regardless of inflection class granularity;
the weighted conditional entropy only increases 0.12 bits from a representation of
the system that includes only suffixes (fourteen classes) to one with all
paradigmatic layers together (eighty-two classes). This means that the uncertainty
associated with a large number of classes is mitigated by a combination of the
implicative structure of the system and the unequal distribution of lexemes across
classes. Implicative structure and the distribution of lexemes across classes
conspire to maintain low systemic complexity.
Figure 2.2. Complexity measures in bits (entropy, unweighted; conditional entropy,
unweighted; weighted conditional entropy) by number of classes (14, 21, 22, 33, 42,
57, 64, 82)
However, even a random distribution of exponents will tend to produce a
system with lower mean conditional entropy than mean entropy, because some
of the exponents will be accidentally informative about other exponents. Thus, we
should ask whether the implicative structure of the system minimizes the com-
plexity of the inflection class system in each granularity more than is expected by
chance. Employing Monte Carlo simulation, we created a hundred simulated data
sets for each granularity. In each granularity the simulated data sets contained the
same exponents and the same number of classes as in the real granularity, but the
exponents were randomly distributed across the classes.¹⁹ The mean conditional
entropy of the simulated data sets represents the amount of complexity we expect
in systems of this size based on a random distribution of exponents. If the actual
complexity falls outside of the simulated values, we can conclude that the ‘work’
done by the implicative structure in that granularity is significant at a level of
p<0.01.
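The Monte Carlo comparison can be sketched as a permutation test. For illustration only, we assume that 'randomly distributed across the classes' means shuffling each cell's column of exponents independently across classes. This preserves the same exponents and the same number of classes. The class table is hypothetical, and the function names are ours.

```python
import random
from collections import Counter
from itertools import permutations
from math import log2

def cond_entropy(classes, a, b):
    """Unweighted H(A|B): each class counts once (cf. note 19)."""
    total = len(classes)
    joint = Counter((par[b], par[a]) for par in classes)
    marg_b = Counter(par[b] for par in classes)
    return sum((n / total) * log2(marg_b[xb] / n) for (xb, _), n in joint.items())

def mean_cond_entropy(classes):
    pairs = list(permutations(classes[0].keys(), 2))
    return sum(cond_entropy(classes, a, b) for a, b in pairs) / len(pairs)

def shuffle_exponents(classes, rng):
    """Same exponents per cell and same number of classes, random alignment."""
    cells = list(classes[0].keys())
    columns = {c: [par[c] for par in classes] for c in cells}
    for col in columns.values():
        rng.shuffle(col)
    return [{c: columns[c][i] for c in cells} for i in range(len(classes))]

def monte_carlo(classes, n_sims=100, seed=0):
    rng = random.Random(seed)
    actual = mean_cond_entropy(classes)
    sims = [mean_cond_entropy(shuffle_exponents(classes, rng)) for _ in range(n_sims)]
    # With 100 simulations, a value below all of them corresponds to p < 0.01.
    return actual, sims, actual < min(sims)

# Hypothetical system with partial implicative structure (invented exponents).
CLASSES = [
    {"cell1": "-a", "cell2": "-i", "cell3": "-u", "cell4": "-m"},
    {"cell1": "-a", "cell2": "-i", "cell3": "-e", "cell4": "-m"},
    {"cell1": "-a", "cell2": "-i", "cell3": "-u", "cell4": "-n"},
    {"cell1": "-o", "cell2": "-e", "cell3": "-e", "cell4": "-n"},
    {"cell1": "-o", "cell2": "-e", "cell3": "-u", "cell4": "-s"},
    {"cell1": "-o", "cell2": "-e", "cell3": "-e", "cell4": "-s"},
]

actual, sims, below_all = monte_carlo(CLASSES)
print(f"actual {actual:.3f}; simulated min {min(sims):.3f}, "
      f"mean {sum(sims) / len(sims):.3f}, max {max(sims):.3f}; below all: {below_all}")
```

Shuffling each cell's column separately leaves the per-cell entropies unchanged. Only the alignment between cells, which is what implicative structure consists of, is randomized.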
As can be seen in Figure 2.3, in every granularity the actual mean conditional
entropy of the system is lower than that of all of the simulated data sets, and as the
¹⁹ Here we calculate the mean conditional entropy of the system without weighting classes by type
frequency. Weighting classes equally approximates a random distribution of lexemes across classes.
Figure 2.3. Conditional entropy of real and a hundred Monte Carlo simulations of
Russian nouns across granularities (conditional entropy in bits by number of classes:
14, 21, 22, 33, 42, 57, 64, 82; series: simulated min/mean/max and actual mean); for
the ‘simulated’ series, vertical bars indicate maximum and minimum values
We now turn to our final question about complexity and regularity: Do regular
and irregular classes contribute similarly to the overall complexity of the inflec-
tional system?
To investigate how irregularity affects the complexity of the system, we took the
most granular representation (eighty-two classes) and within each class classified
the pattern found in each layer of exponence as regular or irregular.²⁰ We assume
any effects of irregularity will be most evident in the most granular representation
of the system.
We base our definition of (ir)regularity on the type frequency of particular
patterns within the Russian nominal system.²¹ For instance, since the large
majority of Russian nouns (80.8%) do not exhibit stem alternation, we define
²⁰ We originally classified irregularity within each layer on an ordered non-binary scale; however,
due to how few types occur at some points of the scale, we were forced to adopt a binary classification to
avoid data sparsity.
²¹ An anonymous reviewer asked what justifies only consistent stems or fixed stress being counted as
regular, noting that stem alternations or variable stress can be considered regular if either is the most
frequent pattern in a language or class. We agree that stem alternations or variable stress can be regular
in a language (see discussion in Sagot & Walther 2011: 5–7 for examples); however, neither is regular
in Russian nouns, which is evident given the large percentage of nouns that exhibit fixed stress and do
not have stem alternations. One might also argue that, for example, stem and/or stress alternation
should be considered regular because they are the most frequent pattern within a particular affixal class.
This is true of some Russian nouns, for example, all nouns with the affixal pattern exemplified by
‘time’ (see section 2.4) have both stem extensions and stress alternations. However, defining
regularity in this way relies on a privileged status for affixal exponence—a position we reject (see
discussion in section 2.1). Furthermore, our notion of granularity is based on the idea that affixal
and non-affixal exponence co-determine the number of inflection classes. When both affixal and
non-affixal exponence determine classes, all lexemes of a single class exhibit the exact same patterns,
making any attempt at class-specific determinations of regularity meaningless.
When classified in this way, the majority of Russian nouns exhibit no irregularity
(33,144 lexemes in eight classes); many exhibit some type of irregularity in
one layer (9,709 lexemes in forty-three classes); some exhibit irregularity in two
layers (607 lexemes in twenty-five classes); and a few exhibit irregularity in three
layers (twenty-six lexemes in six classes). No lexemes exhibit irregularity in all
four layers, at least partly because irregularity in one layer can limit the possibility
of irregularity in another; for example, defectiveness in the singular or plural makes
stress shift between numbers impossible. Thus, the majority of lexemes are fully
regular, but the majority of classes have irregularity in some subset of their layers.
Importantly, in the present context, this frames the question of inflection class
complexity as one having to do, in part, with the extent to which the many
classes that exhibit some degree of irregularity detract from the predictability of
the small number of high type frequency classes that exhibit no irregularity. Do the
many lower type frequency irregular classes contribute disproportionately to the
complexity of the whole inflection class system in Russian?
²² We found no significant interactions between layers. We also ran the same model with class type
frequency as an additional independent variable; class type frequency was not significant in the model.
Figure 2.4. Effect of the irregularity of each layer on system complexity (entropy
difference), in four panels (Suffixes, Stems, Stress, Defectiveness), each comparing
regular (Reg) and irregular (Irreg) patterns; the vertical bars show 95% confidence intervals
This study highlights the need for caution in interpreting results from data whose
representations include only affixal and regular inflectional patterns, since they
may misrepresent the complexity of inflectional systems and/or obscure import-
ant aspects of inflectional structure. For example, the four most granular repre-
sentations of Russian nouns in our study (forty-two, fifty-seven, sixty-four, and
eighty-two classes) have an unweighted average conditional entropy that exceeds
the largest unweighted average conditional entropy value among the ten languages
investigated by Ackerman & Malouf (2013),²³ even though the conditional
entropy of a four-class system of Russian falls in the middle of the range for
languages they investigate. The mean conditional entropy of our most granular
representation (eighty-two classes) is twice as high as the value for the four-class
Russian system in Ackerman & Malouf’s paper. This raises questions about the
extent to which typologically low systemic complexity is a reflection of assump-
tions adopted when creating representations of those systems.
At the same time, it is equally important to point out that for every represen-
tation of the Russian nominal inflectional system that we investigated—that is,
every granularity—the estimated complexity of the Russian noun class system was
substantially lower than the potential complexity of the system, as shown in
Figure 2.2 in section 2.6.2. The estimated complexity of the system was also
significantly lower than would be expected by chance (Figure 2.3 in section
2.6.2). This indicates that a significant amount of ‘work’ is done by implicative
structure, regardless of the particular representation that is assumed. The latter
result contradicts Ackerman & Malouf’s (2013: 451) speculation that Russian has
no need to rely on implicative organization. However, arguably the more import-
ant conclusion is that in the end, our results are consistent with their Low
Conditional Entropy Conjecture, if it is interpreted as a claim that inflection
class systems self-organize to minimize the amount of complexity embodied in
the system (rather than as a claim about a particular maximum possible condi-
tional entropy value). No matter what particular representation we assume,
Russian nouns show a pattern that is consistent with low systemic complexity,
suggesting that a typological tendency towards low systemic complexity may
extend beyond affixal and highly regular patterns.
While the Low Conditional Entropy Conjecture focuses on a global measure of
the complexity of inflection class systems, an equally interesting question has to
do with how the component parts of the system shape this global complexity.
From this perspective, an important result in this chapter is that the estimated
actual complexity of the system changes very little, despite the fact that the
²³ Amele, with a conditional entropy of 1.105 bits; Ackerman & Malouf (2013: 443, table 3).
²⁴ See Parker et al. (to appear) for computational modelling of inflection class learning that moves in
this direction.
Acknowledgements
We thank Peter Arkadiev, Gregory Stump, and an anonymous reviewer for their
helpful comments. All errors remain entirely our own. This work was supported in
part by The Ohio State University, through a Presidential Fellowship awarded to Jeff
Parker and a sabbatical granted to Andrea Sims.
3
Demorphologization and deepening
complexity in Murrinhpatha
John Mansfield and Rachel Nordlinger
3.1 Introduction
¹ ‘Demorphologization’ is used rather differently by Joseph & Janda (1988), who use it in reference
to regularization of phonological processes such that they become independent of an erstwhile
morphological context.
John Mansfield and Rachel Nordlinger, Demorphologization and deepening complexity in Murrinhpatha. In: The Complexities
of Morphology. Edited by: Peter Arkadiev and Francesco Gardani, Oxford University Press (2020). © John Mansfield and
Rachel Nordlinger.
DOI: 10.1093/oso/9780198861287.003.0003
Street 1987; Nordlinger 2015; Forshaw 2016; Mansfield 2016). We present data on
analogical changes observed by comparing recent fieldwork documentation with
forms documented some forty years earlier, showing that the process of demor-
phologization is still underway. Analogical changes show that classifier stem forms
are not learnt and memorized as isolated units, but rather that speakers draw on
paradigmatic semi-regularities to predict unknown forms. Though the system does
not exhibit regular, productive inflection, neither can it be characterized as a set of
‘frozen forms’. Rather it is a relational system, and one that is in flux. We treat
analogical predictability as a form of linguistic complexity, and show that through
ongoing demorphologization, the complexity of Murrinhpatha classifier stems is
increasing. We quantify this unpredictability by adapting probabilistic tools devel-
oped by Ackerman et al. (2009) and Ackerman & Malouf (2015). However, while
the latter hypothesize limits of complexity for systems of productive inflection, the
Murrinhpatha classifier stems are a closed-class system of 1,638 inflectional forms,
where semi-regularities aid acquisition and processing, but whole-form memor-
ization may mitigate the requirement for analogical predictability.
Murrinhpatha is a non-Pama-Nyungan polysynthetic Australian language of
the Daly River region of the Northern Territory. It has maintained a vibrant
speech community some eighty years after its speakers shifted to settled life under
the influence of Catholic missionaries (Pye 1972). Murrinhpatha has some of the
characteristics, both linguistic and social, that might associate it with the ‘isolated,
complex’ language type proposed in sociolinguistic typology (Kusters 2003;
Lupyan & Dale 2010; Trudgill 2011: 136; Bentz et al. 2015). However, it is doubtful
that notions of sociolinguistic ‘isolation’ or ‘low-contact’ apply in this instance,
since evidence points to a tradition of regional multilingualism (Falkenberg 1962:
13; Dixon 2002: 674). A crucial distinction for sociolinguistic typology is that
between child-acquired versus adult L2-acquired multilingualism: child multilin-
gualism has been argued to maintain or increase complexity, and adult acquisition
to reduce complexity (Thomason & Kaufman 1988: 65ff; McWhorter 2007 and
Chapter 10, this volume; Trudgill 2011: 34). In the case of Murrinhpatha, we know
too little of traditional multilingualism to know which is more applicable. However,
in the post-settlement era (1930s–present) a large number of people from Marri
Ngarr, Marri Tjevin, and other language groups have shifted to Murrinhpatha, in
some cases learning both languages as children but switching to Murrinhpatha
during adolescent years spent in a multi-ethnic school dormitory established by
the missionaries (Mansfield 2014: 98). This influx of new speakers has not brought
about any drastic simplifications or other language contact effects in the contem-
porary grammar of Murrinhpatha, although it has led to the demise of the other
languages of the region.² In this chapter, we demonstrate more specifically that
² Note however that the influx of speakers from other language groups may have had some influence
on the distribution of sociolinguistic variables (Mansfield 2015a, 2015b: 183).
There are several distinct dimensions of morphology that can be treated as forms
of linguistic complexity (Kusters 2003, 2008; Anderson 2015a), but in this chapter
we focus solely on (lexically specified) inflectional allomorphy. For example, in the
Australian language Warlpiri, verbs are suffixed with one of four lexically specified
past tense allomorphs, -ca, -ŋu, -ɳu, -nu (Hale 1969; Nash 1980: 40). Where
lexemes share the same allomorph selection in all their forms, the shared para-
digms are usually referred to as ‘inflection classes’. Inflectional allomorphy of this
type can be seen as prototypical morphological complexity, since it directly
reduces form:meaning transparency (Aronoff 1998).
Table 3.1. Warlpiri verb inflection classes (Hale 1969; Nash 1980: 40)
allomorphs for all Warlpiri inflection classes. The syncretism between some
classes for some tense categories makes these inflectional forms less than fully
predictive of other inflectional forms of the same lexeme. For example, knowing
that the presentational form is in -ɳiɲa narrows the range of possible imperative
allomorphs, but does not help us to decide between the two possibilities -ka, -ɲa.
Residual uncertainty in predicting an inflectional form, given knowledge of other
forms of the same lexeme, has been labelled integrative complexity (Ackerman &
Malouf 2013).
Integrative complexity meets several of the desiderata enumerated in Arkadiev
& Gardani’s Introduction to this volume (Chapter 1). First, it is quantifiable and
can be used to compare typologically diverse languages. Second, its conceptual-
ization in terms of speaker inferences from known to unknown forms gives it a
clear basis in psycholinguistic processing. Finally, whereas enumerative complex-
ities lean heavily on the distinction between morphology and syntax, integrative
complexity is relatively independent of this issue. Lexical selection of allomorphs
generally occurs within units that are identified as words, but if a similar phe-
nomenon occurred in phrase-like structures (e.g., periphrastic inflections with
allomorphy on the auxiliary), this would have no real effect on the modelling of
integrative complexity in the paradigm.
In this chapter, we focus on the effects that language change may have on
inflectional predictability. It has been shown that inflection class structure may
persist in a language over long time periods (e.g., Maiden 2005; Gardani 2013), but
even if it may in some instances be relatively stable, it is of course not completely
static. The inflectional allomorphs selected by lexemes exhibit synchronic vari-
ation, with fluctuating variation rates over time leading to language change
(Weinreich et al. 1968). The long-term patterns of changing allomorph selection
have been studied in historically documented languages such as Latin (Gardani
2013: 201–28) and English (Jespersen 1949; Bybee & Moder 1983). An interesting
question is whether the direction of such change reflects limits on overall com-
plexity and, conversely, what mechanisms lead to an increase in complexity.
There must be some upper limit of unpredictability at which inflectional
systems remain learnable. If allomorphic distributions were too unpredictable,
their prospects of being stably transmitted from one generation to the next would
become rather slim. The obvious way to reduce unpredictability is to replace
improbable allomorphs with more probable ones. We have little idea of how
much unpredictability is too much, though crosslinguistic studies by Ackerman
& Malouf (2013, 2015) and Stump & Finkel (2013) have documented the range
of unpredictability found in genetically and typologically diverse samples.
Aii = unknown
Ai = x₁     compare    Bi = x₁, Ci = x₁, Di = x₁, Ei = x₂, Fi = x₂
      relate
Aii = y₂    induce     Bii = y₁, Cii = y₂, Dii = y₂
Figure 3.1. Ackerman & Malouf (2015) mechanism for predicting unknown
inflectional forms
not share exponence with the source form. The comparable lexemes analogically
present both y₁ and y₂ as exponence candidates for the target form, but y₂ wins out
because it occurs more frequently in this distribution. If Aii is used as a source
form or comparable lexeme form in a subsequent iteration, it will have the
exponence y₂.
Ackerman & Malouf (2015) computationally simulate this model of inflectional
change based on a ‘highly unrealistic language’ in which allomorphy is almost
completely unpredictable in the initial state. The simulation language has a
hundred lexemes, each of which inflects for eight morphosyntactic categories,
giving a total of 800 forms in the system. Each morphosyntactic category has
three allomorphs, which are randomly assigned to each lexeme. Thus there are
3⁸ = 6,561 possible inflectional paradigms, so that most of the hundred lexemes
have an idiosyncratic paradigm, that is, not shared with any other lexeme. In this
initial state, there are no inflectional classes. As the simulation iterates, replace-
ment of unknown allomorphs with the most predictable allomorph leads to
massive convergence of lexemes towards shared inflectional paradigms. The
simulation ends when allomorphy stabilizes (i.e., the unknown form already is
the most predictable form) for twenty-five consecutive iterations. Given hundreds
of trials of the simulation, in a large proportion of simulations (no exact figure is
given), all lexemes converge on a single set of allomorphs (i.e., no allomorphy),
creating a single inflectional paradigm. In the remaining simulations, lexemes
converge on between two and eighty-eight inflectional classes, the median number
being twelve (Ackerman & Malouf 2015: 8).
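The prediction step in Figure 3.1 and the simulation loop described above can be sketched together. This is a simplified illustration, not Ackerman & Malouf's (2015) code. We assume that one 'iteration' is a sweep in which every lexeme re-predicts one randomly chosen target cell from one randomly chosen source cell. We also assume that ties go to whichever exponent was counted first. The function names, the sweep definition and the `max_iter` cap are all hypothetical.

```python
import random
from collections import Counter

def predict(lexicon, lexeme, source, target):
    """Figure 3.1 step: among other lexemes sharing `lexeme`'s exponent in the
    source cell, return the most frequent exponent of the target cell."""
    src = lexicon[lexeme][source]
    votes = Counter(
        forms[target]
        for name, forms in lexicon.items()
        if name != lexeme and forms[source] == src and forms[target] is not None
    )
    # Ties go to the exponent counted first (an arbitrary choice in this sketch).
    return votes.most_common(1)[0][0] if votes else None

def random_lexicon(rng, n_lexemes=100, n_cells=8, n_allomorphs=3):
    """Initial state: every cell gets one of n_allomorphs exponents at random."""
    return {lex: [rng.randrange(n_allomorphs) for _ in range(n_cells)]
            for lex in range(n_lexemes)}

def simulate(lexicon, rng, patience=25, max_iter=1000):
    """Each sweep, every lexeme re-predicts one random target cell from one random
    source cell. Stops after `patience` consecutive sweeps with no change, or
    after `max_iter` sweeps. Forms are lists indexed by cell number."""
    n_cells = len(next(iter(lexicon.values())))
    stable = 0
    sweeps = 0
    for sweeps in range(1, max_iter + 1):
        changed = False
        for lex in lexicon:
            source, target = rng.sample(range(n_cells), 2)
            guess = predict(lexicon, lex, source, target)
            if guess is not None and guess != lexicon[lex][target]:
                lexicon[lex][target] = guess
                changed = True
        stable = 0 if changed else stable + 1
        if stable >= patience:
            break
    return lexicon, sweeps

def n_paradigms(lexicon):
    return len({tuple(forms) for forms in lexicon.values()})

rng = random.Random(1)
lexicon = random_lexicon(rng)  # 100 lexemes x 8 cells, 3 allomorphs per cell
print("distinct paradigms before:", n_paradigms(lexicon))
lexicon, sweeps = simulate(lexicon, rng)
print("distinct paradigms after:", n_paradigms(lexicon), "after", sweeps, "sweeps")
```

For the Figure 3.1 configuration (B, C and D share x₁ with A and show y₁, y₂, y₂ in cell ii), `predict` returns y₂ for Aii, as in the text.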
In terms of inflectional predictability, the initial random distribution of allomorphs
[x₁, x₂, x₃] for each inflected form means that knowledge of other inflected
forms does not offer any reduction in uncertainty (except by occasional accident
of the distribution), and conditional entropy is therefore only marginally less than
unconditional entropy, that is, H(a, b, c) = 1.58 bits. But the mechanism of replacement
by the most predictable allomorph in the simulated language change reduces this
entropy to 0 bits in the instances where all lexemes converge on a single paradigm,
and to an average of 0.64 bits in the instances where the simulation converges on a
set of inflectional classes (Ackerman & Malouf 2015: 9). The average conditional
entropy found in these simulated inflectional systems sits neatly within the range of
0–1.1 bits found in the study of natural languages (Ackerman & Malouf 2013). This
provides support for the notion that the model’s simplification mechanism may
have something in common with mechanisms deployed in natural language.
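For reference, the 1.58-bit starting value is the entropy of a uniform choice among the three allomorphs available for each cell in the initial random state:

$$H(a, b, c) = -\sum_{i=1}^{3} \frac{1}{3} \log_2 \frac{1}{3} = \log_2 3 \approx 1.58 \text{ bits}$$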
One issue that has been insufficiently addressed in work on integrative com-
plexity is the question of open versus closed lexical classes. The Ackerman &
Malouf (2015) simulation works with a set of a hundred lexemes, that is to say a
finite set, and therefore a closed class. The basic formulation of the Paradigm Cell
Filling Problem (PCFP; Ackerman et al. 2009) presumes that unknown inflec-
tional forms must be predicted by a speaker, but also that the correct inflectional
exponence is in some way defined—perhaps by a dictionary, or a more erudite
speaker. Now, if we take ‘open class’ to mean a lexical class to which entirely new
words can be added, then there must be a point at which inflectional forms of
these words are not pre-defined, and there is no correct or incorrect selection of
exponence. In other words, for truly open-class lexemes, the PCFP is undefined. In
the next section, we will see that Murrinhpatha classifier stems are a closed class,
with rather fewer members than may be intended in the original PCFP formula-
tion. However, we argue that the model is still relevant, as Murrinhpatha speakers
are not born with complete knowledge of the classifier stem paradigms, and must
therefore use predictive mechanisms to extrapolate from known to unknown
forms.
³ In other work these have been called ‘auxiliaries’ (Walsh 1976), ‘classifier-subject pronominals’
(Nordlinger 2011), and ‘finite verbs’ (Mansfield 2016).
details of the system the reader is referred to Blythe (2009), Nordlinger (2011,
2015), and Mansfield (2016, 2019) among others.⁴
(1) wuɾan
3S.(6).
‘She goes.’
(2) muŋam-paɭ
3S.(11).-break
‘She broke it off.’
(3) pam-ŋin̪t̪a-nu-ma-ɻaʈal
3S.:(24).-.---tear
‘They (two female non-siblings) tore the (cloth) from each other.’
(RN-20070531-002:011)
(4) piɾim-nin̪t̪a-nu-bu-wuj-waɖa-ya
3S.(3).-.--thigh-put.into--
‘They put them in their pockets.’ (JB 43JBc743652_747130)
(5) puddan-wunku-ɭaɭ-dejida-ŋime=pumpan-ka
3S.(29).-3O-drop-in.turn-.=3S.(6).-
‘They (dual, sibling) are dropping them (paucal, female, non-sibling) off,
one after the other, as they go along.’ (Blythe 2009: 134)
For most classifier stems the exponence pattern making up the paradigm of forty-
two inflected forms is unique to that stem. Thus the concept of ‘inflectional
classes’—a set of exponence paradigms shared by many lexemes—is not directly
applicable to Murrinhpatha. (1)–(5) show classifier stems as unsegmented wholes,
and this has been the representation used in most work on Murrinhpatha.
However, there are semi-regular subcomponents evident in these stems, and it is
these that we treat as exponents of inflectional categories. These are not productive
morphs that are applicable to new lexemes in an open class; however, they do
constitute morphology in the sense of form:meaning associations between
systematically related forms (Anderson 2015b).
⁴ In the Appendix we have provided paradigms for five classifier stems, to exemplify the complexity
amongst them. Previous descriptions of the Murrinhpatha verbal system (e.g., Blythe et al. 2007;
Nordlinger 2011, 2015) have tended to treat these classifier stem paradigms as consisting of synchron-
ically unanalysable portmanteau forms, due to the substantial amounts of unpredictability and
suppletion within the paradigms. The full set of thirty-nine paradigms as analysed in this chapter is
available at http://langwidj.org/Murrinhpatha-inflection.
⁵ Fuller description is available in Mansfield (2016, 2019), drawing on earlier partial analyses (Walsh
1976: 224; Green 2003; Forshaw 2016: 37). As shown in the examples above, there is also further
inflectional morphology in the verb that is not part of the classifier stem paradigms, and can be applied
equally to verbs based on any classifier stem (Nordlinger 2015, 2017). This morphology has no bearing
on the issues discussed in this chapter and will therefore not feature in our remaining discussion.
⁶ The full paradigms are provided in the Appendix.
⁷ The description of ‘vowel-only stems’ is somewhat different from Mansfield (2016), where they are
simply labelled ‘phonologically empty stems’. The analysis there nonetheless depends on underlying
‘theme vowels’ in such stems, though this is not explicitly discussed. An alternative analysis would
propose a zero theme vowel, to avoid the use of unrealized underlying vowels. We have experimented
with calculation of Murrinhpatha integrative complexity using both analyses, and found that the
difference is very small (< 1%). The unrealized vowel alternative produces slightly lower complexity
measurements, and we therefore select this option to keep our complexity measurements conservative.
. - . - [, , ] -
la ‘(26)’ kila dilam pilla
k-i-la[]-∅ d-i-la[]-m p-i-lla[:]-∅
ma ‘(34)’ ma mam pume
∅-u-ma[]-∅* ∅-u-ma[]-m p-u-me[:]-∅
ɾu ‘(6)’ kuɾu wuɾan puɳi
k-u-ɾu[]-∅ w-u-ɾa[:]-n p-u-ø[:]-ɳi
i ‘(1)’ ki dim piɾini
k-i-∅[]-∅ d-i-∅[]-m p-i-ɾi[:]-ni
Notes: * PrefV, like the stem vowel, does not surface unless it can syllabify with an onset consonant.
Thus we can analyse a PrefV u- formative in ø-u-ma-m 3.(34)., in keeping with this
classifier’s overall paradigmatic pattern, though the surface form is mam.
= default; = geminate; = ɾ-alternation.
(26) and (1), but PrefV u- in (34) and (6). Importantly, these patterns
are often orthogonal—for example, the PrefV selection is independent of the PrefC
selection in 3..
As shown in the full paradigm examples in the Appendix, the complete morpho-
syntactic paradigm of a classifier stem consists of forty-two inflectional forms.
Subjects are distinguished for 1/2/3 person, cross-cutting a three-way //
number distinction (although / is consistently collapsed in tense, and
in all tenses for some paradigms).⁸ There is also a 1+2 ‘we inclusive’ person category,
which has no number distinctions. These are the core number/person categories of
Murrinhpatha, but more specific subcategories can be encoded using various pre-
dictable suffixes not discussed here (Nordlinger 2015). There are four basic tense/
modality categories (henceforth ‘tenses’): non-future (), irrealis (), past (),
and past irrealis (), as well as ‘subtense’ distinctions between vs presen-
tational (), and vs future indicative (), which apply only to third-person
forms. Again, these core categories can be further specified by predictable suffixes
encoding tense, modality, and aspect (Nordlinger & Caudal 2012). Table 3.4 illus-
trates a complete paradigm of inflected forms for one of the more regular classifiers,
na ‘(27)’, with both surface forms and intersecting formative analysis.
Some formatives in some cells have a consistent form (i.e., no allomorphy), such
as PrefC p- in 3.. More typical is a selection between a handful of formative
allomorphs, for example Suffix -m, -n, -ŋam, -ŋan in , or PrefV a-, e-, i-, u- for
all cells. A particularly wide selection of allomorphs is PrefC p-, w-, d-, n-, j-, k-,
ø- 3., and StemC allomorphy also has a large selection of allomorphs, once
we take into account various suppletive (i.e., altogether unpatterned) consonant
alternations.
⁸ The category here labelled is used for both dual and paucal referents; it is labelled PAUCAL
(PC) in Mansfield (2016) and DAUCAL in Blythe (2009).
Table 3.4. Inflectional exponence of na ‘(27)’
From the point of view of integrative complexity, that is, the predictability of an
inflected form given knowledge of some other form, the formatives individually
have an intermediate degree of predictability. In certain dimensions there is very
high predictability: for example, if one form takes Suffix -ŋam, there is a very
high likelihood (though not quite categorical) that any other form of the
same verb will take Suffix -ŋam. This is illustrated in the consistent tense pattern-
ing of Suffix allomorphs in Table 3.4. Among cells that have the same tense and
number categories but differ for 1/2/3 person, the only difference of exponence is
usually PrefC; these triplets of cells are therefore tightly integrated in terms of
implicational structure. However, when we consider the implicative relationship
between cells from different tenses, we find that, say, knowing -ŋam
provides little information about the Suffix allomorph for cells. Allomorph
selection across tenses is strongly orthogonal. Other formatives have generally
high degrees of integrative complexity, that is to say, inconsistent paradigmatic
patterning. This is especially true of the stem formatives StemC, StemVH, and
StemVF, and also to some extent of PrefV.
The problem of predicting an unknown inflected form of a Murrinhpatha
classifier stem therefore involves predicting allomorph selection for six intersect-
ing formatives, based on knowledge of such an intersection for some other form
of the classifier stem. Some formatives provide good chances of correct prediction,
while others are rather less helpful. This situation is not as extreme as the
completely random paradigmatic distribution of allomorphs in Ackerman &
Malouf (2015)’s ‘unrealistic language’, though the presence of six different dimen-
sions of allomorphy in Murrinhpatha nonetheless leads to a high degree of
complexity, since the unpredictability of the allomorphs is compounded.
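A minimal Python sketch makes the compounding effect concrete. It assumes, for illustration only, that the six formatives are predicted independently. On that assumption, the chance of predicting a whole inflected form correctly is the product of the per-formative chances. The per-formative values below are hypothetical and are not figures reported for Murrinhpatha.

```python
import math

# Hypothetical per-formative chances of a correct prediction (illustrative
# values only, not figures reported for Murrinhpatha).
FORMATIVE_CHANCES = {
    "PrefC": 0.90,
    "PrefV": 0.80,
    "StemC": 0.70,
    "StemVH": 0.80,
    "StemVF": 0.80,
    "Suffix": 0.95,
}


def whole_form_chance(chances):
    """Chance that every formative is predicted correctly, assuming the
    formatives are predicted independently of one another."""
    return math.prod(chances.values())


print(f"{whole_form_chance(FORMATIVE_CHANCES):.3f}")  # 0.306
```

Each formative here is predicted correctly at least 70% of the time, yet the whole form is predicted correctly only about 31% of the time. If the formatives are not in fact independent, the product is only an approximation.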
Because Murrinhpatha classifier stems often have idiosyncratic exponents, that
is, allomorphs not shared by any other classifier stem, the entropy calculations
used in Ackerman & Malouf (2013) are not directly applicable. The latter’s
allomorphic entropy method assumes that all possible exponents have been
encountered in other lexemes, so that allomorphy prediction involves a distribu-
tion of possible outcomes. But in a system with idiosyncratic exponents, the
unknown target exponent may be one that has not previously been encountered
(cf. Dahl, Chapter 13, this volume). The speaker’s challenge is not one of entropy
in the distribution of previous observations, but of attempting to predict an
outcome that may or may not match any previous observation. Thus the math-
ematical analysis calculates chance of correct prediction (including zero chance
for a previously unencountered paradigmatic relation), rather than degrees of
entropy. Nonetheless, we can make a notional comparison of Murrinhpatha with
the crosslinguistic findings on entropy in Ackerman & Malouf (2013). The latter
finds average conditional entropy between 0 and 1.1 bits, and 1 bit of entropy
equates to a randomized prediction having 50% chance of matching the
outcome. Mansfield (2016) calculates that the average chance of correct prediction
from one Murrinhpatha classifier stem form to another is 43%, comparable to 1.22
bits of entropy.⁹ This is slightly outside the range of the Ackerman & Malouf
sample, suggesting that Murrinhpatha’s closed-class classifier stems have an
integrative complexity at the upper end of the scale found for open-class systems
in other languages. As far as we know, the only language that has been analysed as
having clearly higher integrative complexity is Seri (isolate, Mexico), which has
almost 2 bits average conditional entropy (Baerman 2016).
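The comparison with entropy rests on a simple conversion. Suppose a prediction is a random choice among k equally likely outcomes. The entropy is then log₂ k bits and the chance of a correct guess is 1/k, so H = −log₂ p. On this conversion, 50% corresponds to 1 bit, 43% to about 1.22 bits, and 2 bits to 25%.

The sketch also shows one way to compute a leave-one-out chance of correct prediction that gives zero chance to an exponent not found in any other lexeme. It is a hypothetical illustration, not the procedure of Mansfield (2016). The data format and the function names are assumptions of the sketch, as is the choice to draw the guess from the distribution of comparable lexemes.

```python
import math
from collections import Counter


def prediction_chance(paradigms, source, target):
    """Mean leave-one-out chance of predicting each lexeme's exponent in
    `target` from its exponent in `source`.

    `paradigms` maps lexeme -> {cell: exponent}. The guess is assumed to be
    drawn from the target exponents of the other lexemes that share the
    source exponent, so an exponent never seen elsewhere scores zero.
    """
    chances = []
    for lexeme, cells in paradigms.items():
        # Target exponents of comparable lexemes (same source exponent).
        outcomes = Counter(
            other[target]
            for name, other in paradigms.items()
            if name != lexeme and other[source] == cells[source]
        )
        total = sum(outcomes.values())
        chances.append(outcomes[cells[target]] / total if total else 0.0)
    return sum(chances) / len(chances)


def equivalent_bits(p):
    """Entropy (bits) of a uniform choice whose chance of a correct guess is p."""
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1]")
    return -math.log2(p)


# Hypothetical toy paradigms: four lexemes, two cells each.
TOY = {
    "A": {"c1": "x", "c2": "p"},
    "B": {"c1": "x", "c2": "p"},
    "C": {"c1": "x", "c2": "q"},  # idiosyncratic target exponent
    "D": {"c1": "y", "c2": "r"},  # no other lexeme shares its source exponent
}

print(prediction_chance(TOY, "c1", "c2"))  # 0.25
print(f"{equivalent_bits(0.43):.2f}")      # 1.22
```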
None of the seven variables thus identified have enough corpus tokens to support
a rigorous variationist analysis. Nor is there sufficient data to permit differentiation
between contextual factors such as phrasal context, speech style, speaker gender, etc.
Rather, in this study we focus purely on the distribution of variants among
speakers born in the first half of the twentieth century (‘older speakers’) versus
those born in the second half (‘younger speakers’). This method allows us to detect
proportions suggestive of change in progress in inflectional variants, and thereby
to search for signs of the Ackerman & Malouf (2015) simplification mechanism in
effect. In fact, for all seven of the variables, there is a striking difference between
variant distributions among older and younger groups, with the younger moving
strongly towards the variant not attested in earlier documentation.¹¹ This is likely
not an accident: the fact that these seven inflected forms were noted as variable is
primarily because they stood out in Mansfield’s fieldwork as conflicting with
earlier grammatical descriptions of the language. On the other hand, though
speakers showed clear awareness of social indexicality in phonological and lexical
variation among the generations, they were unaware of the intergenerational
variations in inflectional morphology (Mansfield 2014: 469ff).
It has often been observed that less frequent inflectional forms are more suscep-
tible to analogical change in morphology, though frequent forms may also undergo
such changes (e.g., Fertig 2000: 125). Since our method for identifying changes in
Murrinhpatha depends on the salience of these changes in fieldwork, these can all be
said to occur in fairly frequent forms. We presume that further analogical changes
occur in less frequent forms, though we have not had the opportunity to observe
these, and the corpus data drawn upon for this study does not permit robust
estimates of inflectional form frequency.
Table 3.5 lists the seven observed variables, with variants preferred by older
and younger speakers respectively according to the corpus evidence. Note
that where regular triplets of 1/2/3 person inflections are all involved, these
are treated as a single variable in view of their tight mutual implications.
Token numbers in parentheses indicate the number of tokens found for the
older:newer variants among that speaker group. For example, for 1S.(34).,
older speakers were found to have five tokens of me and one token of ŋeme,
while younger speakers were found to have zero tokens of me and nine of
ŋeme.
¹¹ Some of the sources for older speakers are written (e.g., Bible; Street 1987) and do not have
accompanying audio sources. It is possible that these sources underreport use of innovative variants, by
correcting them to what may have been seen as the ‘correct’ form. This may account for some of the
strength of the swing in proportions from older to younger speaker groups.
Interestingly, one of the few forms earlier documented as being variable,
nuɻa ~ na 3.. (Street 1987: 84), showed only marginal variability in
the corpus data. There are dozens of attestations for na, and only one for nuɻa,
suggesting that the latter variant was already on its way out when Street
recorded it.
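The apparent-time comparison behind Table 3.5 amounts to comparing the share of the innovative variant across the two speaker groups. The minimal sketch below uses the me ~ ŋeme token counts reported above. The function name is illustrative, and, as noted earlier, counts of this size do not support a rigorous variationist test.

```python
# Token counts for the older variant (me) and the innovative variant (ŋeme)
# in each speaker group, as reported for Table 3.5.
COUNTS = {
    "older": {"me": 5, "ŋeme": 1},
    "younger": {"me": 0, "ŋeme": 9},
}


def innovative_share(counts, innovative):
    """Proportion of each group's tokens that use the innovative variant."""
    return {
        group: variants[innovative] / sum(variants.values())
        for group, variants in counts.items()
    }


shares = innovative_share(COUNTS, "ŋeme")
for group, share in shares.items():
    print(f"{group}: {share:.0%}")  # older: 17%, then younger: 100%
```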
In the last section we saw that Murrinhpatha classifier stems are a closed class in
which the inflectional paradigms are large, and implicational relations are highly
unpredictable. We also saw that allomorphy of exponence in this system is not
static, but rather encompasses some variable forms, which show signs of change
over the last couple of generations. Thus we are now in a position to investigate
whether the changes observed in Murrinhpatha decrease or increase the complexity
of the system. To test this, we ran the Ackerman & Malouf (2015) simplification
method (with adaptations as described above) on the relevant classifier forms,
identifying the most strongly predicted allomorph for each. We show that in none
of the seven inflected forms does the observed change replace an incumbent
allomorph with the most predictable allomorph. We then go on to consider a weaker
form of the Ackerman & Malouf (2015) simplification mechanism: when speakers
replace an old allomorph with a new one, do they at least select one that is more
predictable than the previous one? We find that, on the contrary, most of the changes
observed in Murrinhpatha select less predictable allomorphs, thus increasing the
complexity of the system.
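The simplification step and the weaker criterion can be sketched in Python, the language of the authors' own implementation. This is a hypothetical reconstruction from the prose description, not the authors' code. The dictionary data format, the function names, and the equal weighting of source forms are assumptions made for illustration, and the toy data are invented. Following the intersecting-formative approach, each formative is treated separately.

```python
from collections import Counter, defaultdict


def candidate_distribution(paradigms, lexeme, target):
    """Analogical candidates for one formative in one target cell.

    `paradigms` maps lexeme -> {cell: allomorph} for a single formative.
    For each source cell of `lexeme`, the comparable lexemes are those with the
    same allomorph in that cell; their allomorphs in `target` give a
    distribution. Distributions are averaged over source cells (equal weights).
    """
    totals = defaultdict(float)
    n_sources = 0
    for source, allomorph in paradigms[lexeme].items():
        if source == target:
            continue  # the target cell is what we are predicting
        votes = Counter(
            cells[target]
            for name, cells in paradigms.items()
            if name != lexeme and cells.get(source) == allomorph and target in cells
        )
        size = sum(votes.values())
        if size == 0:
            continue  # no comparable lexeme for this source cell
        n_sources += 1
        for candidate, count in votes.items():
            totals[candidate] += count / size
    return {c: v / n_sources for c, v in totals.items()} if n_sources else {}


def simplified_allomorph(paradigms, lexeme, target):
    """Strong criterion: the single most strongly predicted allomorph."""
    dist = candidate_distribution(paradigms, lexeme, target)
    return max(dist, key=dist.get) if dist else None


def is_more_predictable(paradigms, lexeme, target, old, new):
    """Weak criterion: does the new allomorph get more support than the old?"""
    dist = candidate_distribution(paradigms, lexeme, target)
    return dist.get(new, 0.0) > dist.get(old, 0.0)


# Hypothetical toy data for one formative across four lexemes.
TOY = {
    "X": {"s1": "a", "s2": "u", "t": "∅"},
    "P": {"s1": "a", "s2": "u", "t": "∅"},
    "Q": {"s1": "a", "s2": "i", "t": "ŋ"},
    "R": {"s1": "e", "s2": "u", "t": "∅"},
}

dist = candidate_distribution(TOY, "X", "t")
print(dist["∅"], dist["ŋ"])                                  # 0.75 0.25
print(simplified_allomorph(TOY, "X", "t") == "∅")            # True
print(is_more_predictable(TOY, "X", "t", old="∅", new="ŋ"))  # False
```

In the toy data, the incumbent ∅ of lexeme X is also the strongest candidate. The weak criterion therefore reports that a switch to ŋ would lower predictability. This is the kind of old-versus-new comparison the text applies to the seven Murrinhpatha variables.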
The Ackerman & Malouf (2015) simplification mechanism was implemented
for Murrinhpatha classifier inflections using intersecting formatives to draw
independent analogies, since this method has been shown to provide the
¹² The implementation code is written in Python (Python Software Foundation n.d.), and takes as
input the inflectional paradigm data format established for the Principal Parts Analyzer (Finkel &
Stump 2013). Both code and data are available online at http://langwidj.org/Murrinhpatha-inflection.
¹³ ɾu [] → ɻu [] may not seem like an obvious case of gemination, but it follows from a ɾɾ →
ɻ process observed in Murrinhpatha’s sister language Ngan’gityemerri (Reid 1990) and their shared
proto-language (Green 2003). In Murrinhpatha it is observable only in the classifier stem paradigms,
where it fits with a broader gemination pattern.
¹⁴ In fact, their main argument focuses on the greater generality of their Low Conditional Entropy
Conjecture (Ackerman & Malouf 2013) as compared to the No Blur Principle (Carstairs-McCarthy
1994), which does not directly concern us here.
¹⁵ The other changes observed are potentially explicable by more subtle departures from the
Ackerman & Malouf (2015) simplification mechanism—for example, by weighting of comparable
classifier stems according to their respective entropies of prediction, with near-categorical predictors
given extra weight (2.(34).), or by allowing prediction to be based on phonological relation-
ships, including identity, rather than inflectional exponents (.(34).) (Bonami & Beniamine
2016). .(6). and..(7). seem to involve greater independence of formatives than has
been previously proposed for the system (Mansfield 2016). Satisfactory analysis of any of these
instances would require a separate study.
(6) 1.(34).
Older form: me (∅-a-me[]-∅)
Ackerman & Malouf (2015) simplified: me (∅-a-me[]-∅)
Observed: ŋeme (ŋ-e-me[]-∅)
In the case of (6), there are two observed deviations from Ackerman & Malouf
(2015), the first of which is the selection of PrefC ŋ- instead of ∅-. Both ŋ- and ∅-
are in fact candidates implied by comparable classifier stems for various source
forms, with ∅- selected because it has an aggregate 0.73 probability among all
source forms, versus 0.27 for ŋ-. It is easy to imagine that this outcome might be
different, as in the observed innovation ŋeme, if there were some weighting in the
influence of source forms and comparable classifier stems. However, the second
deviation from Ackerman & Malouf (2015) involves the introduction of PrefV e-,
and this is not even a candidate by analogy with comparable classifiers. Classifier
stems that do have PrefV e- are never selected as comparable, because none of the
ma ‘(34)’ source forms use this allomorph, as illustrated in Table 3.8. Rather,
the competing candidates are a- ~ u-. Notice, however, that 1.(34)., like
all (34). forms, has a StemVF [] alternation. It seems that rather than
arising from analogical prediction of PrefV allomorphy, the form ŋeme applies
vowel fronting beyond the morphological inner stem structure ma ~ me in which
the pattern is more generally established. On this view, the predicted form is
derived analogically from other forms, but the prediction of vowel fronting has
been inherited upwards into a morphological unit larger than the inner stem. Such
abrogation of the structural distinction between inner stem and prefix is perhaps
not surprising, given the widespread lack of phonological transparency in
Murrinhpatha classifier stems.
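A short, hypothetical sketch shows how weighting could change the outcome. In the unweighted case of (6), ∅- has an aggregate 0.73 probability against 0.27 for ŋ-. The sketch uses an entropy-based weight, 1/(1 + H), chosen arbitrarily to favour near-categorical source forms. Neither this weight nor the toy distributions come from the Murrinhpatha data.

```python
import math


def entropy(dist):
    """Shannon entropy, in bits, of a candidate distribution."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


def aggregate(per_source, weight=lambda dist: 1.0):
    """Weighted average of per-source-form candidate distributions."""
    totals, norm = {}, 0.0
    for dist in per_source:
        w = weight(dist)
        norm += w
        for candidate, p in dist.items():
            totals[candidate] = totals.get(candidate, 0.0) + w * p
    return {c: v / norm for c, v in totals.items()}


def sharpness(dist):
    """Hypothetical weight favouring near-categorical (low-entropy) sources."""
    return 1.0 / (1.0 + entropy(dist))


# Hypothetical per-source distributions for one target cell: three weakly
# informative source forms and one categorical one.
PER_SOURCE = [
    {"∅": 0.7, "ŋ": 0.3},
    {"∅": 0.7, "ŋ": 0.3},
    {"∅": 0.7, "ŋ": 0.3},
    {"ŋ": 1.0},
]

plain = aggregate(PER_SOURCE)
weighted = aggregate(PER_SOURCE, weight=sharpness)
print(round(plain["∅"], 3), round(weighted["∅"], 3))  # 0.525 0.43
```

With equal weights, ∅ wins (0.525 against 0.475). Once the single categorical source is up-weighted, ŋ wins (about 0.57 against 0.43).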
(7) 3.(28).
Older form: paŋan (p-a-∅[]-ŋan)
Ackerman & Malouf (2015) simplified: piɻam (p-i-ɻa[:, :, :]-m)
Observed: piɾim (p-i-ɾi[:, :, :]-m)
The case of (7) suggests more extensive breakdown of inner stem/affix structure in
the predictive mechanism. Here the observed deviations from the Ackerman &
Malouf (2015) simplification again include a consonant formative that is an
analogical candidate though not the aggregate strongest candidate, StemC []
instead of StemC [], which again could be accounted for in a system that
includes some weighting of candidates. The other deviation is in the vowel
SG 1 ŋamam SG 1 ŋama me mi
ŋ-a-[]-m ŋ-a-[]-∅ ∅-u-[]-∅ ∅-u-[,
]-∅
2 nam 2 t̪ama ne ni
∅-a-[]- t̪- . . . ∅-u-[, ∅-u-[,
m ]-∅ ,]-∅
3 mam / 3 kama / pama me mi
kamam k- . . . / p . . . ∅-u-[]- ∅ ∅-u-[,
∅-a-[]- ]-∅
m / k- . . .
INCL 1+2 t̪amam INCL pama t̪ume t̪umi
t̪-a-[]-m 1+2 p-a-[]-∅ t̪-u-[]-∅ t̪-u-[,
]-∅
PL/ 1 ŋamam PL 1 ŋujema ŋume ŋumi
DU ŋ-a-[]-m ŋ-uje-[]-∅ ŋ-u-[]-∅ ŋ-u-[,
]-∅
2 namam 2 nujema nume numi
n- . . . n- . . . n- . . . n- . . .
3 pamam / 3 kujema / pujema pume pumi
kamam k- . . . / p- . . . p- . . . p- . . .
p- . . . / k- . . .
relations in the classifier stem morphology have already long given way to lexically
specific, unpredictable allomorphy. But the changes nonetheless reflect incremental
steps on the path of demorphologization, undermining the morphological structure
of the classifier stem. Every time a paradigmatic cell in the system shifts from a more
predictable allomorph to a less predictable one, the formative structure of the system
is incrementally undermined. Processes of this type are probably responsible for
much of the integrative complexity in Murrinhpatha verbs—though pursuit of this
hypothesis would depend on more extensive historical reconstruction than is pres-
ently available (Green 2003).
3.7 Conclusions
Appendix
Illustrated below are the inflectional paradigms for classifiers discussed in this chapter.
The paradigms for (34) and (28) are illustrated in the body of the text.
DU 1 ŋa ŋuɳe ŋuje
ŋ-a-[]-∅ ŋ-u-[, ŋ-u-[,
]-ɳe ,
]-∅
2 na nuɳe nuje
n- . . . n- . . . n- . . .
3 ka / pa puɳe puje
k- . . . / p- . . . p- . . . p- . . .
Note: Street (1987) in addition lists a variant /nuɻa/ use.feet.3.. This variant does not appear in our
corpus data.
Acknowledgements
This research is funded by the Australian Research Council Centre of Excellence for
the Dynamics of Language (Project ID: CE140100041). We are greatly indebted to the
people of Wadeye, Australia, who have generously shared their knowledge of
Murrinhpatha with us. We also thank Peter Arkadiev and Francesco Gardani for
inviting us to present at the workshop which led to this volume, and for their
comments on our original submission. Bill Forshaw, Jeff Parker, and an anonymous
reviewer also provided insightful comments, as did audience members of the
‘Morphological Complexity’ workshop at Societas Linguistica Europaea (SLE), 2015.
We dedicate this chapter to the late Chester Street, whose detailed documentation
work revealed the extraordinary complexity of Murrinhpatha verbs.
4
Overabundance resulting from language contact
Complex cell-mates in Gurindji Kriol
Felicity Meakins and Sasha Wilmoth
4.1 Introduction
One of the oft-claimed results of language contact is the reduction of morphological
complexity. For example, syncretism, allomorphic simplification, the difficulty of
transferring morphemes, and increased paradigmatic regularity are all observed
outcomes of contact-induced change (e.g., McWhorter 1998; Myers-Scotton 2002;
Janse & Tol 2003; Gardani 2008). These processes reduce both the expression of
morphological features, for example case, tense/aspect/mood (TAM), gender,
and number, and the complexity of relationships between cells in paradigms
expressing these features. In this sense, these changes represent an absolute
decrease in the number of morphosyntactic distinctions that a language makes,
both in the internal structure of words and in their arrangement into
inflectional classes. This type of morphological complexity has been termed
‘complexity of exponence’ (Anderson 2015a: 20) or ‘E(numerative) complexity’
(Ackerman & Malouf 2013: 433; see also section 1.3.1 in the Introduction to this
volume). Such changes can be quantified as a measure of average paradigm
entropy, that is, the degree of uncertainty in predicting the content of a particular
cell in a paradigm (Ackerman et al. 2009; Ackerman & Malouf 2013; Parker &
Sims, Chapter 2, this volume).
One area of complexity, which Anderson (2015a: 22) notes as having received
less attention in the morphological literature, is variation within the cells of a
paradigm, for example ‘dived’ and ‘dove’, which are two word forms realizing the
past tense of DIVE in English. Thornton (2011) calls this type of complexity
‘overabundance’. Overabundance refers to multiple forms being realized within
the same cell in a paradigm, or lexemes with ‘cell-mates’, as Loporcaro quips
(see Loporcaro & Paciaroni 2011: 420 and Loporcaro, Chapter 6, this volume).
Thornton observes that variation between cell-mates may be subject to sociolin-
guistic and syntactic-semantic conditions.
Felicity Meakins and Sasha Wilmoth, Overabundance resulting from language contact: Complex cell-mates in Gurindji Kriol
In: The Complexities of Morphology. Edited by: Peter Arkadiev and Francesco Gardani, Oxford University Press (2020).
© Felicity Meakins and Sasha Wilmoth.
DOI: 10.1093/oso/9780198861287.003.0004
¹ In all examples, Gurindji elements are given in italics, Kriol in plain font and subjects are bolded.
Meakins (2009, 2015) shows that optional subject marking developed as a result
of contact between Gurindji and Kriol whereby the Gurindji ergative marker was
retained in the process of the formation of the mixed language, Gurindji Kriol,
but became optional and was later reanalysed as nominative marking when it
also came to mark intransitive subjects. In this respect, overabundance devel-
oped in the nominative cell of the case paradigm where an alternation now exists
between the forms -ngku/-tu and a zero morph (or nothing, depending on one’s
theoretical approach). Variation is driven by a number of semantic, syntactic,
and information structure features including transitivity and word order
(Meakins 2009; Meakins & O’Shannessy 2010). This optional case marking
system requires speakers of Gurindji Kriol to constantly monitor the clause
and its place in the discourse to make decisions about whether to overtly express
subject marking or not. Thus, in this chapter, we make the case that
overabundance in Gurindji Kriol is an example of a contact-induced change,
which involves the complexification of an inflectional paradigm rather than its
simplification.
In particular, we examine the further development of overabundance in subject
marking using new data from Gurindji children to determine whether the complexity
in the case paradigm has stabilized or whether complexification is ongoing.
Changes in overabundance are quantified along two dimensions using different
quantitative methods: (i) the change between generations of Gurindji speakers in
the contribution of different predictors to the use of subject marking is shown
through generalized linear mixed models (GLMM; Marschner 2011); and (ii)
generational differences in the relative contribution of the different factors are
demonstrated using dominance analysis (Azen & Traxel 2009).
volume, and McWhorter, Chapter 10, this volume, for further discussions).²
Similarly, inflectional morphology is rarely borrowed or switched into the gram-
matical frame of another language (Myers-Scotton 2002; Aikhenvald & Dixon
2006; Matras & Sakel 2007; Gardani 2008).
Where inflectional morphology remains in situations of language contact,
different dimensions of complexity are affected. In particular, what Anderson
(2015a: 20) terms the ‘complexity of exponence’ or Ackerman & Malouf (2013:
433) call ‘E(numerative) complexity’ often undergoes reduction. For example,
syncretism, allomorphic simplification, and increased paradigmatic regularity
are all observed outcomes of contact-induced change and language obsolescence
(Dorian 1978; Gal 1989; Janse & Tol 2003). All of these processes reduce the
exponence of morphological features such as case, TAM, gender, and number, and
the complexity of relationships between cells within paradigms expressing these
features. At the extreme end, these features gather up their morphological skirts
and step out of paradigms and into periphrastic constructions, thereby transform-
ing from synthetic forms into analytic forms (see de Groot’s 2008 study of
Hungarian in contact for a recent example). Paradigmatic complexity can be
measured as ‘entropy’, which captures the degree of predictability of forms in a
paradigm (Ackerman et al. 2009; Ackerman & Malouf 2013). Entropy has been
used to measure the relative complexity of different languages (see also Stump &
Finkel 2016 for related work); however, it can also be used to measure changes in
complexity across time within the same language (see Mansfield & Nordlinger,
Chapter 3, this volume, for a case study of Murrinhpatha).
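The entropy-based measure can be illustrated with a minimal sketch in the spirit of Ackerman & Malouf (2013). It computes the entropy of a target cell and the conditional entropy of that cell given a source cell, H(T | S) = H(S, T) − H(S). Weighting all inflection classes equally is a simplifying assumption, and the toy classes are hypothetical.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy, in bits, of a frequency table."""
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values() if c)


def conditional_entropy(pairs):
    """H(target | source) in bits, from one (source, target) pair per
    inflection class; classes are weighted equally."""
    joint = Counter(pairs)
    source = Counter(s for s, _ in pairs)
    return entropy(joint) - entropy(source)


# Hypothetical inflection classes: exponents in a source cell and a target cell.
CLASSES = [("a", "x"), ("a", "y"), ("b", "z"), ("b", "z")]

print(entropy(Counter(t for _, t in CLASSES)))  # 1.5 (target cell alone)
print(conditional_entropy(CLASSES))              # 0.5 (source cell known)
```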
As Anderson (2015a: 22) has noted, a dimension of complexity which has
received less attention in the morphological literature is variation within the cells
of a paradigm, for example the ‘dived’ and ‘dove’ examples given in section 4.1—
and many more examples of co-existing regular and irregular past tense and plural
forms in English. Thornton (2011) calls the exponence of multiple forms in the
same cell in a paradigm ‘overabundance’. Overabundance (which can be thought
of as morphological ‘cell-mates’) is defined as ‘a cell in a paradigm . . . filled by two
or more synonymous forms which realize the same set of morpho-syntactic
properties’ (Thornton 2011: 2). She uses the Italian verb paradigm to demonstrate
how variation between forms is motivated by different phonological and
syntactic-semantic conditions.
Thornton’s examples of overabundance mostly involve cases of language
change and the regularization of inflectional paradigms. In this scenario, an
irregular form co-exists with a newer regularized form. Processes of regularization
are one source of variants. We argue that contact with another language provides
another source of variants. It is common for multiple forms from different
languages to co-exist, with their use determined by other features in the clause. To
give another example from English, possession is expressed by the s-genitive
(< Saxon), the of-genitive (also Germanic, but its rise in usage is attributed to
contact with Norman French), and an innovative double genitive, such as
‘the book of John’s’. The choice of form largely depends on possessor animacy, the
weight of the possessive phrase, topicality, and definiteness (see Kreyer 2003; Abel
2006; O’Connor et al. 2013; Payne 2013 for overviews). The development of the
double genitive and the rise in usage of the of-genitive occurred during the
formation of Middle English as a result of contact with Norman French, and
represent ‘overabundance’ and a complexification in the expression of
possession.
² Although, see a number of surveys (Plag 2003a, 2003b; Roberts & Bresnan 2008) and counter-
surveys (DeGraff 2005; Parkvall 2008; Bakker et al. 2011; Henri & Kihm 2015) in response to this claim.
Variation in the expression of particular functions can be probabilistically
modelled. For example, O’Connor et al. (2013) use logistic regression to demon-
strate the factors which drive the differential use of the s-genitive versus the of-
genitive in English. GLMMs provide a more appropriate procedure for modelling
the use versus non-use of a linguistic feature such as a type of possessor marking
against other features such as animacy and end-weight. The advantage of GLMMs
over standard logistic regression is the inclusion of random effects such as ‘speaker’,
which means the model is able to take into account differing degrees of contri-
bution of data to a corpus and the fact that speakers behave more like themselves
than other speakers (i.e., idiolectal variation) (Baayen 2008; Pinheiro & Bates
2000; Marschner 2011). GLMM analysis is commonly used in Probabilistic Syntax
studies to quantify grammatical variation (e.g., Bresnan 2007; Meakins &
O’Shannessy 2010; Bresnan & Ford 2013) and is increasingly replacing Varbul/
Rbrul in quantitative sociolinguistics.
Although GLMM analysis has not been previously used to measure complexity,
we suggest it provides a useful measure of complexity. In this case, what is being
measured is not E-complexity but rather I-complexity, that is, the predictability of
word forms based on other features. In the case of overabundance, the relevant
features go beyond the paradigm to other features of the clause or discourse. In
particular, we suggest the R² value is a useful metric for measuring both the overall
I-complexity of a form which is variably realized within a cell, and the relative
contribution of predictors to its use. In regression models, R² is a measure of how
well the independent variables predict the variable use of the dependent variable.
In the case of mixed models, two R² values can be calculated: conditional
R-squared (R²C) and marginal R-squared (R²M) (Nakagawa & Schielzeth 2013:
136). R²C is the proportion of variance explained by both the fixed effects
(independent variables or predictors) and the random effects, and therefore takes
account of all factors contributing to variation in the data set, including speaker
variation; R²M is the proportion explained by the fixed effects alone. The level
of I-complexity of overabundance or within-cell variation is measured by the
number of semantic, grammatical, and information structure features (predictors)
required to reach a reportable R²C value or to increase an R²C (while not over-fitting
a model). The more variables a speaker needs to take into account when making
online decisions about how to express variable linguistic features, the higher the
morphological complexity.
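The two R² values can be computed from the components of a fitted logistic GLMM using the formulas of Nakagawa & Schielzeth (2013): R²M = σ²f / (σ²f + σ²α + π²/3) and R²C = (σ²f + σ²α) / (σ²f + σ²α + π²/3). Here σ²f is the variance of the fixed-effect linear predictor, σ²α the summed random-intercept variance (e.g. for speaker), and π²/3 the distribution-specific variance for the logit link. The sketch assumes the model has already been fitted with other software and takes its outputs as plain numbers. It uses the sample variance of the fitted values, and the inputs are hypothetical.

```python
import math
import statistics


def nakagawa_r2(fixed_linear_predictor, random_intercept_variances):
    """Marginal and conditional R² for a logistic (logit-link) GLMM,
    after Nakagawa & Schielzeth (2013).

    fixed_linear_predictor: fitted log-odds from the fixed effects only (X·β)
    random_intercept_variances: estimated variances of the random intercepts
    """
    var_fixed = statistics.variance(fixed_linear_predictor)
    var_random = sum(random_intercept_variances)
    var_link = math.pi ** 2 / 3  # distribution-specific variance, logit link
    total = var_fixed + var_random + var_link
    r2_marginal = var_fixed / total
    r2_conditional = (var_fixed + var_random) / total
    return r2_marginal, r2_conditional


# Hypothetical fitted log-odds and a hypothetical speaker-intercept variance.
r2m, r2c = nakagawa_r2([-1.0, 0.0, 1.0, 2.0], [0.5])
print(f"R2M = {r2m:.3f}, R2C = {r2c:.3f}")  # R2M = 0.305, R2C = 0.397
```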
The R² value is also useful in determining the relative contribution of the
individual predictors to the variable use of a particular linguistic feature. Using
a method called dominance analysis, the R²M value is calculated for the individual
predictors as a proportional contribution to the overall R²M value of the mixed
model in order to determine which are the strongest predictors of the use of a
particular linguistic feature. Where the R²C value in GLMM analysis provides an
overall assessment of the complexity of overabundance, dominance analysis
provides a more nuanced picture of the individual contribution of linguistic
features to this complexity.
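Dominance analysis can be sketched as follows. For each predictor, the general dominance weight averages the R² increment that predictor contributes over all subsets of the other predictors, first within each subset size and then across sizes. The weights sum to the R² of the full model, so dividing by that R² gives proportional contributions of the kind described above. The R² values and predictor names are hypothetical. In the analysis described in this chapter, each value would be the R²M of a separately fitted model.

```python
from itertools import combinations
from statistics import mean


def general_dominance(r2, predictors):
    """General dominance weight of each predictor (cf. Azen & Traxel 2009).

    `r2` maps frozenset(subset of predictors) -> R² of the model fitted with
    exactly that subset; the empty model is taken to have R² = 0.
    """
    r2 = {frozenset(): 0.0, **r2}
    weights = {}
    for x in predictors:
        others = [p for p in predictors if p != x]
        # Average increment from adding x to every subset of size k.
        by_size = [
            mean(r2[frozenset(s) | {x}] - r2[frozenset(s)] for s in combinations(others, k))
            for k in range(len(others) + 1)
        ]
        weights[x] = mean(by_size)
    return weights


# Hypothetical R² values for every non-empty subset of three predictors.
PREDICTORS = ["transitivity", "word_order", "animacy"]
R2 = {
    frozenset({"transitivity"}): 0.10,
    frozenset({"word_order"}): 0.05,
    frozenset({"animacy"}): 0.02,
    frozenset({"transitivity", "word_order"}): 0.13,
    frozenset({"transitivity", "animacy"}): 0.11,
    frozenset({"word_order", "animacy"}): 0.06,
    frozenset(PREDICTORS): 0.14,
}

weights = general_dominance(R2, PREDICTORS)
full = R2[frozenset(PREDICTORS)]
for name, w in weights.items():
    print(f"{name}: {w:.4f} ({w / full:.0%} of the full-model R2)")
```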
A similar measure, referred to as ‘factor weights’, is used in Varbul/Rbrul and
has been used to determine the influence of one language on another in situations
of language contact. For example, Meyerhoff (2009) uses factor weights to gauge
substrate influence of Tamambo on the variable expression of subject pronom-
inals in the variety of Bislama spoken on Malo Island, specifically whether null
subjects in Bislama are more likely if the referent is human and topical, as is the
case in Tamambo. Meyerhoff finds that, although the forms of subject pronouns
are different in Tamambo and Bislama, the relative effect of humanness and
topicality in Tamambo pronoun usage has been transferred in the development
of Bislama. In a similar vein, dominance analysis can be used to compare the
relative importance of variables between two languages in contact or across
generations, which is the case study presented in this chapter.
In the following sections, we describe optional subject marking in Gurindji Kriol
as an instance of overabundance, and argue that this within-cell variation represents
an instance of increased complexity in a situation of language contact.
³ Aboriginal communities in Australia are similar to many Indian reservations in the United States,
in that the majority of residents are Indigenous.
Figure 4.1. Traditional languages and Aboriginal communities of the Victoria River District
Source: Meakins & Nordlinger (2014: xxxiii)
markers, such as -ngku ‘nominative (ex-ergative)’ in (2) and (3), largely derived
from Gurindji; and the verb phrase structure including the TAM auxiliaries, such
as bin ‘past’ in (2) and (3), mostly reflecting that of Kriol.⁴
Of relevance for this chapter is the alternation in the nominative cell of the
Gurindji Kriol case paradigm between zero and -ngku (and its consonant-final
allomorph -tu). This alternation is an example of what Thornton terms ‘over-
abundance’, that is, zero and -ngku/-tu can be analysed as cell-mates in the case
paradigm because the use and non-use of subject marking does not affect the
grammatical function of the stem. For example, in (2) and (3), the subject is
unmarked in the first clause and marked in the second clause, but both are
unambiguously subjects.
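Overabundance can be pictured as a paradigm cell that holds more than one form. The minimal sketch below models the Gurindji Kriol nominative cell this way. The stem-final test (a vowel letter selects -ngku, anything else -tu) is a simplifying orthographic assumption. The stems are hypothetical placeholders, not attested Gurindji Kriol words. Which cell-mate a speaker actually uses is decided not by the stem but by the clausal and discourse factors examined in this chapter.

```python
# Assumption: a stem counts as vowel-final if its last letter is a vowel letter.
VOWELS = set("aeiou")


def nominative_cell(stem):
    """Cell-mates in the nominative cell of a nominal: the unmarked stem and
    the overtly marked form (-ngku after a vowel, -tu after a consonant)."""
    suffix = "ngku" if stem[-1] in VOWELS else "tu"
    return [stem, f"{stem}-{suffix}"]


# Hypothetical stems, used only to show the two allomorphs.
print(nominative_cell("pala"))   # ['pala', 'pala-ngku']
print(nominative_cell("palan"))  # ['palan', 'palan-tu']
```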
Overabundance in the expression of subject marking in Gurindji Kriol is the result
of language contact, and involves the complexification of an inflectional paradigm
rather than simplification. The combined story presented in Meakins (2009, 2015)
and Meakins & O’Shannessy (2010) argues that the subject marker originated in the
Gurindji ergative marker which was grammatically obligatory. During the forma-
tion of the mixed language, the ergative marker came into contact with Kriol,
which has a nominative system with argument differentiation performed by
word order and some pronoun forms rather than case marking. This contact had
two main effects on the argument marking system of the emergent mixed
language: (i) the Gurindji ergative marker became optional in Gurindji Kriol,
and acquired additional discourse-marking properties, specifically highlighting
the agentivity of subjects; and (ii) a change in case alignment occurred where the
ergative marker was extended to intransitive subjects, thereby being reanalysed
as a nominative marker via a stage of optional ergativity.
⁴ The structure of Gurindji Kriol is described in detail elsewhere (Meakins 2011, 2013).
Table 4.2. Comparison of case systems and allomorphy across three generations
(syncretisms within generations bolded)
gurindji gurindji kriol gurindji kriol
generation 1 generation 2
ergative1 dative2 nominative3 dative4 relative
v-final -ngku, -lu, -ku -wu -ngku -wu, -yu -ngku
Notes: ¹(Meakins et al. 2013: 20–1); ²(Meakins et al. 2013: 22); ³(Meakins, 2011: 26); ⁴(Meakins, 2011: 23).
1,917 transitive subjects produced by adult speakers of Gurindji Kriol. The use of
the subject marker was tested against ten sociolinguistic, grammatical, and seman-
tic variables. It was found that the ergative marker was significantly more likely to
appear if the agent was inanimate, positioned post-verbally and in conjunction
with a co-referential pronoun. In addition, the use of the ergative marker signifi-
cantly decreased when the verb was marked with continuous aspect and the event
denoted by the verb had not come to completion. These results were interpreted as
an indication that, although subject marking still had some argument disambigu-
ation functions, it had acquired discourse properties; specifically, its presence
highlighted the agentivity of a subject nominal. Similar analyses exist for other
languages with optional ergative marking such as Gooniyandi and Kuuk
Thaayorre (see McGregor 2010 for an overview). This analysis was later extended
to intransitive subjects in Meakins (2015). Whether or not this variability has
stabilized in the second generation of Gurindji Kriol speakers or has undergone
further shift is discussed in the following section.
This section builds on previous probabilistic observations about the variable use of
the subject (ex-ergative) marker in Gurindji Kriol by remodelling the original
dataset and augmenting it with data from intransitive clauses (cf. Meakins 2015).
It then uses data from the second generation of Gurindji Kriol speakers to quantify
changes in the use of subject marking. Any difference between adult and child
speakers of Gurindji Kriol is assumed to reflect language change in progress (cf. the
apparent-time hypothesis; Labov 1963).
Overabundance is presented as a new dimension of morphological complexity,
particularly in situations of language contact and change. Two aspects of over-
abundance are modelled using different quantitative methods: (i) the contribution
of different factors to the use of subject marking across time is shown through
mixed models (Marschner 2011); and (ii) the relative contribution of the different
factors is demonstrated using dominance analysis (Azen & Traxel 2009). We
begin by examining the use of subject marking in the adult data and then compare
it with the children’s data.
4.4.1 Data
The data for this study are 3,575 instances of transitive and intransitive subjects
from fifty adult Gurindji Kriol speakers (18–35-year-olds) and 2,975 instances of
transitive and intransitive subjects from fifty-three child Gurindji Kriol speakers
(8–14-year-olds). The speakers represent around 20% of the Gurindji population
4.4.2 Procedure
Two GLMMs with a logistic link function (glmr; glm2 package in R)⁵ (Marschner
2011) were applied to the adult and child data in turn to see what affected the use
of subject marking in Gurindji Kriol across time. Separate dominance analyses
were then run on the different models to test the relative effects of the variables on
the use of subject marking (Azen & Traxel 2009).⁶, ⁷
⁵ http://cran.r-project.org/web/packages/glm2/glm2.pdf
⁶ Note that the adult and child data were not run in a single model. Separate models were required
for the dominance analysis to tease out the relative effects of the different independent variables
(predictors) in the adult data set versus the child data set.
⁷ No R package yet exists for dominance analysis, although preliminary functions have been
developed by Claudio Bustos and are available at https://github.com/clbustos/dominanceAnalysis. In
this chapter, we performed the calculations manually through a series of R² calculations, as discussed in
section 4.4.3.
All tokens were coded for a dependent variable (use of subject marking), for
example present in (5) and (7), and absent in (6). Each token was also coded for
six independent variables: (i) clause transitivity, for example transitive in (5) and
(6), and intransitive in (7); (ii) the relative order of the subject and verb, for
example VS in (5), and SV in (6) and (7); (iii) the animacy of the subject,
for example inanimate in (5), and animate in (6) and (7); (iv) the presence vs.
absence of a co-referential pronoun in the clause, for example present in (5) and
(7), and absent in (6); (v) the actualization of the event, for example underway in (6),
not yet happened in (7), and completed in (5); and (vi) the marking vs. non-
marking of a previous subject.
Each variable was categorical and coded as binary, that is, Y/N, animate/inanimate,
or SV/VS. Speaker was coded as a random effect (Figure 4.2).
• Fixed effects:
  – Dependent:
    • Presence of subject marking: Y, N
  – Independent:
    • Transitive: Y, N
    • Relative order of subject and verb: SV, VS
    • Animacy of subject: animate, inanimate
    • Presence of co-referential pronoun: Y, N
    • Actualization of event: Y, N
    • Priming of subject marking: Y, N
• Random effect:
  – Speaker
Figure 4.2. Fixed and random effects used to measure the use vs. non-use of subject
marking in Gurindji Kriol
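The coding scheme in Figure 4.2 amounts to one record per subject token. The Python sketch below is illustrative only: the class and field names are hypothetical labels for the variables above, and the four toy tokens are invented, not drawn from the Gurindji Kriol corpus. It shows how a raw marking rate for each level of a binary predictor, the kind of descriptive figure reported in Tables 4.3 and 4.6 and in section 4.4.4, could be tallied before any model is fitted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SubjectToken:
    """One coded subject token; field names are hypothetical labels for Figure 4.2."""
    speaker: str          # random effect
    marked: bool          # dependent variable: is a subject marker present?
    transitive: bool
    vs_order: bool        # True if the subject follows the verb
    animate: bool
    coref_pronoun: bool   # co-referential pronoun present in the clause
    actualized: bool
    primed: bool          # was the previous nominal subject marked?


def marking_rate(tokens, predictor):
    """Share of marked subjects at each level (True/False) of a binary predictor."""
    rates = {}
    for level in (True, False):
        subset = [t for t in tokens if getattr(t, predictor) == level]
        # None signals that no tokens have this level, instead of dividing by zero.
        rates[level] = sum(t.marked for t in subset) / len(subset) if subset else None
    return rates


# Four hypothetical toy tokens (not corpus data).
tokens = [
    SubjectToken("A", marked=True, transitive=True, vs_order=False, animate=True,
                 coref_pronoun=True, actualized=True, primed=True),
    SubjectToken("A", marked=False, transitive=False, vs_order=False, animate=True,
                 coref_pronoun=False, actualized=True, primed=False),
    SubjectToken("B", marked=True, transitive=True, vs_order=True, animate=False,
                 coref_pronoun=False, actualized=False, primed=True),
    SubjectToken("B", marked=False, transitive=True, vs_order=False, animate=True,
                 coref_pronoun=False, actualized=True, primed=False),
]
print(marking_rate(tokens, "transitive"))  # transitive: 2 of 3 marked; intransitive: 0 of 1
```

Raw rates of this kind ignore the speaker grouping and the other predictors, which is why the chapter relies on mixed models for significance testing.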
4.4.3 Results
4.4.3.1 Adults
The occurrence of subject marking for fifty adult speakers of Gurindji Kriol
in 3,575 clauses according to the predictors is given in Table 4.3.
The output of the GLMM for the adult data is given in Table 4.4. The significant
results are bolded. The analysis shows that subject marking occurs 39%
(n=1379) of the time,⁸ but is significantly more likely if the clause is transitive
(p<0.001), the subject occurs after the verb (p<0.001), the previous nominal is
also marked (p<0.001), a co-referential pronoun is present (p<0.001), and the
event is actualized (p<0.001). The model explains a good amount of variation in
the data set (R²c=0.41).⁹ Each of these significant variables will be discussed in
section 4.4.4. Note that animacy was not significant, which differs from the
earlier studies by Meakins (2009) and Meakins & O’Shannessy (2010).
The GLMM provides information about which variables significantly
predict the use of subject marking. Dominance analysis is required to determine
the relative effect of the significant predictors, that is, which variables contribute
the most to the use of ergative marking. Dominance analysis for logistic regression
was developed by Azen & Traxel (2009) and measures the relative contribution of
each predictor to the R²m value (=0.36 in this model), that is, the marginal R² value,
which is the variance explained when the random effect is not included in the
numerator. R²m is used rather than R²c because we are only concerned with the
relative contribution of fixed effects to the model, not the additional contribution
of random effects.
Table 4.3. Occurrence of subject marking in adult Gurindji Kriol speakers according to predictors
Transitive SV Order Animate Priming Actualized Corefer TOTAL
Table 4.4. Output of generalized linear mixed model analysis on 3,575 tokens
Random effects Name Variance Std. Dev.
⁸ Note that this figure is lower than the 66.5% reported in Meakins (2009; based on 1,917 overt
transitive subjects across different genres) and the 64% reported in Meakins & O’Shannessy (2010; based
on 612 overt transitive subjects in narratives). This difference arises partly because intransitive
clauses have been included. But even the transitive clauses alone show a lower use of ergative
marking (57%) than the earlier studies. This is a result of the data extraction procedures. In the
first two studies, transitive subjects, including unmarked subjects, were extracted manually from an
unannotated corpus, whereas in the present study, transitive subjects were extracted using a Python
script across the same corpus, which is now coded for the variables under study. It is highly likely
that many unmarked transitive subjects were simply missed in the first studies due to the manual
nature of data extraction. This would have artificially inflated the frequency of subject marker use.
⁹ Recall that conditional R² (unlike marginal R²) measures the variance explained by both fixed and
random effects and therefore takes account of all factors which are contributing to variation in the data
set (2009). R²c was calculated using the MuMIn package in R (Bartoń 2015).
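The distinction between R²m and R²c drawn above lies only in the numerator. For a logistic GLMM, a common way of computing both is on the latent scale, where the residual variance implied by the logit link is fixed at π²/3; this is, as far as we understand, the approach behind the 'theoretical' estimates returned by MuMIn's r.squaredGLMM for binomial models. The Python sketch below assumes a random-intercept model and takes hypothetical variance components as input; the function name is our own, and it is not a reconstruction of the chapter's actual calculation.

```python
import math

# Latent-scale residual variance of the logistic distribution (logit link): pi^2 / 3.
LOGIT_RESIDUAL_VAR = math.pi ** 2 / 3


def r2_logit_glmm(var_fixed, var_random):
    """Return (R2m, R2c) for a logistic GLMM with random intercepts.

    var_fixed  -- variance of the fixed-effects part of the linear predictor
    var_random -- summed variance of the random intercepts (e.g. Speaker)
    """
    denominator = var_fixed + var_random + LOGIT_RESIDUAL_VAR
    r2_marginal = var_fixed / denominator                   # fixed effects only
    r2_conditional = (var_fixed + var_random) / denominator  # fixed + random effects
    return r2_marginal, r2_conditional


# Hypothetical variance components, chosen only for illustration.
r2m, r2c = r2_logit_glmm(var_fixed=1.5, var_random=0.4)
print(f"R2m = {r2m:.3f}, R2c = {r2c:.3f}")
```

Because the random-effect variance enters only the numerator of R²c, R²c can never be smaller than R²m, and the two coincide when the speaker variance is zero.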
The second column of Table 4.5 shows the R²m values, which were individually
calculated using the Multi-Model Inference (MuMIn) package in R (Bartoń 2015). In
the second, third, fourth, and fifth groups, the additive effect of each variable on the
R²m was calculated, that is, the figures in column 2 are the increasing R²m values of
the model as more variables are added to it (X₁ → X₁X₂ → X₁X₂X₃ . . . ).
The remaining columns show the additional contribution of each predictor to the
R²m value of the model. For example, SV order (X₂) contributes .037 when it is added
to a model whose only predictor is Priming (X₃), that is, R²m(X₂X₃) − R²m(X₃). The k
rows show the average predictive value of the variables. For example, the average
predictive value of SV order over the models X₁, X₃, X₄, and X₅ is
(.201 + .037 + .100 + .000)/4 ≈ .085, that is, the sum of its additional contributions to
these models divided by their number, 4 ((X₁ + X₃ + X₄ + X₅)/4). These values are
discussed in more detail in Azen & Traxel (2009: 327–9). The results of the dominance
analysis show the relative predictive power of the variables: Transitive (.166) >
Priming (.085) > Co-referential pronoun (.046) > SV order (.034) > Potential
(.025). That is, clause transitivity is the most powerful predictor of subject marking.
Whether a subject marker was used on the full subject of the previous clause is the
next most powerful predictor of subject marking in the current clause, and so on.
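The chapter computed these averages by hand from a series of R² calculations (footnote 7). The Python sketch below automates only the averaging step, under the assumption that the R²m of every subset model has already been obtained (for example with MuMIn). The function name and the three-predictor R² values are hypothetical and do not come from Table 4.5 or Table 4.8. For each predictor it averages the additional contribution over all models of the same size k (conditional dominance), then averages those values over k (general dominance).

```python
from itertools import combinations
from statistics import mean


def general_dominance(predictors, r2):
    """General dominance weight of each predictor.

    predictors -- list of predictor names
    r2         -- dict mapping frozenset(subset of predictors) -> R2 of that model;
                  must include frozenset() for the null model (R2 = 0)
    """
    weights = {}
    for x in predictors:
        others = [p for p in predictors if p != x]
        by_size = []
        for k in range(len(others) + 1):
            # Conditional dominance: mean gain from adding x to every model
            # that contains exactly k of the other predictors.
            gains = [r2[frozenset(s) | {x}] - r2[frozenset(s)]
                     for s in combinations(others, k)]
            by_size.append(mean(gains))
        # General dominance: average of the conditional values over k.
        weights[x] = mean(by_size)
    return weights


# Hypothetical R2 values for three predictors A, B, C (illustration only).
toy_r2 = {
    frozenset(): 0.0,
    frozenset("A"): 0.10, frozenset("B"): 0.04, frozenset("C"): 0.02,
    frozenset("AB"): 0.12, frozenset("AC"): 0.11, frozenset("BC"): 0.05,
    frozenset("ABC"): 0.13,
}
weights = general_dominance(["A", "B", "C"], toy_r2)
print(sorted(weights, key=weights.get, reverse=True))  # ['A', 'B', 'C']
```

A useful sanity check is that general dominance weights sum to the R² of the full model. In the adult data, .166 + .085 + .046 + .034 + .025 = .356, which matches the reported R²m of 0.36.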
4.4.3.2 Children
The occurrence of subject marking for fifty-three child speakers of Gurindji Kriol
in 2,975 clauses according to the predictors is given in Table 4.6. The
output of the GLMM for the child data is given in Table 4.7. The significant results
are bolded. The analysis shows that subject marking occurs 43% (n=1283) of
the time (which is higher than in the adults), and is more likely if the clause is
transitive (p<0.001), the previous nominal is also marked (p<0.001), and a
co-referential pronoun is present (p<0.001). The model explains a reasonable amount
of variation in the data set (R²c=0.28). As in the adult data, animacy was not
significant; unlike in the adult data, word order and event actualization were not
significant either. These differences from the adult data will be discussed in
section 4.4.4.
A dominance analysis was performed and included the non-significant vari-
ables of actualized and SV word order simply to draw parallels with the adult data.
The results, shown in Table 4.8, show the relative predictive power of the vari-
ables: Transitive (.049) > Priming (.041) > Co-referential pronoun (.013) >
Actualized (.008—non-significant) > SV order (.008—non-significant). Note
that the relative order of the significant variables is the same as for adults, that
is, Transitive > Priming > Co-referential pronoun.
Table 4.5. Relative effect of the significant predictors according to dominance analysis
Table 4.7. Output of generalized linear mixed model analysis on 2,975 tokens
Random effects Name Variance Std. Dev.
4.4.4 Discussion
The overall question posed by this chapter is whether a change in the complexity
in the expression of subject marking has occurred across two generations of
Gurindji Kriol speakers. This question is set against the backdrop of broader
theoretical questions about how to measure complexity in cases of overabundance,
and whether all language contact leads to simplification. The combination of these
broader questions allows us to determine whether changes have taken place in
subject marking in Gurindji Kriol, and why these changes might have occurred.
The question of whether there has been a change in complexity of subject
marking was modelled using GLMM analysis. The results show three predictors
in common for adults and children. First, transitive subjects such as (8) are
significantly more likely to be marked than intransitive subjects such as (9).
Second, marking on a nominal subject primes the appearance of the nominative
marker on the next nominal subject. An example is given in (10) of sequential
clauses containing nominal subjects with overt nominative marking. Third,
subject marking is more likely when a co-referential pronoun is present, as shown
in (11) in comparison with (12), which does not have a co-referential pronoun.
Table 4.8. Relative effect of the significant predictors according to dominance analysis
                     Additional contribution of fixed effects
Subset model  R²m    Transitive (X1)  SV order (X2)  Coreferential (X3)  Priming (X4)  Actualized (X5)
k = 0 average .047 .000 .006 .048 .000
X1 .047 - .047 .041 .056 .047
X2 .000 .047 - .001 .001 .000
X3 .006 .041 .007 - .008 .006
X4 .048 .056 .046 .050 - .049
X5 .000 .047 .000 .000 .001 -
k = 1 average .048 .025 .023 .017 .026
X1X2 .047 - - .006 .057 .000
X1X3 .047 - .006 - .064 .005
X1X4 .104 - .000 .007 - .001
X1X5 .047 - .000 .005 .058 -
X2X3 .007 .046 - - .050 .051
X2X4 .049 .055 - .008 - .000
X2X5 .000 .047 - .058 .049 -
X3X4 .056 .055 .002 - - .000
X3X5 .006 .046 .052 - .050 -
X4X5 .049 .056 .000 .007 - -
k = 2 average .051 .010 .015 .055 .010
X1X2X3 .053 - - - .059 .017
X1X2X4 .104 - - .008 - .001
X1X2X5 .047 - - .023 .058 -
X1X3X4 .111 - .001 - - .001
X1X3X5 .052 - .018 - .060 -
X1X4X5 .105 - .000 .007 - -
X2X3X4 .057 .055 - - - .001
X2X3X5 .058 .012 - - .000 -
X2X4X5 .049 .056 - .009 - -
X3X4X5 .056 .056 .002 - - -
k = 3 average .045 .005 .012 .044 .005
X1X2X3X4 .112 - - - - .001
X1X2X3X5 .070 - - - .043 -
X1X2X4X5 .105 - - .008 - -
X1X3X4X5 .112 - .001 - - -
X2X3X4X5 .058 .055 - - - -
k = 4 average .055 .001 .008 .043 .001
X1X2X3X4X5 .113 - - - - -
Overall average .049 .008 .013 .041 .008
Adults had two significant predictors of subject marking that children lacked. The
first is word order: Gurindji Kriol-speaking adults are more likely to mark subjects
when they occur after the verb, as shown in (13) as opposed to (14). The second is
event actualization: subjects of events that were not actualized were less likely to
be marked, as demonstrated in (15), which has a verb marked continuative, and (16),
which uses the potential auxiliary.
In terms of E-complexity, both the adult system and the child system display
overabundance, while traditional Gurindji uses subject marking obligatorily.
Nonetheless, the adult Gurindji Kriol system requires attention to a greater
number of variables to make decisions about the application of subject marking.
Thus the subject marking system seems to have complexified, in the sense of
I-complexity, in the contact situation that gave rise to the mixed language
(represented in the adult speech), and then simplified in the next generation. The
child system seems to be a refined version of the adult system. Of the three
variables in common, the relative predictive power of variables is the same:
transitivity > priming > use of co-referential pronoun. For two of those
predictors—priming and co-referential pronoun—subject-marking usage seems
stable across the generations. For adults, 60% of primed subjects are marked
compared with 28% of unprimed subjects; for children, the figures are 64% and
31%. Similarly, adults mark 48% of subjects with co-referential pronouns compared
with 28% of subjects without co-referential pronouns; for children, the figures are
49% and 28%. Thus the influence of priming and the use of co-referential pronouns
seem quite stable diachronically. On the other hand, transitivity, which is the
strongest predictor of subject marking for both adults and children, shows larger
differences across the generations: adults mark 59% of transitive subjects compared
with 16% of intransitive subjects, whereas children mark 56% of transitive subjects
compared with 35% of intransitive subjects.
We argue that differences in the importance of transitivity, coupled with the
loss of SV order as a predictor of subject marking in the children’s speech, result
from decreasing contact with Gurindji. First, subject marking in Gurindji Kriol
has its origins in the Gurindji ergative marker, which marked only transitive
subjects. Many members of the first generation of Gurindji Kriol speakers used
subject marking only for transitive subjects, although it was clearly beginning to
spread to intransitive subjects. For child speakers of Gurindji Kriol, this spread is
much more entrenched, suggesting that the original influence of the Gurindji
ergative pattern is waning. Second, the loss of SV order as a significant variable
reinforces the argument that contact with the Gurindji system is decreasing. In
general, SV order is more dominant for child speakers (only 5% of transitive
clauses show VS order, compared with 12% for adult speakers), reflecting the
Kriol system of argument disambiguation. For adult speakers, ergative marking is
more likely in VS clauses, which reflects the continuing interplay of the Gurindji
and Kriol systems of argument disambiguation. This influence has been lost in
child speakers.
This study has shown that complexification occurred in the area of subject
marking in Gurindji Kriol in the intense contact period which saw its genesis.
Subject marking was borrowed from Gurindji, and in the process it changed from
obligatory to variable marking, leading to a situation of overabundance, that is, a
proliferation of cell-mates (E-complexity). Overabundance required speakers to
monitor other linguistic features in the clause and in the discourse more broadly
(transitivity, SV order, the marking of the previous nominal subject, the presence
of a co-referential pronoun, and event actualization), rather than just the
phonological composition of the stem, as is the case in Gurindji (I-complexity).
One generation later, only three of these variables remain relevant: transitivity,
the presence of a co-referential pronoun, and priming. We argue that changes in the
relative importance of transitivity and SV order in the children’s speech, and
therefore simplification in the exponence of overabundance, is the result of
decreasing contact with Gurindji.
This chapter demonstrates that language contact does not always lead to the
simplification of morphology: in the case of overabundance, complexity, that is,
the degree of variation in the expression of a form within the cell of a paradigm,
can be a result of language contact. In the situation outlined by this
chapter, the intense contact between Gurindji and Kriol argument marking
systems which led to the formation of Gurindji Kriol also saw the development
of a system of subject marking which was derived from Gurindji but was more
complex than the obligatory marking system of Gurindji. The new generation of
Gurindji Kriol speakers has less access to Gurindji, that is, there are fewer speakers
of Gurindji in their linguistic environment and they have had fewer years of
exposure to Gurindji than the adult speakers. The result has been a simplifica-
tion of overabundance where the system is no longer an interplay between the
Gurindji and Kriol systems of argument disambiguation (i.e., SV order no
longer predicts subject marking), and there is an increase in the marking of
intransitive subjects, which is far removed from the function of the original
Gurindji ergative marker.
Acknowledgements
The data collection (see section 4.4.1) was funded by the Aboriginal Child Language
(ACLA) project from 2004 to 2007, the Jaminjungan and Eastern Ngumpin DoBeS
project from 2007 to 2008 (available in the DoBeS archives: http://dobes.mpi.nl/projects/jaminjung/),
a Hans Rausing Endangered Languages Project from 2008 to 2010 (IPF0134; available in the
ELAP archive: http://elar.soas.ac.uk/deposit/0273), an
Australian Research Council APD project from 2009 to 2012 (DP0985024); and an
5
Derivation and the morphological complexity of three French-based creoles
Fabiola Henri, Gregory Stump, and Delphine Tribout
5.1 Introduction
The claim of creole simplicity is pervasive in linguistics. This claim harks back to
the nineteenth-century view that linguistic complexity correlates with the prop-
erties of a language’s inflectional morphology and with its age (DeGraff 2001).
According to this view, isolating languages are ‘primitive’ in comparison with
synthetic languages, whose morphology is taken as evidence of heightened com-
plexity. Modern creolistic literature abounds with such assumptions. Creoles are
seen as newborn languages that emerge from rudimentary pidgins embodying a
break in the transmission of the lexifier. As such, they constitute a kind of
transition between primitive pidgin ‘protolanguages’ and mature languages
(Bickerton 1981). Complementing this view of creoles as ‘young’ languages are
comparisons with ‘complex’ languages that purportedly reveal creoles to be ‘the
world’s simplest grammars’ on the grounds that they exhibit no vestiges, or at most
insignificant vestiges, of the lexifier’s system of inflectional marking (Seuren & Wekker
1986; Bickerton 1988; McWhorter 2001; Parkvall 2008; Bakker 2014; among others).
As has been argued elsewhere (DeGraff 2001; Mufwene 2008; Blasi et al. 2017),
these assertions rest upon several controversial assumptions that may be ques-
tioned on empirical, theoretical, and sociohistorical grounds. In the domain of
morphology, for example, the received view that creoles are maximally isolating
has been decisively disconfirmed by unequivocal evidence of inflectional morph-
ology in many creoles (Kihm 1994; DeGraff 2001; Bakker 2003; Baptista 2003a,
2003b; Roberts & Bresnan 2008; among others). It is true that a creole may exhibit
less morphology than its lexifier,¹ but does this entail that it is less complex?
¹ Studies relating to the morphological complexity of creoles usually rely on comparisons with the
lexifiers rather than with the contributing substrates. A combination of factors has given rise to this
preference. First, the formation of a creole usually involves one contributing lexifier, but may involve
several substrates whose contributions to the creole’s formation are hard to evaluate in terms of
proportion. In the absence of adequate historical documentation, we cannot always attribute particular
contributions to particular substrate languages. Even so, we can definitely affirm that the substrates of
Caribbean creoles differ from those of Indian Ocean creoles. Moreover, creolistics has a history of
Eurocentrism, which has favoured the comparison of creole grammars with the more familiar
grammars of their Indo-European lexifiers.
Fabiola Henri, Gregory Stump, and Delphine Tribout, Derivation and the morphological complexity of three French-based
creoles. In: The Complexities of Morphology. Edited by: Peter Arkadiev and Francesco Gardani, Oxford University Press (2020).
© Fabiola Henri, Gregory Stump, and Delphine Tribout.
DOI: 10.1093/oso/9780198861287.003.0005
2013; Stump & Finkel 2013). But a language’s morphology may exhibit other
kinds of complexity as well.²
Here, we are concerned with the integrative complexity of a language’s morph-
ology as reflected by the interaction of a lexeme’s inventory of forms with its
participation in deverbal derivation. In general, a language’s derivational
morphology may exhibit complexity in two different dimensions. In order to tell
these apart, it is useful to distinguish not only between a derivational relation’s base
lexeme and derived lexeme, but also between the relation’s base stem and derived
stem, that is, the specific stems of the base and derived lexemes whose morphology
participates in the formal expression of their derivational relation. Thus, the
derivational relation of the base lexeme thief to the derived lexeme thievish is
formally expressed by means of the relation of the base stem thiev- to the derived
stem thievish. Given
these distinctions, the first dimension of a derivational relation’s complexity is that
of the predictability of the base lexeme’s base stem; the second dimension is that of
a base stem’s restrictedness in the morphology of the base lexeme. Consider first
the dimension of base-stem predictability.
In discussing this dimension, we make the uncontroversial assumption
(Aronoff 1994; Stump 2001) that a lexeme L has a stem set whose members
serve in the definition of both (i) the inflected word forms constituting L’s
inflectional paradigm; and (ii) the stem sets of lexemes derived from L. In general,
we assume that a lexeme’s stem set may include both free and bound stems. On
this assumption, the complexity of a particular derivational relation depends on
which member of the base lexeme’s stem set is its base stem in that relation. In the
simplest cases—those whose complexity is of degree 0—the base stem for a base
lexeme L in a particular derivational relation is the only member of L’s stem set.
From this endpoint of maximal simplicity, successively greater degrees of com-
plexity can be calibrated. In cases of derivation exhibiting complexity of degree 1
or 2, the base lexeme in a particular derivational relation possesses more than one
stem, only one of which serves as its base stem in that relation. In cases exhibiting
complexity of degree 0 or 1, the base lexeme’s base stem is predictable; in cases
exhibiting complexity of degree 2, the base lexeme’s base stem is unpredictable.
Thus, instances of derivation may evince three degrees of increasing complexity,
as in Figure 5.1.
Figure 5.1. Degrees of complexity in the predictability of a base lexeme’s base stem
in a particular derivational relation R (recoverable example for degree 1: goose (~ geese) → goosish)
² See Stump (2017) for a discussion of the wide range of possible measures of morphological
complexity.
This first notion of complexity calls to mind those approaches to complexity
based on information theory (Arkadiev & Gardani, Chapter 1, this volume); in
such approaches, complexity arises from a lack of predictability among a system’s
parts. In assessing complexity of this sort in a system of inflection classes, the
parts at issue are an inflectional paradigm’s cells (cf. Parker & Sims, Chapter 2,
this volume); here, by contrast, the parts at issue are those members of a base
lexeme’s stem inventory available for the definition of a derived lexeme’s stem
inventory. By this criterion, the derivational relation between boy and boyish is
least complex, since the stem on which boy-ish is based is the only available choice,
the sole stem of boy; the derivational relation between man and mannish (or
between goose and goosish) is more complex, since the stem on which mann-ish
(or goos-ish) is based is not the only available choice, though it does conform to a
general pattern favouring the use of the singular form’s stem; and the relation
between thief and thievish is most complex, since the stem on which thiev-ish is
based is not the only available choice and actually fails to conform to the general
pattern favouring the use of the singular form’s stem.
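The three degrees of base-stem predictability just illustrated can be restated as a small decision procedure. The Python sketch below is our own simplification, not part of the framework's formal apparatus: stems are represented as orthographic strings, and the 'general pattern' is reduced to a single hypothetical default_stem argument (for the English examples, the singular form's stem).

```python
def predictability_degree(stem_set, base_stem, default_stem):
    """Degree of complexity (0, 1, or 2) in the predictability of a base stem.

    stem_set     -- all stems of the base lexeme
    base_stem    -- the stem actually used in the derivational relation
    default_stem -- the stem a general pattern would predict (hypothetical input)
    """
    if base_stem not in stem_set:
        raise ValueError("the base stem must belong to the lexeme's stem set")
    if len(stem_set) == 1:
        return 0  # the sole stem is the only available choice (boy -> boyish)
    if base_stem == default_stem:
        return 1  # several stems, but the general pattern picks the right one (man -> mannish)
    return 2      # several stems, and the base stem defies the general pattern (thief -> thievish)


print(predictability_degree({"boy"}, "boy", default_stem="boy"))                  # 0
print(predictability_degree({"man", "men"}, "man", default_stem="man"))           # 1
print(predictability_degree({"thief", "thiev"}, "thiev", default_stem="thief"))   # 2
```

Representing the general pattern as one default stem is adequate for the English examples; a language with several competing patterns would need a richer notion of what counts as predictable.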
The second dimension of a derivational relation’s integrative complexity is that
of base-stem restrictedness. Where X is the particular member of a lexeme L’s
stem set that serves as L’s base stem in a particular derivational relation, how
restricted a role does X play in the morphology of L? In the simplest cases (e.g.,
that of English grass → grassy), X is L’s only stem and therefore has an
unrestricted role in the morphology of L. In more complex cases (e.g., that of English
leaf [~ leave(s)] → leafy), a base lexeme L’s base stem in a particular derivational
relation is only used in the realization of certain cells in L’s inflectional paradigm,
so that its role in L’s inflectional morphology is restricted according to the
morphosyntactic property set to be realized. In the most complex cases (e.g.,
that of English louse /laʊs/ → lousy /laʊzi/), a base lexeme L’s base stem is ‘hidden’
to the extent that it has no role at all in the inflection of L but is reserved for
defining the stems of some or all lexemes deriving from L. This second dimension
of complexity is schematized in Figure 5.2, where we again distinguish three
degrees of complexity.
This second notion of complexity is qualitative in the sense that it equates
complexity with deviation from a canonical ideal (cf. Nichols, Chapter 7, this
volume)—specifically, it equates complexity with deviation from a canonical
pattern in which the stem that defines a derived lexeme’s form also defines the
base lexeme’s inflected forms. By this criterion, the derivational relation between
grass and grassy is least complex, since the stem on which grass-y is based is
employed in both inflected forms of grass; the derivational relation between leaf
and leafy is more complex, since the stem on which leaf-y is based is only
employed in one of the inflected forms of leaf; and the relation between louse
and lousy is most complex, since the stem on which lous-y is based isn’t employed
in either of the inflected forms of louse.
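The restrictedness scale can likewise be sketched as a check on how many inflectional cells the base stem realizes. In the Python sketch below, which is an illustration under simplifying assumptions rather than the authors' formalism, a paradigm is a hypothetical mapping from cell labels ("sg", "pl") to rough stem strings. The plural stem "leav" and the form /laɪs/ for lice are our own transcriptions, and degree 0 follows the canonical pattern described above, in which the base stem is used in every inflected form.

```python
def restrictedness_degree(paradigm, base_stem):
    """Degree of base-stem restrictedness (0, 1, or 2).

    paradigm  -- mapping from each inflectional cell to the stem realizing it
    base_stem -- the stem that serves as base in the derivational relation
    """
    cells = [cell for cell, stem in paradigm.items() if stem == base_stem]
    if len(cells) == len(paradigm):
        return 0  # used throughout the inflection of the lexeme (grass -> grassy)
    if cells:
        return 1  # used only in some cells (leaf -> leafy)
    return 2      # 'hidden': used in no cell at all (louse -> lousy)


print(restrictedness_degree({"sg": "grass", "pl": "grass"}, "grass"))  # 0
print(restrictedness_degree({"sg": "leaf", "pl": "leav"}, "leaf"))     # 1
print(restrictedness_degree({"sg": "laʊs", "pl": "laɪs"}, "laʊz"))     # 2
```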
According to Seuren (1998: 292–3), ‘if a language has a Creole origin it is SVO, has
TMA particles, [and] has virtually no morphology’. Claims of this kind reflect an
ideology about creoles that finds its origin in the eighteenth century, when creoles
were described as ‘corrupt’ and ‘deficient’ compared to exemplary grammars such
as that of Latin. These deficiencies were presumed to result from the inability of
Africans to acquire the grammatical intricacies of European languages (Bertrand-
Bocandé 1849; Baissac 1880; see also Meijer & Muysken 1977 for discussion).
With the advent of generative grammar, Bickerton (1981) formulated the
Language Bioprogram Hypothesis, a theory that sees the process of creolization
as the complexification of a pidgin that creole children are exposed to. A pidgin,
according to Bickerton, is an unstable form of communication that results from a
simplification of the lexifier language by adults during the process of second-
language acquisition. The contact languages emerging from this sort of process
come closest to revealing Universal Grammar in its naked form, embodying ‘the
world’s simplest grammars’ (McWhorter 2001).³
³ Although McWhorter’s (2001) claim is about creoles, both pidgins and creoles are generally
characterized as simple languages (Romaine 1988). Bickerton’s (1988) hypothesis, however, ranks
pidgins as the simpler of the two, since pidgins are not systematic. On his view, it is as an effect of
UG that a pidgin is creolized. Research has cast doubt on this generalization. Rich inflection can be
found in pidgins, even more so than in some creoles (Bakker 2003). If a creole develops through the
nativization of a pidgin, as the Language Bioprogram Hypothesis holds, we would expect the creole to
be more complex than the pidgin from which it develops.
French-based creoles spoken in the Indian Ocean all exhibit contextual inflection
(see section 5.6.1.1 on Mauritian). As for the question of allomorphy, the
approach we adopt in the next sections assumes that languages do not merely
eliminate allomorphy. What appears in a new system in terms of forms is heavily
dictated by frequency and the identification of paradigmatic patterns that will
subsequently serve to make new forms. Such a perspective does not require the
existence of a prior pidgin. As Mufwene (2008) points out, a closer examination of
the facts shows that creoles do not evolve from pidgins but rather from the approximation
⁴ In French, the first conjugation constitutes the largest conjugation as well as the most regular and
productive.
⁵ Other research on the emergence of language also suggests that aside from the human genetic
endowment for language acquisition, human beings possess a mathematical or computational compo-
nent for language creation and complexification (Hauser et al. 2002; Fitch & Hauser 2004; Gervain &
Mehler 2010).
simplification with regard to what has been identified as foreigner talk (Ferguson
1971). Foreigner talk refers to a simplified version of a language used by native
speakers when addressing non-natives; the omission of inflections is widespread
in these varieties (Hock & Joseph 1996). In any case, future creole speakers clearly
had no knowledge of the lexifier language before acquiring it, which raises the
question of how they could have simplified it. These observations support the
view that the input was already simplified.
The morphological complexity of creoles has generally been evaluated based on
comparisons with their lexifier languages using traditional views of morphology. It
must be said at the outset that the extent of a creole’s morphological complexity
cannot simply be equated with the extent to which it mirrors complex patterns in
the lexifier language; otherwise, as will be argued below, dimensions of complexity
in the creole that have no counterpart in the lexifier language may simply be
overlooked. This point is all the more crucial given that complexity can be
measured in more than one way.
Under a morpheme-based approach, a creole’s lexifier can be argued to be
morphologically complex because it distinguishes a large number of inflected
words, a large number of affixes, and, perhaps also, a large number of
morphological processes. By these measures, the morphology of the creole under
comparison appears much less complex. These measures, however, imply a particular
conception of what constitutes morphology. In the generative-transformational
tradition, it has been customary to see periphrasis as a syntactic construct; but
periphrasis has recently been argued to function as a kind of inflectional exponence
on a par with synthetic varieties of exponence (see Bonami 2015 and the references
cited therein). Under the assumption that not all morphology is synthetic
morphology, creole morphology takes on a higher degree of complexity, with larger arrays
of morphosyntactic properties, larger paradigms, and larger inventories of
inflectional exponents (Henri 2010; Kihm 2014; Henri & Kihm 2015).
Nevertheless, as we noted in section 5.2, the complexity of a system is not
simply enumerative; morphological complexity does not simply reduce to the
cardinality of its morphosyntactic properties, the size of its paradigms, or the
variety of its inflectional resources (Bonami et al. 2015). Even if creole inflectional
systems are smaller on average⁶ than those of their lexifiers, they exhibit a
comparable degree of integrative complexity. For example, Henri (2010) shows
that in Mauritian, the complementary environments in which a verb’s long and
short alternants appear cannot be characterized in morphological, syntactic, or
information-structural terms by complementary natural classes of properties
⁶ Verbs in both Mauritian and French exhibit alternating forms, but a Mauritian verb’s synthetic
paradigm is limited to two cells, neither of whose forms exhibits true affixation or any coherent
morphosyntactic content (Henri 2010); in French, by contrast, a verb’s synthetic paradigm exhibits
fifty-one cells, combinations of up to three inflectional affixes (e.g., i-r-i-ons ‘(we) would go’)
and arguably six morphosyntactic features (Bonami & Boyé 2003, 2007).
(cf. section 5.6.1). Mismatches of this kind have been argued to be an indicator of
integrative complexity (see Stump 2017: 70–1 and the references cited there).
Mauritian is likewise more complex when it comes to interpredictability, that is,
the difficulty of predicting one form based on knowledge of another (Henri 2010;
Bonami & Henri 2010; Bonami et al. 2011). Luís (2014) also shows that Indo-
Portuguese creoles exhibit different types of form-meaning mismatches in their
inflectional system. Korlai, for example, presents both class-specific syncretism
and paradigmatic opacity that affect morphosyntactic transparency (Bonami et al.
2013; Luís 2014). Comparable mismatches are found in other Portuguese-based
creoles spoken in Africa (Kihm 2014).
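Interpredictability is commonly quantified as conditional entropy: the number of bits of uncertainty that remain about a lexeme's form in one cell once its form in another cell is known. The Python sketch below illustrates that general measure only; it is not the procedure of the studies cited above, and the pattern labels in `toy` are invented placeholders, not Mauritian data.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, Hashable, List, Tuple


def conditional_entropy(pairs: List[Tuple[Hashable, Hashable]]) -> float:
    """H(B | A) in bits, from one (A-pattern, B-pattern) pair per lexeme."""
    by_a: Dict[Hashable, Counter] = defaultdict(Counter)
    for a, b in pairs:
        by_a[a][b] += 1
    total = len(pairs)
    h = 0.0
    for counts in by_a.values():
        n_a = sum(counts.values())
        # Entropy of B among the lexemes sharing this A-pattern
        h_given_a = -sum((c / n_a) * math.log2(c / n_a) for c in counts.values())
        h += (n_a / total) * h_given_a  # weight by how often this A-pattern occurs
    return h


# Hypothetical toy data: (shape of form 1, relation to form 2)
toy = [("C-final", "+e"), ("C-final", "+e"), ("C-final", "+i"), ("V-final", "same")]
print(round(conditional_entropy(toy), 3))  # 0.689
```

Here knowing that the first form is "C-final" still leaves uncertainty about the second, whereas "V-final" items are fully predictable. Higher values mean lower interpredictability.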
Creoles are usually claimed to retain few if any of their lexifier’s inflectional
distinctions. In French-based creoles, this reduction has led to systems in which
each verb has at least a short form (SF) and a long form (LF); systems of this kind
are said to be characteristic of French-based creoles spoken in the Indian Ocean,
and in the Americas, of Louisiana Creole and Haitian. The formal distinction
between a verb’s SF and LF is claimed to be a syntactically conditioned shape
alternation in Isle de France creoles (Seychellois, Rodriguais, Chagossian, and
Mauritian⁷) but not in Reunionese (Corne 1982; Seuren 1990; Syea 1992). Corne
(1982) argues for a typological difference between Reunionese and Isle de France
creoles on the basis of their verbal systems. The verb alternations of Isle de France
creoles are said to have been influenced by Bantu alternations, while those of Reunionese
are taken to be consistent with the assumption that it is merely a variety of French.⁸, ⁹
⁷ These languages are said to be varieties of the same creole, namely Mauritian, for reasons
linked to colonization. Indeed, the Seychelles used to be part of British Mauritius together with
Rodrigues and the Chagos. Rodrigues remains a Mauritian dependency, while the sovereignty of the
Chagos is still under dispute.
⁸ Depending on the verb, mesolectal varieties of Reunionese exhibit up to five inflected forms,
expressing distinctions of tense and aspect. For example the verb ‘eat’ has the three inflected forms
mâz, mâze, and mâzra, with the third one being restricted to negative future-tense contexts. Irregular
verbs like ‘come’ exhibit five inflected forms, for example viê, vne, viê(n)ra, vni, vnir, where the future
tense form viê(n)ra is again restricted to negative contexts and where there is a distinction between a past
participle form vne and an infinitive vnir (Corne 1982). Corne (1982) further notes that those forms are
unstable to the extent that the past tense, the past participle and the infinitive are interchangeable.
Wittmann & Fournier (1987) present a severe critique of Corne’s data and analysis, drawing attention to
a range of problems. They argue that his analysis is observationally inaccurate and theoretically
questionable (given, e.g., the disparate range of factors that must be assumed to condition the proposed
phonological rules; see also Henri 2010); that the analysis is not obviously informed by current thought
on the usual motivations for regular sound changes; that the analysis is not compatible with reasonable
assumptions about the uniformity of diachronic processes effecting language change; and that his
assumption that Mauritian and Reunionese have fundamentally different histories is highly questionable.
⁹ Klingler (2003) and Rottet (1992) also assume that verb alternation in Louisiana Creole is
reminiscent of French, making Louisiana Creole a plausible variety of French.
Following Baker (1972), Corne argues that the LF/SF alternation affects 70% of the
Mauritian verb lexicon and that SFs are derived by truncation of the LF’s final
vowel under conditions that are syntactically and semantically determined.
Chaudenson (2003), Veenstra & Becker (2003), Veenstra (2009), and others
defend an alternative analysis according to which Mauritian inherits its long
and short forms from a French verb’s infinitive and third-person singular present
indicative forms (respectively) but without inheriting their corresponding
functions. This development, they argue, is based on universals at play during
second-language learning. Veenstra (2009: 110) further hypothesizes that the LF/SF
alternation is at first phonologically conditioned but that it gradually becomes
grammaticalized so that the appearance of a verb’s SF is conditioned by a
following complement. As discussed in section 5.6.1.1, the distribution of the
Mauritian alternation is much more complex than Veenstra assumes (see
also Henri 2010). The function of the alternation seen in Mauritian, according to
Veenstra, might reflect Bantu influence, since the conjoint and disjoint verb forms found
in Makhuwa and other Eastern Bantu languages exhibit similar functions. While
the hypothesis is plausible, it raises the question of the Bantu contribution in
Haitian, which shows an alternation associated with a more or less parallel
function. According to DeGraff (2001: 75), the distinction in Haitian is subject
to prosodic or morphosyntactic constraints. Verb alternations are, according to
DeGraff (2001), manifestations of inflectional morphology, with a verb’s SF
arising from its LF by subtractive morphology in the context of a following
complement.
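Both the truncation view (Corne, following Baker) and DeGraff's subtractive analysis relate the two forms by a simple string operation. The Python sketch below states that operation under a simplifying assumption (a small, hand-picked vowel set); it is our illustration, not an implementation from any of the works cited, and it models only the form relation, not the syntactic, semantic, or prosodic conditions on using the SF.

```python
# Sketch of a truncation rule: SF = LF minus its final vowel.
# FINAL_VOWELS is a simplified inventory chosen for illustration (an assumption).
FINAL_VOWELS = set("aeiouɛɔ")


def short_form(long_form: str) -> str:
    """Delete one final vowel; any other form is returned unchanged."""
    if long_form and long_form[-1] in FINAL_VOWELS:
        return long_form[:-1]
    return long_form


# LF inputs taken from the chapter's data (Mauritian asize; forms in Table 5.3)
for lf in ("asize", "vini", "sɔti", "ale"):
    print(lf, "->", short_form(lf))
# asize -> asiz
# vini -> vin
# sɔti -> sɔt
# ale -> al
```

The rule recovers several attested SFs, but not all: Guadeloupean ay ‘go’ and Haitian kɔn ‘know’ (LF kɔnɛ̃) are not predicted, and the chapter itself treats Mauritian asiz, not asize, as the source form.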
The evidence that we present below suggests that verb-stem alternations are
characteristic of all French-based creoles to a greater or lesser degree. While the
form of such alternations and the functions that they serve are innovated in each
individual creole, they are nevertheless relatable to the existence of comparable
though distinct alternations in the verb morphology of the lexifier. We advocate a
theory of creole genesis that includes unguided second-language acquisition as
one of the key components of creolization. We further believe that a
number of other factors may influence the emergence of a creole; these
include frequency, salience, ease of perception, transparency, invariance, and
congruence (see also Corne 1982; Mufwene 2008).
As mentioned in section 5.2, the French verbal system is highly unpredictable and
therefore unlikely to remain unchanged in French-based creoles (Bonami et al.
2013). Standard written French distinguishes three conjugation classes of
synthetic paradigms consisting of a total of fifty-one cells expressing TAM, person,
number, and gender. The first conjugation is the productive class, into which
Table 5.1. Patterns of syncretism in the French paradigm (Bonami et al. 2013)
Forms by lexeme (cell labels not recoverable):
LAVER: lave, lav
FINIR: finise, fini, finis
RENDRE: rɑ̃de, rɑ̃dʁ, rɑ̃dy, rɑ̃, rɑ̃d
CUIRE: kɥize, kɥiʁ, kɥi, kɥiz
POUVOIR: puve, puvwaʁ, py, pø, pœv, pɥis
DIRE: dit, dize, diʁ, di, diz
(2) a. Il va manger.
3SG go.3SG eat.INF
‘He will eat.’
b. Il mangera.
3SG eat.FUT.3SG
‘He will eat.’
Bonami et al. (2013) also note that French verb forms are often ambiguous with
respect to their inflection-class membership. For instance, /pɛɲe/ serves as the
PRS.2PL form for both the first-conjugation verb peigner ‘comb’ and the third-
conjugation verb peindre ‘paint’. Thus, certain differences in form may be widely
recurrent even if they don’t stem from a single inflection-class difference. If
creolization is at all sensitive to factors such as frequency, salience, and perception,
we expect to find an LF/SF distinction in creole verbs as a reflection of the wide
recurrence of a comparable distinction in the lexifier (Bonami et al. 2013; see also
Corne 1999; DeGraff 2001).
Verb alternations are observable across the French-based creoles, though the
number of verbs exhibiting such alternations varies from one creole to another.
Verbs in Guadeloupean are customarily described as being invariable. For
example, Hazaël-Massieux (2002: 71) claims that Guadeloupean doesn’t show
any real inflection, and distinctions between two forms of the same lexeme, like
the distinction between fè /fɛ/ and fèt /fɛt/ ‘to do’, are French borrowings and are
purely exceptional. A similar type of description is provided by Ehrhart (1993:
158), who maintains that Tayo, a French-based creole spoken in New Caledonia,
behaves like American creoles (with the exception of Louisiana Creole) in having
only a few verbs with more than one form, such as mete /mete/ ~ met /met/ ‘to
put’, balaj /balaj/ ~ balaje /balaje/ ‘to sweep’, kouver /kuvɛʁ/ ~ kouvri /kuvʁi/ ‘to
cover’.
Granting the limited nature of verb alternations in these two creoles, we
nevertheless believe that even here, the role of such alternations in a creole’s
grammar cannot be ignored. When forms of a verb alternate, they exhibit
systematic distributional differences. Moreover, the incidence of such alternations is
important as a feature shared by the French-based creoles; it constitutes a
common aspect of their development from French, but also a significant dimension of
innovative divergence among the creoles themselves.
We claim that the verb alternations found in the French-based creoles were in
all cases shaped by but not necessarily inherited from their lexifier, pace
Chaudenson (2003), Veenstra & Becker (2003), and Veenstra (2009). Consider
the Mauritian verb forms shown in Table 5.2. The examples suggest that the
alternation stems from a single French form from which a second form is
independently innovated. The source form in French is very often the infinitive
but may instead be some other form. For example, Mauritian /kone/ ‘to know’,
though imported as a long form, stems not from the infinitive connaître but from
the PRS.SG connai(t/s) (itself a ‘short form’ in French). For syncretic forms like
dwa ‘to owe’, there are two possibilities: either they are integrated as LFs (as in the
Table 5.3. Sample comparison of long and short forms in four French-based creoles
Reunionese       Louisiana Creole   Guadeloupean     Haitian          Gloss
LF      SF       LF      SF         LF      SF       LF      SF
ale     al       ale     alᵃ        ale     ay       ale     al/ay    ‘go’
vɛne    vɛn      vini    vinᵇ       vini    vin      vini    vin      ‘come’
soɚti   soɚrt    sɔɾti   sɔɾ        sɔti    sɔt      sɔti    sɔt      ‘exit/go out’
save    sav
konɛt   kone     kɔnɛ̃    kɔnɛ̃       kɔnɛt   kɔnɛt    kɔnɛ̃    kɔn      ‘know’
Notes:
ᵃ Louisiana Creole has a short form /al/ alternating with a longer form /ale/ meaning ‘to haul/pull’. The
suppletive French form /va/ (3SG.PRS of ALLER) also appears in some French-based creoles as an irrealis
marker: va in Mauritian and Louisiana Creole. In Reunionese Creole a form /sava/, possibly
lexicalized from the agglutination of the demonstrative with the 3SG.PRS form of the verb ALLER,
is used in a number of impersonal constructions. Armand (2014) describes it as an auxiliary.
ᵇ In addition to /vin/, both Mauritian and Louisiana Creole have the form /vjɛ̃/. In both languages, however,
this is a late borrowing and the two forms are used interchangeably.
case of kone) and the syncretic SFs are derived from them or they enter the
paradigm as SFs from which the corresponding LFs are derived. Notice also
that Mauritian asiz ‘to sit’, whose French source is evidently the feminine
past participle assise, is imported as a Mauritian SF from which the corresponding
LF asize is then derived.
Together with Louisiana Creole, French-based creoles spoken in the Indian
Ocean show a more extensive pattern of alternation than Tayo, the French-based
creole of New Caledonia, and the creoles of the French West Indies. Table 5.3 illustrates alternations
from Reunionese, another French-based creole spoken in the Indian Ocean, and
Louisiana Creole, Guadeloupean and Haitian, all spoken in the Americas. In our
view, it is likely that verb alternations in these varieties started out as a sandhi
alternation that was subsequently exapted to serve one or another function in each
individual creole.
While we focus here only on three French-based creoles, Mauritian,
Guadeloupean, and Haitian, we hypothesize that verb-form alternations in all
French-based creoles are more complex than has previously been
acknowledged (see, e.g., Ehrhart 1993; Hazaël-Massieux 2002; Bernini-
Montbrand et al. 2013). As we show in the following section, this complexity is
revealed by the creoles’ processes of deverbal derivation.
In discussing deverbal derivation in these creoles, we will draw upon the
following useful distinctions:
abstracted away from the syntactic contexts in which it may appear; a lexeme
belongs to a lexical category, has semantic content, and is realized by one or more
word forms through which it participates in syntax. In inflectional languages, a
lexeme is usually associated with a collection of stems used to form the inflected
forms that can be inserted into sentences. For instance, the French verbal lexeme
BOIRE ‘to drink’ has a stem /byv/ upon which are built the inflected forms /byvɔ̃/
(buvons ‘we drink’), /byve/ (buvez ‘you (PL) drink’), /byvɛ/ (buvais ‘I was drinking’),
etc., and a stem /bwa/ from which are formed the homophonous word forms
/bwa/ (bois ‘you (SG) drink’, boit ‘s/he drinks’). Stems such as /byv/ and /bwa/ are
morphomic in the sense of Aronoff (1994): they participate in formal alternations
whose conditioning cannot be coherently characterized in semantic, morphosyntactic,
or phonological terms but must be seen as purely morphological in its
motivation.
For French verbs, Bonami & Boyé (2002, 2003) propose a stem space with
twelve slots; this is a kind of matrix within which each verb’s full inventory of
stems is uniformly specifiable. The stem slots are linked to one another by default
implicative rules, so that for a regular verb, there is a slot whose stem suffices to
determine the stems in all of the other slots in that verb’s stem space. An irregular
verb is a lexeme whose stem space includes at least one stem that overrides a
default implicative rule. Extending this idea, Bonami et al. (2009) show that a
thirteenth stem is needed to account for deverbal lexemes suffixed with the action
nominalizer -ion, the adjectivalizer -if, or the agent nominalizers -eur/-rice. Thus,
both rules of inflection and rules of derivation draw upon a lexeme’s stem space;
an individual stem may, however, be accessible to rules of only one type; for
instance, the thirteenth stem proposed by Bonami et al. (2009) is hidden from
inflection, being accessible only to rules of derivation, as in Table 5.4.
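The logic of default implicative rules and overrides can be sketched in a few lines of Python. This is a hypothetical, heavily reduced model, not Bonami & Boyé's stem space: it uses three named slots instead of twelve, and both default rules simply copy the base stem, an assumption made purely for illustration. The BOIRE stems /byv/ and /bwa/ are those cited above; /lav/ (LAVER) and /bwav/ (boivent) are added by us as examples.

```python
from typing import Callable, Dict, Optional

# Hypothetical default implicative rules: each predicts the stem of a named
# slot from the "base" stem (the stem of buvons, buvez, buvais).
# Both rules copy the base stem, a deliberate simplification.
DEFAULT_RULES: Dict[str, Callable[[str], str]] = {
    "sg": lambda base: base,   # stem of the present singular (bois, boit)
    "3pl": lambda base: base,  # stem of the present 3PL (boivent)
}


def stem_space(base: str, overrides: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Fill every slot from the base stem by default; listed stems override defaults."""
    overrides = overrides or {}
    space = {"base": base}
    for slot, rule in DEFAULT_RULES.items():
        space[slot] = overrides.get(slot, rule(base))
    return space


def is_irregular(base: str, overrides: Optional[Dict[str, str]] = None) -> bool:
    """Irregular = at least one listed stem departs from its default prediction."""
    overrides = overrides or {}
    return any(
        slot in DEFAULT_RULES and stem != DEFAULT_RULES[slot](base)
        for slot, stem in overrides.items()
    )


BOIRE = {"sg": "bwa", "3pl": "bwav"}
print(stem_space("lav"))         # {'base': 'lav', 'sg': 'lav', '3pl': 'lav'}
print(stem_space("byv", BOIRE))  # {'base': 'byv', 'sg': 'bwa', '3pl': 'bwav'}
print(is_irregular("lav"), is_irregular("byv", BOIRE))  # False True
```

A regular verb is fully specified by its base stem; an irregular one must additionally list whichever stems override a default.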
Table 5.4. Stem space of FORMER ‘to form’, FINIR ‘to finish’, and DÉFENDRE ‘to defend’
# Stem use
At least five members of a French verb’s stem set are available as base stems in
instances of deverbal nominalization (Bonami et al. 2009; Tribout 2012). Deverbal
nouns in -age have stem 1 as their base stem (e.g., /netwaj/: NETTOYAGE ‘cleaning’);
deverbal nouns in -ment generally have stem 2 as their base stem (/netwɑ/: NETTOIEMENT
‘cleaning’, /ʒonis/: JAUNISSEMENT ‘yellowing’); and the base stem of
a deverbal noun arising by conversion may be stem 3 (/dɑ̃s/: DANSER ‘to dance’ →
DANSE ‘dance’), stem 12 (/ɑʁive/: ARRIVER ‘to arrive’ → ARRIVÉE ‘arrival’), or the
hidden stem 13 (/defɑ̃s/: DÉFENDRE ‘to defend’ → DÉFENSE ‘a defense’).
The selection of a deverbal derivative’s base stem is not uniquely determined by
phonological or grammatical criteria. For example, there are instances in which
more than one of a verb’s stems serves as a base for conversion, as in the case
of PLONGER ‘to dive’, whose derivatives include PLONGE ‘dishwashing’ (whose
stem /plɔ̃ʒ/ is stem 3 of PLONGER) and PLONGÉE ‘diving’ (whose stem /plɔ̃ʒe/ is
stem 12 of PLONGER). More importantly, base-stem selection has no correlation
with the semantics of the derived nominal: nominalizations expressing action,
result, agent, instrument, or location vary unpredictably with respect to which of
the base lexeme’s five possible stems serves as their base stem.
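The unpredictability claim can be checked mechanically against the examples just cited. The Python sketch below tabulates, for each derivational process and each base lexeme, which stem slots serve as base stems. The data are exactly the derivatives mentioned in this section (with -ment taken at its general stem-2 pattern); the grouping code is our illustration.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# (base lexeme, derivative, process, base-stem slot), as cited in this section
DERIVATIVES: List[Tuple[str, str, str, int]] = [
    ("NETTOYER", "NETTOYAGE", "-age", 1),
    ("NETTOYER", "NETTOIEMENT", "-ment", 2),
    ("JAUNIR", "JAUNISSEMENT", "-ment", 2),
    ("DANSER", "DANSE", "conversion", 3),
    ("ARRIVER", "ARRIVÉE", "conversion", 12),
    ("DÉFENDRE", "DÉFENSE", "conversion", 13),
    ("PLONGER", "PLONGE", "conversion", 3),
    ("PLONGER", "PLONGÉE", "conversion", 12),
]
HIDDEN_SLOTS = {13}  # stem 13 is accessible to derivation only


def slots_grouped_by(column: int) -> Dict[str, List[int]]:
    """Map each value of a column (0 = base lexeme, 2 = process) to the slots it uses."""
    table: Dict[str, Set[int]] = defaultdict(set)
    for row in DERIVATIVES:
        table[row[column]].add(row[3])
    return {key: sorted(slots) for key, slots in table.items()}


by_process = slots_grouped_by(2)
by_base = slots_grouped_by(0)
print(by_process)          # {'-age': [1], '-ment': [2], 'conversion': [3, 12, 13]}
print(by_base["PLONGER"])  # [3, 12]
print([d for _, d, _, slot in DERIVATIVES if slot in HIDDEN_SLOTS])  # ['DÉFENSE']
```

Conversion draws on three different slots, and PLONGER alone feeds conversion from two, so knowing the process does not fix the base stem.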
Given the dimensions of complexity discussed in section 5.2, we claim that
French derivational relations contribute substantially to the morphological
complexity of French. In particular:
5.6.1 Mauritian
Finally, both the short and the long form are used in lexeme-formation processes
such as reduplication (Henri 2010, 2012). A derived verb formed by reduplication
itself has both an SF and an LF; as the examples in Table 5.6 show, the derived
verb’s SF is a doubling of the base verb’s SF while its LF is the base verb’s SF
combined with its LF.
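The reduplication pattern just described amounts to two concatenations. In the Python sketch below, the hyphen used as a joiner is an orthographic assumption, and the pair printed for asiz/asize ‘to sit’ is what the stated pattern would predict, not forms cited in this chapter.

```python
from typing import Tuple


def reduplicate(sf: str, lf: str, joiner: str = "-") -> Tuple[str, str]:
    """Return the (SF, LF) of a verb derived by reduplication.

    Derived SF = base SF + base SF; derived LF = base SF + base LF.
    The joiner is an orthographic assumption.
    """
    return sf + joiner + sf, sf + joiner + lf


print(reduplicate("asiz", "asize"))  # ('asiz-asiz', 'asiz-asize')
```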
Heterogeneous distributional patterns such as those of a Mauritian verb’s short
and long forms can be characterized as morphomic (Henri forthcoming), a
property that has been argued to contribute to a system’s integrative complexity
(Aronoff 1994). As we now show, Mauritian derivations are as integratively
complex as those of French.
5.6.2 Guadeloupean
When SFs are combined with the progressive marker ka, the interpretation is that of
what might be called a ‘progressive completive’, as in (8a); but the combination of
an LF with ka (as in (8b)) instead has a prospective reading, in which a multiplicity
of future events, potentially but not necessarily completed, is understood.
Similarly, SFs with the irrealis marker ké or the past tense marker té may have a
single-event interpretation; the SF sav ‘know’ in (9a) has a single-event
interpretation, and the SF mèt ‘put’ in (10a) may receive either a single-event or a
multiple-event interpretation. By contrast, LFs combine with ké and té to express multiple
events, as in (9b) and (10b).
This contrast is of course not obvious in cases in which the long and short forms
are syncretized. The data in (11) exemplify syncretic verbs exhibiting meanings
that are ambiguous between the single-event and the multiple-event
A subclass of verbs shows different constraints: the SFs of the verbs meaning ‘to peep’, ‘to
look’, and ‘to put/give/leave’ are only used as imperatives, as in (12); these reflect
a more direct borrowing from French, with the exception of the form gay /gɛ/ (12b),
apparently a creole neologism. A comparable behaviour is seen with the verb in (13), whose short
and long forms discriminate between the active and the passive/causative, as in (13).
Finally, the verb ‘to give’ features semantic contrasts but also sandhi effects.
With non-pronominal objects, we find both bay and ba combined with
the irrealis marker ké, with the former form encoding an irrealis single-event
¹⁰ The form kay probably derives from the contraction of the TAM marker ka with ay (from the
short form of the verb ‘to go’).
meaning (as in (14a)) and the latter encoding an irrealis multiple-event meaning
(as in (14b)). With pronominal noun phrases, the form ba precedes a vowel-initial
pronoun (14c) and ban, a nasal-initial pronoun (14d).
nouns; these operations are exemplified in Table 5.9, with additional examples in
(16)–(18).¹¹ Here, too, the derivational suffix joins with either a verb’s LF-stem or,
with elision, its LF.
¹¹ The suffixal derivatives in Table 5.9, in (15)–(17), and in (20) are cited from Villoing & Deglas
(2016).
Villoing & Deglas (2016) pursue the assumption that such derivations involve a
sandhi operation by which vowel hiatus is avoided through the prevocalic elision
of an LF’s final é. Additional evidence, however, reveals that at least some cases
cannot be attributed to prevocalic elision but must be seen as involving direct
suffixation to a verb’s LF-stem.
Consider, for example, the operation of -man suffixation, by which action
nouns such as those in (19) are derived.
As these examples show, deverbal nouns suffixed with -man also lack the final é of
the verb’s LF. Here, however, the absence of the final é cannot be attributed to
hiatus avoidance, since the suffix begins with a consonant. Moreover, some of the
nouns in (19) have no counterparts in French
and so cannot simply be inheritances from the lexifier. The only explanation is that
they are productively formed in Guadeloupean through the direct suffixation of
-man to a verb’s LF-stem. Occam’s Razor therefore favours the assumption that all
of the operations in (15)–(19) involve direct suffixation to a verb’s LF-stem.
By maintaining a distinction between a verb’s SF and its LF-stem, we can arrive at
a straightforward account of deverbal nominalizations such as those in (20) as well
as denominal verb derivations such as those in (21). On the one hand, the deverbal
nominalizations in (20) are conversions of a verb’s LF-stem to a noun; by contrast,
the derivations in (21) are conversions of a noun to a verb’s LF-stem, to which the
suffixal formative for a verb’s LF then attaches. This account contrasts with that of
Villoing & Deglas (2016), who regard the derivations in (20) and (21) as involving
processes of suffixation that induce elision rather than processes of conversion.
This account also predicts that a particular verb may give rise to two derived nominal stems, one
based on the verb’s LF, the other on its LF-stem. This prediction is indeed borne
out: the verb meaning ‘to win’ has two derived nominals, one meaning ‘victory’ (whose stem is
the verb’s LF) and one meaning ‘win’ (whose stem is the verb’s LF-stem).
In summary, we assume that every verb has an LF-stem, even if it doesn’t
exhibit distinct long and short word forms; for those that do, the SF shares the
form of the LF-stem. Postulating an LF-stem for every verb offers a unified
analysis of both denominal verb derivation and deverbal nominalization (whether
by conversion or by the addition of a derivational suffix).
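The LF-stem account, and the argument from -man suffixation against the elision account, can be sketched in Python. The class and function names are hypothetical; `LF_FORMATIVE` encodes the final é of Guadeloupean LFs; the vowel set is a simplifying assumption; and every verb is given an LF of the form LF-stem + é, a simplification. The SF sav ‘know’ is cited in (9a), but its LF savé and the suffixed outputs below are produced by the sketch, not cited as attested forms, and "a" is a placeholder vowel-initial suffix.

```python
from dataclasses import dataclass
from typing import Optional

LF_FORMATIVE = "\u00e9"  # é: the formative that turns an LF-stem into an LF
VOWELS = set("aeiou\u00e9\u00e8\u00f2")  # simplified orthographic vowels (an assumption)


@dataclass(frozen=True)
class Verb:
    lf_stem: str             # postulated for every verb, alternating or not
    alternates: bool = True  # does the verb have distinct SF and LF word forms?

    @property
    def long_form(self) -> str:
        return self.lf_stem + LF_FORMATIVE

    @property
    def short_form(self) -> Optional[str]:
        # For alternating verbs the SF has the same shape as the LF-stem.
        return self.lf_stem if self.alternates else None


def suffix_to_lf_stem(verb: Verb, suffix: str) -> str:
    """LF-stem analysis: the suffix attaches directly to the LF-stem."""
    return verb.lf_stem + suffix


def suffix_with_elision(verb: Verb, suffix: str) -> str:
    """Elision analysis: the suffix attaches to the LF, whose final é drops before a vowel."""
    lf = verb.long_form
    if suffix[:1] in VOWELS and lf.endswith(LF_FORMATIVE):
        return lf[: -len(LF_FORMATIVE)] + suffix
    return lf + suffix


sav = Verb("sav")
print(sav.short_form, sav.long_form)  # sav savé
print(suffix_to_lf_stem(sav, "a") == suffix_with_elision(sav, "a"))  # True
print(suffix_to_lf_stem(sav, "man"), suffix_with_elision(sav, "man"))  # savman savéman
```

With a vowel-initial suffix both analyses yield the same string, so such data cannot decide between them; with consonant-initial -man, only direct suffixation to the LF-stem gives the é-less shape reported for Guadeloupean -man nouns.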
On this account, Guadeloupean derivation shows a degree of complexity equivalent
to that of French and Mauritian with respect to base-stem predictability. In
Guadeloupean, a verbal lexeme’s base stem is its LF in some cases and its LF-stem in
others; thus, base-stem predictability in the definition of deverbal nominalizations
exhibits complexity of degree 2. By contrast, it is not clear that Guadeloupean
deverbal nominalizations ever have a hidden form as their base stem; not even a
verb’s LF-stem can be claimed to be hidden in view of its use in the formation of a
present participle, an inflected form. Guadeloupean deverbal nominalizations therefore
exhibit a base-stem restrictedness whose complexity is no higher than degree 1.
5.6.3 Haitian
Notice that the behaviour in (21b) is also attested in Guadeloupean with the verb
ale. The opposition fèt ~ fè ‘to do/make’ also occurs in both creoles. In addition,
DeGraff (2001) claims that LFs are used for emphasis. He concludes that verb
alternations in Haitian are an instance of inflectional morphology whose
realization is determined by phonological phrasing and argumenthood.
Because very few verbs in Haitian exhibit an overt inflectional alternation between
long and short forms, there are few cases of derivation where one can readily
identify the choice of one alternant over the other. When cases of this sort do
occur (typically in conversions), they involve the LF in some instances and the SF
in others, as in Table 5.11.
Suffixal derivation of nouns from verbs often involves a vowel-initial suffix, as
in (24); the existence of a sandhi rule eliminating vowel hiatus by means of stem-
final vowel truncation might (as in Guadeloupean) be claimed to allow such
derivatives to be based on a verb’s LF. But as in Guadeloupean, the noun-forming
suffix -man does not create vowel hiatus; its appearance in post-consonantal
positions therefore cannot be attributed to elision, but must be seen as the effect
of direct suffixation to a verb’s LF-stem. In some cases (e.g., (25)), the resulting
nominalization has no counterpart in French, and so cannot be seen as a direct
inheritance from the lexifier. We must therefore assume that as in Guadeloupean,
a Haitian verb’s LF-stem sometimes participates directly in the workings of its
derivational morphology.
In a given row, each nominalization has that row’s verb form as its base stem.
*An asterisk marks a derived stem that is morphologically ambiguous, involving
either (a) a base stem that is a hidden LF-stem or (b) a base stem that is an LF whose
final vowel undergoes prevocalic elision.
VN compounds might seem to afford a parallel argument, since the verb in such
compounds often appears to be an LF-stem; for example, two verbs meaning ‘break’
and one meaning ‘walk’ all seem to be represented by their LF-stems in the compounds
meaning ‘a destructive individual’, ‘hard question’, and ‘stoop, steps to a house’, each of
which has a French VN counterpart; but these compounds all apparently originate in French,
and evidence of the productivity of exocentric VN compounds is in general lacking in
Haitian (Lefebvre 1998: 345).
¹² Nominalizations similar to kozman include for instance ajoutman ‘addition’, frapman ‘knocking’
and pledman ‘discussion, quarrel’, which are absent in contemporary French but found in Medieval
French. DeGraff (2003: 69) rightly argues that these might have been inherited from regional
varieties spoken in the colonies in the seventeenth century.
A final parallel between Haitian and Guadeloupean pertains to denominal
verbs. Verbs are apparently derived from nouns by means of a suffix -e, which
sometimes produces verb forms having no counterpart in French. (The examples
in (26) illustrate.) But as in Guadeloupean, these can instead be seen as instances
of N→V conversion whose output is a verb’s LF-stem (in which case -e has the
role of an LF-forming verb suffix); here again, distinguishing a verb’s LF-stem
from its SF affords a more streamlined account of derivation.
It is clear that at least some Haitian verbs possess special hidden stems. Each of the
verbs in (27) has a special hidden stem used in derivation (e.g., with the
nominalizing suffix -man: vomis-man) but not in inflection. The productivity of this
pattern of alternation is attested by the fact that it gives rise to derivatives
having no counterpart in French, as in (28).
¹³ Finissement and bâtissement can be found in Medieval French, but not *remercissement.
hidden stem for still others. Base-stem predictability in the definition of deverbal
nominalizations therefore attains complexity of degree 2. And given that the base
stem in some deverbal nominalizations is a special hidden stem, base-stem
restrictedness in the definition of these nominalizations likewise exhibits
complexity of degree 2.
5.7 Conclusion
In this chapter, we have presented criteria for assessing the integrative complexity
of a morphological system’s derivational relations, and we have applied these
criteria in an analysis of derivational relations in Mauritian, Guadeloupean, and
Haitian. We have demonstrated that each of these languages possesses deverbal
nominalizations that are not a mere inheritance from the lexifier language but
must be seen as the effect of a productive process within the creole itself.
Moreover, we have shown that the complexity of the derivational relations in
these creoles attains the same degree of complexity as that of the lexifier; our
results are summarized in Table 5.12.
When a verb L is the base lexeme in a derivational relation, the identity of L’s
base stem in L’s stem set is not, in general, predictable either in French or in
Mauritian, Guadeloupean, or Haitian; moreover, the status of L’s base stem in the
definition of L’s morphology may be as peripheral in Mauritian and Haitian as in
French. These results challenge the extreme simplicity that has so often been
attributed to creole morphology. We hypothesize that as further work is done on
the morphology of creole languages, other sorts of derivational processes will be
found to exhibit a comparable level of integrative complexity.
Acknowledgements
We would like to thank Jean-Michel Benjamin for his input on the Guadeloupean data.
OUP CORRECTED AUTOPAGE PROOFS – FINAL, 23/8/2020, SPi
6
Simplification and complexification in
Wolof noun morphology and morphosyntax
Michele Loporcaro
6.1 Introduction
In this chapter, I will describe how Wolof noun morphology has become simplified,
compared with the system that can be reconstructed for a previous stage through
comparison with other Atlantic languages (the subdivision of the Niger-Congo
family to which Wolof belongs). On the other hand, I will also show that, in some
respects, Wolof noun morphology and especially morphosyntax has become more
complex—more complex than in previous stages of the language and also more
complex than usually assumed in the literature—acquiring new irregularities.
The Wolof—and Atlantic—facts will be scrutinized against the background of
recent research on linguistic complexity. Since the study is about the grammatical
system and does not adduce any psycholinguistic evidence (from language usage
and/or processing), I will be addressing what the relevant literature (e.g., Dahl
2004: 39; Miestamo 2008: 27; Sinnemäki 2008: 72; Lindström 2008: 217) labels
‘absolute complexity’, not what is sometimes called ‘relative complexity’ (Kusters
2008: 4–8), that is, memory cost/difficulty (Hawkins 2007).
The chapter is organized as follows: in section 6.2, I introduce the language and
its classification; in section 6.3, I present the basics of the Wolof noun class system,
which is then placed in its Atlantic context in section 6.4.¹ In section 6.5, I will
briefly introduce the distinction between complexity and morphological richness—
as defined in the literature on morphological complexity I take as a point of
reference (in particular Baerman et al. 2010; 2015b; 2017; Dressler 2011)—and
how complexity and richness relate to morphological type, to then move on to
¹ While the data from other Atlantic languages are drawn from the available literature, for Wolof
available sources are complemented with first-hand data from the variety of Mbakke (Mbacke), lying
about 150 kilometres east of Ndakaaru/Dakar, in the territory of the traditional kingdom of Bawol,
which is part of the Wolof heartland, the area on whose dialects the standard variety of Wolof is based.
These were collected in cooperation with Cheikh Anta Babou, to whom I am indebted, and are
presented in more detail in Babou & Loporcaro (2016). Glossing obeys the Leipzig glossing rules: in
addition, indicates class marker (without numbering for Wolof, since, unlike for the other Niger-Congo
languages mentioned in the chapter, there is no agreed-on numbering of noun classes in studies
on Wolof).
Michele Loporcaro, Simplification and complexification in Wolof noun morphology and morphosyntax In: The Complexities of
Morphology. Edited by: Peter Arkadiev and Francesco Gardani, Oxford University Press (2020). © Michele Loporcaro.
DOI: 10.1093/oso/9780198861287.003.0006
Wolof is the native language of between four million (Lewis et al. 2015) and 4.5 million
people (Leclerc 2015), and the main inter-ethnic lingua franca among the thirteen million
inhabitants of Senegal. It is also spoken in Gambia (about 226,000 speakers; the second
most spoken language there after Mandinka), Mali (62,000 speakers), Mauritania (around
16,400 speakers), and Guinea-Bissau, as well as in migrant communities in Europe (France,
Italy, and Spain) and the USA (mainly New York City).²
The evidence to establish change in Wolof is twofold: on the one hand, the
language has been described thoroughly since the early nineteenth century (cf.
Dard 1825, 1826, Boilat 1858, Kobès 1869, etc., with some information on relevant
aspects of its structure available as early as the late sixteenth century: cf.
Doneux 1978: 45), so that changes leading to the present situation can be followed
through the extant documents and descriptions. Transcending this limited time-
depth requires reconstruction, and this poses problems since the classification of
Wolof within the Northern Atlantic branch of Niger-Congo is debated: the
traditional view considers Wolof most narrowly related to Fula, and places
Wolof/Fula, together with Seereer, in a Senegambian subdivision of Atlantic (cf.
Sapir 1971: 47f; followed by Wilson 1989: 87f; Childs 2004, 2010: 36, etc.), while
Doneux (1978: 43–5) and Segerer (2010: 4f) propose alternatively that the closest
relative to Wolof is the Ñuun (also: Bagnoun, Bainuk, Baïnounk) language/dialect
cluster (spoken across Casamance in southern Senegal, northern Guinea-Bissau,
and Gambia), and Pozdniakov (2015: 58) lists Fula/Seereer, Buy/Nyun, and Wolof
as three different branches of Northern Atlantic. Be that as it may, all the
² Occasionally, one comes across much lower figures in the literature: see, for example, Njie (1982:
16), reporting slightly more than one million speakers (‘le wolof se parle en Gambie et au Sénégal par
un peu plus d’un million de personnes’ [‘Wolof is spoken in Gambia and Senegal by slightly more than one million people’]). Higher figures (e.g., the 7.5 million reported by Perrin 2012:
11) are given by authors not drawing the distinction between native/L1 and vehicular/L2 usage of
Wolof.
6.3 Wolof noun classes: the basics and the received view
In the rich literature on Wolof, the language is invariably described as featuring ten
noun classes (henceforth abbreviated NCs), eight singular and two plural, marked on
determiners and other noun modifiers occurring adnominally as well as pronomin-
ally.³ A complete list of the usually assumed classes is given on the horizontal dimension
in (1), while (1a)–(1d) exemplify the larger list of class-marked function words:
Taking the proximal definite article, the following examples illustrate NC contrasts:
³ Cf., for example, Boilat (1858: 11ff); Rambaud (1898: 11); Delafosse (1927: 30f); Labouret (1935:
46); Gamble (1957: 134); Sauvageot (1965: 72–4); Stewart & Gage (1970: 392); Sapir (1971: 75); Irvine
(1978: 43); Thiam (1987: 9); Fal et al. (1990: 17); Mc Laughlin (1997: 2); Munro & Gaye (1997: ix);
Becher (2001: 42); Ndiaye (2004: 26); Camara (2006: 11); Diouf (2009: 153); Guérin (2011: 84); Tamba
et al. (2012: 895); Torrence (2013: 16); Pozdniakov & Robert (2015: 548). The notion ‘noun class’ is
used in different ways by different authors, within and beyond African language studies (see the
discussion in Babou & Loporcaro 2016: 4–6).
i. xarit/jëkkër/ndongo/njëngtéef/soxna/far y-i
friends/husbands/disciples/sorcerers/ladies/lovers -.
‘the friends/husbands/disciples/sorcerers/ladies/lovers’
j. góor/nit ñ-i
man/person -.
‘the men/persons’
(3) gaa/gan/géer/gor/góor/jaam/jigéen/
people/guest/non-casted/free man/man/slave/woman/
mag/maggat/ndaw/nit ñi
adult/old person/youngster/person .-.
‘the people/guests/non-casted/free men/men/slaves/women/adults/old
people/youngsters/persons’
All other nouns take yi in the plural ((2i)). Likewise, in the singular the bi
class in (2a) accounts for the vast majority of nouns, and has been constantly
attracting new members, as schematized in (4) (based on Becher 2001: 42–52):
Its incidence has grown from less than 50% in nineteenth-century rural Wolof to
near generalization in the contemporary urban language. As a result, the agree-
ment pattern selected by most nouns in all varieties of Wolof is the one in (5)
(singular bi/plural yi):⁴
⁴ This is the default agreement class (consisting of the two default NCs for singular and plural), both
in lexical and in syntactic terms: lexically, loanwords are assigned bi/yi class membership (cf. Rambaud
1898: 22; Stewart & Gage 1970: 392; Guérin 2011: 83); syntactically, there are rules substituting yi for
other plural markers under certain conditions (cf. Babou & Loporcaro 2016: 16, 31f).
Thus, Wolof has moved far away from the pervasiveness of agreement typically
observed in Niger-Congo, including Atlantic languages. Compare the Fula examples
in (6), where the word for ‘king’ is class-marked itself and controls class-agreement
on adjectives and function words; or the Baïnounk examples in (7), with class-
agreeing demonstratives, adjectives, and numerals; or those from Diola-Fogny in (8),
with class-agreement also on the verb (again, class markers are boldfaced for clarity):
Since pervasive class marking on nouns and on different agreement targets is a
property reconstructed for Niger-Congo, and for Atlantic, its loss in Wolof amounts
to a loss of complexity, under the view, maintained by Dahl (2004: 10) among others,
that redundancy adds to complexity:
I agree with Baerman et al. (2010: §1) that the size of a paradigm is not a
primary criterion of complexity; it is [ . . . ] a criterion of morphological richness
dependent on the importance of inflectional morphology in the morphology–
syntax interface. (Dressler 2011: 160)
Under this view, the morphology of an ideally agglutinating language is rich, not
complex. To mention just one crucial aspect, relevant for the present discussion,
such a language has no inflectional classes and therefore lacks ‘the additional structure
imposed by inflectional morphology, above and beyond its dedicated task of
expressing syntactic and semantic distinctions’ (Baerman et al. 2010: 1).
As a final remark in this section, note that the use of notions such as ‘agglutinating’
and ‘inflecting-fusional’ in morphological typology has been criticized, most
influentially by Haspelmath (2009), who analyses what he calls the ‘Agglutination
Hypothesis’ into three distinct indexes (the Cumulation, the Alternation, and the
Suppletion Index) and takes it to be falsified by the fact that, on the whole, the
languages in his sample score differently on the three. A language displaying one-to-one
correspondence between form and meaning in inflectional morphology scores
lower on the Cumulation Index than languages allowing for one-to-many
correspondences. The ‘Alternation Index’, on the other hand, assigns 0 to languages
‘which exhibit complete stem invariance’, and higher values to languages showing
more ‘stem alternations, that is, the (co-)expression of morphological categories by
changing, rather than adding to, the stem’ (Haspelmath 2009: 17). The ‘Suppletion
Index’, finally, is ‘defined as the average percentage of subcategories (per category-
system) that exhibit affix suppletion’ (Haspelmath 2009: 22).
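Read literally, the quoted definition makes the Suppletion Index an average of per-system percentages. The following is a minimal Python sketch under that reading only; the input format (one pair of counts per category-system) and the counts in the example are hypothetical, not Haspelmath's data or his exact procedure.

```python
def suppletion_index(systems: list[tuple[int, int]]) -> float:
    """Average percentage of subcategories exhibiting affix suppletion.

    Each tuple is (number of subcategories, number showing affix suppletion)
    for one category-system, e.g. number or case (hypothetical input format).
    """
    if not systems:
        raise ValueError("at least one category-system is required")
    # Percentage of suppletive subcategories within each category-system.
    percentages = [100 * suppletive / total for total, suppletive in systems]
    # Unweighted mean across category-systems.
    return sum(percentages) / len(percentages)


# Hypothetical counts: one system with 1 of 2 subcategories suppletive,
# another with none of 4.
print(suppletion_index([(2, 1), (4, 0)]))  # 25.0
```

On this reading the index ranges from 0 (no affix suppletion anywhere) to 100, which fits the scale of the scores cited in the next paragraph.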
Note that the only Niger-Congo language in the sample (Swahili) scores 0.1 on
the Cumulation Index, while a paramount instance of an agglutinating language
such as Turkish (Haspelmath 2009: 23) scores 0. Both Swahili and Turkish also
score 0 on the Alternation Index. On the Suppletion Index, on the other hand,
Turkish scores 23/100 and Swahili 28/100, which is far from 0 (Nivkh) but much
closer to it than to the score reached by a typically ‘inflecting-fusional’ language
like Latin (84/100).
Thus, despite the scepticism Haspelmath airs about the usefulness of the ‘agglu-
tinating’ vs. ‘inflecting-fusional’ distinction, his own data show that it is far from
odd to qualify languages such as Turkish or Swahili as consistently agglutinating, for
the purposes of the present study. More broadly, Haspelmath’s line of argument
seems to be at odds with the very notion of a ‘type’, whose legitimacy cannot be
called into question by pointing to empirical objects which poorly fit the ideal
instantiation of it, however defined, given that ‘linguistic types’ are ‘ideal constructs
which natural languages approach to various degrees’ (Dressler 2005: 7).
Like Niger-Congo in general, Wolof too has agglutinating morphology, but this is
today the case only in the verb, since the noun has become almost completely
invariable, as reflected in the white dot (meaning ‘no distinct plural form’) on the
WALS Map 33 on noun plurality (Dryer 2013), a fact noted ever since the
earliest descriptions of Wolof.⁶ However, while such remarks and the WALS white
dot are accurate for the overwhelming majority of Wolof nouns, uninflectedness
has not yet triumphed completely. In fact, one of the rare lexemes still preserving
two distinct forms, that is, buur ‘king’, has already been displayed in (5). The same
is the case for about twenty nouns (listed in (9)), whose singular and plural differ
because of an alternation in the initial consonant:⁷
⁶ On noun invariability in Wolof, see the early remarks by Dard (1826: 14): ‘Mais si le nom n’est
pas suivi de la préposition ou, on ajoute après ce nom les articles ya, yi, you, sans jamais rien changer
dans son orthographe’ [‘But if the noun is not followed by the preposition ou, one adds after this
noun the articles ya, yi, you, without ever changing anything in its orthography’]. Similarly, Boilat
(1858: 7) points out: ‘En Wolof, les noms ne changent pas de terminaison dans les différentes
combinaisons que leur fait éprouver le discours, pas même en passant du singulier au pluriel’ [‘In
Wolof, nouns do not change ending in the different combinations in which discourse places them,
not even when they change from singular to plural’]. Thus, ‘le substantif est invariable’ [‘the noun is
invariable’] (Boilat 1858: 11).
⁷ The alternations—as described in Sauvageot (1965: 74); Diagne (1971: 79); Diouf (2009: 155);
Camara (2006: 7–8), etc.—may take different forms, illustrated in (9). The proximal form of the definite
article—already seen in (1)–(2)—is added after each word form, to indicate that the two occur in
distinct environments (thus glosses expand to ‘the x/the x’s right here’).
⁸ Camara (2006: 8) also reports pan/fan ‘day/days’, showing the same p-/f- consonant alternation as
in (9d). However, this paradigm is no longer attested in Mbakke Wolof, where the formerly plural form
fan has generalized and is used for singular as well: for example, benn fan jàll na ‘one day has passed’.
The lexeme fan is also reported as invariable in Fal et al.’s (1990: 70) dictionary: fan wi ‘the day’/
ñaari fan ‘two days’. The older singular form pan now survives only in the fixed expression weer-u benn
pan ‘the first day of the month’ (literally ‘crescent-. one day’).
For most lexemes, this difference today is only optional since—with the sole
exception of këf ‘thing’—the singular form may, and indeed tends to, be used in
plural contexts, while the reverse is not the case (see Guérin 2011: 85; Babou &
Loporcaro 2016: 10). Once uninflectedness is generalized, noun morphology will
have become simplified again, but as long as paradigms such as those in (9)
survive, they represent an increase in morphological complexity, brought about by
changes which introduced morphological irregularity of the sort familiar from
inflecting-fusional languages: in other words, the data in (9) are evidence for the
occurrence of (residual) inflectional classes in Wolof. Note also that free variation
in the plural cell of those noun lexemes gives rise to overabundance (Thornton
2011; Meakins & Wilmoth, Chapter 4, this volume), that is, variation between two
cell-mates (Loporcaro & Paciaroni 2011: 420), thus contributing to a local increase
in complexity, if only ephemeral, on the way towards simplification.
The initial consonant alternations defining these inflectional classes are the last
remnants of two distinct but intertwined processes which are observed—with
varying degrees of regularity—in the neighbouring Atlantic languages, and spe-
cifically, in those to be considered as representative comparator languages from
the North Atlantic branch under either classification hypothesis for Wolof (see
section 6.2), that is, either Fula and Seereer or Ñuun. Of the two processes, one is
morphological (NC-prefixation), the other morphonological (initial consonant
mutation). Integration of initial consonant mutation into the NC system is an
innovation that is currently reconstructed for Proto-Northern Atlantic (see
Pozdniakov 2015: 60), even if not preserved in all daughter languages: in Ñuun
languages, ‘the system is barely operative now, but can be partly reconstructed’
(Wilson 2007: 86), and the same is true of Wolof, as discussed in (18)–(19) below.
In Fula and Seereer, by contrast, the consonant mutation system itself and its
interaction with NCs are well-preserved.
As an illustration, consider the word koor ‘man’ in Seereer-Siin (or Siin-Gandum,
the most conservative variety of Seereer in this respect, spoken in the
Sine region of Senegal; see Faye 2013: 3, 9). This nominal root may occur, with
distinctive morphology, in several of the sixteen NCs of the language (see Mc
Laughlin 2000: 336)—eleven of them displaying overt class prefixes, five lacking
(11) Fula, Gombe, N. Nigeria (Arnott 1970: 87). Suffix grades, lexically selected
(invariable stems):
Grade A Grade B Grade C Grade D Class Gloss (grammatical)
ɓoy-re leemuu-re tummu-de loo-nde 9 ‘x’
ɓoy-e leemuu-je tummu-ɗe loo-ɗe 24 ‘x’s’
ɓoy-el leemu-yel tummu-gel loo-ŋgel 3 ‘small x’
ɓoy-um leemu-yum tummu-gum loo-ŋgum 5 ‘worthless little x’
ɓoy-on leemu-hon tummu-kon loo-kon 6 ‘small x’s’
ɓoy-a leemu-wa tummu-ga loo-ŋga 7 ‘big x’
ɓoy-o leemu-ho tummu-ko loo-ko 8 ‘big x’s’
‘baobab fruit’ ‘orange’ ‘calabash’ ‘storage pot’ Gloss (lexical)
ɓoy- leemu(u)- tummu- loo- Stem
⁹ For the Senegalese variety of Pulaar, Mc Laughlin (1997: 7) describes twenty-one NCs, while
twenty-two are reported for the one described by Sylla (1982: 31) and twenty-five for the Gombe dialect
(Northern Nigeria) described by Arnott (1970: 75).
¹⁰ This ‘affix renewal’ is not confined to North Atlantic: in ‘at least one language of South
Atlantic, Kisi, the normally prefixed NCMs [= noun class markers] are suffixed’ (Childs 2009: 117; see
Childs 1983 and the recent discussion by Di Garbo 2014: 80).
(though not perfectly, see Arnott 1970: 73) with the semantics, as shown in the
gloss column on the right-hand side: thus, for instance, class 24 hosts
word forms which are the plurals of class 9 forms; class 3 is the corresponding diminutive
singular, which pluralizes in turn as class 6; class 5 is diminutive/pejorative;
and so on.
For suffixes, by contrast, the grade is lexically selected by the (lexical specifica-
tion of the) stem. The data in (11) exemplify invariable stems, where only class
suffixes vary according to the class-dependent consonant grade, while the noun
stem stays the same because its initial consonant is an invariable one, not involved
in consonant mutations, observed here only on suffixes. Thus, for instance, in
class 9 the forms -re, -de, -nde, marking different grades, are related to each other
morphonologically via mutation, and are selected by the individual noun
lexemes so that, for example, ‘baobab fruits’ cannot be *ɓoy-je/-ɗe (i.e., cannot
take plural class 24 suffixes of grades B–D) because of lexical specification.
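The lexical selection of suffix grades in (11) can be sketched as a two-step lookup: the grade is stored with the stem in the lexicon, and the class suffix is then read off a class-by-grade table. This is a toy Python illustration, not Arnott's analysis; the dictionary and function names are hypothetical, the suffixes are copied from (11), and the stem for ‘orange’ is left out because its vowel length varies (leemuu-/leemu-).

```python
# Class suffix allomorphs from (11), indexed by noun class, then by grade.
SUFFIXES = {
    9:  {"A": "re", "B": "re",  "C": "de",  "D": "nde"},
    24: {"A": "e",  "B": "je",  "C": "ɗe",  "D": "ɗe"},
    3:  {"A": "el", "B": "yel", "C": "gel", "D": "ŋgel"},
    5:  {"A": "um", "B": "yum", "C": "gum", "D": "ŋgum"},
    6:  {"A": "on", "B": "hon", "C": "kon", "D": "kon"},
    7:  {"A": "a",  "B": "wa",  "C": "ga",  "D": "ŋga"},
    8:  {"A": "o",  "B": "ho",  "C": "ko",  "D": "ko"},
}

# Hypothetical lexicon: the grade is an arbitrary property of each stem.
LEXICON = {
    "ɓoy": "A",    # 'baobab fruit'
    "tummu": "C",  # 'calabash'
    "loo": "D",    # 'storage pot'
}


def inflect(stem: str, noun_class: int) -> str:
    """Build stem-suffix, taking the grade from the lexicon, not from phonology."""
    grade = LEXICON[stem]
    return f"{stem}-{SUFFIXES[noun_class][grade]}"


print(inflect("ɓoy", 24), inflect("tummu", 9), inflect("loo", 3))
```

Because the grade sits in the lexical entry, nothing in the shape of ɓoy- itself rules out *ɓoy-ɗe; only the stored grade does.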
The nouns in (12), by contrast, exemplify what Arnott (1970: 93) calls ‘variform’
stems (only some consonant alternations are displayed here, as selected by
grades A, C, and D; in other words, (12) displays an arbitrary selection, not only
of noun classes, but also of grades and consonant alternations; the reader is
referred to Arnott’s description for a full account of the intricacies of this
fascinating system):
(12) Fula, Gombe, N. Nigeria (Arnott 1970: 98). Consonant alternation in noun
stems of different grades:
Grade A Grade A Grade C Grade D Suffix grade (selected)
r/d/nd w/b/mb w/g/ŋg y/g/ŋg C- alternation on stem
Class Gloss (grammatical)
dim-o beer-o gor-ko gim-ɗo 1 ‘x’
rim-ɓe weer-ɓe wor-ɓe yim-ɓe 2 ‘x’s’
dim-el beer-el gor-gel gim-ŋgel 3 ‘small x’
dim-um beer-um gor-gum gim-ŋgum 5 ‘worthless little x’
ndim-on mbeer-on ŋgor-kon ŋgim-kon 6 ‘small x’s’
ndim-a mbeer-a ŋgor-ga ŋgim-ŋga 7 ‘big x’
ndim-o mbeer-o ŋgor-ko ŋgim-ko 8 ‘big x’s’
‘free man’ ‘host’ ‘man’ ‘person’ Gloss (lexical)
rim- weer- wor- yim- Stem
For instance, the first two stems rim- ‘free man’ and weer- ‘host’ select the same
class suffixes (both grade A) but differ in the initial consonant, while the other
two, wor- ‘man’ and yim- ‘person’, select allomorphs of the class suffixes which
differ from each other, apart from some syncretisms (seen in classes 6 and 8).
Thus, for instance, dim-o, gor-ko, and gim-ɗo all display what is morphologically
the same class 1 suffix, but in different allomorphs.
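The variform stems in (12) add a second lexically stored property: a series of initial consonants, one member of which is chosen by the noun class. The toy Python sketch below covers only the rows shown in (12). The class-to-consonant mapping is read off this selection of classes and is not claimed to hold for Arnott's full system; all names are hypothetical.

```python
# Which member of a stem's consonant series each class selects, as read off (12):
# class 2 takes the first member, classes 1, 3, 5 the second, 6, 7, 8 the third.
CONSONANT_SLOT = {2: 0, 1: 1, 3: 1, 5: 1, 6: 2, 7: 2, 8: 2}

# Class suffix allomorphs for the three grades occurring in (12).
SUFFIXES = {
    "A": {1: "o",  2: "ɓe", 3: "el",   5: "um",   6: "on",  7: "a",   8: "o"},
    "C": {1: "ko", 2: "ɓe", 3: "gel",  5: "gum",  6: "kon", 7: "ga",  8: "ko"},
    "D": {1: "ɗo", 2: "ɓe", 3: "ŋgel", 5: "ŋgum", 6: "kon", 7: "ŋga", 8: "ko"},
}

# Hypothetical lexicon: (initial consonant series, rest of stem, suffix grade).
LEXICON = {
    "rim":  (("r", "d", "nd"), "im", "A"),   # 'free man'
    "weer": (("w", "b", "mb"), "eer", "A"),  # 'host'
    "wor":  (("w", "g", "ŋg"), "or", "C"),   # 'man'
    "yim":  (("y", "g", "ŋg"), "im", "D"),   # 'person'
}


def inflect(stem: str, noun_class: int) -> str:
    """Combine the class-selected initial consonant with the grade-selected suffix."""
    series, rest, grade = LEXICON[stem]
    initial = series[CONSONANT_SLOT[noun_class]]
    return f"{initial}{rest}-{SUFFIXES[grade][noun_class]}"


for stem in LEXICON:
    print([inflect(stem, c) for c in (1, 2, 3, 5, 6, 7, 8)])
```

Each printed row should reproduce the corresponding column of (12), e.g. gor-ko, wor-ɓe, gor-gel, and so on for wor- ‘man’.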
In other words, what we have here is different inflectional classes, in spite of the
overall agglutinating character of Fula morphology. The Fula situation, as for the
selection of the forms of each NC suffix, is closer to that of an inflecting-fusional
language like Italian, with ICs, than to that of a strongly agglutinating language
like Turkish, without inflectional classes, as schematized in (13):
Turkish has no inflectional classes, since the alternants of each affix are selected
phonologically (e.g., ev/ev-ler ‘house/-s’ vs. yol/yol-lar ‘trip/-s’, with plural -ler/-lar
depending on whether the root vowel is front or back), while Italian has them, because
can-e/can-i ‘dog(M)-SG/-PL’ vs. lup-o/lup-i ‘wolf(M)-SG/-PL’ take different singular
endings, not derivable from each other phonologically ((13i)), due to lexical
specification ((13ii)).¹¹ In Fula too, ‘there seems no advantage in treating all
suffixes of each class as morphophonemic variants of a single class suffix’
(Arnott 1970: 68). In fact, while in some cases one observes, between different
suffix grades, alternations that could be accounted for through independently
valid morphonological rules of the language (e.g., the alternation between voiced
and voiced prenasalized stops between Grades C–D in Classes 3, 5, or 7), this
cannot be generalized, since, for example, in Class 1 -ko (Grade C) and -ɗo (Grade
D) are not related morphonologically. Thus, Fula differs in this respect from an
ideally agglutinative language such as Turkish and rather resembles Italian, where
inflections are selected depending on inflectional class (a lexeme-inherent purely
morphological property) and are not derived by morphonological rule from one
another. In sum, there is no alternative but to recognize the occurrence of
inflectional classes in Fula too, though this—as highlighted in Babou &
Loporcaro (2016: 44)—is a descriptive notion which is hardly used in the gram-
mars of Atlantic languages.
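The contrast between phonologically and lexically selected allomorphs can be made concrete with a toy Python sketch: the Turkish plural is computed from the stem's last vowel, whereas the Italian ending has to be looked up through an inflectional-class label stored with the stem. The function names and class labels are hypothetical; the Turkish rule ignores exceptions (e.g., some loanwords), and the Italian lexicon holds only the two lexemes cited above.

```python
BACK_VOWELS = set("aıou")
FRONT_VOWELS = set("eiöü")


def turkish_plural(stem: str) -> str:
    """Pick -lar or -ler from the backness of the last vowel (no exceptions handled)."""
    for ch in reversed(stem):
        if ch in BACK_VOWELS:
            return f"{stem}-lar"
        if ch in FRONT_VOWELS:
            return f"{stem}-ler"
    raise ValueError(f"no vowel in {stem!r}")


# Italian: the ending is not predictable from the stem's sound shape, so each
# stem carries an inflectional-class label (hypothetical names) in the lexicon.
ITALIAN_IC = {"can": "IC_e", "lup": "IC_o"}
ITALIAN_ENDINGS = {
    "IC_o": {"sg": "o", "pl": "i"},
    "IC_e": {"sg": "e", "pl": "i"},
}


def italian_form(stem: str, number: str) -> str:
    """Look up the ending via the stem's lexically specified inflectional class."""
    return f"{stem}-{ITALIAN_ENDINGS[ITALIAN_IC[stem]][number]}"


print(turkish_plural("ev"), turkish_plural("yol"))  # ev-ler yol-lar
print(italian_form("can", "sg"), italian_form("lup", "sg"))  # can-e lup-o
```

The Turkish function needs no lexical information beyond the stem's phonology, whereas removing the ITALIAN_IC table would make the Italian forms underivable; this is the sense in which only the latter has inflectional classes.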
More generally, Atlantic languages offer interesting evidence for the rise of
inflectional classes within an agglutinating system.¹² This applies also to the Ñuun
¹¹ Here, an editorial comment asked: ‘why not analyse -o/-e as part of the stem truncated before
plural -i?’. This corresponds to Scalise’s (1983: 293–4) vowel deletion rule, and the choice between
the two analyses is indeed a textbook topic in Italian morphology: the reader is referred to Thornton (2005:
160), who shows that this readjustment rule becomes superfluous under a word and paradigm
approach to morphology.
¹² An anonymous reviewer comments that, with the present discussion, ‘The author seems to
suggest that inflectional classes of nouns are an innovation in the history of individual languages’.
Actually, one must recognize ICs for previous stages of Atlantic languages: as observed in n. 16, the
same mechanisms of consonant gradation responsible for IC-contrasts in Fula are currently assumed
While (14a) mirrors the inherited Niger-Congo noun inflection, the rest is the
product of a series of innovations (e.g., the prefixes occurring in type (14c) nouns
‘do not occur as singular prefixes in the paired prefixed groups or if so then
only very rarely’; Cobbinah 2010: 186), which makes it necessary to recognize
different inflectional classes, as schematized in (14), even if the combination
of morphs in noun word forms remained largely agglutinative, rather than fusional,
in nature.
This evidence could be multiplied, another case in point being, for example,
Diallo’s (2010; 2014: 151–81) study of the adaptation of borrowed Mande nouns
leading to the creation of inflectional classes (not present in the native lexicon) in
Fuuta-Jaloo Pular, the Fula variety spoken in the Fuuta-Jaloo area in Guinea. This
shows that a trend towards the creation of allomorphy in nominal paradigms (and of
new inflectional class distinctions) is observed throughout the area.
for earlier stages of Wolof as well. However, this is orthogonal to the fact that new morphological
irregularities, defining (new types of) ICs, can be shown to have arisen, as is the case with the stem
alternations in (9), which define (residual) ICs (a) of a kind different from that reconstructed for earlier
stages of Atlantic, and (b) not usually recognized in the literature before Babou & Loporcaro
(2016).
¹³ Pozdniakov (2015: 79–82) reviews pluralizing suffixes (-Vn/ŋ) from different Atlantic languages
suggesting that they may be etymologically related to the plural class marker for humans reflected in
Wolof as ñ-.
As seen in section 6.5.3, thus, Wolof is not the only Atlantic language to have
developed morphological irregularities of the kind found in fusional languages.
Since such irregularities add to morphological complexity (section 6.5.1), one
must recognize that even the morphological system of Wolof, much less rich
than those seen in section 6.5.3, has developed new forms of complexity.
Recapitulating so far, the marking of NC contrasts in the North Atlantic
languages considered above can be summarized as follows (after Mc Laughlin
1997: 7, with one small modification):¹⁴
(15) Class markers in some North Atlantic languages (Mc Laughlin 1997: 7, revised)
a. Seereer-Siin √ √ √
b. Fula √ √ √
c. Wolof (traces) (traces) √
As seen for Fula in (11)–(12), in this language consonant mutations and suffixation
(which replaced prefixation in the affix renewal process: see n. 10) are involved in
lexically conditioned allomorphy defining inflectional classes. Some remnants of this
situation persist in Wolof ((15c)), though Wolof has neither class prefixes nor class-marked
clitics nor suffixes but, in its present state, marks NC only on determiners.
These remnants are the singular/plural alternations in (9), which concerned many
more lexemes in the nineteenth century, as shown in (16), which lists lexemes that have
now lost the consonant alternation but still had it according to nineteenth-century sources:
(16) Becher (2001: 50f): nouns with allomorphy in Boilat (1858) and Kobès (1875)
Singular / plural (nineteenth century)   Gloss   Today (Fal et al. 1990)
banta bi / wanta yi   ‘stock’   bant bi/yi ‘bit of wood’
badoolo mi / wadoolo yi   ‘peasant’   baadolo bi/yi
bakan bi / wakan yi   ‘nose’   bakkan bi/yi
bopa bi / gopa yi   ‘head’   bopp bi/yi
garab gi / yarab yi   ‘tree’   garab gi/yi
Further language-internal evidence comes from the indefinite article, which is the
only noun determiner to occur categorically in pre-nominal position (while
¹⁴ The modification consists in indicating the occurrence of traces of earlier prefixes for Wolof: see
(9) as well as the diachronic data in (16)–(17). Moreover, I remain non-committal about Mc Laughlin’s
distinction between ‘clitic determiners’ and ‘independent determiners’, a distinction one anonymous
reviewer finds fault with: ‘I have serious doubts about the validity of the distinction between “clitic
determiners” and “independent determiners”.’
definite article and demonstratives normally follow the noun, though demonstratives
can also be preposed), and the only one to display the class marker after, rather than
before, its class-invariable part. According to Doneux (1975: 49), this doubly excep-
tional distribution arose via reanalysis of earlier prefixes, reconstructed as seen in (17a):
(17) Doneux (1975: 49): Wolof prenominal article < former class prefix on
noun
a. a-b sëriñ ‘a healer’ < *a-b-sëriñ
b. sëriñ b-i ‘the healer’ < *bi-sëriñ b-i (bixirim, AD 1594; Ferronha
1994: 24f)
Converging documentary evidence for earlier prefixes, seen in (17b), comes from
a Portuguese voyager who, writing in 1594, calls bixirim what is today sëriñ b-i
‘the healer’, which is evidence, as Doneux comments, ‘qu’un préfixe (probablement
figé) était encore utilisé à cette époque’ [‘that a prefix (probably fossilized) was still
in use at that time’] (Doneux 1975: 45). While in this lexeme, as in most Wolof nouns,
the prefix has simply been dropped, one may argue that some of today’s irregular
singular/plural alternations in Wolof (seen above in (9)) show the traces of former
class prefixes, which have become fused with the stem, as also observed in other
Atlantic languages.¹⁵ Other irregular alternations derive instead from consonant mutations, which are
regularly involved in NC inflection in other Atlantic languages (cf. (15a–b) and
the examples above in (10)–(12)). In Wolof, consonant mutation is still regular in
some derivational processes, such as diminutive or deverbal noun formation:
¹⁵ This has been noted by many scholars: cf. Pozdniakov & Robert (2015: 551) for a recent
recapitulation. As for other Atlantic languages, see, for example, Cobbinah (2010: 189) on the so-called
‘literal alliterative concord’ in Baïnunk: ‘the disputed elements [ . . . ] are archaic noun class morphemes
in different stages of fusion with the stem’.
At the end of section 6.4, I mentioned changes which led to the rise of morphological
irregularity in the paradigm of agreement targets as well: in the indefinite
article, some defective and otherwise irregular paradigms have been created in
Wolof, which are not inherited from Proto-Atlantic. This amounts to an increase
in formulaic complexity (descriptive and generative), in Rescher’s (1998: 9) terms.
To see this, however, we have to abandon morphology proper and consider
morphosyntax, since agreement is a crucial criterion to establish the irregular
paradigms I will be concerned with. The agreement facts at stake crucially involve
the recognition (as in Babou & Loporcaro 2016) of two additional NCs in the plural
(j- and s-, the last two markers in (20b)) with respect to the current view ((1), repeated here in (20a)):
(20) a. Wolof: eight singular and two plural classes (traditional analysis):
NC marker b- g- k- j- l- m- s- w- y- ñ-
b. Wolof: eight singular and four plural classes (Babou & Loporcaro 2016):
NC marker b- g- k- j- l- m- s- w- y- ñ- j- s-
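The difference between (20a) and (20b) lies in what is taken to identify a class. The toy Python sketch below restates only the marker inventory of (20b), pairing each marker with the number of verb agreement it triggers (singular for the eight singular classes, plural for the others). Counting distinct markers yields the ten classes of the traditional analysis, while counting marker-agreement pairs yields twelve. The data structure and function names are hypothetical.

```python
# Each entry pairs a class marker with the number of verb agreement it triggers.
# Inventory as in (20b); j- and s- occur twice (singular and plural classes).
CLASSES = [
    ("b-", "sg"), ("g-", "sg"), ("k-", "sg"), ("j-", "sg"),
    ("l-", "sg"), ("m-", "sg"), ("s-", "sg"), ("w-", "sg"),
    ("y-", "pl"), ("ñ-", "pl"), ("j-", "pl"), ("s-", "pl"),
]


def count_by_marker(classes: list[tuple[str, str]]) -> int:
    """Identify classes by marker form alone (traditional analysis)."""
    return len({marker for marker, _ in classes})


def count_by_marker_and_agreement(classes: list[tuple[str, str]]) -> int:
    """Identify classes by marker plus the verb agreement they trigger."""
    return len(set(classes))


print(count_by_marker(CLASSES), count_by_marker_and_agreement(CLASSES))  # 10 12
```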
¹⁶ See Pozdniakov (1993: 85) and Pozdniakov & Robert (2015: 552f) for a reconstruction of the set of
initial consonant mutations—richer than the one still observed today in (19)—involved in NC-related
alternations in an earlier stage of Wolof.
singular classes combine with both the traditionally recognized plurals, thus
resulting in (21b) rather than (21a)), while (21c) schematizes Babou &
Loporcaro’s (2016) account:¹⁷
(21) [Diagram: singular/plural NC pairings.
(a) Expected pairings and (b) Observed pairings: singular k-, g-, j-, m-, s-, l-, b-, w-; plural ñ-, y-.
(c) Observed pairings: singular k-, g-, j-, l-, m-, s-, w-, b-; plural ñ-, y-, j-, s-.]
¹⁷ Singular/plural pairings of NCs define distinct genders: cf. Corbett’s (1991: 190f) analysis of
Wolof and Fula.
Note preliminarily that several of the pairings in (21), as well as several of the NCs
themselves, are established on the basis of small numbers of lexemes. This ‘inquorate’
character (in Corbett’s 1991: 170–5 terms) is however a normal situation in
Atlantic languages, as remarked by Ferry & Pozdniakov (2001: 166):
[It is wrong to think that each weakly represented NC pairing reflects the fixation or
the disappearance of prefixes that once existed. Atlantic languages are characterized
by a particular feature: in these languages, one often comes across a special class
featuring no more than two or three nouns, or even just one. [ . . . ] Each Atlantic
language displays at least one word that has a statistically rare, irregular agreement
pattern, which translates a selected and specific notion in that very culture.]
Thus, if a consistent syntactic behaviour, distinct from that of other NCs, can be
identified for a set of nouns, however small, this must count as evidence to
establish a separate NC. This is what Babou & Loporcaro (2016) did for two
additional NCs, the plural classes ji and si. These are homophonous with two
singular classes, but must be kept distinct from them because they differ in the
agreements they trigger. This is a principle of method that holds in general and is
standardly applied also in studies of the Atlantic languages. For example, consider
Arnott’s (1970: 72) account of the two homophonous ko classes of Gombe Fula
(classes 20 and 8), one singular, one plural, distinguished by agreement:
There are two ko classes (8 and 20), with agreement marked by -o, -ho, -ko, ko-,
ko elements, etc.; but they are distinguished (i) by the different category of initial
consonant in full nominals (F-category in class 20, N-category in class 8 [ . . . ]),
and (ii) by the different pattern of agreement with verbal radicals [ . . . ], class 20
being a singular class requiring F- or P-category initial in the verbal radical, while
class 8 is a plural class requiring N-category initial in the radical, e.g.:
Exactly the same happens in Wolof, where what are in fact two pairs of distinct
classes have previously been conflated, disregarding the evidence from verb
agreement. This is in fact the only morphosyntactic diagnostic, independent
On the other hand, other nouns that select si, viz. those in (23b), take plural verb
agreement (while, of course, when used in the singular the same nouns take
another class marker):
The same can be repeated for plural ji (jeeg/janq ji ‘the women/little girls’, (24b)),
which is distinct from singular ji, seen in (2d) and exemplified again in (24c):
The fact that these are plurals has been overlooked in the literature on Wolof until
now because plurals such as Séeréer si and jeeg ji have traditionally been called
‘collective’, in the wake of Sauvageot’s (1965: 73) influential statement:
[To the singular/plural number contrast, one has to add that of collective. The
peculiarities of the latter are: a) it does not possess a dedicated expression
distinguishing it from the singular, b) it has no corresponding plural.]
There are indeed other African languages, also within the Atlantic family, for
which it is justified to assume a separate value of the category ‘number’,
traditionally called ‘collective’ (cf., e.g., Sapir 1965: 61, 64, on Diola-Fogny), or
‘collective plural’:
In addition to the first plural, used with countable nouns, many nouns can
combine with a second plural, which is a collective plural for non-countable
quantities, or non-specified numbers of entities. (Cobbinah 2010: 184)
The author, describing Baïnounk Gubaher, refers to triplets such as the following:
¹⁸ To illustrate, Corbett (2000: 31) cites the paradigm bu-sumɔl ‘snake’ singular ≠ i-sumɔl ‘snakes’
plural ≠ ba-sumɔl ‘snakes’ greater plural.
The indefinite article, as shown above in (17a), is the only determiner in which the
class marker follows the class-invariable part, thus becoming the final consonant.
As exemplified in (26), and schematized in (27a), in the regular case there is
a correspondence between this final consonant and the initial one occurring as a
class marker in other determiners. In addition, however, as illustrated in (27b–c),
there are two irregular patterns:
Paradigm (27b) shows a deviation from the regular formation by which the
class-marking consonant yields to y- in the indefinite plural, while in (27c) the
indefinite article paradigm is defective, lacking the singular form. In the available
literature, the occurrence of ay instead of expected a-C₁ is usually recognized for
ñi plurals, seen in (3) above:
In addition to the pairings involving ñi plurals, however, the list in (27b) also
includes the two ‘new’ plural NCs in (20b). In fact, as illustrated in (29b)–(30b),
plural ji and si both select the default class marker -y in the indefinite article, on a
par with ñi, while indefinite plural *aj and *as do not occur:
In fact, it is not possible at all to form the indefinite article from this class.
In order to convey the same meaning, one has to have recourse to suppletion
and use instead the (regularly class-marked) form of the numeral C-enn ‘one’,
as shown in (32a). This defectiveness also concerns the li class, or the li/yi and
li/ñi pairings listed in (27c), as exemplified in (32b) by ndab and ndaw,
respectively:
The scheme in (33) recapitulates the different kinds of irregularity found in the
paradigm of the indefinite article ((33d)), compared with two regular function words,
highlighting (in boldface) the differences between singular and plural ji and si:¹⁹
To conclude, not only change in noun inflection but also change in agreement
target morphology has created new irregularities in Wolof, which add to com-
plexity in a way that had largely gone unnoticed under the traditional—but,
arguably, incorrect—view of Wolof NCs in (20a). This ‘local complexification’,
which yields a more realistic view of Wolof morphology and morphosyntax, can
be viewed as an ‘accident’ along a path in which the overall tendency is, for noun
morphology, from agglutinating towards isolating: not only are the inherited
prefixed NC markers long gone, but the inflectional irregularities (stem
alternations) seen in (9), which partly arose from them, are also on their way to
disappearing.²⁰ In other areas of inflectional morphology, while the verb maintains its
agglutinating structure, pronominal and adnominal agreement targets either
stay agglutinative (cf., e.g., (33b–c)) or develop paradigmatic irregularities, as
seen for the indefinite article in (27b–c), of the kind linguists usually associate
with inflecting-fusional type morphology.²¹ Unlike those in noun morphology,
which are in the process of vanishing, the irregularities in the indefinite
article are stable as long as the NC system is stable. This, however, is no longer
the case in contemporary urban varieties, which leads us to the last section.
¹⁹ Pozdniakov & Robert (2015: 565) provide a similar scheme, without the two plural classes ji and
si, and marking a blank for both neutralization (occurrence of ay for ñ- plurals as well as for y- plurals)
and defectiveness (non-existence of forms for singular j- and l-).
²⁰ In this transitional stage, however, as argued while concluding section 6.5.2, variation between
two cell-mates in the plural adds to overall paradigm complexity.
²¹ That verb and noun inflection can differ, in this respect, within one and the same language, ‘and
develop diachronically in typologically different directions’ (Dressler 2005: 7) has been shown by much
work on morphological typology (see, e.g., Haspelmath 2009: 25).
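The indefinite-article pattern described in (27) and (32) can be restated as a small lookup rule. The following is a minimal Python sketch under stated assumptions, not the authors' formalization: the function name `indefinite_form` and the encoding of class markers as bare consonants are hypothetical; the irregular plurals are taken to be ñ-, j-, and s- (27b); and the defective singulars j- and l- (27c, footnote 19) are given the suppletive numeral C-enn ‘one’ built on the class consonant.

```python
# Hypothetical sketch of the Wolof indefinite determiner pattern in (27).
# Class markers are encoded as bare consonants: "b", "g", "j", "k", "l",
# "m", "s", "w" (singular) and "y", "ñ", "j", "s" (plural).

# (27b): plural classes taking default -y (ay) instead of the expected a-C.
AY_PLURALS = {"ñ", "j", "s"}
# (27c): singular classes with no a-C form at all (footnote 19: j- and l-).
DEFECTIVE_SINGULARS = {"j", "l"}


def indefinite_form(consonant: str, plural: bool) -> str:
    """Return the form conveying indefiniteness for a noun class.

    Regular case (27a): the class consonant follows the invariant a-,
    e.g. b -> "ab", y -> "ay".
    """
    if plural and consonant in AY_PLURALS:
        return "ay"  # neutralization with the regular y- plural
    if not plural and consonant in DEFECTIVE_SINGULARS:
        return consonant + "enn"  # suppletion by the numeral C-enn 'one' (32)
    return "a" + consonant


if __name__ == "__main__":
    for cls, pl in [("b", False), ("s", False), ("s", True), ("j", False), ("ñ", True)]:
        print(cls, "PL" if pl else "SG", indefinite_form(cls, pl))
```

The sketch makes visible why singular and plural ji and si must be kept apart: the same consonant yields different outputs depending on the number value of the class.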
Even once the local increase in complexity in noun and determiner morphology
addressed above has been recognized, it remains true that, on the whole, Wolof
morphology is both less rich and less complex than that of the closely related
Atlantic languages mentioned above (and, hence, of the reconstructible common
ancestor, under either of the alternative classifications in section 6.2). This
impoverishment/simplification, resulting in a ‘restricted system’ (Pozdniakov
& Robert 2015), may be traced back to external factors. In fact, Wolof, a
vehicular non-native language for a substantial share of its users, is a typical
case of a language spoken in an ‘exoteric niche’ (in Lupyan & Dale’s 2010
terms) or in a ‘Type 2 community’, or ‘an extreme “generalized outsider
community” ’ (in Kusters’ 2008: 14 terms). The literature on linguistic complexity
has addressed the consequences for morphology that are often observed
when the percentage of non-native speakers becomes substantial, concluding
that languages spoken in such communities are expected to simplify their
morphology:
we may conjecture that when a language splits, and one variety becomes more
like a Type 1, and the other like a Type 2 community, we expect that the latter
becomes simpler in its inflectional morphology. (Kusters 2008: 15)
Thus, the Wolof-speaking community is not only a Type 2 community, with many
non-native speakers, but also one in which native speakers, in both traditional and
urban contexts, themselves tend to adopt, as prestigious, modes of linguistic
behaviour that favour simplification. This can plausibly be invoked as an
explanatory factor for the overall structural simplification of morphology and
morphosyntax that Wolof has undergone compared with its ancestor within the
Atlantic language family.
Acknowledgements
Thanks to the editors and two anonymous reviewers for comments and constructive
criticism on a previous draft, as well as to Cheikh Anta Babou for joint fieldwork on
Wolof. Usual disclaimers apply.
II
THE CROSSLINGUISTIC PERSPECTIVE
7
Canonical complexity
Johanna Nichols
7.1 Introduction
¹ Henceforth I use that term to refer to the theory and its body of exemplar studies, since it is used in
the foundational literature, but canonicality when I need to nominalize the adjective canonical (since
only canonicality is possible in my English).
Johanna Nichols, Canonical complexity. In: The Complexities of Morphology. Edited by: Peter Arkadiev and Francesco Gardani,
Oxford University Press (2020). © Johanna Nichols.
DOI: 10.1093/oso/9780198861287.003.0007
² Or perhaps it is logical. Canonicity theory is concerned with whether linguistic elements are
canonical or not, while the goal in this fragment of complexity theory is to describe types of complexity.
In that theory, presumably the ideal in a space of complexity is maximal complexity, so in that sense
‘CC’ is logical.
Big Data work. Independently of those considerations, typology needs more than
one kind of complexity measure.
To address these various needs and possibilities, this chapter proposes a
method for measuring CC (section 7.2) and presents the results of a survey showing
that CC yields revealing results that do not duplicate those of EC but complement
them, making for a stronger combined measure (section 7.3).
7.2 Method
7.2.1 Samples
³ The denser coverage of the northern hemisphere is intentional, as I planned to test some of the
geographical distributions hypothesized in section 7.3. The coverage of the southern hemisphere is
thinner than planned because the survey proved more labour-intensive than anticipated and could not
be fully completed as projected.
non-canonical patterns in inflectional paradigms for each category and each POS.
The set of categories is a sample chosen because they are generally well-understood
and well-described (including that grammars make it relatively straightforward to
determine whether the category is present or absent, and if present what its values
are). They are present in enough languages to make frequency comparisons
meaningful. This section describes first the inflectional categories surveyed, then the
variables. The survey data consist of: (1) a text report on each language that includes
any definitions of categories and variables required, discussion of any coding
decisions, and the sources used (these reports discuss but do not fully replicate the
information available in grammars, and sometimes include scans of published
paradigms); and (2) a database page for each language showing the number of
non-canonical patterns in each intersection of POS and category. Appendix 7.1 lists the
categories and variables, and Appendix 7.2 gives the sum of entries in each
intersection of categories and variables across the whole sample. Appendix 7.3
lists the sample languages. The entire database will be included in a future
release of the Autotyp database (Bickel et al. 2017 is the current release).
The inflectional categories surveyed are:
⁵ Numeral classifier systems often recruit regular nouns to the system, and in their capacity as
regular lexical nouns they are of course covered in this survey.
classifiers as a gender category and added entries for their number and
unpredictability. Thus the six classifiers of Mian are entered as a noun
category with six inherent values, all unpredictable, while the 150 or more
classifiers of Kwaza, like, for example, the ~50 of Mandarin, do not appear
in this database and do not contribute to the EC of noun inflection or to
non-canonicality in the form of unpredictability.
• Tense/aspect/mood (TAM). The survey seeks the most basic synthetic
present-like and aorist-like tense categories. (In terms of aspect these tend
to be imperfective and perfective, respectively.) If one or both are absent, as,
for example, in Mawng (Iwaidjan, northern Australia), which has only a
future/non-future tense opposition, the closest basic tense opposition is used
(future and non-future in Mawng). If the language has no inflectional tense
(as Mandarin does not), basic imperfective and perfective are used if the
language has inflectional aspect; otherwise there is no entry for the TAM
category.
• General. Some of the variables are inherently difficult to ascribe to some
particular category. Examples are the numbers of stems per lexeme and stem
classes per language. They are entered as general rather than as pertaining to
paradigms of particular categories (usually with a comment in the data
report). Again, for the present survey the exact placement of an entry is
less important than ensuring that it is included somewhere and contributes
to the total.
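The TAM selection rule in the list above is a short decision procedure. The sketch below restates it in Python as an illustration only: the function `survey_tam_pair`, the value labels "present", "aorist", "imperfective", and "perfective", and the `closest_tense_opposition` argument (standing for the analyst's judgement, as with Mawng future/non-future) are all hypothetical encodings, not part of the survey's database.

```python
from typing import Optional, Set, Tuple

Pair = Tuple[str, str]


def survey_tam_pair(
    tense_values: Set[str],
    aspect_values: Set[str],
    closest_tense_opposition: Optional[Pair] = None,
) -> Optional[Pair]:
    """Pick the TAM values surveyed for one language (hypothetical encoding).

    `tense_values` and `aspect_values` hold inflectional values only;
    "present" and "aorist" stand in for present-like and aorist-like tenses.
    `closest_tense_opposition` is supplied by the analyst for languages that
    have inflectional tense but lack one or both of those.
    """
    if tense_values:
        if {"present", "aorist"} <= tense_values:
            return ("present", "aorist")
        return closest_tense_opposition
    if {"imperfective", "perfective"} <= aspect_values:
        return ("imperfective", "perfective")
    return None  # no entry for the TAM category
```

For a Mawng-like input, `survey_tam_pair({"future", "non-future"}, set(), ("future", "non-future"))` returns the future/non-future pair.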
For each language the database records for each category whether it is present
or absent (a yes/no, or 1/0, classification).
The variables surveyed are the following.⁶ For all of them the number, or the
presence vs. absence, of non-canonical patterns was entered for each of the survey
categories just listed. For what was counted as non-canonical see below. For every
variable and every category and value, irregular words, lexically specifiable excep-
tions, and small closed classes are disregarded. Sizable minority classes, and classes
that are open or specifiable as a class, are counted. For example, if possessive
inflection applies only to kin terms, or even only to consanguineal kin terms, this
is counted as a class.
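The counting rule just stated (ignore irregular words, lexical exceptions, and small closed sets; count sizable minority classes and classes that are open or specifiable as a class) can be sketched as a filter. This is a hypothetical illustration: the class names, the `LexicalClass` record, and especially the cut-off `SIZABLE_MINIMUM = 20` are assumptions, since the survey's own thresholds are those given in its (2).

```python
from dataclasses import dataclass
from typing import Iterable

# Hypothetical cut-off for a "sizable" minority class; NOT the survey's value.
SIZABLE_MINIMUM = 20


@dataclass
class LexicalClass:
    label: str
    size: int                  # number of member lexemes
    is_open: bool = False      # accepts new members
    specifiable: bool = False  # statable as a class, e.g. "kin terms"


def counts_as_class(c: LexicalClass) -> bool:
    """Irregular words, lexical exceptions and small closed sets are ignored."""
    return c.is_open or c.specifiable or c.size >= SIZABLE_MINIMUM


def count_classes(classes: Iterable[LexicalClass]) -> int:
    return sum(1 for c in classes if counts_as_class(c))


# Invented example: kin terms count even though few, as in the text.
example = [
    LexicalClass("possessed kin terms", size=12, specifiable=True),
    LexicalClass("irregular nouns", size=3),
    LexicalClass("regular nouns", size=5000, is_open=True),
]
```

Here `count_classes(example)` is 2: the kin-term class is counted because it is specifiable as a class, while the three irregular nouns are not.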
[1] Inflectional classes. In the terms of Bickel & Nichols (2007) these are instances
of formative flexivity: classes distinguished by different sets of inflectional mor-
phemes (e.g., suffixes). Not all grammars explicitly account for the number of
declension or conjugation classes, and those that do often mix together, or at least
fail to distinguish, formative flexivity and stem flexivity (variable [5] below), so
⁷ I know of only one language where human nouns have arbitrary gender: Uduk (Koman, Africa;
Killian 2015), where the cutoff point for gender predictability is set even higher on the animacy
hierarchy: it is predictable for first and second person pronouns but not for human nouns.
⁸ Cole (1967) describes most of the non-human gender classes of the Bantu language Luganda
(which number fourteen singular-plural concord pairs by his count) as ‘miscellaneous’ (these number
ten), a large number for one cell of this survey. Most Bantu grammars describe the classes as having a
semantic basis with some unpredictable members, but for languages with only one description,
that description’s decision on predictability has to be taken at face value.
As a further note, most languages with a sex-based gender opposition for human
nouns also apply it to a few non-human animate nouns, typically large and
important domesticates. This kind of individual lexical exception falls under
value 0 of the applicability scale.
Table 7.1 shows a few languages and how they are treated in this classification.
Languages with a zero score have no unpredictable gender classes, either because
their gender is entirely predictable (Avar) or because they have no gender, either
of nouns (English) or of pronouns (Finnish).
[5] Number of stems per lexeme. This is what Bickel & Nichols identify as stem
flexivity: declension or conjugation classes based on changes in the stem, such as
ablaut, extensions, or allomorphy conditioned by the survey categories. For example,
in Nakh-Daghestanian languages, many or most nouns have distinct nominative and
oblique stems in the singular, with the oblique stem formed by adding an extension
suffix (Kibrik 1991, 2003). This is coded as two stems per lexeme. In English and
other Germanic languages, the sizable but minority class of strong verbs has different
stems, marked by ablaut, in the two survey tense categories (English sits, sat); this is
also two stems per lexeme. A word or class is counted if it involves all, most, or a
sizable or open subset of the relevant words, following the thresholds in (2).
[6] Number of stem classes per language. The Nakh-Daghestanian languages
with extensions in oblique stems mostly have two stems per lexeme, but the
number of oblique extension suffixes ranges from one to over a dozen in different
languages. These extension classes, plus the (usually minority) class of nouns with a
single stem, make up the total number of stem classes per language. Following the criteria in (2), the
number entered in the database is the number of such classes that are sizable,
productive, and/or open.
[7] Unpredictability of those stem classes (per language), by the same criteria
as for [2] and [4] above.
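Variables [5] and [6] can be illustrated with a toy lexicon of the Nakh-Daghestanian type, in which the oblique stem is the nominative stem plus an extension suffix. The Python sketch below is hypothetical throughout: the nouns are invented, and it ignores both the size thresholds in (2) and the unpredictability coded under [7].

```python
from typing import Dict, Optional, Set, Tuple

# Invented nouns: lexeme -> (nominative stem, oblique stem).
LEXICON: Dict[str, Tuple[str, str]] = {
    "noun1": ("kar", "karal"),
    "noun2": ("mik", "mikal"),
    "noun3": ("sun", "sunni"),
    "noun4": ("tab", "tab"),  # single-stem noun
}


def extension(nom: str, obl: str) -> Optional[str]:
    """Oblique extension suffix, or None for a single-stem noun."""
    if obl == nom:
        return None
    if not obl.startswith(nom):
        raise ValueError(f"{obl!r} is not {nom!r} plus a suffix")
    return obl[len(nom):]


def stems_per_lexeme(lexicon: Dict[str, Tuple[str, str]]) -> int:
    """Variable [5]: the maximum number of distinct stems per lexeme."""
    return max(len(set(stems)) for stems in lexicon.values())


def stem_classes(lexicon: Dict[str, Tuple[str, str]]) -> Set[Optional[str]]:
    """Variable [6]: each distinct extension is a class; None is the single-stem class."""
    return {extension(nom, obl) for nom, obl in lexicon.values()}
```

With this lexicon the sketch yields two stems per lexeme and three stem classes: the -al class, the -ni class, and the single-stem class.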
             IE   Ingush   Avar   Bantu   Nama   BGW   Uduk*   English   Finnish
Human:        0      0       0      0       0      0      2        0         0
Non-human:    2      2       0      1       2      1?     2        0         0
Cross:        2      1       0      0       2      0      2        0         0
Total:        4      2       0      1       4      1      6        0         0
Notes: Languages: IE: Generic conservative Indo-European (e.g., Latin, Russian). Three genders:
masculine (M), feminine (F), neuter (N). The neuter gender contains relatively few nouns, so most
non-human nouns are M or F, arbitrarily classified. Ingush: Nakh-Daghestanian (Caucasus). There is a
dedicated gender for human males, a gender containing human females and some inanimates (though
if the survey counted singular-plural gender pairings these would be different genders as plurals have
different genders for human females and non-humans), and two non-human genders with arbitrary
membership. Avar: Nakh-Daghestanian (Caucasus). There are three genders with total semantic
predictability: M (human males), F (human females), N (all else). Bantu: Subbranch of Benue-Congo
(Africa). Generic entry applicable to most Bantu languages including Luganda in this survey. There is a
dedicated human gender and a number of non-human genders (the number varies among languages)
which most descriptions present as having a semantic core or prototype plus a limited number of
arbitrary members. Usually there are also a few dedicated genders for such things as non-finites or
particular deverbal derived nouns. Nama (Khoekhoe): There are two genders, M and F, containing all
human males and all human females respectively, and other nouns are arbitrarily divided between
M and F. BGW (Bininj Gun-Wok; Gunwingguan, northern Australia): M and F genders contain all
human nouns plus some arbitrary members. The other genders also have a semantic core and some
arbitrary members. Uduk (Koman; Africa): Two genders; all nouns arbitrarily classified; first and
second person pronouns have predictable gender (all have gender 2). English: No noun gender. Finnish:
No gender of either nouns or pronouns.
* Not in sample. For Uduk, see footnote 7 above in text.
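Assuming, as most columns of Table 7.1 suggest, that the Total row is the sum of the Human, Non-human, and Cross rows, the gender contribution for a language reduces to simple addition. The sketch below copies the scores for four columns of the table; the dictionary and the function name `gender_total` are illustrative only.

```python
# (human, non-human, cross) scores copied from four columns of Table 7.1.
GENDER_SCORES = {
    "IE": (0, 2, 2),
    "Avar": (0, 0, 0),
    "Nama": (0, 2, 2),
    "Uduk": (2, 2, 2),
}


def gender_total(scores):
    """Total unpredictability points for gender, assumed to be a plain sum."""
    human, non_human, cross = scores
    return human + non_human + cross


for lang, scores in GENDER_SCORES.items():
    print(lang, gender_total(scores))
```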
[8] Arguments indexed. The number of core arguments indexed on the verb,
counted for the verb type with the most core arguments. The maximum number
of core arguments possible is three (A, G, and T), but not all languages have
ditransitives, and for those that do not the maximum is two. Arguments indexed
are counted only for simple clauses without valence-related derivations such as
causatives or applicatives.
[9] Co-exponence, that is, portmanteau, cumulative, or otherwise opaquely
fused marking of categories. Examples are the gender-number-case suffixes of
nouns and adjectives in conservative Indo-European languages. Co-exponence
violates the one-form-one-function tenet of canonicality, as one form has three
functions (marking gender, number, and case). A language is coded as having co-
exponence if all, most, or a sizable minority of its words in the relevant categories
(e.g., nouns and their case paradigms) have co-exponent markers; it is so coded for
all of the categories involved (e.g., for Indo-European, gender, number, and case).
[10] Syncretisms: identical formatives in two or more categories that are non-
identical elsewhere in the language. Consider the German articles in (3):
There is one allomorphy here in noun case inflection (accusative -a vs. -Ø), and
also two patterns of case syncretism.¹⁰
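A first step in coding variable [10] is finding forms that fill more than one paradigm cell. The sketch below does this for the standard German definite article, supplied here as an assumption for illustration and not necessarily laid out as in (3). It only locates identical forms; it does not check the "non-identical elsewhere in the language" condition, and it does not reproduce the survey's count of syncretism patterns.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

COLUMNS = ["M", "F", "N", "PL"]
# Standard German definite article by case (rows) and gender/number (columns).
DEFINITE_ARTICLE = {
    "NOM": ["der", "die", "das", "die"],
    "ACC": ["den", "die", "das", "die"],
    "DAT": ["dem", "der", "dem", "den"],
    "GEN": ["des", "der", "des", "der"],
}


def syncretic_forms(
    paradigm: Dict[str, List[str]], columns: Sequence[str]
) -> Dict[str, List[Tuple[str, str]]]:
    """Map each form that fills more than one cell to the cells it fills."""
    cells = defaultdict(list)
    for case, forms in paradigm.items():
        for column, form in zip(columns, forms):
            cells[form].append((case, column))
    return {form: where for form, where in cells.items() if len(where) > 1}
```

In this paradigm every one of the six article forms is syncretic; der, for example, fills four cells (nominative masculine, dative and genitive feminine, genitive plural).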
⁹ Syncretism is clearly non-canonical (Corbett 2013a, 2007, and other works), as it makes for non-
biuniqueness, but reviewers and audience members often object that syncretism does not increase the
amount of information required to describe a language. This shows that canonical and Kolmogorov
complexity are not identical; it is the only respect I am aware of in which they are different. I believe the
difference arises because Kolmogorov complexity is concerned only with the information required to
describe the text as a string, not the full text including its message. For the message, even at the
minimal level of determining which case is intended as in (3), resolving syncretism requires bringing in
additional information.
¹⁰ There are debates in the Slavistic literature as to whether animacy is an additional gender
category, or for that matter a subgender or supergender. It is also sometimes called a case split or a
gender split, but I have not tried to distinguish allomorphy from splitting.
I did not encounter examples where it was difficult to decide whether some-
thing was allomorphy (within one category) or a syncretism (in paradigms that do
not have that allomorphy), but there may be such cases. If so, the important thing
is to enter it somewhere, since this survey compares total numbers of non-canonical
points.
[12] Position discrepancies. In some languages the forms of a single category are
distributed between two different positions, e.g. Pazar Laz (Kartvelian, Turkey;
Öztürk & Pöchtrager 2011: 485) subject person agreement in verbs (present tense):
First person is a prefix, second person zero (presented as a prefix because the
object prefixes that compete hierarchically for the same slot have an overt second-person
object form), and third person a suffix. Discrepant position is analogous to
different forms for one category (albeit the forms are slots rather than mor-
phemes), hence non-canonical.
[13] Category discrepancies. I used this variable to account for uncommon
patterns such as verb inflection in many Slavic languages, which have agreement
for person-number in the non-past tense and gender-number in the past tense.
The survey category is TAM rather than just one tense; if there were only one
survey tense there would be no discrepancy. In these languages verbs were coded
as having the categories of person-number, gender, and TAM, with a category
discrepancy for TAM.
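The Slavic case under [13] amounts to comparing the sets of agreement categories realized in each survey tense. The following hypothetical sketch encodes the pattern described above (person-number in the non-past, gender-number in the past); the function names are illustrative. A single surveyed tense can never produce a discrepancy, matching the remark in the text.

```python
from typing import Dict, Set

# Agreement categories realized in each survey tense (the Slavic pattern in [13]).
SLAVIC_AGREEMENT: Dict[str, Set[str]] = {
    "non-past": {"person", "number"},
    "past": {"gender", "number"},
}


def has_category_discrepancy(agreement_by_tense: Dict[str, Set[str]]) -> bool:
    """True if the surveyed tenses do not all mark the same categories."""
    distinct = {frozenset(cats) for cats in agreement_by_tense.values()}
    return len(distinct) > 1


def categories_coded(agreement_by_tense: Dict[str, Set[str]]) -> Set[str]:
    """Agreement categories coded for the verb (TAM itself is coded separately)."""
    return set().union(*agreement_by_tense.values())
```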
[14] Wordhood discrepancies. These are discrepancies between such statuses as
independent word, clitic, affix, and non-linear marking such as ablaut, within a
single paradigm. For example, in Slovene, singular pronouns have both tonic and
clitic forms but plural ones have no clitic forms; in Bulgarian, Romanian, and
Ossetic, subject indexation is suffixal while object indexation uses clitics.
Languages like German or Mian (Ok, New Guinea) have noun gender marked by
articles; this is a wordhood violation for gender not as an inherent category but as an
agreement category (in languages without the wordhood violation it is usually
marked affixally, as with the noun class prefixes of nouns in Bantu languages).
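Variables [12] and [14] share a logic: a single paradigm whose markers do not agree in slot (position) or in status (wordhood). The sketch below is a hypothetical encoding, not the survey's coding scheme. It uses the Pazar Laz subject markers as described under [12], with the zero second person treated as a prefix and, by assumption, as an affix; the second marker set is invented purely to show a wordhood-only discrepancy.

```python
from typing import Iterable, List, NamedTuple, Set


class Marker(NamedTuple):
    value: str     # paradigm cell, e.g. "1" for first person subject
    position: str  # "prefix", "suffix", ...
    status: str    # "affix", "clitic", "word", "ablaut"


def discrepancies(markers: Iterable[Marker]) -> Set[str]:
    items: List[Marker] = list(markers)
    found = set()
    if len({m.position for m in items}) > 1:
        found.add("position")  # variable [12]
    if len({m.status for m in items}) > 1:
        found.add("wordhood")  # variable [14]
    return found


# Pazar Laz present-tense subject person as described under [12].
PAZAR_LAZ_SUBJECT = [
    Marker("1", "prefix", "affix"),
    Marker("2", "prefix", "affix"),  # zero, treated as a prefix
    Marker("3", "suffix", "affix"),
]

# Invented paradigm: same slot, different wordhood status.
INVENTED_INDEXATION = [
    Marker("subject", "suffix", "affix"),
    Marker("object", "suffix", "clitic"),
]
```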
[15] Partial marking: Only some of the otherwise eligible words inflect for the
category. An example is gender in Nakh-Daghestanian languages, which is gen-
erally marked by prefixation or initial consonant mutation of the verb, but not for
all verbs (the verb roots that do take it range in different languages from about
30% to the great majority of verbs). Another example is number: probably all
languages that have number inflection on nouns apply it only to some nouns.
Most common is drawing the line between count and mass nouns, with mass
nouns taking no number marking, but it is also fairly common to find the line
drawn between animate and inanimate or human and non-human nouns. I did
not code number as a partial category for any of these: for count vs. mass nouns it
is clearly due to semantics, and for the distinctions higher up the animacy hierarchy there is a case to be
made that those are semantically akin to the count/mass distinction. Some
languages have only a handful of nouns that make number distinctions, for
example Yurok (Algic, California), where only nine nouns, not representing a
coherent semantic group, form plurals (Robins 1958: 23); these are my only cases
of non-semantic, purely lexically specified, plural marking, but the nouns involved
are too few in number to count in this survey.
Partial marking is not common in the survey languages; Nakh-Daghestanian
gender contributes most of the examples.
[16] Multiple marking. Even rarer among the survey languages is marking of
an inflectional category more than once in a wordform. For example, Bardi
(Nyulnyulan, Australia) marks person-number on verbs with person enclitics,
and can add an optional additional person-number enclitic to mark plurality of
the object; this amounts to marking person twice. Yurok has A and O agreement
in person-number, and in some verb classes and categories one-argument verbs
fill both slots and thereby mark subject person-number twice (Robins 1958: 69ff).
[17] Other. This entry column handles the occasional uncertainty in classifi-
cation, but primarily contains calculations of the number of categories or dimen-
sions involved in co-exponential marking. Noun inflectional paradigms of Indo-
European languages preserving the original design of co-exponential gender-
number-case inflection abound in such non-canonical phenomena as syncretisms,
unpredictable declension classes, unpredictable gender classification, human
crossgender, and others. (For some illustrations, see Nichols 2019.) These give
them extremely high CC values if the number of syncretism patterns is counted,
and this skews comparisons. Therefore I coded not the number of such patterns
but the number of categories involved in them, treating those as dimensions of
freedom within which syncretism might appear. Similarly, for complex systems of
verb argument indexation where person-number and role (A, O) are marked by
co-exponential and often opaque markers, I counted the number of categories
involved (usually person-number and role, sometimes also gender).¹¹ This pro-
cedure levels out the possible complexity ranges of case-inflecting languages like
Indo-European and complex head-marking languages like many in the Americas.
But even with the obvious heavy contributors neutralized, section 7.3 shows that
the languages of western Eurasia still reach overall higher CC levels than even the
polysynthetic languages of the Americas. I judge this high level to be non-
artifactual as measured, implying less opacity for polysynthetic inflection than
¹¹ Recognizing role as involved in the categories is also a way of accounting for the mix of direct and
hierarchical marking of person in such systems.
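The procedure under [17], coding the number of categories involved in co-exponential marking rather than the number of syncretism patterns, can be sketched as taking the union of the categories each pattern involves. The patterns below are invented for illustration; the function name is hypothetical.

```python
from typing import Iterable, Set


def coexponence_dimensions(patterns: Iterable[Set[str]]) -> int:
    """Number of categories involved, however many patterns there are."""
    return len(set().union(*patterns))


# Invented syncretism patterns in a gender-number-case paradigm.
EXAMPLE_PATTERNS = [
    {"case", "number"},
    {"case", "gender"},
    {"gender", "number"},
    {"case", "gender", "number"},
]
```

Four patterns here contribute only three dimensions (case, gender, number), which is how the procedure keeps pattern-rich Indo-European paradigms from dominating the comparison.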
7.3 Results
¹² Differential complexity of noun vs. verb inflection and head vs. dependent marking, and
measuring the complexity of hierarchical patterns and polysynthetic structure, will be covered in a
separate paper. At that point the dimensions of co-exponential marking will be given a term and a
separate dedicated variable.
¹³ The variables used for EC are much as defined in Nichols (2009). Publication of an updated
version of that list is planned for the next year or two.
Nichols (2019) found that there was no correlation between EC and the
presence of gender in a language, concluding that gender and the well-known
complexity of many gender systems are not simply byproducts of overall
complex morphology. I replicated that study on the smaller, different language set
used here, using a correlation test, and obtained the same result:
there is no correlation between EC and presence of gender. For CC, there is a
slight positive correlation but it is far from significant (correlation coefficient
0.089, p = 0.233).¹⁴
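Correlating a complexity score with a yes/no variable such as presence of gender is a point-biserial correlation, that is, Pearson's r with the binary variable coded 0/1. The sketch below also applies the precaution in footnote 14 of subtracting the points contributed by gender before correlating. The four-language data are invented to show how skipping that step produces a spurious correlation; they are not survey values. For a real test, `scipy.stats.pearsonr` returns both r and a two-sided p-value.

```python
import math
from typing import List, Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation; with 0/1 xs this is the point-biserial r."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def cc_without_gender(cc_totals: Sequence[float], gender_points: Sequence[float]) -> List[float]:
    """Remove gender's own contribution to avoid circularity (cf. footnote 14)."""
    return [t - g for t, g in zip(cc_totals, gender_points)]


# Invented toy data for four languages (not survey values).
has_gender = [0, 0, 1, 1]
gender_points = [0, 0, 4, 4]
cc_totals = [10, 12, 16, 14]

r_raw = pearson_r(has_gender, cc_totals)  # inflated by gender's own points
r_adjusted = pearson_r(has_gender, cc_without_gender(cc_totals, gender_points))
```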
I calculated the mean CC for a number of areas and families, using the breakdowns
in Table 7.2, and asked whether the range of mean ± 1 standard deviation for each
group overlapped with those of the others. Ranges for local areas and families are
given in Table 7.2, and Figure 7.1 gives a graphic display.
Non-overlap of the ranges means significantly different populations.
Macrocontinents and continents overlap each other considerably, which means that
the largest groups all represent the same population. Of the local areas, the Circum-
Baltic has a very large standard deviation, that is, very little areality, and overlaps
Notes: Figure 7.1 shows the mean CC ± 1 standard deviation for all groups. Northern continents
(Eurasia, North America), the Caucasus, and the Uralic and Nakh-Daghestanian families are well-
sampled; other areas and families are compiled opportunistically from languages in the sample and are
less well covered.
¹⁴ For these calculations, to avoid circularity the points contributed by gender were subtracted from
the total complexity. (If that is not done, CC yields a highly significant but spurious correlation. EC
does not, because the contribution of gender to its total is much less than for CC.) For CC I use the two-
tailed value since I had no advance expectation about whether or how CC might correlate with gender.
[Figure 7.1: mean CC (y-axis, 0–50) by group. Areas shown: Balkan, Caucasus, Circum-Baltic, N. Inner Asia, N. Pacific Rim. Families shown: Balto-Slavic, Uralic, Nakh-Daghestanian, Tungusic, Uto-Aztecan.]
Figure 7.1. Mean CC ± 1 standard deviation for three areal breakdowns and selected
families
Notes: Groups are defined in Table 7.2. The mean and range for the entire sample are very similar to
those for Africa.
most others. The Caucasus has a relatively large standard deviation (unsurprisingly,
as its languages range from the fairly simple Lezgi to the very complex Ingush and
Khinalug), and its status as an area is debated (con: Tuite 1999, pro: Chirikba 2008;
I side with Tuite). The other three are well-known areas and have small standard
deviations and little or no overlap. Mean complexity levels differ considerably
among the areas, suggesting that regression to some neutral complexity level is
not a consequence of areality.
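The overlap criterion used above compares, for each pair of groups, the intervals mean ± 1 standard deviation. A minimal sketch follows. The values are invented, and the choice of the sample standard deviation (`statistics.stdev`) is an assumption, since the chapter does not say whether sample or population standard deviation was used.

```python
import statistics
from typing import Sequence, Tuple

Range = Tuple[float, float]


def sd_range(values: Sequence[float]) -> Range:
    """Interval mean - 1 SD .. mean + 1 SD (sample SD assumed)."""
    m = statistics.mean(values)
    s = statistics.stdev(values)
    return (m - s, m + s)


def ranges_overlap(a: Range, b: Range) -> bool:
    """True if two closed intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


# Invented CC scores for three hypothetical groups.
group_a = [10, 12, 14]
group_b = [20, 22, 24]
group_c = [13, 15, 17]
```

Here group A's range (10 to 14) does not overlap group B's (20 to 24) but does overlap group C's (13 to 17).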
The five families show relatively little overlap. Uralic, one of the older and more
widely distributed families and the most thoroughly surveyed here, has a large
standard deviation. The others have clearer family profiles.
Overall, then, continents and macrocontinents are not greatly different from
one another or from world totals while local areas and families are more discrete
from each other and for the most part internally fairly consistent in their com-
plexity levels. These figures are very preliminary; in particular, standard deviations
will probably shrink as the sample adds more members per area and family,
reducing overlaps.
¹⁵ Northern continents are Eurasia and North America. Southern ones are Africa, Australia-New
Guinea, and Central and South America.
¹⁶ In Figures 7.2(a)–(b), what appear to be dense vertical stacks of dots at some places are regions
that are densely sampled and/or have high linguistic diversity at a similar longitude: at left, at about 45°,
the Caucasus; at right, at about 230°, the Pacific coast of North America.
7.3.5 Sociolinguistics
Dahl (2004) and Trudgill (2011) show that what Trudgill calls sociolinguistic isolation
tends to allow languages to grow more complex over time, while sociolinguistically
expansive languages tend to simplify. Sociolinguistic isolation means that a language
absorbs little or no immigrant or language shifting population, so that nothing hinders
the further growth of complexity. An expansive language (this is not Trudgill’s term;
I take it from Janhunen 2008) absorbs appreciable numbers of adult L2 learners, and
their influence tends to simplify the language. This section describes the four language
groups in this chapter’s sample for which enough is known of the history of expansion
and non-expansion to permit predictions about relative complexity levels. The groups
and the complexity levels are listed in Table 7.3.
[Table 7.3 not reproduced; columns: CC, EC, CC + EC]
categories, etc. That is, downhill languages are expansive while uphill ones are
more sociolinguistically isolated. Thus we expect higher complexity in highland
languages. Nichols (2013) finds a correlation between EC and altitude in the
Daghestanian branch of the Nakh-Daghestanian family, and Nichols (2016)
finds a stronger correlation using non-transparency of just gender marking.
Nichols & Bentz (2018) show that a correlation of altitude with complexity is a
significant worldwide tendency on several different measures. The sample used
here is smaller but yields similar results. Both CC and EC correlate appreciably
with altitude, and combined CC+EC yields a notably strong correlation for the
small sample (Figure 7.3).
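The strength of such an association is usually summarized with a correlation coefficient. The sketch below computes Pearson's r between complexity scores and altitude. It assumes Pearson's r as the statistic (the text does not say which coefficient underlies Figure 7.3), the function name is hypothetical, and the values in the usage example are invented for illustration, not data from the sample.

```python
from math import sqrt

def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation between paired observations xs and ys."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired observations")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and variances share the same 1/n factor, so it cancels out.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return cov / sqrt(var_x * var_y)

# Hypothetical (CC, altitude in metres) pairs, for illustration only.
cc = [12.0, 18.0, 25.0, 31.0, 40.0]
altitude = [150.0, 600.0, 1200.0, 1500.0, 2100.0]
print(round(pearson_r(cc, altitude), 3))
```

On Python 3.10 or later, `statistics.correlation(cc, altitude)` should give the same value; the explicit version is shown only to make the computation visible.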
Figure 7.3. Complexity and altitude in Daghestan (eastern Caucasus) for the three complexity counts [scatterplots not reproduced; panels: (a) CC × altitude, (b) EC × altitude, (c) combined CC+EC × altitude in Daghestan; y-axis: altitude (metres), 0–3000]
• Spreads and isolation in the Caucasus: The Avar sphere. The eastern
Caucasus is compactly settled by the 40+ descendants of the Daghestanian
branch of Nakh-Daghestanian; Daghestanian may be roughly as old as Indo-European. The eastern Caucasus has been inhabited by settled
food producers for some 8,000 years. For at least the last few millennia the
highland populations have followed an uncommon kind of transhumance:
the entire working-age male population leaves the highlands for the winter
half of the year, taking livestock to markets and winter pastures and usually
finding seasonal work or maintaining businesses in lowland cities. There is
what seems to have been a long-standing centre of language spread in the
northeastern Caucasus and foothills, dominated from at least c.1000
by the Sarir Kingdom. The canyons of the Avar Koisu, Andi Koisu, and their
confluence in the Sulak were the avenues of trade and transhumant migra-
tion for most of Daghestan, and large markets formed in the Sulak lowlands.
The language spoken at and near the confluence—in recent historical times,
Avar—had major economic importance and was the language of work and
everyday life for half of the year for much of the male population of
Daghestan. This has led to contact effects among the languages of western
Daghestan, including a distinctive structural type marked among other
things by highly transparent gender systems, lack of verbal prefixation, and
of course many Avar loans.
Three episodes of uphill spreading can be traced in the Avar sphere (Nichols
in prep.): most recently Avar, earlier Andic, still earlier Tsezic. These three
make up one branch of Daghestanian, with this structure: [ Tsezic [ [Andic]
Avar ] ].¹⁷ Avars apparently became rulers in the Sarir Kingdom on its
conversion to Islam (at which point it became the Avar Khanate), and the
final battles for control between Andi and Avar took place only in the
seventeenth to eighteenth centuries (Aglarov 1988: 24). Avar has been an
expansive language, serving as lingua franca along the Andi Koisu for about
three centuries and along the Avar Koisu for probably somewhat longer; it
has spread well uphill and spilled over the crest to Georgia and Azerbaijan,
but patchily, with many non-Avar enclaves. Andic is probably about 1,500
years old, during most of which time it has been expansive and its daughters
have spread uphill; their settlement of the Andi Koisu is compact. Tsezic may
have separated some 3,000 years ago in an earlier uphill spread; Tsezic
languages are now at the uppermost highlands of both the Avar Koisu system
and the Andi Koisu. The Andic languages can be expected to show more
pronounced effects of spreading than Avar does. The western Tsezic languages
(Hinuq in this sample) have been under strong Andic and Avar influence;
¹⁷ Avar is one language, Andic a close-knit group of about ten, and Tsezic a more disparate group of five.
the eastern Tsezic languages (Hunzib in this sample) had less Andic contact
and held winter pastures not in the Avar-Andic lowlands but in Georgia to the
south. Consistent with this history, Hinuq has considerable Avar influence and
a very Andic-like grammar; Hunzib is markedly different, with southeastern
Daghestanian-like traits. At the edge of the Avar sphere, the isolate branch Lak
was not part of the Avar Khanate but used the same trade and transhumance
routes and shows Avar lexical and grammatical influence; isolated in a high-
land plateau, it has no known history of spreading. Beyond Lak are the Dargwa
languages, for which the Caspian coastal cities and trade routes were import-
ant, lessening Avar influence. To the south of Avar, languages of the Lezgian
branch are spoken along the southeast-flowing Samur and its tributaries, and
Tsakhur is at the high end of this line of communication and also at the high
end of the Koisu-Sulak line. The sample here includes representatives of most
of these stages. Thus we expect the descending order of spread effects along
and near the Andi Koisu and Avar Koisu systems shown in (6):
Table 7.3(a) shows the complexity values. CC conforms very well to this
scale; the only non-conformities are Hinuq, which unsurprisingly clusters with Andic,
and Ic’ari (Dargwa), which belongs to the Caspian coastal
sphere. EC is not very informative. The combined total is again in good
conformity (unsurprisingly, as it adds the fairly uniform EC scores to the CC
scores). For the Avar sphere and its periphery, then, CC reflects the socio-
linguistics of spreading and isolation better than EC does, and the combined
measure differs little from the CC scores.
• The Samur sphere. The delta of the Samur River, which drains the southeast
Caucasus and flows into the Caspian Sea, is a highly productive agricultural
region and long a nexus of trade and tax collection along the East Caspian
commercial route. It is the second most important avenue (after the Sulak)
for transhumant migration. The Lezgian branch, an old and diversified
branch of Nakh-Daghestanian, originated in this vicinity and spread both
uphill and into the Alazani valley in eastern Georgia and the lower Kura
valley in northern Azerbaijan. The sample contains four Lezgian languages,
two in the highlands and two in the lowlands. Lezgi is a large, expansive,
and inter-ethnic language centred on the lower Samur and nearby. Udi,
which descends from a probably expansive inscriptional language of the
early to mid first millennium (Caucasian Albanian [Gippert et al. 2009] is
its ancestor), has since shrunk to three isolated enclaves in Azerbaijan and
Georgia. Archi, noted for its morphological quirks (Corbett 2013b; Bond
et al. 2016), and the complex Tsakhur are isolated at high ends of river
canyons and have no known history of spread (apart from reaching the
highlands in the first place, however that happened). The complexity
figures in Table 7.3(b) reflect this history well. Tsakhur has much higher
CC than the rest; Archi has higher EC; lowland Lezgi, with its known
history of expansion, is low on both counts. Udi is mixed, high on CC
and lower on EC, suggesting that CC complexifies faster than EC after the
end of expansion.
For both Caucasus surveys, EC singles out as most complex a language
that is isolated at a high end with connections in more than one direction
(Ic’ari Dargwa, Archi); CC appears to reflect spreading more than isolation;
and the combined total gives a workable unified complexity scale that
correlates reasonably well with altitude and isolation.
• Slavic. Of the four Slavic languages in the sample, Russian has a long
history of expansion and absorption of Baltic and Finnic populations;
Sorbian reflects the leading edge of the Proto-Slavic expansion (c. sixth
to ninth centuries) but has been sociolinguistically isolated and receding
since then (largely absorbed by the German expansion); Slovene remains
close to the homeland and has no known history of expansion other
than uphill spread into the Austrian and Slovene Alps; Bulgarian belongs
to the Balkan Sprachbund and has undergone drastic structural changes
as a result, including loss of cases and thereby of the case-number-
gender co-exponence that makes Slavic noun declension so complex.
Complexity levels (Table 7.3(c)) are not greatly different for the lan-
guages preserving case inflection, while Balkanized Bulgarian is much
less complex.
• Uto-Aztecan. The Uto-Aztecan family is probably 5,000 years old and has
undergone a gradual spread from a probably northern Mexican homeland
followed by two large recent spreads: in the south, ancestral Nahuatl spread
with the Aztec expansion and empire beginning in the thirteenth century,
and in the north the Numic branch spread rapidly from the Sierra Nevada
foothills across the Great Basin beginning in approximately the same time
frame (Fowler 1972; Miller 1983; Madsen & Rhode 1994; Hill 2001, 2010;
Merrill 2012). The languages in the sample, south to north, are Pipil
(Nicaragua), Hopi (Arizona), Cupeño (southern California), and
Tümpisa Shoshone (east central California). Pipil is the southernmost
Though these samples are small, the results are generally consistent with
predictions of higher complexity for sociolinguistically isolated communities.
CC appears to be the better mirror of sociolinguistic history, and EC points in
the same direction but unevenly. Nonetheless, combined CC + EC tends to yield
very good correlations with present and prehistoric sociolinguistics: sociolinguis-
tically isolated languages are more complex and expansive languages less complex.
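Conformity to a predicted scale, as in the comparison of the Table 7.3 scores with the expected orders of spread effects, can be checked by counting pairs of languages whose scores contradict the predicted ranking. The sketch below is a minimal illustration under stated assumptions, not the author's procedure: the function name is hypothetical, the scale is assumed to be listed from most to least affected by spreading (so complexity should rise along it), and the labels A to D and all scores are invented. It also shows why adding a fairly uniform EC score leaves a CC-based ranking largely intact: adding the same constant to every score cannot reverse any pair.

```python
from itertools import combinations

def count_violations(scale: list[str], scores: dict[str, float]) -> int:
    """Count pairs on the predicted scale whose scores run the wrong way.

    `scale` lists languages from most to least affected by spreading, so
    complexity is predicted to increase along the list; a pair (a, b) with
    a earlier than b violates the prediction if a scores higher than b.
    """
    return sum(1 for a, b in combinations(scale, 2) if scores[a] > scores[b])

# Hypothetical scale and scores, for illustration only.
scale = ["A", "B", "C", "D"]
cc = {"A": 10.0, "B": 14.0, "C": 13.0, "D": 22.0}   # one inversion: B > C
ec = {lang: 9.0 for lang in scale}                  # perfectly uniform EC
combined = {lang: cc[lang] + ec[lang] for lang in scale}

print(count_violations(scale, cc), count_violations(scale, combined))  # prints: 1 1
```

The violation count is the number of discordant pairs on which Kendall's tau is built: with no tied scores, tau = 1 - 4V / (n(n - 1)) for V violations among n languages.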
genuinely an area (though a good answer will require surveying more than
complexity). Both EC and CC correlate positively with altitude, a geographical
factor that is not the cause of complexity levels but reflects the sociolinguistics of
isolation. The four families adequately represented in the sample all display
some positive correlation of complexity with sociolinguistic isolation, supporting
principles advanced in historical linguistics and sociolinguistics.
The definitions and coding used here were arrived at using the autotypologizing
principle (Bickel & Nichols 2002) of no fixed ontology and constant redefining
and recoding as the categories emerge from analysis of more and more languages.
Arriving at the current typology has been very labour-intensive, making this pilot
survey inordinately time-consuming. By now, though, the typology has stabilized
to the point that language surveys themselves are not unduly labour-intensive.
This line of inquiry can be improved by expanding the sample to give all
continents and areas comparably dense coverage to what has been done here for
northern Eurasia and North America, and covering thoroughly a larger number of
families and local areas. Methods of weighting the variables, and different calcu-
lations using different combinations of variables, need to be proposed and tested;
among other things this will give firm grounding to comparisons of the relative
complexity of Indo-European noun inflection and polysynthetic verb inflection.
For stem classes and inflectional classes, which as mentioned are rarely distin-
guished in grammars, we need improved and consistent descriptive coverage.
We also need consensus definitions and criteria for characterizing the numbers
of conforming and non-conforming members of classes that have some semantic
or other basis, such as gender classes; descriptions like ‘miscellaneous’ (compos-
ition of a class), ‘predictable’ (class membership), ‘arbitrary’, etc., are not consist-
ently used. The applicability thresholds used here (section 7.2.2)—few or no
members predictable, a sizable minority predictable, most or all predictable—
seem workable but require some quantification, however approximate, of the class
membership and openness.
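One way to make the three applicability thresholds explicit is to bin the proportion of class members whose membership is predictable. In the sketch below, the function name is hypothetical and the cut-off values (0.15 and 0.5) are purely illustrative assumptions; the chapter does not propose any numbers.

```python
def predictability_level(predictable: int, total: int,
                         minority_cutoff: float = 0.15,
                         majority_cutoff: float = 0.5) -> str:
    """Bin class-membership predictability into the three levels of section 7.2.2.

    The default cut-offs are illustrative assumptions, not values from the text.
    """
    if total <= 0 or not 0 <= predictable <= total:
        raise ValueError("need 0 <= predictable <= total and total > 0")
    share = predictable / total
    if share > majority_cutoff:
        return "most or all predictable"
    if share >= minority_cutoff:
        return "a sizable minority predictable"
    return "few or no members predictable"

print(predictability_level(3, 40))   # share 0.075
print(predictability_level(12, 40))  # share 0.3
print(predictability_level(35, 40))  # share 0.875
```

A share of exactly one half falls in the middle bin here; where such boundaries belong is precisely the quantification the text calls for.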
Inflectional paradigms are ideally suited to an approach like this one. The same
approach works well for some domains of derivational morphology but not all.
For phonology and syntax and probably some derivational morphology,
non-transparency will probably need to be described with a measure of the
distance between underlying and surface.
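The text leaves open how such a distance would be measured. Purely as an illustration of one familiar option, and not a method proposed in the chapter, the sketch below computes Levenshtein edit distance between an underlying and a surface string. The function name and the example forms are hypothetical, and a realistic measure would probably need to weight operations by phonological features rather than count them equally.

```python
def edit_distance(underlying: str, surface: str) -> int:
    """Levenshtein distance: minimum number of single-segment insertions,
    deletions, and substitutions turning `underlying` into `surface`."""
    prev = list(range(len(surface) + 1))  # distances from the empty prefix
    for i, u in enumerate(underlying, start=1):
        curr = [i]
        for j, s in enumerate(surface, start=1):
            curr.append(min(
                prev[j] + 1,             # delete u
                curr[j - 1] + 1,         # insert s
                prev[j - 1] + (u != s),  # substitute (free if identical)
            ))
        prev = curr
    return prev[-1]

# Hypothetical example: nasal place assimilation, /anpa/ -> [ampa].
print(edit_distance("anpa", "ampa"))
```

Because it counts segment edits, this distance captures how far a surface form strays from its underlying form, but not how many rules are involved in getting there.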
I see this kind of study as moving linguistics in the direction of the data
sciences. Variables that form geographically very large patterns, or that correlate
with such things as sociolinguistics, expansions, and other human population
developments, raise the prospect of multifactorial interdisciplinary collaboration.
A single variable surveyed in a 113-language sample is not what one would call Big
Data, but behind the convenient single number representing the CC value lie
seventeen variables surveyed across three POS and eight categories—a total of
over 200 datapoints per language or over 20,000 for the hundred-language sample.
Massive scope, making possible close comparison with the differently distributed
data of other fields, might require some 400–500 languages plus similarly massive
data for a few other composite variables or many simple ones. Creating such a
resource is an ambitious but entirely feasible project.
8
The complexity of grammatical gender
and language ecology
Francesca Di Garbo
8.1 Introduction
Francesca Di Garbo, The complexity of grammatical gender and language ecology In: The Complexities of Morphology.
Edited by: Peter Arkadiev and Francesco Gardani, Oxford University Press (2020). © Francesca Di Garbo.
DOI: 10.1093/oso/9780198861287.003.0008
(on the role of language contact in the loss of grammatical gender, see the recent
study by Igartua 2019; for a broader discussion of loss of morphology and
imperfect language learning, see the contributions by McWhorter, Chapter 10,
and Berdicevskis & Semenuks, Chapter 11, both in this volume). It has also been
observed that gender systems tend to cluster geographically and to be best
preserved in languages surrounded by other languages with gender (Nichols
1992, 2003). Thus languages that undergo complete gender loss are expected to
neighbour one another or to have languages without gender as their closest
neighbours (Nichols 2003: 299–304).
While instances of gender reduction and loss under contact situations are
relatively well documented in the literature, the role of language contact in the
rise of gender systems has, so far, been poorly explored, and scholars generally
agree that gender systems very seldom arise within language families that
normally lack gender (Nichols 2003: 308). This is directly connected with the fact
that full-fledged gender marking systems are commonly associated with rather
pervasive patterns of agreement, which are notoriously unlikely to be borrowed
(for a similar argument, see Igartua 2019: 209). However, recent research (Stolz
2012, 2015; Di Garbo & Miestamo 2019) shows that elementary patterns of gender
agreement may emerge as a result of borrowing of noun phrases from contact
languages with gender, and that, albeit rare, these types of systems are spread
across unrelated languages and in different areas of the world.
Existing research on the stability and evolution of gender systems under contact
situations focuses either on the decline or on the rise of gender systems, and the
two processes are rarely discussed together. Here I argue that, in order to fully
understand to what extent morphological complexity in the domain of grammatical
gender ties up with factors pertaining to the social history of a speech
community, a comprehensive survey of the evolutionary dynamics of gender
systems is needed, one that focuses not only on loss and emergence, but also on
reduction and expansion. In addition, given that, by definition, gender systems are
bound to the existence of productive agreement patterns (Corbett 1991), I contend
that complexification and simplification in the morphological encoding of gender
distinctions must be studied primarily through the analysis of agreement
patterns.¹ Within contact linguistics, it is generally assumed that contact-induced loss
or emergence of agreement presupposes long-term contact, heavy borrowing and/
or extensive bilingualism between speech communities (Thomason 2001: 71).
However, to date, and to the best of my knowledge, there have been no studies
that systematically tackle the issue of which factors may account for the occur-
rence of these opposite patterns of change, agreement loss and emergence, under
¹ Focusing on patterns of gender agreement does not, of course, mean underestimating the
importance that nominal gender marking has in languages that display it (for a more thorough
discussion, see section 8.2).
seemingly similar sociohistorical scenarios. The present study attempts to fill this
gap by investigating loss of gender agreement in language families characterized
by the presence of this feature and, conversely, the emergence of gender agreement
in languages with no inherited gender systems. Besides loss and emergence,
I also study the reduction and expansion of gender agreement patterns within
gendered language families. With respect to sociohistorical variables, the study
focuses especially on language contact dynamics, with particular attention to
asymmetries between the populations in contact, in terms of both demographic
structure (population size) and prestige.
The chapter is structured as follows. In section 8.2, I discuss in what respects
gender systems, as a grammatical and functional domain, can be relevant to the
study of morphological complexity. The sampling methodology and data collec-
tion procedure are outlined in section 8.3. In section 8.4, I provide an overview of
the patterns of language change attested in the data set, and illustrate their
geographic distribution in section 8.5. Section 8.6 discusses the sociohistorical
factors that are associated with the patterns of change attested in the languages of
the sample. A summary of the results and some concluding remarks are given in
section 8.7.
• The number of gender distinctions, under the assumption that the higher the
number of distinctions, the more complex the gender system.
• The number and nature of assignment rules, under the assumptions that: (a)
a gender system where gender assignment is both semantic and formal is
more complex than a system where gender assignment is only semantic or
only formal, and (b) a gender system with flexible assignment is more
complex than a system with rigid assignment.
• The pervasiveness of gender marking, under the assumption that the higher
the number of word classes and syntactic domains that are subject to gender
marking, the more complex the gender system.
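The criteria listed above can be read as dimensions of a partial order: one system counts as more complex than another only if it is at least as complex on every dimension and strictly more complex on at least one. The sketch below encodes that reading. The class, field, and function names are hypothetical, and the sketch deliberately does not sum the criteria into a single score, since the text assigns them no weights.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class GenderSystem:
    distinctions: int        # number of gender values
    semantic_rules: bool     # assignment uses semantic rules
    formal_rules: bool       # assignment uses formal rules
    flexible: bool           # flexible (vs. rigid) assignment
    marked_domains: int      # word classes/syntactic domains marking gender

    def profile(self) -> tuple[int, int, int, int]:
        # Mixed semantic+formal assignment (2 rule types) outranks a single type;
        # flexible assignment (1) outranks rigid assignment (0).
        rule_types = int(self.semantic_rules) + int(self.formal_rules)
        return (self.distinctions, rule_types, int(self.flexible), self.marked_domains)

def compare(a: GenderSystem, b: GenderSystem) -> str:
    """Return '>', '<', '=' or 'incomparable' under dimension-wise dominance."""
    pa, pb = a.profile(), b.profile()
    a_ge = all(x >= y for x, y in zip(pa, pb))
    b_ge = all(y >= x for x, y in zip(pa, pb))
    if a_ge and b_ge:
        return "="
    if a_ge:
        return ">"
    if b_ge:
        return "<"
    return "incomparable"

# Hypothetical systems, for illustration only.
rich = GenderSystem(3, True, True, True, 4)
minimal = GenderSystem(2, True, False, False, 1)
print(compare(rich, minimal))  # prints: >
```

Pairs that come out "incomparable" are exactly the cases where any single complexity score would depend on how the criteria are weighted.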
² In this chapter, the notion of descriptive, absolute complexity is kept distinct from the notion of
difficulty. Under the former approach, complexity is operationalized in terms of description length
(Dahl 2004; Miestamo 2008). Under the latter approach, complexity is a measure of difficulty and costs
in language learning and use (Kusters 2003). For a discussion of these and related topics, see Arkadiev
& Gardani, Chapter 1 in this volume.
In (1), the markers of class 7 (the singular form of gender 7/8 in Chichewa) occur
on the adnominal modifier, the verb, and the noun itself.
The relationship between nominal and non-nominal (agreement-based) gender
marking is not trivial. In some languages, as is the case in Bantu, nominal and
non-nominal marking can have similar means of expression from the point of
view of the phonological appearance of the morphemes used to encode gender
distinctions. However, this formal correspondence may only apply to parts of the
system rather than to all nouns and all agreement targets. In addition, nominal
marking and agreement marking may have different sources and undergo differ-
ent types of diachronic developments. For instance, as is also the case in Bantu
languages, animacy-based marking may develop in the domain of agreement
without affecting nominal marking. Thus, in languages that have both nominal
and agreement-based marking of gender distinctions, it is important to consider
these as two separate dimensions that may, but need not, interact with each other.
In this chapter, I restrict my focus to patterns of change in the domain of
agreement marking and their effect on the complexity of gender systems. The
reason behind this choice is twofold. On the one hand, while agreement marking
is definitional to gender (there is grammatical gender only if there is displaced
marking of classificatory distinctions through agreement), nominal marking is not
(many languages mark gender distinctions only via agreement). On the other
hand, while agreement marking directly hinges on inflectional morphology, in
that gender agreement targets obligatorily inflect for gender, nominal gender
marking resides more in the domain of lexicalized distinctions and/or word
formation rules, which can be argued to be less central to morphological com-
plexity. The patterns of change in the domain of agreement marking that the study
focuses on are presented and discussed in section 8.4.
[Map legend, genealogical units: Balto-Slavic, Bantu, Basque, Central Gunwinyguan, Chamorro, Germanic, Ghana-Togo-Mountain, Greek, Insular Celtic, Iranian, Khasian, Lezgic, Mek, Michif, Thebor]
from at least five of the world’s six macro-areas are represented in the sample, the
data set is largely skewed towards Eurasia. The reason behind this bias is twofold.
First, along with Africa, Eurasia is one of the areas of the world where gender
systems are most frequent. Second, for many of the Eurasian genealogical units
included in the sample, diachronic developments in the domain of nominal
morphology have been studied with the support of historical-comparative data,
and the social history of many of these speech communities is also relatively well
documented. The languages of Eurasia thus qualify as an appropriate starting
point to explore the evolutionary dynamics of morphological complexity in the
domain of gender marking and their sociohistorical correlates. At least one
genealogical unit has also been added for each of the other macro-areas except
South America. A complete list of the languages sampled for each of the genealogical units
is given in Appendix 8.1.
Each language set consists of one conservative language and at least one
innovative language with respect to gender agreement marking, with the excep-
tion of the Thebor (Bodic) languages Shumcho and Janshung, both of which
represent instances of emerging gender agreement patterns within the family.
Languages within one and the same set can be mutually intelligible with each
other (as in the case of Kelasi and Kafteji within the Northwestern Iranian set), or
more distantly related (as in the case of Nalca and Eipo within the Mek set). The
patterns of language change accounted for are: loss, reduction, emergence and
expansion in the domain of gender agreement. These are compared with either the
retention of gender agreement (in case of reduction, loss and expansion) or with
its absence (in case of emerging gender agreement). These diachronic processes
are investigated by examining the morphosyntactic domains of gender marking in
a language (e.g., attributive modifiers, predicates, pronouns), and the way in which
these vary across genealogically related languages: what are the word classes that
inflect for gender in language X as opposed to the closest relatives Y and Z? Do all
targets of gender agreement mark the same kind of gender distinctions or is there
a split between, say, adjectives, articles and demonstratives distinguishing between
masculine, feminine and neuter gender, and personal pronouns distinguishing
between animate and inanimate gender? The relevance of these questions for the
understanding of the complexity of gender systems is discussed in section 8.4.
In addition to representing more or less conservative languages in the domain
of gender agreement, the sampled language sets and the individual languages
within each set were selected with the aim of capturing diversity at the
sociohistorical level. In this respect, variables such as demography, domains of use, and
history of contact were considered.
This sampling methodology, which aims to capture both structural and socio-
historical diversity within sets of closely related languages, has already been
applied in studies of the relationship between language structures and social
structures. An example of this approach is the study of morphosyntactic com-
plexity and language contact by Maitz & Németh (2014), which compares
morphosyntactic complexity across four varieties of German chosen to represent
three different sociohistorical profiles: a standard, relatively high-contact language
(Standard German), two contact languages (the pidgin Kiche Duits and the creole
Unserdeutsch), and a low-contact variety typically learned as L1 only (Cimbrian).
Data were collected by using a questionnaire, which was sent out to experts on
individual languages, as well as by means of descriptive resources. For those
languages for which questionnaire responses could not be obtained, I used the
questionnaire as a guideline to conduct more informal consultations with lan-
guage experts and to gather information from descriptive resources.
The questionnaire consists of two parts. Part 1 focuses on language ecology and
language contact and aims at capturing information on the present and past
geographical and sociohistorical environment in which a given language is/was
used, with a set of fine-grained questions ranging from demography to domains of
language use, issues of language identity and prestige, code switching practices
and language contact in the past.⁵ Part 2 focuses on grammatical gender and aims
at capturing information on number and type of gender distinctions, gender
assignment rules, the morphology and syntax of gender marking and the
diachrony of a given gender system. The questionnaire is based on two different
⁵ Not all of these questions could be answered for all languages in the sample.
The reduction and loss of gender agreement in the languages of the sample may
result from two distinct processes of language change: (1) morphophonological
erosion and (2) redistribution of agreement patterns. Under morphophonological
erosion, gender marking is eroded or disappears as a result of sound changes that
lead to the loss of segmental morphology. Under redistribution of agreement
patterns, one gender agreement pattern spreads at the expense of others, leading
to the partial or complete neutralization of gender distinctions. Both processes
exhibit properties of directionality, but the preferred directionalities differ under
one or the other process: morphophonological erosion is often found to spread
from the domain of attributive modifiers, whereas the redistribution of gender
agreement patterns often has its onset in the domain of anaphoric pronouns.
An example of partial loss of gender marking as a result of morphophonological
erosion is Standard Swedish (Indo-European, North Germanic). In Standard
Swedish, two different systems of gender distinctions are attested. Within the
noun phrase, the language distinguishes between two genders, the Common
Gender (e.g., en person ‘a person’) and the Neuter Gender (e.g., ett hus ‘a house’). This
distinction is marked on definite and indefinite articles, demonstrative modifiers,
and adjectives. In the domain of third person pronouns, a Masculine/Feminine
⁶ Both questionnaires can be freely accessed through the repository for ‘Typological tools for field
linguistics’ from the website of the former Department of Linguistics at the Max Planck Institute for
Evolutionary Anthropology in Leipzig (http://www.eva.mpg.de/lingua/tools-at-lingboard/questionnaires.php).
⁷ The Masculine/Feminine distinction is also marked in the accusative and genitive forms of the
pronoun. Cf. honom (3SG.M.ACC) vs. henne (3SG.F.ACC), and hans (3SG.M.GEN) vs. hennes (3SG.F.GEN).
As examples (2) and (3) show, utterances in Kelasi and Kafteji look (and sound)
practically the same and the two languages are highly mutually intelligible. One of
the few striking structural differences between the two languages is, in fact, the
presence of gender inflections in Kafteji (in the form of zero-marked Masculine
and marked Feminine) and their complete absence in Kelasi. Stilo (2019) describes
loss of gender in Kelasi as the result of morphophonological erosion in the domain
of nominal inflection, whereby the possibility of omitting overt gender marking on
nouns in certain morphosyntactic contexts triggers the systematic erosion of
gender marking elsewhere. However, no information is given about the order in
which gender inflection was lost on the various agreement targets.
Loss of gender by the redistribution of agreement patterns is attested, among
other languages, in Cappadocian Greek (Indo-European, Greek), where it results
from the generalization of neuter agreement to all instances of masculine and
feminine gender agreement (Karatsareas 2009, 2014). Comparative evidence from
closely related dialects, such as Pontic Greek, allows us to infer how the process of
redistribution took place. In Pontic Greek, grammatically masculine and feminine
nouns denoting inanimate entities trigger neuter agreement on all agreement
targets but the prenominal articles. This is shown in (4), with the example of
the inanimate feminine noun pórta ‘door’, which triggers neuter agreement on the
past participle anixtón ‘open’, but feminine agreement on the prenominal definite
article i.
based on both semantic and formal rules. This is reflected in the statement in (9),
concerning the number of rules.
The borrowed gender marking systems attested in the sample also qualify as simple
with respect to the amount of formal marking, in that gender marking occurs
only in one syntactic domain and, more specifically, on one type of agreement
target, adnominal modifiers. This is captured by the statement in (10).
(11) a. Gender marking is fully productive < Only a subset of lexical items per
agreement target mark gender
b. Gender marking is obligatory < Gender marking is optional
(Audring 2017: 60)
The distribution of the patterns of change attested in the languages of the sample
is presented in Figure 8.2.
[Figure 8.2 legend: Emergence = 5/36; Loss = 7/36; Expansion = 6/36; Reduction = 8/36; Retention = 8/36; Lack = 2/36]
Given the limited size of the sample, it is not possible to formulate any
generalization on the relative frequencies of the observed patterns of change;
these, however, tend to be represented evenly within the data set.
One striking fact about the geographic spread of the phenomena under study is
that, within Eurasia (where the majority of the sampled language sets come from),
instances of complete or near-complete loss and of emergence of gender
agreement tend to cluster around language family edges, that is, within geographic
areas in which languages belonging to families with a strong bias towards the
presence of gender systems (e.g., different branches of Indo-European) are in
contact with languages belonging to families that are biased towards the absence
of gender (e.g., varieties of Basque, different Turkic and Finnic languages). The
configuration of these language family edge zones within Eurasia, and the patterns
of change observed in the domain of gender agreement across individual
languages within these zones, are presented in Table 8.2.
One exception to this trend is Irish. Frenda (2011) reports ongoing reduction in
the domain of gender agreement in contemporary urban varieties of Irish, which
he classifies as non-native, and which he compares with more conservative
varieties documented through recordings from the 1960s and classified as native.
The study provides diachronic corpus evidence that, over the past forty years,
gender distinctions in the domain of personal pronouns have been reorganized
and restructured around a purely semantic opposition between ‘female
referents’ and ‘everything else’, with masculine pronouns systematically being
selected not only to refer to semantically and grammatically masculine nouns (as
attested in conservative varieties), but also to semantically inanimate and
grammatically feminine nouns. Frenda (2011) explains this semantic realignment of
gender distinctions in the pronominal domain as one of the many convergence
phenomena that Irish is undergoing under the influence of the dominant
language, English, and as a result of language attrition. Through this process, at
least in the pronominal domain, the gender system of present-day Irish has
converged with the pronominal gender system of English, which is also organized
on the basis of semantic oppositions involving natural gender. According to
Frenda (2011: 20), the major difference between pronominal gender in English
and present-day Irish varieties is that ‘masculine pronouns in Irish fulfil the
functions of both the neuter and masculine pronouns in English’. The contact
situation observed in Irish thus differs from those surveyed in Table 8.2 in that the
contact language (a genealogically related, if distant, language) does
have gender, albeit a very reduced system, with agreement patterns
restricted exclusively to the pronominal domain.
clear connection between loss and emergence of gender agreement and language
contact dynamics is observed only in situations of prolonged contact and
extensive bilingualism. Let us consider some examples, starting with the loss of
gender agreement.
Innovations in gender agreement morphology are attested in all Asia Minor
Greek dialects⁹ except Silliot Greek. As demonstrated by Karatsareas (2009,
2014) on the basis of historical-comparative evidence, the onset of these
diachronic developments in the domain of gender agreement precedes the
intensification of contact with Turkish. However, only in Cappadocian, the variety with
the longest and most intensive record of contact with Turkish, and the one that
was geographically most isolated from the rest of the Greek-speaking
communities, has gender agreement been completely lost. An illustration of
noun-phrase-internal agreement morphology in Cappadocian Greek, as compared with
Standard Modern Greek, is given in example (12).
⁹ The label Asia Minor Greek dialects is used in the literature to refer to a set of varieties of Greek
that, prior to the population exchange that took place between Greece and Turkey in 1923–4, used to be
spoken in Turkey (Asia Minor). Within the sample, Asia Minor Greek dialects are represented by
Cappadocian, Pontic, and Rumeic.
participle. On the other hand, in Standard Modern Greek the noun for ‘wall’ is
both masculine and plural and triggers masculine and plural agreement on the
past participle (12b). Similarly, in (12c), from Sílata Cappadocian, the noun for
‘door’ only triggers singular agreement, while in (12d), from Standard Modern
Greek, it triggers both feminine and singular agreement.¹⁰ As discussed in section
8.4.1, loss of gender agreement in Cappadocian Greek resulted from the
generalization of the former neuter agreement pattern to all instances of masculine and
feminine agreement. As Karatsareas (2009: 224) puts it, ‘these processes were
probably aided and accelerated by Cappadocian–Turkish bilingualism and
subsequent cross-linguistic influence from Turkish’ (see also Karatsareas 2014: 99).
Moving on to contact-induced emergence of gender agreement patterns: in
Chamorro (an independent branch within the Austronesian language family),
Spanish feminine and masculine inflectional endings have been borrowed along
with borrowed adjectival modifiers. Chamorro speakers use these inflectional
endings to encode a ‘feminine vs. everything else’ type of opposition
in agreement with nouns of both Spanish and Chamorro origin. Inflecting
adjectives are exclusively of Spanish origin. Examples are given in (13).
While (13a) illustrates the use of feminine agreement with a noun denoting a
female entity, in (13b) the Spanish masculine agreement pattern is reanalysed as a
marker of ‘everything else’ as opposed to the feminine. It is worth noting that the
controller of non-feminine agreement in (13b), kareta ‘car’, is grammatically
feminine in Spanish. This shows that, even though the morphological
means through which gender-like distinctions are encoded in Chamorro are
copied from Spanish, their use is not: it hinges instead on Chamorro-specific
assignment rules. Chamorro is spoken in the Northern Marian Islands, which
were Spanish territory between 1665 and 1899. According to Stolz (2012:
104), the use and influence of Spanish in the Marian Islands was at its highest
¹⁰ Examples (12c) and (12d) are not translation equivalents. Nevertheless, they are sufficient to show
the absence of gender contrast in Cappadocian and the presence of feminine gender marking in Standard
Modern Greek.
between the early and mid-nineteenth century ‘when a strong current of
Hispanisation put the survival of Chamorro at stake’. As a result of this influence,
and in spite of the fact that Spanish almost completely disappeared from the
linguistic landscape of the Marian Islands after World War I, present-day
Chamorro retains traces of this intense hispanization at the lexical
and, to a lesser extent, also at the grammatical level. The gender agreement
patterns exemplified in (13) are part of this heritage.
In addition to aligning with previously documented tendencies in contact-induced
loss and emergence of agreement, the distribution of the patterns of
change surveyed in this study suggests that, given similar contact situations in
terms of degree and duration of contact, one additional factor that appears to be
associated with the direction of the observed changes (towards either loss or rise
of gender agreement patterns) is the asymmetric relationship
between the languages in contact, both in terms of proportions of speakers and in
terms of prestige dynamics between the populations in contact. In what follows, I use the
notion of dominant language to refer either to the language that, in a given
contact situation, has the larger number of speakers, or to the one that is
more prestigious.
On the one hand, patterns of change in the domain of gender agreement
tend to proceed towards reduction and loss when the dominant language in a
given contact zone lacks grammatical gender or displays an already reduced
gender system. This is, for instance, the case of Cappadocian Greek under the
influence of Turkish. Gender reduction and loss can also occur as a result of
language shift when non-native speakers of a given language outnumber
its native speakers. This is what Thomason (2015)
defines as shift-induced interference. In the languages of the sample, a clear-cut
instance of shift-induced interference is Tamian Latvian, where loss of gender
is typically explained as one of the results of Livonian speakers shifting to
Latvian.
On the other hand, asymmetries in the structure of the population and/or in the
prestige dynamics between the languages in contact also account for the
emergence of gender agreement patterns under language contact. In such cases,
extensive borrowing in the nominal domain (involving both nouns and
adnominal modifiers) from dominant languages with grammatical gender may lead to
the emergence of marginal instances of gender agreement in languages that are
otherwise devoid of gender. In the languages of the sample, this is, for instance, the
case of Chamorro.
To the best of my knowledge, the effect of language dominance, as defined in
this chapter, on changes related to the evolution of gender systems has so far gone
unnoticed. In the languages sampled for this study, this effect can be
observed in nearly all the genealogical units for which the complete or near-complete
loss and the emergence of gender agreement patterns can reliably be classified as a
Table 8.3. Direction of change and asymmetries in the structure of the population
and/or prestige dynamics
¹¹ The Thebor languages Shumcho and Jamshung, where suffixes are productively used to encode
gender distinctions on adjectival modifiers of Indo-Aryan origin (as illustrated in example (8)), are
excluded from Table 8.3. This is because the linguistic area in which the two languages are
spoken cannot be clearly characterized in terms of dominance and prestige dynamics between languages
in contact. Even though Hindi is the dominant lingua franca of the area, contact and multilingualism
between Tibetan and Indo-Aryan languages in this area go beyond the influence of Hindi, both
historically and in the dynamics of present-day interactions between neighbouring languages.
¹² The question of whether Ghana-Togo-Mountain languages constitute an independent genealogical
grouping within Kwa or rather an areal grouping of genealogically more distantly related
languages is still debated. See Blench (2009) for a discussion.
across the last three generations of speakers. While older speakers still
productively mark gender distinctions through segmental and tonal morphology,
younger speakers tend either to omit segmental agreement markers while retaining
their tones as floating tones, or to lose all traces of agreement marking altogether. This
contrast between conservative (segmental and tonal) and reduced agreement
morphology is illustrated in example (14) for the Animate and Inanimate genders.
Gblem-Poidi (2007) describes this ongoing pattern of change as one of the results
of the process of language attrition that Igo is undergoing under the influence of
Ewe, the dominant second language of the area, which is genealogically related to
Igo (they both belong to the Kwa family) but lacks grammatical gender.
According to the author, this development needs to be understood in the
context of a highly bilingual society, where grammatical structures in the minority
language (Igo) have begun to align with those of the dominant second language
(Ewe), and particularly so for the younger speakers. The onset of this
fast-unfolding change, which goes hand in hand with a series of other innovations
affecting the vowel harmony system and the numeral system of Igo, can be dated
back to the beginning of the twentieth century (Honorine Gblem-Poidi, p.c.).
Similar sociohistorical contingencies are observed in all the other languages of
the sample for which reduction and loss of gender agreement are associated with
a situation of prolonged and intensive contact with dominant genderless
languages (see Table 8.3). The pace at which these developments seem to arise and
spread varies, however, from language to language. In contact situations that
involve not only extensive, long-term bilingualism, but also language attrition,
rates of change are faster. This is, for instance, the case of Igo and Irish, where
reduction of gender agreement morphology is reported to have taken place
within the space of a couple of generations (Gblem-Poidi 2007; Frenda 2011).
In other contexts, diachronic developments fostering the reshuffling, reduction,
and, in the most extreme cases, the loss of gender agreement may spread over a
longer time span (as, for instance, in the case of Karleby Swedish), and the onset
of these patterns of language change may even precede the intensification of
contact and the establishment of bilingual practices with the dominant genderless
language (as in the case of Cappadocian Greek and closely related Asia
Minor Greek dialects).
Coming to the emergence of gender agreement patterns via borrowing, in at
least two of the relevant sampled languages (see Table 8.3), the use of these
constructions is reported to be subject to a considerable amount of intraspeaker
variation (Stolz 2012, with respect to Chamorro), and to be avoided in formal
registers (Jose Ignacio Hualde, p.c., with respect to Lekeitio Basque). The marking
of gender distinctions in adjectives borrowed from Spanish (e.g., altu/alta ‘tall.M/
tall.F’) is widely attested across different varieties of Basque spoken by
Basque/Spanish bilinguals, and is reported to be rather frequent in spoken registers.
Lekeitio Basque (a variety spoken 53 kilometres away from Bilbao) is
unusual among Basque varieties in that verbs derived from Spanish adjectives
through Basque word-formation strategies maintain the overt coding of the
masculine/feminine distinction. This is shown in example (15).
(15) Deadjectival verbs indexing natural gender in Lekeitio Basque (Hualde et al.
1994: 109)
a. morenotu = ‘to become tanned (a male)’ derived from moréno ‘dark
(male)’
b. morenatu = ‘to become tanned (a female)’ derived from moréna ‘dark
(female)’
c. majotu = ‘to become handsome (a male)’ derived from májo ‘handsome
(male)’
d. majatu = ‘to become handsome (a female)’ derived from mája ‘handsome
(female)’
Gender agreement patterns and gender assignment rules within the noun phrase
are sex-based (as in French). Gender agreement patterns and gender assignment
rules on demonstratives and verbs are animacy-based (as in Cree). As a
consequence, all Michif nouns have two lexical genders: either masculine or feminine,
and either animate or inanimate.
The gender agreement system of Michif is illustrated in example (16). (16a)
exemplifies a noun that is masculine according to the French system and animate
according to the Cree system. (16b) exemplifies a noun that is feminine according
to the French system and inanimate according to the Cree system.
and developed from the river trade language Bobangi. Kinshasa Lingala is the variety
of Lingala that is structurally closest to the two ancestor contact languages and
nowadays also the most prestigious variety spoken in the Democratic Republic of
Congo. The gender system of Kinshasa Lingala is rather atypical compared to that of a
‘canonical’ Bantu language. Kinshasa Lingala has only two genders, Animate and
Inanimate; there is no gender agreement within the noun phrase, and only pronouns
and verbs inflect for gender. Nouns are marked by number-sensitive prefixes, the
historical remnants of the original Bantu gender marking system, which is, however, no
longer productive at the level of agreement marking. Animate and inanimate gender
agreement as marked on subject prefixes is illustrated in examples (18) and (19).
b. Mu-nkanda mu-ko-kweya
3-book 3-FUT-fall
‘The book will fall.’
c. Ndako yɔkɔ e-ko-kweya
9.house one 9-FUT-fall
‘The house will fall.’
Kusters (2003: 38–9) classifies the functions that language fulfils in its contexts of
use under two main types: the communicative function, which encompasses
instances of language use as a means of depicting states of affairs and
communicating them to the hearer as clearly and as efficiently as possible, and the symbolic
function, which encompasses the use of language and language structures as a
means of communicating and reinforcing group identity and speakers’ attitudes.
That speakers may intentionally manipulate language structures and, more
¹³ It is worth mentioning that, based on the classification proposed in this chapter, Kafteji counts as
an instance of expansion of gender agreement (and not just retention). In Kafteji, within the domain of
verbal morphology, gender inflections have extended to all singular persons of all past tenses of
intransitive verbs, and all tenses of the verbs for ‘be’ (Stilo 2019: 49–65). Within the sample, similar
developments, which are not generalized to all Northwestern Iranian languages with gender, are also
attested in Eshtehardi.
Even though contact with genderless languages can reasonably account for the
loss of gender in Kelasi, it remains to be explained why gender has not also been
lost in Kafteji. One possibility would be to interpret the retention of gender
agreement in Kafteji as a sort of distancing strategy through which Kafteji speakers
set themselves apart from speakers of all neighbouring genderless languages (Stilo
2019: 75). While this is extremely difficult to prove based on the data at hand,
I would argue that the connection between gender marking and identity marking
should not be a priori ruled out as one of the factors at play in the distribution of loss
and maintenance of gender agreement in Kelasi and Kafteji. Fieldwork data and,
ideally, metalinguistic data from sociolinguistic interviews with speakers of these
languages would be needed in order to investigate this possibility further.
While the two cases briefly discussed in this section do not have any direct
bearing on a theoretical account of morphological and morphosyntactic
complexity in the domain of gender marking, they do provide relevant insights into how
patterns of gender marking can be manipulated by speakers in situations of
intense language contact. How these patterns of use may affect the evolution of
gender marking systems and their transmission over time is an open question,
which cannot be answered here.
that are located at the crossroads between families characterized by the presence of
stable gender systems and families characterized by the absence of gender.
Second, both loss and emergence of gender agreement were found to occur in
linguistic areas with long historical records of intense language contact and
bilingual practices between diverse speech communities. However, the data
revealed that, given similar contact scenarios, asymmetries in the structure of
the bilingual population and/or in the prestige dynamics between the languages in
contact tend to favour one development more than the other. Loss of gender
agreement tends to prevail under circumstances in which the demographically
dominant and/or more prestigious language lacks grammatical gender. On the
other hand, borrowing of gender agreement patterns may be favoured when the
demographically dominant and/or more prestigious language has grammatical
gender.
Third, and last, the data suggest that gender marking, which has often been
described as a redundant and seemingly afunctional phenomenon in grammar
(Trudgill 1999; McWhorter 2007), may in fact have important ties to the way in
which speakers and speech communities construe their linguistic identity in
opposition to that of their neighbours. This appears to be even more evident
when, as in the case of Makanza Lingala, gender distinctions and gender
agreement patterns that have been lost as a result of natural language evolution are
reintegrated through policies of language planning and standardization.
In order to better frame the relevance of these results, some words of caution
are also in order. The data presented in this chapter are based on qualitative
observations of a small crosslinguistic sample, and no claims are made here about the
quantitative significance of these distributions. Moreover, observed associations
between certain patterns of change in the domain of gender marking
and certain sociohistorical factors are not assumed to necessarily imply causation.
In some contexts—for example, the rise of gender agreement through borrowing
or the expansion of gender agreement through language planning—the causal
connection between grammatical changes and social factors is obvious. In other
cases—for example, the reduction and loss of gender agreement in the context of
highly multilingual linguistic areas or concomitantly with historical changes in the
prestige dynamics between languages in contact—the observed distributions often
result from a fine interplay between language-internal dynamics of change and
aspects of the social history of a given speech community. In such cases, no causal
relation was posited, unless this was explicitly argued for in the sources and by the
experts consulted.
The purpose of this investigation was not to establish systematic causal rela-
tionships between the patterns of language change and the sociohistorical factors
in focus, but rather to carry out an exploratory analysis of possible associations
between the two, as observed through a small crosslinguistic sample and via
qualitative analysis. The tendencies uncovered by this procedure could be
used as hypotheses to be further tested on larger datasets and with the support of
quantitative methods.
The Appendix lists patterns and contexts of change in the domain of gender agreement
for each of the sampled languages. Languages are grouped based on the macroarea and
genealogical unit they belong to. Genealogical units within each macroarea and
individual languages within each genealogical unit are listed alphabetically.
AFRICA
Bantu (Atlantic-Congo)
Kinshasa Lingala [ling1263]
  Pattern of change: Reduction. Gender agreement is based on animacy and restricted to anaphoric pronouns and argument markers on verbs.
  Context of change: Kinshasa Lingala is the direct descendant of the Bobangi and Bangala pidgins, which, as is typical of contact languages, displayed heavily reduced gender agreement morphology.
  Sources: Bokamba 1977; Meeuwis 2013
Makanza Lingala [ling1269]
  Pattern of change: Expansion. Seven non-sex-based agreement patterns have been reintroduced. Gender agreement is extensively marked within the noun phrase, on various types of pronouns, relative constructions and verbs.
  Context of change: The expansion of gender agreement morphology was implemented via language planning.
  Sources: de Boeck 1904; Bokamba 1977; Meeuwis 2013
Ghana-Togo-Mountain (Atlantic-Congo)
Sɛlɛɛ [sele1249]
  Pattern of change: Retention
  Context of change: NA
  Sources: Agbetsoamedo 2014
Igo [igoo1238]
  Pattern of change: Reduction, via the erosion of segmental gender agreement morphology.
  Context of change: Fast-unfolding change, language attrition due to pressure from Ewe.
  Sources: Gblem-Poidi 2007; p.c.
Ikposo [ikpo1238]
  Pattern of change: Loss. No traces of gender agreement morphology are left
  Context of change: The language is in close contact with Ewe, and is also a
  Sources: Soubrier 2013; Ines Fiedler p.c.
to grammatically feminine, but semantically inanimate, nouns.
Irish (Ros Much) [conn1243]
  Pattern of change: Retention
  Context of change: NA
  Sources: Frenda 2011
Khasian (Austroasiatic)
Khasi [khas1269]
  Pattern of change: Expansion. Gender marking on pronouns, deictic bases, pre-nominal clitics, verbs.
  Context of change: Language-internal development
  Sources: Anne Daladier, p.c.
Lyngngam [lyng1241]
  Pattern of change: Retention. Gender marking on personal pronouns and deictic bases only.
  Context of change: NA
  Sources: Anne Daladier, p.c.
Pnar [pnar1238]
  Pattern of change: Expansion. Gender marking on personal pronouns, deictic bases, pre-nominal clitics.
  Context of change: Language-internal development
  Sources: Anne Daladier, p.c.
Lezgic (Nakh-Daghestanian)
Archi [arch1244]
  Pattern of change: Retention
  Context of change: NA
  Sources: Michael Daniel, Nina Dobrushina (Q)
Aghul [aghu1253]
  Pattern of change: Loss. No information about the diachrony of gender loss.
  Context of change: Within Nakh-Daghestanian, gender loss is restricted to the Lezgic branch. Genderless Lezgic languages tend to be neighbours with each other and to share a long-term history of contact with genealogically unrelated languages that also lack gender (Azerbaijani; Georgian)
  Sources: Nina Dobrushina (Q)
Udi [udii1243]
  Pattern of change: Loss. No information about the diachrony of gender loss
  Context of change: Udi does not have any genealogically related neighbours, but it is surrounded by languages that lack gender (Azerbaijani and Georgian).
  Sources: Nichols 2003; Wolfgang Schulze (Q).
Acknowledgements
I am very grateful to one anonymous reviewer, Peter Arkadiev, Francesco Gardani, and
Maria Konoshenko for constructive criticism and insightful comments. I am also
thankful to Kaius Sinnemäki for reading and commenting on previous versions
of this chapter. Any remaining mistakes and shortcomings are my sole responsibility.
This work is the outcome of a project on ‘Gender systems, grammatical complexity
and stability: A crosslinguistic study of language pairs’, funded by the Wenner-Gren
Foundations. Later financial support from the Anna Ahlström and Ellen Terserus’
Foundation is also gratefully acknowledged. The data set examined in the chapter is
the same as the one used by Di Garbo & Miestamo (2019).
OUP CORRECTED AUTOPAGE PROOFS – FINAL, 23/8/2020, SPi
9
Morphological complexity, autonomy,
and areality in western Amazonia
Adam J. R. Tallman and Patience Epps
9.1 Introduction
Adam J. R. Tallman and Patience Epps, Morphological complexity, autonomy, and areality in western Amazonia
In: The Complexities of Morphology. Edited by: Peter Arkadiev and Francesco Gardani, Oxford University Press (2020).
© Adam J. R. Tallman and Patience Epps.
DOI: 10.1093/oso/9780198861287.003.0009
¹ However, we note that other domains are also interesting candidates for such an investigation,
such as associated motion (Guillaume 2016). We hope to address more of these in future work.
The features of western Amazonian classification systems reviewed here can all be
understood as outcomes of a relatively porous boundary between morphology
and syntax, and highlight its relationship to processes of grammaticalization. The
existence of a large or open class of elements and flexible class assignment
are relatively syntactic characteristics, while the agreement function and the
presence of semantically abstract, phonologically reduced elements within the
inventory are more morphological. The phenomenon of ‘repeater’ classifiers, by
which virtually any noun may fill a classifier slot, facilitates the elaboration of the
³ We note that our use of the term ‘flexible assignment’ in this context should not be confused with
an alternative usage found in the literature on grammatical gender, referring to the possible assignment
of more than one gender to a given noun in order to convey different construals of the referent; for
example, feminine gender with inherently masculine inanimate nouns in Berber communicates a
diminutive meaning. Thanks to Francesca Di Garbo for bringing this point to our attention.
system by readily drawing new elements into the classifier inventory. Once within
the inventory, they begin to undergo grammaticalization, and over time become
more morphological in their form and behaviour. Similarly, the close association
between the derivational function of classifiers and noun compounding
(particularly involving part-whole relationships like ‘banana-leaf’) provides a route for
classifier systems to emerge via the grammaticalization of generic ‘bound nouns’,
as argued by Epps (2007b) for Hup (Naduhupan) and Payne (2007) for Yagua
(Peba-Yaguan; see also Facundes 2000 for Apurinã (Arawakan) and Ospina 2002
for Yuhup (Naduhupan)).
These processes of grammaticalization are highly sensitive to contact. As
Aikhenvald (2000: 383) points out, ‘the more lexico-syntactic the noun
categorization is, the easier it is to diffuse’. The development of a classifier system from
noun compounding in Hup and Yagua is attributed in both cases to contact, from
Tukanoan and Boran languages respectively (Epps 2007b; Payne 2007). That
diffusion has been widespread is strongly suggested by the similarities across
western Amazonian classifier systems, and their contrast to canonical systems
elsewhere in the world. In the northwest area, Seifart & Payne (2007: 384–5) note
the ‘close correspondences of—sometimes very specific—nominal classification
structures across Tucanoan, Witotoan, Peba-Yaguan, and some Arawak
languages . . . [pointing to] widespread processes of areal diffusion,’ and van der
Voort (2005) makes a similar observation for the languages of the Guaporé-
Mamoré region in the southwest (see also Krasnoukhova 2012: 263). Similar
trends can be observed beyond the western Amazonian region as well. For
example, the occurrence of possessive classifiers (and particularly their association
with domesticated animals), otherwise relatively rare in the Americas, is
identified as an areal feature of the Chaco region (Comrie et al. 2010; Campbell 2012;
Ciucci 2014), and Aikhenvald (2000: 383, citing Aikhenvald & Green 1998)
likewise observes the diffusion of possessed classifier constructions in the
northeastern part of South America, from Cariban into North Arawakan languages. We
may also compare the contact-driven emergence and loss of gender systems in
Eurasian languages, as explored by Di Garbo (Chapter 8, this volume).
Further evidence of contact in classifier systems involves the more
fine-grained restructuring of existing systems to affect specific semantic values and
morphosyntactic patterns. In the Vaupés region, for example, Gomez-Imbert
(1996) demonstrates how Baniwa (Arawakan) influence has caused Cubeo (East
Tukanoan) to change its strategies for classifying animate entities, from
prioritizing animacy (the Tukanoan pattern) to shape (the Arawakan pattern), whereas
Aikhenvald (2002) reports that Tariana (Arawakan) has experienced exactly the
opposite change under the influence of Tukano (East Tukanoan).
Finally, while the examples above have all dealt with the contact-driven
restructuring of native material to fit system-level templates, western Amazonian classifier
systems also provide evidence of direct borrowing of classifier forms. This
Table 9.2. Similar classifier forms in Guaporé-Mamoré languages (van der Voort
2005: 397)
Kwaza (isolate) -kalo -ko -su -mãi -mũ -tɛ -nĩ -mɛ̃ -nũ
Kanoê (isolate) -ko -mũ -tæ -nũ
Aikana (isolate) -zu -mũj -mũ -ðãw -nũ
Arikapú (Macro-Jê) -nĩ -mrɛ̃ -nũ
Nambikwara (Namb.) -kalo -su³ -nũx³
9.2.2 Tense
Alongside the tendency to proliferation, tense markers often display low select-
ivity and/or bondedness with respect to a host. In Hup, for example, the remote
and proximate tense markers appear only occasionally, and highlight a contrast
between the event time and the reference and/or utterance time. They are also
phonologically free elements in the clause, and while they usually follow the verb
complex, they may also associate with other constituents, and may even act as a
demonstrative element and head a noun phrase:
Similarly, while graded tense forms in Chácobo are normally bound, they project
their own prosodic word when not adjacent to a verb form (example (7)). Other
tense markers pattern more ostensibly with auxiliaries; some do not even need a head
verb in the same clause, such as the remote future auxiliary/enclitic (example (8)).
9.2.3 Evidentiality
Evidential systems in western Amazonian languages are also among the most
elaborate in the world with respect to the number of distinct categories encoded.
Languages of the Vaupés region, for example, tend to exhibit as many as five or even
six categories, generally along the lines of visual, non-visual, inferred, assumed, and
reported (Malone 1988; Aikhenvald 2000; Epps 2005; Stenzel 2008; Silva 2012), as
illustrated by Hup in example (10). Other western Amazonian languages with
complex inventories include Nambikwara (Nambikwaran; e.g., Lowe 1999) and
Shipibo-Konibo (Panoan; Valenzuela 2003: 35–7; see also Aikhenvald 2004),
while still others have complex systems in which the distinction between evidenti-
ality and modality is not fully clear (Karo (Tupian), Gabas Jr. 1999; Andoke
(isolate), Landaburu 2005).
⁴ We note that some of these forms appear to be internally analysable, so not all are clearly
portmanteau.
Table 9.3. Evidentiality and tense in Matses (Panoan; Fleck 2007: 593)
Recent past:
-o ‘recent past experiential’ immediate past to about 1 month ago
-ak ‘recent past inferential’ immediate past to about 1 month ago
-aşh ‘recent past conjecture’ immediate past to about 1 month ago
Distant past:
-onda ‘distant past experiential’ about 1 month ago to about 50 years ago
-nëdak ‘distant past inferential’ about 1 month ago to speaker’s infancy
-nëdaşh ‘distant past conjecture’ about 1 month ago to no upper bound
Remote past:
-denne ‘remote past experiential’ about 50 years ago to max. human life span
-ampik ‘remote past inferential’ before speaker’s infancy
-nëdampik ‘remote past conjecture’ before speaker’s infancy
In Hup, for example, the non-visual and inferential evidentials may appear as
either suffixes or enclitics (compare examples (1) and (10b) above). Interestingly,
however, it is the enclitic form that appears to have progressed the farthest along a
trajectory of grammaticalization. As noted above, the suffix forms of both evidentials
originated in a morphosyntactic slot that can be occupied by both compounded roots
and suffixes, and thus facilitates the reanalysis of the former as the
latter. Then, as argued in Epps (2005), the suffix moved out of the verb complex
via extension to non-verbal predicates and scopal widening, where it developed an
enclitic form that was able to reassociate with verbs in certain contexts. At this
point, the evidential had become fully distinct from the verb root, except in the
contexts where it still exhibited its earlier suffixed form.
In Nanti, on the other hand, the syntax-like behaviour of the reportive eviden-
tial probably does reflect its intermediate grammaticalization from verb to clause-
initial clitic. As described by Michael (2008), the Nanti reportive ke is transpar-
ently related to the verb root kem ‘hear’ and—like other Nanti verbs—is inflected
for person:
As with tense, the high heterogeneity seen in evidential forms among related
Amazonian languages suggests recurrent innovation. Aikhenvald’s (2004: 275–84)
discussion of evidential development in a number of Amazonian languages
includes such sources as verbs (e.g., ‘seem’ > inference in Jarawara (Arawan);
‘hear, feel’ > non-visual in Tariana (Arawakan)), nouns (e.g., ‘noise’ > reportive in
Xamatauteri Yanomami (Yanomaman), possibly via noun incorporation), and
other morphology (e.g., declarative–indicative marker > direct evidential in
Shipibo-Konibo (Panoan); past tense markers > reportive/attested in Kamayurá
(Tupi-Guaranian)).
Whether or not contact is responsible for such innovations is often unclear; in
some cases, such as Nanti (Michael 2008), emergent evidential systems do not
appear to be directly contact-driven. However, Müller (2013: 227) observes the
regional clustering of Amazonian languages exhibiting evidentiality, as for
example in the Guaporé-Mamoré (Crevels & van der Voort 2008) and the
Vaupés regions, and evidentiality does appear to be relatively prone to diffusion
crosslinguistically (see, e.g., Aikhenvald 2004: 21). Surveys of Amazonian eviden-
tiality (Aikhenvald & Dixon 1998; Aikhenvald 2004: 292; Müller 2013: 228)
suggest multiple points of independent innovation, from which the phenomenon
has likely diffused more widely. Probably the clearest examples of contact-driven
elaboration of evidential systems come from the Vaupés, in which a number of
unrelated languages have undergone the grammaticalization of native forms to fill
a regionally defined set of categories; this is the case for Hup (see above), Tariana
(Arawakan, see Aikhenvald 2002: 117–29), and Kakua (Kakua-Nukakan; Bolaños
2016), among other languages.
9.2.4 Valence-adjusting
While she suggests that the form originated as a postposition on noun phrases and
entered the verb word via incorporation, she observes that a shift from reciprocal
to comitative function also appears to have occurred in some languages. It seems
likely that the indeterminacy exhibited by Paresi kakoa can be reconstructed to
Proto-Arawakan itself.
In verb-final languages, the subtlety of the distinction between incorporation
and pre-verbal object placement can blur the syntax-morphology divide even
further. In Hup, for example, the ‘interactional’ (reciprocal) verbal prefix ʔũh-,
which originates in the incorporation of the noun ‘sibling’, can occur as a phono-
logically free element with an intervening object argument (see Epps 2008, 2010):
9.2.5 Summary
The studies we have reviewed thus far suggest that Amazonian languages tend to
display a high degree of morphological elaboration in particular grammatical
domains, and that many of these prolific domains show evidence of restructuring
Our sample consists of eleven western Amazonian languages from nine language
families (see Figure 9.1): Cavineña (Tacanan; Guillaume 2008), Chácobo (Panoan;
Tallman 2018), Hup (Naduhupan; Epps 2008), Jarawara (Arawan; Dixon 2004),
Kokama-Kokamilla (Tupi-Guaranian; Vallejos 2010), Kotiria (Tukanoan; Stenzel
2013b), Movima (isolate; Haude 2006), Paresi (Arawakan; Brandão 2014),
Ashéninka Perené (Arawakan; Mihas 2015), Tariana (Arawakan; Aikhenvald
2003b), and Urarina (isolate; Olawsky 2006). The three Arawakan languages
represent distinct branches of this family. The eleven languages are distributed
widely across western Amazonia, although some (in particular Hup, Kotiria, and
Tariana) are not geographically independent. We have focused on languages with
descriptions that are detailed enough for us to code wordhood properties and
properties of EC for a range of morphemes.
The concept of morphological autonomy developed in this chapter is a relative
one, which we quantify as an index that can vary from language to language.
Accordingly, we need a baseline for assessing how this index ranks in comparative
perspective. While this is a large-scale typological problem, we take a preliminary
step by comparing the Amazonian languages in our sample to Central Alaskan
Yup’ik (CAY; Eskimo-Aleut family). There are three reasons for choosing CAY as
a point of comparison: (i) it is a well-described language with a relatively com-
prehensive grammar and an extensive literature on its morphological and syntac-
tic structure; (ii) it is comparable to Amazonian languages in displaying a high
degree of system complexity in its morphology (i.e., it is a polysynthetic language);
and (iii) it diverges from Amazonian languages in that its morphological and
syntactic structures have been described as easily distinguishable from one
Figure 9.1. Western Amazonian languages sampled
Table 9.4. Number of morphemes coded in this study by language and functional
domain
Perene 16 14 14 68 98
Tariana 34 29 14 81 119
Jarawara 5 13 1 0 19
Kotiria 3 0 10 21 34
Urarina 6 15 3 0 24
Movima 32 8 2 111 153
Hup 9 5 5 46 65
Cavineña 17 6 3 0 26
Chácobo 38 20 4 0 61
Paresi 10 11 0 11 32
Kokama-Kokamilla 6 19 4 0 29
CAY 13 10 2 0 25
⁵ In general, non-linear and syncretic morphology was not evident in the data. Given that the
relationship between morphology and syntax is treated in global fashion in the literature, we did not
address possible variation in this regard among domains; however, this could be an interesting question
to consider in future work.
more equal. Further details concerning the coding methodology and the metrics of
morphological autonomy are provided in the following sections.
The types of EC considered in this study are listed in (15). Each type of EC was
coded as a binary or ordinal value for each morpheme in the sample.
Table 9.5. Number of allomorphs per morpheme attested across the sample
Number of allomorphs 1 2 3 4 5
The opposite extreme can be seen in the Urarina causative, which displays no
variation in phonological form—it is always realized as -a:
The difficulty with suppletion in a study such as this one is that many (perhaps
most) linguists have an intuition that suppletive allomorphy is qualitatively
distinct from allomorphy based on productive morphophonological rules.
However, in order to calculate a global EC score we need to translate this qualitative
intuition into a quantitative metric. To capture the fact that we view suppletive
allomorphy as contributing much more weight to EC, suppletion is coded as a binary
variable, but one that is weighted relatively heavily (2 for morphemes that display
suppletive allomorphy, and 0 for those that do not). Thus if a morpheme displays
suppletion, its EC score will automatically be at least 4 (number of allomorphs: 2 +
presence of suppletion: 2), since suppletion presupposes at least two distinct forms.
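The weighting just described can be made concrete with a small sketch. It assumes an EC score built only from the two components discussed in this passage (the number of allomorphs plus a suppletion weight of 2); the other EC types listed in (15), such as multiple exponence, are deliberately omitted, and the `Morpheme` record and `ec_score` function are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass

# Suppletion is a binary feature but weighted heavily (2 if present, 0 if not).
SUPPLETION_WEIGHT = 2


@dataclass
class Morpheme:
    """Hypothetical record for one coded morpheme."""
    gloss: str
    allomorphs: int           # number of distinct phonological forms (>= 1)
    suppletive: bool = False  # True if the allomorphy includes suppletion


def ec_score(m: Morpheme) -> int:
    """Partial EC score: allomorph count plus the weighted suppletion flag.

    Other EC types coded in the study are not modelled in this sketch.
    """
    if m.allomorphs < 1:
        raise ValueError("a morpheme has at least one form")
    if m.suppletive and m.allomorphs < 2:
        raise ValueError("suppletion presupposes at least two allomorphs")
    return m.allomorphs + (SUPPLETION_WEIGHT if m.suppletive else 0)


# An invariant form, like the Urarina causative -a, gets the lowest score.
print(ec_score(Morpheme("CAUS", allomorphs=1)))                               # 1
# A hypothetical morpheme with two suppletive allomorphs: 2 + 2.
print(ec_score(Morpheme("hypothetical", allomorphs=2, suppletive=True)))      # 4
```

Because suppletion requires at least two forms, any suppletive morpheme lands at 4 or above under this scheme, which is what makes suppletion stand out from ordinary phonologically conditioned allomorphy.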
⁶ Other examples involve the obligatory double-marking of a particular operation; for example, the
Tariana passive requires the co-occurrence of the prefix ka- (which elsewhere functions independently
as a ‘relative’ prefix) and the suffix -kana (Aikhenvald 2003b: 259). We did not consider other types of
deviation from biuniqueness (besides allomorphy and multiple expression) because they were found to
be very marginal in our data.
receive such a score are described as syntactic elements (e.g., function words)
or as agglutinative morphemes. Higher EC scores are associated with forms
that deviate from biuniqueness in some way. Our coding was carried out
independently of the grammarian’s structural classification of the morpheme
in question; for example, we include auxiliaries used as analytic causatives as
well as morphological causatives. For this reason, it is unsurprising that a
relatively high percentage of morphemes in even a highly polysynthetic
language like CAY have a low EC score (56%); this outcome simply reflects
the fact that elements Miyaoka (2012) regards as ostensibly syntactic (tem-
poral frame adverbs, evidential clitics, etc.) were coded alongside those he
treats as morphological elements. This strategy gets at precisely what we are
aiming for: we are interested in how morphology and syntax may (or may
not) be distinct in the languages in question, not just how morphemes that
grammarians have categorized as morphological correlate with indices of
morphological complexity.
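The kind of per-language summary reported in Table 9.6 (the percentage of morphemes at each EC value, plus the average score) reduces to simple counting. A minimal sketch, with a hypothetical `ec_profile` function and invented toy scores rather than the study's data:

```python
from collections import Counter


def ec_profile(scores: list[int]) -> tuple[dict[int, float], float]:
    """Return (percentage of morphemes at each EC value, mean EC score)."""
    if not scores:
        raise ValueError("need at least one coded morpheme")
    counts = Counter(scores)
    n = len(scores)
    # Percentages keyed by EC value, in ascending order of value.
    percentages = {value: 100 * counts[value] / n for value in sorted(counts)}
    mean = sum(scores) / n
    return percentages, mean


# Toy scores for a hypothetical language (not data from this study).
toy = [1, 1, 1, 1, 2, 2, 3, 4]
pct, mean = ec_profile(toy)
print(pct)   # {1: 50.0, 2: 25.0, 3: 12.5, 4: 12.5}
print(mean)  # 1.875
```

A figure such as the 56% of CAY morphemes with a low EC score is the share at the bottom of such a profile.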
Table 9.6. Percentage of morphemes for each EC value across the languages sampled
(with average scores across all the morphemes for each language)
⁷ As seen in (15) above, the EC score according to our metric could be higher for any given
morpheme, but in our data set none go above 5.
Figure 9.2. Kernel distribution of densities across the languages of this study
[Figure: one density panel per language (labels include Movima, Paresi, Tariana, and Urarina); x-axis: Exponence Complexity; y-axis: Kernel Density Distribution]
We note two points about the EC values across the languages of this study.
First, it is generally true that CAY morphemes are more evenly distributed across
the range of EC scores in comparison to the other languages—in other words, they
are less likely to cluster at any particular EC value, most notably 1 (the lowest).
This is to be expected based on current descriptions of CAY as highly morpho-
phonologically complex, such that affixal elements display a high degree of word
internal adjustments (i.e., fusion; see, e.g., Fortescue 1992); a higher degree of
allomorphy will produce higher EC values. Second, and in contrast to CAY, the
western Amazonian languages sampled cluster predominantly around the lowest
EC value (1)—in keeping with the observation that languages of this region tend
to exhibit a highly agglutinative profile. On the other hand, Movima, Jarawara,
and to a certain extent Kokama-Kokamilla display higher EC levels—a point we
return to below.
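The curves in Figure 9.2 are kernel density estimates: each morpheme's EC score contributes a small smooth bump, and the bumps are averaged into one curve per language. The sketch below assumes a Gaussian kernel and an arbitrary bandwidth of 0.3; the study's own plotting settings are not given here (R's `density()`, for instance, chooses a rule-of-thumb bandwidth by default), and the function name and toy scores are hypothetical.

```python
import math


def gaussian_kde(scores, bandwidth):
    """Return a density function for `scores` using a Gaussian kernel.

    Each observed EC score contributes a normal bump of width `bandwidth`;
    averaging the bumps makes the resulting curve integrate to 1.
    """
    if not scores or bandwidth <= 0:
        raise ValueError("need at least one score and a positive bandwidth")
    n = len(scores)
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in scores)

    return density


# Toy EC scores clustered at the lowest value (hypothetical, not study data).
clustered = [1, 1, 1, 1, 1, 2, 2, 3]
# Toy scores spread more evenly across the range (also hypothetical).
spread = [1, 1, 2, 2, 3, 3, 4, 4]

f_clustered = gaussian_kde(clustered, bandwidth=0.3)
f_spread = gaussian_kde(spread, bandwidth=0.3)
for x in (1, 2, 3, 4):
    print(x, round(f_clustered(x), 3), round(f_spread(x), 3))
```

The clustered profile yields a tall peak at 1, the shape described above for most of the Amazonian languages, while the evenly spread profile yields a flatter curve of the kind reported for CAY.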
Despite the generalizations made here, we emphasize that a higher EC score
does not necessarily translate to a higher degree of morphological autonomy.
Higher morphological autonomy is only corroborated if EC correlates with other
criterial wordhood properties. In other words, morphological autonomy may be
manifested by high EC scores, but high EC scores may not be limited to autono-
mous morphology.
⁸ For an explanation of this methodology, including the concept of ties in rank statistics, see Kendall
& Gibbons (1992) and Gibbons (1993). For an introduction to using Kendall’s tau in R, see Field et al.
(2012: 225–6).
statistical significance. The tau statistic can be read as a measure of the degree of
morphological autonomy that a relationship between EC and a criterial wordhood
property affords. For a given association, a strong positive correlation (a tau
coefficient that approaches 1) suggests a more robust distinction between morphology
and syntax; a weak or negative correlation (a tau coefficient close to or below 0)
suggests a more porous boundary between the two. In this study we judge
a correlation to be significant if the p-value is lower than 0.05.⁹
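The correlation step can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure (footnote 8 refers readers to R, where `cor.test(..., method = "kendall")` plays this role). It computes Kendall's tau-b, the tie-corrected variant suited to data in which many morphemes share the same EC score or the same value of a binary wordhood property. For significance, it uses a two-sided permutation test, which is a swapped-in technique. The function names, the 0/1 coding of bound status, and the toy data are all hypothetical.

```python
import itertools
import math
import random


def kendall_tau_b(x, y):
    """Kendall's tau-b, correcting for tied values in x and/or y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with n >= 2")
    concordant = discordant = ties_x_only = ties_y_only = 0
    for i, j in itertools.combinations(range(len(x)), 2):
        dx, dy = x[i] - x[j], y[i] - y[j]
        if dx == 0 and dy == 0:
            continue  # tied on both variables: ignored by tau-b
        if dx == 0:
            ties_x_only += 1
        elif dy == 0:
            ties_y_only += 1
        elif (dx > 0) == (dy > 0):
            concordant += 1
        else:
            discordant += 1
    denom = math.sqrt((concordant + discordant + ties_x_only)
                      * (concordant + discordant + ties_y_only))
    if denom == 0:
        return float("nan")  # one variable is constant: tau is undefined
    return (concordant - discordant) / denom


def permutation_p_value(x, y, n_perm=10_000, seed=0):
    """Two-sided p-value: share of random shuffles of y whose |tau-b|
    is at least as large as the observed one (with a +1 correction)."""
    observed = abs(kendall_tau_b(x, y))
    if math.isnan(observed):
        return float("nan")
    rng = random.Random(seed)
    y_perm = list(y)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)
        if abs(kendall_tau_b(x, y_perm)) >= observed - 1e-12:
            extreme += 1
    return (extreme + 1) / (n_perm + 1)


# Hypothetical coding: EC score per morpheme, and 1 = bound, 0 = free.
ec = [1, 1, 1, 2, 2, 3, 4, 4]
bound = [0, 0, 1, 0, 1, 1, 1, 1]
tau = kendall_tau_b(ec, bound)
p = permutation_p_value(ec, bound, n_perm=5000)
print(f"tau-b = {tau:.3f}, p = {p:.3f}, significant at 0.05: {p < 0.05}")
```

A permutation test is used here only because it is easy to state without tie-corrected variance formulas; with samples as small as those in Table 9.4, it also makes vivid why a non-significant p-value says little about the absence of a relationship.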
⁹ Of course, high p-values do not necessarily imply that there is no relationship between the EC
score and a wordhood property (the sample sizes are too small to afford such an interpretation). We
include the information regarding statistical significance for the reader who is interested in gauging
how reliable our results are on this point.
¹⁰ A number of authors have pointed out problems with the minimum free form test (Haspelmath
2011; Bickel & Zúñiga 2017), in particular that it identifies compounds as phrasal elements and certain
function words (determiners) as morphological elements. However, this test is not uniquely problem-
atic among wordhood tests, as Haspelmath’s (2011) systematic review demonstrates. Furthermore, the
test still provides useful information regarding morphological vs. syntactic status; for instance,
Haspelmath (2011: 40) points out that if an element passes the minimum free form test this provides
strong evidence that this element is not an affix. We see non-affixicality as an important criterion in
calculating overall morphological autonomy.
¹¹ A reviewer suggests that contiguity might be better treated as a three-way variable, with inter-
mediate status given to elements that require adjacency in some constructions but not in others. We
concur that this could be a productive approach to explore, but for the purposes of this study it was
found to be too difficult to apply in a consistent way.
A similar situation occurs with tense morphemes in Chácobo, but the syntagmatic
contexts that license prosodic word projection or incorporation are different: In
this language, a tense morpheme prosodically incorporates into an adjacent verb
root (24a), but projects its own prosodic word when a subject NP intervenes
((24b), repeated from (7a–b) above).
Finally, some grammatical formatives may always project their own prosodic
words, as exemplified by the Jarawara ‘aspect/time lexeme’ hibati ‘completed’
(example (25); Dixon 2004: 223); see also the Hup recent past marker páh in (6) above:
¹² Dixon uses the symbol ‘+’ to indicate what he refers to as ‘a grammatical word boundary within a
phonological word’ (2004: 30).
9.3.4 Summary
¹³ One might argue that prosodic independence is more a fact about the phonological or prosodic
component of grammar, rather than having anything to do with the morphology-syntax distinction.
However, the relevance of this criterion is evident in the problem of clitics. As Spencer & Luís (2012)
argue, the clitic can be understood as a ‘boundary category’—which calls into question the discreteness
of the components that it straddles (Croft 1991, 2001). From this perspective, a language with a greater
degree of isomorphism between phonological words and grammatical words would be understood as
having a higher degree of morphological autonomy.
(e.g., Anderson 2015a), our study illustrates that their status as necessarily mor-
phological cannot be assumed. This point is best exemplified by Movima, which
demonstrates relatively high EC scores in comparison to the other
languages in the sample, but coupled with a lower overall tendency for morphemes
to be dependent elements with respect to the wordhood criteria considered here,
particularly bound status. Similarly, while Jarawara comes closest to CAY in
displaying morphological autonomy via its relatively high correlations between
EC and the wordhood measures of contiguity and prosodic dependence, its
association between EC and bound status is weak and non-significant. The
Movima and Jarawara cases demonstrate that deviations from biuniqueness are
in principle orthogonal to the structural classification of form-meaning mappings
as either morphological or syntactic.
9.4 Conclusion
Our findings suggest that a relatively loose distinction between syntax and
morphology is an areal feature of western Amazonian languages (perhaps extend-
ing into neighbouring regions). In this chapter, we have presented evidence for
this view of Amazonian morphological profiles from two major angles. From the
perspective of system complexity, we addressed morphological behaviour across
four domains that show a tendency toward elaboration in western Amazonian
languages—nominal classification, tense, evidentiality, and valence-adjustment—
and for each explored the relationship between complexity and language contact
and change. Turning our focus to EC, we systematically evaluated aspects of this
domain against criteria associated with wordhood for a sample of eleven western
Amazonian languages, plus CAY as a point of contrast. In addition to showing
that the Amazonian languages all exhibit relatively low degrees of morphological
autonomy, our findings highlight the important point that factors associated with
morphological complexity are in fact not necessarily morphological: for two
Amazonian languages in our sample, high EC does not correlate strongly with
wordhood status. In future work, we hope to expand the typological scope of this
survey, in order to establish the degree to which Amazonian languages might
deviate from a more widely defined baseline relating to morphological autonomy,
and to arrive at a more precise understanding of the geographic distribution of
these patterns within and beyond South America.
The low degree of morphological autonomy in western Amazonia has import-
ant implications not only for our understanding of synchronic relationships
among linguistic subsystems, but also for our conception of diachronic processes
of contact and grammaticalization. As we have argued here, the porous nature of
the morphology-syntax distinction in Amazonian languages is associated with
other areal tendencies, such as productivity of compounding and incorporation,
Acknowledgements
Epps gratefully acknowledges funding from the University of Texas at Austin, as well as
earlier support from the National Science Foundation, Fulbright-Hays, and the Max
Planck Institute for Evolutionary Anthropology for work on Hup; Tallman thanks the
National Science Foundation and the Endangered Languages Documentation
Programme for supporting his work on Chácobo. We are grateful to the editors of this
volume for inviting us to contribute, and to Peter Arkadiev, Francesca di Garbo, Tony
Woodbury, and an anonymous reviewer for their suggestions.
III
THE ACQUISITIONAL PERSPECTIVE
10
Radical analyticity as a diagnostic of adult acquisition
John H. McWhorter
10.1 Introduction
By radical analyticity, I refer to the absence (or all but absence) of inflectional marking
indicated by affixation, tone, or vowel changes in quality or length. This must be clearly
distinguished from relative analyticity, which linguists often refer to as ‘analyticity’
in a kind of shorthand, as when Nurse (2007) refers to the amply inflected
Supyire (Gur, Niger-Congo) as ‘analytic’ in comparison to more heavily inflected
languages like those of Narrow Bantu.
My hypothesis distinguishes two kinds of language contact effects: transfer
and structural simplification (although the two are hardly mutually exclusive).
The role of transfer in language contact would seem self-evident and is richly
studied. However, the role of simplification in language contact has been studied
more in regard to pidgins and creoles than to less extremely simplified lan-
guages. Kusters (2003) and McWhorter (2007) were pioneering explorations of
this intermediate range in a crosslinguistic sense, continued by the now seminal
Trudgill (2011).
John H. McWhorter, Radical analyticity as a diagnostic of adult acquisition In: The Complexities of Morphology.
Edited by: Peter Arkadiev and Francesco Gardani, Oxford University Press (2020). © John H. McWhorter.
DOI: 10.1093/oso/9780198861287.003.0010
This presentation proposes that there are three main geographical clusters of
radically analytic languages with extensive adult acquisition in their histories.
The first comprises the few radically analytic Niger-Congo languages, such as
the Gbe languages, Yoruba, and Nupe (henceforth GYN), which my hypothesis
suggests would have arisen from an earlier Niger-Congo variety with ample
inflection. Yoruba’s near lack of inflectional morphology of any kind is
indicated here:
(1) Yoruba
Mo mú ìwé wá fún ẹ.
I take book come give you
‘I brought you a book.’ (Stahlke 1970: 63)
(3) Men ben suk no nggwe yo, men ben suk sino.
we do thing garden then we do thing together
‘If we do things at the garden, then we do them together.’ (Berry & Berry
1999: 23)
Finally, the Sinitic languages can be seen as revealing, in their radical analyti-
city, adult acquisition in their past (cf. McWhorter 2016). The radical analyticity
in language families neighbouring Sinitic, such as Hmong-Mien, Tai-Kadai, and
Mon-Khmer, is often treated as an areal ‘Sinosphere’ feature. I suggest that within
this language area, the radical analyticity, at least, traces to Sinitic. This recon-
struction is especially compelling given that Mon-Khmer languages are most
analytic where Chinese has had influence, and much less so where it has not,
among the Munda languages to the west and Aslian languages to the south. Under
this analysis, the question becomes how Sinitic itself reached a radically analytic
state in the first place, upon which I argue that adult acquisition is the most
plausible cause.
Under the analysis I will present, the GYN languages likely reached their state
as the result of waves of second-language acquisition as an earlier Niger-Congo
variety travelled southward towards the coast of the Bight of Benin. Various
isolates, as well as the Mande and Ijo groups Dimmendaal (2011) has argued
not to be members of Niger-Congo, are likely remnants of the original language
distribution in upper west Africa. The Flores languages were likely affected
by invasions from Sulawesi (or possibly the aboriginal population of Homo
floresiensis). Hull (1998) makes a strong case that the Timor languages were
deeply impacted by an invasion from the island of Ambon, while Paauw (2007)
suggested that contact with Austronesian as its speakers migrated eastward
affected the languages in Papua. The reason for the analyticity (and in general
the radically isolating structure) of Old Chinese remains unknown, although
DeLancey (2011) and McWhorter (2016: 81–2) offer suggestions, under an
analysis which, we must recall, posits the nature of Old Chinese as an indication
of adult acquisition yet to be identified.
Many linguists today are sceptical of the idea that the development
of even radical analyticity necessarily entails a loss of overall morphological
complexity. A guiding caveat is that what was once marked by an affix (or clitic)
can later be marked by a free morpheme, or even a process on some other level of
the grammar such as syntax (e.g., via word order). While this is true, any
assumption that this kind of replacement is somehow regular or even obligatory
in diachronic development is (i) logically unmotivated (i.e., for what reason or
purpose would grammars ‘compensate’ in this way towards an unspecified sine
qua non degree of structural complexity?); and (ii) empirically disproven (Shosted
2006 disproves that languages compensate for loss of complexity in one module by
gaining it in another).
Thus the development of radical analyticity is not a mere matter of a language
transforming its typology in a fashion independent of complexity. Rather, the
languages addressed in this chapter have lost, or all but lost, overt indication of
case marking and concord in any module. They do not mark these with free
morphemes. Moreover, while of course they have syntactic processes sensitive to
the distinction between, for example, subject and object, these are not as obliga-
torified (in the terminology of Lehmann 1985) as affixal markers of these categor-
ies tend to be, often qualifying more as pragmaticized structures rather than
grammaticalized ones.
Similarly, noun class markers can indeed be replaced with free morphemes, as in
Wolof (Loporcaro, Chapter 6, this volume), and the numeral classifiers of many
East and Southeast Asian languages can be seen as functionally equivalent to
noun class marking (Grinevald & Seifart 2004). However, the free morphemes in
question neither vary for case (much less according to declensional classes that
mark case in different ways) nor vary in form between modifiers and heads, as affixal
noun class marking often does (Russian iz krasivyx ženščin ‘of the beautiful women’).
Similarly, while radically analytic languages indicate inherent inflectional cat-
egories such as tense and number with free morphemes, these free morphemes do
not occur in paradigmatic variants independent of semantics, in the vein of verb
conjugational affix paradigms. Furthermore, affixation, together with the
morphophonemic processes it encourages and the further distortions into outright
irregularity that can follow, appears to condition much more irregularity (another
facet of complexity) than free morphemes do. The ‘irregular verb’ is quite rare in,
for example, Yoruba, Mandarin, and Rongga, where there are no affixal markers of
inherent inflection likely to drift first into morphophonemic subrules, then into
thorough irregularity subject to no rule, and finally into outright suppletion.
Radical analyticity, that is, is less a change of type than an unravelling. Radically
analytic languages remain vastly complex in countless ways, as all languages are.
However, their radical analyticity does entail a significant degree of relative
simplification.
That is, I propose that we would no more question whether Yoruba, Rongga, or
Mandarin have extensive adult acquisition in their histories than we would
question whether the difference between Haitian Creole French and French (loss
of grammatical gender, verbal inflection, and much else) was due to extensive
adult acquisition:
(4) a. French
Ils n’ont pas de ressources qui puissent
3. -have resource. can.3
leur permettre de résister à la famine.
3. allow of resist to . famine
b. Haitian Creole
Yo pa gen resous ki pou pèmètyo reziste anba
3 have resource can allow 3 resist under
grangou.
famine
‘They didn’t have the resources that would allow them to hold off famine.’
(Ludwig et al. 2001: 164)
One indication of radical analyticity’s roots in adult acquisition rather than ‘drift’
is that in languages that have reached such a state, one type of inflection is
eliminated entirely or virtually so, while another type is retained in the form of
free morphemes. This is typical of adult acquisition, but not of grammar-internal
change.
Booij (1993) distinguishes inherent inflection from contextual inflection.
Inherent inflection contributes meaning, driven by the speaker’s choice of what
they wish to communicate. It thus includes nominal number, tense, and aspect,
and is not required for syntactic grammaticality. This contrasts with contextual
inflection, which indicates features such as case and concord that are required by
the syntactic composition of the sentence, and thus serves a syntactic rather than a semantic function.
Crucially, in creoles, the lexifier language’s inherent inflection is typically
preserved to a considerable extent in the form of free morphemes, such as pre-
verbal tense and aspect particles (even when the substrate languages were syn-
thetic, as was the case with many creoles; cf. section 10.6 below). However,
contextual inflection is typically not replaced in this fashion (Plag 2008; Luís
2009). In this Haitian sentence, French’s past tense inflection is replaced by the
free form te, but the nouns baay ‘thing’ and moun ‘people’ are not marked for
grammatical gender as their French equivalents are, nor is grammatical gender
marked on Haitian’s definite articles; also, pronouns such as li (here, ‘it’) are not
marked for case:
1. French has lost Latin’s case inflections on nouns (first collapsing the oblique
cases into one and then losing even this distinction), but retains case
distinctions in pronouns, and concord within NP.
2. Pashto has lost much of the inflection in early Iranian languages, but
nevertheless retains ample case marking and concord.
3. Within Niger-Congo, while Wolof lacks the noun class prefix paradigm
typical of Bantu and even many of its own relatives within the Atlantic
subfamily, it has replaced them with postposed free morphemes (Torrence
2013: 16; cf. also Babou & Loporcaro 2016, Loporcaro, Chapter 6, this
volume), as shown in Table 10.1.
4. Modern Armenian dialects retain Indo-European case marking as well as
inflections distinguishing declensional classes; Albanian also retains case
marking as well as grammatical gender. Adult acquisition is not assumed to
have been significant in the history of either of these branches of
Indo-European, unlike in Romance and Germanic.
5. Georgian has retained the contextual inflection of Proto-Kartvelian over
several millennia.
These cases serve to illustrate, as Nichols (1992: 169) indicates, that ordinary
grammar-internal change poses no threat to contextual inflection. This contrasts
clearly with adult acquisition, which does indeed threaten it.
As such, the fact that radically analytic languages like the GYN ones and those
of central Flores like Rongga retain free morphemes in the function of inherent
inflection, but eschew contextual morphology completely, suggests that they have
roots in non-native acquisition. Under such acquisition, learners had access to
inherent rather than contextual morphology, because inherent inflection is more
like derivational morphology, that is, more ‘lexical’, and thus more salient to
the non-native learner. The same distinction is reflected in borrowing, as
described by Gardani (2008, 2012, 2018).
Thus, a Fongbe sentence like (6) contrasts with a Swahili one like (7) not only in
encoding aspect with a free morpheme, but also in lacking noun class morphology,
whether bound or free:
(6) Fongbe
Àvún ɔ́ nɔ hàn àɖú mɛ̀.
dog bite tooth person
‘The dog bites people.’ (Lefebvre & Brousseau 2002: 266)
(7) Swahili
U-levi hu-ondoa akili.
-drunkenness -remove .sense
‘Drunkenness takes away sense.’ (Perrott 1950: 56)
The languages of central Flores, such as Rongga, lack such prefixes as well as case
marking, but mark tense and aspect with free morphemes:
I will present three observations which, together, suggest that a language can only
lose all of its bound inflection via external intervention.
First, the emergence of new grammatical items via grammaticalization pro-
cesses, as well as reanalysis, is a constant in the life cycle of a language. More to the
point, there is no indication in the grammaticalization literature that the process
only operates in a subset of languages, or that the process is given to halting for
long periods. Grammaticalization can be taken as analogous to the movement of
bodies in physics: just as stasis is irregular under this formulation, we
can assume that in language change, the cessation of grammaticalization indicates
the death of the language. To wit, grammaticalization is unceasing.
Second, it follows from this point that there is no reason why, while a
language is losing bound inflection, new inflection should not be developing
simultaneously via grammaticalization. Put differently, diachronic theory knows
no reason for such a cessation. Moreover, empirical evidence demonstrates the
opposite. In Romance, the erosion of Latin’s
future marking suffixes was paralleled by the emergence of new ones from the
grammaticalization of habere ‘to have’ (as well as a new conditional marking
paradigm). Also, Italian developed new noun inflectional classes as original
ones were lost (Gardani 2013). In the Kartvelian language Svan, declension
marking suffixes proliferated amidst its loss of some of Common Kartvelian’s
original concord machinery (Harris 2004: 152–5). In Swahili, the past-marking prefix
li- grammaticalized from a locative verb as the Common Bantu equivalent
a- (Nurse 2008: 257) wore away (McWhorter 1994: 62–3). Affixes and paradigms
change function as often as they disappear (cf. Mukarovsky 1977: 32–5; Harris &
Campbell 1995; Good 2012a).
Third, following in turn from the above point, languages do not ‘cycle’ through
stages of radical analyticity followed by the development of new inflections which
eventually wear away such that the cycle begins again. That linguists sometimes
suppose so would seem to be due to a ‘folk’ interpretation of Hodge (1970) on
Egyptian, which actually showed a phase of relative analyticity, nothing
approaching radical analyticity. Meanwhile, no cycle through radical analyticity
has been demonstrated elsewhere. As Dahl (2004: 261–88) observes, the absence of
such a cycle has been explicitly noted for Afroasiatic, Uralic, and Altaic, and
specialists in language groups worldwide report no such cycles.
In sum, grammaticalization is analogous to the teeth of crocodiles and fish, which
are continually replaced throughout life. These animals never reach a
toothless stage. If one were encountered toothless, we would know that this was
the result of an external disruption. We would neither venture that it was a normal
development nor expect it to develop a mouthful of new teeth overnight.
Given these three observations, grammars such as those of Yoruba, Mandarin, and
Rongga become puzzles. Adult acquisition is the only mechanism that has been
empirically documented to shave away all or almost all of a language’s bound
inflection. No documents record the emergence of radical analyticity in East and
Southeast Asia, Indonesia, or West Africa: in all cases, the languages were
already radically analytic by the time they were committed to writing. I suggest
that a solution to the puzzle these other languages pose is that they, too, were
born of adult acquisition.
1. This account neglects the fact that inflection often includes vowel
changes within the root. A great deal of English’s inflectional morphology,
for example, is expressed through root vowel changes in the past forms of
verbs. Even if destressing the final syllable had denuded English of all
inflectional suffixes, the vowel changes in the strong verb roots would
have remained.
2. Lack of stress on the final syllable is not as regularly destructive of inflectional
morphology as is often supposed. Withdrawal of stress from the final syllable is
common in Indo-European, and usually the result has been languages that
have remained richly suffixed. Baltic and Slavic preserve a great deal of
Proto-Indo-European nominal morphology, and yet, for example, West
Slavic fixed its accent on the first syllable several centuries ago. Armenian
has fixed the accent on the penult, and yet retains a rich declensional system
and robust verbal inflection. A considerable degree of unaccented word-
final inflection has survived in Icelandic. In Celtic, when the accent was
retracted from endings, Goidelic (such as Irish and Scots Gaelic) retained
much verbal inflection and a degree of nominal. We must also consider the
Romance languages other than French, such as the Iberian languages and
Italian, in which unstressed inflectional suffixes are prolific and robust.
mean that it is incompatible with human cognition. In fact, many would
reconstruct language as having emerged uninflected (e.g., Comrie 1992). However,
the development of grammatical affixes is a slow process. As Dahl (2018) notes,
only a few inflectional affixes are documented to have emerged in Europe over the
past 2,000 years.
Thus, while a single instance of disruption, such as the immigration of a large
population of adult learners, can eliminate a language’s bound inflection (many
creoles) or vastly reduce it (English) in one stroke, the nature of
grammaticalization gives us no reason to suppose that new affixes would emerge
immediately. In fact, theory leads us to expect precisely the opposite.
Yet the radically analytic languages I have referred to do show signs of
grammaticalization, albeit of forms that are not yet bound. This, too, is what
we would expect, and its absence would be puzzling. In Fongbe, an imperfective
marker wɛ̀ has emerged, likely from a postposition, which in the modern language
could be treated as an inflection:
In Palu’e, in central Flores, a new first-person singular subject-marking clitic has
developed (Donohue 2009).
In Mandarin, since the seventh century (Li & Thompson 1976), the marker
bǎ has developed from a verb meaning ‘take’:
At a future stage of Mandarin, this marker, as well as other items that cleave
closely to roots, such as the nominalizer zi, could become bound morphemes.
Also, in Mandarin, the modern usage of numeral classifiers began developing in
the second century (Norman 1988: 115–17), and over time many classifiers have
become semantically unpredictable. Zhī is used with animals (although only some
of them) and birds, but also with eyes, hands, suitcases, and boats. Tiáo is
most immediately identified with long, thin things; less likely to come to mind
is that it is also used with proposal, voice, scheme, and ‘piece of news’. Bǎ is
used with things that one holds, such as knives and teapots, but also with chairs,
and even with the experience of aging (niánjì). Accordingly, Gao (1998) notes that
Mandarin speakers’ mental representation of classifiers is subdivided among
three classes of association: one transparent, one prototypical (metaphorically
extended in a synchronically processible fashion), and one arbitrary. This can be
Because linguists tend to be familiar with analyticity from the textbook example of
Chinese, as well as from any acquaintance with creole languages, it can seem that
analyticity is as likely a state in a language as any other.
However, this is not true, especially of radically analytic languages. Outside
of creole languages, where we take it as uncontroversial that adult learning was
the cause of the analyticity, radically analytic languages are actually rare.
Donohue & Denham (forthcoming), in their survey of the World Atlas of Language
Structures, find none outside of the areas I have cited.
If we treat Sinitic as about ten languages, Hmong-Mien as about twenty (a high
estimate according to most accounts), Tai-Kadai as about a hundred according to
Ethnologue, and count about 130 of the 168 Austroasiatic languages tabulated by
Ethnologue after subtracting Munda and Aslian (again, likely yielding a high
tally), then in East and Southeast Asia there are about 260 radically analytic
languages. Furthermore, the analyticity of these can be treated as tracing to the
analyticity of Chinese alone (McWhorter 2016). In the meantime, outside of these
languages, the tally of radically analytic languages in Africa, Flores, Timor, and the
island of New Guinea is about three dozen at most.
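Spelled out, the East and Southeast Asian tally above is a simple sum of the rough per-family estimates given in the text, each of which is flagged there as likely on the high side:

```latex
\underbrace{10}_{\text{Sinitic}}
+ \underbrace{20}_{\text{Hmong-Mien}}
+ \underbrace{100}_{\text{Tai-Kadai}}
+ \underbrace{130}_{\text{Austroasiatic (excl.\ Munda, Aslian)}}
= 260
```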
How often the linguist encounters sentences of Mandarin, together with how familiar
creole languages have become within the field, can distort our sense of the bigger
picture. No radically analytic indigenous language appears ever to have been
reported in:
A feature manifested in a mere few hundred of the world’s 7,000 older (as
opposed to creole) languages qualifies not as an ordinary result (‘Language
X simply lost its inflections’) but as an unusual circumstance. This is even more
the case if the feature manifests itself in only a few dozen of 7,000, which is the
result if we count the analyticity of the Sinosphere as an areal feature spread
from Chinese. It is clear that radical analyticity is not a state that a language
reaches easily and, in
fact, everything we know about how languages transform over time makes it
difficult to see how such a state could occur in stepwise fashion.
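As a rough worked proportion, the approximate figures from the text can be set against 7,000 older languages. The value 36 is an assumption standing in for "about three dozen at most" outside East and Southeast Asia; the second ratio corresponds to counting Sinospheric analyticity as one areal feature:

```latex
\frac{260 + 36}{7000} \approx 0.042 \;(\approx 4\%),
\qquad
\frac{36}{7000} \approx 0.005 \;(\approx 0.5\%)
```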
However, the origin of radical analyticity in acquisition by adults is richly
observed and thoroughly predictable. The scientific benefit of cordoning off creoles
into this origin scenario while assuming an unspecified different one for other
radically analytic languages is unclear. This bifurcated approach would be appro-
priate if there were evidence that large-scale acquisition of a language by adults
was impossible before the emergence of the transatlantic slave trade in the
fifteenth century. Obviously, however, there is not. Rather, we could treat
creoles as revealing to us how other languages reached a state which, according to
observable processes of stepwise grammar-internal evolution, is a mystery.
In short, the common idea that a given language simply ‘lost its inflection’ is
less coherent than it seems. Lack of stress on final syllables vastly undershoots
what would be necessary for a language to reach a radically analytic state, and
languages are not empirically recorded to undergo such a process short of
extensive acquisition by adults.
Finally, I will discuss two counterproposals to my reasoning.
The implication is that these creoles are analytic simply because their source
languages were: as Mufwene specifies (2009: 386), ‘The extent of morphological
complexity (in terms of range of distinctions) retained by a “contact language”
largely reflects the morphological structures of the target language and the par-
ticular languages that it came in contact with’.
However, an equal number of creoles are based on robustly inflected Iberian
languages, and/or have robustly inflected substrate languages such as Bantu, West
Atlantic, Nilo-Saharan, and even Austronesian languages, and yet are as analytic
as Sranan and Haitian. Linguists supporting the Feature Pool hypothesis have yet
to respond to such observations, such as the fact that Palenquero Creole Spanish
was created by Kikongo speakers, so that both of the languages in the ‘pool’ were
heavily inflected:
(14) Spanish
Est-a-s piedr-a-s grande-s y blanc-a-s
-- stone-- big- and white--
son las que hemos visto.
.3 .. have.1 see..
‘These great white stones are those which we have seen.’
Palenquero is nevertheless a highly analytic language. The facts are similar for all of the
Portuguese-based creoles, as well as Nubi Creole Arabic and the Aboriginal
English-based creoles of Australia. Chinook Jargon creolized as well, and despite
its source languages all being richly inflected, the creole version was as analytic as
Sranan and Haitian (Grant 1996). Adherents of the Feature Pool hypothesis have
not responded to such observations, and it is difficult to see how their framework
could accommodate them.
On the basis of the arguments presented here, therefore, I maintain that adult
acquisition does play a decisive and diagnostic role in creole genesis. My aim is
to extend this analysis to languages other than creoles.
languages, usually monosyllabic or at most bisyllabic, and the heavily affixed ones
in Narrow Bantu languages was the development of a phonological template
disallowing verbs of more than two syllables.
I suggest that extensive adult acquisition is a preferable explanation for both the
GYN verb’s lack of inflection and its phonotactics.
First, to my knowledge, the process Hyman describes has been proposed
nowhere else. In that light, Hyman’s account is more descriptive than explanatory.
That is: the literature on language change does not record it as a crosslinguistic
commonplace that languages permitting richly multisyllabic words gradually take
on a phonological ‘template’ limiting words to one or two syllables, with this treated
as an ordinary phonological development alongside processes such as nasalization
or resyllabification. I submit that an adult acquisition account has more explanatory
power.
Second, the templatic account contravenes the tendency for languages to resist
letting phonological processes eliminate grammatical morphemes. Hyman’s
account requires that speakers of a language ‘drifted’ into a disyllabic or
monosyllabic restriction even at the cost of eliminating grammatically crucial
affixes and replacing them with free morphemes, despite linguists’ well-known findings that
speakers resist phonological erosion when it threatens grammatical morphemes
(cf. Guy 1991; Carstairs-McCarthy 2010). Counterproposals to some reported
cases of this morphologically conditioned sound change (Hill 2014) have not
disproven the tendency itself.
Third, pidginization, specifically, explains the GYN situation at least as well as
a templatic account does, and better in that it proceeds from an empirically
observed phenomenon. To wit, the reason words might become radically, as
opposed to modestly, shorter in a language, to such a degree as to force a vast
restructuring of the grammatical system, is the language’s transformation by non-
native acquirers who are less likely to master lengthier words (as well as gram-
matical features). To the extent that the GYN languages restrict their verbs to a
maximum of two syllables, it is relevant that, as pidgin specialist Mühlhäusler
(1997: 140) puts it, ‘There appears to be a tendency in most stable Pidgins,
whatever their sub- and superstrata languages and whatever their jargon prede-
cessors, to favour open syllables and words of the canonical shape CVCV.’
10.8 Conclusion
My goal has been to demonstrate the arguments for, and advantages of, assuming
that radical analyticity traces solely to extensive adult acquisition. Under this
analysis, radical analyticity sparks a search for sociohistorical factors that would
entail such adult acquisition. The processes in question occurred before written
history (otherwise, they would long have been readily apparent) and therefore the
11
Different trajectories of morphological overspecification and irregularity under imperfect language learning
Aleksandrs Berdicevskis and Arturs Semenuks
11.1 Introduction
In the Introduction, Arkadiev & Gardani (Chapter 1, this volume: 5) list the four most
important open questions in the study of morphological complexity. In our view,
the first three questions become important and interesting only as a means to
answer the fourth question, which could be reworded as ‘How is morphological
complexity related to socioecological factors?’. The true value of this question lies
not so much in the fact that it relates morphology to extralinguistic characteristics
of the environment in which the language is spoken, as in the fact that it makes
complexity more than a mere parameter of crosslinguistic variation. Complexity
becomes a parameter involved in explanatory theories, allowing us to use it to
understand how language is structured. As was discussed in the Introduction, in
these theories complexity is a dependent variable, while socioecological parameters
are predictors. This means that if the theories are correct, we can better understand
why linguistic structures are distributed across languages the way they are, how the
processes of language change and social interaction are structured and work
together, and how language is organized and functions in the brain.
If not for this explanatory attempt, the first three questions from Arkadiev and
Gardani’s list (Can we define morphological complexity? Can we find an under-
standing of morphological complexity which would be applicable to all languages
and quantify this understanding? Can we compare and typologize languages in
terms of morphological complexity?) would, in our view, be better described as
brain teasers than as research avenues. Brain teasers are not at all useless, but
given how notoriously difficult it is to address these particular questions, one
could hardly expect the potential benefit of finding answers to outweigh the
required effort. Arkadiev and Gardani provide examples which
Aleksandrs Berdicevskis and Arturs Semenuks, Different trajectories of morphological overspecification and irregularity
under imperfect language learning In: The Complexities of Morphology. Edited by: Peter Arkadiev and Francesco Gardani,
Oxford University Press (2020). © Aleksandrs Berdicevskis and Arturs Semenuks.
DOI: 10.1093/oso/9780198861287.003.0011
Having this incentive to deal with all the questions from Arkadiev and Gardani’s
list, let us briefly outline what we mean by complexity in this chapter. As most
would agree, complexity is a multi-faceted phenomenon, and a language can be
complex in several different ways. This volume contains a variety of perspectives
on and approaches to complexity; see Dahl (Chapter 13, this volume) for an
overview. Trying to tackle all aspects of it simultaneously, however, is likely to
hinder progress rather than aid it. In order to usefully limit the scope of this
particular investigation, we will concentrate on two of the facets of complexity that
are, in our view, most crucial: overspecification and irregularity.
We define overspecification as overt and obligatory marking of a semantic
distinction that is not necessary for communication, following McWhorter’s
(2007: 21–8) understanding. The problem with this definition is that it is not at
all obvious what is necessary for communication. McWhorter makes inferences
about what is necessary by comparing the grammars of different languages. If
many of the world’s languages have neither subject-verb agreement nor any
apparent means to compensate for the lack of it, it seems reasonable to hypothe-
size that this feature is redundant and that languages that do possess it have
overspecified grammars.
A more direct way to find out what is necessary would be to run psycholin-
guistic experiments. MacWhinney et al. (1984), for instance, find that Italian
speakers do use the subject-verb agreement markers when establishing semantic
roles in a sentence. Note that this finding does not necessarily contradict the claim
that agreement is an instance of overspecification. That a feature is useful does not
mean it is necessary. Fortunately, in this chapter we will be dealing with an
artificial language where it is obvious what is overspecification and what is not
(see section 11.2).
Another facet of complexity we will discuss is irregularity (McWhorter 2007:
33–5). A linguistic system is irregular to the degree that it cannot be described by
exceptionless deterministic rules. Conversely, a system that can be so described
is predictable and consistent. Intuitively, it is usually quite obvious whether a linguistic
The IALL approach does have its limitations. The ability to observe language
change in the laboratory and to have full control over the environment comes
at the price of naturalness. The artificial languages are by necessity small and
relatively simple, and the learning usually takes less than one hour. Nonetheless,
while the experimental results should be treated with due caution, they can be a
valuable complement to typological surveys.
Suppose a typological study shows a correlation between the proportion of
non-native speakers and the absence of inflectional morphology, and suppose its data and
methods are completely reliable and trustworthy. Even in this best-case scenario,
we still do not know whether there really exists a causal link between non-native
acquisition and simplification (though we have good reasons to hypothesize that).
Moreover, we do not get an insight into how exactly adult acquisition facilitates
simplification (if it does). An iterated learning experiment can serve as a means
both to test the presence of the causal link and to identify a potential causal
mechanism.
Bentz & Winter (2013: 3–4) list three potential mechanisms of contact-induced
case loss (which can be generalized to other instances of morphological simplifi-
cation): imperfect acquisition by adult learners; the tendency of native speakers to
reduce morphosyntactic complexity of their speech when talking to foreigners; the
tendency of loan words to combine with more productive inflections, forcing the
least productive ones out (Barðdal & Kulikov 2009). The first mechanism on
this list appears to be the mainstream view in the typological, sociolinguistic, and
evolutionary literature (Nettle 2012). Indeed, in the literature on language
acquisition, there is a consensus that morphology is hard for non-native learners,
and that this holds for both production and perception, and for both tutored and
untutored learners (DeKeyser 2005: 6–7).
The main factor causing simplification is then presumed to lie in the
differences between native (child) and non-native (adult) language acquisition.
However, given this, another question arises: what aspects of these differences and
what conditions are necessary to cause simplification? How deep into these
differences do we have to delve in order to find a proper explanation? It is possible
that deep differences in cognitive biases between children and adults have to be
invoked, together with nuanced properties of social network structure or other
cognitive processes besides learning.
However, it is also possible that the answer lies on the surface: children can
(usually) master a language perfectly, while adults (usually) cannot (Bley-Vroman
1989: 43–4), and that by itself is enough to provoke simplification processes. It
seems safe to claim that imperfect learning is one of the driving forces behind
simplification. Can we go further and assume it is the only driving force? While
this hypothesis may be too simplistic, it is reasonable to start the search for
explanations and mechanisms by testing it.
In this chapter, we analyse the data from Berdicevskis & Semenuks (submitted),
one of the largest-scale (in terms of the number and the length of transmission
chains) IALL experiments so far that directly address linguistic complexity. In
Berdicevskis and Semenuks (submitted) we showed that imperfect language
learning by itself reduces overspecification. Here we focus on irregularity (see
section 11.1) and show that it behaves differently from overspecification. We also
investigate how the two facets of complexity interact with learnability of the language.
In section 11.2, we summarize the methodology of Berdicevskis & Semenuks
(submitted). In section 11.3 we describe the trajectory of overspecification, and in
section 11.4, that of irregularity. In section 11.5, we draw on the existing know-
ledge about language acquisition to explain the observed differences. In section
11.6, we conclude.
permanently interrupted condition chains, all participants after the first generation
received less time. A more detailed description is given in section 11.2.2.
Before that, however, we want to acknowledge that the IALL approach
lacks ecological validity due to a variety of both quantitative and qualitative
differences between language learning in an experimental setting and in the real
world. Because of that, the claims that one makes based only on IALL experiments
need to be tempered. Taken as a piece of a larger picture, however, they provide
important supporting evidence and new perspectives on the questions of interest.
In the context of the current study, in particular, although we are ultimately
interested in differences between native and non-native acquisition, we are not
contrasting adult and child learners in our experiment. However, since we are
interested in whether the difference between normal and imperfect learning by
itself can be a sufficient cause for morphological simplification, we consider our
model to possess the necessary external validity.
event: none
  singular: segN | fuvN
  plural: segN-lPL | fuvN-lPL
event: […]
  singular: […]
  plural: segN-lPL mV-oAGR | fuvN-lPL mV-iAGR
event: grow antlers
  singular: segN rV-oAGR | fuvN rV-iAGR
  plural: segN-lPL rV-oAGR | fuvN-lPL rV-iAGR
event: fly
  singular: segN bV-oAGR | fuvN bV-iAGR
  plural: segN-lPL bV-oAGR | fuvN-lPL bV-iAGR
Figure 11.1. The meaning space of the experimental languages with the
corresponding sentences from an example generation 0 language
Notes: Subscript N denotes noun stems, V = verb stems, PL = plural marker, AGR = agreement marker.
Morphemes are hyphenated and subscripts are provided for clarity’s sake. Glosses for the meanings of
the sentences are provided in parentheses.
Source: Adapted with permission from Berdicevskis & Semenuks (submitted).
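To make the structure of the meaning space concrete, here is a minimal Python sketch that regenerates the example generation 0 language of Figure 11.1 from its parts: the noun stems seg and fuv, the plural suffix -l, the agreement suffixes -o (with seg) and -i (with fuv), and the verb stems m, r, and b. It assumes, as the figure and the attested form segl ro suggest, that a sentence is the noun followed by the verb, separated by a space. The event name event_m is a hypothetical placeholder, because the figure's label for that event is not recoverable, and all identifiers are illustrative rather than the authors' code.

```python
from itertools import product

# Components of the example generation 0 language (Figure 11.1).
NOUNS = ("seg", "fuv")
PLURAL_SUFFIX = "l"
AGREEMENT_SUFFIX = {"seg": "o", "fuv": "i"}  # verb suffix agreeing with the noun

# Verb stem for each event; None means the picture shows no event (no verb).
# "event_m" is a hypothetical placeholder name for the event whose label is lost.
VERB_STEMS = {"none": None, "event_m": "m", "grow antlers": "r", "fly": "b"}


def sentence(noun: str, plural: bool, event: str) -> str:
    """Return the generation 0 sentence for one picture of the meaning space."""
    subject = noun + (PLURAL_SUFFIX if plural else "")
    stem = VERB_STEMS[event]
    if stem is None:
        return subject
    # The verb suffix depends only on which noun is the agent: this is the
    # redundant (overspecified) agreement rule.
    return f"{subject} {stem}{AGREEMENT_SUFFIX[noun]}"


# The full language: 2 nouns x 2 numbers x 4 events = 16 picture-sentence pairs.
LANGUAGE = {
    (noun, plural, event): sentence(noun, plural, event)
    for noun, plural, event in product(NOUNS, (False, True), VERB_STEMS)
}

if __name__ == "__main__":
    for (noun, plural, event), form in LANGUAGE.items():
        number = "plural" if plural else "singular"
        print(f"{noun:4} {number:9} {event:13} -> {form}")
```

Running it prints the 16 sentences; for instance, the plural seg picture for "grow antlers" yields segl ro, the form cited for generation 0 later in the chapter.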
After the initial introductory instructions, the participants learned the language in
the training stage of the experiment. The stage consisted of a number of training
blocks interspersed with interim test blocks. In the training blocks, the partici-
pants saw all of the pictures from the meaning space, which were presented in a
random order and accompanied by the sentence corresponding to the picture in
the participants’ input language. Each picture-sentence pair remained on the
screen for four seconds, after which the next pair appeared. In the interim test
blocks, the participants were shown eight pictures randomly selected from the
meaning space, one at a time, and were asked to type in the corresponding sentence
for each of them. The instructions preceding the training block prohibited the
participants from taking any notes during the experiment.
In order to model the difference between normal and imperfect learning, we
manipulated the number of training and interim test blocks that the participants
received. Normal learner generation participants received six training blocks,
whereas imperfect learner generation participants received three blocks. In order
to investigate how the number of imperfect learners in a population would affect
the tendency to eliminate morphological overspecification from the language
spoken by its members, we compared the development of generation 0 languages
in transmission chains in three different conditions: normal, temporarily inter-
rupted and permanently interrupted. Figure 11.2 illustrates the differences in the
numbers of normal and imperfect learner generations between the conditions.
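The design can be summarized in a short Python sketch that encodes only what the text states: ten learner generations per chain, six training blocks for normal (L) and three for imperfect (S) learners, every picture shown for four seconds per block, full learning time only for the first generation in the permanently interrupted condition, and 15 generation 0 languages, each run once per condition. The temporarily interrupted schedule is deliberately left out, since its exact pattern is given only in Figure 11.2. The meaning-space size of 16 is an assumption read off Figure 11.1, training time here excludes the interim test blocks, and all names are hypothetical.

```python
# Sketch of the transmission-chain design; names are illustrative, not the authors'.
GENERATIONS = 10                    # learner generations per chain (gen. 0 is the input)
TRAINING_BLOCKS = {"L": 6, "S": 3}  # L = long (normal), S = short (imperfect)
SECONDS_PER_PICTURE = 4
MEANING_SPACE_SIZE = 16             # assumption: 2 nouns x 2 numbers x 4 events
N_INITIAL_LANGUAGES = 15
CONDITIONS = ("normal", "temporarily interrupted", "permanently interrupted")


def chain_schedule(condition: str) -> list[str]:
    """Learning-time label (L or S) for generations 1..10 of a chain."""
    if condition == "normal":
        return ["L"] * GENERATIONS
    if condition == "permanently interrupted":
        # Only the first generation gets full learning time.
        return ["L"] + ["S"] * (GENERATIONS - 1)
    raise ValueError(f"schedule for {condition!r} is not reconstructed in this sketch")


def training_seconds(label: str) -> int:
    """Total picture-sentence exposure in training blocks (test blocks excluded)."""
    return TRAINING_BLOCKS[label] * MEANING_SPACE_SIZE * SECONDS_PER_PICTURE


N_CHAINS = N_INITIAL_LANGUAGES * len(CONDITIONS)

if __name__ == "__main__":
    print(chain_schedule("permanently interrupted"))
    print(training_seconds("L"), training_seconds("S"), N_CHAINS)
```

Under these assumptions a normal learner spends 384 seconds in training blocks and an imperfect learner 192, across 45 chains in total.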
Since the experiment contained 15 generation 0 languages, each of which was
used once in each of the three experimental conditions, and each of the
(a) Normal transmission: L L L L L L L L L L
Figure 11.2. A schematic representation of the chains in the normal (a), temporarily
interrupted (b), and permanently interrupted (c) conditions
Notes: L = generations with long (full) learning time, S = generations with reduced learning time
(imperfect learners). Arrows denote languages transmitted between generations. The very first arrows
denote pre-generated input languages for the first generation learners.
Source: Reproduced with permission from Berdicevskis & Semenuks (submitted).
The qualitative analysis of the final languages revealed a general trend for the
structure of the languages to deteriorate. Several reasons could have led to this,
most likely the underestimated difficulty of learning the language even with six
training blocks and the absence of true communicative pressures in the experi-
ment. However, it was not the case that this deterioration of structure was equally
likely to affect all aspects of the language and was equally likely to affect chains of
all three conditions. The agreement system was eroded by the participants much
more often than the other morphological aspects of the system, and this erosion of
structure was less frequent in the chains with normal transmission. Nonetheless, it
is important to keep in mind that the learning was not entirely perfect in the normal
condition either. Thus, when speaking about imperfect learning, we refer to the
degree of imperfect learning rather than to its presence or absence.
The system was fully preserved in just three languages, two of which were
generated in normal condition chains and one in a temporarily interrupted condi-
tion chain, and it was also almost fully preserved in three other languages, all of
which belonged to normal condition chains. An example of a final (generation 10)
language without any damage to the agreement system can be seen in Table 11.1.
As one can see, the last-generation language fully preserves the generation 0
agreement system: -o is consistently used to mark agreement with seg, and -i with
fuv. The only deviation from the generation 0 language structure is the loss of the
verb root in one of the sentences of the language (gen. 0 segl ro => gen. 10 segl o);
however, this change still preserved the correct agreement suffix.
The system disappeared, in turn, in fourteen languages: three in the normal
condition, five in the temporarily interrupted condition, and six in the
permanently interrupted condition. An example of a generation 10 language that
has fully lost the agreement system can be seen in Table 11.2.
As Table 11.2 shows, the generation 10 language in this chain has fully lost the
-i agreement pattern used for fuv in the generation 0 language and now uses -o in
all sentences, where it is more reasonably analysed as part of the verb stems
than as an agreement marker. One can also note that one of the noun stems changed
from fuv to fug, likely under the influence of seg.
In the other chains the initial agreement system substantially deteriorated, but
did leave some remnants in generation 10 languages, which made it difficult to
precisely characterize the level of system erosion in a qualitative yet objective way.
Nevertheless, taking the above findings together, we can see that chains including
imperfect learner generations were more likely to completely shed the agreement
system and less likely to preserve it.
The initial languages used in the study described above are perfectly regular.
While the rule ‘change the verb form depending on the agent’ is redundant, it is
still a rule, deterministic and exceptionless, as are the other properties of the initial
languages. Irregularity in this setup is equal to zero and thus cannot decrease. At
first glance, then, this setup cannot be used to test any hypotheses about the
potential role of imperfect learning in regularization. Manual inspection of the
evolving languages, however, quickly reveals noticeable changes in irregularity.
For the reasons outlined above, these changes always start with an increase in
irregularity, but some transmission chains show less trivial patterns later. In
this section, we present and analyse these patterns.
[Figure: overspecification (0–1) by generation (0–10), by transmission condition: normal, temporarily interrupted, permanently interrupted]
Irregularity emerges because participants fail to learn or to apply a certain rule.
Most often, this is the agreement rule, and we will focus solely on the irregularity
of agreement (as we did with overspecification in section 11.3).
While the participants often fail to learn the rule that governs the distribution of
the two agreement markers in the initial languages, they seldom ignore the fact
that there are two different markers. When a deterministic distribution rule is not
available to learners, they often resort to probability matching, that is, they reproduce
the variants with approximately the same relative frequency as in the input
(Hudson Kam & Newport 2009; Smith & Wonnacott 2010: 447, figure 1), but
without a clear consistent rule for when to use which variant. Figure 11.4 dem-
onstrates that our participants do the same with the agreement markers. In all
three conditions, the mean relative frequency of the round-animal marker does
not deviate much from the initial 50% (and, consequently, the same is true for the
second marker). The narrow standard-error bands show that the relative frequencies in
the individual chains do not deviate much from 50% either (i.e., the mean of 50% is
not the result of half the chains using one marker in 100% of cases and the other
half in 0% of cases).
Figure 11.4. Relative frequency of the agreement marker which denoted the round
animal in the initial language of the chain
[Plot: proportion of the round-animal agreement marker (0–1) by generation (0–10), by transmission condition]
Note: Shaded regions denote the standard error.
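Probability matching can be illustrated with a toy learner that keeps the input frequencies of the two markers but drops the rule linking each marker to an agent. This is a minimal sketch under our own assumptions (a six-slot input, independent sampling with Python's random module, a hypothetical function name), not the procedure or model used in the experiment.

```python
import random
from collections import Counter

def probability_matching_output(input_markers, n_outputs, rng):
    """Hypothetical learner: produce markers at their input relative frequencies,
    with no rule tying a marker to the agent (round vs square animal)."""
    counts = Counter(input_markers)
    markers = list(counts)
    weights = [counts[m] for m in markers]
    return rng.choices(markers, weights=weights, k=n_outputs)

# Initial language: -o and -i each fill half of the agreement slots.
rng = random.Random(0)
initial = ["o", "i"] * 3
output = probability_matching_output(initial, 10_000, rng)
share_o = output.count("o") / len(output)
print(share_o)  # roughly 0.5, although 'o' no longer tracks the agent
```

Because each marker is drawn independently of the agent, the overall 50/50 split survives while the agreement rule itself is lost, which is why frequency plots alone cannot reveal the erosion of agreement.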
Out of our forty-five chains, fourteen lose agreement completely (see section
11.3.1). Some of those replace one marker with the other completely, as in the
language in Table 11.2, but this happens in only three chains; in the remaining
chains, both markers get reanalysed as parts of the verb stems. The most common
scenario is represented in Table 11.3.
In the final language, all three verbs have only one form. Two (m- and b-)
preserve the original round-animal form with the -o ending, one (r-) preserves the
square-animal form (-i), thus making the relative frequencies of the markers 2/3
and 1/3, respectively. Out of the fourteen agreement-losing chains, nine arrive at
this frequency distribution at the end (counting both cases when it is the round-
animal marker that has frequency of 2/3 and when it is the square-animal one).
Analysis of all the individual chains confirms that while a few chains do replace
one marker by another completely or almost completely, most keep the propor-
tion not too far from 50% throughout all the generations.
It should be noted that in some chains, verb endings other than the original
two emerge. If we calculate the denominator of the ratio as the number of all verb
endings present, and not just the original two, the general picture does not change.
This measure is similar to Cuskley et al.’s (2015: 215) Sj measure, which quantifies the
variability of the sub-rules a participant uses in forming irregular past tenses.
Consider some examples. In the final language in Table 11.1 there is only one
pattern of agreement marking: {o, i}, and the same is true for the final language in
Table 11.2 (the same-symbol type). Both languages would get an irregularity score
of zero. So would the final language in Table 11.3: while there are two different
pairs {o, o} and {i, i}, they both fall under the same-symbol pattern. The language
in Table 11.4, however, is less regular. The strategy here is almost the same as in
Table 11.3, with two exceptions: the verb r- preserved the agent marking in the singular,
the verb b- in the plural. Hence, there are two patterns: the same-symbol pattern (four
cases) and {o, i} (two cases). The language gets an irregularity score of 0.36.
Irregularity depends on the number of patterns (the more patterns, the higher the
irregularity) and on the distribution of their probabilities (irregularity is highest if
all the patterns are equiprobable). Thus, the least irregular language (apart from
the fully regular one, which scores 0) would have two patterns, one of which
occurs only once, and would score 0.25. The most irregular language would have
six equiprobable patterns and score 1. However, this never happens in our data;
the highest observed score is 0.74 (which can be achieved, e.g., by having four patterns:
two that occur twice and two that occur once).
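The formula behind the irregularity score is not included in this excerpt. The sketch below assumes it is the Shannon entropy of the distribution of agreement patterns over the six verb-by-number cells, divided by its maximum, log2 6; this assumption reproduces every score quoted above (0, 0.25, 0.36, 0.74 and 1). Representing each cell as a pair of endings (with seg, with fuv) and the function names are our own illustrative choices.

```python
import math
from collections import Counter

def pattern_type(cell):
    """Map a cell (ending with one agent, ending with the other) to a pattern type.
    Pairs with identical endings, e.g. ('o', 'o') or ('i', 'i'), are collapsed
    into a single 'same-symbol' type."""
    a, b = cell
    return "same-symbol" if a == b else (a, b)

def irregularity(cells):
    """Assumed measure: Shannon entropy of pattern types, normalized by log2(cells).
    0 = a single pattern; 1 = every cell has its own pattern."""
    n = len(cells)
    if n < 2:
        return 0.0
    counts = Counter(pattern_type(c) for c in cells)
    h = sum((k / n) * math.log2(n / k) for k in counts.values())
    return h / math.log2(n)

# Table 11.4 as described: four same-symbol cells, two cells with {o, i}.
table_11_4 = [("o", "o")] * 3 + [("i", "i")] + [("o", "i")] * 2
print(round(irregularity(table_11_4), 2))  # 0.36
```

The same function gives 0 for the final languages of Tables 11.1 to 11.3, since each of them uses only one pattern type.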
As can be seen in Figure 11.5, unlike overspecification, irregularity in all three
conditions increases rather steeply at first, then starts oscillating around what
seems to be a plateau. In the permanently interrupted condition, there is a rather
steep decrease during the last two generations; in the other two conditions, the
peak of irregularity is also closer to the middle (i.e., there is a slight decrease
towards the end), but the difference is small. It is, however, interesting to
look at the individual trajectories of irregularity and compare them with those of
overspecification. We do that in Figure 11.6.
In most chains, the initial changes in overspecification and irregularity go in
exactly opposite directions, that is, the two measures seem to be almost perfectly
negatively correlated. Sometimes this trend continues through all the generations
(see, e.g., chains 2 and 13). If, however, overspecification falls below 0.5,
the measures become positively correlated and subsequently change almost in
unison (see, e.g., chains 22 and 30).
[Figure 11.5: irregularity (0–1) by generation (0–10), by transmission condition: normal, temporarily interrupted, permanently interrupted]
This behaviour largely follows from the definition of the measures. There are
two states where the system is fully regular: complete overspecification and
complete absence of overspecification. If the system is closer to the first state
(overspecification > 0.5), almost any mutation would change the two measures in
different directions (if agreement is lost in one case out of six, it is a decrease in
overspecification, but an increase in irregularity), but if it is closer to the second state
(overspecification < 0.5), then the measures usually change in the same direction
(e.g., if the two remnants of agreement in the language in Table 11.4 disappear,
both overspecification and irregularity would go down to zero).
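The argument about the two fully regular states can be checked with a toy sweep that removes agreement one cell at a time. The sketch rests on simplifying assumptions of our own: overspecification is taken to be the share of the six cells that still mark agreement, every cell that loses agreement joins the same-symbol pattern, and irregularity is the normalized pattern entropy assumed earlier.

```python
import math

N_CELLS = 6  # three verbs x two numbers

def normalized_entropy(counts):
    """Shannon entropy (bits) of a list of pattern counts, divided by log2 of their total."""
    n = sum(counts)
    h = sum((k / n) * math.log2(n / k) for k in counts if k > 0)
    return h / math.log2(n)

def sweep():
    """Remove agreement one cell at a time; return (kept, overspecification, irregularity)."""
    rows = []
    for kept in range(N_CELLS, -1, -1):
        # 'kept' cells still show the {o, i} pattern; all others are same-symbol.
        overspec = kept / N_CELLS
        irreg = normalized_entropy([kept, N_CELLS - kept])
        rows.append((kept, overspec, irreg))
    return rows

for kept, overspec, irreg in sweep():
    print(f"kept={kept}  overspecification={overspec:.2f}  irregularity={irreg:.2f}")
```

From six kept cells down to three, overspecification falls while irregularity rises (0, 0.25, 0.36, 0.39); below 0.5 overspecification, both fall together (0.36, 0.25, 0), matching the switch from negative to positive correlation described above.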
For every generation (apart from the final ones) we estimate how learnable its
language is. The measure of learnability is transmission fidelity, which is obtained
by comparing the language of generation n with the language of generation n+1,
calculating the normalized pairwise Levenshtein distance between the sentences
with the same meanings and subtracting it from 1. We found that, unlike in most
other IALL experiments, learnability clearly decreases over time. If, however, we
look at the learnability as a function of overspecification, we find that it follows a
Figure 11.6 Change of overspecification (solid line) and irregularity (dashed line) in verbal agreement over generations in individual chains:
(a) normal condition; (b) temporarily interrupted condition; (c) permanently interrupted condition
[Panels: one per chain, numbered 1–45; x-axis: generation (0–10); y-axis: 0–1]
[Figure: learnability as a function of irregularity (observed values 0, 0.25, 0.36, 0.39, 0.48, 0.56, 0.61, 0.69, 0.74), in two panels]
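Transmission fidelity, as described above, compares sentences with the same meaning across adjacent generations. The text does not say how the Levenshtein distance is normalized, so this sketch assumes division by the length of the longer sentence and an unweighted mean over meanings; the function names and the dictionary format (meaning key to sentence string) are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,               # delete ca
                            curr[j - 1] + 1,           # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute (0 if equal)
        prev = curr
    return prev[-1]

def transmission_fidelity(gen_n: dict, gen_next: dict) -> float:
    """1 minus the mean normalized distance between sentences with the same meaning.
    Assumes both generations map the same meaning keys to sentence strings."""
    distances = []
    for meaning, sentence in gen_n.items():
        other = gen_next[meaning]
        longest = max(len(sentence), len(other)) or 1  # avoid dividing by zero
        distances.append(levenshtein(sentence, other) / longest)
    return 1 - sum(distances) / len(distances)

# Illustration only: the gen. 0 and gen. 10 forms quoted for Table 11.1.
print(transmission_fidelity({"m1": "segl ro"}, {"m1": "segl o"}))  # 1 - 1/7
```

A single deleted character in a seven-character sentence lowers fidelity for that meaning to about 0.86; identical languages score exactly 1.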
11.5 Discussion
² We are grateful to Kenny Smith for bringing this difference to our attention.
language deviate from the regular state is higher in our case (note that Smith &
Wonnacott filtered away certain random mutations that they deemed irrelevant
before passing the input on to participants).
It should also be noted that while there is a clear difference between the
trajectory of overspecification in the normal condition and that in the interrupted
conditions, no such difference is found for irregularity. It may be that overspecification is
more sensitive to the degree of imperfect learning. The effect of irregularity on
learnability, however, seems to differ between normal and imperfect learners: as
irregularity increases, learnability decreases more steeply for the latter.
11.6 Conclusion
Acknowledgements
The experiment was funded by the Faculty of Humanities, Social Sciences and Education
at UiT, The Arctic University of Norway. AB was supported by the Norwegian
Research Council grant ‘Birds and Beasts’ (222506).
We are also grateful to the popular-science portal ‘Elementy’ and its editor-in-chief
Elena Martynova for advertising the experiment, to Tanja Russita for designing the
Epsilon fauna, to Kenny Smith and Peeter Tinits for commenting on an earlier version
of the chapter, and to Peter Arkadiev and Francesco Gardani for inviting us to
contribute to this volume.
12
Where is morphological complexity?
Marianne Mithun
12.1 Introduction
Two of the prefixes seen in (1) also occur here: ʔ- ‘fine finger action’ and m-
‘involving heat’.
Another widespread feature is the specification of location and direction.
Central Pomo examples of such suffixes are in (3) with the verb čá- ‘run’.
(Perfective aspect is marked here with the suffix -w after vowels and glottal stop
after obstruents. Imperfective aspect here is -an.)
All of the speakers represented here learned Central Pomo as their mother
tongue. All subsequently learned English as well, but they had varied histories. All
ultimately returned to live in Central Pomo communities.
The Central Pomo of Speakers 1 and 2 shows full fluency and articulateness.
That of the others provides some insight into potential effects of contact on
morphological complexity.
The nature of the morphological complexity with varying contact effects differs
in several ways, however. One is underspecification of certain distinctions regu-
larly mentioned by the most Pomo-dominant speakers. Speaker 4, for example,
who was away from the community for some time and did not use the language
often after her return, made the comment in (7).
She used reduplication to describe her staggering, but Speaker 2 later commented
as we were transcribing the recording that a more dominant Pomo speaker would
have used the verb in (8), specifying direction with the suffix -:ʔw- ‘around here
and there’.
Speaker 2 later commented that Speaker 4 made it sound like just one person is
smart. Speaker 4’s imperfective verb, a frequently-occurring one, was well-formed,
but inappropriately selected in this context.
The speech of less fluent speakers does show morphological complexity. On
another occasion, Speaker 4 offered the explanation in (10) with a complex verb.
This verb, too, shows some morphological complexity, and it is well-formed. But
the context is revealing. It was part of a conversation among Speakers 2, 3, and 4.
(The full conversation was in Central Pomo. Only the translation of Speaker 3’s
remarks is presented here for context.)
Slightly less fluent Speaker 3 used a well-formed passive verb, but inappropriately.
Here, too, the passive verb forms are well-formed but incompatible with mention
of the agent, the lady. Both (14) and (15) indicate that the speakers were selecting
pre-formed words, rather than constructing them online as they spoke. Example
(15) also reflects a smaller lexical inventory. As Speaker 2 later noted, a better
choice for the first verb would have been qó=di-w, and for the second qó=be-w.
Different verb roots are used for carrying a single round item (de-), multiple round
items (di-), and long items carried horizontally (be-).
Speaker 4 did use some passive verbs appropriately, as in (16).
Speaker 2 later commented that the first verb should have been čályohdu-n,
ending in the realis event dependency suffix, rather than the perfective -w,
yielding a sentence meaning ‘When I go to E’s house I talk Indian with her’.
Speaker 3’s prosody in (17) reflected this structure: she did not end the first clause
with a terminal fall in pitch or a significant pause.
Speaker 2 later commented that she herself would have used the dependent verb
form čʰó-w=da in the first clause, with the realis different event dependency
enclitic =da.
The puzzle remains as to why, in their joint conversation, fully fluent Speaker 2
and struggling, English-dominant Speaker 5 should show exactly the same
morpheme-per-word ratio: 1.44. Speaker 2 actually spoke more during the
conversation, with twice as many words (tokens). Significantly, she used many
more different words. Speaker 5 used just nine different verbs (types), all but four
of them repetitions of verbs just used by Speaker 2.
Overall, there are two main differences between the speech of fully fluent
Speakers 1 and 2 on the one hand, and more English-dominant Speakers 3, 4, 5,
and 6 on the other. The first is lexical knowledge. Fluent speakers who spend more
time in the language know more words and lexicalized constructions. They can
thus make finer semantic distinctions, as with verbs specifying means/manner,
location/direction, and different kinds of carrying, all seen here. The second is that
fluent speakers have more alternatives for shaping the flow of information, with
passives, clause linkers, and discourse particles. A significant difference between
the two groups is in fact the use of discourse particles, which convey such
distinctions as source and certainty of information (hearsay, inference, etc.),
contrast with expectation versus common knowledge, and much more. Fully
fluent speakers use substantially more such particles. Since the particles are
monomorphemic, their pervasiveness lowers the average number of morphemes
per word.
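The effect of monomorphemic particles on the average can be seen in a toy calculation. The sketch assumes tokens are already segmented, with '-' and '=' marking morpheme boundaries as in the forms cited above; the verb forms come from the text, but the 'PRT' tokens are invented stand-ins for discourse particles, not data from the recordings.

```python
import re

def morphemes(token: str) -> int:
    """Count morphemes in a segmented token, treating '-' and '=' as boundaries."""
    return len([m for m in re.split(r"[-=]", token) if m])

def morphemes_per_word(tokens) -> float:
    """Average number of morphemes per word token."""
    return sum(morphemes(t) for t in tokens) / len(tokens)

verbs = ["qó=di-w", "qó=be-w", "čʰó-w=da"]   # forms cited in the text
particles = ["PRT"] * 3                        # hypothetical monomorphemic particles
print(morphemes_per_word(verbs))              # 3.0
print(morphemes_per_word(verbs + particles))  # 2.0
```

Adding particles halves the ratio here even though no verb became simpler, so a fluent speaker's richer use of particles can mask greater morphological sophistication elsewhere.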
12.5 Mohawk
word, a verb complete with pronominal arguments and predicate, can constitute a
full sentence. The verb in (19), for example, would be a complete sentence on
its own.
There are three lexical categories in Mohawk, defined in terms of their internal
morphological structure: verbs, nouns, and particles. (Particles are monomor-
phemic, though they are sometimes compounded.) The morphological structures
are templatic; that of verbs is the most elaborate. The basic verb template is in
Figure 12.1.
Within the blocks of prepronominal prefixes and derivational suffixes there
are multiple slots. The prepronominal prefixes include a Contrastive, Coincident,
Partitive, Translocative, Factual, Duplicative, Irrealis, Future, Cislocative, and
Repetitive. The derivational suffixes include an Inchoative, Reversives,
Causatives, Instrumental Applicatives, Benefactive Applicatives, a Directional
Applicative, Distributives, Andatives, and Ambulatives. There are around sixty
pronominal prefixes, three aspect suffixes, and four final tense/mood suffixes.
Nearly all show phonologically and/or morphologically conditioned allomorphy.
As in many templatic systems, there are discontinuous dependencies among
morphemes. Certain verb roots require a Duplicative prefix (), for example. In
some cases, a semantic rationale can be discerned: the Duplicative can indicate
some kind of ‘two-ness’ or a change of state or position, though its occurrence is
lexicalized with each verb. In other cases, any semantic contribution has faded.
Some other verb roots require certain other prepronominal prefixes, in what are
now lexicalized combinations. Another discontinuous dependency holds between
inflectional prefixes and suffixes. The perfective aspect suffix, for example,
requires the presence of a Factual, Future, or Irrealis prepronominal prefix.
[Figure 12.1: basic verb template]
PREPRONOMINAL PREFIXES | PRONOMINAL PREFIXES | REFLEXIVE/MIDDLE | NOUN STEM | VERB ROOT | DERIVATIONAL SUFFIXES | ASPECT SUFFIXES | TENSE/MOOD
12.5.1 Inflection
All verbs must contain a verb stem, an inflectional pronominal prefix identifying
the core arguments of the clause, and an inflectional aspect suffix.
There are three sets of pronominal prefixes: grammatical Agents, grammatical
Patients, and transitives, which are Agent>Patient combinations. A transitive
prefix can be seen in (20).
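The template and the constraints stated above (an obligatory pronominal prefix, verb root and aspect suffix; a perfective suffix that requires a Factual, Future, or Irrealis prefix) can be expressed as a simple well-formedness check. This is an illustrative sketch only: words are lists of (slot, gloss label) pairs rather than real Mohawk morphemes, and the slot names, gloss labels and function are our own.

```python
SLOTS = [
    "prepronominal", "pronominal", "reflexive/middle", "noun stem",
    "verb root", "derivational", "aspect", "tense/mood",
]
REQUIRED = {"pronominal", "verb root", "aspect"}
PERFECTIVE_LICENSORS = {"Factual", "Future", "Irrealis"}

def check_verb(morphs):
    """Return a list of problems for a verb given as (slot, gloss) pairs."""
    problems = []
    # Slots must appear in template order (several morphemes may share a block).
    positions = [SLOTS.index(slot) for slot, _ in morphs]
    if positions != sorted(positions):
        problems.append("morphemes out of template order")
    present = {slot for slot, _ in morphs}
    for slot in sorted(REQUIRED - present):
        problems.append(f"missing obligatory {slot}")
    # Discontinuous dependency between an aspect suffix and a prepronominal prefix.
    glosses = {gloss for _, gloss in morphs}
    if "Perfective" in glosses and not glosses & PERFECTIVE_LICENSORS:
        problems.append("Perfective requires Factual, Future, or Irrealis")
    return problems

ok = [("prepronominal", "Factual"), ("pronominal", "1sg.Agent"),
      ("verb root", "ROOT"), ("aspect", "Perfective")]
bad = [("pronominal", "1sg.Agent"), ("verb root", "ROOT"), ("aspect", "Perfective")]
print(check_verb(ok))   # []
print(check_verb(bad))  # ['Perfective requires Factual, Future, or Irrealis']
```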
12.5.2 Derivation
had happened before, she included the Repetitive prefix sa- ‘again’ on the verb,
whether or not there was a separate particle á:re’ ‘again’.
Noun incorporation, the compounding of a noun stem with a verb stem to form a
new verb stem, is pervasive in Mohawk, but it is a word-formation device.
Speakers generally know which forms are part of the lexicon of the language
and which could be but are not. Awareness of neologisms depends on the
productivity of individual noun and verb stems. Some noun stems are never
incorporated, some are sometimes incorporated, some are often incorporated,
and some occur only incorporated. Similarly, some verb stems never occur with an
incorporated noun, some occur in a few combinations with nouns, and some in
many. New combinations with less productive stems are more often noticed than
those involving highly productive ones. Often the language provides alternatives
for packaging information: a noun may occur as an independent word or incorp-
orated into a verb. The density of incorporation for discourse purposes generally
varies across speakers with the degree of language use.
Examples of noun incorporation can be seen in (22), part of a conversation
between a grandmother and her granddaughter as they were making meat pies.
The grandmother was a highly skilled speaker, who learned English only after she
went to school. The granddaughter heard Mohawk as a child, but spent most of
her daily life in English. (The entire conversation was in Mohawk, but just the free
translation is given for the first few lines to provide context.)
The grandmother first introduced the fat with the independent noun kén:ie’. Once
it was an established referent, she incorporated it: enke-wist-á:wenhte’ ‘I will fat
melt’. (Incorporated noun stems are not always the same as their independent
At this point the pies were well-established referents, active in the consciousness of the
speakers, so it is no surprise that the noun stem -na’tar- ‘baked goods’ was incorpor-
ated. There was little need to highlight it. The verb ‘make’ is what could be called a
‘light verb’, not adding highly complex, new information. It is one that frequently
incorporates, and the combination ‘baked.goods-make’ = ‘bake’ is a common one.
As the conversation continued, the granddaughter introduced referents with
independent nouns, and the grandmother picked them up with incorporated
nouns.
During this conversation, the grandmother talked more than the granddaughter,
with five times as many words (tokens) overall. But as in Central Pomo, perhaps
surprisingly, their average morpheme per word ratios were nearly identical: the
grandmother’s speech averaged 2.4 morphemes per word, and the granddaughter’s
2.3. As in the Central Pomo conversations, the skilled speaker, the grandmother,
used many more discourse particles, which are monomorphemic. The granddaugh-
ter used some highly lexicalized polymorphemic words, words that she clearly
selected as familiar chunks, and fewer particles. The speech of the two was overall
quite different, in many of the same ways as in Central Pomo. Skilled speakers like
the grandmother here spend more of their time in the language and simply know
more words and more constructions. They have more lexical items to choose from,
including verb stems with incorporated nouns, and more choices among construc-
tions for shaping the flow of information.
12.5.4 Processing
then disappears, leaving vowel length. The masculine singular agent prefix thus has
the forms -hra-, ra-, -ha-, -hr-, r-, -hren-, -hen-, and -ren. And yet, children never
seem to make mistakes! At age 2 years and 10 months, one child easily asked Ka’ wà:re?
‘Where’s he going?’, never tripping over those complex phonological processes.
Of course this is no surprise. The child knew the full question as a chunk; he did
not manufacture the word from underlying forms of morphemes, then apply
multiple phonological processes to arrive at a surface form.
The second person singular agent pronominal prefix ‘you’ is basically s- word-
initially, -hs- word-internally, with epenthetic -e- before stems beginning in n, r, or
w and certain consonant clusters. The basic form of the perfective aspect suffix is
glottal stop ’, with epenthetic -e- after consonants. As noted above, stress is
penultimate, with stressed vowels lengthened in open syllables, but epenthetic
vowels do not enter into the determination of stress. The child cited above in (27)
similarly came out with the exclamation in (28) below easily and perfectly, despite
the complexity of the processes that would go into building it from underlying
forms then applying a sequence of phonological rules.
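To make concrete the kind of rule application a Constructive derivation would involve (and which, on the argument here, the child does not perform), the sketch below encodes only the allomorphy just described for the second person singular agent prefix. It is deliberately simplified: the 'certain consonant clusters' are not specified in the text and are left out, the epenthetic vowel is assumed to apply in both word positions, stress is not modelled, and the function name and stems are hypothetical.

```python
# Stem-initial segments that trigger the epenthetic -e- (as stated in the text).
EPENTHESIS_TRIGGERS = ("n", "r", "w")

def second_singular_agent(stem: str, word_initial: bool) -> str:
    """Choose a 2sg agent allomorph for a following stem.

    Simplified: consonant-cluster triggers are not modelled, and the
    epenthetic vowel is assumed to apply word-initially and word-internally."""
    prefix = "s" if word_initial else "hs"
    if stem.startswith(EPENTHESIS_TRIGGERS):
        prefix += "e"  # epenthetic -e- before n, r, w
    return prefix

for stem in ["aSTEM", "nSTEM"]:  # hypothetical stems, not real Mohawk
    print(stem, second_singular_agent(stem, True), second_singular_agent(stem, False))
```

Even this fragment needs positional context and a lookahead to the stem; a full derivation would add further prefixes, the aspect suffix with its own epenthesis, and stress assignment, all of which the child handled without error.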
Of course the child learners did not emerge instantaneously with Mohawk
equivalent to that of adults. About the time they were producing three-syllable
words, they began to discover morphology, usually with a few more frequent
pronominal prefixes. (These immediately precede the verb stem.) From this point
on, acquisition was governed more by morphology than phonology. As seen
earlier, Mohawk speakers generally specify the direction of directed motion,
with a Translocative prefix i-/ie-/ia-/ia’-/iaha- ‘thither’ or a Cislocative prefix
t-/te-/ta-/-onta-/-onte-/-ont- ‘hither’. A Translocative prefix was seen earlier in (19)
in the verb i-a’akiate’serehtínion’t ‘we pulled in there’. At 2 years and 10 months,
the child cited in (27) and (28) generally omitted the directional prefixes.
The adult version would include a negative prepronominal prefix on the verb:
te’-hr-ak-s > tè:raks. (Mohawk verbs must contain at least two syllables. If a verb
would otherwise be monosyllabic, a prothetic vowel i- is added at the beginning,
which bears stress.)
Overall, children learning Mohawk apparently first build vocabulary within
phonological length limitations, then begin to abstract morphological distinctions.
The fact that they so rarely make allomorphic errors suggests that they are not in
fact producing language by assembling underlying forms then applying sequences
of phonological rules. This accords well with the findings of Tomasello (2006 and
elsewhere) on acquisition: Children’s earliest acquisitions are concrete pieces of
language—words, complex expressions, or mixed constructions—because particu-
larly early in development they do not possess fully abstract categories and schemas.
Children construct these abstractions only gradually and in piecemeal fashion.
The strategies observed in children learning Mohawk as a first language differ
interestingly from those seen in adult second-language learners. In several of the
Mohawk communities, an extraordinary generation of young adults are develop-
ing an impressive competence in the language. They are becoming fluent, some-
thing that would have been considered an impossible dream only a short time ago.
These second-language learners show brilliant mastery of the complex morph-
ology, certainly making allomorphic mistakes along the way, but exquisitely tuned
in to the complexities involved. First-language speakers are delighted to see their
accomplishments, though, interestingly, they observe uniformly that these
second-language speakers continually create words that do not exist in the
language.
Work by Blevins (2006, 2013, 2016a, 2016b), Pirrelli et al. (2015), and others
draws a distinction between Constructive and Abstractive models of morphology.
In Constructive frameworks, surface word forms are described as built up from
subword units, either in terms of substance or rules. In Abstractive frameworks,
the basic units of the grammatical system are surface word forms. Roots, stems,
and exponents are understood as abstractions over a lexicon of word forms.
Constructive perspectives underlie efficient linguistic descriptions, the kinds of
descriptions that are useful for both linguists and adult second-language learners.
They also fit well with what is seen in adult acquisition of Mohawk as a second
language, in particular allomorphy mistakes and the overgeneration of derived
forms. Such descriptions also provide measures of objective complexity in the
sense described by Dahl and others cited earlier.
Abstractive perspectives are word based, though it is recognized that words can
be internally structured into recognizable constituent parts. Constituent parts are
analysed as emergent from independent principles of lexical organization,
whereby full lexical forms are redundantly stored and mutually related through
entailment relations (Matthews 1991; Corbett & Fraser 1993; Pirrelli 2000; Burzio
2004; Booij 2010; all cited in Pirrelli et al. 2015: 142). It is significant that the
processing of a given form may be facilitated or inhibited by other, related forms.
This makes sense only if the related forms are available as elements of a speaker’s
mental lexicon (Taft 1979; Baayen et al. 1997; Schreuder & Baayen 1997; Hay
2001; de Jong 2002; Moscoso del Prado Martin 2003, cited in Blevins 2006; Blevins
2006: 535).
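A toy contrast may help. In an Abstractive view the stored units are whole forms, and a stem is recovered from them rather than fed into them. The sketch below "abstracts" a stem as the longest common prefix of a stored paradigm and treats the remainders as exponents; the English forms and the prefix heuristic are our illustrative assumptions and are far simpler than the models cited above.

```python
import os

def abstract_stem_and_exponents(forms):
    """Recover a shared stem (longest common prefix) and residual exponents from stored forms."""
    stem = os.path.commonprefix(forms)
    return stem, [form[len(stem):] for form in forms]

stored = ["walk", "walks", "walked", "walking"]  # full forms, stored redundantly
print(abstract_stem_and_exponents(stored))  # ('walk', ['', 's', 'ed', 'ing'])
```

Here the stem and exponents are outputs of an analysis over the stored lexicon; in a Constructive description the same pieces would instead be the inputs from which the forms are built.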
Abstractive models accord well with differences between highly fluent first-
language speakers of Central Pomo and Mohawk on the one hand, and English-
dominant first-language speakers on the other. One of the most salient differences
is that while less fluent speakers do use highly synthetic words if they are very
frequent or primed, they have a smaller inventory of choices. Their more limited
lexical inventories can result in some inappropriate lexical selections, both
inflected and derived, and fewer options for shaping information flow.
Abstractive models also accord well with the strong sense among both Central
Pomo and Mohawk speakers of whether a possible word exists and exactly when it
is used. They are in line with the problems even skilled speakers sometimes face in
attempting to process speech from other dialects. They would predict the variable
ability of speakers to isolate morphemes which never occur on their own as
independent words, the existence of discontinuous dependencies, and speakers’
differential facility in producing inflectional paradigms. Speakers can certainly
extend patterns of inflection by analogy on occasion, but rarer forms and com-
binations present greater challenges. Abstractive perspectives also accord with the
IV
DISCUSSION
13
Morphological complexity and the minimum description length approach
Östen Dahl
13.1 Introduction
I will take as my point of departure the idea that the complexity of an entity can be
understood as the amount of information needed to recreate or specify it—which
in most cases can be identified with the length of the shortest possible complete
description of it. This is often referred to as ‘Kolmogorov complexity’ or ‘algorithmic
information content’, and it is most naturally applied to
strings (of symbols or characters): the Kolmogorov complexity of a string is the
inverse of its compressibility. Kolmogorov complexity is behind the ‘minimum
description length (MDL) principle’ which is said to build on the insight that ‘any
regularity in the data can be used to compress the data’ (Grünwald 2007), leading
to the conclusion that finding the best hypothesis for a given set of data means
finding the optimal way to compress it. As in Dahl (2004), I will here use the term
‘pattern’ rather than ‘regularity’, following Goertzel (1994) and Shalizi (2001).
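Kolmogorov complexity itself is uncomputable, but the length of a string after compression with an off-the-shelf compressor gives a crude upper bound that illustrates the idea that patterns make data compressible. The sketch uses Python's standard zlib module; the choice of compressor and the two test strings are our own assumptions.

```python
import random
import zlib

def compressed_length(text: str) -> int:
    """Bytes needed by zlib at maximum compression: a crude upper bound on description length."""
    return len(zlib.compress(text.encode("utf-8"), 9))

patterned = "ab" * 500                                          # 1,000 characters, one obvious pattern
rng = random.Random(0)
patternless = "".join(rng.choice("ab") for _ in range(1000))   # same length and alphabet

print(compressed_length(patterned), compressed_length(patternless))
```

Both strings have the same length and alphabet, but the patterned one compresses to a small fraction of the size of the random one, which is the MDL intuition that any regularity can be used to shorten a description.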
Östen Dahl, Morphological complexity and the minimum description length approach. In: The Complexities of Morphology.
Edited by: Peter Arkadiev and Francesco Gardani, Oxford University Press (2020). © Östen Dahl.
DOI: 10.1093/oso/9780198861287.003.0013
¹ The maxim (Sanskrit ardhamātrā lāghavena putrotsavaṃ manyante vaiyākaraṇāḥ) is often
quoted in the literature without a source. There is no known formulation of it from classical times.
In the form cited here, it derives from the treatise Paribhāṣenduśekhara by the nineteenth-century
Indian scholar Nagēśa or Nāgojībhaṭtạ , which was translated into English by the German Indologist
Franz Kielhorn (Kielhorn 1871). Incidentally, Occam’s Razor in its commonly cited form (entia non
sunt multiplicanda praeter necessitatem) is not found in the writings of William of Ockham but derives
from the seventeenth-century Irish philosopher John Punch.
When describing a set of objects, the most parsimonious way is often to separate
the information about their general properties from the information that is
specific to each member of the set. Descriptions of languages are traditionally
divided into ‘grammar’ and ‘lexicon’. So let’s see what that implies for
morphology.
We can see the morphological component of a grammar—or
‘morphology’ for short—as a tool for generating, from a lexicon, the set of all word
forms of a language, organized in paradigms. Another way of putting this in the
spirit of the MDL principle is to regard the morphological component as a way of
compressing the set of paradigms. The morphology and the lexicon together
constitute the description of the word forms. The lexicon will consist of a set of
entries, which I shall call ‘lexical specifications’, containing the information
needed by the morphology to generate one particular paradigm, that is, on the
one hand, one or more basic forms or principal parts; on the other, membership in
inflection classes, genders, etc. I shall here assume that the lexicon contains no
other information.
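The division of labour between morphology and lexicon can be sketched in a few lines of Python. Everything here is a toy assumption: the two-class suffix table and the four stems are loosely modelled on Italian nouns in -a/-e and -o/-i, and counting the characters of Python's repr() is a crude stand-in for description length, not a proposal for how description length should be measured.

```python
# Hypothetical morphology: one suffix per (inflection class, paradigm cell).
MORPHOLOGY = {
    1: {"sg": "a", "pl": "e"},
    2: {"sg": "o", "pl": "i"},
}

# Hypothetical lexicon: each lexical specification is (stem, inflection class).
LEXICON = [("cas", 1), ("lup", 2), ("ros", 1), ("libr", 2)]


def generate(lexicon, morphology):
    """Expand every lexical specification into a full paradigm (cell -> form)."""
    return {
        stem: {cell: stem + suffix for cell, suffix in morphology[cls].items()}
        for stem, cls in lexicon
    }


def description_length(obj) -> int:
    """Crude length measure: number of characters in the object's repr()."""
    return len(repr(obj))


paradigms = generate(LEXICON, MORPHOLOGY)
listed = description_length(paradigms)  # every word form spelled out
factored = description_length(MORPHOLOGY) + description_length(LEXICON)

print(paradigms["lup"])  # {'sg': 'lupo', 'pl': 'lupi'}
print("morphology + lexicon:", factored, "| full listing:", listed)
```

Even with four lexemes the factored description is shorter than the full listing, and the gap widens as the lexicon grows, since each new lexical specification costs less than a whole paradigm.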
The total length of the morphology and the lexicon is thus indicative of the
complexity of the paradigms. But in speaking of morphological complexity we
have to distinguish several components of this total. Primarily, the morphological
complexity of a language would be the complexity of the morphological component,
in the sense of the system that relates the lexicon to the set of paradigms.
To start with, although I have been speaking of a set of word forms and a set of
paradigms as if they were the same thing, the difference between them is crucial.
Think of the paradigm as a table. Since there are a number of ways any given set of
word forms can be organized into a table, and the choice between them is
significant, it follows that there is information hidden in the organization of the
paradigm and consequently the paradigm is more complex than the set of word
forms. Furthermore, the paradigms belonging to lexical items of one part of
speech usually share a common structure. But this structure can be studied
independently of the system that relates paradigms and lexical specifications. So
paradigm organization can be seen as a component of its own.
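One rough way to put a number on the information hidden in paradigm organization, under the simplifying assumption (not made in the chapter) that every assignment of the given word forms to the cells of the table is equally likely, is to count the distinct arrangements and take the base-2 logarithm. Identical forms (syncretism) reduce the count, since swapping them changes nothing. The function name arrangement_bits is hypothetical.

```python
from collections import Counter
from math import factorial, log2


def arrangement_bits(forms) -> float:
    """Bits needed to single out one arrangement of `forms` over len(forms) cells.

    A multiset of n forms, with k1, k2, ... copies of each identical form,
    has n! / (k1! * k2! * ...) distinct arrangements.
    """
    arrangements = factorial(len(forms))
    for count in Counter(forms).values():
        arrangements //= factorial(count)
    return log2(arrangements)


print(arrangement_bits(["casa", "case"]))  # 1.0: two distinct forms, two cells
print(arrangement_bits(["case", "case"]))  # 0.0: syncretic forms, nothing to choose
twelve = [f"form{i}" for i in range(12)]   # e.g. 6 cases x 2 numbers, all distinct
print(round(arrangement_bits(twelve), 1))  # 28.8
```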
Another problem is to what extent the lexicon is relevant to the question of
morphological complexity. On the one hand, to the extent that the morphological
component does not treat all lexical items equally, the lexicon will have to contain
information that makes that possible. On the other hand, if items are added to or
removed from the lexicon, the total length of the lexicon will change—and it
seems counter-intuitive that these changes should always influence the morpho-
logical complexity of a language. For this reason, it is rather the information
contained in the individual lexical specifications that is of interest.
As noted above, the chapters in the volume differ in the notions of complexity that
are invoked. But they also differ in the extent to which they place these notions
within explicit frameworks.
The minimum description length approach to complexity is mentioned in the
chapters by Di Garbo, Chapter 9; Loporcaro, Chapter 6; Mithun, Chapter 12; and
Nichols, Chapter 7. But more salient in the volume is the approach of Ackerman &
Malouf (2013). Several chapters (Henri et al., Chapter 5; Parker and Sims,
Chapter 2; Mansfield and Nordlinger, Chapter 3; and Meakins and Wilmoth,
Chapter 4) draw on their distinction between two ‘dimensions in the analysis of
morphological complexity’, viz. ‘enumerative complexity’ or ‘E-complexity’ and
‘integrative complexity’ or ‘I-complexity’. This motivates discussing these con-
cepts in some detail, which I will do below.
A superficially somewhat similar dichotomy is that made by Nichols between
‘inventory complexity (IC)’ and ‘canonical complexity (CC)’, but while ‘IC’ and
‘enumerative complexity’ are fairly closely related, the second members of the
pairs bear little resemblance to each other. (There is a potential source of
In the introduction to Miestamo et al. (2008), the volume editors apply the
analysis of the notion of complexity in Rescher (1998) to linguistic complexity.
For Rescher, description length (in his terms, ‘descriptive complexity’) is just one
of several ‘modes of complexity’. Another is ‘compositional complexity’, which
relates to the constituent elements of a system and is subdivided into two
submodes: ‘constitutional complexity’—the number of elements, and ‘taxonomic
complexity’—their variety. Miestamo et al. (2008: viii) exemplify the former with
the number of ‘phonemes, inflectional morphemes, derivational morphemes,
lexemes’, and the latter with the variety of ‘phoneme types, secondary articula-
tions, parts-of-speech, tense-mood-aspect categories, phrase types’, etc.
Although there are no references to Rescher’s taxonomy (but see the editors’
Introduction, Chapter 1), notions close to ‘constitutional complexity’ show up in a
number of ways in the chapters of the volume, notably as one of the poles of the
dichotomies of Nichols and Ackerman & Malouf.
Nichols’s ‘IC’ is based on ‘assessing the number of elements in an inventory or
values in a system’, exemplified by ‘the number of phonemes, genders, tenses,
derivation types, alignments, word orders’. She identifies it with Miestamo et al.’s
(and thus indirectly Rescher’s) notion of ‘taxonomic complexity’. It may be noted
that some of the items in her list seem rather to belong to ‘constitutional
complexity’ in Rescher’s schema, illustrating that the borderline is somewhat
fuzzy.
Nichols also quotes the term ‘resources’ from Dahl (2004) in this context, which
is slightly problematic. In my book, I opposed ‘resources’ and ‘regulations’, saying
that intuitively, ‘resources determine what is possible or permitted, regulations
what is obligatory’, and noting that ‘the distinction is reminiscent of that between
grammar and lexicon but does not coincide with it’ (Dahl 2004: 41). The basic idea
was that resources are things that one can more or less freely choose from. The
primary examples are lexical items. As the quotation suggests, I did not primarily
think of the notion as applying to grammar. Many of the phenomena Nichols
enumerates are not freely chosen by speakers but rather show up as a consequence
of forced choices due to what I called regulations.
Later in the book (Dahl 2004: 42) I say that if one wants to characterize a
language with respect to its ‘resources’, the parameter that comes first to mind is
‘richness’. Dressler (2011), who is quoted by Loporcaro (Chapter 6, this volume),
also uses this term, characterizing the size of paradigms as a criterion of ‘richness’
rather than of ‘complexity’. However, Dressler defines ‘richness’ as ‘the amount of
productive morphological patterns’, associating complexity with unproductive
patterns, so his notion is different from mine (and apparently also from
Nichols’s ‘IC’).
Let me now turn to Ackerman & Malouf’s notion of E-complexity. It is not
quite clear what is supposed to go into it. The abstract says that E-complexity
‘reflects the number of morphosyntactic distinctions that languages make
and the strategies employed to encode them, concerning either the internal
composition of words or the arrangement of classes of words into inflection
classes’ (Ackerman & Malouf 2013: 429). The definition in the main text (2013:
433) is formulated in a somewhat roundabout way. The authors first note that
‘descriptive linguists often comprehensively catalogue the array of morphological
markers and patterns in a given language or languages’, making possible on the
one hand typological investigations of the types of information encoded in words
and taxonomies of formal strategies for encoding this information, on the other,
inferences by theoretical linguists about the bounds on possible word structures in
natural languages. ‘We refer to patterns found via this general cataloguing of
properties and their surface exponence for words in all of their variety as the
enumerative complexity or E-complexity of a morphological system.’ What is
unclear here is whether E-complexity is basically a count of distinctions and
patterns/strategies or something more. Later formulations in the paper do not
really solve this problem. On p. 434, we learn that ‘[on]e salient dimension of
E-complexity is the number and nature of inflection classes in a language’, with the
word ‘nature’ suggesting that it is not only a question of counting. On the other
hand, on p. 437, it is said that paradigm-based models ‘reflect a measure of
E-complexity’ which is specified as ‘a greater number of possible exponents, inflec-
tional classes, and principal parts’. Likewise, on p. 451, ‘the same E-complexity’ is
equated with ‘the same number of declensions, paradigm cells, and allomorphs’, and
in a later work (Ackerman & Malouf 2016: 125), E-complexity is said to increase
with ‘(i) larger numbers of morphosyntactic properties a language contains,
(ii) greater numbers of allomorphic variants it uses to encode them, and
(iii) more inflectional classes that lexemes can be distributed over’.
The interpretation of enumerative complexity as being simply an inventory
count is clearly the one chosen by Henri et al. (Chapter 5, this volume): ‘a
linguistic phenomenon’s enumerative complexity depends on how many categor-
ies (of whatever type) it employs’ (p. 106). They seem to have the same thing in
mind when saying earlier (p. 106) that ‘[m]orphological complexity is often
equated with numerousness—of morphs, categories, processes, or paradigm
cells’. They also refer to Stump (2017), who is quite explicit on this point when
he describes the distinction introduced by Ackerman & Malouf: ‘a linguistic
phenomenon’s enumerative complexity depends on how many categories (of
whatever type) it employs . . . ’. Parker & Sims (Chapter 2, this volume) also refer
to enumerative complexity as ‘the number of inflection classes or the size of
paradigms’.
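On the inventory-count reading, E-complexity boils down to a few tallies over a paradigm table. The Python sketch below computes such tallies for a hypothetical three-class table; the choice of tallies is an assumption, loosely following the three factors listed by Ackerman & Malouf (2016: 125).

```python
# Hypothetical paradigm table: inflection class -> {cell: exponent}.
TABLE = {
    1: {"sg": "-a", "pl": "-e"},
    2: {"sg": "-o", "pl": "-i"},
    3: {"sg": "-a", "pl": "-i"},
}


def enumerative_counts(table):
    """Inventory counts of the kind E-complexity is often read as."""
    rows = list(table.values())
    cells = sorted({cell for row in rows for cell in row})
    return {
        "cells": len(cells),
        "inflection_classes": len(rows),
        # number of different exponents (allomorphs) found in each cell
        "exponents_per_cell": {c: len({row[c] for row in rows}) for c in cells},
        "distinct_exponents": len({exp for row in rows for exp in row.values()}),
    }


print(enumerative_counts(TABLE))
```

Nothing in these counts records which class takes which exponent, so the paradigms cannot be regenerated from them.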
Ackerman & Malouf (2013), like the chapters in this volume that quote it,
tend to give the impression that the two notions of I-complexity and E-complexity
more or less exhaust the possible approaches to morphological complexity, and
that earlier work has been dominated by E-complexity. Thus, Ackerman & Malouf
say in a footnote (2013: 434): ‘For examples of efforts to identify and quantify E-
complexity, see, for example, Juola 1998, 2007, Sampson et al. 2010, Moscoso del
Prado Martín 2011.’ But the works listed here represent a variety of approaches to
linguistic complexity, including MDL-based ones. And it should be clear that E-
complexity cannot be identified with description length. A list of morphosyntactic
categories, inflection classes, and allomorphs is not yet a morphological descrip-
tion of a language.
cell’. The central result of the study is that high E-complexity of paradigmatic
systems is possible as long as I-complexity, in the form of the average conditional
entropy of paradigms, is low.
The definition of average conditional entropy presupposes that the set of
possible realizations of the cell to be guessed is known and finite; otherwise the
entropy cannot be calculated. This condition is not fulfilled when some of the
possible realizations are suppletive. That is, the notion of conditional entropy
cannot be applied to cases such as English go and went. It may perhaps be argued
that those are precisely the situations where you have to know the paradigm in
advance, so guessing is not possible anyway. But it restricts the applicability of the
notion to some extent.
The notions of ‘conditional entropy’ and ‘average conditional entropy’, as
applied to inflection templates, have some interesting mathematical properties
not discussed by Ackerman & Malouf. ‘Average conditional entropy’ involves
bidirectional predictability relations between cells in a paradigm template. These
turn out to be ‘entangled’ in that there is an upper bound on the sum of two
symmetric entropies, with the consequence that the average conditional
entropy of a paradigm can never exceed 50% of what Ackerman & Malouf call its
‘declension entropy’, that is, the surprisal of the inflection class membership of a
lexeme under the assumption that each inflection class is equally probable. I have
no formal proof of this claim,² but I have tested it for all possible value
combinations for sets of classes with sizes up to eight, where I had to stop due to
limitations on computer capacity. Concretely, this means that in a system with
eight declension classes and declension entropy equal to 3 (like the Greek one
exemplified in Ackerman & Malouf 2013), the average conditional entropy could
not be higher than 1.5. This fact should be taken into account when assessing the
actual average conditional entropies calculated by Ackerman & Malouf. For example,
when they (p. 442) say that the overall average conditional entropy for the eight Greek
declensions is 0.644 bits, which is equal ‘to a choice among . . . 1.56 equally likely
declensions’ or ‘slightly more than one’ declension, this is misleading in the sense
that no system with two declensions could ever have an average conditional
entropy higher than 0.5. Thus, if the entropy is 0.644, the system must have at
least three declensions. The low values for average conditional entropy found by
Ackerman & Malouf thus depend at least partly on mathematical necessity.
² But consider the simplest case: a system with two inflection classes and two inflectional forms, as
illustrated in Table 13.1. There are four logical possibilities in such a 2×2 matrix: (1) identity between
the rows in both columns; (2) identity in column 1 and no identity in column 2; (3) no identity in
column 1 but identity in column 2; (4) no identity in either column. Case (1) can be disregarded since
it would mean there is really only one inflection class; the entropy is zero. In case (4), one form
always gives full information about the other, so the entropy is zero. In case (2), the cells in column 1
do not say anything about the cells in column 2, so the entropy for each cell is equal to the choice
between two items, that is 1 (= one bit). But since there is no choice in column 1, the entropy in the
opposite direction is 0, which gives an average of 0.5. Case (3) is analogous, but with the columns
swapped; the average will again be 0.5. Note further that adding a third column will not change
anything, for the following reason. Guessing is always from one column to another, so we are always
dealing with pairs of columns, in which guessing can go in either direction. While a 2×2 matrix
involves just one such pair, a 3×2 matrix with columns ABC entails three pairs of columns: AB, AC,
BC. That makes such a matrix equivalent to three 2×2 matrices, and since, as we saw, a 2×2 matrix
has a maximum average guessing entropy of 0.5, the value for the 3×2 matrix is the same. Adding
further columns gives an analogous result. Things get more complicated when rows are added, but my
computer simulation strongly suggests that the relation between declension entropy and maximum
average conditional entropy is constant.
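The bound and the arithmetic built on it can be checked mechanically for small systems. The Python sketch below is an illustration under stated assumptions, not a reconstruction of the simulation mentioned above: inflection classes are equally probable, the conditional entropy of cell j given cell i is taken to be H(i, j) - H(i), the average runs over all ordered pairs of distinct cells, and only two-cell systems with two to four classes are enumerated (exhaustive enumeration grows far too fast to reach eight here). It then recomputes the figures for eight classes and for the Greek value of 0.644 bits.

```python
from collections import Counter
from itertools import permutations, product
from math import log2


def entropy(values) -> float:
    """Shannon entropy in bits of a list of outcomes, each list item equally likely."""
    n = len(values)
    return -sum(c / n * log2(c / n) for c in Counter(values).values())


def average_conditional_entropy(table) -> float:
    """Mean of H(cell_j | cell_i) over all ordered pairs of distinct cells.

    `table` is a list of rows, one per inflection class, each a tuple of exponents.
    """
    pairs = list(permutations(range(len(table[0])), 2))
    total = 0.0
    for i, j in pairs:
        joint = entropy([(row[i], row[j]) for row in table])
        total += joint - entropy([row[i] for row in table])
    return total / len(pairs)


def max_average_entropy(n_classes: int, n_cells: int = 2) -> float:
    """Largest average conditional entropy over all systems of the given size.

    Exponents are abstract labels 0..n_classes-1. Systems in which two classes
    are identical in every cell are skipped (they are really one class).
    """
    columns = list(product(range(n_classes), repeat=n_classes))
    best = 0.0
    for cols in product(columns, repeat=n_cells):
        table = list(zip(*cols))  # one row of exponents per class
        if len(set(table)) < n_classes:
            continue
        best = max(best, average_conditional_entropy(table))
    return best


# Exhaustive check of the bound for two-cell systems with 2-4 classes.
for n in (2, 3, 4):
    print(n, "classes: max", round(max_average_entropy(n), 3),
          "bound", round(0.5 * log2(n), 3))

# Arithmetic from the text.
print(log2(8), 0.5 * log2(8))  # 3.0 1.5: declension entropy and bound for 8 classes
print(round(2 ** 0.644, 2))    # 1.56: 'equally likely declensions' for 0.644 bits

# Smallest number of classes whose bound reaches 0.644 bits.
min_classes = 2
while 0.5 * log2(min_classes) < 0.644:
    min_classes += 1
print(min_classes)             # 3
```

In these small cases the maximum found by enumeration coincides with half the declension entropy, which is consistent with the claim in the text.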
It appears that integrative complexity, in the form of conditional entropy,
primarily depends on two factors: one is the extent to which forms ‘wear their
inflection class on their sleeve’, that is, are informative about their own inflectional
class; the other is the extent to which the distributions of allomorphs—or, more
generally, exponents—differ between forms and thus, in the words of Parker &
Sims (Chapter 2, this volume), increase the ‘extent to which the system inhibits
motivated inferences about the realized form of a lexeme, given one or more other
realized forms of the same lexeme’.
The dependence of conditional entropy on these factors means that its relationship
to minimum description length complexity is not straightforward. The
first factor—the informativity of a form about its inflection class membership—
means that there is an inverse relation between the diversity of forms in the
predicting cells and integrative complexity. Thus, lack of overt marking, which
will in general decrease description length, can actually increase integrative
complexity. Consider the hypothetical noun inflection templates in Table 13.1,
with the rows representing two inflectional classes.
The templates can be generated by the rules beneath the table.
Table 13.1 Hypothetical noun inflection templates
          (a)                 (b)
        sg     pl           sg     pl
  1     -∅     -e     1     -a     -e
  2     -∅     -i     2     -o     -i
Rules: (a) if plural then (if 1 then -e else -i) else -∅;
(b) if plural then (if 1 then -e else -i) else (if 1 then -a else -o).
Thus, (b) has a greater description length than (a). However, in (b), the singular
and plural markers are wholly predictable from each other, so the integrative
complexity is 0. In (a), on the other hand, the plural form cannot be determined
from the singular, which results in an average integrative complexity of 0.5 for the
whole template, the theoretical maximum for a system with two inflection classes.
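The entropies for Table 13.1 can be recomputed with a short Python sketch. It assumes, as in Ackerman & Malouf's approach, that both inflection classes are equally probable and that the conditional entropy of one cell given another is H(given, target) - H(given); the empty suffix is written "-∅".

```python
from collections import Counter
from math import log2


def entropy(values) -> float:
    """Shannon entropy in bits, treating each row (inflection class) as equally likely."""
    n = len(values)
    return -sum(c / n * log2(c / n) for c in Counter(values).values())


def conditional_entropy(table, given: int, target: int) -> float:
    """H(target | given) = H(given, target) - H(given), computed over the rows."""
    joint = entropy([(row[given], row[target]) for row in table])
    return joint - entropy([row[given] for row in table])


# Rows are inflection classes 1 and 2; columns are (sg, pl).
template_a = [("-∅", "-e"), ("-∅", "-i")]
template_b = [("-a", "-e"), ("-o", "-i")]

averages = {}
for name, table in (("a", template_a), ("b", template_b)):
    pl_given_sg = conditional_entropy(table, 0, 1)
    sg_given_pl = conditional_entropy(table, 1, 0)
    averages[name] = (pl_given_sg + sg_given_pl) / 2
    print(f"({name}) H(pl|sg)={pl_given_sg:.1f} H(sg|pl)={sg_given_pl:.1f} "
          f"average={averages[name]:.1f}")
```

For (a) this gives 1 bit from singular to plural, 0 in the other direction and an average of 0.5; for (b) all three values are 0.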
The second factor—the degree to which allomorph distributions differ—means
that a high average number of allomorphs—which would presumably lead to a
higher description length—does not necessarily lead to a high integrative
complexity. Thus, paradoxically, the situation we saw in (b), where all cells of the
paradigm are different from each other, will, irrespective of the size of the
paradigm, always mean that the integrative complexity is zero. But this is not so
strange if we realize that what integrative complexity really measures is the
amount of discordance between the classifications of the lexicon entailed by the
different columns in a paradigm.
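The point about discordant classifications can be illustrated with the same measure. In the hypothetical Python sketch below (the helper all_distinct and the example tables are invented for illustration), tables in which every class has its own exponent in every cell yield an average conditional entropy of zero whatever their size, whereas two cells that group three classes differently yield a positive value of 2/3 bit. As before, classes are assumed to be equally probable.

```python
from collections import Counter
from itertools import permutations
from math import log2


def entropy(values) -> float:
    """Shannon entropy in bits, each row (inflection class) equally likely."""
    n = len(values)
    return -sum(c / n * log2(c / n) for c in Counter(values).values())


def average_conditional_entropy(table) -> float:
    """Mean H(cell_j | cell_i) over all ordered pairs of distinct cells."""
    pairs = list(permutations(range(len(table[0])), 2))
    return sum(
        entropy([(row[i], row[j]) for row in table]) - entropy([row[i] for row in table])
        for i, j in pairs
    ) / len(pairs)


def all_distinct(n_classes: int, n_cells: int):
    """Hypothetical table in which every class has its own exponent in every cell."""
    return [tuple(f"c{k}-x{m}" for m in range(n_cells)) for k in range(n_classes)]


# Two cells that classify three classes differently: {1,2}|{3} versus {1}|{2,3}.
discordant = [("x", "p"), ("x", "q"), ("y", "q")]

for n_classes, n_cells in ((2, 2), (3, 4), (5, 6)):
    table = all_distinct(n_classes, n_cells)
    print(n_classes, "classes x", n_cells, "cells:", average_conditional_entropy(table))
print("discordant:", round(average_conditional_entropy(discordant), 3))
```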
It would appear that this creates a problem for the notion of CC, since we would
have to choose a concept to relate it to and also be rather cautious in doing so,
since for some concepts, such as suppletion, ‘more canonical’ actually means
‘more complex’. I cannot see that this issue is addressed in an explicit fashion in
Nichols’s Chapter 7, but since she says that she is concerned exclusively with
‘morphological complexity and specifically inflectional morphology’, it can be
assumed that the canonicity she is speaking of is ‘canonical inflection’ as under-
stood in Corbett’s (2015) paper.
There is still a catch here, though. In general, one would assume that a language
with minimal inflectional complexity would be one without any inflection at all, or
that a minimally complex inflectional class system would be one with no inflectional
differences between lexemes. Under a consistent canonical approach, however, it
would appear that isolating languages should not be seen as having zero inflectional
complexity (and thus being maximally canonical); rather, the notion of
inflectional complexity would not be applicable to them. So far as I can see,
Nichols’s sample does not contain any purely isolating languages (Mandarin is
the one that comes closest) so it is not apparent how she would treat them. But the
problem may show up again at another level. Thus, with regard to unpredictability
of gender, Nichols puts languages with entirely predictable gender together with
languages without gender—which may make sense if one is
looking at canonicity of inflection but not if what is at stake is canonicity of
gender.
Nichols notes one point where there is a discrepancy between Kolmogorov
complexity and CC—syncretism, that is, when two or more cells in a paradigm
share the same word form. She notes that syncretism does ‘not increase the
amount of information required to describe a language’. This may in fact be
made stronger—syncretism often makes it possible to shorten a description. But
syncretism will in general lead to violations of what Nichols refers to as ‘the
structuralist notion of biuniqueness, or “one form, one function” ’,³ which Nichols
sees as central to canonicity; thus, syncretism increases CC. Likewise, Corbett
(2015: 152) says: ‘In the canonical situation, the inflectional material is different in
every cell of the lexeme. The major deviation here is syncretism; we have an
expectation of a given number of inflectional forms, while with syncretism two or
more of them are identical (two or more morphosyntactic specifications share a
single realization).’ Sometimes it seems that the choice of criteria for canonicity
relies on a demand for ‘proper behaviour’—if you have a distinction somewhere,
you had better have it everywhere. Whether that makes things more complex does not
really matter.
What Nichols calls ‘biuniqueness’ (like Tallman & Epps (Chapter 9, this vol-
ume), who mention ‘deviations from biuniqueness’ as a criterion that relates
³ Cf. also the following statement by Mansfield & Nordlinger (Chapter 3, this volume): ‘Inflectional
allomorphy is a prototypical form of morphological complexity, introducing unpredictability into the
mapping of form to meaning’.
13.8 Overabundance
complexity, ‘in that it requires speakers to make calculated choices about forms
based on features beyond the paradigm’. The particular problem studied is
optional subject marking in the mixed language Gurindji Kriol, more specifically
‘the alternation in the nominative cell of the Gurindji Kriol case paradigm between
zero and -ngku’. They identify three factors which govern the variation: (i)
transitivity; (ii) priming by a preceding subject in the discourse; (iii) presence of
a co-referential (crossreferential) pronoun. This obviously expands the domain
within which morphological complexity is considered. I think it may be questioned
whether this variation is to be treated within morphology at all; it looks similar to
other cases of differential argument marking and would naturally be seen as a
syntactic phenomenon. On the other hand, as I said above, viewing complexity only
from a module-internal perspective can be considered artificial and may prevent us
from making relevant generalizations. In this case, we seem to be dealing with
phenomena that were discussed in Dahl (2004: 128–34) under the rubrics ‘pattern
competition’ and ‘pattern regulation’. I was mainly interested in what happens
during grammaticalization in a single language, but it seems that what I said can
be generalized to contact situations. My main point was that competition between
two patterns, whether lexical or grammatical, may lead to an increase in com-
plexity. As long as the patterns are in free variation, the increase is minimal (and
does not lead to any significant difficulty for learners and users), but there appears
to be a universal tendency towards regulation of the variation, which at the initial
stages shows itself merely in the form of tendencies.
13.9 Conclusion
The chapters of the volume that I have looked at here are those in which there is
explicit discussion of the basic notions relating to complexity employed in the
chapters. Time and space considerations do not allow me to comment on the
others, in spite of many of them being on topics that are of direct interest to me.
One reflection is that the study of morphological complexity has still quite some
way to go before there is a set of shared notions and standard works that everyone
refers to. Which approaches will prevail in the long run is obviously an open
question. It is notable that both the notion of minimum description length and
Ackerman & Malouf’s notion of integrative complexity are ultimately based on
information theory. It is not excluded that we will see other applications of this
theory in the future.
References
Abel, Jennifer (2006). ‘That crazy idea of hers: The English double genitive as a focus construc-
tion’, Canadian Journal of Linguistics 51(1): 1–14. doi:10.1017/S0008413100003790
Aboh, Enoch O. (2009). ‘Competition and selection: That’s all!’, in Enoch O. Aboh and
Norval Smith (eds), Complex Processes in New Languages. Amsterdam: John Benjamins,
317–44. doi:10.1075/cll.35.20abo
Aboh, Enoch O. (2015). The Emergence of Hybrid Grammars. Cambridge: Cambridge
University Press. doi:10.1017/CBO9781139024167
Aboh, Enoch O. and Umberto Ansaldo (2007). ‘The role of typology in language creation’,
in Umberto Ansaldo, Stephen Matthews, and Lisa Lim (eds), Deconstructing Creole.
Amsterdam: John Benjamins, 39–66. doi:10.1075/tsl.73.05abo
Abouda, Lotfi and Marie Skrovec (2015). ‘Du rapport entre formes synthétique et analy-
tique du futur. Étude de la variable modale dans un corpus oral micro-diachronique’,
Revue de Sémantique et Pragmatique 38: 35–57.
Abouda, Lotfi and Marie Skrovec (2017). ‘Du rapport micro-diachronique futur simple/
futur périphrastique en français moderne. Étude des variables temporelles et
aspectuelles’, Corela, HS-21. URL: http://corela.revues.org/4804
Ackerman, Farrell, James Blevins, and Robert Malouf (2009). ‘Parts and wholes: Implicative
patterns in inflectional paradigms’, in James P. Blevins and Juliette Blevins (eds), Analogy
in Grammar: Form and Acquisition. Oxford: Oxford University Press, 54–82.
Ackerman, Farrell and Robert Malouf (2013). ‘Morphological organization: The Low
Conditional Entropy Conjecture’, Language 89(3): 429–64. doi:10.1353/lan.2013.0054
Ackerman, Farrell and Robert Malouf (2015). ‘The No Blur Principle effects as an emergent
property of language systems’, Proceedings of the 41st Annual Meeting of the Berkeley
Linguistics Society. Berkeley, CA, 1–14. doi:10.20354/B4414110014
Ackerman, Farrell and Robert Malouf (2016). ‘Word and pattern morphology: An
information-theoretic approach’, Word Structure 9: 125–31. doi:10.3366/word.2016.0090
Agbetsoamedo, Yvonne (2014). ‘Noun classes in Sɛlɛɛ’, The Journal of West African
Languages 41: 95–124.
Aglarov, M. A. (1988). Sel’skaja obsčina v Nagornom Dagestane v XVII-načale XIX v.
Moscow: Nauka.
Aikhenvald, Alexandra Y. (2000). Classifiers: A Typology of Noun Categorization Devices.
Oxford: Oxford University Press.
Aikhenvald, Alexandra Y. (2002). Language Contact in Amazonia. Oxford: Oxford
University Press.
Aikhenvald, Alexandra Y. (2003a). ‘Mechanisms of change in areal diffusion: New morphology
and language contact’, Journal of Linguistics 39(1): 1–29. doi:10.1017/S0022226702001937
Aikhenvald, Alexandra Y. (2003b). A Grammar of Tariana. Cambridge: Cambridge
University Press.
Aikhenvald, Alexandra Y. (2004). Evidentiality. Oxford: Oxford University Press.
Aikhenvald, Alexandra Y. and Robert M. W. Dixon (1998). ‘Evidentials and areal typology:
A case study from Amazonia’, Language Sciences 20: 241–57.
Arnott, David Whitehorn (1970). The Nominal and Verbal Systems of Fula. Oxford:
Clarendon.
Aronoff, Mark (1994). Morphology by Itself: Stems and Inflectional Classes. Cambridge, MA:
The MIT Press.
Aronoff, Mark (1998). ‘Isomorphism and monotonicity: Or the disease model of morph-
ology’, in Steven Lapointe, Diane Brentari, and Patrick Farrell (eds), Morphology and Its
Relation to Phonology and Syntax. Stanford, CA: CSLI Publications, 411–18.
Aronoff, Mark (2015). ‘Thoughts on morphology and cultural evolution’, in Laurie Bauer,
Lívia Körtvélyessy, and Pavol Štekauer (eds), Semantics of Complex Words. Cham:
Springer, 277–88. doi:10.1007/978-3-319-14102-2_13
Aski, Janice M. (1995). ‘Verbal suppletion: An analysis of Italian, French and Spanish to go’,
Linguistics 33(3): 403–32. doi:10.1515/ling.1995.33.3.403
Atkinson, Mark, Kenny Smith, and Simon Kirby (2018). ‘Adult learning and language
simplification’, Cognitive Science 42(8): 2818–54. doi:10.1111/cogs.12686
Audring, Jenny (2014). ‘Gender as a complex feature’, Language Sciences 43: 5–17.
doi:10.1016/j.langsci.2013.10.003
Audring, Jenny (2017). ‘Calibrating complexity: How complex is a gender system?’,
Language Sciences 60: 53–68. doi:10.1016/j.langsci.2016.09.003
Audring, Jenny (2019). ‘Canonical, complex, complicated?’, in Francesca Di Garbo, Bruno
Olsson, and Bernhard Wälchli (eds), Grammatical Gender and Linguistic Complexity,
vol. I: General Issues and Specific Studies. Berlin: Language Science Press, 15–52.
URL: http://langsci-press.org/catalog/book/223
Azen, Razia and Nicole Traxel (2009). ‘Using dominance analysis to determine predictor
importance in logistic regression’, Journal of Educational and Behavioral Statistics 34(3):
319–47. doi:10.3102/1076998609332754
Baayen, R. Harald (2001). Word Frequency Distributions. Dordrecht: Kluwer Academic
Publishers.
Baayen, R. Harald (2007). ‘Storage and computation in the mental lexicon’, in Gonia
Jarema and Gary Libben (eds), The Mental Lexicon: Core Perspectives. Amsterdam:
Elsevier, 81–104.
Baayen, R. Harald (2008). Analyzing Linguistic Data: A Practical Introduction to Statistics
Using R. Cambridge: Cambridge University Press.
Baayen, R. Harald, Rochelle Lieber, and Robert Schreuder (1997). ‘The morphological
complexity of simplex nouns’, Linguistics 35: 861–77. doi:10.1515/ling.1997.35.5.861
Baayen, R. Harald, Petar Milin, Dusica Filipović Đurđević, Peter Hendrix, and Marco
Marelli (2011). ‘An amorphous model for morphological processing in visual compre-
hension based on naive discriminative learning’, Psychological Review 118(3): 438–81.
doi:10.1037/a0023851
Baayen, R. Harald, Lee H. Wurm, and Joanna Aycock (2007). ‘Lexical dynamics for low-
frequency complex words: A regression study across tasks and modalities’, The Mental
Lexicon 2(3): 419–63. doi:10.1075/ml.2.3.06baa
Babou, Cheikh Anta and Michele Loporcaro (2016). ‘Noun classes and grammatical gender
in Wolof’, Journal of African Languages and Linguistics 37(1): 1–57. doi:10.1515/jall-2016-0001
Baechler, Raffaela (2017). Absolute Komplexität in der Nominalflexion. Berlin: Language
Science Press. URL: http://langsci-press.org/catalog/book/134
Baechler, Raffaela and Guido Seiler (eds) (2016). Complexity, Isolation, and Variation.
Berlin: De Gruyter.
Bickel, Balthasar, Goma Banjade, Martin Gaenszle, Elena Lieven, Netra Prasad Paudyal,
Ichchha Purna Rai, Manoj Rai, Novel Kishore Rai, and Sabine Stoll (2007). ‘Free prefix
ordering in Chintang’, Language 83(1): 43–73. doi:10.1353/lan.2007.0002
Bickel, Balthasar and Johanna Nichols (2002). ‘Autotypologizing databases and their use in
fieldwork’, in Peter Austin, Helen Dry, and Peter Wittenburg (eds), International LREC
Workshop on Resources and Tools in Field Linguistics, Las Palmas, 26–7 May 2002.
Nijmegen: Max Planck Institute for Psycholinguistics.
Bickel, Balthasar and Johanna Nichols (2005). ‘Inflectional synthesis of the verb’, in Martin
Haspelmath, Matthew Dryer, David Gil, and Bernard Comrie (eds), The World Atlas of
Language Structures. Oxford: Oxford University Press, 94–7.
Bickel, Balthasar and Johanna Nichols (2007). ‘Inflectional morphology’, in Timothy
Shopen (ed.), Language Typology and Syntactic Description, vol. 3: Grammatical
Categories and the Lexicon. Cambridge: Cambridge University Press, 169–240.
Bickel, Balthasar and Johanna Nichols (2013). ‘Inflectional synthesis of the verb’, in
Matthew Dryer and Martin Haspelmath (eds), World Atlas of Language Structures
Online. URL: http://wals.info/chapter/22
Bickel, Balthasar, Johanna Nichols, Taras Zakharko, Alena Witzlack-Makarevich, Kristine
Hildebrandt, Michael Rießler, Lennart Bierkandt, Fernando Zúñiga, and John B. Lowe
(2017). The Autotyp typological databases. Version 0.1.0. URL:
https://github.com/autotyp/autotyp-data/tree/0.1.0
Bickel, Balthasar and Fernando Zúñiga (2017). ‘The “word” in polysynthetic languages:
Phonological and syntactic challenges’, in Michael Fortescue, Marianne Mithun, and
Nicholas Evans (eds), The Oxford Handbook of Polysynthesis. Oxford: Oxford University
Press, 158–85.
Bickerton, Derek (1981). Roots of Language. Ann Arbor, MI: Karoma.
Bickerton, Derek (1984). ‘The language bioprogram hypothesis’, Behavioral and Brain
Sciences 7(2): 173–88. doi:10.1017/S0140525X00044149
Bickerton, Derek (1988). ‘Creole languages and the bioprogram’, in Frederick Newmeyer
(ed.), Linguistics: The Cambridge Survey, vol. 2: Linguistic Theory. Extensions and
Implications. Cambridge: Cambridge University Press, 268–84.
Birchall, Joshua (2014). Argument Marking Patterns in South American Languages.
Universiteit Nijmegen PhD dissertation.
Blasi, E. Damián, Susanne Maria Michaelis, and Martin Haspelmath (2017). ‘Grammars are
robustly transmitted even during the emergence of creole languages’, Nature Human
Behaviour 1: 723–9. doi:10.1038/s41562-017-0192-4
Blench, Roger (2009). ‘Do the Ghana-Togo mountain languages constitute a genetic
group?’, The Journal of West African Languages 36(1–2): 19–36.
Blevins, James P. (2006). ‘Word-based morphology’, Journal of Linguistics 42(3): 531–73.
doi:10.1017/S0022226706004191
Blevins, James P. (2013). ‘Word-based morphology from Aristotle to modern WP (Word
and Paradigm models)’, in Keith Allan (ed.), The Oxford Handbook of the History of
Linguistics. Oxford: Oxford University Press, 375–95.
Blevins, James P. (2016a). ‘The minimal sign’, in Gregory Stump and Andrew Hippisley
(eds), The Cambridge Handbook of Morphology. Cambridge: Cambridge University
Press, 50–69.
Blevins, James P. (2016b). Word and Paradigm Morphology. Oxford: Oxford University
Press.
Blevins, James P., Petar Milin, and Michael Ramscar (2017). ‘The Zipfian paradigm cell
filling problem’, in Ferenc Kiefer, James P. Blevins, and Huba Bartos (eds), Perspectives
on Morphological Structure: Data and Analyses. Leiden: Brill, 141–58.
Bloomfield, Leonard (1914). ‘Sentence and word’, Transactions and Proceedings of the
American Philological Association 45: 65–75.
Bloomfield, Leonard (1933). Language. New York: Holt.
Blythe, Joe (2009). Doing Referring in Murriny Patha Conversation. University of Sydney
PhD dissertation.
Blythe, Joe, Rachel Nordlinger, and Nicholas Reid (2007). ‘Murriny Patha finite verb
paradigms’. Unpublished ms.
Boilat, David (1858). Grammaire de la langue woloffe. Paris: Imprimerie Impériale. URL:
http://babel.hathitrust.org/cgi/pt?id=wu.89012299343;view=1up;seq=11
Bokamba, Eyamba (1977). ‘The impact of multilingualism on language structures: The case
of Central Africa’, Anthropological Linguistics 19: 181–202.
Bolaños, Katherine (2016). A Grammar of Kakua. Utrecht: LOT.
Bonami, Olivier (2013). ‘Towards a robust assessment of implicative relations in inflec-
tional systems’. Paper given at the ‘Workshop on Computational Approaches to
Morphological Complexity’, Paris.
Bonami, Olivier (2015). ‘Periphrasis as collocation’, Morphology 25: 63–110. doi:10.1007/
s11525-015-9254-3
Bonami, Olivier and Sarah Beniamine (2015). ‘Implicative structure and joint predictive-
ness’, in Vito Pirrelli, Claudia Marzi, and Marcello Ferro (eds), Word Structure and
Word Usage: Proceedings of the NetWordS Final Conference, Pisa, Italy, March 30–April
1, 2015. Pisa: Institute for Computational Linguistics, National Research Council, 4–9.
Bonami, Olivier and Sarah Beniamine (2016). ‘Joint predictiveness in inflectional para-
digms’, Word Structure 9(2): 156–82. doi:10.3366/word.2016.0092
Bonami, Olivier and Gilles Boyé (2002). ‘Suppletion and dependency in inflectional
morphology’, in Frank van Eynde, Lars Hellan, and Dorothee Beermann (eds),
Proceedings of the 8th International Conference on Head-Driven Phrase Structure
Grammar. Stanford: CSLI, 51–70.
Bonami, Olivier and Gilles Boyé (2003). ‘Supplétion et classes flexionnelles dans la con-
jugaison du français’, Langages 15: 102–26.
Bonami, Olivier and Gilles Boyé (2007). ‘French pronominal clitics and the design of
Paradigm Function Morphology’, in Geert E. Booij, Luca Ducceschi, Bernard Fradin,
Emiliano Guevara, Angela Ralli, and Sergio Scalise (eds), On-line Proceedings of the Fifth
Mediterranean Morphology Meeting (MMM5) Fréjus, 15–18 September 2005. Bologna:
University of Bologna, 291–322.
Bonami, Olivier, Gilles Boyé, and Fabiola Henri (2011). ‘Measuring inflectional complexity:
French and Mauritian’. Paper given at the ‘Workshop on Quantitative Measures in
Morphology and Morphological Development’, San Diego.
Bonami, Olivier, Gilles Boyé, and Françoise Kerleroux (2009). ‘L’allomorphie radicale et la
relation flexion-construction’, in Bernard Fradin, Françoise Kerleroux, and Marc Plénat
(eds), Aperçus de morphologie du français. Saint-Denis: Presses Universitaires de
Vincennes, 103–25.
Bonami, Olivier and Fabiola Henri (2010). ‘Assessing empirically the complexity of
Mauritian Creole’. Paper given at the conference ‘Formal Approaches to Creole
Studies 2’, Berlin.
Bonami, Olivier, Fabiola Henri, and Ana R. Luís (2013). ‘Comparing sources of inflectional
morphology in Romance-based creoles’. Paper given at the workshop ‘Portuguese-based
Creoles in Perspective’, Coimbra.
Bonami, Olivier, Fabiola Henri, and Ana R. Luís (2015). ‘Making sense of morphological
complexity’. Paper given at the ‘SeePiCLa Meeting’, Lisbon.
Bond, Oliver, Greville G. Corbett, Marina Chumakina, and Dunstan Brown (eds) (2016).
Archi: Complexities of Agreement in Cross-theoretical Perspective. Oxford: Oxford
University Press.
Booij, Geert E. (1993). ‘Against split morphology’, in Geert E. Booij and Jaap van Marle
(eds), Yearbook of Morphology 1993. Dordrecht: Kluwer, 27–49. doi:10.1007/978-94-
017-3712-8_2
Booij, Geert E. (1997). ‘Allomorphy and the autonomy of morphology’, Folia Linguistica
31: 25–56. doi:10.1515/flin.1997.31.1-2.25
Booij, Geert E. (2010). Construction Morphology. Oxford: Oxford University Press.
Boyé, Gilles and Patricia Cabredo Hofherr (2006). ‘The structure of allomorphy in Spanish
verbal inflection’, Cuadernos de Lingüística del Instituto Universitario Ortega y Gasset 13:
9–24.
Bozic, Mirjana and William Marslen-Wilson (2010). ‘Neurocognitive contexts for mor-
phological complexity: Dissociating inflection and derivation’, Language and Linguistics
Compass 4(11): 1063–73. doi:10.1111/j.1749-818X.2010.00254.x
Brandão, Ana Paula B. (2014). A Reference Grammar of Paresi-Haliti (Arawak). University
of Texas at Austin PhD dissertation.
Bresnan, Joan (2007). ‘Is syntactic knowledge probabilistic? Experiments with the English
dative alternation’, in Sam Featherston and Wolfgang Sternefeld (eds), Roots: Linguistics
in Search of Its Evidential Base. Berlin: Mouton de Gruyter, 77–96.
Bresnan, Joan and Marilyn Ford (2013). ‘Predicting syntax: Processing dative constructions
in American and Australian varieties of English’, Language 86(1): 186–213. doi:10.1353/
lan.0.0189
Brown, Dunstan, Greville G. Corbett, Norman M. Fraser, Andrew Hippisley, and Alan
Timberlake (1996). ‘Russian noun stress and Network Morphology’, Linguistics 34(1):
53–107. doi:10.1515/ling.1996.34.1.53
Brown, Dunstan and Andrew Hippisley (2012). Network Morphology: A Defaults-Based
Theory of Word Structure. Cambridge: Cambridge University Press.
Burzio, Luigi (2004). ‘Paradigmatic and syntagmatic relations in Italian verbal inflection’, in
Julie Auger, J. Clancy Clements, and Barbara Vance (eds), Contemporary Approaches to
Romance Linguistics. Amsterdam: John Benjamins, 17–44.
Bybee, Joan L. (1985). Morphology: A Study of the Relation between Meaning and Form.
Amsterdam: John Benjamins.
Bybee, Joan L. (1995). ‘Regular morphology and the lexicon’, Language and Cognitive
Processes 10(5): 425–55. doi:10.1080/01690969508407111
Bybee, Joan L. (2007). Frequency of Use and the Organization of Language. Oxford: Oxford
University Press.
Bybee, Joan L. and Clay Beckner (2015). ‘Language use, cognitive processes, and linguistic
change’, in Claire Bowern and Bethwyn Evans (eds), The Routledge Handbook of
Historical Linguistics. London: Routledge, 503–18.
Bybee, Joan L. and Carol Lynn Moder (1983). ‘Morphological classes as natural categories’,
Language 59: 251–70. doi:10.2307/413574
Bybee, Joan and Dan I. Slobin (1982). ‘Rules and schemas in the development and use of the
English past tense’, Language 58(2): 265–89. doi:10.2307/414099
Comrie, Bernard (1992). ‘Before complexity’, in John A. Hawkins and Murray Gell-Mann
(eds), The Evolution of Human Languages. London: Addison-Wesley, 193–211.
Comrie, Bernard, Lucía A. Golluscio, Hebe Gonzáles, and Alejandra Vidal (2010). ‘El
Chaco como área lingüística’, in Z. Estrada Fernández and R. Arzápalo Marín (eds),
Estudios de lenguas amerindias, vol. 2: Contribuciones al estudio de las lenguas originarias
de América. Hermosillo, Sonora (Mexico): Editorial Unison, 85–130.
Corbett, Greville G. (1982). ‘Gender in Russian: An account of gender specification and its
relationship to declension’, Russian Linguistics 6(2): 197–232.
Corbett, Greville G. (1991). Gender. Cambridge: Cambridge University Press.
Corbett, Greville G. (2000). Number. Cambridge: Cambridge University Press.
Corbett, Greville G. (2007). ‘Canonical typology, suppletion, and possible words’, Language
83(1): 8–42. doi:10.1353/lan.2007.0006
Corbett, Greville G. (2009). ‘Suppletion: Typology, markedness, complexity’, in Patrick
O. Steinkrüger and Manfred Krifka (eds), On Inflection. Berlin: Mouton de Gruyter,
25–40.
Corbett, Greville G. (2013a). ‘Canonical morphosyntactic features’, in Dunstan Brown,
Marina Chumakina, and Greville Corbett (eds), Canonical Morphology and Syntax.
Oxford: Oxford University Press, 48–65.
Corbett, Greville G. (2013b). ‘The unique challenge of the Archi paradigm’, in Chundra
Cathcart, Shinae Kang, and Clare S. Sandy (eds), Proceedings of the 37th Annual Meeting,
Berkeley Linguistics Society: Special Session on Languages of the Caucasus, 52–67.
Corbett, Greville G. (2015). ‘Morphosyntactic complexity: A typology of lexical splits’,
Language 91(1): 145–93. doi:10.1353/lan.2015.0003
Corbett, Greville G. and Sebastian Fedden (2016). ‘Canonical gender’, Journal of Linguistics
52: 495–531. doi:10.1017/S0022226715000195
Corbett, Greville G. and Norman M. Fraser (1993). ‘Network Morphology: A DATR
account of Russian nominal inflection’, Journal of Linguistics 29(1): 113–42.
doi:10.1017/S0022226700000074
Corbett, Greville G., Andrew Hippisley, Dunstan Brown, and Paul Marriott (2001).
‘Frequency, regularity and the paradigm: A perspective from Russian on a complex
relation’, in Joan Bybee and Paul J. Hopper (eds), Frequency and the Emergence of
Linguistic Structure. Amsterdam: John Benjamins, 201–26.
Corne, Chris (1982). ‘A contrastive analysis of Reunion and Isle de France Creole French:
Two typologically diverse languages’, in Philip Baker and Chris Corne (eds), Isle de
France Creole: Affinities and Origins. Ann Arbor, MI: Karoma, 8–129.
Corne, Chris (1999). From French to Creole. London: University of Westminster Press.
Cotterell, Ryan, Christo Kirov, Mans Hulden, and Jason Eisner (2019). ‘On the complexity
and typology of inflectional morphological systems’, Transactions of the Association for
Computational Linguistics 7: 327–42. doi:10.1162/tacl_a_00271
Crevels, Mily and Hein van der Voort (2008). ‘The Guaporé-Mamoré Region as a Linguistic
Area’, in Pieter C. Muysken (ed.), From Linguistic Areas to Areal Linguistics. Amsterdam:
John Benjamins, 151–79.
Croft, William (1991). Syntactic Categories and Grammatical Relations: The Cognitive
Organization of Information. Chicago: University of Chicago Press.
Croft, William (2001). Radical Construction Grammar: Syntactic Theory in Typological
Perspective. Oxford: Oxford University Press.
Cruschina, Silvio, Martin Maiden, and John C. Smith (eds) (2013). The Boundaries of Pure
Morphology: Diachronic and Synchronic Perspectives. Oxford: Oxford University Press.
Doneux, Jean Léonce (1975). ‘Hypothèses pour la comparative des langues atlantiques’,
Africana Linguistica 6: 41–129.
Doneux, Jean Léonce (1978). ‘Les liens historiques entre les langues du Sénégal’, Réalités
africaines et langue française 7: 6–55.
Donohue, Mark (2009). ‘Flores languages’, in Keith Brown and Sarah Ogilvie (eds), Concise
Encyclopedia of Languages of the World. Oxford: Elsevier, 420–1.
Donohue, Mark and Tim Denham (to appear). ‘Becoming Austronesian: Mechanisms of
language dispersal across southern island Southeast Asia’, in David Gil and Antoinette
Schapper (eds), Austronesian Undressed.
Donohue, Mark and Johanna Nichols (2011). ‘Does phoneme inventory size correlate with
population size?’, Linguistic Typology 15(2): 161–70. doi:10.1515/lity.2011.011
Dorian, Nancy (1978). ‘The fate of morphological complexity in language death: Evidence
from East Sutherland Gaelic’, Language 54(3): 590–609.
Dressler, Wolfgang U. (2003). ‘Degrees of grammatical productivity in inflectional morph-
ology’, Italian Journal of Linguistics 15(1): 31–62.
Dressler, Wolfgang U. (2005). ‘Morphological typology and first language acquisition:
Some mutual challenges’, in Geert E. Booij, Emiliano Guevara, Angela Ralli, Salvatore
Sgroi, and Sergio Scalise (eds), Morphology and Linguistic Typology: On-line Proceedings
of the Fourth Mediterranean Morphology Meeting (MMM4), Catania, 21–23 September
2003, 7–20.
Dressler, Wolfgang U. (2011). ‘The rise of complexity in inflectional morphology’, Poznań
Studies in Contemporary Linguistics 47(2): 159–76. doi:10.2478/psicl-2011-0013
Dressler, Wolfgang U. (2019). ‘Natural morphology’, in Mark Aronoff (ed.), The Oxford
Research Encyclopedia of Linguistics. New York: Oxford University Press. doi:10.1093/
acrefore/9780199384655.013.576
Dressler, Wolfgang U. and Marianne Kilani-Schoch (2016). ‘Natural morphology’, in
Andrew Hippisley and Gregory Stump (eds), The Cambridge Handbook of
Morphology. Cambridge: Cambridge University Press, 356–89.
Dressler, Wolfgang U., Alona Kononenko, Sabine Sommer-Lolei, Katharina Korecky-Kröll,
Paulina Zydorowicz, and Laura Kamandulytė-Merfeldienė (2019). ‘Morphological rich-
ness, transparency and the evolution of morphonotactic patterns’, Folia Linguistica
Historica 40(1): 85–106. doi:10.1515/flih-2019-0005
Dressler, Wolfgang U., Willi Mayerthaler, Oswald Panagl, and Wolfgang U. Wurzel (1987).
Leitmotifs in Natural Morphology. Amsterdam: John Benjamins.
Dressler, Wolfgang U., Sabine Sommer-Lolei, Katharina Korecky-Kröll, Reili Argus, Ineta
Dabašinskienė, Laura Kamandulytė-Merfeldienė, Johanna J. Ijäs, Victoria
V. Kazakovskaya, Klaus Laalo, and Evangelia Thomadaki (2019). ‘First-language acqui-
sition of synthetic compounds in Estonian, Finnish, German, Greek, Lithuanian, Russian
and Saami’, Morphology 29(3): 409–29. doi:10.1007/s11525-019-09339-0
Dryer, Matthew S. (2013). ‘Coding of nominal plurality’, in Matthew S. Dryer and Martin
Haspelmath (eds), The World Atlas of Language Structures Online. Leipzig: Max Planck
Institute for Evolutionary Anthropology. URL: https://wals.info/chapter/33
Dryer, Matthew and Martin Haspelmath (eds) (2013). The World Atlas of Language
Structures Online. Leipzig: Max Planck Institute for Evolutionary Anthropology. URL:
http://wals.info
Duke, Janet (2010). ‘Gender reduction and loss in Germanic: The Scandinavian, Dutch, and
Afrikaans case studies’, in Antje Dammel, Sebastian Kürschner, and Damaris Nübling
(eds), Kontrastive germanistische Linguistik. Hildesheim: Olms, 643–72.
Fenk-Oczlon, Gertraud and August Fenk (2014). ‘Complexity trade-offs do not prove the
equal complexity hypothesis’, Poznań Studies in Contemporary Linguistics 50(2): 145–55.
doi:10.1515/psicl-2014-0010
Ferguson, Charles A. (1971). ‘Absence of copula and the notion of simplicity: A study of
normal speech, baby talk, foreigner talk, and pidgins’, in Dell Hymes (ed.), Pidginization
and Creolization of Languages. Cambridge: Cambridge University Press, 141–50.
Ferronha, António Luís (ed.) (1994). Tratado breve dos Rios de Guiné do Cabo-Verde. Feito
pelo Capitão André Álvares d’Almada. Ano de 1594. Lisboa: Grupo de Trabalho do
Ministério da Educação para as Comemorações dos Descobrimentos Portugueses.
Ferry, Marie-Paule and Konstantin Pozdniakov (2001). ‘Dialectique du régulier et de
l’irrégulier. Le système des classes nominales dans le groupe tenda des langues atlan-
tiques’, in Robert Nicolaï (ed.), Leçons d’Afrique. Filiations, ruptures et reconstitution de
langues. Un hommage à Gabriel Manessy. Louvain: Peeters, 153–67.
Fertig, David (2000). Morphological Change Up Close: Two and a Half Centuries of Verbal
Inflection in Nuremberg. Berlin: De Gruyter.
Field, Andy, Jeremy Miles, and Zoë Field (2012). Discovering Statistics Using R. London:
Sage.
Finkel, Raphael and Gregory Stump (2007). ‘Principal parts and morphological typology’,
Morphology 17(1): 39–75. doi:10.1007/s11525-007-9115-9
Finkel, Raphael and Gregory Stump (2009). ‘Principal parts and degrees of paradigmatic
transparency’, in James P. Blevins and Juliette Blevins (eds), Analogy in Grammar: Form
and Acquisition. Oxford: Oxford University Press, 13–53.
Finkel, Raphael and Gregory Stump (2013). Principal parts analyzer. URL:
http://www.cs.uky.edu/raphael/linguistics/analyze.html (accessed July 2016).
Fiorentino, Robert and David Poeppel (2007). ‘Compound words and structure in the
lexicon’, Language and Cognitive Processes 22(7): 953–1000. doi:10.1080/
01690960701190215
Fitch, W. Tecumseh and Marc D. Hauser (2004). ‘Computational constraints on syntactic
processing in a nonhuman primate’, Science 303(5656): 377–80. doi:10.1126/
science.1089401
Fleck, David (2007). ‘Evidentiality and double tense in Matses’, Language 83: 589–614.
doi:10.1353/lan.2007.0113
Forshaw, William (2016). Little Kids, Big Verbs: The Acquisition of Murrinhpatha Bipartite
Stem Verbs. University of Melbourne PhD dissertation.
Fortescue, Michael (1992). ‘Morphophonemic complexity and typological stability in a
polysynthetic language family’, International Journal of American Linguistics 58(2):
242–8. doi:10.1086/ijal.58.2.3519761
Fowler, Catherine S. (1972). ‘Some ecological clues to Proto-Numic homelands’, in Don
D. Fowler (ed.), Great Basin Cultural Ecology: A Symposium. Reno: Desert Research
Institute Publications in the Social Sciences, 105–21.
Frenda, Alessio (2011). ‘Gender in Irish between continuity and change’, Folia Linguistica
45: 283–316. doi:10.1515/flin.2011.012
Gabas Jr, Nilson (1999). A Grammar of Karo, Tupi (Brazil). University of California at
Santa Barbara PhD dissertation.
Gal, Susan (1989). ‘Lexical innovation and loss: Restricted Hungarian’, in Nancy Dorian
(ed.), Investigating Obsolescence: Studies in Language Contraction and Death.
Cambridge: Cambridge University Press, 313–31.
Gamble, David (1957). Elementary Wolof Grammar. London: Research Department
Colonial Office. [Reprinted in Gabriel Manessy and Serge Sauvageot (eds) (1963).
Anthropological Studies with Special Emphasis on the Languages and Cultures of the
Andean-Amazonian Border Area. Leiden: Leiden University, 401–28.
Good, Jeff (2012a). ‘How to become a “Kwa” noun’, Morphology 22: 293–335. doi:10.1007/
s11525-011-9197-2
Good, Jeff (2012b). ‘Typologizing grammatical complexities: Or why creoles may be
paradigmatically simple but syntagmatically average’, Journal of Pidgin and Creole
Languages 27(1): 1–47. doi:10.1075/jpcl.27.1.01goo
Good, Jeff (2015). ‘Paradigmatic complexity in pidgins and creoles’, Word Structure 8(2):
184–227. doi:10.3366/word.2015.0081
Good, Jeff (2016). The Linguistic Typology of Templates. Cambridge: Cambridge University
Press.
Grant, Anthony P. (1996). ‘The evolution of functional categories in Grande Ronde
Chinook Jargon: Ethnolinguistic and grammatical considerations’, in Philip Baker and
Anand Syea (eds), Changing Meanings, Changing Functions: Papers Relating to
Grammaticalization in Creole Languages. London: University of Westminster Press,
225–42.
Grant, Anthony (2009). ‘Admixture, structural transmission, simplicity and complexity’, in
Nicholas Faraclas and Thomas Klein (eds), Simplicity and Complexity in Creoles and
Pidgins. London: Battlebridge Publications, 125–52.
Green, Ian (2003). ‘The genetic status of Murrinh-patha’, in Nicholas Evans (ed.), The Non-
Pama-Nyungan Languages of Northern Australia. Canberra: Pacific Linguistics, 125–58.
Greenberg, Joseph H. (1954). ‘A quantitative approach to the morphological typology of
language’, in Robert F. Spencer (ed.), Method and Perspective in Anthropology: Papers in
Honor of Wilson D. Wallis. Minneapolis: Minnesota University Press, 192–220.
Greenberg, Joseph H. (1960). ‘A quantitative approach to the morphological typology of
language’, International Journal of American Linguistics 26(3): 178–94. doi:10.1086/
464575
Grijns, Cornelis D. (1991). Jakarta Malay: A Multidimensional Approach to Spatial
Variation. Leiden: KITLV Press.
Grinevald, Colette and Frank Seifart (2004). ‘Noun classes in African and Amazonian
languages: Towards a comparison’, Linguistic Typology 8: 243–85. doi:10.1515/
lity.2004.007
Grünwald, Peter D. (2007). The Minimum Description Length Principle. Cambridge, MA:
The MIT Press.
Guérin, Maximilien (2011). Le syntagme nominal en wolof. Une approche typologique. Paris:
Université Sorbonne Nouvelle—Paris 3 MA thesis.
Guillaume, Antoine (2008). A Grammar of Cavineña. Berlin: Mouton de Gruyter.
Guillaume, Antoine (2016). ‘Associated motion in South America: Typological and areal
perspectives’, Linguistic Typology 20: 81–177. doi:10.1515/lingty-2016-0003
Guillaume, Antoine and Françoise Rose (2010). ‘Sociative causative markers in South
American languages: A possible areal feature’, in Franck Floricic (ed.), Essais de typologie
et de linguistique générale, Mélanges offerts à Denis Creissels. Lyon: ENS Éditions,
383–402.
Guy, Gregory (1991). ‘Explanation in variable phonology: An exponential model of mor-
phological constraints’, Language Variation and Change 3: 1–22. doi:10.1017/
S0954394500000429
Hale, Kenneth (1969). Walbiri Conjugations. Cambridge, MA: MIT.
Halle, Moris (1994). ‘The Russian declension: An illustration of the theory of Distributed
Morphology’, in Jennifer S. Cole and Charles Kisseberth (eds), Perspectives in Phonology.
Stanford: CSLI Publications, 29–60.
Hammarström, Harald, Robert Forkel, and Martin Haspelmath (eds) (2019). Glottolog 3.4.
Jena: Max Planck Institute for the Science of Human History. URL: https://glottolog.org
Hansson, Inga-Lill (2003). ‘Akha’, in Randy LaPolla and Graham Thurgood (eds), The
Sino-Tibetan Languages. London: Routledge, 236–51.
Harris, Alice (2004). ‘History in support of synchrony’, in Charles Chang, Michael
J. Houser, Yuni Kim, David Mortensen, and Mischa Park-Doob (eds), Proceedings of
the Berkeley Linguistics Society. Berkeley: Berkeley Linguistics Society, 142–59.
Harris, Alice (2017). Multiple Exponence. Oxford: Oxford University Press.
Harris, Alice and Lyle Campbell (1995). Historical Syntax in Cross-linguistic Perspective.
Cambridge: Cambridge University Press.
Haspelmath, Martin (2009). ‘An empirical test of the Agglutination Hypothesis’, in Sergio
Scalise, Elisabetta Magni, and Antonietta Bisetto (eds), Universals of Language Today.
Dordrecht: Springer, 13–29.
Haspelmath, Martin (2011). ‘The indeterminacy of word segmentation and the nature of
morphology and syntax’, Folia Linguistica 45(1): 31–80. doi:10.1515/flin-2017-1005
Haspelmath, Martin, Matthew Dryer, David Gil, and Bernard Comrie (eds) (2005). The
World Atlas of Language Structures. Oxford: Oxford University Press.
Haspelmath, Martin and Thomas Müller-Bardey (2004). ‘Valency change’, in Geert
E. Booij, Christian Lehmann, Joachim Mugdan, and Stavros Skopeteas (in collaboration
with Wolfgang Kesselheim) (eds), Morphology: A Handbook on Inflection and Word
Formation, vol. 2. Berlin: de Gruyter, 1130–45.
Haspelmath, Martin and Andrea D. Sims (2010). Understanding Morphology. 2nd ed.
London: Hodder Education.
Haude, Katharina (2006). A Grammar of Movima. Universiteit Nijmegen PhD dissertation.
Hauser, Marc D., Noam Chomsky, and W. Tecumseh Fitch (2002). ‘The faculty of
language: What is it, who has it, and how did it evolve?’, Science 298(5598): 1569–79.
doi:10.1126/science.298.5598.1569
Hawkins, John A. (2004). Efficiency and Complexity in Grammars. New York: Oxford
University Press.
Hawkins, John A. (2007). ‘Processing typology and why psychologists need to know about
it’, New Ideas in Psychology 25: 87–107. doi:10.1016/j.newideapsych.2007.02.003
Hawkins, John A. (2014). Cross-Linguistic Variation and Efficiency. Oxford: Oxford
University Press.
Hay, Jennifer (2001). ‘Lexical frequency in morphology: Is everything relative?’, Linguistics
39(6): 1041–70. doi:10.1515/ling.2001.041
Hay, Jennifer (2003). Causes and Consequences of Word Structure. New York: Routledge.
Hay, Jennifer and Laurie Bauer (2007). ‘Phoneme inventory size and population size’,
Language 83(2): 388–400. doi:10.1353/lan.2007.0071
Haynie, Hannah, Claire Bowern, Patience Epps, Jane Hill, and Patrick McConvell (2014).
‘Wanderwörter in languages of the Americas and Australia’, Ampersand 1: 1–18.
doi:10.1016/j.amper.2014.10.001
Hazaël-Massieux, Marie-Christine (2002). ‘Les créoles à base française: une introduction’,
Travaux Interdisciplinaires du Laboratoire Parole et Langage d’Aix-en-Provence (TIPA)
21: 63–86.
Hengeveld, Kees and Sterre Leufkens (2018). ‘Transparent and non-transparent languages’,
Folia Linguistica 52(1): 139–75. doi:10.1515/flin-2018-0003
Hull, Geoffrey (1998). ‘The basic lexical affinities of Timor’s Austronesian languages:
A preliminary investigation’, Studies in the Languages and Cultures of East Timor 1:
97–174.
Hull, Geoffrey (1999). Standard Tetum-English Dictionary. Sydney: Allen & Unwin.
Hultman, Oskar Fredrik (1894). De östsvenska dialekterna. Helsinki: Svenska
landsmålsföreningen.
Humboldt, Wilhelm von (1836). Über die Verschiedenheit des menschlichen Sprachbaues
und ihren Einfluss auf die geistige Entwickelung des Menschengeschlechts. Berlin:
F. Dümmler.
Hyman, Larry M. (2004). ‘How to become a Kwa verb’, Journal of West African Languages
30: 69–88.
Igartua, Iván (2019). ‘Loss of grammatical gender and language contact’, Diachronica 36:
181–221. doi:10.1075/dia.17004.iga
Irvine, Judith (1978). ‘Wolof noun classification: The social setting of divergent change’,
Language in Society 7: 37–64. doi:10.1017/S0047404500005327
Irvine, Judith (2011). ‘Société et communication chez les Wolof à travers le temps et
l’espace’, in Anna M. Diagne, Sascha Kesseler, and Christian Meyer (eds),
Communication wolof et société sénégalaise. Héritage et création. Paris: L’Harmattan,
37–70.
Jakobson, Roman (1929). Remarques sur l’évolution phonologique du russe comparée à celle
des autres langues slaves. Praha: Jednota československých matematiků a fysiků.
Jakobson, Roman (1959). ‘On linguistic aspects of translation’, in Reuben A. Brower (ed.),
On Translation. Cambridge, MA: Harvard University Press, 232–9.
Jamieson, Carole Ann (1982). ‘Conflated subsystems marking person and aspect in
Chiquihuitlán Mazatec verbs’, International Journal of American Linguistics 48(2):
139–67. doi:10.1086/465725
Janda, Laura A. (1994). ‘The spread of athematic 1sg -m in the major West Slavic
languages’, The Slavic and East European Journal 38(1): 90–119. doi:10.2307/308549
Janhunen, Juha (2008). ‘Mongolic as an expansive language family’, in Tokusu Kurebito
(ed.), Past and Present Dynamics: The Great Mongolian State. Tokyo: Tokyo University
of Foreign Studies, Research Institute for Languages and Cultures of Asia and Africa,
127–37.
Janse, Mark and Sijmen Tol (eds) (2003). Language Death and Language Maintenance:
Theoretical, Practical and Descriptive Approaches. Amsterdam: John Benjamins.
Jespersen, Otto (1949). A Modern English Grammar on Historical Principles. London: Allen
& Unwin.
Joanisse, Marc F. and Mark S. Seidenberg (2005). ‘Imaging the past: Neural activation in
frontal and temporal regions during regular and irregular past-tense processing’,
Cognitive, Affective & Behavioral Neuroscience 5(3): 282–96.
Johnson, Jacqueline S., Kenneth D. Shenkman, Elissa L. Newport, and Douglas L. Medin
(1996). ‘Indeterminacy in the grammar of adult language learners’, Journal of Memory
and Language 35: 335–52. doi:10.1006/jmla.1996.0019
Joseph, Brian D. and Richard D. Janda (1988). ‘The how and why of diachronic morpho-
logization and demorphologization’, in Michael Hammond and Michael Noonan (eds),
Theoretical Morphology. New York: Academic Press, 193–210.
Joseph, John E. and Frederick J. Newmeyer (2012). ‘ “All languages are equally complex”:
The rise and fall of a consensus’, Historiographia Linguistica 39(2–3): 341–68.
doi:10.1075/hl.39.2-3.08jos
Juola, Patrick (1998). ‘Measuring linguistic complexity: The morphological tier’, Journal of
Quantitative Linguistics 5: 206–13. doi:10.1080/09296179808590128
Karatsareas, Petros (2009). ‘The loss of grammatical gender in Cappadocian Greek’,
Transactions of the Philological Society 107: 196–230. doi:10.1111/j.1467-968X.2009.01217.x
Karatsareas, Petros (2014). ‘On the diachrony of gender in Asia Minor Greek: The
development of semantic agreement in Pontic’, Language Sciences 43: 77–101.
doi:10.1016/j.langsci.2013.10.005
Kelly, Barbara, Gillian Wigglesworth, Rachel Nordlinger, and Joseph Blythe (2014). ‘The
acquisition of polysynthetic languages’, Language and Linguistics Compass 8(2): 51–64.
doi:10.1111/lnc3.12062
Kendall, Maurice and Jean Dickinson Gibbons (1990). Rank Correlation Methods. 5th ed.
Oxford: Oxford University Press.
Kibrik, Aleksandr E. (1991). ‘Organizing principles for nominal paradigms in Daghestanian
languages: Comparative and typological observations’, in Frans Plank (ed.), Paradigms:
The Economy of Inflection. Berlin: Mouton de Gruyter, 255–74.
Kibrik, Aleksandr E. (2003). ‘Nominal inflection galore: Daghestanian, with side glances at
Europe and the world’, in Frans Plank (ed.), Noun Phrase Structure in the Languages of
Europe. Berlin: Mouton de Gruyter, 37–112.
Kibrik, Andrej A. (2012). ‘What’s in the head of head-marking languages?’, in Pirkko
Suihkonen, Bernard Comrie, and Valery Solovyev (eds), Argument Structure and
Grammatical Relations: A Crosslinguistic Typology. Amsterdam: John Benjamins, 211–40.
Kielhorn, Franz (1871). The Paribhāṣenduśekhara of Nāgojībhaṭṭa (2 vols). Bombay: Indu-
Prakāsh Press.
Kihm, Alain (1994). Kriyol Syntax. Amsterdam: John Benjamins.
Kihm, Alain (2014). ‘Theories of morphology and theories of creole emergence: The inner
connection’, PAPIA, São Paulo, 24(1): 43–89.
Killian, Don (2015). Topics in Uduk Phonology and Morphosyntax. University of Helsinki
PhD dissertation.
Kirby, Simon, Hannah Cornish, and Kenny Smith (2008). ‘Cumulative cultural evolution in
the laboratory: An experimental approach to the origins of structure in human language’,
Proceedings of the National Academy of Sciences 105(31): 10681–6. doi:10.1073/
pnas.0707835105
Kiso, Andrea (2012). Tense and Aspect in Chichewa, Citumbuka and Cisena: A Description
and Comparison of the Tense-Aspect Systems in Three Southeastern Bantu Languages.
Stockholm University dissertation.
Klausenburger, Jurgen (1976). ‘(De)morphologization in Latin’, Lingua 40(4): 305–20.
doi:10.1016/0024-3841(76)90082-6
Klingler, Thomas (2003). If I Could Turn My Tongue Like That: The Creole of Pointe Coupee
Parish, Louisiana. Baton Rouge: Louisiana State University Press.
Kobès, Aloys (1869). Grammaire de la langue volofe. Ouvrage nouveau. Saint-Joseph de
Ngasobil: Imprimerie de la Mission.
Kobès, Aloys (1875). Dictionnaire volof-français. Saint-Joseph de Ngasobil: Mission
Catholique [cited from the new edition: Kobès, Aloys and Olivier Abiven (1923),
Dictionnaire volof-français. Nouvelle édition revue et considérablement augmentée par
le R. P. O. Abiven. Dakar: Mission Catholique].
Koopman, Hilda and Claire Lefebvre (1981). ‘Haitian Creole pu’, in Pieter C. Muysken
(ed.), Generative Studies on Creole Languages. Dordrecht: Foris, 201–21.
Koptjevskaja-Tamm, Maria and Bernhard Wälchli (2001). ‘The Circum-Baltic languages:
An areal-typological approach’, in Östen Dahl and Maria Koptjevskaja-Tamm (eds),
Loporcaro, Michele (2018). Gender from Latin to Romance: History, Geography, Typology.
Oxford: Oxford University Press.
Loporcaro, Michele, Francesco Gardani, and Alberto Giudici (forthcoming). ‘Contact-
induced complexification in the gender system of Istro-Romanian’. Journal of
Language Contact.
Loporcaro, Michele and Tania Paciaroni (2011). ‘Four gender-systems in Indo-European’,
Folia Linguistica 45(2): 389–434. doi:10.1515/flin.2011.015
Lowe, Ivan (1999). ‘Nambiquara’, in Robert M. W. Dixon and Alexandra Y. Aikhenvald
(eds), The Amazonian Languages. Cambridge: Cambridge University Press, 269–92.
Ludwig, Ralph, Sylviane Telchid, and Florence Bruneau-Ludwig (eds) (2001). Corpus créole.
Hamburg: Helmut Buske.
Luís, Ana R. (2009). ‘The loss and survival of inflectional morphology: Contextual vs.
inherent inflection in creoles’, in Sonia Colina, Antxon Olarrea, and Ana Carvalho (eds),
Romance Linguistics 2009. Amsterdam: John Benjamins, 323–36.
Luís, Ana R. (2014). ‘Inflectional structure without morphemes: Similarities between
creoles and non-creoles’, PAPIA, São Paulo, 24(2): 381–406.
Lüpke, Friederike and Mary Raymond (eds) (2010). Documenting Atlantic-Mande
Convergence and Diversity. Special issue of the Journal of Language Contact – THEMA 3.
Lupyan, Gary and Rick Dale (2010). ‘Language structure is partly determined by social
structure’, PLoS ONE 5(1): e8559. doi:10.1371/journal.pone.0008559
MacWhinney, Brian, Elizabeth Bates, and Reinhold Kliegl (1984). ‘Cue validity and
sentence interpretation in English, German, and Italian’, Journal of Verbal Learning and
Verbal Behavior 23(2): 127–50. doi:10.1016/S0022-5371(84)90093-8
Madsen, David and David Rhode (1994). Across the West: Human Population Movement
and the Expansion of the Numa. Salt Lake City, UT: University of Utah Press.
Maiden, Martin (2005). ‘Morphological autonomy and diachrony’, in Geert E. Booij and
Jaap van Marle (eds), Yearbook of Morphology 2004. Dordrecht: Springer, 137–75.
doi:10.1007/1-4020-2900-4_6
Maiden, Martin (2013). ‘ “Semi-autonomous” morphology? A problem in the history of the
Italian (and Romanian) verb’, in Silvio Cruschina, Martin Maiden, and John C. Smith
(eds), The Boundaries of Pure Morphology: Diachronic and Synchronic Perspectives.
Oxford: Oxford University Press, 24–44.
Maiden, Martin (2018). The Romance Verb: Morphomic Structure and Diachrony. Oxford:
Oxford University Press.
Maiden, Martin, John C. Smith, Maria Goldbach, and Marc-Olivier Hinzelin (eds) (2011).
Morphological Autonomy: Perspectives from Romance Inflectional Morphology. Oxford:
Oxford University Press.
Maitz, Péter and Attila Németh (2014). ‘Language contact and morphosyntactic
complexity: Evidence from German’, Journal of Germanic Linguistics 26(1): 1–29. doi:10.1017/
S1470542713000184
Malone, Terrell A. (1988). ‘The origin and development of Tuyuca evidentials’,
International Journal of American Linguistics 54: 119–40. doi:10.1086/466079
Manessy, Gabriel and Serge Sauvageot (eds) (1963). Wolof et Sérèr. Études de phonétique et
de grammaire descriptive. Dakar: University of Dakar Press.
Mansfield, John (2014). Polysynthetic Sociolinguistics: The Language and Culture of
Murrinh Patha Youth. Australian National University PhD dissertation.
Mansfield, John (2015a). ‘Consonant lenition as a sociophonetic variable in Murrinh Patha
(Australia)’, Language Variation and Change 27(2): 203–25. doi:10.1017/
S0954394515000046
Mansfield, John (2015b). ‘Morphotactic variation, prosodic domains and the changing
structure of the Murrinhpatha verb’, Asia-Pacific Language Variation 1(2): 163–89.
doi:10.1075/aplv.1.2.03man
Mansfield, John (2016). ‘Intersecting formatives and inflectional predictability: How do
speakers and learners predict the correct form of Murrinhpatha verbs?’, Word Structure
9(2): 183–214. doi:10.3366/word.2016.0093
Mansfield, John (2019). Murrinhpatha Morphology and Phonology. Berlin: De Gruyter
Mouton.
Marschner, Ian C. (2011). ‘glm2: Fitting generalized linear models with convergence
problems’, The R Journal 3(2): 12–15.
Marslen-Wilson, William D. (2007). ‘Morphological processes in language
comprehension’, in M. Gareth Gaskell (ed.), The Oxford Handbook of Psycholinguistics. Oxford:
Oxford University Press, 175–93.
Marzi, Claudia, Marcello Ferro, Ouafae Nahli, Patrizia Belik, Stavros Bompolas, and Vito
Pirrelli (2018). ‘Evaluating inflectional complexity crosslinguistically: A processing
perspective’, in Nicoletta Calzolari (ed.), LREC 2018: Eleventh International Conference on
Language Resources and Evaluation: May 7–12, 2018, Miyazaki, Japan. Paris: European
Language Resources Association ELRA, article n. 745.
Matras, Yaron (1998). ‘Utterance modifiers and universals of grammatical borrowing’,
Linguistics 36: 281–331. doi:10.1515/ling.1998.36.2.281
Matras, Yaron (2009). Language Contact. Cambridge: Cambridge University Press.
Matras, Yaron and Jeanette Sakel (eds) (2007). Grammatical Borrowing in Cross-Linguistic
Perspective. Berlin: Mouton de Gruyter.
Matthews, Peter H. (1972). Inflectional Morphology. Cambridge: Cambridge University
Press.
Matthews, Peter H. (1991). Morphology. 2nd ed. Cambridge: Cambridge University Press.
McGregor, William (2010). ‘Optional ergative case marking systems in a typological-
semiotic perspective’, Lingua 120: 1610–36. doi:10.1016/j.lingua.2009.05.010
McGregor, William and Jean-Christophe Verstraete (2010). ‘Optional ergative marking
and its implications for linguistic theory’, Lingua 120: 1607–9. doi:10.1016/j.
lingua.2009.05.009
Mc Laughlin, Fiona (1997). ‘Noun classification in Wolof: When affixes are not renewed’,
Studies in African Linguistics 26(1): 1–28.
Mc Laughlin, Fiona (2000). ‘Consonant mutation and reduplication in Seereer-Siin’,
Phonology 17: 333–63. doi:10.1017/S0952675701003955
Mc Laughlin, Fiona (2001). ‘Dakar Wolof and the configuration of an urban identity’,
Journal of African Cultural Studies 14(2): 153–72. doi:10.1080/13696810120107104
McLeod, A. Ian (2011). ‘Package “Kendall”. R package documentation’. URL:
https://cran.r-project.org/web/packages/Kendall/Kendall.pdf
McWhorter, John H. (1994). ‘From focus marker to copula in Swahili’, in Kevin E. Moore,
David Peterson, and Comfort Wentum (eds), Proceedings of the Berkeley Linguistics
Society, Special Session on Historical Issues in African Linguistics. Berkeley, CA: Berkeley
Linguistics Society, 57–66.
McWhorter, John H. (1998). ‘Identifying the creole prototype: Vindicating a typological
claim’, Language 74: 788–818. doi:10.2307/417003
McWhorter, John H. (2001). ‘The world’s simplest grammars are creole grammars’,
Linguistic Typology 5(2–3): 125–66. doi:10.1515/lity.2001.001
McWhorter, John H. (2002). ‘What happened to English?’, Diachronica 19: 217–72.
doi:10.1075/dia.19.2.02wha
McWhorter, John H. (2005). Defining Creole. New York: Oxford University Press.
McWhorter, John H. (2007). Language Interrupted: Signs of Non-Native Acquisition in
Standard Language Grammars. New York: Oxford University Press.
McWhorter, John H. (2008). ‘Why does a language undress? Strange cases in Indonesia’, in
Matti Miestamo, Kaius Sinnemäki, and Fred Karlsson (eds), Language Complexity:
Typology, Contact, Change. Amsterdam: John Benjamins, 167–90.
McWhorter, John H. (2011). Linguistic Simplicity and Complexity: Why Do Languages
Undress? Berlin: Walter de Gruyter.
McWhorter, John H. (2012). ‘Case closed? Testing the Feature Pool Hypothesis’, Journal of
Pidgin and Creole Languages 27: 171–82. doi:10.1075/jpcl.27.1
McWhorter, John H. (2016). ‘Is radical analyticity normal? Implications of Niger-Congo
and Sino-Tibetan for typology and diachronic theory’, in Elly van Gelderen (ed.), Cyclical
Change Continued. Amsterdam: John Benjamins, 49–91. doi:10.1075/la.227.03mcw
McWhorter, John H. (2018). The Creole Debate. Cambridge: Cambridge University Press.
McWhorter, John H. (2019). ‘The radically isolating languages of Flores: A challenge to
diachronic theory’, Journal of Historical Linguistics 9: 177–207. doi:10.1075/jhl.16021.mcw
Meakins, Felicity (2009). ‘The case of the shifty ergative marker: A pragmatic shift in the
ergative marker in one Australian mixed language’, in Jóhanna Barðdal and Shobhana
L. Chelliah (eds), The Role of Semantic, Pragmatic, and Discourse Factors in the
Development of Case. Amsterdam: John Benjamins, 59–91.
Meakins, Felicity (2011). Case Marking in Contact: The Development and Function of Case
Morphology in Gurindji Kriol. Amsterdam: John Benjamins.
Meakins, Felicity (2013). ‘Gurindji Kriol’, in Susanne Maria Michaelis, Philippe Maurer,
Martin Haspelmath, and Magnus Huber (eds), The Survey of Pidgin and Creole
Languages, vol. III: Contact Languages Based on Languages from Africa, Asia, Australia
and the Americas. Oxford: Oxford University Press, 131–9.
Meakins, Felicity (2015). ‘From absolutely optional to only nominally ergative: The life
cycle of the Gurindji Kriol ergative suffix’, in Francesco Gardani, Peter Arkadiev, and
Nino Amiridze (eds), Borrowed Morphology. Berlin: Mouton de Gruyter, 189–218.
Meakins, Felicity, Patrick McConvell, Erika Charola, Norm McNair, Helen McNair, and
Lauren Campbell (2013). Gurindji to English Dictionary. Batchelor, Australia: Batchelor
Press.
Meakins, Felicity and Rachel Nordlinger (2014). A Grammar of Bilinarra: An Australian
Aboriginal Language of the Northern Territory. Berlin: Mouton de Gruyter.
Meakins, Felicity and Carmel O’Shannessy (2010). ‘Ordering arguments about: Word order
and discourse motivations in the development and use of the ergative marker in two
Australian mixed languages’, Lingua 120(7): 1693–713. doi:10.1016/j.lingua.2009.05.013
Meakins, Felicity, Xia Hua, Cassandra Algy, and Lindell Bromham (2019). ‘Birth of a
contact language did not favor simplification’, Language 95(2): 294–332. doi:10.1353/
lan.2019.0032
Meeuwis, Michael (2013). ‘Lingala’, in Susanne Maria Michaelis, Philippe Maurer, Martin
Haspelmath, and Magnus Huber (eds), The Survey of Pidgin and Creole Languages, vol.
III: Contact Languages Based on Languages from Africa, Asia, Australia and the
Americas. Oxford: Oxford University Press, 25–33.
Meijer, Guus and Pieter C. Muysken (1977). ‘On the beginnings of pidgin and creole
studies: Schuchardt and Hesseling’, in Albert Valdman (ed.), Pidgin and Creole
Linguistics. Bloomington: Indiana University Press, 21–48.
Mel’čuk, Igor (1994). ‘Suppletion: Toward a logical analysis of the concept’, Studies in
Language 18: 339–410. doi:10.1075/sl.18.2.03mel
Moscoso del Prado Martín, Fermín (2011). ‘The mirage of morphological complexity’, in
Laura Carlson, Christoph Hoelscher, and Thomas F. Shipley (eds), Proceedings of the
33rd Annual Conference of the Cognitive Science Society. Austin, TX: Cognitive Science
Society, 3524–9.
Moscoso del Prado Martín, Fermín, Aleksandar Kostic, and R. Harald Baayen (2004).
‘Putting the bits together: An information-theoretical perspective on morphological
processing’, Cognition 94(1): 1–18.
Mufwene, Salikoko S. (2001). The Ecology of Language Evolution. Cambridge: Cambridge
University Press.
Mufwene, Salikoko S. (2008). Language Evolution: Contact, Competition, and Change.
London: Continuum Press.
Mufwene, Salikoko S. (2009). ‘Restructuring, hybridization, and complexity in language
evolution’, in Enoch O. Aboh and Norval Smith (eds), Complex Processes in New
Languages. Amsterdam: John Benjamins, 367–400.
Mufwene, Salikoko S., François Pellegrino, and Christophe Coupé (eds) (2017). Complexity
in Language: Developmental and Evolutionary Perspectives. Cambridge: Cambridge
University Press.
Mugdan, Joachim (1994). ‘Morphological units’, in Ron Asher (ed.), The Encyclopedia of
Language and Linguistics. Oxford: Pergamon Press, 2543–53.
Mühlhäusler, Peter (1997). Pidgin and Creole Linguistics. London: University of
Westminster.
Mukarovsky, Hans (1977). A Study of Western Nigritic, vol. I. Wien: Institut für
Ägyptologie und Afrikanistik der Universität Wien.
Müller, Neele (2013). Tense, Aspect, Modality, and Evidential Marking in South American
Indigenous Languages. Utrecht: LOT.
Munro, Pamela and Dieynaba Gaye (1997). Ay Baati Wolof: A Wolof Dictionary. Revised
ed. Los Angeles: Department of Linguistics, UCLA.
Muysken, Pieter C., Harald Hammarström, Joshua Birchall, Swintha Danielsen, Love
Eriksen, Ana Vilacy Galucio, Rik van Gijn, Simon van de Kerke, Vishnupriya
Kolipakam, Olga Krasnoukhova, Neele Müller, and Loretta O’Connor (2014). ‘The
languages of South America: Deep families, areal relationships, and language contact’,
in Loretta O’Connor and Pieter C. Muysken (eds), The Native Languages of South
America. Cambridge: Cambridge University Press, 299–322.
Myers-Scotton, Carol (2002). Contact Linguistics: Bilingual Encounters and Grammatical
Outcomes. Oxford: Oxford University Press.
Nakagawa, Shinichi and Holger Schielzeth (2013). ‘A general and simple method for
obtaining R² from generalized linear mixed-effects models’, Methods in Ecology and
Evolution 4(2): 133–42.
Nash, David (1980). Topics in Warlpiri Grammar. Massachusetts Institute of Technology
PhD dissertation.
Ndiaye, Moussa D. (2004). Eléments de morphologie du wolof. Méthodes d’analyse en
linguistique. München: LINCOM Europa.
Nettle, Daniel (2012). ‘Social scale and structural complexity in human languages’,
Philosophical Transactions of the Royal Society B: Biological Sciences 367(1597):
1829–36. doi:10.1098/rstb.2011.0216
Neubauer, Kathleen and Harald Clahsen (2009). ‘Decomposition of inflected words in a
second language: An experimental study of German participles’, Studies in Second
Language Acquisition 31(3): 403–35. doi:10.1017/S0272263109090354
Nordlinger, Rachel and Patrick Caudal (2012). ‘The tense, aspect and modality system in
Murrinh-Patha’, Australian Journal of Linguistics 32(1): 73–112. doi:10.1080/
07268602.2012.657754
Norman, Jerry (1988). Chinese. Cambridge: Cambridge University Press.
Nurse, Derek (2007). ‘Did the proto-Bantu verb have a synthetic or an analytic structure?’,
SOAS Working Papers in Linguistics 15: 239–56.
Nurse, Derek (2008). Tense and Aspect in Bantu. New York: Oxford University Press.
O’Connor, Catherine, Joan Maling, and Barbora Skarabela (2013). ‘Nominal categories and
the expression of possession: A cross-linguistic study of probabilistic tendencies and
categorial constraints’, in Kersti Börjars, David Denison, and Alan Scott (eds),
Morphosyntactic Categories and the Expression of Possession. Amsterdam: John
Benjamins, 89–121.
Olawsky, Knut (2006). A Grammar of Urarina. Berlin: Mouton de Gruyter.
Ospina Bozzi, Ana María (2002). Les structures élémentaires du Yuhup Maku, langue de
l’Amazonie Colombienne: Morphologie et syntaxe. Université Paris 7—Denis Diderot
PhD dissertation.
Öztürk, Balkız and Markus A. Pöchtrager (2011). Pazar Laz. München: LINCOM Europa.
Paauw, Scott (2007). ‘A North Papua linguistic area?’. Paper given at the ‘Workshop on the
Languages of Papua’, Manokwari.
Parker, Jeff (2016). Inflectional Complexity and Cognitive Processing: An Experimental and
Corpus-Based Investigation of Russian Nouns. The Ohio State University PhD
dissertation.
Parker, Jeff, Robert Reynolds, and Andrea D. Sims (to appear). ‘The role of language-
specific network properties in the emergence of inflectional irregularity’, in Andrea
D. Sims, Adam Ussishkin, Jeff Parker, and Samantha Wray (eds), Morphological
Typology and Linguistic Cognition. Cambridge: Cambridge University Press.
Parkvall, Mikael (2008). ‘The simplicity of creoles in cross-linguistic perspective’, in Matti
Miestamo, Kaius Sinnemäki, and Fred Karlsson (eds), Language Complexity: Typology,
Contact, Change. Amsterdam: John Benjamins, 265–85.
Payne, Doris L. (1990). ‘Morphological characteristics of lowland South American
languages’, in Doris L. Payne (ed.), Amazonian Linguistics: Studies in Lowland South
American Languages. Austin, TX: University of Texas Press, 213–41.
Payne, Doris L. (2007). ‘Source of the Yagua nominal classification system’, International
Journal of American Linguistics 73(4): 447–74. doi:10.1086/523773
Payne, John (2013). ‘The oblique genitive in English’, in Kersti Börjars, David Denison, and
Alan Scott (eds), Morphosyntactic Categories and the Expression of Possession.
Amsterdam: John Benjamins, 178–92.
Payne, Thomas (1997). Describing Morphosyntax. Cambridge: Cambridge University Press.
Perrin, Loïc-Michel (2012). L’expression du temps en wolof—langue atlantique parlée au
Sénégal. Köln: Köppe.
Perrott, D. V. (1950). Teach Yourself Swahili. New York: Random House.
Pienemann, Manfred (1998). Language Processing and Second Language Development:
Processability Theory. Amsterdam: John Benjamins.
Pinheiro, José C. and Douglas M. Bates (2000). Mixed-Effects Models in S and S-PLUS. New
York: Springer.
Pinker, Steven and Alan Prince (1988). ‘On language and connectionism: Analysis of a
parallel distributed processing model of language acquisition’, Cognition 28: 73–193.
doi:10.1016/0010-0277(88)90032-7
Roberts, Sarah J. and Joan Bresnan (2008). ‘Retained inflectional morphology in pidgins:
A typological study’, Linguistic Typology 12(2): 269–302. doi:10.1515/LITY.2008.039
Roberts, Seán (2018). ‘Chield: Causal hypotheses in evolutionary linguistics database’, in
Christine Cuskley, Molly Flaherty, Hannah Little, Luke McCrohon, Andrea Ravignani,
and Tessa Verhoef (eds), The Evolution of Language: Proceedings of the 12th
International Conference (EVOLANG12). doi:10.12775/3991-1.099
Robins, R. H. (1958). The Yurok Language: Grammar, Texts, Lexicon. Berkeley, CA:
University of California Press.
Romaine, Suzanne (1988). Pidgin and Creole Languages. London: Longman.
Rottet, Kevin J. (1992). ‘Functional categories and verb movement in Louisiana creole’,
Probus 4: 261–89. doi:10.1515/prbs.1992.4.3.261
Russell, Kevin (1999). ‘What’s with all these long words anyway?’, in Leora Bar-El, Rose-
Marie Dechaine, and Charlotte Reinholtz (eds), Papers from the Workshop on Structure
and Constituency in Native American Languages. Cambridge, MA: The MIT Press,
119–30.
Sadock, Jerrold (2017). ‘The subjectivity of the notion of polysynthesis’, in Michael
Fortescue, Marianne Mithun, and Nicholas Evans (eds), The Oxford Handbook of
Polysynthesis. Oxford: Oxford University Press, 99–114.
Saffran, Jenny R., Richard N. Aslin, and Elissa L. Newport (1996). ‘Statistical learning by
8-month-old infants’, Science 274(5294): 1926–8. doi:10.1126/science.274.5294.1926
Sagot, Benoît and Géraldine Walther (2011). ‘Non-canonical inflection: Data, formalisation
and complexity measures’, in Cerstin Mahlow and Michael Piotrowski (eds), Systems and
Frameworks for Computational Morphology. Berlin: Springer, 23–45. doi:10.1007/978-3-
642-23138-4_3
Samara, Anna, Kenny Smith, Helen Brown, and Elizabeth Wonnacott (2017). ‘Acquiring
variation in an artificial language: Children and adults are sensitive to socially
conditioned linguistic variation’, Cognitive Psychology 94: 85–114. doi:10.1016/j.
cogpsych.2017.02.004
Sampson, Geoffrey, David Gil, and Peter Trudgill (eds) (2009). Language Complexity as an
Evolving Variable. Oxford: Oxford University Press.
Sapir, Edward (1921). Language: An Introduction to the Study of Speech. New York:
Harcourt, Brace & Co.
Sapir, J. David (1965). A Grammar of Diola–Fogny, a Language Spoken in the Basse-
Casamance Region of Senegal. Cambridge: Cambridge University Press.
Sapir, J. David (1971). ‘West Atlantic: An inventory of the languages, their noun class
systems and consonant alternation’, in Thomas Sebeok (ed.), Current Trends in
Linguistics, vol. VII: Linguistics in Sub-Saharan Africa. The Hague: Mouton, 44–112.
Sauvageot, Serge (1965). Description synchronique d’un dialecte Wolof. Le parler du Dyolof.
Dakar: Institut Français de l’Afrique Noire.
Sauvageot, Serge (1967). ‘Note sur la classification nominale en baïnouk’, in Gabriel
Manessy (ed.), La classification nominale dans les langues négro-africaines. Paris:
CNRS, 225–36.
Scalise, Sergio (1984). Morfologia lessicale. Padova: CLESP.
Schiering, René, Balthasar Bickel, and Kristine Hildebrandt (2010). ‘The prosodic word is
not universal, but emergent’, Journal of Linguistics 46: 657–710. doi:10.1017/
S0022226710000216
Schlegel, Friedrich von (1808). Über die Sprache und Weisheit der Indier. Ein Beitrag zur
Begründung der Alterthumskunde. Heidelberg: Mohr & Zimmer.
Schreuder, Robert and R. Harald Baayen (1997). ‘How simplex complex words can be’,
Journal of Memory and Language 37: 118–39. doi:10.1006/jmla.1997.2510
Schwegler, Armin (2013). ‘Palenquero structure dataset’, in Susanne Maria Michaelis,
Philippe Maurer, Martin Haspelmath, and Magnus Huber (eds), Atlas of Pidgin and
Creole Language Structures Online. Leipzig: Max Planck Institute for Evolutionary
Anthropology. URL: http://apics-online.info/contributions/48
Segerer, Guillaume (2010). ‘Isolates in Atlantic’. Paper given at the workshop ‘Language
Isolates in Africa’, 4 December, Lyon.
Seifart, Frank (2005). The Structure and Use of Shape-Based Noun Classes in Miraña (North
West Amazon). Universiteit Nijmegen PhD dissertation.
Seifart, Frank (2011). Bora Loans in Resígaro: Massive Morphological and Little Lexical
Borrowing in a Moribund Arawakan Language. Cadernos de Etnolingüística, Série
Monografias 2 [online publisher].
Seifart, Frank and Doris Payne (2007). ‘Nominal classification in the Northwest Amazon:
Issues in areal diffusion and typological characterization’, International Journal of
American Linguistics 73(4): 381–7. doi:10.1086/523770
Seuren, Pieter (1990). ‘Verb syncopation and predicate raising in Mauritian Creole’,
Linguistics 28(4): 809–44. doi:10.1515/ling.1990.28.4.809
Seuren, Pieter (1998). Western Linguistics: An Historical Introduction. Oxford: Blackwell.
Seuren, Pieter and Herman Wekker (1986). ‘Semantic transparency as a factor in creole
genesis’, in Pieter Muysken and Norval Smith (eds), Substrata versus Universals in Creole
Genesis. Amsterdam: John Benjamins, 57–70.
Shalizi, Cosma Rohilla (2001). ‘Causal architecture, complexity and self-organization in the
time series and cellular automata’. University of Wisconsin-Madison PhD dissertation.
Shannon, Claude E. (1948). ‘A mathematical theory of communication’, Bell System
Technical Journal 27(3): 379–423.
Shosted, Ryan (2006). ‘Correlating complexity: A typological approach’, Linguistic Typology
10(1): 1–40. doi:10.1515/LINGTY.2006.001
Silva, Wilson de Lima (2012). A Descriptive Grammar of Desano. University of Utah PhD
dissertation.
Sims, Andrea D. (2015). Inflectional Defectiveness. Cambridge: Cambridge University Press.
Sims, Andrea D. and Jeff Parker (2016). ‘How inflection class systems work: On the
informativity of implicative structure’, Word Structure 9(2): 215–39. doi:10.3366/
word.2016.0094
Sinnemäki, Kaius (2008). ‘Complexity trade-offs in core argument marking’, in Matti
Miestamo, Kaius Sinnemäki, and Fred Karlsson (eds), Language Complexity: Typology,
Contact, Change. Amsterdam: John Benjamins, 67–88.
Sinnemäki, Kaius (2011). Language Universals and Linguistic Complexity: Three Case
Studies in Core Argument Marking. University of Helsinki PhD dissertation.
Sinnemäki, Kaius (2014). ‘Global optimization and complexity trade-offs’, Poznań Studies
in Contemporary Linguistics 50(2): 179–95. doi:10.1515/psicl-2014-0013
Smith, Kenny, Amy Perfors, Olga Fehér, Anna Samara, Kate Swoboda, and Elizabeth
Wonnacott (2017). ‘Language learning, language use and the evolution of linguistic
variation’, Philosophical Transactions of the Royal Society B 372(1711): 20160051.
doi:10.1098/rstb.2016.0051
Smith, Kenny and Elizabeth Wonnacott (2010). ‘Eliminating unpredictable variation
through iterated learning’, Cognition 116(3): 444–9. doi:10.1016/j.cognition.2010.06.004
Soubrier, Aude (2013). Description de l’ikposso uwi. Lyon: Université Lumière Lyon 2
dissertation.
Spencer, Andrew and Ana R. Luís (2012). Clitics: An Introduction. Cambridge: Cambridge
University Press.
Stahlke, Herbert (1970). ‘Serial verbs’, Studies in African Linguistics 1: 60–99.
Štekauer, Pavol (2015). ‘The delimitation of derivation and inflection’, in Peter O. Müller,
Ingeborg Ohnheiser, Susan Olsen, and Franz Rainer (eds), Word-Formation: An
International Handbook of the Languages of Europe, vol. 1. Berlin: De Gruyter
Mouton, 218–35.
Stenzel, Kristine (2008). ‘Evidentials and clause modality in Wanano’, Studies in Language
32(2): 405–45. doi:10.1075/sl.32.2.06ste
Stenzel, Kristine (2013a). A Reference Grammar of Kotiria (Wanano). Lincoln, NE:
University of Nebraska Press.
Stenzel, Kristine (2013b). ‘Contact and innovation in Vaupés possession-marking
strategies’, in Patience Epps and Kristine Stenzel (eds), Cultural and Linguistic Interaction in
the Upper Rio Negro Region. Rio de Janeiro: Museu do Índio-FUNAI, 353–402.
Stenzel, Kristine and Elsa Gomez-Imbert (2009). ‘Contato linguístico e mudança linguística
no noroeste amazônico: O caso do Kotiria (Wanano)’, Revista da ABRALIN 8: 71–100.
Stewart, William Alexander and William W. Gage (1970). Notes on Wolof Grammar by
William A. Stewart. Adapted by William W. Gage, in Dakar Wolof: A Basic Course
prepared by Loren V. Nussbaum, William W. Gage, and Daniel Varre. Washington, DC:
Center for Applied Linguistics, 355–412.
Stilo, Donald (2019). ‘Loss vs. expansion of gender in Tatic languages: Kafteji (Kabatei) and
Kelāsi’, in Alireza Korangy and Behrooz Mahmoodi-Bakhtiari (eds), Essays on Typology
of Iranian Languages. Berlin: De Gruyter Mouton, 34–78. doi:10.1515/9783110604443-004
Stoll, Sabine, Balthasar Bickel, and Jekaterina Mažara (2017). ‘The acquisition of
polysynthetic verb forms in Chintang’, in Michael Fortescue, Marianne Mithun, and Nicholas
Evans (eds), The Oxford Handbook of Polysynthesis. Oxford: Oxford University Press,
495–514.
Stolz, Thomas (2012). ‘Survival in a niche: On gender-copy in Chamorro (and sundry
languages)’, in Martine Vanhove, Thomas Stolz, Aina Urdze, and Hitomi Otsuka (eds),
Morphologies in Contact. Berlin: Akademie-Verlag, 93–140.
Stolz, Thomas (2015). ‘Adjective-noun agreement in language contact’, in Francesco
Gardani, Peter Arkadiev, and Nino Amiridze (eds), Borrowed Morphology. Berlin:
Mouton de Gruyter, 269–301.
Street, Chester (1987). An Introduction to the Language and Culture of the Murrinh-Patha.
Darwin: Summer Institute of Linguistics.
Stump, Gregory (2001). Inflectional Morphology: A Theory of Paradigm Structure.
Cambridge: Cambridge University Press.
Stump, Gregory (2006a). ‘Heteroclisis and paradigm linkage’, Language 82(2): 279–322.
doi:10.1353/lan.2006.0110
Stump, Gregory (2006b). ‘Template morphology’, in Keith Brown (ed.), Encyclopedia of
Language & Linguistics. 2nd ed. Oxford: Elsevier, 559–63.
Stump, Gregory (2016). Inflectional Paradigms: Content and Form at the Syntax-
Morphology Interface. Cambridge: Cambridge University Press.
Stump, Gregory (2017). ‘The nature and dimensions of complexity in morphology’, Annual
Review of Linguistics 3(1): 65–83. doi:10.1146/annurev-linguistics-011415-040752
Stump, Gregory and Raphael A. Finkel (2013). Morphological Typology: From Word to
Paradigm. Cambridge: Cambridge University Press.
Stump, Gregory and Raphael A. Finkel (2015). ‘Contrasting modes of representation for
inflectional systems: Some implications for computing morphological complexity’, in
Matthew Baerman, Dunstan Brown, and Greville G. Corbett (eds), Understanding and
Measuring Morphological Complexity. Oxford: Oxford University Press, 119–40.
Syea, Anand (1992). ‘The short and long forms of verbs in Mauritian Creole: Functionalism
versus formalism’, Theoretical Linguistics 18: 61–97. doi:10.1515/thli.1992.18.1.61
Sylla, Yero (1982). Grammaire moderne du Pulaar. Dakar: Nouvelles éditions africaines.
Szmrecsanyi, Benedikt and Bernd Kortmann (2009). ‘The morphosyntax of varieties of
English worldwide: A quantitative perspective’, Lingua 119(11): 1643–63. doi:10.1016/j.
lingua.2007.09.016
Taft, Marcus (1979). ‘Recognition of affixed words and the word frequency effect’, Memory
& Cognition 7(4): 263–72. doi:10.3758/BF03197599
Taft, Marcus (2004). ‘Morphological decomposition and the reverse base frequency effect’,
The Quarterly Journal of Experimental Psychology 57(4): 745–65. doi:10.1080/
02724980343000477
Taft, Marcus and Sam Ardasinski (2006). ‘Obligatory decomposition in reading prefixed
words’, The Mental Lexicon 1(2): 183–99. doi:10.1075/ml.1.2.02taf
Tallman, Adam (2018). A Grammar of Chácobo, a Southern Pano Language of the Northern
Bolivian Amazon. University of Texas at Austin PhD dissertation.
Tamba, Khady, Harold Torrence, and Malte Zimmermann (2012). ‘Wolof quantifiers’, in
Edward Keenan and Denis Paperno (eds), Handbook of Quantification in Natural
Language. New York: Springer, 891–939.
Thiam, Ndiassé (1987). Les catégories nominales en wolof. Aspects sémantiques. Dakar:
Centre de linguistique appliquée de Dakar.
Thomason, Sarah G. (2001). Language Contact: An Introduction. Washington, DC:
Georgetown University Press.
Thomason, Sarah G. (2008). ‘Pidgins/creoles and historical linguistics’, in Silvia
Kouwenberg and John Victor Singler (eds), Handbook of Pidgin and Creole Languages.
Malden, MA: Wiley-Blackwell, 242–62.
Thomason, Sarah G. (2015). ‘When is the diffusion of inflectional morphology not
dispreferred?’, in Francesco Gardani, Peter Arkadiev, and Nino Amiridze (eds), Borrowed
Morphology. Berlin: Mouton de Gruyter, 27–46.
Thomason, Sarah G. and Terence Kaufman (1988). Language Contact, Creolization, and
Genetic Linguistics. Berkeley, CA: University of California Press.
Thomaz, Luis Felípe (2002). Babel Loro Sa’e: O problema linguístico de Timor-Leste. Lisboa:
Instituto Camões.
Thornton, Anna M. (2005). Morfologia. Roma: Carocci.
Thornton, Anna M. (2011). ‘Overabundance (multiple forms realizing the same cell):
A non-canonical phenomenon in Italian verb morphology’, in Martin Maiden, John
C. Smith, Maria Goldbach, and Marc-Olivier Hinzelin (eds), Morphological Autonomy:
Perspectives from Romance Inflectional Morphology. Oxford: Oxford University Press,
359–82.
Thornton, Anna M. (2019). ‘Overabundance: A canonical typology’, in Franz Rainer,
Francesco Gardani, Wolfgang U. Dressler, and Hans Christian Luschützky (eds),
Competition in Inflection and Word-Formation. Cham: Springer, 223–58.
doi:10.1007/978-3-030-02550-2_9
Tily, Harry and T. Florian Jaeger (2011). ‘Complementing quantitative typology with
behavioral approaches: Evidence for typological universals’, Linguistic Typology 15(2):
497–508. doi:10.1515/LITY.2011.033
Timberlake, Alan (2004). A Reference Grammar of Russian. Cambridge: Cambridge
University Press.
van der Voort, Hein (2005). ‘Kwaza in comparative perspective’, International Journal of
American Linguistics 71: 365–412. doi:10.1086/501245
van der Voort, Hein (2016). ‘Recursive inflection and grammaticalized fictive interaction in
the Southwestern Amazon’, in Esther Pascual and Sergeiy Sandler (eds), The
Conversation Frame: Forms and Functions of Fictive Interaction. Amsterdam: John
Benjamins, 277–302.
van Engelenhoven, Aone (2004). Leti, a Language of Southwest Maluku. Leiden: KITLV
Press.
van Gijn, Rik and Fernando Zúñiga (2014). ‘Word and the Americanist perspective’,
Morphology 24: 135–60. doi:10.5167/uzh-99717
Vanhove, Martine (2001). ‘Contacts de langues et complexification des systèmes: Le cas du
maltais’, Faits de Langues 18: 65–74.
Veenstra, Tonjes (2009). ‘Verb allomorphy and the syntax of phases’, in Enoch Aboh and
Norval Smith (eds), Complex Processes in New Languages. Amsterdam: John Benjamins,
99–114.
Veenstra, Tonjes and Angelika Becker (2003). ‘The survival of inflectional morphology in
French-related creoles’, Studies in Second Language Acquisition 25: 285–306.
doi:10.1017/S0272263103000123
Villoing, Florence and Maxime Deglas (2016). ‘La formation de verbes dénominaux en
guadeloupéen. La part de l’héritage et de l’innovation’, 5ème Congrès Mondial de
Linguistique Française 2016, Tours, France. doi:10.1051/shsconf/20162708004
Wälchli, Bernhard (2017). ‘The incomplete story of feminine gender loss in Northwestern
Latvian dialects’, Baltic Linguistics 8: 143–214.
Wälchli, Bernhard (2018). ‘The rise of gender in Nalca (Mek, Tanah Papua): The drift
towards the canonical gender attractor’, in Sebastian Fedden, Jenny Audring, and
Greville Corbett (eds), Non-Canonical Gender Systems. Oxford: Oxford University
Press, 68–99.
Walsh, Michael (1976). The Murinypata Language of North-West Australia. Australian
National University PhD dissertation.
Walther, Géraldine (2017). ‘Paradigm realisation and the lexicon’, in Ferenc Kiefer, James
P. Blevins, and Huba Bartos (eds), Perspectives on Morphological Organization: Data and
Analyses. Leiden: Brill, 159–99.
Weinreich, Uriel, William Labov, and Marvin Herzog (1968). ‘Empirical foundations for a
theory of language change’, in Winfred Philip Lehmann and Yakov Malkiel (eds),
Directions for Historical Linguistics. Austin, TX: University of Texas Press, 95–198.
Wells, Rulon (1954). ‘Archiving and language typology’, International Journal of American
Linguistics 20(2): 101–7.
Wichmann, Søren and Eric W. Holman (2009). Temporal Stability of Linguistic Typological
Features. München: LINCOM Europa.
Wilson, William André Auquier (1989). ‘Atlantic’, in John Theodore Bendor-Samuel (ed.),
The Niger-Congo Languages: A Classification and Description of Africa’s Largest
Language Family. Lanham, MD: University Press of America, by arrangement with the
Summer Institute of Linguistics (SIL), 81–104.
Wilson, William André Auquier (2007). Guinea Languages of the Atlantic Group. Frankfurt
am Main: Peter Lang.
Wise, Mary Ruth (1971). Identification of Participants in Discourse: A Study of Aspects of
Form and Meaning in Nomatsiguenga. Norman, OK: Summer Institute of Linguistics of
the University of Oklahoma.
Language Index
English 13, 18, 26, 56, 81, 84, 85, 87, 108, 110, 125, 163, 166, 170, 171, 196, 208, 213, 225, 267, 271, 274, 276, 277, 279–80, 303, 310–11, 316, 320, 326, 332, 336
  African-American Vernacular 279
  Middle 85
  Old 52, 74, 271, 274
Eshtehardi 220, 227
Eskimo-Aleut languages 190, 248, 254
Even 190
Evenki 190

Haitian Creole 16, 106, 113, 114, 117–18, 120, 131–5, 270, 272, 279–80
Haitian Creole English 279
Haro 189
Hinuq 180, 182–3, 189
Hopi 180, 184–5, 191
Huallaga Quechua 191
Hungarian 84, 189
Hunzib 180, 183, 189
Hup 232, 238, 240, 242–4, 246, 248, 250, 254–5, 258–9, 260–1, 263
Hupa 190
Subject Index
abstractive (models, frameworks, perspectives) 326–7
acquisition 12, 13–14, 17, 53, 57, 75, 288, 303, 311, 326–7
  first language, L1 13, 61, 323–5
  native, see acquisition, first language, L1
  non-native, see acquisition, second language, L2, adult
  second language, L2, adult 17, 111–12, 114, 267–82, 286, 326
actualization 92, 96, 101
adult acquisition, see acquisition, second language, L2, adult
agent 90
agglutinative, agglutinating morphology 3, 137, 141, 141–2, 143, 144–8, 158, 234, 255
agreement 173–4, 193–228, 236, 287, 288, 291–8, 303
  default 139
  redistribution of 200–4
  subject-verb 284
agreement targets 151–8
algorithmic information content 331
alignment 88
allomorphy 3, 7, 8, 9, 54–6, 57, 58, 59, 61–6, 68–70, 72, 75, 89, 110, 148, 149, 170, 172–3, 188, 230, 234, 247, 251, 252–3, 255, 261, 317, 326, 327
Amazonian languages 17, 167, 230–63
analogy 16, 26, 27, 52–4, 57, 61, 67, 70, 71–4, 75, 326
analyticity 17, 110, 267–82
Andean languages 231, 246
animacy 38, 39, 85, 90, 91, 92, 95, 96, 172, 174, 197, 199, 201–4, 205, 213, 214, 217, 218, 219, 238
argument relations 88, 90, 93, 102, 103
autonomous (or pure) morphology 6–7, 18, 24, 119, 147, 230–1, 235, 247–51, 255, 256–62
auxiliary 101
average conditional entropy, see entropy

bias amplification 304
bilingualism 193, 210, 211, 214, 215, 220, 222, 307, 308, 311
biuniqueness 9, 54, 164, 230, 234, 247, 253, 254, 262, 341–2
borrowing 12, 16, 127, 160, 194, 205, 209, 212, 215, 222, 233, 238–9, 246, 273
bound status 235, 248, 256, 257–8, 262, 263

canonical typology 108, 340–1
canonicality 163–92
canonicity 9, 10, 16, 24, 163–4, 236, 238, 340–1
case 2–3, 82–3, 87–90, 163, 166, 171–2, 174, 175, 184, 246, 272–3, 274, 286, 343
Caucasus 171, 176, 177, 178, 180, 181, 182, 183, 184
Chaco region 238, 239
Circum-Baltic area 176
class prefixation 151
classifier stem 53, 59–75
classifiers 61–3, 167–8, 169, 236–9
  numeral 270, 277
closed classes 52, 53, 59, 61, 66, 68, 71, 75
co-exponence 71, 171, 184
complexification 16, 82, 83, 85, 88, 89, 103, 109, 111, 136–60, 183, 194, 285
complexity:
  absolute (absolutive) 8, 24, 31, 106, 136, 195, 306, 337
  agent-related 306, 337
  canonical 163–92, 334, 340–2
  compositional 335–7
  constitutional 8–9, 141, 335
  corpus 306
  descriptive 9, 14, 151, 163–4, 195, 204, 217, 332, 335, 339, 340
  effective 6, 306
  enumerative (E-complexity) 8–9, 11, 24, 32, 56, 82, 85, 89, 102, 103, 106, 112, 163, 175, 233, 334, 335, 336–7
  exponence 233, 234, 247, 251–5, 335
  formal 8, 13–14
  generative 9, 151
  integrative (I-complexity) 11, 12–13, 16, 24–5, 27, 32, 56, 57, 59, 62, 65–6, 71, 75, 82, 85, 89, 103, 106–7, 108, 112–13, 122, 135, 233, 334, 335, 337–40, 343
  inventory (IC) 163, 334–6
  Kolmogorov 9, 163, 172, 185, 306, 331, 341