OUP CORRECTED AUTOPAGE PROOFS – FINAL, 24/8/2020, SPi


The Complexities of
Morphology
Edited by
PETER ARKADIEV
and
FRANCESCO GARDANI

Great Clarendon Street, Oxford, OX2 6DP,
United Kingdom
Oxford University Press is a department of the University of Oxford.
It furthers the University’s objective of excellence in research, scholarship,
and education by publishing worldwide. Oxford is a registered trade mark of
Oxford University Press in the UK and in certain other countries
© editorial matter and organization Peter Arkadiev and Francesco Gardani 2020
© the chapters their several authors 2020
The moral rights of the authors have been asserted
First Edition published in 2020
Impression: 1
All rights reserved. No part of this publication may be reproduced, stored in
a retrieval system, or transmitted, in any form or by any means, without the
prior permission in writing of Oxford University Press, or as expressly permitted
by law, by licence or under terms agreed with the appropriate reprographics
rights organization. Enquiries concerning reproduction outside the scope of the
above should be sent to the Rights Department, Oxford University Press, at the
address above
You must not circulate this work in any other form
and you must impose this same condition on any acquirer
Published in the United States of America by Oxford University Press
198 Madison Avenue, New York, NY 10016, United States of America
British Library Cataloguing in Publication Data
Data available
Library of Congress Control Number: 2020932944
ISBN 978–0–19–886128–7
Printed and bound in Great Britain by
Clays Ltd, Elcograf S.p.A.
Links to third party websites are provided by Oxford in good faith and
for information only. Oxford disclaims any responsibility for the materials
contained in any third party website referenced in this work.

Contents

List of Figures and Tables
List of Abbreviations
The Contributors
1. Introduction: Complexities in morphology
Peter Arkadiev and Francesco Gardani

I. THE LANGUAGE-SPECIFIC PERSPECTIVE

2. Irregularity, paradigmatic layers, and the complexity of inflection class systems: A study of Russian nouns
Jeff Parker and Andrea D. Sims
3. Demorphologization and deepening complexity in Murrinhpatha
John Mansfield and Rachel Nordlinger
4. Overabundance resulting from language contact: Complex cell-mates in Gurindji Kriol
Felicity Meakins and Sasha Wilmoth
5. Derivation and the morphological complexity of three French-based creoles
Fabiola Henri, Gregory Stump, and Delphine Tribout
6. Simplification and complexification in Wolof noun morphology and morphosyntax
Michele Loporcaro

II. THE CROSSLINGUISTIC PERSPECTIVE

7. Canonical complexity
Johanna Nichols
8. The complexity of grammatical gender and language ecology
Francesca Di Garbo
9. Morphological complexity, autonomy, and areality in western Amazonia
Adam J. R. Tallman and Patience Epps

III. THE ACQUISITIONAL PERSPECTIVE

10. Radical analyticity as a diagnostic of adult acquisition
John H. McWhorter
11. Different trajectories of morphological overspecification and irregularity under imperfect language learning
Aleksandrs Berdicevskis and Arturs Semenuks
12. Where is morphological complexity?
Marianne Mithun

IV. DISCUSSION

13. Morphological complexity and the minimum description length approach
Östen Dahl

References
Language Index
Subject Index

List of Figures and Tables

Figures
2.1. Word types per inflection class across different granularities
2.2. Complexity measures across granularities of Russian nouns
2.3. Conditional entropy of real and a hundred Monte Carlo simulations of Russian nouns across granularities
2.4. Effect of the irregularity of each layer on system complexity (entropy difference)
3.1. Ackerman & Malouf (2015) mechanism for predicting unknown inflectional forms
4.1. Traditional languages and Aboriginal communities of the Victoria River District
4.2. Fixed and random effects used to measure the use vs. non-use of subject marking in Gurindji Kriol
5.1. Degrees of complexity in the predictability of a base lexeme’s base stem in a particular derivational relation R
5.2. Degrees of complexity in the restrictedness of stem X in the morphology of lexeme L, where X serves as L’s base stem in a particular derivational relation
7.1. Mean CC ± 1 standard deviation for three areal breakdowns and selected families
7.2. Complexity x longitude
7.3. Complexity and altitude in Daghestan (eastern Caucasus) for the three complexity counts
8.1. The language sample
8.2. Patterns of change in the language sample
9.1. Western Amazonian languages sampled
9.2. Kernel distribution of densities across the languages of this study
11.1. The meaning space of the experimental languages with the corresponding sentences from an example generation 0 language
11.2. A schematic representation of the chains in the normal (a), temporarily interrupted (b), and permanently interrupted (c) conditions
11.3. Change of the overspecification of agreement, as measured by expressibility, over time
11.4. Relative frequency of the agreement marker which denoted the round animal in the initial language of the chain
11.5. Change of irregularity, as measured by Shannon entropy, over generations
11.6. Change of overspecification and irregularity in verbal agreement over generations in individual chains
11.7. Learnability as a function of irregularity
12.1. Mohawk verb template

Tables
1.1. Case paradigm of Turkish ev ‘house’ and Lithuanian miestas ‘city’
1.2. Sample paradigms of Lithuanian nouns
2.1. An example of morphosyntactically conditioned stress alternation in Russian nouns
2.2. Illustration of the four-class system, based on inflectional suffixes
2.3. Illustration of stress classes of Russian nouns
2.4. Number of nominal inflection classes of Russian nouns as a function of which paradigmatic layers are included
3.1. Warlpiri verb inflection classes
3.2. Examples of inflected classifier forms
3.3. Examples of classifier forms and their formative analyses
3.4. Inflectional exponence of na ‘(27)’
3.5. Variably inflected classifier stem forms
3.6. Allomorphs selected by Ackerman & Malouf (2015) simplification mechanism
3.7. Exponence probabilities of older and newer forms
3.8. Classifier stem paradigm for ma ‘(34)’
3.9. Classifier stem paradigm for ɾa ‘(28)’
4.1. Allomorphic reduction in subject marking in Gurindji Kriol
4.2. Comparison of case systems and allomorphy across three generations
4.3. Occurrence of subject marking in adult Gurindji Kriol speakers according to predictors
4.4. Output of generalized linear mixed model analysis on 3,575 tokens
4.5. Relative effect of the significant predictors according to dominance analysis
4.6. Occurrence of subject marking in child Gurindji Kriol speakers according to predictors
4.7. Output of generalized linear mixed model analysis on 2,975 tokens
4.8. Relative effect of the significant predictors according to dominance analysis
5.1. Patterns of syncretism in the French paradigm (Bonami et al. 2013)
5.2. Comparison of  and .3 forms in French with long and short forms in Mauritian
5.3. Sample comparison of long and short forms in four French-based creoles
5.4. Stem space of  ‘to form’,  ‘to finish’, and ́ ‘to defend’
5.5. Verb alternations in Mauritian
5.6. Reduplication in Mauritian
5.7. Deverbal nominalizations in Mauritian
5.8. Verb alternations in Guadeloupean
5.9. Deverbal nominalizations in Guadeloupean
5.10. Verb alternations in Haitian
5.11. Deverbal nominalizations in Haitian
5.12. Complexity of derivational relations in French, Mauritian, Guadeloupean, and Haitian
7.1. Gender unpredictability for some example languages
7.2. Areal and family breakdown
7.3. Complexity values for four historical groups of languages
8.1. Third person pronouns in standard Swedish
8.2. Clustering of patterns of change at language-family edges within Eurasia
8.3. Direction of change and asymmetries in the structure of the population and/or prestige dynamics
9.1. Anderson’s (2015a) schematization of morphological complexity
9.2. Similar classifier forms in Guaporé-Mamoré languages (van der Voort 2005: 397)
9.3. Evidentiality and tense in Matses (Panoan; Fleck 2007: 593)
9.4. Number of morphemes coded in this study by language and functional domain
9.5. Number of allomorphs per morpheme attested across the sample
9.6. Percentage of morphemes for each EC value across the languages sampled
9.7. Rank correlations between EC level and bound status values across languages
9.8. Rank correlations between EC level and contiguity value across languages
9.9. Rank correlations between EC level and prosodic dependence across languages
10.1. Wolof noun class markers
11.1. An example of a final language with a fully preserved agreement system
11.2. An example of a language with a fully lost agreement system
11.3. A language with a fully lost agreement system
11.4. A language with an irregular distribution of the agreement markers
13.1. Hypothetical noun inflection templates

List of Abbreviations

1 first person
2 second person
3 third person
A most agent-like or experiencer-like argument of transitive; A-class verb
 ablative
 abilitative
 absolutive
 accusative
ACLA Aboriginal Child Language (project)
 grammatical agent
 animate
. anaphoric pronoun
 antipassive
 appositional mood
 applicative
. ‘article of noun’
 aspect
 associative
 augmentative
 auxiliary
BGW Bininj Gun-Wok; Gunwingguan, northern Australia
8 noun class 8 plural
 causative
CAY Central Alaskan Yup’ik
CC canonical complexity
 cislocative
 classifier; class marker
 completive
 contrast
 comitative
 connector
 conditional
 continuative
 contrastive
 copula
 direct case marker
 declarative
 definite; default (in Mansfield and Nordlinger, Chapter 3)
 demonstrative
 desiderative
 determiner
 indexical marker
 different event
 diminutive
. direct experience evidential
 discourse marker
 discontinuitive
 dual
 duplicative
 dynamic
 E-class verb
APL applicative
EC enumerative complexity; exponence complexity
E-complexity enumerative complexity
ELAP Endangered Languages Project
 ergative
 evidential
 eyewitness
 feminine
 factual
 focus
Fr. French
 vowel frontness
 frustrative
 future
G more goal-like argument of ditransitive
 geminate
 genitive
GLMM Generalized Linear Mixed Models
GYN Gbe languages, Yoruba, and Nupe
 habitual (aktionsart)
 high vowel height
 hearsay
IALL iterated artificial language learning
I-complexity Integrative complexity
IC inflectional class; inventory complexity
IE Indo-European
 intransitive inanimate verb
 immediate
 imperative
 imperfective
 inanimate
 inchoative
 indicative
 indefinite
 infinitive
 intentional
 intransitive (subject orientation)
 interactional
 irrealis
 joint agency
L lexeme
 long form
 linking particle
 linker
 locative
 low vowel height
 masculine
MDL Minimum Description Length
 middle
 middle marker
 neuter
N noun
NC noun class
 negation
 non-feminine
 non-future
 nominative
 non-eyewitness evidential
NP noun phrase
 non-past
 non-singular
 nonvisual
 object; object of monotransitive
 object
 oblique
 optative
2 second position
 passive
 grammatical patient
 paucal
PCFP Paradigm Cell Filling Problem
 perfective
 peripheral
 plural
P.N. proper name
 potential
POS parts of speech
 possessive
Poss possessor
 process verbalization
 present
 pronoun
 progressive
 proprietive
 prothetic vowel
 presentational
 partitive
 proximate
 past
 past irrealis
 realis
 recent
 reciprocal
 reduplication
 referential focus
 relative
 remote
 respect
 reflexive
 reportative
 ɾ-alternation
S subject; sole argument of intransitive
 same event
SD standard deviation
 sequential
 short form
 singular
 simultaneous
 semelfactive
 same subject
 stative (aktionsart)
 strong form
 suppletive
SV subject-verb
T more theme-like argument of ditransitive
 transitive animate verb
TAM tense/aspect/mood
 topic advancing voice
 temporal
 thematic suffix
 topic
 transitive
 translocative
UG Universal Grammar
V verb
 venitive
 locative verbalization
VN verb-noun
VS verb-subject
 weak form
Y/N yes/no

The Contributors

Peter Arkadiev holds a PhD in theoretical, typological, and comparative linguistics from
the Russian State University for the Humanities and a habilitation degree from the Russian
Academy of Sciences. Currently he is Senior Researcher at the Institute of Slavic Studies of
the Russian Academy of Sciences and Assistant Professor at the Russian State University
for the Humanities. His fields of interest include language typology and areal linguistics,
morphology, case and alignment systems, tense-aspect, Baltic and Northwest Caucasian
languages. He has co-edited Contemporary Approaches to Baltic Linguistics (with Axel
Holvoet and Björn Wiemer) and Borrowed Morphology (with Francesco Gardani and Nino
Amiridze, both published by De Gruyter Mouton in 2015).
Aleksandrs Berdicevskis is a researcher in computational linguistics at the University of
Gothenburg, Sweden. At the time of writing he was Assistant Professor at Uppsala
University. He has worked on experimental and quantitative approaches to language
change and evolution with a focus on Slavonic languages. He has also participated in the
development of TOROT (Tromsø Old Russian and Old Church Slavonic Treebank) and
related resources. In his PhD dissertation (University of Bergen) he investigated linguistic
innovations in Russian computer-mediated communication.

Östen Dahl is Professor Emeritus of General Linguistics at Stockholm University, Sweden.
He got his academic training at the universities of Gothenburg, Uppsala, and Leningrad
(St. Petersburg) and was active at the University of Gothenburg for ten years before
moving to Stockholm in 1980. In recent years, his research has mainly been typologically
oriented with a strong interest in diachronic approaches to grammar. He has published the
monographs Tense and Aspect Systems (1985), The Growth and Maintenance of Linguistic
Complexity (2004), and Grammaticalization in the North: Noun phrase morphosyntax in
Scandinavian vernaculars (2015).
Francesca Di Garbo is currently affiliated to the University of Helsinki as Postdoctoral
Research Fellow and member of the GramAdapt team, an ERC-funded project (ID: 805371)
investigating mechanisms of adaptation of language structures to social structures. Her
research interests include diachronic and synchronic typology, nominal classification,
number systems, evaluative morphology, linguistic complexity, sociolinguistic typology, and
African languages.
Patience Epps is Professor of Linguistics at the University of Texas at Austin. Her research
focuses on indigenous Amazonian languages, particularly the Naduhupan language family
of the northwest Amazon. Her work engages with language description and
documentation, linguistic typology, language contact and language change, and Amazonian
prehistory. Major publications include the monograph A Grammar of Hup (De Gruyter
Mouton, 2008).
Francesco Gardani is Professor of Romance Linguistics at the University of Zurich,
Switzerland. His research cuts across the fields of Romance and theoretical linguistics and
focuses on morphology, language contact, and linguistic typology. He is the author of
Borrowing of Inflectional Morphemes in Language Contact (2008) and Dynamics of
Morphological Productivity: The evolution of noun classes from Latin to Italian (2013) and
the co-Editor-in-Chief of the Oxford Encyclopedia of Romance Linguistics.
Fabiola Henri is Assistant Professor at the University of Kentucky and an affiliate of the
CNRS research centre, Laboratoire de Linguistique Formelle. Her recent research focuses
on the structure and complexity of morphology in creole languages. Other strands of her
research relate to creole genesis, morphology, and its interfaces, and creole syntax, among
other topics. She is the co-editor of a recent monograph Negation and Negative Concord:
The view from Creoles.
Michele Loporcaro is Full Professor of Romance Linguistics at the University of Zurich, a
Fellow of Academia Europaea and the Austrian Academy of Sciences. His research focuses
on the phonology, morphology, syntax, and lexicon of the Romance languages in
synchrony and diachrony; dialectology; linguistic historiography. He is the author of over
200 articles and seven monographs, two of which with OUP: Vowel Length from Latin to
Romance 2015; Gender from Latin to Romance 2018 (shortlisted for the Prose Awards of
the Association of American Publishers). In 2012 he received the Feltrinelli prize of the
Accademia dei Lincei.
John Mansfield is Lecturer in Linguistics at the University of Melbourne. His research
explores the typology of morphological complexity, with a particular focus on processes of
variation and change. Other strands of his research address aspects of morphological
theory, prosodic phonology, and sociolinguistics, especially with respect to the Aboriginal
languages of northern Australia.
John H. McWhorter is Associate Professor of English and Comparative Literature at
Columbia University, New York City. He specializes in language change and language
contact, in particular the development of creoles, pidgins, koines, ‘vehicular’ languages,
and non-standard dialects. Professor McWhorter is author of more than a dozen books
including Defining Creole (2005), Language Interrupted (2007), Linguistic Simplicity and
Complexity (2011), The Language Hoax (2014), Talking Back, Talking Black (2017), and
The Creole Debate (2018). A contributing editor at The New Republic and The Atlantic, he
has also hosted Slate’s linguistics podcast Lexicon Valley.

Felicity Meakins is ARC Future Fellow in Linguistics at the University of Queensland and
Chief Investigator in the ARC Centre of Excellence for the Dynamics of Language. She is a
field linguist who specializes in the documentation of Australian Indigenous languages in
the Victoria River District of the Northern Territory and the effect of English on
Indigenous languages. She has worked as a community linguist as well as an academic over
the past twenty years, facilitating language revitalization programmes, consulting on
Native Title claims, and conducting research into Indigenous languages. She has compiled
a number of dictionaries and grammars of traditional Indigenous languages, and has
written numerous papers on language change in Australia.
Marianne Mithun is Professor of Linguistics at the University of California, Santa Barbara.
Her interests range over morphology, syntax, discourse, prosody, and their interrelations;
language contact and language change; typology; language documentation and
revitalization; and the languages indigenous to North America and Austronesia.
Johanna Nichols is Professor Emeritus in the Department of Slavic Languages at the
University of California, Berkeley. She works on Slavic languages, languages of the
Caucasus, linguistic typology, and historical linguistics. She is AAAS Fellow and LSA
Fellow, and presently holds visiting positions as Helsinki University Humanities Visiting
Professor and Research Supervisor in the Linguistic Convergence Laboratory, Higher
School of Economics, Moscow. She has done extensive fieldwork on the Ingush language
of the central Caucasus.
Rachel Nordlinger is Professor of Linguistics at the University of Melbourne and Chief
Investigator in the ARC Centre of Excellence for the Dynamics of Language. Her research
centres around the description and documentation of Australia’s Indigenous languages
and their implications for linguistic typology. She has also published on topics in syntactic
and morphological theory, and in particular the challenges posed by the complex
grammatical structures of Australian languages.

Jeff Parker is Assistant Professor of Linguistics at Brigham Young University. His research
centres around better understanding inflectional structure from different methodological
perspectives, including investigations into how language specific traits contribute to the
complexity of inflection class systems, how inflectional structure affects lexical access of
inflected forms, and how computational models of learning help explain typological
tendencies in inflection class systems. He has published in journals such as Morphology,
Word Structure, and The Mental Lexicon, as well as the Slavic-focused Slavic and East
European Journal. He is also co-editor of a forthcoming volume, Morphological Typology
and Linguistic Cognition (with Andrea D. Sims, Adam Ussishkin, and
Samantha Wray).
Arturs Semenuks is a PhD student in the Department of Cognitive Science at the
University of California, San Diego. He uses experimental and computational methods to
investigate what sociocognitive pressures affect the structure of language, especially its
morphological complexity, as well as what constraints exist on how language can be
structured in principle, and how language affects human thought. His previous work at the
University of Essex focused on the relationship between sentence processing costs and
acceptability judgements.
Andrea D. Sims is Associate Professor at The Ohio State University, jointly appointed in
the Department of Linguistics and Department of Slavic and East European Languages
and Cultures. Much of her research focuses on the internal organization of inflection class
systems (defectiveness and irregularity, syncretism, inflection class complexity) and factors
influencing its emergence, reinforcement, and generalization. She is author of a research
monograph, Inflectional Defectiveness (2015), co-author of a morphology textbook,
Understanding Morphology (2nd edn, 2010, with Martin Haspelmath), and co-editor of
Morphological Typology and Linguistic Cognition (forthcoming, with Adam Ussishkin, Jeff
Parker, and Samantha Wray).
Gregory Stump is Professor Emeritus of linguistics at the University of Kentucky. His
research includes work on the structure of complex inflectional systems, the nature of
inflectional complexity, and the algebra of morphotactics. His research monographs
include Inflectional Morphology: A Theory of Paradigm Structure (Cambridge University
Press, 2001), Morphological Typology: From Word to Paradigm (Cambridge University
Press, 2013, co-authored with Raphael A. Finkel), and Inflectional Paradigms: Content and
Form at the Syntax-Morphology Interface (Cambridge University Press, 2016). He is a
co-editor of the journal Word Structure. He now resides in Olathe, Kansas.
Adam J. R. Tallman is Postdoctoral Researcher at Laboratoire Dynamique du Langage
(Université de Lyon II). His research focuses on the documentation and description of the
languages of the Amazon. His PhD thesis (University of Texas at Austin, 2018) was a
grammar of Chácobo (Pano) based on extensive (ELDP and NSF funded) documentation.
Currently he is undertaking the documentation of Araona (Takanan). Apart from his
primary interest in documentation and description, Tallman focuses on
morphophonology, constituency, and the application of quantitative methods to linguistic typology.
Delphine Tribout is Assistant Professor at the University of Lille, France, and member of
the CNRS research centre, Savoirs, Textes, Langage. Her main research interests are
derivational morphology, especially conversion, and lexical semantics.

Sasha Wilmoth is a PhD candidate at the Centre of Excellence for the Dynamics of
Language at the University of Melbourne, Australia, working on intergenerational
variation and change in Pitjantjatjara. She completed her BA (Hons) degree at the
University of Melbourne. She was previously a Research Assistant at the University of
Queensland, and Linguistic Project Manager at Appen, a Sydney-based company which
provides specialized linguistic data and services for speech and language technologies. Her
research interests include morphology, syntax, and digital methods for language
documentation.

1
Introduction
Complexities in morphology
Peter Arkadiev and Francesco Gardani

1.1 Setting the scene

Morphological and, more broadly, linguistic complexity has become a popular topic in
linguistic typology and theorizing, as several recent publications testify, such as
McWhorter (2001, 2005, 2018); Kusters (2003); Dahl (2004); Hawkins (2004,
2014); Trudgill (2004a, 2011); Shosted (2006); Miestamo et al. (2008); Sampson
et al. (2009); Dressler (2011); Kortmann & Szmrecsanyi (2012); Newmeyer &
Preston (2014); Baerman et al. (2015b, 2017); Reintges (2015); Baechler & Seiler
(2016); Mufwene et al. (2017); among many others. While this large body of work
has contributed to significantly improving our understanding of morphological
complexity, a number of key issues remain unsettled. They are both theoretical
and empirical in nature and pertain to the domain of morphology and
morphosyntax as well as to the ways language use and its socioecological conditions influence
linguistic structure. Undoubtedly, the most pressing question is what
morphological complexity actually is. There is no straightforward answer to this question,
as we will see. The issue of how to define ‘morphological complexity’ is of central
importance to us and will be treated in detail in the course of this Introduction and
of the volume. To properly frame this central issue, however, we can anticipate
that the notion of ‘complexity’ in morphological systems is often revealed and
investigated through a set of relative measures that attempt to quantify the extent
of morphology in a language, the predictability of the morphological system, and
the pressures this places on processing and acquisition. The goal of the present
volume is to build upon previous work on morphological complexity and to
provide a crosslinguistic view on the key problems of its investigation seen from
the perspective of a variety of current approaches.

Peter Arkadiev and Francesco Gardani, Introduction: Complexities in morphology In: The Complexities of Morphology. Edited
by: Peter Arkadiev and Francesco Gardani, Oxford University Press (2020). © Peter Arkadiev and Francesco Gardani.
DOI: 10.1093/oso/9780198861287.003.0001

At the heart of all discussions of linguistic complexity, and especially of
morphological complexity, lies the idea that complexity itself is a parameter of
crosslinguistic variation. The history of this line of thought (see Joseph &
Newmeyer 2012 for an excellent overview) shows some non-trivial swings of the
pendulum, ranging from the pre-theoretical assumptions of the linguists and
philosophers of the early nineteenth century about the ‘complex’ classic
Indo-European languages as opposed to the ‘primitive’ languages of ‘uncivilized people’
to explicit statements that all languages are equally complex. The latter view,
which is known under the label of ‘equicomplexity hypothesis’, takes into account
obvious differences between languages in the mere degree of elaboration of
different structural subdomains (such as, e.g., vowels vs. consonants or nominal
vs. verbal morphology); it states that ‘these isolable properties may hang together
in such a way that the total complexity of a language is approximately the same for
all languages’ (Wells 1954: 104; see also Hockett 1958: 180). Such a position, which
is still commonly held by linguists of different backgrounds and theoretical
persuasions (see, again, Joseph & Newmeyer 2012: 348–9; and Miestamo 2017),
has been challenged by others, who have shown that ‘complexity in one area of
grammar [correlates] positively with complexity in another area’ (Sinnemäki
2014: 190).
With the development of contact linguistics and especially of pidgin and creole
studies in the second half of the twentieth century, claims started being made that
pidgins and creoles are structurally overall simpler than languages with a ‘regular’
sociolinguistic history (see, e.g., such work as Bickerton 1984; McWhorter 2001,
2005; Parkvall 2008; Bakker et al. 2011; Good 2012b, 2015), and, more generally, it
has been claimed that linguistic complexity is subject to diachronic change and the
effects of language contact (see Dahl 2004 and Trudgill 2011). As a matter of fact,
statements to the effect that sociolinguistic parameters such as the number of
speakers and degree of contact with other languages affect the complexity of
linguistic (sub)systems go back as early as Jakobson (1929) and Trudgill (1983).
Once it had been recognized that morphological complexity is a parameter of
crosslinguistic variation, the urge arose to develop non-impressionistic and
crosslinguistically applicable ways of measuring and quantifying the degree of
morphological complexity of individual languages. The most important proponent of
this line of thought is certainly Greenberg (1954), who developed a methodology
of quantitative measurement of different types of morphological structure, the
most famous of which is the ‘synthetic index’ (p. 185), that is, the
morpheme-to-word¹ ratio in a sample of texts, which arranges languages into a continuum
spanning from radically isolating to polysynthetic. This simple metric, however, is
clearly insufficient for the assessment of morphological complexity, since
morphology is much more than the mere arrangement of morphemes into words. As a
simple illustration, consider the case-number paradigms of Turkish (Lewis 2001:
28) and Lithuanian (P.A.’s own knowledge) nouns in Table 1.1.
Both Turkish and Lithuanian have two number and six case values, yielding
twelve word forms. However, while in Turkish case and number are expressed

¹ ‘Word’ is intended as ‘word form’.

OUP CORRECTED AUTOPAGE PROOFS – FINAL, 23/8/2020, SPi

 3

Table 1.1. Case paradigm of Turkish ev ‘house’ and Lithuanian miestas ‘city’

        Turkish                        Lithuanian
        SG        PL                   SG          PL
NOM     ev        ev-ler        NOM    miest-as    miest-ai
ACC     ev-i      ev-ler-i      ACC    miest-ą     miest-us
GEN     ev-in     ev-ler-in     GEN    miest-o     miest-ų
DAT     ev-e      ev-ler-e      DAT    miest-ui    miest-ams
LOC     ev-de     ev-ler-de     LOC    miest-e     miest-uose
ABL     ev-den    ev-ler-den    INS    miest-u     miest-ais

separately by dedicated suffixes in a compositional way, Lithuanian has
cumulative (fused) exponence of both features. Under Greenberg’s morpheme-per-word
ratio, Turkish nominal word forms are more complex than Lithuanian ones just
because Turkish may have three (and in fact many more) morphemes per
nominal word form (e.g., ev-ler-de house-PL-LOC), while Lithuanian has only
two (miest-uose city-LOC.PL). However, if we consider the total number of
different affixes occurring in the given paradigms, we find that Turkish with its six overt
affixes is actually simpler than Lithuanian with its twelve affixes (see, e.g., Plank
1986 for an early attempt to assess the complexity of morphological systems in
such terms). Things become even more complicated if we go beyond Table 1.1 and
consider the existence of at least five arbitrary inflectional classes of nouns in
Lithuanian intersected by four partly arbitrary accentual classes, also called
‘accentual paradigms’ (a.p.), in Table 1.2 (from Arkadiev et al. 2015: 16; ‘hard’
and ‘soft’ refers to subdeclensions with non-palatalized and palatalized stem-final
consonant, respectively; for more details on Lithuanian declension classes, see
Ambrazas et al. 2006: 107–33).
This example suggests that along with morphological complexity on the syn-
tagmatic axis (something that can be measured by the morpheme-to-word ratio)
there exists morphological complexity on the paradigmatic axis, the two being
logically and empirically independent of one another. Thus understood,
morphological complexity becomes a composite notion and does not admit of such simple
measurement as syntagmatic complexity (see more on this issue below); an
unbiased and non-reductionist crosslinguistic empirical investigation of
morphological complexity therefore itself becomes a fairly complex problem.²
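
The contrast between the two axes can be made concrete with a minimal sketch in Python. It takes the segmented forms of Table 1.1 as given and computes (i) the mean number of morphemes per word form and (ii) the number of distinct overt affixes. Two assumptions should be kept in mind: the ratio is computed over paradigm cells rather than over a sample of running text, as Greenberg's index strictly requires, and the function names are hypothetical, introduced here for illustration only.

```python
# Segmented forms from Table 1.1; hyphens mark morpheme boundaries.
TURKISH = [
    "ev", "ev-ler", "ev-i", "ev-ler-i", "ev-in", "ev-ler-in",
    "ev-e", "ev-ler-e", "ev-de", "ev-ler-de", "ev-den", "ev-ler-den",
]
LITHUANIAN = [
    "miest-as", "miest-ai", "miest-ą", "miest-us", "miest-o", "miest-ų",
    "miest-ui", "miest-ams", "miest-e", "miest-uose", "miest-u", "miest-ais",
]


def morphemes_per_word(forms):
    """Syntagmatic measure: mean number of morphemes per word form."""
    return sum(len(form.split("-")) for form in forms) / len(forms)


def affix_inventory(forms):
    """Paradigmatic measure: set of distinct overt affixes (all morphemes after the stem)."""
    return {affix for form in forms for affix in form.split("-")[1:]}


for name, forms in (("Turkish", TURKISH), ("Lithuanian", LITHUANIAN)):
    print(name, round(morphemes_per_word(forms), 2), len(affix_inventory(forms)))
```

On these twelve forms per language, Turkish scores higher on the syntagmatic measure (28/12, about 2.33 morphemes per form, against exactly 2.0 for Lithuanian), whereas Lithuanian scores higher on the paradigmatic one (twelve distinct affixes against six).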
All in all, it seems to us that the most urgent still unsolved issues in morpho-
logical complexity can be captured in terms of the following questions:

² In this connection, Haspelmath (2009) has shown that parameters traditionally attributed to
‘ﬂexion’, as opposed to ‘agglutination’, such as cumulation, stem allomorphy, and afﬁx allomorphy,
are logically and empirically independent of each other.
Table 1.2. Sample paradigms of Lithuanian nouns

          I hard      I soft        II hard     II soft     III hard    IV (soft)
          ‘man’ (M)   ‘horse’ (M)   ‘day’ (F)   ‘bee’ (F)   ‘son’ (M)   ‘night’ (F)
          I a.p.      III a.p.      IV a.p.     II a.p.     III a.p.    IV a.p.

SG  NOM   výras       arklỹs        dienà       bìtė        sūnùs       naktìs
    GEN   výro        árklio        dienõs      bìtės       sūnaũs      naktiẽs
    DAT   výrui       árkliui       diẽnai      bìtei       sū́nui       nãkčiai
    ACC   výrą        árklį         diẽną       bìtę        sū́nų        nãktį
    INS   výru        árkliu        dienà       bitè        sūnumì      naktimì
    LOC   výre        arklyjè       dienojè     bìtėje      sūnujè      naktyjè
    VOC   výre        arklỹ         diẽna       bìte        sūnaũ       naktiẽ
PL  NOM   výrai       arkliaĩ       diẽnos      bìtės       sū́nūs       nãktys
    GEN   výrų        arklių̃        dienų̃       bìčių       sūnų̃        naktų̃
    DAT   výrams      arkliáms      dienóms     bìtėms      sūnùms      naktìms
    ACC   výrus       árklius       dienàs      bitès       sū́nus       naktìs
    INS   výrais      arkliaĩs      dienomìs    bìtėmis     sūnumìs     naktimìs
    LOC   výruose     arkliuosè     dienosè     bìtėse      sūnuosè     naktysè
1. The hypothesis that morphology and syntax represent distinctly different,
but interdependent types of grammatical organization has been challenged
by scholars such as Haspelmath (2011), who claim that the divide between
morphology and syntax is not clear-cut and hence irrelevant for typology.
Given this, are there theoretical and methodological tools suitable to define
morphological complexity and, if so, which ones?
2. If we, however, accept the hypothesis that the morphology vs. syntax divide
is crosslinguistically and theoretically valid (see Arkadiev & Klamer 2019;
Arkadiev 2020)—a view which we espouse—can we arrive at a uniform
notion of morphological complexity given the diversity of morphological
phenomena?
3. In direct connection to the preceding question, can we arrive at a single and
straightforward measure of complexity that applies to languages that display
radically different morphological encoding strategies?
4. What is the role of sociolinguistic, psycholinguistic, and diachronic factors
in affecting morphological complexity?

These problems constitute the main research questions of this volume, which
aims to tackle them in a principled way, by presenting a collection of original
research papers on different aspects of morphological complexity. This introduc-
tory chapter is meant to outline the ﬁeld and take the reader through the volume,
and it is organized as follows: section 1.2 pursues the question of the scope of
‘morphological complexity’; section 1.3 surveys several conceptions and meth-
odological approaches to morphological complexity distinguishing between two
main types: formal approaches (section 1.3.1) and psycholinguistic approaches
(section 1.3.2). Section 1.4 presents the structure of the volume and summarizes
the contributions to it.

1.2 What is complex?

In all discussions of morphological complexity, a question hangs in the air: is
morphology complex in its own right? This question is partly rhetorical, maybe
trivial, but still central, as it concerns the theoretical demarcation of the object of
investigation. The widespread expression ‘morphological complexity’ has at least
two readings. It can refer to the overall contribution of morphology to complexity
in grammar or it can mean complexity inside morphology.
The ﬁrst reading, viz. morphology as a source of complexity for the overall
language system, would be justiﬁed by the fact that languages can do (almost)
entirely without morphology and that ‘a language can persist for a long time with
little or no morphology’ (Aronoff 2015: 282). In this vein, Carstairs-McCarthy
(2010: ch. 2) and Anderson (2015a: 12–13) conceive of morphology as a
redundant architectural quirk added to the logically necessary systems of syntax
and phonology, and Aronoff goes so far as to declare: ‘morphology is inherently
unnatural. It’s a disease, a pathology of language’ (Aronoff 1998: 413). Such a view
apparently entails that languages without morphology (e.g., Yoruba) are less
complex than languages with at least a little morphology (e.g., Tok Pisin). This
type of morphological complexity could then be paraphrased as ‘complexity
induced by morphology’. The assumption that morphology per se is a complica-
tion resonates with the terminological use of ‘morphological complexity’ to define
the property of words having an internal morphological structure, being, so to say,
morphologically complex, as we find in some authors concerned with word
recognition (e.g., Fiorentino & Poeppel 2007; Bozic & Marslen-Wilson 2010),
sign linguistics (Zwitserlood 2003), and, more rarely, word formation (Hay 2003).
Clearly, in this usage, complexity means the presence of internal structure, and
claiming that a formally complex (i.e., composite) word is in itself complex, as
opposed to a simplex word, amounts to saying that morphology as such is
complexity. That would imply that morphology makes the language system
more complex—an observation that is relative to other components of a lan-
guage’s grammar. Adopting the concept of ‘effective complexity’ by Gell-Mann
(1995), Moscoso del Prado Martín (2011) performs a corpus-based measure of the
inflectional complexity of six European languages and claims that there is a ‘strong
degree of mutual dependence between morphological and syntactic information.’
As he shows, when information on word order is explicitly factored in, the
apparent gradation in complexity across languages, as calculated on the basis of
the number of inflected forms per word, disappears. He arrives at the conclusion
that ‘inflectional morphology serves a role in reduction of uncertainty, simplifying
the description of the whole grammar’ (p. 3528). Whether or not this is the case,
the question, although of great importance also for cognitive approaches to
complexity, is not within the scope of the present book. Rather, we are concerned
with the second reading of morphological complexity, that is, complexity inside
morphology.
Taking an inner-morphological perspective, we focus on which morphological
phenomena can be considered complex or more complex than others and look at
different degrees of complexity within morphology. Some authors have swiftly
found an answer to this question, by identifying the core of morphological
complexity in phenomena currently running under the heading of autonomous
(or ‘pure’) morphology—including morphological entities and processes that are
not extramorphologically motivated in a straightforward way, such as, for
example, inflectional classes, allomorphy, patterns of syncretism, suppletion, etc.
(Aronoff 1994; Maiden et al. 2011; Cruschina et al. 2013). For example, Baerman
et al. (2015b: 4) consider morphological complexity as ‘the additional structure
that cannot readily be reduced to syntax or phonology’. This extra layer of purely
morphological structure, such as inflection classes in the Lithuanian example in
section 1.1, may attain an astonishing degree of gratuitous complexity, whereas
the mere presence of (possibly elaborate) transparent and regular affixal
expression of grammatical meaning, such as that exemplified by Turkish, is least relevant
for the study of morphological complexity (see also a discussion of different
aspects of complexity in the polysynthetic languages, traditionally assumed to be
the hallmark of morphological complexity, by Dahl 2017 and Sadock 2017).
Of course, the decision to only focus on autonomous morphology has a great
methodological advantage, as it provides a clear answer to the question we
formulated in section 1.1, concerning the problematic demarcation of morph-
ology and syntax. However, while we acknowledge that phenomena of pure
morphology (‘morphology by itself ’) do increase the complexity of morphology
as a whole because they have no external motivation, morphology by itself, as it
has been theorized, only includes inflection. This would imply that only inflection
counts as the locus of complexity and it is a matter of fact that most of the
literature published on this topic is exclusively devoted to inflection (see Baerman
et al. 2015a, 2017; Baechler 2017). Definitions of morphological complexity (in
quantitative terms) such as the number of morphosyntactic features that a lan-
guage has and the morphological means that are used to realize these features (see
below) conform to this view, for morphosyntactic features are typically realized by
inflection.
As a matter of fact, work on the complexity of word formation processes is
virtually missing in the literature, the only two exceptions known to us being a
one-paragraph section each in Nichols et al. (2006: 101–3) and Stump (2017: 70).
Consequently, there is no study investigating whether inflection and word formation
differ in their degree of complexity along one or another parameter. As Franz
Rainer (personal communication, 2017) observes, ‘a great number of asymmetries
emerge between word formation and inflection with respect to different dimen-
sions of complexity’, such as the number of elements in the system, number of
affixes in a word, or the complexity of allomorphy, among others. However, he
notes, ‘in the literature on the inflection-derivation divide (cf. Štekauer 2015),
complexity has not been identified up to now as a possible dimension along which
these two subcomponents of morphology might differ’. Lack of work on this
specific topic might be due to multiple reasons: first, the boundaries between
inflection and word formation are often fuzzy; second, word formation, with
lexical enrichment as its central function and all its corollaries (e.g., importance
of encyclopedia, semantic drift), is less neat and less automatic than inflection and
more difficult to grasp (see Kusters 2003: 14–16); third—and crucially—the
generally adopted metrics of morphological complexity (see section 1.3) mostly
focus on formal criteria, thus lumping together categories of inflection and those
of word formation under the general heading of morphological complexity. As we
will see in more detail below, research in particular by Dahl (2004, 2009) and
Trudgill (2009, 2011) has identified three major ingredients of synchronic
morphological complexity, which seem to apply to both inflection and word
formation: (a) irregularity (e.g., allomorphy); (b) morphosemantic and
morphotactic opacity (such as fusion of formatives, cumulative or portmanteau formatives,
suppletion, and non-linear suprasegmental feature realizations); and (c)
syntagmatic redundancy (e.g., pleonastic affixation, see Gardani 2015).

1.3 How many complexities?

As we have seen in section 1.1, the linguistic literature on complexity is abundant,
not least because ‘[h]ow to measure morphological complexity is itself an issue of
some complexity’ (Nichols 1992: 64). As Miestamo (2017: 229) has aptly
noted, complexity refers either to ‘something that is rich in internal composition
(i.e. contains many parts as well as multiple and intricate connections between
them), or to something that is difficult to do or to understand.’ In the first case,
complexity is an objective property of a linguistic system and therefore labeled
‘objective complexity’ (Dahl 2004: 2) or ‘absolute complexity’ (Miestamo 2008) or
‘formal complexity’ (Stump 2017); in the second case, complexity is conceived as
cost/difficulty that a given linguistic system or structure causes to language users
and labeled ‘relative complexity’ (Miestamo 2008, 2017) or ‘psycholinguistic
complexity’ (Stump 2017). In the following, we will adopt Stump’s terminology.

1.3.1 Formal morphological complexity

Formal complexity can be subsumed under the following general definition of
complexity provided by the philosopher Nicholas Rescher: ‘Complexity is first and
foremost a matter of the number and variety of an item’s constituent elements and
of the elaborateness of their interrelational structure, be it organizational or
operational’ (Rescher 1998: 1). In linguistics, we identify three principal directions
in research on formal complexity, in terms of how it is conceptualized and measured:
(1) quantitative approaches; (2) qualitative approaches; and (3) information-
theoretic approaches.
Quantitative approaches conceive complexity in terms of the number of elem-
ents of which a given morphological entity consists, mainly inventory size and
string length, or alternatively, the length of the rules necessary to describe a form.
This quantitatively construed type of complexity, dubbed ‘enumerative complex-
ity’ by Ackerman & Malouf (2013), is detectable both syntagmatically and para-
digmatically. On the syntagmatic axis, it can be the before-mentioned average
number of morphemes per word form (Greenberg 1954, 1960) or the maximal
number of inﬂectionally expressed categories per verb (Bickel & Nichols 2005);
this type corresponds to Rescher’s constitutional complexity, viz. the ‘[n]umber of
constituent elements or components’ (Rescher 1998: 9). On the paradigmatic axis,
enumerative complexity relates to the number of distinct inflectional classes for a
given part-of-speech (i.e., allomorphy) or the number of cells in a paradigm
corresponding to the realizations of different values of a given morphological
feature (e.g., case); this type of complexity corresponds to Rescher’s taxonomical
complexity, the ‘[v]ariety of constituent elements, i.e., number of different kinds of
components in their physical configuration’ (Rescher 1998: 9). Up to fairly recent
times, only enumerative complexity had featured prominently in the literature,
especially in typologically oriented research; for example, it is only this kind of
complexity that is represented in WALS (Haspelmath et al. 2005; Dryer &
Haspelmath 2013), certainly due to practical reasons. In this respect, it is worth
mentioning several works specifically addressing the issue of enumerative para-
digmatic complexity, such as Rhodes (1987) on the different morphological
makeup of large and small paradigms and a whole series of works by Carstairs-
McCarthy, whose aim was to find constraints on enumerative complexity of
inflectional classes in terms of the number of affixal allomorphs and their prop-
erties (see Carstairs 1983; Carstairs-McCarthy 1994, 1998, 2010). Another type of
quantitative measure concerns not the number of the elements composing a
morphologically complex form but rather the (minimum) size (or length) of the
rules required to describe and generate such a form. This type of quantitative
approach, often referred to as Kolmogorov complexity, resonates with
Rescher’s concepts of both descriptive complexity (the ‘[l]ength of the account
that must be given to provide an adequate description of the system at issue’) and
generative complexity (the ‘[l]ength of the set of instructions that must be given to
provide a recipe for producing the system at issue’, Rescher 1998: 9) (cf. Dahl’s
‘minimum description length’, Chapter 13, this volume).
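
Kolmogorov complexity itself is not computable, so in practice one has to rely on approximations. As a rough illustration only, the following Python sketch uses the size of a zlib-compressed string as an upper-bound proxy for description length. The data are artificial (a hypothetical stem combined either with one invariant suffix or with randomly generated suffixes), the function name is an assumption made for this example, and the procedure is not a method proposed in the works cited above.

```python
import random
import zlib


def description_length(text: str) -> int:
    """Size in bytes of the zlib-compressed text: a crude upper-bound proxy."""
    return len(zlib.compress(text.encode("utf-8"), 9))


# Hypothetical toy "lexicons": 200 forms of equal length built on one stem.
rng = random.Random(0)  # fixed seed so that the example is reproducible
letters = "abcdefghijklmnopqrstuvwxyz"

# Fully regular system: every form takes the same suffix.
regular = " ".join("stem-aaa" for _ in range(200))

# Maximally irregular system: every form takes an unpredictable suffix.
irregular = " ".join(
    "stem-" + "".join(rng.choice(letters) for _ in range(3)) for _ in range(200)
)

print(len(regular), description_length(regular))
print(len(irregular), description_length(irregular))
```

Both strings have the same raw length, but the regular one compresses to far fewer bytes, since a single short pattern suffices to regenerate it. With realistically small paradigms the fixed overhead of the compressor dominates, so such figures are at best indicative.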
Qualitative approaches conceive complexity in terms of identifying those mor-
phological patterns/elements that are complex or more complex than others.
Proponents of qualitative approaches need to stipulate an unmarked,
complexity-neutral ideal—a canon, often conceived as an isomorphic relation of
content to form—upon which to construe hierarchies of complexity in terms of
degrees of deviation from it. Most notably, work by Corbett (e.g., 2007, 2015) has
propagated the notion of non-canonicity (both in inflection and derivation),
which can be defined as any deviation from properties such as transparency,
regularity, and form-function biuniqueness, as is manifested, for example, in
non-phonological allomorphy of affixes and stems (Baerman et al. 2017: 100–7),
overabundance (Thornton 2019), multiple (extended) exponence (Harris 2017),
syncretism (Baerman et al. 2005), defectiveness (Baerman et al. 2010), and poly-
functionality (Stump 2016: 228–51), let alone more dramatic deviations such as
suppletion (Stump 2006a; Corbett 2007) or deponency (Baerman et al. 2007).
Early discussions of non-canonicity and its possible interactions with enumerative
complexity can be found in Plank (1986) and Carstairs (1987) in addition to
works already mentioned, while recently, Johanna Nichols (2009) has hinted at a
possible metric of morphological complexity related to non-canonicity (a proposal
she fully develops in Chapter 7, this volume). Most studies of non-canonical
phenomena in morphology have focused on the paradigmatic axis; however,
nothing per se precludes the application of this notion to syntagmatic phenomena,
such as combinatorics and mutual order of affixes (the distinction between
semantically driven layered organization of morphology and opaque templatic
morphology comes to mind here; see Stump 2006b; Good 2016), concatenative vs.
non-concatenative exponence, morphophonological transparency vs. opacity and
other issues belonging to the domain of morphotactics. It remains an empirical as
well as a conceptual question, though, which kind of morphotactic organization
should be considered ‘canonical’ and ‘less complex’. For instance, in languages
where affix order directly reflects semantics, it is usually possible to permute
certain affixes depending on their mutual scope (Rice 2011; Mithun 2016);
whether such deviations from fixed ordering constitute additional complexity is
not at all obvious.
While teleologically different, Natural Morphology (Dressler et al. 1987;
Dressler & Kilani-Schoch 2016; Dressler 2019) is likewise centered on the idea of
deviation from a core.³ Aiming to account for morphological preferences based on
extralinguistic motivations, it theorizes a semiotically derived notion of natural-
ness, defined as the immediate, most unmarked, cognitively easiest, and thus
universally preferred option. Conversely, naturalness-defining criteria determine
deviation from the (most) natural option. This framework makes clear that other
factors come to play a role in the conception and interpretation of morphological
complexity, such as, for example, transparency vs. opacity of forms or morpho-
tactic rules. As Hengeveld & Leufkens (2018: 141) observe, ‘languages may be
complex, yet transparent, or simple, yet opaque’. To take a concrete case, the
Turkish vs. Lithuanian data in Table 1.1 show that Turkish morphology is more
complex in the sense that a single word form may potentially contain a high
number of morphemes. At the same time, however, it is transparent in that every
morpheme corresponds to one fixed meaning, while Lithuanian morphology is
more opaque. In the framework of Natural Morphology, Dressler (2011) views
unnaturalness as a source of complexity and morphological complexity as the sum
of all morphological categories, rules, and inflectional classes of a language,
including both productive and unproductive patterns. Distinguishing between
productive and unproductive patterns, he considers morphological complexity a
hyperonym of morphological richness, which is conceived only in terms of
productive patterns (Dressler 2003: 47; see also Dressler, Kononenko, et al.

³ Note that, while qualitatively oriented, both Natural Morphology and Canonical Typology are
implicitly able to quantify degrees of complexity, computing the degree of deviation from the natural
core or canon, respectively.
2019). This distinction between active and static parts of morphology is, in our
view, not only of crucial importance with respect to psycholinguistic approaches
to complexity but also foundational of approaches focused on predictability, as we
will see below.
Finally, information-theoretic approaches play down the role of combinatorics
and construe morphological complexity in terms of predictability and entropy.
Their development is intimately related to word-and-paradigm models of morph-
ology, which consider inflectional systems as networks of implicative relations
holding between fully-inflected word forms. Consequently, they aim to under-
stand to what extent the choice of exponence for a given cell is predictable from
any other information available to the speaker, with complexity being in an
obvious inverse relation to predictability (cf. Finkel & Stump 2007, 2009; Stump
& Finkel 2013). Ackerman & Malouf (2013) propose the term ‘integrative com-
plexity’, based on the notion of entropy as ‘a measure of the reliability of guessing
unknown forms on the basis of known ones’, that is, a measure of predictability.
They start from the intuition that ‘speakers must generalize beyond their direct
and limited experience of particular words’ (p. 436) and posit a ‘Low Entropy
Conjecture’: morphological systems, such as paradigms, in which conditional
entropy among related word forms is low, are more efficient, as they ‘permit
these crucial inferences to be made easily’ (p. 436) (cf. ‘Paradigm Structure
Conditions’ of Wurzel 1989).⁴ In other words, complexity derives from opaque
intraparadigmatic relations, for opacity hampers the predictability and predictive-
ness among word forms in a lexeme’s paradigm. The ‘Low Entropy Conjecture’ is
supported by recent studies on inflection class systems clearly violating the
enumerative complexity-based constraints of the kind proposed by Carstairs-
McCarthy (see Baerman 2012, 2016; Sims 2015).⁵
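
The entropy-based notion can be illustrated with a minimal Python sketch that computes the conditional entropy of one paradigm cell given another, using the identity H(target | known) = H(known, target) - H(known). The data are assumptions made purely for illustration: a toy sample of four hypothetical inflection classes, loosely modelled on Lithuanian, with simplified unaccented exponents and equal type frequencies. The function names are likewise hypothetical, and a full measure in the spirit of Ackerman & Malouf (2013) would average such values over all pairs of cells.

```python
from collections import Counter
from math import log2

# Toy sample (assumption): one entry per inflection class, equal frequencies.
LEXEMES = [
    {"NOM.SG": "as", "GEN.SG": "o"},
    {"NOM.SG": "is", "GEN.SG": "io"},
    {"NOM.SG": "is", "GEN.SG": "ies"},
    {"NOM.SG": "us", "GEN.SG": "aus"},
]


def entropy(counts):
    """Shannon entropy (in bits) of the distribution given by a Counter."""
    total = sum(counts.values())
    return -sum(c / total * log2(c / total) for c in counts.values())


def conditional_entropy(lexemes, known, target):
    """H(target | known) = H(known, target) - H(known), in bits."""
    joint = Counter((lx[known], lx[target]) for lx in lexemes)
    marginal = Counter(lx[known] for lx in lexemes)
    return entropy(joint) - entropy(marginal)


print(conditional_entropy(LEXEMES, "NOM.SG", "GEN.SG"))
print(conditional_entropy(LEXEMES, "GEN.SG", "NOM.SG"))
```

In this toy sample, knowing the nominative singular leaves half a bit of uncertainty about the genitive singular, because the exponent -is is shared by two classes, whereas the genitive singular predicts the nominative singular without any residual uncertainty.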
The approaches to formal morphological complexity surveyed thus far share
the potential to gauge the degree of complexity. However, some typological studies
have pursued the topic without a focus on metrics. One line of investigation, for
example, has concerned the relation of (certain aspects of) morphological
complexity to other typological parameters such as phonological systems (Shosted
2006; Fenk-Oczlon & Fenk 2008, 2014) and word order (e.g., Sinnemäki 2008; Bentz
& Christiansen 2013), among others. Other studies have focused on the
differential elaboration of nominal and verbal morphology (e.g., Nichols 1986, 1992;
Mithun 1988; Kibrik 2012). In this domain, there are still more open questions
than established answers, partly because of the lack of consensus as regards the

⁴ Also morphomic stem distributions have been interpreted in terms of predictive relations by
Blevins (2016b: 123), a view partly criticized by Maiden (2018: 23–4).
⁵ It is likely that a conception of complexity based on entropy applies better to inﬂection than word
formation because inter-word relations are generally much more complex in inﬂectional than in
derivational paradigms.
definition of the relevant aspects of complexity and the adequate ways of its
measurement.
Still another line of research is concerned with the relation between morpho-
logical complexity and sociolinguistic typology. In section 1.1, we already men-
tioned the idea that pidgins and creoles are in general less complex than languages
with a long history and uninterrupted transmission. More generally, in recent
work (e.g., Trudgill 1997, 2009, 2011, 2017; Kusters 2003, 2008; McWhorter 2007,
2008; Lupyan & Dale 2010; Bentz & Winter 2013; Bentz et al. 2015; Bentz 2016),
claims have been advanced that the overall degree of complexity as well as certain
particular types of grammatical complexity correlate with such socioecological
conditions of language use as high vs. low degree of contact, number of adult
learners, size and geographic expansion of the speaker population, and some
others (see also Tinits 2014 for a behavioural experiment with a miniature
artificial language). Significantly, most such studies have focused on
simplification caused by language contact (see Dorian 1978; McWhorter 2001; among
many others), emphasizing that morphological complexity requires long-term
periods of socioecological stability to develop (Dahl 2004). Nevertheless, studies
exist showing that certain types of language contact (e.g., those involving stable
childhood multilingualism) can help preserve complex patterns (Trudgill
2011; Mithun 2015) and even result in an increase rather than a loss of morphological
complexity due to borrowing and contact-induced grammaticalization (see
Vanhove 2001; Aikhenvald 2002, 2003a; de Groot 2008; Loporcaro 2018;
Loporcaro et al. forthcoming). Nor do processes of language genesis brought about
by language contact necessarily entail morphological simplification.
In a study on the rapid birth of a new mixed language in Australia,
Gurindji Kriol, from the admixture of Gurindji and Kriol, Meakins et al. (2019)
demonstrate that there was no preferential adoption into Gurindji Kriol of less
complex variants and that, in fact, complex Kriol variants were more likely to be
adopted than simpler Gurindji equivalents. Given that Gurindji Kriol is the
primary language of the younger generation in the Gurindji community,
Meakins et al. interpret these results in light of the fact that the acquisition of
morphology in morphologically complex languages is less challenging for children
than for adults (cf. also Miestamo 2008). The issue of ease vs. difficulty of
processing in language acquisition leads us to the second main type of morphological
complexity introduced in section 1.3, viz. psycholinguistic morphological
complexity.

1.3.2 Psycholinguistic morphological complexity

As we have seen in the previous section, Natural Morphology and Ackerman
& Malouf’s (2013) integrative complexity also appeal to ease in processing and
production as a key to the interpretation of what is complex in morphology.
These models build a bridge to the second type of approach to morphological
complexity, psycholinguistic morphological complexity, which focuses on the cost/
difficulty that a given linguistic system or structure causes to language users, that
is, computational effort. Psycholinguistic approaches to morphological
complexity assume that the degree of ease vs. cost of a morphological pattern in processing
and production correlates with its degree of complexity. This line of research
draws evidence from three areas of study: adult processing, L1 and L2 acquisition,
and the performance of artificial automatic learning.
One line of investigation within this field has developed around the equation of
complexity with low parsability (Stump 2017). In this respect, the debate on the
balance between memory retrieval and online computation in language produc-
tion is particularly relevant. In the context of the debate on lexical access and
specifically of the so-called English past-tense debate (for references, cf. Ambridge
& Lieven 2011: 169–87), Pinker & Prince (1988) argued for a ‘dual-route’ model
that could account for both irregular forms (feel/felt), which are memorized as
wholes in the mental lexicon, and an online rule of default responsible for
morphemic concatenation (walk/walked) (see also Gardani et al. 2019: 24–7). At
the same time, it was observed that regular forms with high frequency can also be
stored in the mental lexicon (Alegre & Gordon 1999a: 56). However, the fact that
both morphologically less complex (i.e., highly parsable) and morphologically
complex (i.e., low parsable) word forms can be lexically stored leads to the
conclusion that complexity qua parsability does not correlate with processing cost.
The role of frequency in lexical access has been stressed by no one as vigorously as by
Joan Bybee (1985, 1995, 2007). Consequently, the conception of complexity
focusing on system complexity, in which irregularity is viewed as an ingredient
of complexity, is incompatible with the results of studies on processing
complexity, which have shown that irregularity does not per se constitute an obstacle for
the language user, as it can be offset by frequency.
Studies in language acquisition, too, do not necessarily support the hypothesis
that psycholinguistic complexity and formal complexity coincide. For example, in
a crosslinguistic study on the relationship between the morphological complexity
of child-directed speech and the speed of morphological acquisition in children,
Xanthos et al. (2011) found a strong positive correlation between inflectional
complexity of the input and the speed of acquisition. This result seems to suggest
that the more morphology in the input, the easier the morphology is to acquire.
According to Kelly et al. (2014), formal complexity such as heavy synthesis in
polysynthetic languages is not a challenge for L1 acquisition if the templatic
sequence in which formatives are used is regular, and Allen (2017) also reports
longitudinal studies showing that Inuit children acquire elaborate derivational and
inflectional morphology early and with ease. (See also Stoll et al. 2017, on the
acquisition of verb morphology in polysynthetic Chintang.) Other acquisitional
OUP CORRECTED AUTOPAGE PROOFS – FINAL, 23/8/2020, SPi

studies construe formal complexity not as constitutional complexity but as descriptive
complexity. For example, in a crosslinguistic study on the emergence and early
development of synthetic compounds, Dressler, Sommer-Lolei, et al. (2019) pro-
vide evidence that synthetic compounds (i.e., compounds in which the head is
derived from a verb and the non-head is an argument of this verb) such as
German Nussknacker ‘nutcracker’ are acquired later than comparable three-
constituent compounds. They interpret this later acquisition as a sign of higher
complexity: equating the degree of complexity with the number of rules involved,
synthetic compounds, which are derived by both a rule of compounding and a rule
of derivation, are more complex than words derived either only by compounding
or only by derivation rules.
In addition, numerous studies, both typological and experimental (e.g., Wray
& Grace 2007; Lindström 2008; Trudgill 2011; Bentz et al. 2015; Bentz &
Berdicevskis 2016; Atkinson et al. 2018), show that morphological complexity,
while being an obstacle to L2 acquisition in adults and hence subject to erosion,
regularization, and loss in those situations of language contact that involve
massive adult acquisition, does not, in fact, constitute a severe challenge for L1
acquisition in children. Moreover, Lupyan & Dale (2010) have hypothesized that
infants, in fact, benefit from the increased redundancy brought about by morpho-
logical complexity in languages used in small groups.
Psycholinguistic approaches to morphological complexity have attracted two main
criticisms. One problem is that the perception of ease or,
conversely, difficulty, might vary among language users, and therefore might
not be an objective metric; the other problem is that ‘psycholinguistic background
research on the processing cost and learning difficulty of a given grammatical
phenomenon’ might not be enough (Miestamo 2017: 232). As a matter of fact, the
correlation between ‘our intuitive notion of morphological complexity and actual
evidence of the pace of acquisition of more or less complex inflectional systems in
child language’ (Marzi et al. 2018) seems to be poor. In order to solve at least the
objectivity issue, recent research in morphological complexity has expanded into
the field of neurobiologically inspired computational models of processing and
learning. In one such study, Marzi et al. (2018) have focused on the performance
of recurrent self-organizing neural networks trained to learn languages, in order to
understand how degrees of inflectional complexity affect word processing strat-
egies. They found a significant systematic correlation between regularity and
predictability of verb forms and interpret the evidence ‘as the result of a balancing
act between two potentially competing communicative requirements’, viz. recog-
nition (leading to a maximally contrastive system) and production (leading to
maximally predictable forms).

1.4 About this volume

In section 1.1, we identified four issues we deem among the most urgent to solve in
research on morphological complexity. In order to tackle these issues in a prin-
cipled way, we convened a dedicated workshop ‘Morphological Complexity:
Empirical and Cross-Linguistic Approaches’ at the 48th Societas Linguistica
Europaea (SLE) meeting in Leiden in 2015. The present volume is a collection
of original research papers consisting in equal measure of papers delivered at the
workshop and of invited contributions. (Each chapter was subject to a threefold
reviewing process consisting of an anonymous external review, a non-anonymous
internal review performed by a fellow contributor, and comments
by the editors.) The volume features: (a) various theoretical, methodological, and
typological perspectives on morphological complexity (from ‘classic’ morpho-
logical description to experimental and information-theoretic approaches); (b)
both detailed investigations of individual languages and wider crosslinguistic
studies; (c) synchronic and diachronic analyses; (d) a broad coverage of topics
including structural and sociolinguistic issues, such as the development of mor-
phological complexity under different sociohistorical conditions (prominently,
language contact); (e) empirical evidence drawn from languages from all contin-
ents and belonging to a number of typologically diverse language families.
Unfortunately, the volume does not cover the complexity of word formation
and the complexity of sign language morphology. We hope that future research
will take care of these issues.
The volume, introduced by the present chapter, consists of three parts organ-
ized according to the chapters’ main focus and scope, and is closed by a discussion
in Chapter 13 by Östen Dahl on the volume’s contributions and on the minimum
description length approach. Part I includes five chapters dealing with issues of
morphological complexity from a language-specific perspective. Jeff Parker and
Andrea Sims’s Chapter 2, ‘Irregularity, paradigmatic layers, and the complexity of
inflection class systems: A study of Russian nouns’ follows Stump & Finkel’s (2013:
55) definition of complexity of an inflection class system as ‘the extent to which
the system inhibits motivated inferences about a lexeme’s full paradigm of realized
cells [ . . . ]’. Using data from Russian, the authors explore the implications of
gradient (ir)regularity for measuring and comparing the complexity of inflection
class systems. They find that some, but not all, less regular inflectional patterns
significantly increase the complexity of the system, but that the increased com-
plexity is mitigated by structural and distributional properties of the inflectional
system. In Chapter 3, ‘Demorphologization and deepening complexity in
Murrinhpatha’, John Mansfield and Rachel Nordlinger investigate diachronic
changes in the complexity of verb inflection in Murrinhpatha, a polysynthetic
non-Pama-Nyungan language of northern Australia, which displays a high level of
complexity in terms of unpredictable analogical relations in inflectional exponence.
The authors demonstrate that recent changes in inflection allomorphy blur
the boundaries of stem and affix, resulting in gradual demorphologization and
increasingly unpredictable exponence. Felicity Meakins and Sasha Wilmoth’s
Chapter 4, ‘Overabundance resulting from language contact: Complex cell-mates
in Gurindji Kriol’ examines the development of overabundance (see above) in the
subject-marking system of Gurindji Kriol, an Australian mixed language. By
means of generalized linear mixed models, which probabilistically measure the
use vs. non-use of a feature, the authors interpret the emergence of overabundance
as an instance of complexification, providing a counterexample to the commonly
held view that contact always results in reduction of morphological complexity. In
Chapter 5, ‘Derivation and the morphological complexity of three French-based
creoles’, Fabiola Henri, Gregory Stump, and Delphine Tribout take a fresh look
at a controversial assumption in creole research, namely the widespread claim of
poverty of creole morphology (see references in section 1.1). Analysing deverbal
nominalizations via conversion in Mauritian, Guadeloupean, and Haitian, and
assessing the integrative complexity of the respective morphological systems’
derivational relations, the authors demonstrate that the complexity of the derivational
relations in these creoles attains the same degree as that of the lexifier,
French. Finally, in Chapter 6, ‘Simplification and complexification in Wolof noun
morphology and morphosyntax’, Michele Loporcaro explores the diachronic
dynamics of morphological complexity in the nominal morphology and morpho-
syntax of Wolof, an Atlantic language of Senegal. Loporcaro shows that, while
changes such as the emergence of inflectional irregularities produced a local
increase in complexity in noun and determiner morphology, overall the morph-
ology of Wolof is less complex than that of closely related Atlantic languages.
Loporcaro provides an explanation of the simplifying tendencies in sociolinguistic
terms, referring to the correlation between simplification and prestige in the
Wolof speech community. Here, speaking correctly is associated with low caste
in rural settings, while linguistic prestige is achieved through language mixing,
extensive borrowing, and, crucially, the simplification, via paradigmatic leveling,
of inherited alternations impacting on both the morphology and the morphosyn-
tax of the language.
Part II consists of three chapters approaching morphological complexity from a
crosslinguistic perspective. Johanna Nichols’s Chapter 7, ‘Canonical complexity’
considers non-transparency, rather than size, to be the locus of morphological complexity
and adopts the notion of (non-)canonicity to define crosslinguistically comparable
variables, capture non-transparency, and restrict the comparanda to a manageable
sample. Francesca Di Garbo’s Chapter 8, ‘The complexity of grammatical gender
and language ecology’ is a crosslinguistic investigation of the evolution of gender
agreement patterns, which are viewed as an instance of morphological complexity,
and its ties to sociohistorical factors. Analysing a sample of thirty-six languages in
a qualitative fashion, the author is able to establish an association between multiple
patterns of change, such as loss, reduction, emergence, and expansion of gender,
on the one hand, and various sociohistorical situations, ranging from demo-
graphic structure (population size) to language policies and language attitudes,
on the other. In Chapter 9, ‘Morphological complexity, autonomy, and areality in
western Amazonia’, Adam Tallman and Pattie Epps investigate the relationship
between morphological complexity and areality-building processes across
Amazonia. The authors observe (a) morphological proliferation in four domains
(nominal classification, tense, evidentiality, and valency-adjusting mechanisms)
across unrelated western Amazonian languages; (b) high system complexity
across these domains; and (c) a link between complexity and language contact.
They conclude that factors often associated with morphological complexity are in
fact not necessarily morphological, as a large percentage of bound morphemes in
these languages display ambiguity between morphology and syntax.
The three chapters in Part III address the problem of morphological complexity
from an acquisitional perspective. In Chapter 10, ‘Radical analyticity as a diag-
nostic of adult acquisition’, John McWhorter proposes that languages can become
radically analytic, that is, completely or near-completely devoid of inflectional
morphology, only via incomplete acquisition. He draws evidence from West
Africa and Southeast Asia and shows that the relevant languages score more like
creoles than like older languages. In McWhorter’s view, second-language acquisi-
tion decisively reduces grammatical complexity (in terms of bound inflection) to a
degree that ordinary language change cannot. The author suggests that radical
analyticity can be treated as evidence that such second-language acquisition
occurred in the history of the language, and thus, synchronic morphological
complexity can serve as a clue to the past of a language, in the absence of historical
documentation. Chapter 11, ‘Different trajectories of morphological overspecification
and irregularity under imperfect language learning’ by Aleksandrs
Berdicevskis and Arturs Semenuks, also deals with imperfect language learning, partly
supporting McWhorter’s conclusion. With reference to the editors’ fourth question
(see section 1.1), the authors investigate how morphological complexity is related to
socioecological parameters. They run an iterated artificial language learning experi-
ment, tracing the change of two facets of complexity: overspecification and irregu-
larity. They find that the presence of imperfect learners in a transmission chain leads
to a much stronger decrease in morphological overspecification. Overspecification,
however, is not usually fully eliminated, and its partial decrease often leads to
increased irregularity, thus making languages simpler in one respect, but more
complex in another. Additionally, higher irregularity decreases learnability, and
this effect is stronger for imperfect learners compared to normal learners. Thus,
the relationships between these two facets of morphological complexity and language
learnability have their own complexities. Finally, Marianne Mithun’s Chapter 12,
‘Where is morphological complexity?’ is firmly anchored in the debate on the
psycholinguistic reality of complexity. Examining the speech of native speakers of
two North American languages influenced to varying degrees by contact with
English, Mithun observes that even native speakers with limited proficiency produce
morphological structures that are highly complex for the analyst, with large numbers
of morphemes per word, fusion, and irregularity. She argues that the discrepancy
between what linguists consider complex and what speakers find difficult (or easy) to
acquire or preserve, is not surprising if one takes the view that morphology in these
languages is not processed and learned online, but rather in chunks.
As we said, Östen Dahl closes the volume by critically reviewing the volume’s
chapters and seeing how the concepts of morphological complexity applied
therein relate to the ‘minimum description length approach’.
Turning now to the four research questions (section 1.1) the contributors to
this volume focused on, we observe that (question 1) it is possible to define
morphological complexity, even though the demarcation between morphology
and syntax is in many cases fuzzy (see Tallman & Epps, Chapter 9, this volume).
At the same time, however, we observe that different authors provide and apply
different definitions, also within this volume. Seemingly, the very existence of
multiple definitions of morphological (and morphosyntactic) complexity is
related not only to the location of a specific linguistic feature along the
grammar continuum (from pure morphology to morphosyntax), but also to the
diversity of phenomena and types of complexity. This observation leads us to
answer question 2, namely whether it is possible to arrive at a uniform notion of
morphological complexity. We concur with Dahl (Chapter 13, this volume) that a
set of shared notions and standard works that everybody refers to has not yet been
established. Thus our answer to question 2 is no, and the motivation for it is that the
linguistic facts are so multifarious and diverse that not one, but many different
complexities can be detected (whence the plural in this chapter’s title).
Then we asked (question 3) whether it is possible to arrive at a crosslinguisti-
cally applicable and theoretically founded measure of morphological complexity.
Berdicevskis et al. (2018) have recently pointed to the absence of a gold standard.
We, too, have observed that there exists neither a commonly accepted definition of
morphological complexity nor a uniform measure thereof. Admittedly, the grow-
ing understanding of the multifaceted nature of morphological complexity is
much in line with the multivariate nature of typological comparison. So, perhaps
we asked the wrong question. Probably, the quest for a unique measure is an
epistemological fallacy. Once we have acknowledged that there is not one mor-
phological complexity, but many morphological complexities, we should identify
a set of complementary specific measures to apply crosslinguistically. Then, the
only reasonable typological approach to morphological complexity is to break it
down into individual variables (if necessary, each with its quantitative measure)
and then look for mutual correlations between such variables or for their connec-
tions with other parameters of crosslinguistic variation. Of course, cumulative
measures such as the one developed by Nichols (Chapter 7, this volume) are also
possible, but they are not holistic, either, and in many cases are based on a
significant reduction of empirical data.
In conclusion (question 4), we wanted to investigate the role of such extra-
morphological factors as diachronic development and (in)stability, susceptibility
to loss vs. spread in situations of language contact, and, generally, of sociolinguis-
tic and socioecological parameters, in affecting morphological complexity. As
several chapters in this volume have demonstrated, in spite of at times diverging
results, the study of the correlation between morphological complexity and
extralinguistic factors such as the role of language contact or speakers’ sociolin-
guistic attitudes, is fruitful and promising.
Of course, the answers we have provided here are perforce partial and far from
definitive, as many more case studies and comparative evidence are necessary to
get to a reliable picture of such complex phenomena as morphological complex-
ities. We hope that future research will pursue these pathways.

Acknowledgements

The volume’s editors wish to thank the authors, the external reviewers, and our editors
at OUP. The support of the Swiss National Science Foundation (SNF CRSII1_160739)
is gratefully acknowledged. Besides that, we thank Aleksandrs Berdicevskis, Wolfgang
Dressler, Michele Loporcaro, and Franz Rainer for their insightful comments on a
preliminary version of this introductory chapter.

I
THE LANGUAGE-SPECIFIC
PERSPECTIVE

2
Irregularity, paradigmatic layers, and the
complexity of inflection class systems
A study of Russian nouns
Jeff Parker and Andrea D. Sims

2.1 Introduction

The extent to which morphological patterns are included in analyses of inflection
class systems tends to be strongly influenced by what is considered to be a ‘regular’
or ‘irregular’ pattern in a language. The number of classes and their definitional
properties reflect the assumptions and analytical choices of the investigator. Two
such choices are particularly notable. First, patterns that are reflected in few
lexemes or are unproductive tend to be labeled as ‘irregular’ and considered to be
outside of the system. Second, where inflectional properties are correlated with
both affixal and non-affixal exponence (e.g., stress, stem alternations), the affix
tends to be treated descriptively and theoretically as the exponent of the proper-
ties, with non-affixal marking often treated as a kind of irregularity, or simply
ignored. Some approaches explicitly choose to focus only on regular affixal
patterns (e.g., Cameron-Faulkner & Carstairs-McCarthy 2000). Others handle
stem alternations as phonological readjustments, denying them status as expo-
nents of morphosyntactic properties; see Halle (1994) for this idea as applied to
Russian nouns. Even within the Word and Paradigm framework, which explicitly
rejects the classical notion of the morpheme as a bundling of (affixal) form and
meaning (see Stump 2001: ch. 1 for an overview of arguments), linguists sometimes
ignore non-affixal dimensions in their analyses as a practical matter, showing how
deeply ingrained the privileged status of affixal patterns is in linguistics. For
example, in their study of inflection class system complexity, Ackerman and
Malouf (2013: 434f) acknowledge that the description of Greek nominal inflection
they adopt abstracts away from ‘many relevant complexities,’ including inflectional
stress.¹ (So does their description of Russian nominal inflection.)

¹ As another example, even PARSLI (PARadigm Shape and Lexicon Interface), which is designed
to explicitly represent non-canonical inﬂectional properties like stem change, defectiveness, overabund-

Jeff Parker and Andrea D. Sims, Irregularity, paradigmatic layers, and the complexity of inflection class systems:
A study of Russian nouns In: The Complexities of Morphology. Edited by: Peter Arkadiev and Francesco Gardani,
Oxford University Press (2020). © Jeff Parker and Andrea D. Sims.
DOI: 10.1093/oso/9780198861287.003.0002

In this chapter, we explore the role that irregularity and non-affixal exponence
play in the complexity of inflection class systems.² Recent typological studies of
inflection class complexity have focused on the implicative structuring of inflec-
tion classes and the extent to which this structure is informative about the
exponence of inflected forms (Ackerman et al. 2009; Ackerman & Malouf 2013;
Blevins et al. 2017; Bonami & Beniamine 2015; Sims 2015; Sims & Parker 2016;
Stump & Finkel 2013). This is reflected in the way that Stump & Finkel define the
complexity of an inflection class system as ‘the extent to which the system inhibits
motivated inferences about a lexeme’s full paradigm of realized cells from subsets
of its cells’ (Stump & Finkel 2013: 55; emphasis ours). Throughout this chapter we
will assume a similar definition; see (1).

(1) Complexity of an inflection class system: the average extent to which the
system inhibits motivated inferences about the realized form of a lexeme,
given one or more other realized forms of the same lexeme.

We make this notion more precise and operationalize it as average conditional
entropy in section 2.5 below.
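
As a rough illustration of how the definition in (1) can be turned into a number, the following Python sketch computes the average conditional entropy over all ordered pairs of paradigm cells. It is a minimal sketch under stated assumptions, not the operationalization given in section 2.5: the toy paradigms, the cell labels, the function names, and the decision to weight every lexeme equally are all hypothetical.

```python
import math
from collections import Counter
from itertools import permutations

# Hypothetical toy paradigms (not data from the chapter):
# each lexeme maps a cell label to the exponent realized in that cell.
PARADIGMS = {
    "lex1": {"NOM.SG": "-a", "GEN.SG": "-i", "NOM.PL": "-i"},
    "lex2": {"NOM.SG": "-a", "GEN.SG": "-i", "NOM.PL": "-i"},
    "lex3": {"NOM.SG": "-0", "GEN.SG": "-a", "NOM.PL": "-i"},
    "lex4": {"NOM.SG": "-0", "GEN.SG": "-a", "NOM.PL": "-a"},
}


def conditional_entropy(paradigms, target, given):
    """H(target | given) in bits, assuming each lexeme is equally likely."""
    n = len(paradigms)
    joint = Counter((p[given], p[target]) for p in paradigms.values())
    marginal = Counter(p[given] for p in paradigms.values())
    h = 0.0
    for (g, _t), count in joint.items():
        # p(g, t) * log2 p(t | g), summed over attested pairs
        h -= (count / n) * math.log2(count / marginal[g])
    return h


def average_conditional_entropy(paradigms):
    """Mean of H(target | given) over all ordered pairs of distinct cells."""
    cells = sorted(next(iter(paradigms.values())))
    pairs = list(permutations(cells, 2))
    total = sum(conditional_entropy(paradigms, t, g) for g, t in pairs)
    return total / len(pairs)


if __name__ == "__main__":
    print(conditional_entropy(PARADIGMS, "GEN.SG", "NOM.SG"))
    print(conditional_entropy(PARADIGMS, "NOM.PL", "NOM.SG"))
    print(average_conditional_entropy(PARADIGMS))
```

In this toy system the genitive singular is fully predictable from the nominative singular (zero bits), whereas the nominative plural is not; averaging over all cell pairs yields a single figure for the system as a whole.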
Implicative definitions of complexity as in (1) represent a step in the direction
of crosslinguistic comparison based on the internal structuring of inflectional
systems, rather than measures like the number of inflection classes or the size of
paradigms.³ The former is what Ackerman & Malouf (2013) call ‘Integrative’
complexity; the latter they call ‘Enumerative’ complexity. Integrative complexity
measures represent a productive development to the extent that they better reflect
the ways in which inflectional systems pose challenges for speakers.⁴ While it is
not clear to us that any particular notion of complexity within morphology will be
adequate for the variety of questions that morphology poses, the implicative-based
notion of complexity adopted here also has the potential to emerge as an

ance, etc., does not include non-segmental information like stress as a possible deviation from canonicity
(Walther 2017).

² Since inflection classes are an example of a purely morphological phenomenon, that is, not
syntactically relevant, this type of complexity seems to avoid the problematic questions about the
division between morphology and syntax (see discussion in Arkadiev & Gardani, Chapter 1, this
volume).
³ For a distinct but somewhat related notion, see the discussion of ‘relative’ and ‘absolute’ measures
of complexity in Miestamo (2008) inter alia. Miestamo’s discussion of relative approaches focuses on
psycholinguistic and acquisition-oriented approaches/evidence. While our information-theoretic
measures are not psycholinguistic in nature, they (and their use in previous work, for example,
Ackerman et al. 2009) could be classified as relative in terms of their focus on the potential ‘cost and
difficulty to language users’ (Miestamo 2008: 24). (See also discussion in Arkadiev & Gardani,
Chapter 1, this volume; Dahl, Chapter 13, this volume.)
⁴ See section 2.5 for some justification of this claim, and for defining inflection class complexity in
terms of the predictability of individual forms, rather than the lexeme’s class membership (i.e., its entire
paradigm of forms).

important way to uncover crosslinguistic tendencies in the complexity of inflection
class systems (see questions 2 and 3 in Arkadiev & Gardani, Chapter 1, this
volume).
At the same time, the fact that much previous work within such notions of
complexity has been based on descriptions of inflectional systems that include
only affixes, and sometimes only the most regular patterns, leaves it unclear
whether claims about limits on inflection class complexity (e.g., the Low
Conditional Entropy Conjecture, Ackerman & Malouf 2013) apply to all inflec-
tional patterns in a language or only those that are most regular. More generally, it
raises questions about how patterns that are typically excluded from consideration
interact with other elements in the system, and the role they play in determining
the complexity of inflection class systems. Brown & Hippisley (2012) are a notable
exception to this tendency to focus just on affixal exponence. We follow them in
using the term ‘paradigmatic layers’ (2012: 71) of exponence (or just ‘layers’ for
short) for dimensions of inflectional form (e.g., stress, suffixes, stem alternations)
that have their own, independent distributions but which jointly realize the
inflectional information of a word.
We use Russian nouns to investigate these issues. We consider how patterns
that are often excluded from consideration affect the complexity of the system and
how they are integrated into the implicative structure of the system. The core
questions that we ask are: How do interactions between component parts of the
Russian nominal inflection class system shape the complexity of that system as a
whole? In particular, are less-regular and non-affixal layers of exponence disrup-
tive to an inflectional system, disproportionately increasing its complexity? Or,
alternatively, is their disruptive potential mitigated by the way elements in the
system interact? Little work has compared implicative structuring within sub-
components of the lexicon—an issue that is potentially important for understand-
ing the internal structuring of inflectional systems. By looking at the inflectional
structure of Russian nouns in this way, we aim to promote a fuller understanding
of how inflectional organization determines the complexity of inflection class
systems. We do not assume that every language is alike, or that Russian is
representative. But we use Russian as a way to explore and illustrate the issues
involved.

2.2 Regularity, paradigmatic layers, and inflection classes

We focus on irregularity and non-affixal layers of exponence because the representation
of a system can affect the assessment of its complexity. For example,
Sagot & Walther (2011) compare four descriptions of French verbs. The descriptions
range from a system with many classes and no lexically specified stem
allomorphy (139 classes) to one that lexically specifies all stem allomorphy (one
inflection class), with two other descriptions that split the burden of explanation
between the inflection class system and lexical specification. As they observe
(p. 42), it makes little sense to evaluate an inflectional system based only on the
morphological description and not what is lexically specified, since a morpho-
logical description can always be made simpler by positing more lexical specifi-
cation. They thus evaluate the analyses in terms of description length, including
both the morphological description and lexically specified information. Equating
the degree of complexity of the system with the length of its description, they show
that the complexity of the different analyses differs significantly; a description with
twenty classes and up to twelve lexically specified suppletive stems for some
lexemes results in the shortest length.⁵ The point here is that degree of complexity
is a property of a particular description of French verbs.⁶ This makes it particularly
important to examine and justify the description itself.
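
The trade-off between morphological description and lexical specification can be made concrete with a deliberately crude Python sketch. It is not Sagot & Walther's metric: description length is approximated here simply as the number of characters needed to write down class definitions and lexical entries, and the two analyses, the stems, and the exponents are hypothetical toy material.

```python
def description_length(classes, lexicon):
    """Crude proxy: characters in class definitions plus lexical entries."""
    grammar_cost = sum(
        len(exponent) for cls in classes.values() for exponent in cls.values()
    )
    lexicon_cost = sum(len(item) for entry in lexicon.values() for item in entry)
    return grammar_cost + lexicon_cost


# Hypothetical analysis A: three inflection classes; the lexicon lists
# only a stem and a class label for each lexeme.
analysis_a = (
    {
        "I": {"SG": "a", "PL": "i"},
        "II": {"SG": "a", "PL": "e"},
        "III": {"SG": "o", "PL": "e"},
    },
    {
        "lex1": ["stem1", "I"],
        "lex2": ["stem2", "II"],
        "lex3": ["stem3", "III"],
    },
)

# Hypothetical analysis B: a single class; every deviation from it is
# specified lexically on the lexeme concerned.
analysis_b = (
    {"I": {"SG": "a", "PL": "i"}},
    {
        "lex1": ["stem1"],
        "lex2": ["stem2", "PL=e"],
        "lex3": ["stem3", "SG=o", "PL=e"],
    },
)

if __name__ == "__main__":
    print("A:", description_length(*analysis_a))
    print("B:", description_length(*analysis_b))
```

Shrinking the class inventory in analysis B moves cost into the lexicon rather than eliminating it, which is why both components have to be counted before two descriptions of the same forms can be compared.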
Stump & Finkel (2015) make a similar point along a different dimension of
description. They contrast two potential representations of the same set of English
verbs, one based on acoustics alone (what they call ‘hearer-oriented’) and one
based on structure known to a speaker that does not surface in the production of
forms (‘speaker-oriented’). For example, the exponence of the past participle(s) of
 and  are identical in a hearer-oriented representation, that is, /εnt/, but
a speaker knows that they contain different structure, that is, /εn-t/ vs. /εnd-t/.
Stump & Finkel show that the two representations exhibit differences in their
complexity based on various information-theoretic and set-theoretic measures.
(See also Bonami 2013 for similar issues with French verbs.)
Mansfield & Nordlinger (Chapter 3, this volume) also draw attention to how
systems are represented. Investigating Murrinhpatha (non-Pama-Nyungan,
Northern Australia), they show that speakers have made analogical changes to
the verbal system which, surprisingly, do not lead to greater predictability among
allomorphs. They suggest that using existing measures of conditional entropy to
calculate the complexity of the system would be misrepresentative because verbs
in the language are a closed class with largely idiosyncratic exponence. The
exponence for the verbs is made up of intersecting formatives that are partially

⁵ See also Goldsmith (2001, 2011) for arguments for description length-based evaluation metrics in
morphological analysis.
⁶ In employing an evaluation metric based on description length, Sagot & Walther (2011) argue that
descriptions of shorter length (i.e., of less complexity in their sense) are more adequate. However, it is
not obvious to us that for a given inflectional system, the description with the shortest description
length should be taken to be the most adequate one. This is a question of the evaluation metric. For
instance, see Derwing (1990) for arguments against evaluation metrics based on economy of storage
(incl. minimum description length) and for metrics based on economy of processing speed and Dahl
(Chapter 13, this volume) for discussion on the relationship between Minimum Description Length
and other notions/metrics of complexity. It is not a foregone conclusion that a description that is most
cognitively realistic will be the description with the lowest estimated complexity in terms of either
description length or the implicative notion outlined in (1) above. This is a question for investigation,
but beyond the scope of the present work.

predictive of each other. If the exponents are represented as unanalysable wholes
(as they sometimes are in the literature), the subregularities among intersecting
formatives, which may help explain the analogical changes, are obscured.
Finally, Cotterell et al. (2019) note that the information-theoretic measure used
in, for example, Ackerman & Malouf (2013), is highly sensitive to the particular
descriptive analysis that is made of an inflectional system. They propose an
alternative measure of Integrative complexity in terms of joint entropy—a calcu-
lation based on the joint distribution over all cells, with complexity defined as the
entropy of the distribution.⁷ However, even if joint entropy is less sensitive to the
representation of the system, this does not eliminate the need to investigate how
analytic assumptions about that representation affect calculations of the complex-
ity of inflection class systems.
These studies highlight how the description of a system can affect calculations
of its complexity. Given that inclusion or exclusion of irregularity and non-affixal
exponence can substantially change the description of an inflection class system,
we should ask in what ways they affect the complexity of that system. It is beyond
the scope of this chapter to argue for one particular representation of Russian
nouns as being more adequate than another. But roughly similarly to the approach
of Sagot & Walther (2011), we explore the effect of different descriptions of the
Russian nominal inflection class system for estimates of its complexity.⁸

2.2.1 Regularity and inflection classes

It has long been known that high type frequency inflection classes create ana-
logical pressure on irregular patterns. When irregular patterns resist regulariza-
tion, the most common argument for their persistence despite analogical pressure
is that they are lexically stored, leaving them relatively impervious to regulariza-
tion. The typically high token frequency of such lexemes also makes lexical
specification psycholinguistically plausible. This and other evidence of lexical
storage is sometimes taken as a basis for treating irregulars as falling outside of
the grammatical system—in this case, the inflectional system.

⁷ Cotterell et al.’s work was presented at the Society for Computation in Linguistics just as we were
completing final revisions to this chapter, so we did not have the opportunity to apply their joint entropy
metric to our data, nor to explore whether it produces estimates of system complexity that are less
dependent on the particular descriptive analysis that is made of an inflectional system. However, we see
this as a promising avenue for investigation.
⁸ Unlike Sagot & Walther (2011), we do not offer a formal analysis of Russian nouns, and make no
particular assumptions about what inflectional information is part of the grammatical system, and what
is lexically specified. However, like them, we include both regular, productive forms, and also ones that
analyses might treat as lexically specified. And of course, their paper and our chapter are similar in
investigating how different analytic assumptions affect assessments of the complexity of the inflection
class systems.

However, a categorical division into regular and irregular types has long been
recognized as problematic. First, the scope of a form’s irregularity can range from
having an exponent associated with a different class to having a fully suppletive
form. The extent to which a lexeme is irregular can also range from a single cell to
the majority of the paradigm. (See Corbett et al. 2001 for examples from Russian.)
Aside from the most extreme cases of suppletion, irregular lexemes exhibit
irregularity in only a subset of their paradigms’ cells. And even in suppletion,
stem distributions are often shared with regular patterns (Aski 1995; Bonami &
Boyé 2002; Hippisley et al. 2004; Boyé & Cabredo Hofherr 2006). Thus, even the
most irregular lexemes frequently overlap with regular ones and tend to exhibit at
least some degree of systematicity (Brown & Hippisley 2012). In fact, Brown &
Hippisley argue that ‘there is no hard-and-fast contrast between rules and lexical
specification. Rather, we must make a distinction between the rule on the one
hand and how the lexeme accesses that rule’ (p. 80). In their theory, Network
Morphology, rules are information held at nodes in an inheritance hierarchy. This
information is inherited ultimately by individual lexemes, defining their patterns
of inflectional exponence. However, lexemes may inherit information by default
or by direct specification of the node from which the lexeme should inherit. This
means that within their theory, regularity is defined in terms of how a lexeme
accesses a rule, and a single rule may represent regularity in some lexemes and
irregularity in others.
Second, speakers draw on their knowledge of irregular patterns when general-
izing to new lexemes (Bybee & Slobin 1982; Albright & Hayes 2002, 2003). Words
that are traditionally categorized as irregular play a crucial role in predicting how
speakers generalize morphological patterns to new words. Irregular inflectional
patterns can be more reliable in certain contexts (e.g., phonological neighbor-
hoods) than more regular patterns. Correspondingly, inflectional patterns that are
highly irregular can be extended. The athematic 1SG marker -m in Common Slavic
spread from just a handful of verbs to become the dominant 1SG marker in some
West and South Slavic languages (Janda 1994). Thus, even highly irregular
patterns can exhibit a degree of productivity.
Third, it is now generally accepted that both irregularly and regularly inflected
words are stored in the mental lexicon and leave traces in memory (Alegre &
Gordon 1999a; Baayen 2007 inter alia). Baayen et al. (2007), among many others,
find a surface frequency effect for regularly inflected words in a lexical decision
task even with low frequency lexemes. Starting with Taft (1979), such a frequency
effect has been widely interpreted as reflecting direct lexical storage of the forms,
rather than storage via component morphemes.⁹ Thus, showing that irregulars are
subject to lexical storage is not a sufficient basis on which to argue that irregular
items are not part of the system of inflectional patterns.

⁹ See Taft (2004) and Taft & Ardasinski (2006) for more recent, sceptical interpretations of surface
and base frequency effects. Models with different primitive assumptions about representational
structure also interpret surface frequency effects somewhat differently, for example connectionist
models (Daugherty & Seidenberg 1994) and discriminative learning models (Baayen et al. 2011).
However, the important thing in the present context is that none of these models posit that irregular
and regular inflected forms are processed and stored in the mental lexicon in categorically different
ways (an idea put forward most famously by Prasada & Pinker (1993) and advocated for from a
neurolinguistic perspective by Ullman (2001, 2004), but now widely rejected).

Evidence of this sort blurs the binary classification of inflectional patterns into
‘regular’ and ‘irregular’ types and undermines any concomitant claim that there is
a categorical distinction between patterns generated by the inflectional rule system
(and thus appropriately described in terms of inflection classes) and those that are
lexically-stored exceptions. Yet in the context of knowing that the description of
an inflection class system makes a big difference for calculations of its complexity,
analytic assumptions that place irregulars outside of the inflectional system are
pernicious because they preclude even asking important questions about how
irregulars interact with regulars and the consequences of this for the complexity of
the system.

2.2.2 Paradigmatic layers and inflection classes

Similar observations can be made about paradigmatic layers of inflection.
Linguistics has a deep-rooted tradition of thinking of words as combinations of
linearly (and perhaps hierarchically) ordered morphemes. As noted at the beginning
of the chapter, there is a philosophical preference for concatenative patterns
that manifests in a privileged status for affixes both descriptively and theoretically.
Nonetheless, different layers of exponence can exhibit distinct structural organization.
For example, a subset of Russian nouns exhibits fixed stress on the ending
and has a stress retraction in the nominative plural, and also in accusative plural
when syncretic with nominative (GVOZD’ ‘nail’ and GUBA ‘lip’ in Table 2.1). This is
one of several morphosyntactically conditioned stress alternations in Russian
nouns (see Zaliznjak 1967 for a description of stress patterns; Brown et al. 1996
offers an overview in English). The alternations define a set of structured stress
classes that partly crosscut the suffix-based classes and form an inheritance
hierarchy that is distinct from the one defined by inflectional suffixes (Brown
et al. 1996).
The point here is that the stress and suffix patterns both are informative about
and conditioned by morphosyntactic values. For some classes, represented here by
GUBA ‘lip’ and OKNO ‘window’, stress placement is the only thing that distinguishes
nominative/accusative plural from genitive singular. In practice, however,
virtually all analyses of Russian nominal inflection focus on classes as defined by
(regular) suffixal groups, even though inflectional stress exhibits its own,
independent organization into classes. And this choice is rooted, ultimately, in
analytic assumptions of the linguist that give a privileged status to affixes in the
description of inflectional systems.

Table 2.1. An example of morphosyntactically conditioned stress alternation in Russian nouns

          GVOZD’ ‘nail’    GUBA ‘lip’    OKNO ‘window’
NOM SG    gvozd’           gubá          oknó
ACC SG    gvozd’           gubú          oknó
GEN SG    gvozdjá          gubý          okná
DAT SG    gvozdjú          gubé          oknú
LOC SG    gvozdé           gubé          okné
INS SG    gvozdjóm         gubój         oknóm
NOM PL    gvózdi           gúby          ókna
ACC PL    gvózdi           gúby          ókna
GEN PL    gvozdéj          gúb           ókon
DAT PL    gvozdjám         gubám         óknam
LOC PL    gvozdjáx         gubáx         óknax
INS PL    gvozdjámi        gubámi        óknami

Another argument comes from the fact that layers of exponence may offer a full
picture of the organization and complexity of a system only when considered
jointly. Chiquihuitlán Mazatec (Oto-Manguean, Mexico) verbs are marked for
person and aspect by a combination of tones, final vowel, and stem formative
(Jamieson 1982). The uncertainty associated with predicting the tone, final vowel,
and stem formative for a paradigm cell in isolation is high. Moreover, knowing the
full paradigm for one of the layers of exponence (tone, final vowel, or stem
formative) does little to help predict the pattern for other layers of the same
lexeme (Ackerman & Malouf 2013: 448). However, the uncertainty associated
with predicting the exponence of any given cell knowing one other cell in the
paradigm is surprisingly low because each word form carries some information
about the possible tone, final vowel, and stem formative of other cells; there is
strong implicative structure between individual cells, which crosscuts the three
layers of inflectional exponence (see average conditional entropy in Ackerman &
Malouf 2013: 443). Similarly, Sims (2015: ch. 5) shows that the distribution of
genitive plural defectiveness in Greek nouns is predictable from the relationship
between affixal patterns and inflectional stress. When these layers of inflection are
taken together, the picture that emerges is that the genitive plural in some classes
is implicatively stranded in the paradigm, causing defectiveness.
This kind of evidence undercuts any attempt to exclude non-affixal paradig-
matic layers. In the Greek example, the paradigmatic layers reveal aspects of
inflectional organization that cannot be discerned from affixal structure alone.
The Mazatec example is similar with the addition that including all of the layers of
inflection actually leads to less complexity than would be expected given each
layer independently. The inclusion of stress information in Russian nouns neces-
sitates a second, distinctly structured inheritance hierarchy. Ultimately, paradig-
matic layers can reveal organizational properties of inflectional systems that are
otherwise hidden. Thus, as with irregularity, analytic assumptions that exclude
non-affixal paradigmatic layers from consideration preclude important questions
about how elements in an inflectional system interact to determine its overall
complexity.

2.2.3 Interim Summary

In summary, estimates of the complexity of inflectional systems depend on the
representations of the systems under investigation. While there has been a tendency
to exclude irregular inflectional patterns and non-affixal layers of exponence
from these representations, doing so is not well justified on empirical or
theoretical grounds. Both irregulars and non-affixal layers have the potential to
reveal structural properties of the system that are otherwise obscured. The ques-
tion becomes whether a broader understanding of what belongs to ‘the system’
makes a difference for calculations of its complexity, and how.

2.3 Inflection class complexity

Inflection classes are a layer of structure that mediates between form and meaning,
without bearing meaning directly (they are morphomic in Aronoff’s 1994 terms),
and some languages do not have inflection classes, showing that classes are not
‘needed’. These observations have led to the idea that inflection classes create
unnecessary complexity in morphological systems and have raised the question of
whether there are limits on that complexity.
As noted in the introduction, the focus of this question has shifted away from a
notion of complexity defined in terms of absolute number of inflection classes/
exponents/cells and towards one that is rooted in implicative paradigmatic struc-
ture. Stump & Finkel (2013) define the complexity of an inflection class system as
‘the extent to which the system inhibits motivated inferences about a lexeme’s full
paradigm of realized cells from subsets of its cells’ (2013: 55; emphasis ours). When
defined in this way, the complexity of an inflection class system may, but need not,
be related to the absolute size of the system. Systems with a large number of
inflection classes and/or in which lexemes have a large number of paradigm cells
can exhibit low complexity if there is strong implicative structure within the
paradigm. Likewise, small inflectional systems can be highly complex if inflected
forms are not held together by strong implicative relations (Sims 2015: ch. 5).
Stump & Finkel operationalize their definition primarily in terms of set-theoretic
principal part sets, that is, sets of realized cells from which a lexeme’s full inflection class
membership can be determined. The concept of a principal part set is, by its very
nature, concerned with implicative paradigmatic structure, giving a way to com-
pare the complexity of different inflection class systems. Somewhat similarly,
Ackerman et al. (2009) use information-theoretic tools to ask how much surprisal
is associated with the inflected form realizing one paradigm cell, given the form
associated with another cell, and define the complexity of an inflection class
system in terms of its average conditional entropy. Ackerman & Malouf (2013)
use the same information-theoretic tools to compare the complexity of a set of
typologically diverse languages.
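
To make the information-theoretic measure concrete, the following Python sketch computes the conditional entropy of one paradigm cell given another, and the average over all ordered pairs of cells. It is an illustration only, not the procedure of Ackerman et al. (2009) or Ackerman & Malouf (2013). The data are the suffixes of just three cells of the four-class system in Table 2.2, the classes are assumed to be equally weighted unless weights are supplied, and the names CLASSES, conditional_entropy, and average_conditional_entropy are ours.

```python
from collections import Counter
from itertools import permutations
from math import log2

# Suffixes of three cells of the four-class system in Table 2.2;
# "" marks a null exponent. Restricting to three cells is a simplification.
CLASSES = {
    "I":   {"NOM.SG": "",  "GEN.SG": "a", "GEN.PL": "ov"},
    "II":  {"NOM.SG": "a", "GEN.SG": "y", "GEN.PL": ""},
    "III": {"NOM.SG": "",  "GEN.SG": "i", "GEN.PL": "ej"},
    "IV":  {"NOM.SG": "o", "GEN.SG": "a", "GEN.PL": ""},
}


def conditional_entropy(classes, given, target, weights=None):
    """Return H(target | given) in bits over a set of inflection classes.

    `weights` maps class names to (type) frequencies; by default every
    class counts equally, which is an assumption of this sketch.
    """
    weights = weights or {name: 1.0 for name in classes}
    total = sum(weights.values())
    joint = Counter()     # P(given exponent, target exponent)
    marginal = Counter()  # P(given exponent)
    for name, cells in classes.items():
        p = weights[name] / total
        joint[(cells[given], cells[target])] += p
        marginal[cells[given]] += p
    return -sum(p * log2(p / marginal[g]) for (g, _), p in joint.items())


def average_conditional_entropy(classes, weights=None):
    """Average H(target | given) over all ordered pairs of distinct cells."""
    cells = list(next(iter(classes.values())))
    pairs = list(permutations(cells, 2))
    return sum(
        conditional_entropy(classes, given, target, weights)
        for given, target in pairs
    ) / len(pairs)


print(conditional_entropy(CLASSES, "NOM.SG", "GEN.SG"))
print(average_conditional_entropy(CLASSES))
```

In this toy fragment each cell has exactly one exponent that is shared by two classes, so every ordered pair of cells yields 0.5 bits and the average is also 0.5 bits. Supplying unequal weights (for example, type frequencies) changes the result, which is one small illustration of how descriptive choices feed into the estimate.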
Stump & Finkel (2013) and Ackerman & Malouf (2013) both find that when
complexity is defined in terms of implicative structure, individual forms tend to be
predictable on average. In a survey of ten languages, Ackerman & Malouf calculate
the average conditional entropy associated with the realization of a set of mor-
phosyntactic values given knowledge of one other form of the same lexeme and
show that it is uniformly relatively low, despite diversity in the size of the
languages’ inflectional systems.¹⁰ They focus on the idea that implicative structure
allows even large systems to exhibit low average conditional entropy and present
their results as a typological tendency, the Low Conditional Entropy Conjecture:
‘enumerative morphological complexity is effectively unrestricted, as long as the
average conditional entropy, a measure of integrative complexity, is low’ (2013:
436). Stump & Finkel (2013: 215) offer a similar generalization in the form of the
Depth-of-Inference Contrast: ‘languages show a high degree of uniformity in
allowing a given form in a lexeme’s paradigm to be deduced from a low number
of dynamic principal parts (the average number being not much more than
one)’.¹¹ Thus, both find evidence that even inflectional systems that vary widely
in size tend to allow for well-motivated inferences when it comes to the task of
inferring one inflected form from another. The idea that inflectional systems must
maintain low complexity in this way is intuitive given that speakers must learn
inflection classes for them to persist. Also, speakers must be able to generalize
morphological patterns because not all inflected forms are attested even in large
corpora (Baayen 2001; Blevins et al. 2017), and the need to predict unknown
forms remains crucial throughout the lifespan (Bonami & Beniamine 2015).
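
The notion of a principal part set, and the contrast between static and dynamic principal parts, can be illustrated with a small Python sketch. This is a deliberate simplification and not Stump & Finkel's (2013) procedure: it works over the suffixes of three cells of the four-class system in Table 2.2, it treats a dynamic principal part set simply as a smallest set of cells that identifies one class, and the function names are ours.

```python
from itertools import combinations

# Suffixes of three cells of the four-class system in Table 2.2;
# "" marks a null exponent.
CLASSES = {
    "I":   {"NOM.SG": "",  "GEN.SG": "a", "GEN.PL": "ov"},
    "II":  {"NOM.SG": "a", "GEN.SG": "y", "GEN.PL": ""},
    "III": {"NOM.SG": "",  "GEN.SG": "i", "GEN.PL": "ej"},
    "IV":  {"NOM.SG": "o", "GEN.SG": "a", "GEN.PL": ""},
}


def signature(cells, chosen):
    """Exponents of one class restricted to the chosen cells."""
    return tuple(cells[c] for c in chosen)


def static_principal_parts(classes):
    """Smallest sets of cells that, used for ALL classes, tell every class apart."""
    cell_names = sorted(next(iter(classes.values())))
    for size in range(1, len(cell_names) + 1):
        found = [
            set(chosen)
            for chosen in combinations(cell_names, size)
            if len({signature(c, chosen) for c in classes.values()}) == len(classes)
        ]
        if found:
            return found
    return []


def dynamic_principal_parts(classes, name):
    """Smallest sets of cells that identify the single class `name`."""
    cell_names = sorted(classes[name])
    for size in range(1, len(cell_names) + 1):
        found = []
        for chosen in combinations(cell_names, size):
            own = signature(classes[name], chosen)
            if all(signature(cells, chosen) != own
                   for other, cells in classes.items() if other != name):
                found.append(set(chosen))
        if found:
            return found
    return []


print(static_principal_parts(CLASSES))
for name in CLASSES:
    print(name, dynamic_principal_parts(CLASSES, name))
```

For this fragment no single cell separates all four classes, so every static set needs two cells, whereas each class taken on its own can be identified from one suitably chosen cell. The toy data thus reproduce, in miniature, the difference between the two kinds of analysis.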

¹⁰ However, at least for the languages that we are most familiar with (Russian, Greek), they base
their analyses on grammatical descriptions that exclude irregularities and non-affixal layers of expo-
nence. See Sims (2015: ch. 5) for a comparison between their analysis of Greek nouns and one based on
a more robust representation of the nominal system.
¹¹ In a dynamic principal parts analysis, the principal parts need not reflect the same morphosyn-
tactic properties from one inflection class to another. Stump & Finkel primarily differentiate this from a
static principal parts analysis, in which the set of principal parts is required to correspond to the same
morphosyntactic properties for all lexemes in a given syntactic category, and thus all inflection classes
within that category.

At the same time, Stump & Finkel observe a difference in complexity between
predicting one inflected form and predicting class membership (i.e., all forms). In
contrast with the relatively uniform ease with which a single inflected form can be
deduced, ‘Languages vary widely in the number of dynamic principal parts they
require to distinguish a given I[nflection] C[lass]’ (Stump & Finkel 2013: 215).
Similarly, Ackerman & Malouf find greater crosslinguistic differences in average
declensional entropy (an unconditioned entropy measure of inflection class pre-
dictability) than in average conditional entropy (a conditional entropy measure of
inflected form predictability). This suggests that the complexity of an inflection
class system as a whole is not necessarily a direct product of the complexity of the
individual exponents. It is therefore important to investigate how the complexity
of the system as a whole relates to the complexity of the component elements of
the system.
A few steps have been taken in this direction. Sims & Parker (2016) find that
nine investigated inflection class systems show roughly similar degrees of overall
complexity, when calculated over pairs of forms using conditional entropy,
consistent with the Low Conditional Entropy Conjecture. Crucially, however,
they also show that implicative structure does very different amounts of ‘work’
in the languages to produce this result. In some languages, knowledge of one
inflected form is crucial to predicting another. In other languages, inflected forms
are independently fairly predictable, and knowledge of another form does little or
nothing to improve that predictability. Thus, paradigmatic implication is not
always an important determinant of the complexity of inflectional systems.
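
The point that the same conditional entropy can arise with or without help from implicative structure can be sketched as follows. The two systems below are invented for illustration and are not drawn from Sims & Parker (2016). The code compares the unconditioned entropy of a target cell with its entropy given another cell, assuming equally weighted classes. The function names are ours.

```python
from collections import Counter
from math import log2


def entropy(dist):
    """Shannon entropy in bits of a probability distribution (dict of probabilities)."""
    return -sum(p * log2(p) for p in dist.values() if p > 0)


def implicative_work(classes, given, target):
    """Return (H(target), H(target | given)) for equally weighted classes.

    The difference between the two values is how much knowing the `given`
    cell reduces uncertainty about the `target` cell.
    """
    n = len(classes)
    target_dist, given_dist, joint = Counter(), Counter(), Counter()
    for cells in classes.values():
        target_dist[cells[target]] += 1 / n
        given_dist[cells[given]] += 1 / n
        joint[(cells[given], cells[target])] += 1 / n
    # Chain rule: H(target | given) = H(given, target) - H(given)
    return entropy(target_dist), entropy(joint) - entropy(given_dist)


# Hypothetical system X: the target cell varies, but the other cell predicts it.
SYSTEM_X = {"A": {"c1": "a", "c2": "u"}, "B": {"c1": "o", "c2": "i"}}
# Hypothetical system Y: the target cell has the same exponent in every class.
SYSTEM_Y = {"A": {"c1": "a", "c2": "i"}, "B": {"c1": "o", "c2": "i"}}

for label, system in (("X", SYSTEM_X), ("Y", SYSTEM_Y)):
    h_target, h_conditional = implicative_work(system, "c1", "c2")
    print(label, h_target, h_conditional, h_target - h_conditional)
```

Both invented systems end up with zero conditional entropy for the target cell, but in system X this is achieved entirely through implication (one bit of uncertainty is removed), while in system Y there was no uncertainty to remove in the first place.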
Additionally, based on data from Icelandic and French, Stump & Finkel (2013)
propose the Marginal Detraction Hypothesis: ‘[m]arginal I[nflection] C[lasse]s
tend to detract most strongly from the IC predictability of other ICs’ (p. 225).
Marginal classes here are defined as ones with few lexemes. The Marginal
Detraction Hypothesis thus asks whether the internal structure of inflection
class systems is homogeneous. The hypothesis is that the implicative structure
of low type frequency classes may differ from that of the most frequent classes. (See
also Sims & Parker 2016 for a similar idea.) Related to this, Blevins et al. (2017)
argue that the Zipfian distribution of morphological patterns helps balance two
opposing pressures: the importance of predicting forms and the importance of
discriminating forms. Frequently occurring patterns facilitate prediction.
Suppletive patterns, which are likely to belong to low type frequency classes,
may detract from predictability but at the same time have benefits like being
highly discriminative. Both types of patterns contribute, in different ways, to
ensuring the patterns in the language are usable by speakers.
Together these studies explore the idea that competing pressures may lead
different components of inflectional systems to exhibit different properties. They
also suggest that if there is a strong crosslinguistic tendency for languages to
exhibit low inflection class complexity, this both results from and occurs despite
structural aspects of inflectional systems. But so far there is little understanding of
how the elements of inflectional systems interact to determine inflection class
complexity, so further work is needed in this area, especially work comparing
implicative structuring within subcomponents of the lexicon.

2.4 Russian nouns

We now turn to the example of Russian nouns. Our approach to investigating
Russian is to divide inflectional exponence into its subcomponents and to investigate
the effect of each on the complexity of the inflection class system. We do this
in two ways. First, starting with a baseline description of the Russian nominal
system that consists only of classes as defined by inflectional suffixes, we add in
further information about exponence—additional paradigmatic layers—and look
at the effect of this on the complexity of the inflection class system (section 2.6).
Second, to look more directly at irregularity, we classify the individual exponents
within each paradigmatic layer as regular or irregular. We then investigate the
extent to which this (ir)regularity contributes to the complexity of the inflection
class system (section 2.7). This idea is conceptually close to the Marginal
Detraction Hypothesis, given the close connection between the irregularity and
type frequency of inflection classes. However, quantifying the regularity of inflec-
tion classes’ layers directly allows us to take a closer look at whether layers are
making distinct contributions to the complexity of the system as a whole. But first,
in this section we describe the data sets that we work with.
Various proposals have been made regarding the number of Russian noun
classes. The four-class system of Corbett (1982), shown in Table 2.2, is a typical
representation of the Russian nominal system, but it is also coarse-grained. It may
be an appropriate basis for some kinds of linguistic investigation but questions of
inflection class complexity benefit from a more granular representation. We
therefore consider a fuller set of suffixal patterns and three additional layers of
inflectional exponence. Here we are interested in how different aspects of inflec-
tional exponence affect the complexity of a system without making any claims
about which granularity is the ‘right’ or ‘best’ representation (cf. the earlier
discussion of Sagot & Walther 2011).
Suffixes constitute one layer of exponence. In addition to the four suffix sets
illustrated in Table 2.2, we consider ten other patterns of suffixes:

1. Indeclinable nouns, for example, KINO ‘(movie) theater’;
2. Neuter nouns like VREMJA ‘time’. In the plural these behave like Class IV
nouns. In the singular they have an -a in the nominative (like Class II) and
accusative, -i in the genitive, locative and dative (like Class III nouns), and
-om in the instrumental (like Class I nouns);
3. Nouns that belong to Class I except that they have a null genitive plural, for
example, RAZ: raz ‘time.GEN.PL’;
4. Nouns that belong to Class I except that they have -a in the nominative
plural, for example, GOROD: goroda ‘city.NOM.PL’;
5. Nouns that belong to Class IV, except for -ov in the genitive plural, for
example, OBLAKO: oblakov ‘cloud.GEN.PL’;
6. Nouns that belong to Class IV, but with nominative plural -i, for example,
JABLOKO: jabloki ‘apple.NOM.PL’;
7. Nouns that belong to Class II except they have an overt genitive plural, for
example, RASPRJA: rasprej ‘strife.GEN.PL’;
8. Nouns that belong to Class IV, but have a nominative plural -i and genitive
plural -ov, for example, OČKO: očki ‘point.NOM.PL’ and očkov ‘point.GEN.PL’;
9. Nouns that belong to Class I, but have a nominative plural -e and a null
genitive plural, for example, KREST’JANIN: krest’jane ‘peasant.NOM.PL’ and
krest’jan ‘peasant.GEN.PL’;
10. Nouns that belong to Class I but have a nominative plural in -a and a null
genitive plural, for example, TELËNOK: teljata ‘calf.NOM.PL’ and teljat ‘calf.GEN.PL’.¹²
Table 2.2. Illustration of the four-class system, based on inflectional suffixes

          I              II             III              IV
          ZAKON ‘law’    KARTA ‘map’    KOST’* ‘bone’    MESTO ‘place’
NOM SG    zakon          karta          kost’            mesto
ACC SG    zakon          kartu          kost’            mesto
GEN SG    zakona         karty          kosti            mesta
LOC SG    zakone         karte          kosti            meste
DAT SG    zakonu         karte          kosti            mestu
INS SG    zakonom        kartoj         kost’ju          mestom
NOM PL    zakony         karty          kosti            mesta
ACC PL    zakony         karty          kosti            mesta
GEN PL    zakonov        kart           kostej           mest
LOC PL    zakonax        kartax         kostjax          mestax
DAT PL    zakonam        kartam         kostjam          mestam
INS PL    zakonami       kartami        kostjami         mestami

Note:
* Here and throughout the chapter we use scientific transliteration, rather than transcription. This is
a convenience that accommodates Russian speakers and makes it easier to check the examples in
a dictionary (because the spelling is maintained). However, the transliteration is sometimes
misleading with regard to the phonological (or morphological) shape of words. Although it is not
clear in the transliteration of this example, the stem-final consonant cluster in KOST’ is [sʲtʲ] throughout
the paradigm (e.g., nominative singular [kosʲtʲ], genitive singular [kosʲtʲ-i], instrumental singular
[kosʲtʲ-ju]).

¹² Nouns like KREST’JANIN and TELËNOK also exhibit changes in their stems. See discussion below.

A second layer of exponence consists of stem distributions. Here we define a
stem as the segmental material left when the suffix sets discussed above are
removed from inflected forms (as in, e.g., Aronoff 1994: 31). In total, 80.8% of
lexemes have a consistent stem throughout the paradigm (data from Zaliznjak
1977). In addition to this, we consider five types of stem change that are mor-
phologically patterned:

1. Vowel-zero alternation in the nominative singular (and accusative singular
when syncretic) but not elsewhere, for example, DEN’: den’ ‘day.NOM.SG’
~ dnja ‘day.GEN.SG’;
2. Vowel-zero alternation in the genitive plural (and accusative plural when
syncretic) but not elsewhere, for example, PIS’MO: pis’mo ‘letter.NOM.SG’
~ pisem ‘letter.GEN.PL’;
3. A stem extension -in in the singular, for example, KREST’JANIN: krest’janin
‘peasant.NOM.SG’ ~ krest’jane ‘peasant.NOM.PL’;
4. A stem extension -en in all forms but the nominative and accusative
singular, for example, VREMJA: vremja ‘time.NOM.SG’ ~ vremeni ‘time.GEN.SG’;
5. Extensions -ënok in singular forms and -jat in plural forms, for example,
TELËNOK: telënok ‘calf.NOM.SG’ ~ teljata ‘calf.NOM.PL’.
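
The working definition of a stem given above (what remains once the suffix is removed) can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: forms are plain transliterated strings, the suffix of each form is supplied by hand rather than discovered, and the helper names stem and stem_set are ours. The forms of ZAKON come from Table 2.2 and those of PIS’MO from the list above.

```python
def stem(form, suffix):
    """Strip a known inflectional suffix from a form; "" stands for a null suffix."""
    if not form.endswith(suffix):
        raise ValueError(f"{form!r} does not end in {suffix!r}")
    return form[: len(form) - len(suffix)]


def stem_set(paradigm):
    """Set of distinct stems found across the cells of a (partial) paradigm."""
    return {stem(form, suffix) for form, suffix in paradigm.values()}


# Each cell maps to (transliterated form, suffix assumed for that cell).
ZAKON = {
    "NOM.SG": ("zakon", ""),
    "GEN.SG": ("zakona", "a"),
    "GEN.PL": ("zakonov", "ov"),
}
PISMO = {
    "NOM.SG": ("pis'mo", "o"),
    "GEN.PL": ("pisem", ""),
}

print(stem_set(ZAKON))  # a single stem: no stem change
print(stem_set(PISMO))  # two stems: vowel-zero alternation in the genitive plural
```

A lexeme whose stem set has one member has a consistent stem in the cells examined. More than one member signals a stem change of the kind listed above, although classifying the change (alternation versus extension) would need further comparison of the stems.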

A third layer of exponence is stress. In total, 91.6% of nouns have consistent
stem stress throughout the paradigm, and an additional 6.1% have
consistent stress on the inflectional suffix throughout the paradigm (data from
Zaliznjak 1977, reported in Brown et al. 1996).¹³ The remaining nouns have some
type of stress shift. While they represent only a small percentage of total types,
they tend to be among the words with the highest token frequency. Stress
alternations fall into six patterns, shown in Table 2.3. With one exception, the
shift is between the first syllable of the stem and the inflectional ending:

1. Two patterns involving a shift according to number, for example, MESTO
‘place’ and ČISLO ‘number’;
2. Fixed stress on the inflectional ending, but with stem-initial stress in
nominative plural (and accusative plural when syncretic), for example,
GUBA ‘lip’;
3. Fixed stress on the inflectional ending, but with stem-initial stress in both
nominative plural (and accusative plural when syncretic) and accusative
singular, for example, BORODA ‘beard’;
4. Two patterns that combine a shift according to number with retraction in
the nominative plural (and accusative plural when syncretic) and accusative
singular, for example, DOLJA ‘portion’ and DUŠA ‘soul’.¹⁴

Table 2.3. Illustration of stress classes of Russian nouns

          MESTO ‘place’   ČISLO ‘number’   GUBA ‘lip’   BORODA ‘beard’   DOLJA ‘portion’   DUŠA ‘soul’
NOM SG    mésto           čisló            gubá         borodá           dólja             dušá
ACC SG    mésto           čisló            gubú         bórodu           dólju             dúšu
GEN SG    mésta           čislá            gubý         borodý           dóli              duší
DAT SG    méstu           čislú            gubé         borodé           dóle              dušé
LOC SG    méste           čislé            gubé         borodé           dóle              dušé
INS SG    méstom          čislóm           gubój        borodój          dólej             dušój
NOM PL    mestá           čísla            gúby         bórody           dóli              dúši
ACC PL    mestá           čísla            gúby         bórody           dóli              dúši
GEN PL    mést            čísel            gúb          boród            doléj             dúš
DAT PL    mestám          číslam           gubám        borodám          doljám            dúšam
LOC PL    mestáx          číslax           gubáx        borodáx          doljáx            dúšax
INS PL    mestámi         číslami          gubámi       borodámi         doljámi           dúšami

¹³ Russian nouns usually have zero exponence in either the nominative singular or genitive plural,
depending on class; see Table 2.2. When a form has no overt inflectional suffix in a given paradigm cell,
lexemes that otherwise would have stress on the suffix have stress on the last syllable of the stem instead
(see BORODA in Table 2.3).

The fourth and final layer of exponence reflects patterns of defectiveness.¹⁵ In
total, 97.7% of nouns have a form for each cell in the paradigm (data from
Zaliznjak 1977). However, some lack forms for a subset of the paradigm. Most
of these are singularia or pluralia tantum nouns, for example, ŠTANY ‘pants/
trousers’ has no singular forms. Russian also has a well-known pattern of genitive
plural defectiveness that affects a few dozen nouns, for example, a noun glossed ‘reward’
has no genitive plural, and a handful of (diminutive) nouns occur in only the
nominative and accusative singular, for example, razok ‘time.DIM’.
Within each layer of exponence we do not include patterns that are represented
in only one lexeme, nor do we include alternate patterns of stress. However, many
lexemes in our data are nonetheless unique in their morphological exponence
because they exhibit a unique combination of layers. For example, GOSPODIN
‘lord/sir’ has a stem extension in the singular like KREST’JANIN ‘peasant’ but it has
the same set of suffixes and stress pattern as GOROD ‘city’. It is the only lexeme to
exhibit this particular combination of patterns.
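
One way to picture how a lexeme can be unique only in its combination of layers is to represent each lexeme as a tuple with one label per layer and to group lexemes by the whole tuple. The sketch below is our own illustration, not the coding scheme used for the data. The layer labels are placeholders, the stress labels ‘stress A’ and ‘stress B’ are arbitrary, and the five lexemes are a hand-picked toy set. Only the relations among gospodin, krest’janin, and gorod follow the description above.

```python
from collections import defaultdict

# One label per layer: (suffix pattern, stem pattern, stress pattern, defectiveness).
# All labels are illustrative placeholders, not the chapter's actual coding.
LEXEMES = {
    "zakon":       ("class I", "constant stem", "stress A", "complete"),
    "zavod":       ("class I", "constant stem", "stress A", "complete"),
    "gorod":       ("class I, NOM.PL -a", "constant stem", "stress B", "complete"),
    "krest'janin": ("class I, NOM.PL -e, null GEN.PL", "-in in singular", "stress A", "complete"),
    "gospodin":    ("class I, NOM.PL -a", "-in in singular", "stress B", "complete"),
}


def group_by_combination(lexemes):
    """Group lexemes that share the same label on every layer."""
    classes = defaultdict(list)
    for lexeme, layers in lexemes.items():
        classes[layers].append(lexeme)
    return dict(classes)


def unique_lexemes(lexemes):
    """Lexemes whose combination of layers is shared with no other lexeme."""
    groups = group_by_combination(lexemes)
    return sorted(members[0] for members in groups.values() if len(members) == 1)


for combination, members in group_by_combination(LEXEMES).items():
    print(members, combination)
print(unique_lexemes(LEXEMES))
```

In this toy set gospodin shares its suffix and stress labels with gorod and its stem label with krest’janin, yet its full tuple matches no other lexeme. With only five lexemes, gorod and krest’janin also come out as singletons, which would not hold in a full lexicon.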
We also abstract away from properties that are not related to inﬂection class
membership. Some lexemes exhibit the same exponence but are not identical in
other morphosyntactically-relevant traits like gender and animacy. For example,
’ ‘drunkard’ and š ‘girl’ have the same pattern of exponence but

¹⁴ Due to the stress shift between singular and plural, the distribution of the retraction of stress onto
the stem is ambiguous. Nouns like  are consistent with stress shift in both nominative plural and
accusative singular, but since there is stem stress throughout the singular, the accusative singular is
ambiguous. Conversely, nouns like š are also consistent with both stress shifts, but since there is
stem stress throughout the plural, the nominative plural is ambiguous. Except for ambiguous instances
of this sort, accusative singular stress retraction never occurs unless nominative plural stress retraction
also does, so it seems safe to analyse š as having both stress retractions, with the nominative plural
one being opaque. The proper analysis of  is less clear. Stress retraction in the accusative singular
happens (unambiguously) only in nouns with the Class II suffix pattern. While  belongs to this
class, other nouns with the same stress pattern do not (e.g.,  ‘tooth’ (Class I), šč’ ‘city square’
(Class III)). An alternative possibility is therefore to analyse these nouns as having only the nominative
(and accusative) plural stress retraction, since it occurs in combination with a wider range of stem
classes. We do not have a firm opinion about which analysis is ultimately the right one, or even whether
speakers themselves make only one or the other analysis. But it also makes no difference in the present
context. Since our analysis of implicative relations in the following section is based on surface patterns,
all six patterns in Table 2.3 are treated as distinct in the analysis.
¹⁵ Walther (2017) distinguishes between ‘deficient’ and ‘defective’ lexemes where the former are
lexemes for which a speaker could determine what forms would fill the cells but does not use those
forms, and the latter are lexemes for which there is uncertainty about which form would fill missing
cells. We include both types of lexemes in our category of defectiveness.

the former is masculine while the latter is feminine. They are treated in our
analyses as belonging to the same class since gender is not expressed inflectionally
in nouns. We also abstract away from predictable phonologically-conditioned
variation and predictable semantically-conditioned variation. For example, vowels
reduce when not stressed, but given information about stress, vowel quality is fully
predictable and purely phonological. Thus, we abstract away from vowel reduc-
tion in our class representations. Some genitive plural forms have no overt
exponent, for example, kart ‘map.GEN.PL’, and others have an overt suffix, for
example, zakon-ov ‘law-GEN.PL’ and učitel-ej ‘teacher-GEN.PL’. Whether a lexeme
has a zero genitive plural form or an overt ending is morpholexically conditioned
and thus depends on its inflection class, so we include this distribution in our
description. However, which of the two overt exponents will occur is fully
predictable from the phonology of the stem: -ej occurs with morphologically
soft stems and -ov occurs elsewhere (Timberlake 2004: 84–5).¹⁶ Thus, we represent
-ov and -ej as a single exponent. Similarly, we do not include differences in
accusative marking that are predictable based on animacy (see Corbett & Fraser
1993: 129–30 for justification).¹⁷ Thus, our analysis reflects only information
about exponence that is directly a property of inflection class membership.
See Parker (2016) for a more complete description of the patterns and para-
digmatic layers of Russian nouns.

2.5 Quantifying complexity

We adopt a definition of complexity rooted in the predictability of individual
forms, rather than entire classes, because it reflects a type of unpredictability
speakers must overcome to use an inflectional system (Ackerman et al. 2009).
When speakers need to express a combination of lexeme and grammatical

¹⁶ In Russian it is necessary to distinguish phonological softness (secondary palatalization) and

morphological softness. The phonological softness of consonants is relevant to phonological processes,
for example, conditioning of unstressed vowel reduction. Morphological softness is relevant to allo-
morph selection in genitive plural. In Russian, consonants that are pronounced with secondary
palatalization are soft both phonologically and morphologically. Most of the consonants that are
pronounced without secondary palatalization are hard both phonologically and morphologically.
However, there are six consonants (traditionally called the ‘unpaired’ consonants) that fall outside of
this system in various ways. Three of them differ in softness depending on the level of structure. The
consonant /j/ is phonologically soft (it conditions unstressed vowel reduction in the same way as other
soft consonants) but morphologically hard (stem-finally it conditions genitive plural -ov, like other
hard consonants). Conversely, the consonants /ʃ/ and /ʒ/ are phonologically hard but morphologically
soft (stem-finally they condition genitive plural -ej, like other soft consonants). However, the behaviour
of these three phonemes is the same across all inflection classes, so we still consider this to be
predictable phonological conditioning.
¹⁷ The analysis/number of classes in this chapter differs from that in Parker (2016) and Sims &
Parker (2016). This primarily reflects the fact that the earlier work did not abstract away from animacy-
conditioned exponence in accusative.

properties, the predictability of the corresponding individual form is more
relevant than the predictability of that lexeme’s class membership (its entire
paradigm of forms), for the simple reason that speakers only ever need to produce one
inflected form at a time. Moreover, as noted above, recent work suggests that
individual form predictability is a relevant level of generalization for statements
about the complexity of inflection class organization crosslinguistically
(Ackerman & Malouf 2013).
Our definition of inflection class complexity is repeated as (2).

(2) Complexity of an inflection class system: the average extent to which the
system inhibits motivated inferences about the realized form of a lexeme,
given one or more other realized forms of the same lexeme.

We operationalize this definition using information-theoretic tools. We use
conditional entropy to estimate the complexity of the system and use the
(non-conditioned) entropy of the system to estimate the potential complexity of the
system.¹⁸ The potential complexity of an inflection class system is the amount of
complexity it would exhibit if the exponents of the various paradigm cells of a
lexeme were logically independent of each other, since this would maximally
inhibit motivated inferences. A key question is the extent to which the actual
complexity of an inflection class system is lower than its potential complexity,
since the difference between these reflects the ‘work’ done by inflectional structure
to minimize the complexity of the system.
Entropy represents the average surprisal associated with the outcome of a
random variable A. In the context of inflectional systems, A is a paradigm cell
(or more accurately, a set of morphosyntactic properties) and the possible out-
comes are the different exponents that realize that cell in each class. Thus, entropy
represents the average surprisal associated with the exponents of a given morpho-
syntactic property set.

(3) Entropy

\[ H(A) = -\sum_{a \in A} p(a) \log_2 p(a) \]
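As a minimal sketch of (3), the following computes the entropy of a single paradigm cell when every class is treated as equally likely (the unweighted case). The four genitive singular exponents are illustrative toy data, not the chapter's data set, and the function name `entropy` is our own.

```python
from collections import Counter
from math import log2


def entropy(exponents):
    """Shannon entropy H(A) in bits, treating each class as equally likely."""
    counts = Counter(exponents)
    total = len(exponents)
    return -sum((n / total) * log2(n / total) for n in counts.values())


# Illustrative toy data (an assumption, not the chapter's data set):
# the genitive singular exponent in each of four hypothetical classes.
gen_sg = ["-a", "-y", "-i", "-a"]

print(entropy(gen_sg))  # 1.5
```

Two classes share -a, so that exponent has probability 0.5 and the other two have 0.25 each, which gives 1.5 bits rather than the 2 bits of four fully distinct exponents.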

¹⁸ We recognize that these measures do not capture all aspects of a system’s complexity, especially
because they are limited to comparisons between individual cells (as opposed to larger subsets of the
paradigm). See, for example, Stump & Finkel (2013) and Bonami & Beniamine (2015) for investigations
that consider complexity based on predictiveness/predictability of multiple paradigm cells. Expanding
the current work to take account of paradigm structuring would be valuable. However, our focus here is
on comparing across different descriptions of the Russian nominal system, and the importance of the
description for estimates of inflection class complexity. A simple measure gives us the best perspective
on this issue.

Conditional entropy H(A|B) represents the average surprisal associated with the
outcome of a random variable A, given knowledge of the outcome of another
random variable B. In the present context, A and B are distinct paradigm cells
(A ≠ B). Implicitly conditioned on the lexeme, the outcomes of A and B are two
inflected forms of the same lexeme. Conditional entropy thus represents the
average surprisal associated with the exponent that realizes a given
morphosyntactic property set, knowing the exponence of another inflected form of the same
lexeme.

(4) Conditional Entropy

\[ H(A \mid B) = \sum_{a \in A,\, b \in B} p(b, a) \log_2 \frac{p(b)}{p(b, a)} \]
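The formula in (4) can be sketched in the same way. The example below assumes one (b, a) pair of exponents per class, with classes equally weighted; the nominative and genitive singular exponents are again illustrative toy data, and `conditional_entropy` is a name we introduce here.

```python
from collections import Counter
from math import log2


def conditional_entropy(pairs):
    """H(A|B) in bits from (b, a) exponent pairs, one pair per class."""
    total = len(pairs)
    joint = Counter(pairs)                      # counts for p(b, a)
    marginal_b = Counter(b for b, _ in pairs)   # counts for p(b)
    # p(b) / p(b, a) simplifies to count(b) / count(b, a).
    return sum(
        (n / total) * log2(marginal_b[b] / n)
        for (b, _), n in joint.items()
    )


# Toy data (assumption): (NOM.SG exponent, GEN.SG exponent) in four classes.
pairs = [("zero", "-a"), ("-a", "-y"), ("zero", "-i"), ("-o", "-a")]

print(conditional_entropy(pairs))  # 0.5
```

Knowing the nominative singular leaves uncertainty only for the two classes with a zero ending, so the 1.5 bits of the genitive singular taken alone drop to 0.5 bits.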

Averaging across the entropy values H(A) for all licensed morphosyntactic
property sets produces an estimate of the potential complexity of the system as a
whole. This mean entropy value represents the average uncertainty associated
with predicting the exponent of a paradigm cell knowing only the possible
exponents that realize that cell in different classes. Exponents of different mor-
phosyntactic property sets are thus treated as independent of each other. By
comparison, averaging across the conditional entropy values H(A|B) of all
licensed combinations of morphosyntactic property sets A and B produces an
estimate of the complexity of the inflectional system as a whole, taking into
account implicative relations holding between pairs of cells. This represents the
uncertainty associated with a given cell of a lexeme knowing the exponence of one
other cell of the same lexeme.
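The two system-level averages described above might be computed as follows. This is a sketch under stated assumptions: the system is a dictionary from (hypothetical) class labels to paradigms, every cell is licensed in every class, classes are unweighted, and the three-cell toy system is invented for illustration.

```python
from collections import Counter
from itertools import permutations
from math import log2


def entropy(exponents):
    counts, total = Counter(exponents), len(exponents)
    return -sum((n / total) * log2(n / total) for n in counts.values())


def conditional_entropy(pairs):
    """H(A|B) from (b, a) pairs, one per class."""
    total, joint = len(pairs), Counter(pairs)
    marginal_b = Counter(b for b, _ in pairs)
    return sum((n / total) * log2(marginal_b[b] / n)
               for (b, _), n in joint.items())


def column(classes, cell):
    """Exponents of one cell, in a fixed class order."""
    return [paradigm[cell] for paradigm in classes.values()]


def mean_entropy(classes):
    """Potential complexity: average H(A) over all cells."""
    cells = list(next(iter(classes.values())))
    return sum(entropy(column(classes, c)) for c in cells) / len(cells)


def mean_conditional_entropy(classes):
    """Estimated actual complexity: average H(A|B) over ordered cell pairs."""
    cells = list(next(iter(classes.values())))
    ordered = list(permutations(cells, 2))  # (target A, given B), A != B
    return sum(
        conditional_entropy(list(zip(column(classes, b), column(classes, a))))
        for a, b in ordered
    ) / len(ordered)


# Toy system (assumption): four classes, three cells.
classes = {
    "I":   {"NOM.SG": "zero", "GEN.SG": "-a", "DAT.SG": "-u"},
    "II":  {"NOM.SG": "-a",   "GEN.SG": "-y", "DAT.SG": "-e"},
    "III": {"NOM.SG": "zero", "GEN.SG": "-i", "DAT.SG": "-i"},
    "IV":  {"NOM.SG": "-o",   "GEN.SG": "-a", "DAT.SG": "-u"},
}

potential = mean_entropy(classes)            # 1.5 bits
actual = mean_conditional_entropy(classes)   # about 0.33 bits
```

In this toy system the genitive and dative singular predict each other perfectly, which is what pulls the mean conditional entropy so far below the mean entropy.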
The conditional entropy H(A|B) will never be higher than the entropy H(A)
and will be lower whenever the exponent that realizes B is informative about the
exponent that realizes A. Knowing one form of a lexeme cannot increase the
surprisal associated with another form, but it can lower it. The extent to which
knowing one cell reduces the uncertainty associated with another cell (the differ-
ence between entropy and conditional entropy) represents how much ‘work’ is
being done by the implicative structure of the system.
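The inequality stated here follows from a standard information-theoretic identity, which is not spelled out in the chapter. Writing H(A) as a sum over the joint distribution and subtracting (4) gives the mutual information I(A;B), which is never negative and is zero only when the two cells are statistically independent:

```latex
\begin{align*}
H(A) - H(A \mid B)
  &= -\sum_{a,b} p(b,a)\log_2 p(a)
     \;-\; \sum_{a,b} p(b,a)\log_2\frac{p(b)}{p(b,a)} \\
  &= \sum_{a,b} p(b,a)\log_2\frac{p(b,a)}{p(a)\,p(b)}
   \;=\; I(A;B) \;\ge\; 0 .
\end{align*}
```

On this reading, the 'work' done by implicative structure for a given pair of cells is the mutual information between them.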

2.6 Granularity and system complexity

We now turn to the primary questions of this chapter, starting with: To what
extent does including more paradigmatic layers in the system affect its
complexity? Our approach is to develop multiple parallel descriptions of Russian
nominal inflectional structure based on the paradigmatic layers. Each description
is based on the same set of lexemes but the lexemes are distributed across classes
differently depending on which layers are included in the analysis. This allows us
to investigate how paradigmatic layers interact, and specifically, how those
interactions influence the complexity of the system as a whole.

2.6.1 Granularity of inflection class information

We determined the number of distinct patterns that result from combinations of
paradigmatic layers. We took each morphological noun in an exhaustive
grammatical dictionary of Russian, Zaliznjak (1977), and created multiple parallel
representations of the system by including increasingly more paradigmatic layers.
Each representation of the system includes the same 43,486 lexemes distributed
among the number of distinct patterns/classes that arise based on the layers
considered. In general, as more layers are combined, more classes are needed to
describe Russian nominal inflection. In Table 2.4 we provide the number of classes
that result when suffix sets are considered independently and in combination with
one, two or three additional paradigmatic layers. Note that even the least granular
representation here exhibits more classes than the traditional four classes argued
for in Corbett (1982) and used in other complexity studies where Russian nouns
were considered (e.g., Ackerman & Malouf 2013).
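The counting procedure can be sketched as follows, assuming each lexeme is annotated with one pattern label per layer and that a class is simply a distinct combination of labels over the layers included. The six lexemes and all labels (S1, ext, shift, no-sg, and so on) are hypothetical and serve only to show the mechanics.

```python
from itertools import combinations

LAYERS = ("suffixes", "stem", "stress", "defectiveness")

# Hypothetical lexemes with invented pattern labels for each layer.
lexemes = {
    "lex1": ("S1", "none", "fixed", "none"),
    "lex2": ("S1", "none", "fixed", "none"),
    "lex3": ("S1", "ext",  "fixed", "none"),
    "lex4": ("S2", "none", "shift", "none"),
    "lex5": ("S2", "none", "fixed", "no-sg"),
    "lex6": ("S1", "none", "shift", "none"),
}


def count_classes(lexemes, layers):
    """Number of distinct pattern combinations over the chosen layers."""
    idx = [LAYERS.index(layer) for layer in layers]
    return len({tuple(p[i] for i in idx) for p in lexemes.values()})


def granularities(lexemes):
    """Class counts for suffixes alone and with every subset of other layers."""
    extra = LAYERS[1:]
    result = {}
    for k in range(len(extra) + 1):
        for combo in combinations(extra, k):
            layers = ("suffixes",) + combo
            result[layers] = count_classes(lexemes, layers)
    return result


for layers, n in granularities(lexemes).items():
    print(n, "+".join(layers))
```

With three optional layers there are eight granularities, matching the eight rows of Table 2.4; the lexeme inventory stays constant and only its partition into classes changes.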
We will refer to the different parallel descriptions as ‘granularities’. In Figure 2.1
we show the distribution of word types per inflection class in each of the
granularities presented in Table 2.4. The distribution of lexemes across classes is
roughly exponential in every granularity, resulting in a more or less linear trend
when displayed in log space (Figure 2.1). In other words, there are many lexemes
in a small number of classes and few lexemes in many classes. This is not
surprising; distributions of this sort are ubiquitous among frequency counts in
natural languages, including word frequencies (see Baayen 2001 for detailed
discussion).

Table 2.4. Number of nominal inflection classes of Russian nouns as a function of
which paradigmatic layers are included

Number of classes Suffixes Stem changes Stress Defectiveness

14 +
21 + +
22 + +
33 + + +
42 + +
57 + + +
64 + + +
82 + + + +

[Figure 2.1: eight panels, one per granularity (14, 21, 22, 33, 42, 57, 64, and 82 inflection classes); x-axis: Inflection Classes; y-axis: Log Type frequency (0–8)]
Figure 2.1. Word types per inflection class across different granularities

2.6.2 Paradigmatic layers and inflection class complexity

To assess how the complexity of the system changes with granularity, we calcu-
lated the mean entropy (= estimated potential complexity) and mean conditional
entropy (= estimated actual complexity) of each representation of the system
presented in Table 2.4. In light of the type frequency distribution of classes
shown in Figure 2.1, we calculated mean conditional entropy both with and
without type frequency weighting. In the weighted condition, the probabilities
of each exponent were weighted by the type frequency of the exponent. This
measure represents the complexity of the system when both implicative structure
and the uneven distribution of lexemes across classes are taken into account.
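One way to read the weighted condition is that each class contributes to the probability estimates in proportion to its type frequency rather than equally. The sketch below implements that reading; it is our assumption about the procedure, and the exponents and type frequencies are invented toy values.

```python
from collections import Counter
from math import log2


def conditional_entropy(pairs, weights=None):
    """H(A|B) in bits from (b, a) pairs, optionally weighted per class."""
    if weights is None:
        weights = [1] * len(pairs)   # unweighted: classes count equally
    joint, marginal_b = Counter(), Counter()
    for (b, a), w in zip(pairs, weights):
        joint[(b, a)] += w
        marginal_b[b] += w
    total = sum(weights)
    return sum((w / total) * log2(marginal_b[b] / w)
               for (b, _), w in joint.items())


# Toy data (assumption): (NOM.SG, GEN.SG) exponents in four classes,
# with invented type frequencies that are strongly skewed.
pairs = [("zero", "-a"), ("-a", "-y"), ("zero", "-i"), ("-o", "-a")]
type_freq = [900, 80, 10, 10]

unweighted = conditional_entropy(pairs)            # 0.5 bits
weighted = conditional_entropy(pairs, type_freq)   # roughly 0.08 bits
```

The skew matters because a zero nominative singular almost always belongs to the large class, so the rare competing class adds very little expected surprisal.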
Figure 2.2 shows that as granularity increases, and more paradigmatic layers are
included in the system, the entropy and unweighted conditional entropy of the
system tend to increase. This is unsurprising from the perspective of information
theory: as more elements are present in the system, there will be greater surprisal
associated with those elements on average. More interestingly, the weighted
conditional entropy values remain low regardless of inflection class granularity;
the weighted conditional entropy increases by only 0.12 bits from a representation of
the system that includes only suffixes (fourteen classes) to one with all
paradigmatic layers together (eighty-two classes). This means that the uncertainty
associated with a large number of classes is mitigated by a combination of the
implicative structure of the system and the unequal distribution of lexemes across
classes. Implicative structure and the distribution of lexemes across classes
conspire to maintain low systemic complexity.

[Figure 2.2: line plot; x-axis: Number of Classes (14, 21, 22, 33, 42, 57, 64, 82); y-axis: Complexity Measures in Bits (0.0–2.5); series: Entropy (unweighted), Conditional Entropy (unweighted), Weighted Conditional Entropy]
Figure 2.2. Complexity measures across granularities of Russian nouns

However, even a random distribution of exponents will tend to produce a
system with lower mean conditional entropy than mean entropy, because some
of the exponents will be accidentally informative about other exponents. Thus, we
should ask whether the implicative structure of the system minimizes the
complexity of the inflection class system in each granularity more than is expected by
chance. Employing Monte Carlo simulation, we created a hundred simulated data
sets for each granularity. In each granularity the simulated data sets contained the
same exponents and the same number of classes as in the real granularity, but the
exponents were randomly distributed across the classes.¹⁹ The mean conditional
entropy of the simulated data sets represents the amount of complexity we expect
in systems of this size based on a random distribution of exponents. If the actual
complexity falls outside of the simulated values, we can conclude that the ‘work’
done by the implicative structure in that granularity is significant at a level of
p < 0.01.
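A simulation of this kind could be set up as sketched below. We assume that 'randomly distributed' means shuffling each cell's exponents independently across classes, which preserves the inventory of exponents per cell and the number of classes; the chapter does not specify the procedure at this level of detail, and the toy system and function names are ours.

```python
import random
from collections import Counter
from itertools import permutations
from math import log2


def conditional_entropy(pairs):
    total, joint = len(pairs), Counter(pairs)
    marginal_b = Counter(b for b, _ in pairs)
    return sum((n / total) * log2(marginal_b[b] / n)
               for (b, _), n in joint.items())


def mean_conditional_entropy(columns):
    """columns maps each cell to its exponents, aligned by class index."""
    ordered = list(permutations(columns, 2))  # (target, given) cell pairs
    return sum(
        conditional_entropy(list(zip(columns[given], columns[target])))
        for target, given in ordered
    ) / len(ordered)


def simulate(columns, n_sims=100, seed=0):
    """Mean conditional entropy of n_sims systems with shuffled exponents."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_sims):
        # Assumption: shuffle every cell independently across classes.
        shuffled = {cell: rng.sample(col, len(col))
                    for cell, col in columns.items()}
        results.append(mean_conditional_entropy(shuffled))
    return results


# Toy system (assumption): three cells, four classes.
columns = {
    "NOM.SG": ["zero", "-a", "zero", "-o"],
    "GEN.SG": ["-a", "-y", "-i", "-a"],
    "DAT.SG": ["-u", "-e", "-i", "-u"],
}

actual = mean_conditional_entropy(columns)
simulated = simulate(columns)
# With 100 simulations, an actual value below every simulated one
# corresponds to p < 0.01.
below_all = actual < min(simulated)
```

A system this small will not necessarily fall below all simulated values, so the sketch illustrates the procedure rather than the result; independent shuffling can also produce two identical classes, which a fuller implementation might want to exclude.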
As can be seen in Figure 2.3, in every granularity the actual mean conditional
entropy of the system is lower than that of all of the simulated data sets, and as the
granularity of inflectional information increases, the difference between the
simulated mean conditional entropy and the actual mean conditional entropy of the
system increases. This suggests that the implicative structure of the system does an
increasing amount of work to minimize the complexity of the system as
granularity increases. This contrasts with the findings of Ackerman & Malouf, who
suggest that ‘there is no need for such systems [as Russian] to rely on implicative
organization’ (2013: 451). Remember that they focus on a limited number of
classes based on affixes alone. The importance of implicative organization (and
uneven distribution of lexemes) is apparent when a more granular representation
of the system is used, suggesting that Ackerman & Malouf’s claim about Russian is
an artefact of their choice of representation.

¹⁹ Here we calculate the mean conditional entropy of the system without weighting classes by type
frequency. Weighting classes equally approximates a random distribution of lexemes across classes.

[Figure 2.3: plot; x-axis: Number of Classes (14, 21, 22, 33, 42, 57, 64, 82); y-axis: Conditional Entropy in Bits (0.0–2.5); series: Simulated min/mean/max, Actual mean]
Figure 2.3. Conditional entropy of real and a hundred Monte Carlo simulations of
Russian nouns across granularities; for the ‘simulated’ series, vertical bars indicate
maximum and minimum values

At the same time, as an anonymous reviewer noted, if these authors are right
about systemic organization being an emergent property that reflects pressures
related to the need to be able to predict inflected forms, then more data should
strengthen this core claim. And, indeed, it does. Our data are consistent with the
claim that the predictability of individual inflected forms given some knowledge of
a lexeme is relatively high across languages, that is, the Low Conditional Entropy
Conjecture (Ackerman & Malouf 2013) and the Depth of Inference Contrast
(Stump & Finkel 2013). Given that the difference between the potential complex-
ity and actual complexity is greatest in the presence of all layers, we are inclined to
interpret even the highest mean conditional entropy values in Figure 2.2 as
support for the idea that low relative complexity is an emergent property of
inflectional systems. By showing that the basic claim is robust to assumptions
about the structure of the data (in Russian), the present analysis strengthens
support for it.
In summary, our findings are consistent with the idea that the sum complexity
of the system is not simply determined by the complexity of its parts. The interaction
between the paradigmatic layers leads to lower than expected complexity despite
the fact that each layer adds elements to the system. As more elements are
included in the system, we expect its complexity to increase; however, the impli-
cative structure among the paradigmatic layers (and the unequal distribution of
lexemes across classes) leads the system to maintain relatively low complexity
despite its size.

2.7 Regularity and system complexity

We now turn to our final question about complexity and regularity: Do regular
and irregular classes contribute similarly to the overall complexity of the
inflectional system?

2.7.1 Defining class (ir)regularity

To investigate how irregularity affects the complexity of the system, we took the
most granular representation (eighty-two classes) and within each class classified
the pattern found in each layer of exponence as regular or irregular.²⁰ We assume
any effects of irregularity will be most evident in the most granular representation
of the system.
We base our definition of (ir)regularity on the type frequency of particular
patterns within the Russian nominal system.²¹ For instance, since the large
majority of Russian nouns (80.8%) do not exhibit stem alternation, we define

²⁰ We originally classified irregularity within each layer on an ordered non-binary scale; however,
due to how few types occur at some points of the scale, we were forced to adopt a binary classification to
avoid data sparsity.
²¹ An anonymous reviewer asked what justifies only consistent stems or fixed stress being counted as
regular, noting that stem alternations or variable stress can be considered regular if either is the most
frequent pattern in a language or class. We agree that stem alternations or variable stress can be regular
in a language (see discussion in Sagot & Walther 2011: 5–7 for examples); however, neither is regular
in Russian nouns, which is evident given the large percentage of nouns that exhibit fixed stress and do
not have stem alternations. One might also argue that, for example, stem and/or stress alternation
should be considered regular because they are the most frequent pattern within a particular affixal class.
This is true of some Russian nouns, for example, all nouns with the affixal pattern exemplified by
 ‘time’ (see section 2.4) have both stem extensions and stress alternations. However, defining
regularity in this way relies on a privileged status for affixal exponence—a position we reject (see
discussion in section 2.1). Furthermore, our notion of granularity is based on the idea that affixal
and non-affixal exponence co-determine the number of inflection classes. When both affixal and

non-alternation (i.e., a consistent stem throughout the paradigm) as regular.
Likewise, since the large majority of Russian nouns (97.7%) do not exhibit stress
alternations, we define fixed stress as regular. For each of the layers, the following
were classified as regular (with all other patterns being classified as irregular):

• Suffixes: The four suffix sets in Table 2.2.
• Stem: A consistent stem throughout the paradigm.
• Stress: Fixed stress throughout the paradigm, whether on the stem or ending.
• Defectiveness: A form in each cell of the paradigm, that is, no defectiveness.

When classified in this way, the majority of Russian nouns exhibit no irregu-
larity (33,144 lexemes in eight classes); many exhibit some type of irregularity in
one layer (9,709 lexemes in forty-three classes); some exhibit irregularity in two
layers (607 lexemes in twenty-five classes); and a few exhibit irregularity in three
layers (twenty-six lexemes in six classes). No lexemes exhibit irregularity in all
four layers, at least partly because irregularity in one layer can limit the possibility
of irregularity in another, for example, defectiveness in the singular or plural makes
stress shift between numbers impossible. Thus, the majority of lexemes are fully
regular, but the majority of classes have irregularity in some subset of their layers.
Importantly in the present context, this frames the question of inflection class
complexity as one having to do, in part, with the extent to which the large number
of classes that exhibit some degree of irregularity detract from the predictability of
the small number of high type frequency classes that exhibit no irregularity. Do the
many lower type frequency irregular classes contribute disproportionately to the
complexity of the whole inflection class system in Russian?

2.7.2 Regularity and system complexity

We calculated the complexity of the system (mean conditional entropy) and
compared it to the complexity of the system with a single class removed from
the data, iterating this process over all classes. The difference between these two
measures (entropy difference) represents the unique contribution of the removed
class to the complexity of the system. We then performed multiple linear regres-
sion with the irregularity of each layer of the removed class as independent
variables and entropy difference as the dependent variable.²² If irregular patterns

non-affixal exponence determine classes, all lexemes of a single class exhibit the exact same patterns,
making any attempt at class-specific determinations of regularity meaningless.

²² We found no significant interactions between layers. We also ran the same model with class type
frequency as an additional independent variable; class type frequency was not significant in the model.

disproportionately increase the complexity of the system, we expect a positive
correlation between the change in complexity of the system (entropy difference)
and irregularity.
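The leave-one-out step can be sketched as follows, under the assumption that the entropy difference is the unweighted mean conditional entropy of the full system minus that of the system without the class in question. The regression step is not shown, since it would need a statistics library; the toy system and names are again illustrative only.

```python
from collections import Counter
from itertools import permutations
from math import log2


def conditional_entropy(pairs):
    total, joint = len(pairs), Counter(pairs)
    marginal_b = Counter(b for b, _ in pairs)
    return sum((n / total) * log2(marginal_b[b] / n)
               for (b, _), n in joint.items())


def mean_conditional_entropy(classes):
    cells = list(next(iter(classes.values())))
    ordered = list(permutations(cells, 2))  # (target, given) cell pairs
    return sum(
        conditional_entropy(
            [(p[given], p[target]) for p in classes.values()])
        for target, given in ordered
    ) / len(ordered)


def entropy_differences(classes):
    """Full-system complexity minus complexity with each class removed."""
    full = mean_conditional_entropy(classes)
    return {
        name: full - mean_conditional_entropy(
            {k: v for k, v in classes.items() if k != name})
        for name in classes
    }


# Toy system (assumption): four classes, three cells.
classes = {
    "I":   {"NOM.SG": "zero", "GEN.SG": "-a", "DAT.SG": "-u"},
    "II":  {"NOM.SG": "-a",   "GEN.SG": "-y", "DAT.SG": "-e"},
    "III": {"NOM.SG": "zero", "GEN.SG": "-i", "DAT.SG": "-i"},
    "IV":  {"NOM.SG": "-o",   "GEN.SG": "-a", "DAT.SG": "-u"},
}

diffs = entropy_differences(classes)
```

A positive difference means the class adds complexity (class III here, which shares its nominative singular with class I), while a negative one means the class makes the remaining system more predictable on average (class II, whose exponents are all unique).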
Figure 2.4 shows that not every paradigmatic layer contributes equally to the
complexity of the system as a whole. With all four independent variables included,
the model is significant (F(4, 77) = 4.386, p = 0.003, adjusted R² = 0.14). Among
the independent variables, irregularity in defectiveness and irregularity in stress
exhibit significant positive correlations with the change in complexity of the
system (p = 0.01 and p < 0.001, respectively), whereas irregularity in suffixes
and irregularity in stems do not (p > 0.05). Figure 2.4 shows the effect size of each
independent variable when the others are kept constant. Irregular patterns of
defectiveness and irregular patterns of stress thus increase the complexity of the
system, but irregular patterns of suffixes and stems do not. Irregularity does not
inherently make the system more complex; only some types of irregularity do.

[Figure 2.4: four panels (Suffixes, Stems, Stress, Defectiveness); x-axis: Reg vs. Irreg; y-axis: Entropy Difference (–0.001 to 0.007)]
Figure 2.4. Effect of the irregularity of each layer on system complexity (entropy
difference); the vertical bars show 95% confidence intervals

2.8 Discussion and conclusions

This study highlights the need for caution in interpreting results from data whose
representations include only affixal and regular inflectional patterns, since they
may misrepresent the complexity of inflectional systems and/or obscure import-
ant aspects of inflectional structure. For example, the four most granular repre-
sentations of Russian nouns in our study (forty-two, fifty-seven, sixty-four, and
eighty-two classes) have an unweighted average conditional entropy that exceeds
the largest unweighted average conditional entropy value among the ten languages
investigated by Ackerman & Malouf (2013),²³ even though the conditional
entropy of a four-class system of Russian falls in the middle of the range for
languages they investigate. The mean conditional entropy of our most granular
representation (eighty-two classes) is twice as high as the value for the four-class
Russian system in Ackerman & Malouf’s paper. This raises questions about the
extent to which typologically low systemic complexity is a reflection of assump-
tions adopted when creating representations of those systems.
At the same time, it is equally important to point out that for every represen-
tation of the Russian nominal inflectional system that we investigated—that is,
every granularity—the estimated complexity of the Russian noun class system was
substantially lower than the potential complexity of the system, as shown in
Figure 2.2 in section 2.6.2. The estimated complexity of the system was also
significantly lower than would be expected by chance (Figure 2.3 in section
2.6.2). This indicates that a significant amount of ‘work’ is done by implicative
structure, regardless of the particular representation that is assumed. The latter
result contradicts Ackerman & Malouf’s (2013: 451) speculation that Russian has
no need to rely on implicative organization. However, arguably the more import-
ant conclusion is that in the end, our results are consistent with their Low
Conditional Entropy Conjecture, if it is interpreted as a claim that inflection
class systems self-organize to minimize the amount of complexity embodied in
the system (rather than as a claim about a particular maximum possible condi-
tional entropy value). No matter what particular representation we assume,
Russian nouns show a pattern that is consistent with low systemic complexity,
suggesting that a typological tendency towards low systemic complexity may
extend beyond affixal and highly regular patterns.
While the Low Conditional Entropy Conjecture focuses on a global measure of
the complexity of inflection class systems, an equally interesting question has to
do with how the component parts of the system shape this global complexity.
From this perspective, an important result in this chapter is that the estimated
actual complexity of the system changes very little, despite the fact that the

²³ Amele, with a conditional entropy of 1.105 bits; Ackerman & Malouf (2013: 443, table 3).

potential complexity of the system tends to increase as information about
inflectional exponence (paradigmatic layers) is added (Figure 2.2 in section 2.6.2). This
means that the importance of implicative structure to the organization of the
Russian nominal system emerges most clearly when irregular and non-affixal
patterns are considered. The data presented here thus suggest that inflection
class systems self-organize to minimize the potentially disruptive effects of irregu-
larity and to maintain low complexity overall. This is an important aspect of the
organization of the nominal system that would be hidden in a more coarse-
grained representation.
In a similar vein, we also showed that irregularity in some paradigmatic layers
(stress, defectiveness) increases the complexity of the system, but in others it does
not (Figure 2.4 in section 2.7.2). This suggests that the system as a whole is not
simply a function of the complexity of its parts. It is instead a product of the way
the parts are distributed—that is, how the component elements are related. This
should hardly be a surprise, but the data in this chapter highlight that these sorts
of local relations, and how they lead to complexity in an inflection class system (or
don’t!), are at least as important to focus on as the complexity of the system
overall. To the extent that languages universally or predominantly exhibit low
systemic complexity, the question becomes why. At a broad level, the answer likely
has to do with learnability (Ackerman et al. 2009), but to get beyond general
formulations of this idea, it will be necessary to dive into the learnability of specific
inflection class configurations, and to carefully examine local relations among the
component parts of individual inflection class systems.²⁴ In this chapter, we have
contributed towards this goal.
Finally, we consider our results in the more general context of linguistic
complexity. Studies on the overall complexity of languages suggest that there
may not be any typological limits on linguistic complexity (see Miestamo 2008
for discussion of global vs. local complexity). Trudgill (2011) argues that small
communities with dense social networks and little linguistic contact with other
communities promote the development and preservation of complexity. Similarly,
McWhorter (2007) suggests that diminished linguistic complexity in a language is
often the result of an influx of large groups of adults that learn the language. These
studies undermine the intuitive idea that complexity in one area of a language
leads to diminished complexity elsewhere in the language (see Hockett 1958:
180–1 for an early vocalization of this idea) and challenge any type of typological
limit on linguistic complexity. The search for typological similarities in linguistic
complexity is elusive enough to have been called a ‘wild goose chase’ (Deutscher
2009). It is thus somewhat surprising that inflection class systems, as particular
local domains of complexity, seem to exhibit systemically low complexity. We

²⁴ See Parker et al. (to appear) for computational modelling of inflection class learning that moves in
this direction.

think that investigation of interactions between elements in the system is a
promising avenue for understanding and testing this issue. Whether patterns
similar to those we find in Russian exist in other languages is an empirical
question that we feel merits further investigation.

Acknowledgements

We thank Peter Arkadiev, Gregory Stump, and an anonymous reviewer for their
helpful comments. All errors remain entirely our own. This work was supported in
part by The Ohio State University, through a Presidential Fellowship awarded to Jeff
Parker and a sabbatical granted to Andrea Sims.

3
Demorphologization and deepening
complexity in Murrinhpatha
John Mansfield and Rachel Nordlinger

3.1 Introduction

Linguistic complexity is often associated with morphology, but it may also be
associated with the unravelling of morphology. Hopper (1990) observes that
elements of form that were once morphological exponents may over time lose
their morphological status and become unanalysable subparts of lexical stems. For
example, the final rime of seldom was once an Old English dative suffix *-um
(Hopper 1990: 154). Hopper labels the outcome of this process ‘demorphologisa-
tion’, and we here adapt his usage to conceptualize demorphologization as a
gradient phenomenon, in which morphological structure becomes gradually
blurred over time by the accretion of lexically specific modifications.¹ Our focus
is not on the end-point of this process but the mid-point, where there are
morphological ‘semi-regularities’ that help speakers and learners predict
unknown word forms, but which also leave a residue of unpredictability. This
type of analogical unpredictability has become a major focus in research on
morphological complexity (e.g., Ackerman et al. 2009; Ackerman & Malouf
2013; Parker & Sims, Chapter 2, this volume). Other studies have focused on the
problem of predicting inflectional exponence for unencountered forms in an open
lexical class, though as we argue below, there are some unexamined conceptual
issues with the open-/closed-class distinction. In the current study, we focus on
predictability in a closed class of finite verb stems, albeit one in which there are large
inflectional paradigms, and demorphologization has advanced to the point where
analogical predictability from one stem to another is highly attenuated.
Murrinhpatha finite verb stems, known in the literature as ‘classifier stems’,
exhibit semi-regular patterns associated with demorphologization (Walsh 1976;

¹ ‘Demorphologization’ is used rather differently by Joseph & Janda (1988), who use it in reference
to regularization of phonological processes such that they become independent of an erstwhile
morphological context.

John Mansfield and Rachel Nordlinger, Demorphologization and deepening complexity in Murrinhpatha. In: The Complexities
of Morphology. Edited by: Peter Arkadiev and Francesco Gardani, Oxford University Press (2020). © John Mansfield and
Rachel Nordlinger.
DOI: 10.1093/oso/9780198861287.003.0003

Street 1987; Nordlinger 2015; Forshaw 2016; Mansfield 2016). We present data on
analogical changes observed by comparing recent fieldwork documentation with
forms documented some forty years earlier, showing that the process of demor-
phologization is still underway. Analogical changes show that classifier stem forms
are not learnt and memorized as isolated units, but rather that speakers draw on
paradigmatic semi-regularities to predict unknown forms. Though the system does
not exhibit regular, productive inflection, neither can it be characterized as a set of
‘frozen forms’. Rather it is a relational system, and one that is in flux. We treat
analogical predictability as a form of linguistic complexity, and show that through
ongoing demorphologization, the complexity of Murrinhpatha classifier stems is
increasing. We quantify this unpredictability by adapting probabilistic tools devel-
oped by Ackerman et al. (2009) and Ackerman & Malouf (2015). However, while
the latter hypothesize limits of complexity for systems of productive inflection, the
Murrinhpatha classifier stems are a closed-class system of 1,638 inflectional forms,
where semi-regularities aid acquisition and processing, but whole-form memor-
ization may mitigate the requirement for analogical predictability.
Murrinhpatha is a non-Pama-Nyungan polysynthetic Australian language of
the Daly River region of the Northern Territory. It has maintained a vibrant
speech community some eighty years after its speakers shifted to settled life under
the influence of Catholic missionaries (Pye 1972). Murrinhpatha has some of the
characteristics, both linguistic and social, that might associate it with the ‘isolated,
complex’ language type proposed in sociolinguistic typology (Kusters 2003;
Lupyan & Dale 2010; Trudgill 2011: 136; Bentz et al. 2015). However it is doubtful
that notions of sociolinguistic ‘isolation’ or ‘low-contact’ apply in this instance,
since evidence points to a tradition of regional multilingualism (Falkenberg 1962:
13; Dixon 2002: 674). A crucial distinction for sociolinguistic typology is that
between child-acquired versus adult L2-acquired multilingualism: child multilin-
gualism has been argued to maintain or increase complexity, and adult acquisition
to reduce complexity (Thomason & Kaufman 1988: 65ff; McWhorter 2007 and
Chapter 10, this volume; Trudgill 2011: 34). In the case of Murrinhpatha, we know
too little of traditional multilingualism to know which is more applicable. However
in the post-settlement era (1930s–present) a large number of people from Marri
Ngarr, Marri Tjevin, and other language groups have shifted to Murrinhpatha, in
some cases learning both languages as children but switching to Murrinhpatha
during adolescent years spent in a multi-ethnic school dormitory established by
the missionaries (Mansfield 2014: 98). This influx of new speakers has not brought
about any drastic simplifications or other language contact effects in the contem-
porary grammar of Murrinhpatha, although it has led to the demise of the other
languages of the region.² In this chapter, we demonstrate more specifically that

² Note however that the influx of speakers from other language groups may have had some influence
on the distribution of sociolinguistic variables (Mansfield 2015a, 2015b: 183).

inflectional changes observed in the post-settlement period do not constitute
simplifications, given a definition of complexity as allomorphic unpredictability, and a
model of allomorph prediction based on analogical comparison with other lexemes.
Given the ambiguity of Murrinhpatha with respect to sociolinguistic typological
hypotheses, we do not here pursue the question of whether inflectional complexity
depends on social characteristics of the speech community.
The structure of the chapter is as follows. In section 3.2 we outline the
phenomenon of lexically specified inflectional allomorphy, which is the specific
type of morphological complexity discussed in this chapter. In section 3.3 we
discuss hypothesized limits to this type of complexity when applied to large
lexical classes. In section 3.4 we provide an overview of the Murrinhpatha verb
and introduce the relevant aspects of Murrinhpatha verb inflection, which
involves exponence by multiple phonological increments which we label ‘inter-
secting formatives’ (cf. ‘paradigmatic layers’ in Parker & Sims, Chapter 2, this
volume). Intersecting formatives are independent of one another in their para-
digmatic patterns, and most of these patterns are not consistently applied to all
verb stems, making exponence highly unpredictable. This also means that the
formatives are generally not in biunique relations with inflectional categories.
Section 3.4 describes the paradigms as documented in the 1970s (Walsh 1976;
Street 1987), as well as changes to the paradigms observed in our work with a
new generation of speakers since 2010. In section 3.5 we compare the observed
changes with the types of changes predicted by a model of complexity limitation
in large lexical classes (Ackerman & Malouf 2015), showing that none of the
observed changes match the model. In section 3.6 we focus on two of the
observed changes in particular, arguing that they diverge from the complexity-
limitation mechanism because of incremental demorphologization, a process
that is both analogical and destructive of existing analogies. In section 3.7 we
summarize our findings.

3.2 Complexity in lexically specified allomorphy

There are several distinct dimensions of morphology that can be treated as forms
of linguistic complexity (Kusters 2003, 2008; Anderson 2015a), but in this chapter
we focus solely on (lexically specified) inflectional allomorphy. For example, in the
Australian language Warlpiri verbs are suffixed with one of four lexically specified
past tense allomorphs, -ca, -ŋu, -ɳu, -nu (Hale 1969; Nash 1980: 40). Where
lexemes share the same allomorph selection in all their forms, the shared para-
digms are usually referred to as ‘inflection classes’. Inflectional allomorphy of this
type can be seen as prototypical morphological complexity, since it directly
reduces form:meaning transparency (Aronoff 1998).

The type of complexity instantiated by inflectional allomorphy can be
conceptualized in terms of degrees of predictability in allomorph selection. For
example, in a language where almost all verbs take -ak 1., and just a handful
instead take -iq 1., the exponence is mostly predictable; only a small degree of
complexity is involved. But in a language with several lexically-conditioned allo-
morphs, all more or less likely, there is low predictability, or high complexity. The
larger the inflectional paradigm involved, the more that this problem of prediction
becomes a real one for speakers of the language (or indeed linguists attempting to
accurately document the lexicon and morphology), because where large paradigms
are involved there is a more frequent and persistent requirement to produce
previously unencountered forms (Bonami & Beniamine 2016; Blevins et al. 2017).
Degrees of inflectional predictability can be formalized and quantified using
entropy, the negative of the probability-weighted average of the log probabilities
of all possible outcomes (Shannon 1948). Entropy can be taken as a measure of the unpredictability of a
set of possible outcomes. The application of entropy as a measure of paradigmatic
implicational structure was proposed by Ackerman et al. (2009).
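
The contrast between the two hypothetical languages above can be made concrete with a minimal sketch. The proportions used here (95% of verbs taking -ak, 5% taking -iq) are illustrative assumptions of ours, not figures from any language, and the function name `entropy` is likewise only a label for this sketch.

```python
import math

def entropy(probs):
    """Shannon entropy in bits: H = -sum(p * log2(p)) over outcomes with p > 0."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Hypothetical proportions (an assumption for illustration only):
# almost all verbs take -ak, a handful take -iq.
mostly_predictable = entropy([0.95, 0.05])

# Three lexically conditioned allomorphs, all equally likely.
unpredictable = entropy([1 / 3, 1 / 3, 1 / 3])

print(f"{mostly_predictable:.2f} bits")  # 0.29 bits
print(f"{unpredictable:.2f} bits")       # 1.58 bits
```

On these assumed proportions the skewed distribution carries well under a third of a bit of uncertainty, whereas three equally likely allomorphs carry log2(3) bits.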
Work on predictability of allomorphy has proceeded from the insight that the
inflection of a lexeme is not predicted in an informational vacuum, but rather is a
problem of predicting unknown inflectional forms, given one or more forms of the
lexeme that have already been encountered. This has been labelled the ‘Paradigm
Cell Filling Problem’ (Ackerman et al. 2009; Stump & Finkel 2013; Bonami &
Beniamine 2016; Sims & Parker 2016). The paradigmatic structure of inflection is
thus crucial: typically, we expect that paradigmatic patterns are shared by lexemes
in a language, with those lexemes that share a paradigm belonging to a common
inflectional class. The known inflectional forms of a lexeme narrow the possibil-
ities of which class the lexeme might belong to, thus reducing unpredictability of
other forms. For example, the past tense suffix allomorphs mentioned above for
Warlpiri can usually be predicted based on other inflectional forms. All verbs with
imperative in -nta take the past allomorph -nu, licensing an inference from known
form jinta ‘scold.IMP’ to the predicted form jinu ‘scold.PST’ (Nash 1980: 40).
However there are other instances where allomorphy for a particular tense/aspect/
mood (TAM) category does not uniquely identify an inflection class, leaving some
unpredictability in the allomorphy of other forms. Table 3.1 shows the TAM
allomorphs for all Warlpiri inflection classes. The syncretism between some
classes for some tense categories makes these inflectional forms less than fully
predictive of other inflectional forms of the same lexeme. For example, knowing
that the presentational form is in -ɳiɲa narrows the range of possible imperative
allomorphs, but does not help us to decide between the two possibilities -ka, -ɲa.
Residual uncertainty in predicting an inflectional form, given knowledge of other
forms of the same lexeme, has been labelled integrative complexity (Ackerman &
Malouf 2013).

Table 3.1. Warlpiri verb inflection classes (Hale 1969; Nash 1980: 40)

Class   Non-past    Past   Imperative   Imm. future   Presentational
I       -mi         -ca    -ja ~ -ka    -ju           -ɲa
II      -ɳi ~ -ni   -ɳu    -ka          -ku           -ɳiɲa
III     -ɲi         -ŋu    -ŋka         -ŋku          -ŋaɲa
IV      -ɳi ~ -ni   -ɳu    -ɲa          -lku          -ɳiɲa
V       -ni         -nu    -nta         -nku          -naɲa

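
The residual uncertainty described here can be sketched as a conditional entropy over the Warlpiri classes. The allomorphs are copied from Table 3.1, but two simplifications are assumptions of this sketch rather than part of the source: every class is weighted equally (the table gives no type frequencies), and a variable cell such as -ja ~ -ka is treated as a single opaque label.

```python
import math
from collections import Counter, defaultdict

# Past, imperative, and presentational allomorphs from Table 3.1.
WARLPIRI = {
    "I":   {"past": "-ca", "imperative": "-ja ~ -ka", "presentational": "-ɲa"},
    "II":  {"past": "-ɳu", "imperative": "-ka",       "presentational": "-ɳiɲa"},
    "III": {"past": "-ŋu", "imperative": "-ŋka",      "presentational": "-ŋaɲa"},
    "IV":  {"past": "-ɳu", "imperative": "-ɲa",       "presentational": "-ɳiɲa"},
    "V":   {"past": "-nu", "imperative": "-nta",      "presentational": "-naɲa"},
}

def conditional_entropy(classes, known, target):
    """H(target | known) in bits, weighting every class equally (an assumption)."""
    # Group the target allomorphs by the allomorph observed in the known cell.
    groups = defaultdict(Counter)
    for cells in classes.values():
        groups[cells[known]][cells[target]] += 1
    total = len(classes)
    h = 0.0
    for counts in groups.values():
        n = sum(counts.values())
        h_group = -sum((c / n) * math.log2(c / n) for c in counts.values())
        h += (n / total) * h_group
    return h

imp_given_pres = conditional_entropy(WARLPIRI, "presentational", "imperative")
past_given_imp = conditional_entropy(WARLPIRI, "imperative", "past")
print(f"H(imperative | presentational) = {imp_given_pres:.2f} bits")  # 0.40
print(f"H(past | imperative) = {past_given_imp:.2f} bits")            # 0.00
```

Under equal weighting, the only uncertainty arises in the two classes that share presentational -ɳiɲa, where the imperative is a one-bit choice; a known imperative, by contrast, fully determines the past form.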
Integrative complexity meets several of the desiderata enumerated in Arkadiev
& Gardani’s Introduction to this volume (Chapter 1). First, it is quantifiable and
can be used to compare typologically diverse languages. Second, its conceptual-
ization in terms of speaker inferences from known to unknown forms gives it a
clear basis in psycholinguistic processing. Finally, whereas enumerative complex-
ities lean heavily on the distinction between morphology and syntax, integrative
complexity is relatively independent of this issue. Lexical selection of allomorphs
generally occurs within units that are identified as words, but if a similar phe-
nomenon occurred in phrase-like structures (e.g., periphrastic inflections with
allomorphy on the auxiliary), this would have no real effect on the modelling of
integrative complexity in the paradigm.

3.3 Complexity, predictability, and language change

In this chapter, we focus on the effects that language change may have on
inflectional predictability. It has been shown that inflection class structure may
persist in a language over long time periods (e.g., Maiden 2005; Gardani 2013), but
even if it may in some instances be relatively stable, it is of course not completely
static. The inflectional allomorphs selected by lexemes exhibit synchronic vari-
ation, with fluctuating variation rates over time leading to language change
(Weinreich et al. 1968). The long-term patterns of changing allomorph selection
have been studied in historically documented languages such as Latin (Gardani
2013: 201–28) and English (Jespersen 1949; Bybee & Moder 1983). An interesting
question is whether the direction of such change reflects limits on overall com-
plexity and, conversely, what mechanisms lead to an increase in complexity.
There must be some upper limit of unpredictability at which inflectional
systems remain learnable. If allomorphic distributions were too unpredictable,
their prospects of being stably transmitted from one generation to the next would
become rather slim. The obvious way to reduce unpredictability is to replace
improbable allomorphs with more probable ones. We have little idea of how
much unpredictability is too much, though crosslinguistic studies by Ackerman
& Malouf (2013, 2015) and Stump & Finkel (2013) have documented the range
of unpredictability found in genetically and typologically diverse samples.

Ackerman & Malouf (2013) compare synchronic inflectional systems in ten
languages, showing that in all cases the average conditional entropy of one
inflectional form of a lexeme, given knowledge of one other form, is between
zero and 1.1 bits, the latter being approximately equivalent to a choice between
two equally likely outcomes. Moreover, the languages in the sample that have the
most allomorphs, and therefore risk the greatest unpredictability, are also the
languages that make the most use of paradigmatic structure to mitigate unpre-
dictability (Ackerman & Malouf 2013: 443). In other words, paradigmatic struc-
ture of the type illustrated for Warlpiri above exhibits a strong crosslinguistic
tendency to maintain a reasonable level of predictability for unknown inflectional
forms. While Ackerman & Malouf (2013) do not propose a specific numeric limit
for how much integrative complexity learners can deal with, their study provides a
principled method of quantification, and an initial sample of measurements,
against which apparently complex languages such as Murrinhpatha can be
compared.
A simulation of how language change might reduce unpredictability
(Ackerman & Malouf 2015) provides a useful model for considering the mech-
anism of analogical extension. Ackerman & Malouf (2015) model diachronic
change in an inflectional system based on the principle that, given a known
inflectional form of a lexeme, and the requirement to predict an unknown form
of the same lexeme, a speaker identifies lexemes that share allomorphs with the
known form. Change proceeds by revising paradigm-internal relations to match
the same morphosyntactic relations in other paradigms. We will henceforth use
the terms ‘source form’ for the known form, ‘target form’ for the unknown form,
‘comparable lexemes’ for other lexemes that share allomorphy with the source
form, and ‘comparable source, comparable target’ for the comparable forms that
correspond in morphosyntactic category with the source and target forms respect-
ively. Given the array of comparable lexemes, the speaker establishes which
allomorph occurs most frequently among the comparable targets, and predicts
this to be the allomorph for the target form. Predictions of this type are taken as a
model for language change, because in the next iteration of the simulation, it is the
predicted form that is now taken as the allomorph for the target cell, rather than
the previous incumbent form. This is a hyperactive model of change, where
overgeneralization errors go uncorrected. The model is not specific to either
child acquisition or adult usage, which in any case may not be a sharp distinction
in large inflectional systems, where some inflected forms must be guessed by
speakers even after many millions of words of input (Blevins et al. 2017).
In Figure 3.1 we show the process of analogical induction, and replacement of
the target form with a predicted form. A, B, etc., represent lexemes, with inflec-
tional categories Ai, Aii, Bi, Bii, etc., while x, y represent exponence candidates. Ai is
the source form and Aii is the target form. B, C, D are comparable lexemes
(sharing exponence with source form), while E, F are disregarded since they do
not share exponence with the source form. The comparable lexemes analogically
present both y₁ and y₂ as exponence candidates for the target form, but y₂ wins out
because it occurs more frequently in this distribution. If Aii is used as a source
form or comparable lexeme form in a subsequent iteration, it will have the
exponence y₂.

Aii = unknown
Ai = x₁    compare    Bi = x₁, Ci = x₁, Di = x₁, Ei = x₂, Fi = x₂
relate
Aii = y₂   induce     Bii = y₁, Cii = y₂, Dii = y₂

Figure 3.1. Ackerman & Malouf (2015) mechanism for predicting unknown
inflectional forms

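
The prediction step illustrated in Figure 3.1 can be sketched as a majority vote among comparable lexemes. This is a minimal illustration rather than Ackerman & Malouf's implementation: the function name `predict_target`, the tie-breaking behaviour (the first-encountered candidate wins), and the target values given to E and F (which the figure does not specify) are all assumptions.

```python
from collections import Counter

def predict_target(lexicon, lexeme, source_cell, target_cell):
    """Return the most frequent target exponent among comparable lexemes.

    Comparable lexemes are those sharing the source-cell exponent with
    `lexeme`. Returns None if there are none. Ties go to the candidate
    encountered first (an assumption of this sketch).
    """
    source_exponent = lexicon[lexeme][source_cell]
    candidates = Counter(
        cells[target_cell]
        for name, cells in lexicon.items()
        if name != lexeme
        and cells.get(source_cell) == source_exponent
        and target_cell in cells
    )
    if not candidates:
        return None
    return candidates.most_common(1)[0][0]

lexicon = {
    "A": {"i": "x1"},               # target cell ii is unknown
    "B": {"i": "x1", "ii": "y1"},
    "C": {"i": "x1", "ii": "y2"},
    "D": {"i": "x1", "ii": "y2"},
    "E": {"i": "x2", "ii": "y1"},   # ii values for E and F are hypothetical
    "F": {"i": "x2", "ii": "y1"},
}

# B, C, D are comparable (they share x1 with A); E and F are disregarded.
lexicon["A"]["ii"] = predict_target(lexicon, "A", "i", "ii")
print(lexicon["A"]["ii"])  # y2
```

Because the predicted exponent is written back into the lexicon, A will present y2 when it later serves as a source or comparable lexeme, which is what makes the mechanism a model of change rather than of prediction alone.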
Ackerman & Malouf (2015) computationally simulate this model of inflectional
change based on a ‘highly unrealistic language’ in which allomorphy is almost
completely unpredictable in the initial state. The simulation language has a
hundred lexemes, each of which inflects for eight morphosyntactic categories,
giving a total of 800 forms in the system. Each morphosyntactic category has
three allomorphs, which are randomly assigned to each lexeme. Thus there are
3⁸ = 6,561 possible inflectional paradigms, so that most of the hundred lexemes
have an idiosyncratic paradigm, that is, not shared with any other lexeme. In this
initial state, there are no inflectional classes. As the simulation iterates, replace-
ment of unknown allomorphs with the most predictable allomorph leads to
massive convergence of lexemes towards shared inflectional paradigms. The
simulation ends when allomorphy stabilizes (i.e., the unknown form already is
the most predictable form) for twenty-five consecutive iterations. Given hundreds
of trials of the simulation, in a large proportion of simulations (no exact figure is
given), all lexemes converge on a single set of allomorphs (i.e., no allomorphy),
creating a single inflectional paradigm. In the remaining simulations, lexemes
converge on between two and eighty-eight inflectional classes, the median number
being twelve (Ackerman & Malouf 2015: 8).
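
The overall shape of such a simulation can be sketched as follows. This is a loose reconstruction under stated assumptions, not the published model: the random choice of lexeme and of source and target cells at each step, the tie-breaking rule, the iteration cap `max_iters`, and the seed are ours, and the sketch is not expected to reproduce the published class counts.

```python
import random
from collections import Counter

def simulate(n_lexemes=100, n_cells=8, n_allomorphs=3,
             stable_run=25, max_iters=50_000, seed=0):
    """Iterate majority-vote replacement; return (initial, final) paradigm counts."""
    rng = random.Random(seed)
    # Initial state: allomorphs assigned at random, so almost no shared paradigms.
    lexicon = [[rng.randrange(n_allomorphs) for _ in range(n_cells)]
               for _ in range(n_lexemes)]
    initial_paradigms = len({tuple(p) for p in lexicon})

    unchanged = 0
    for _ in range(max_iters):
        lex = rng.randrange(n_lexemes)
        source, target = rng.sample(range(n_cells), 2)
        # Vote among lexemes sharing the source-cell allomorph.
        votes = Counter(
            other[target]
            for idx, other in enumerate(lexicon)
            if idx != lex and other[source] == lexicon[lex][source]
        )
        predicted = votes.most_common(1)[0][0] if votes else lexicon[lex][target]
        if predicted == lexicon[lex][target]:
            unchanged += 1
            if unchanged >= stable_run:   # stable for 25 consecutive iterations
                break
        else:
            lexicon[lex][target] = predicted  # overgeneralization goes uncorrected
            unchanged = 0

    final_paradigms = len({tuple(p) for p in lexicon})
    return initial_paradigms, final_paradigms

initial, final = simulate()
print(f"possible paradigms: {3 ** 8}")
print(f"distinct paradigms: {initial} initially, {final} at the end of the run")
```

One would expect the final count to fall below the initial one as lexemes converge on shared paradigms, but how far it falls depends on the assumptions listed above.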
In terms of inflectional predictability, the initial random distribution of allo-
morphs [x₁, x₂, x₃] for each inflected form means that knowledge of other inflected
forms does not offer any reduction to uncertainty (except by occasional accident
of the distribution), and conditional entropy is therefore only marginally less than
unconditional entropy, that is, H(a, b, c) = 1.58 bits. But the replacement by most
predictable allomorph mechanism in the simulated language change reduces this
entropy to 0 bits in the instances where all lexemes converge on a single paradigm,
and an average of 0.64 bits in the instances where the simulation converges on a
set of inflectional classes (Ackerman & Malouf 2015: 9). The average conditional
entropy found in these simulated inflectional systems sits neatly within the range of
0–1.1 bits found in the study of natural languages (Ackerman & Malouf 2013). This
provides support for the notion that the model’s simplification mechanism may
have something in common with mechanisms deployed in natural language.
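
The two end-points of this comparison can be checked with a short sketch: a randomly initialized system, where average conditional entropy should sit just below log2(3), and a system with a single paradigm, where it is zero. Averaging over all ordered pairs of cells, with every lexeme weighted equally, is our reading of the measure; the seed and the function names are assumptions of the sketch.

```python
import math
import random
from collections import Counter, defaultdict

def conditional_entropy(lexicon, known, target):
    """H(target cell | known cell) in bits, each lexeme weighted equally."""
    groups = defaultdict(Counter)
    for cells in lexicon:
        groups[cells[known]][cells[target]] += 1
    total = len(lexicon)
    h = 0.0
    for counts in groups.values():
        n = sum(counts.values())
        h -= (n / total) * sum((c / n) * math.log2(c / n) for c in counts.values())
    return h

def average_conditional_entropy(lexicon):
    """Mean of H(target | known) over all ordered pairs of distinct cells."""
    n_cells = len(lexicon[0])
    pairs = [(k, t) for k in range(n_cells) for t in range(n_cells) if k != t]
    return sum(conditional_entropy(lexicon, k, t) for k, t in pairs) / len(pairs)

rng = random.Random(0)  # arbitrary seed
random_system = [[rng.randrange(3) for _ in range(8)] for _ in range(100)]
single_paradigm = [[0] * 8 for _ in range(100)]

ceiling = math.log2(3)  # unconditional entropy of three equiprobable allomorphs
avg_random = average_conditional_entropy(random_system)
avg_single = average_conditional_entropy(single_paradigm)

print(f"ceiling: {ceiling:.2f} bits")
print(f"random system: {avg_random:.2f} bits")
print(f"single paradigm: {avg_single:.2f} bits")
```

The random system falls slightly short of the ceiling only because a finite sample of a hundred lexemes produces accidental associations between cells, which corresponds to the ‘occasional accident of the distribution’ noted above.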
One issue that has been insufficiently addressed in work on integrative com-
plexity is the question of open versus closed lexical classes. The Ackerman &
Malouf (2015) simulation works with a set of a hundred lexemes, that is to say a
finite set, and therefore a closed class. The basic formulation of the Paradigm Cell
Filling Problem (PCFP; Ackerman et al. 2009) presumes that unknown inflec-
tional forms must be predicted by a speaker, but also that the correct inflectional
exponence is in some way defined—perhaps by a dictionary, or a more erudite
speaker. Now, if we take ‘open class’ to mean a lexical class to which entirely new
words can be added, then there must be a point at which inflectional forms of
these words are not pre-defined, and there is no correct or incorrect selection of
exponence. In other words, for truly open-class lexemes, the PCFP is undefined. In
the next section, we will see that Murrinhpatha classifier stems are a closed class,
with rather fewer members than may be intended in the original PCFP formula-
tion. However, we argue that the model is still relevant, as Murrinhpatha speakers
are not born with complete knowledge of the classifier stem paradigms, and must
therefore use predictive mechanisms to extrapolate from known to unknown
forms.

3.4 Unpredictable exponence in Murrinhpatha classifier stems

Murrinhpatha is a polysynthetic language with complex verbal structures includ-
ing agreement morphology, nominal incorporation, adverbial modifiers, and
complex predicates (Nordlinger 2017). Verbs are built on a finite stem element
known as a ‘classifier stem’,³ which may either form a complete verb on its own
or form the basis for a complex predicate. Classifier stems encode predicate
semantics, subject person and number, and tense/aspect/mood marking. All
Murrinhpatha verbs require a classifier stem in first position (bolded in the
examples below). There are thirty-nine classifiers, each of which appears in
forty-two inflected forms, thus giving a total of 1,638 inflected forms. Eleven of
the thirty-nine classifiers can form a verb on their own (1), the remaining twenty-
eight are only ever found in combination with a second, uninflecting stem element
later in the verbal word (underlined in the examples below) with which they
jointly determine the predicate semantics (2)–(5). The only allomorphy in the
verb is in the classifier stem element—all other elements have a single exponence,
subject only to phonologically motivated alternations. For more discussion of the

³ In other work these have been called ‘auxiliaries’ (Walsh 1976), ‘classifier-subject pronominals’
(Nordlinger 2011), and ‘finite verbs’ (Mansfield 2016).

details of the system the reader is referred to Blythe (2009), Nordlinger (2011,
2015), and Mansfield (2016, 2019) among others.⁴

(1) wuɾan
3S.(6).
‘She goes.’

(2) muŋam-paɭ
3S.(11).-break
‘She broke it off.’

(3) pam-ŋin̪t̪a-nu-ma-ɻaʈal
3S.:(24).-.---tear
‘They (two female non-siblings) tore the (cloth) from each other.’
(RN-20070531-002:011)

(4) piɾim-nin̪t̪a-nu-bu-wuj-waɖa-ya
3S.(3).-.--thigh-put.into--
‘They put them in their pockets.’ (JB 43JBc743652_747130)

(5) puddan-wunku-ɭaɭ-dejida-ŋime=pumpan-ka
3S.(29).-3O-drop-in.turn-.=3S.(6).-
‘They (dual, sibling) are dropping them (paucal, female, non-sibling) off,
one after the other, as they go along.’ (Blythe 2009: 134)

For most classifier stems the exponence pattern making up the paradigm of forty-
two inflected forms is unique to that stem. Thus the concept of ‘inflectional
classes’—a set of exponence paradigms shared by many lexemes—is not directly
applicable to Murrinhpatha. (1)–(5) show classifier stems as unsegmented wholes,
and this has been the representation used in most work on Murrinhpatha.
However there are semi-regular subcomponents evident in these stems, and it is
these that we treat as exponents of inflectional categories. These are not productive
morphs that are applicable to new lexemes in an open class; however, they do
constitute morphology in the sense of form:meaning associations between
systematically related forms (Anderson 2015b).

⁴ In the Appendix we have provided paradigms for five classifier stems, to exemplify the complexity
amongst them. Previous descriptions of the Murrinhpatha verbal system (e.g., Blythe et al. 2007;
Nordlinger 2011, 2015) have tended to treat these classifier stem paradigms as consisting of synchron-
ically unanalysable portmanteau forms, due to the substantial amounts of unpredictability and
suppletion within the paradigms. The full set of thirty-nine paradigms as analysed in this chapter is
available at http://langwidj.org/Murrinhpatha-inflection.

We assume that Murrinhpatha speakers to some extent store classifier stems as
whole forms, rather than composing them online from the elements of exponence
(cf. Mithun, Chapter 12, this volume). However this does not mean that inflec-
tional exponence has no role in acquisition or processing. We do not know how
much input is required for a Murrinhpatha speaker to encounter all 1,638 forms
enough times that they can all be memorized, but the available evidence on corpus
distribution of inflected forms suggests that many years of input are required to
offer complete coverage of large paradigms (Blevins et al. 2017). As with other
inflectional systems, when Murrinhpatha speakers parse or produce forms that
they have not yet encountered, the recurrent patterns of exponence offer predict-
ive clues. Indeed, the evidence of analogical change presented in this chapter
shows that classifier stem forms are not acquired and stored as isolated forms:
speakers draw on the exponence of one classifier to produce the exponence of
another. Research on child acquisition of Murrinhpatha verb inflection also shows
that children make occasional errors in allomorphy selection, revealing morpho-
logical structure in the acquisition of classifier stems (Forshaw 2016). Therefore,
both the PCFP (Ackerman et al. 2009) and the Ackerman & Malouf (2015)
simplification mechanism are relevant to Murrinhpatha. The fact that
Murrinhpatha classifier stems constitute a closed class does not disqualify them
from applicability of these models, since, as we observed above, the PCFP is only
strictly defined for a closed class.

3.4.1 Intersecting formatives and unpredictable allomorphy

Inflectional allomorphy in Murrinhpatha classifier stem paradigms is both highly
complex and typologically unusual, meaning that a detailed exposition is beyond
the scope of this chapter.⁵ Murrinhpatha’s thirty-nine classifiers each appear in
forty-two inflected forms (and never in non-finite form). The ‘inner stems’ upon
which these forms are built are highly mutable, creating much of the complexity in
the system (cf. Parker & Sims, Chapter 2, this volume). Table 3.2 illustrates some
sample classifier forms,⁶ representing two classifiers (la ‘(26)’, ma ‘(34)’)
that have fairly clear phonological stems, one classifier (ɾu ‘(6)’) that has a
highly mutable stem, and one classifier (i ‘(1)’) that has a vowel-only stem,

⁵ Fuller description is available in Mansfield (2016, 2019), drawing on earlier partial analyses (Walsh
1976: 224; Green 2003; Forshaw 2016: 37). As shown in the examples above, there is also further
inflectional morphology in the verb that is not part of the classifier stem paradigms, and can be applied
equally to verbs based on any classifier stem (Nordlinger 2015, 2017). This morphology has no bearing
on the issues discussed in this chapter and will therefore not feature in our remaining discussion.
⁶ The full paradigms are provided in the Appendix.

Table 3.2. Examples of inflected classifier forms

             3.      3.      3.
la ‘(26)’    kila    dilam   pilla
ma ‘(34)’    ma      mam     pume
ɾu ‘(6)’     kuɾu    wuɾan   puɳi
i ‘(1)’      ki      dim     piɾini

which only surfaces phonologically when it syllabifies with a consonantal stem
alternation, and otherwise results in a phonologically empty stem.⁷
The challenge for segmentation and analysis of classifier forms lies in the fact
that each form combines several independent dimensions of allomorphy. Each
inflected classifier form selects a prefix consonant allomorph, an orthogonally
distributed prefix vowel allomorph, and an orthogonally distributed suffix allo-
morph (any of which can be zero). We label this combination of orthogonal
allomorphs inflection by intersecting formatives (Mansfield 2016). Intersecting
formatives appear to be a recurrent feature of highly complex verbal inflection
systems, such as Mazatec (Ackerman & Malouf 2013), Greek (Sims 2015: 143ff),
Saami (Feist 2015: 140ff), and Seri (Baerman 2016). Intersecting inflectional
formatives are given an explicit formulation in Network Morphology, where
they are represented as multiple inheritance of inflection class nodes (Brown &
Hippisley 2012: 71ff), and there is further discussion of the phenomenon with
respect to complexity in Parker & Sims (Chapter 2, this volume), where intersec-
tional inflection is labelled ‘paradigmatic layers’. Intersectional inflection often
combines concatenative and supra-segmental morphology, and this is also the
case in the Murrinhpatha verb forms.
The Murrinhpatha classifier stem is built on a phonologically minimal ‘inner
stem’ of the shape (C)(C)V, which alternates in three orthogonal dimensions:
stem consonant mutation, vowel height, and vowel frontness. Each inflected form
of a classifier stem is therefore determined by six dimensions of intersecting
allomorphy: PrefC, PrefV, StemC, StemVH, StemVF, and Suffix. Table 3.3 illus-
trates the intersecting formative analysis of the forms shown above in Table 3.2.
Formatives exhibit ‘semi-regularities’ that appear in some but not all exponents of
a morphosyntactic cell, for example, PrefC k- in 3., Suffix -m in 3..
Other (semi-)regularities attach to particular classifier stems, for example PrefV i- in

⁷ The description of ‘vowel-only stems’ is somewhat different from Mansﬁeld (2016), where they are
simply labelled ‘phonologically empty stems’. The analysis there nonetheless depends on underlying
‘theme vowels’ in such stems, though this is not explicitly discussed. An alternative analysis would
propose a zero theme vowel, to avoid the use of unrealized underlying vowels. We have experimented
with calculation of Murrinhpatha integrative complexity using both analyses, and found that the
difference is very small (< 1%). The unrealized vowel alternative produces slightly lower complexity
measurements, and we therefore select this option to keep our complexity measurements conservative.

Table 3.3. Examples of classiﬁer forms and their formative analyses

3. 3. 3.


. - . - [, , ] - 
la ‘(26)’ kila dilam pilla
k-i-la[]-∅ d-i-la[]-m p-i-lla[:]-∅
ma ‘(34)’ ma mam pume
∅-u-ma[]-∅* ∅-u-ma[]-m p-u-me[:]-∅
ɾu ‘(6)’ kuɾu wuɾan puɳi
k-u-ɾu[]-∅ w-u-ɾa[:]-n p-u-ø[:]-ɳi
i ‘(1)’ ki dim piɾini
k-i-∅[]-∅ d-i-∅[]-m p-i-ɾi[:]-ni

Notes: * PrefV, like the stem vowel, does not surface unless it can syllabify with an onset consonant.
Thus we can analyse a PrefV u- formative in ø-u-ma-m 3.(34)., in keeping with this
classiﬁer’s overall paradigmatic pattern, though the surface form is mam.
 = default;  = geminate;  = ɾ-alternation.

(26) and (1), but PrefV u- in (34) and (6). Importantly, these patterns
are often orthogonal—for example, the PrefV selection is independent of the PrefC
selection in 3..
As shown in the full paradigm examples in the Appendix, the complete morpho-
syntactic paradigm of a classifier stem consists of forty-two inflectional forms.
Subjects are distinguished for 1/2/3 person, cross-cutting a three-way //
number distinction (although / is consistently collapsed in  tense, and
in all tenses for some paradigms).⁸ There is also a 1+2 ‘we inclusive’ person category,
which has no number distinctions. These are the core number/person categories of
Murrinhpatha, but more specific subcategories can be encoded using various pre-
dictable suffixes not discussed here (Nordlinger 2015). There are four basic tense/
modality categories (henceforth ‘tenses’): non-future (), irrealis (), past (),
and past irrealis (), as well as ‘subtense’ distinctions between  vs presen-
tational (), and  vs future indicative (), which apply only to third-person
forms. Again, these core categories can be further specified by predictable suffixes
encoding tense, modality, and aspect (Nordlinger & Caudal 2012). Table 3.4 illus-
trates a complete paradigm of inflected forms for one of the more regular classifiers,
na ‘(27)’, with both surface forms and intersecting formative analysis.
Some formatives in some cells have a consistent form (i.e., no allomorphy), such
as PrefC p- in 3.. More typical is a selection between a handful of formative
allomorphs, for example Suffix -m, -n, -ŋam, -ŋan in , or PrefV a-, e-, i-, u- for
all cells. A particularly wide selection of allomorphs is PrefC p-, w-, d-, n-, j-, k-,

⁸ The category here labelled  is used for both dual and paucal referents; it is labelled PAUCAL
(PC) in Mansﬁeld (2016) and DAUCAL in Blythe (2009).
Table 3.4. Inﬂectional exponence of na ‘(27)’

na ‘(27)’ INNER na 
STEMS: nna :
∅ :
NFUT (/PRSL) IRR (/FUT) PST PSTIRR

SG 1 ŋinaŋam SG 1 ŋina ŋinaŋa ŋinaŋi

ŋ-i-[]-ŋam ŋ-i-[]-∅ ŋ-i-[]-ŋa ŋ-i-[]-ŋi
2 t̪inaŋam 2 t̪ina t̪inaŋa t̪inaŋi
t̪-i-[]-ŋam t̪-i-[]-∅ t̪-i-[]-ŋa t̪-i-[]-ŋi
3 ninaŋam/ kinaŋam 3 nina/ kina niŋa niŋa
n-i-[]-ŋam k-i-[]-∅ n-i-[]-ŋa n-i-[]-ŋi
/ k-i-[]-ŋam / p-i-[]-∅
INCL 1+2 t̪inaŋam INCL pina t̪inaŋa t̪inaŋi
t̪-i-[]-ŋam 1+2 p-i-[]-∅ t̪-i-[]-ŋa t̪-i-[]-ŋi
PL/DU 1 ŋinnaŋam PL 1 ŋinna ŋinna ŋinnaŋi
ŋ-i-[]-ŋam ŋ-i-[]-∅ ŋaŋ-i-[]-ŋa ŋ-i-[]-ŋi
2 ninnaŋam 2 ninna ninnaŋa ninnaŋi
n-i-[]-ŋam n-i-[]-∅ n-i-[]-ŋa n-i-[]-ŋi
3 pinnaŋam / kinnaŋam 3 kinna / pinna pinnaŋa pinnaŋi
p-i-[]-ŋam k-i-[]-∅ p-i-[]-ŋa p-i-[]-ŋi
/ k-i-[]-ŋam / p-i-[]-∅
DU 1 ŋinna ŋinnaŋa ŋinnaŋi
ŋ-i-[]-∅ ŋ-i-[]-ŋa ŋ-i-[]-ŋi
2 ninna ninnaŋa ninnaŋi
n-i-[]-∅ n-i-[]-ŋa n-i-[]-ŋi
3 kinna / pinna pinnaŋa pinnaŋi
k-i-[]-∅ p-i-[]-ŋa p-i-[]-ŋi
/ pi-[]-∅

ø- 3., and StemC allomorphy also has a large selection of allomorphs, once
we take into account various suppletive (i.e., altogether unpatterned) consonant
alternations.
From the point of view of integrative complexity, that is, the predictability of an
inflected form given knowledge of some other form, the formatives individually
have an intermediate degree of predictability. In certain dimensions there is very
high predictability: for example, if one  form takes Suffix -ŋam, there is a very
high likelihood (though not quite categorical) that any other  form of the
same verb will take Suffix -ŋam. This is illustrated in the consistent tense pattern-
ing of Suffix allomorphs in Table 3.4. Among cells that have the same tense and
number categories but differ for 1/2/3 person, the only difference of exponence is
usually PrefC; these triplets of cells are therefore tightly integrated in terms of
implicational structure. However, when we consider the implicative relationship
between cells from different tenses, we find that, say, knowing  -ŋam
provides little information about the Suffix allomorph for  cells. Allomorph
selection across tenses is strongly orthogonal. Other formatives have generally
high degrees of integrative complexity, that is to say, inconsistent paradigmatic
patterning. This is especially true of the stem formatives StemC, StemVH, and
StemVF, and also to some extent of PrefV.
The problem of predicting an unknown inflected form of a Murrinhpatha
classifier stem therefore involves predicting allomorph selection for six intersect-
ing formatives, based on knowledge of such an intersection for some other form
of the classifier stem. Some formatives provide good chances of correct prediction,
while others are rather less helpful. This situation is not as extreme as the
completely random paradigmatic distribution of allomorphs in Ackerman &
Malouf’s (2015) ‘unrealistic language’, though the presence of six different
dimensions of allomorphy in Murrinhpatha nonetheless leads to a high degree of
complexity, since the unpredictability of the allomorphs is compounded.
Because Murrinhpatha classifier stems often have idiosyncratic exponents, that
is, allomorphs not shared by any other classifier stem, the entropy calculations
used in Ackerman & Malouf (2013) are not directly applicable. The latter’s
allomorphic entropy method assumes that all possible exponents have been
encountered in other lexemes, so that allomorphy prediction involves a distribu-
tion of possible outcomes. But in a system with idiosyncratic exponents, the
unknown target exponent may be one that has not previously been encountered
(cf. Dahl, Chapter 13, this volume). The speaker’s challenge is not one of entropy
in the distribution of previous observations, but of attempting to predict an
outcome that may or may not match any previous observation. Thus the math-
ematical analysis calculates chance of correct prediction (including zero chance
for a previously unencountered paradigmatic relation), rather than degrees of
entropy. Nonetheless, we can make a notional comparison of Murrinhpatha with
the crosslinguistic findings on entropy in Ackerman & Malouf (2013). The latter

finds average conditional entropy between 0 and 1.1 bits, and 1 bit of entropy
equates to a randomized prediction having 50% chance of matching the
outcome. Mansfield (2016) calculates that the average chance of correct prediction
from one Murrinhpatha classifier stem form to another is 43%, comparable to 1.22
bits of entropy.⁹ This is slightly outside the range of the Ackerman & Malouf
sample, suggesting that Murrinhpatha’s closed-class classifier stems have an
integrative complexity at the upper end of the scale found for open-class systems
in other languages. As far as we know, the only language that has been analysed as
having clearly higher integrative complexity is Seri (isolate, Mexico), which has
almost 2 bits average conditional entropy (Baerman 2016).
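The notional conversion between chance of correct prediction and bits of entropy used in this comparison can be sketched in Python as follows. The function names are assumptions introduced for illustration; the formula itself is the one given in footnote 9.

```python
import math


def chance_to_bits(p_correct: float) -> float:
    """Bits of entropy notionally equivalent to a chance of correct prediction."""
    return math.log2(1.0 / p_correct)


def bits_to_chance(bits: float) -> float:
    """Inverse conversion: chance of correct prediction for a given entropy."""
    return 2.0 ** -bits


print(round(chance_to_bits(0.50), 2))  # 1.0
print(round(chance_to_bits(0.43), 2))  # 1.22
print(round(bits_to_chance(2.0), 2))   # 0.25
```

On this conversion, an average conditional entropy of almost 2 bits would correspond to a chance of correct prediction of roughly 25%.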

3.4.2 Variation and change

With 39 × 42 = 1,638 inflectional cells to be learnt, and implicational relations
proving only moderately helpful in deducing unknown forms, it would be surprising
if all Murrinhpatha speakers selected the same allomorphs all the time.
The presence of allomorphic variation in Murrinhpatha classifier stem forms has
previously been explored only to the extent that some paradigm cells are docu-
mented with two or more variants, for example nuɻa ~ na 3S.(7). (Street
1987: 84). The 1,638 cells of the full classifier stem paradigms have been docu-
mented based on a limited set of spontaneous speech data, with gaps filled by
systematic elicitation of paradigms by multiple researchers over a number of years
of descriptive work. These collective findings are collated as Blythe et al. (2007), and
since then have been further revised and reanalysed in Mansfield (2019) although
many questions still remain. Understanding the extent of allomorphic variation, the
proportion in which variants are used, and any conditioning factors on the variation
requires much more data. Investigation of such variation in Murrinhpatha is still a
work in progress, but after forty years of intermittent research on this language,
there are now some inflectional variables for which we have enough corpus tokens
to begin proposing patterns of variation and implicit diachronic change.
For this study we have identified seven inflected forms with attested variation.
These are the complete set of forms that fulfil the following criteria:

(a) Variation attested in the corpora of adult speech recorded by Blythe,
Mansfield, Nordlinger, Street, & Walsh;¹⁰
(b) Allomorphic variants are attested with multiple corpus tokens for each
variant;

⁹ That is, log₂(1/0.43) = 1.22.

¹⁰ Much of this corpus material is stored in public archives at the Australian Institute of Aboriginal
and Torres Strait Islander Studies (Walsh), the Max Planck Institute Language Archive (Blythe), and
PARADISEC (Mansﬁeld, Nordlinger).

(c) The variation is morphological, rather than purely phonological. For
example, pujemam ~ pijemam 3S.(34). is purely phonological
variation based on assimilation of the vowel to the following glide, and is
therefore not an instance of lexically specified allomorphy.

None of the seven variables thus identified have enough corpus tokens to support
a rigorous variationist analysis. Nor is there sufficient data to permit differentiation
between contextual factors such as phrasal context, speech style, speaker gender, etc.
Rather, in this study we focus purely on the distribution of variants among
speakers born in the first half of the twentieth century (‘older speakers’) versus
those born in the second half (‘younger speakers’). This method allows us to detect
proportions suggestive of change in progress in inflectional variants, and thereby
to search for signs of the Ackerman & Malouf (2015) simplification mechanism in
effect. In fact, for all seven of the variables, there is a striking difference between
variant distributions among older and younger groups, with the younger moving
strongly towards the variant not attested in earlier documentation.¹¹ This is likely
not an accident: the fact that these seven inflected forms were noted as variable is
primarily because they stood out in Mansfield’s fieldwork as conflicting with
earlier grammatical descriptions of the language. On the other hand, though
speakers showed clear awareness of social indexicality in phonological and lexical
variation among the generations, they were unaware of the intergenerational
variations in inflectional morphology (Mansfield 2014: 469ff).
It has often been observed that less frequent inflectional forms are more suscep-
tible to analogical change in morphology, though frequent forms may also undergo
such changes (e.g., Fertig 2000: 125). Since our method for identifying changes in
Murrinhpatha depends on the salience of these changes in fieldwork, these can all be
said to occur in fairly frequent forms. We presume that further analogical changes
occur in less frequent forms, though we have not had the opportunity to observe
these, and the corpus data drawn upon for this study does not permit robust
estimates of inflectional form frequency.
Table 3.5 lists the seven observed variables, with variants preferred by older
and younger speakers respectively according to the corpus evidence. Note
that where regular triplets of 1/2/3 person inflections are all involved, these
are treated as a single variable in view of their tight mutual implications.
Token numbers in parentheses indicate the number of tokens found for the
older:newer variants among that speaker group. For example, for 1S.(34).,
older speakers were found to have five tokens of me and one token of ŋeme,

¹¹ Some of the sources for older speakers are written (e.g., Bible; Street 1987) and do not have
accompanying audio sources. It is possible that these sources underreport use of innovative variants, by
correcting them to what may have been seen as the ‘correct’ form. This may account for some of the
strength of the swing in proportions from older to younger speaker groups.

Table 3.5. Variably inflected classifier stem forms

Classifier, inflection   Older speakers (tokens)        Younger speakers (tokens)
ma, 1.(34).              me (5:1)                       ŋeme (0:9)
ma, 2S.(34).             nam (7:1)                      t̪amam (10:13)
ma, .(34).               ŋamam, namam, pamam (17:5)     ŋujemam etc (3:12)
ɾu, .(6).                ŋa, na, ka (3:0)               ŋu etc (0:3)
nu, .(7).                ŋunna, nunna, punna (10:1)     ŋunne etc (0:10)
ɾa, 3.(28).              paŋan (4:0)                    piɾim (1:5)
ɾi, 3.: (36).            pim (2:0)                      piɾim (0:5)

while younger speakers were found to have zero tokens of me and nine of
ŋeme.
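The older:newer token counts in Table 3.5 can be turned into the share of tokens that use the newer variant in each speaker group. The Python sketch below does this; the variable and function names are assumptions introduced for illustration, and the counts are copied from the table in row order.

```python
# Token counts from Table 3.5, rows in table order.
# Each entry: ((older variant, newer variant) among older speakers,
#              (older variant, newer variant) among younger speakers)
token_counts = [
    ((5, 1), (0, 9)),
    ((7, 1), (10, 13)),
    ((17, 5), (3, 12)),
    ((3, 0), (0, 3)),
    ((10, 1), (0, 10)),
    ((4, 0), (1, 5)),
    ((2, 0), (0, 5)),
]


def newer_share(older_tokens: int, newer_tokens: int) -> float:
    """Proportion of tokens that use the newer variant."""
    return newer_tokens / (older_tokens + newer_tokens)


for older_group, younger_group in token_counts:
    print(f"{newer_share(*older_group):.2f} -> {newer_share(*younger_group):.2f}")
```

In every row the share of the newer variant is higher among younger speakers than among older speakers. With token counts this small, the proportions are indicative only.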
Interestingly, one of the few forms earlier documented as being variable,
nuɻa ~ na 3.. (Street 1987: 84), showed only marginal variability in
the corpus data. There are dozens of attestations for na, and only one for nuɻa,
suggesting that the latter variant was already on its way out when Street
recorded it.

3.5 Predictability of changes observed in Murrinhpatha

In the last section we saw that Murrinhpatha classifier stems are a closed class in
which the inflectional paradigms are large, and implicational relations are highly
unpredictable. We also saw that allomorphy of exponence in this system is not
static, but rather encompasses some variable forms, which show signs of change
over the last couple of generations. Thus we are now in a position to investigate
whether the changes observed in Murrinhpatha decrease or increase the complex-
ity of the system. To test this, we ran the Ackerman & Malouf (2015) simplifica-
tion method (with adaptions as described above) on the relevant classifier forms,
identifying the most predicted allomorphs. We show that the observed change
does not replace an incumbent allomorph with the most predictable allomorph in
any of the seven inflected forms. We then go on to consider a weaker form of the
Ackerman & Malouf (2015) simplification mechanism: when speakers replace an
old allomorph with a new one, do they at least select one that is more predictable
than the previous? We find that, on the contrary, most of the changes observed in
Murrinhpatha select less predictable allomorphs, thus increasing the complexity of
the system.
The Ackerman & Malouf (2015) simplification mechanism was implemented
for Murrinhpatha classifier inflections using intersecting formatives to draw
independent analogies, since this method has been shown to provide the

Table 3.6. Allomorphs selected by Ackerman & Malouf (2015) simplification mechanism

Classifier, inflection   Older speakers   Younger speakers   Ackerman & Malouf (2015) simplification
ma, 1.(34).              me               ŋeme               me
ma, 2.(34).              nam              t̪amam              nam
ma, .(34).               ŋamam etc        ŋujemam etc        ŋumam etc
ɾu, .(6).                ŋa etc           ŋu etc             ŋuɻu etc
nu, .(7).                ŋunna etc        ŋunne etc          ŋunni etc
ɾa, 3.(28).              paŋan            piɾim              piɻam
ɾi, 3.:(36).             pim              piɾim              piɻim

greatest probability of correctly predicting allomorphy (Mansfield 2016).¹² The
implementation iterates through every inflected form of every Murrinhpatha
classifier stem, treating each in turn as a target form requiring analogical
prediction. The predictive mechanism takes each other inflected form of the
classifier stem in turn as a source form, and for each identifies comparable
classifier stems, from which candidate allomorphs for the target form are
deduced. The probability of each candidate allomorph is the proportion of
comparable classifier stems that imply that allomorph. The probability of can-
didates is aggregated across all source forms, revealing the overall most probable
candidate. The most probable candidate allomorphs selected by the implemen-
tation for our variable inflected forms are illustrated in Table 3.6, along with the
older and younger speakers’ attested forms (see full paradigms in Appendix).
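The predictive procedure just described can be sketched for a single formative dimension as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function name `predict`, the treatment of "comparable" stems as those sharing the source cell's exponent, the averaging over source forms, and the toy paradigms (abstract lexemes A to D, cells c1 to c3, exponents w to z) are all hypothetical.

```python
from collections import Counter, defaultdict


def predict(paradigms, lexeme, target_cell):
    """Return {candidate exponent: aggregated probability} for one target cell.

    `paradigms` maps lexeme -> {cell: exponent}. Assumes the lexeme has at
    least one cell other than the target to serve as a source form.
    """
    known = paradigms[lexeme]
    source_cells = [cell for cell in known if cell != target_cell]
    totals = defaultdict(float)
    for source in source_cells:
        # Assumed notion of comparability: same exponent in the source cell.
        comparable = [
            other for name, other in paradigms.items()
            if name != lexeme
            and other.get(source) == known[source]
            and target_cell in other
        ]
        if not comparable:
            continue  # this source form offers no analogy
        counts = Counter(other[target_cell] for other in comparable)
        for exponent, n in counts.items():
            # Proportion of comparable stems implying this candidate.
            totals[exponent] += n / len(comparable)
    # Aggregate across source forms (here: a simple mean, an assumption).
    return {exp: total / len(source_cells) for exp, total in totals.items()}


# Toy data, entirely hypothetical.
toy = {
    "A": {"c1": "x", "c2": "y", "c3": "z"},
    "B": {"c1": "x", "c2": "y", "c3": "z"},
    "C": {"c1": "x", "c2": "w", "c3": "y"},
    "D": {"c1": "w", "c2": "y", "c3": "x"},
}

candidates = predict(toy, "A", "c3")
best = max(candidates, key=candidates.get)
print(best, candidates[best])
```

In the toy data the candidate `z` receives probability 0.5 and the candidates `x` and `y` receive 0.25 each. An exponent that no comparable stem implies never enters the candidate set, which corresponds to the zero chance of correct prediction for idiosyncratic exponents.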
The results of the implementation do not in any instance match the innovative
forms observed among younger speakers. However, in some instances the
observed innovation, in comparison with the older form, does exhibit some of
the formative allomorphs selected by the Ackerman & Malouf (2015) simplifica-
tion. For example in 1.(6)., the older form is ŋa and the simplification
form is ŋuɻu. The observed innovation ŋu does exhibit the switch to PrefV u-, but
maintains the weak stem grade of the older form, rather than the StemC [] ɻu of
the Ackerman & Malouf (2015) simplification.¹³ Similarly, in 3.(28).
the observed innovation takes on both the PrefV i- allomorph, and the Suffix -m
of the Ackerman & Malouf (2015) simplification, but does not take up the
StemC [] ɻa selected by the simplification, and also diverges from the

¹² The implementation code is written in Python (Python Software Foundation n.d.), and takes as
input the inflectional paradigm data format established for the Principal Parts Analyzer (Finkel &
Stump 2013). Both code and data are available online at http://langwidj.org/Murrinhpatha-inflection.
¹³ ɾu [] ! ɻu [] may not seem like an obvious case of gemination, but it follows from a ɾɾ !
ɻ process observed in Murrinhpatha’s sister language Ngan’gityemerri (Reid 1990) and their shared
proto-language (Green 2003). In Murrinhpatha it is observable only in the classifier stem paradigms,
where it fits with a broader gemination pattern.

Table 3.7. Exponence probabilities of older and newer forms

Classifier, inflection   Older form (prob.)             Newer form (prob.)
ma, 1.(34).              me (.37)                       ŋeme (.00)
ma, 2.(34).              nam (.14)                      t̪amam (.09)
ma, .(34).               ŋamam, namam, pamam (.14)      ŋujemam etc (.00)
ɾu, .(6).                ŋa, na, ka (.06)               ŋu etc (.06)
nu, .(7).                ŋunna, nunna, punna (.06)      ŋunne etc (.06)
ɾa, 3.(28).              paŋan (.06)                    piɾim (.06)
ɾi, 3.:(36).             pim (.06)                      piɾim (.12)

simplification by selecting StemVF [] (vowel frontness) and StemVH []
(vowel height) formatives. Finally, .(7). takes up the StemVF []
alternation selected by the simplification, but maintains the StemVH [] (vowel
height) alternation of the older form, instead of selecting the StemVH [] of the
simplification.
Since some of the observed innovations take up subsets of the formative
intersection selected by the adapted Ackerman & Malouf (2015) simplification,
which is the overall most probable exponence, we might wonder whether the
observed innovations represent partial or incomplete moves towards Ackerman &
Malouf (2015) simplification. Do the observed innovations have greater probabil-
ity of being predicted by analogy than the older forms they appear to be replacing?
To this question, the answer is again negative, as illustrated in Table 3.7.
Table 3.7 illustrates that in six out of the seven instances, the innovative form
has either lower probability of being predicted than the older form, or equal
probability. Only one instance, the innovation in 3.:(36)., creates
a more predictable exponent. Even though some of the innovated formatives
match the Ackerman & Malouf (2015) simplification, the selection of non-
simplified formatives undermines the predictability of the entire form. We must
therefore conclude that the changes to inflectional allomorphy observed in
Murrinhpatha data collected over forty years (or at least, apparent changes,
suggested by different distribution of variants among older and younger speakers)
increase the complexity of the system. Most of the changes replace more predict-
able allomorphs with less predictable ones.
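The comparison in Table 3.7 can be tallied directly. The Python sketch below uses the probabilities as printed in the table, in row order; the variable names are assumptions introduced for illustration.

```python
# (older form probability, newer form probability), rows in the order of Table 3.7
probabilities = [
    (0.37, 0.00),
    (0.14, 0.09),
    (0.14, 0.00),
    (0.06, 0.06),
    (0.06, 0.06),
    (0.06, 0.06),
    (0.06, 0.12),
]

# Count how the newer form's predictability compares with the older form's.
less = sum(new < old for old, new in probabilities)
equal = sum(new == old for old, new in probabilities)
more = sum(new > old for old, new in probabilities)

print(less, equal, more)  # 3 3 1
```

That is, three innovations are less predictable than the forms they replace, three are equally predictable, and one is more predictable.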

3.6 Demorphologization and deepening complexity

Observed changes in Murrinhpatha increase the unpredictability of inflectional
allomorphs because of breakdown in the structure of intersecting formatives. In
this section we argue that this is a form of incremental demorphologization, where
allomorphic proliferation is associated with the breakdown of segmentability.

Demorphologization in this sense is a complexifying force running counter to the
simplifying force of analogical levelling. Indeed, this demorphologization process
appears to be the same phenomenon that has been underway for a much longer
time period, leading to the unpredictability of implicational relations in the
classifier paradigms. Blurring of constituent boundaries between the inner stems
and affixal exponents in the classifier paradigms has produced the semi-
regularities of the inflectional system.
Ackerman & Malouf (2015) propose that the requirement for inflectional
allomorphs to be reasonably predictable, given knowledge of other forms of the
same lexeme, is a ‘strong evolutionary pressure in language’ (Ackerman & Malouf
2015: 7). They present their model for iteratively simplifying predictions as a
demonstration of how predictability might be achieved, though they do not claim
that this is the actual mechanism at work in the evolution of natural languages.¹⁴
The implementation of their mechanism for Murrinhpatha, compared to
observed changes in the language, suggests that a simplification mechanism of
this type is not in operation in the closed-class system of Murrinhpatha. But the
broader point remains valid: inflectional changes do appear to reflect analogies
drawn by speakers based on the paradigms of other lexemes. This point of view is
supported because the innovated forms in Murrinhpatha copy phonological
elements found in other classifier forms with which they share morphosyntactic
characteristics, rather than being purely phonological changes. But rather than
following a direct aggregation of probable allomorphs, there appear to be other
predictive influences at work—interference in the system, which leads to an
increase in integrative complexity.
Each of the innovations observed in Murrinhpatha has its own story, with
potential sources of analogy detectable upon investigation of paradigmatically
related forms. We here describe two of the innovations in particular, selected
because they illustrate a means by which allomorphic complexity may be
perpetuated, rather than reduced.¹⁵ As with all the observed changes, these are
not the forms selected by the Ackerman & Malouf (2015) simplification
mechanism.

¹⁴ In fact, their main argument focuses on the greater generality of their Low Conditional Entropy
Conjecture (Ackerman & Malouf 2013) as compared to the No Blur Principle (Carstairs-McCarthy
1994), which does not directly concern us here.
¹⁵ The other changes observed are potentially explicable by more subtle departures from the
Ackerman & Malouf (2015) simplification mechanism—for example, by weighting of comparable
classifier stems according to their respective entropies of prediction, with near-categorical predictors
given extra weight (2.(34).), or by allowing prediction to be based on phonological relation-
ships, including identity, rather than inflectional exponents (.(34).) (Bonami & Beniamine
2016). .(6). and..(7). seem to involve greater independence of formatives than has
been previously proposed for the system (Mansfield 2016). Satisfactory analysis of any of these
instances would require a separate study.

(6) 1.(34).
Older form
∅-a-me[]-∅
me
Ackerman & Malouf (2015) simplified Observed
∅-a-me[]-∅ ŋ-e-me[]-∅
me ŋeme
In the case of (6), there are two observed deviations from Ackerman & Malouf
(2015), the first of which is the selection of PrefC ŋ- instead of ∅-. Both ŋ- and ∅-
are in fact candidates implied by comparable classifier stems for various source
forms, with ∅- selected because it has an aggregate 0.73 probability among all
source forms, versus 0.27 for ŋ-. It is easy to imagine that this outcome might be
different, as in the observed innovation ŋeme, if there were some weighting in the
influence of source forms and comparable classifier stems. However, the second
deviation from Ackerman & Malouf (2015) involves the introduction of PrefV e-,
and this is not even a candidate by analogy with comparable classifiers. Classifier
stems that do have PrefV e- are never selected as comparable, because none of the
ma ‘(34)’ source forms use this allomorph, as illustrated in Table 3.8. Rather,
the competing candidates are a- ~ u-. Notice, however, that 1.(34)., like
all (34). forms, has a StemVF [] alternation. It seems that rather than
arising from analogical prediction of PrefV allomorphy, the form ŋeme applies
vowel fronting beyond the morphological inner stem structure ma ~ me in which
the pattern is more generally established. On this view, the predicted form is
derived analogically from other forms, but the prediction of vowel fronting has
been inherited upwards into a morphological unit larger than the inner stem. Such
abrogation of the structural distinction between inner stem and prefix is perhaps
not surprising, given the widespread lack of phonological transparency in
Murrinhpatha classifier stems.

(7) 3.(28).
p-a-∅[]-ŋan
paŋan
Ackerman & Malouf (2015) simplified Observed
p-i-ɻa[:, :, :]-m p-i-ɾi[:, :, :]-m
piɻam piɾim
The case of (7) suggests more extensive breakdown of inner stem/affix structure in
the predictive mechanism. Here the observed deviations from the Ackerman &
Malouf (2015) simplification again include a consonant formative that is an
analogical candidate though not the aggregate strongest candidate, StemC []
instead of StemC [], which again could be accounted for in a system that
includes some weighting of candidates. The other deviation is in the vowel

Table 3.8. Classiﬁer stem paradigm for ma ‘(34)’

ma ‘(34)’ INNER ma  na :

STEM: me : ne :, :
mi :, ni :, :, :
:
NFUT IRR (/FUT) PST PSTIRR
(/PRSL)

SG 1 ŋamam SG 1 ŋama me mi
ŋ-a-[]-m ŋ-a-[]-∅ ∅-u-[]-∅ ∅-u-[,
]-∅
2 nam 2 t̪ama ne ni
∅-a-[]- t̪- . . . ∅-u-[, ∅-u-[,
m ]-∅ ,]-∅
3 mam / 3 kama / pama me mi
kamam k- . . . / p . . . ∅-u-[]- ∅ ∅-u-[,
∅-a-[]- ]-∅
m / k- . . .
INCL 1+2 t̪amam INCL pama t̪ume t̪umi
t̪-a-[]-m 1+2 p-a-[]-∅ t̪-u-[]-∅ t̪-u-[,
]-∅
PL/ 1 ŋamam PL 1 ŋujema ŋume ŋumi
DU ŋ-a-[]-m ŋ-uje-[]-∅ ŋ-u-[]-∅ ŋ-u-[,
]-∅
2 namam 2 nujema nume numi
n- . . . n- . . . n- . . . n- . . .
3 pamam / 3 kujema / pujema pume pumi
kamam k- . . . / p- . . . p- . . . p- . . .
p- . . . / k- . . .

DU 1 ŋujema ŋume ŋumi

ŋ-uje-[]-∅ ŋ-u-[]-∅ ŋ-u-[,
]-∅
2 nujema nume numi
n- . . . n- . . . n- . . .
3 kujema / pujema pume pumi
k- . . . / p- . . . p- . . . p- . . .

formatives StemVF [] and StemVH [], neither of which is predicted by
formative analogies. None of the source forms use such stem vowel alternations
(Table 3.9). Rather, the default inner stem vowel a is overwhelmingly predicted,
rather than the observed [, ] alternation i. The most obvious explan-
ation in this case is the existence of a 3. form piɾim in other classifiers, in
particular i ‘(1)’ and i ‘(2)’. This is another case of analogical relations being
drawn without respect to classifier-internal morphological structure; the compar-
able classifiers have the i vowel, though it is not determined by [, ]

Table 3.9. Classiﬁer stem paradigm for ɾa ‘(28)’

ɾa ‘(28)’ INNER ɾa a :

STEM: ∅ :
NFUT (/PRSL) IRR (/FUT) PST PSTIRR

SG 1 ŋiɾaŋan SG 1 ŋiɾa ŋiɾa ŋiɾaŋi

ŋ-i-[]-ŋan ŋ-i-[]-∅ ŋ-i-[]-∅ ŋ-i-[]-ŋi
2 t̪iɾaŋan 2 t̪iɾa t̪iɾa t̪iɾaŋi
t̪- . . . t̪- . . . t̪- . . . t̪- . . .
3 diɾaŋan / kiɾaŋan 3 kiɾa / piɾa diɾa diɾaŋi
d- . . . / k- . . . k- . . . / p- . . . d- . . . d- . . .
INCL 1+2 t̪iɾaŋan INCL piɾa t̪iɾa t̪iɾaŋi
t̪-i-[]-ŋan 1+2 p-i-[]-∅ t̪-i-[]-∅ t̪-i-[]-ŋi
PL/DU 1 ŋaŋan PL 1 ŋiɻa ŋiɻa ŋiɻaŋi
ŋ-a-[]-ŋan ŋ-i-[]-∅ ŋ-i-[]-∅ ŋ-i-[]-ŋi
2 naŋam 2 niɻa niɻa niɻaŋi
n- . . . n- . . . n- . . . n- . . .
3 paŋam / kaŋam 3 kiɻa / piɻa piɻa piɻaŋi
p- . . . / k- . . . k- . . . / p- . . . p- . . . p- . . .
DU 1 ŋiɻa ŋiɻa ŋiɻaŋe
ŋ-i-[]-∅ ŋ-i-[]-∅ ŋ-i-[]-ŋe
2 niɻa niɻa niɻaŋe
n- . . . n- . . . n- . . .
3 kiɻa / piɻa piɻa piɻaŋe
k- . . . / p- . . . p- . . . p- . . .

alternations on an inner stem, but rather by an underlying inner stem vowel

(visible not in the default stem form, but only in forms with suppletive StemC).
Therefore the analogical mechanism depends on a shared morphosyntactic cat-
egory 3., and on some shared formatives, but ignores the patterns of inner
stem vowel defaults and alternations existent in other parts of the paradigm. Again
it draws a phonological analogy that abrogates inner stem/affix structure.
In historical reconstruction, ‘demorphologization’ has been used to describe
phonological material that at one point constitutes a regular, predictable mor-
pheme, and at some later point loses its connection to morphological patterns
from which it derived. For example, the final rime of seldom derives from
Old English dative *-um, while the m in French rompre ‘break’ derives from a
nasal infix associated with present tense in Latin (Klausenburger 1976; Hopper
1990). Each of these was once an inflectional exponent, because it was part of
a form:meaning pattern shared by an inflectional class of lexemes, but the
dissolution of these patterns has left them absorbed into lexical stems. The recent
innovations observed in Murrinhpatha 1.(34). and 3.(28).
do not begin from a clear ‘morphemic’ unit in this way, as predictable form:meaning
relations in the classifier stem morphology have already long given way to lexically
specific, unpredictable allomorphy. But the changes nonetheless reflect incremental
steps on the path of demorphologization, undermining the morphological structure
of the classifier stem. Every time a paradigmatic cell in the system shifts from a more
predictable allomorph to a less predictable one, the formative structure of the system
is incrementally undermined. Processes of this type are probably responsible for
much of the integrative complexity in Murrinhpatha verbs—though pursuit of this
hypothesis would depend on more extensive historical reconstruction than is pres-
ently available (Green 2003).

3.7 Conclusions

In this chapter, we have investigated changes in Murrinhpatha classifier stem
paradigms, a closed-class system with high integrative complexity. The system of
intersecting formatives underlying the exponence of person, number, and tense on
Murrinhpatha verb classifier stems is unusually complex, in terms of both wealth
of allomorphy and unpredictability of paradigmatic relations. We have studied
changes unfolding in this system with the goal of determining whether observed
changes reduce or increase the complexity of the system. Seven likely changes in
progress were identified, based on variable exponents where younger speakers
showed a strong preference for an innovative variant, as opposed to the conser-
vative variant favoured by older speakers. Calculation of the most predictable
allomorphs for these exponents was performed by adapting the model of
Ackerman & Malouf (2015), but none of the seven observed changes were selected
as expected by this model. Nor were the changed forms more predictable than the
incumbent forms they replaced—in fact, in six of the seven instances, the inno-
vated form was less predictable. Analysis of the analogical sources for two of the
forms suggests that less predictable forms have been selected by speakers because
of analogies that abrogate the inner stem/affix structure evident in the system. The
extensive phonological mutation already undergone by the inner stem elements
has no doubt led to this further obfuscation of inner stem elements, deepening the
overall complexity of the system.
Incremental demorphologization produces integrative complexity, but also
adds to opacity in structure. We have observed this in a closed-class system of
thirty-nine members, but also argued that the problem of integrative complexity
presupposes a closed class of some size. The size of the Murrinhpatha paradigms,
with 1,638 forms in total, presumably allows for some degree of whole-form
memorization. But evidence observed in analogical changes also shows that
implicational relations are active in acquisition or processing, and not all forms
are learnt and stored in isolation. We hope that further research on integrative
complexity will provide more insight into how analogy and memorization interact
in complex inflectional systems.

Appendix

Illustrated below are the inflectional paradigms for classifiers discussed in this chapter.
The paradigms for (34) and (28) are illustrated in the body of the text.

/ø/ ‘(1)’ INNER /∅/ 

STEM: /ɾi/ :
/ju/ : (), :
NFUT IRR (/FUT) PST PSTIRR
(/PRSL)

SG 1 ŋem SG 1 ŋi ŋini ŋini

ŋ-e-[]-m ŋ-i-[]-∅ ŋ-i-[]-ni ŋ-i-[]-ni
2 t̪im 2 t̪i t̪ini t̪ini
t̪-i-[]-m t̪- . . . t̪- . . . t̪- . . .
3 dim / kem 3 ki/ pi dini dini
d-i-[]-m / k k- . . . / d- . . . d- . . .
-e-[]-m p- . . .

INCL 1+2 t̪im INCL pi t̪ini t̪ini
t̪-i-[]-m 1+2 p-i-[]-∅ t̪-i-[]-ni t̪-i-[]-ni
PL/ 1 ŋaɾim PL 1 ŋuju ŋaɾini ŋaɾini
DU ŋ-a-[]-m ŋ-u-[. ŋ-a-[]-ni ŋ-a-[]-ni
]-∅
2 niɾim 2 nuju niɾini niɾini
n-i-[]-m n- . . . n-i-[]-ni n-i-[]-ni
3 piɾim / kaɾim 3 kuju / puju piɾini piɾini
p-i-[]-m / k- . . . / p-i-[]-ni p-i-[]-ni
k-a-[]-m p- . . .
DU 1 ŋe ŋaɾine ŋaɾine
ŋe.[].∅ ŋ-a-[]- ŋ-a-[]-
ne ne
2 ne niɾine niɾine
n- . . . n-i-[]-ne n-i-[]-ne
3 ke / pe piɾine piɾine
k- . . . / p- . . . p-i-[]-ne p-i-[]-ne

/ɾu/ ‘go(6)’ INNER /ɾu/ / /ɾa/ /ɾi/ 

STEM: ɻu/  /je/  (), ,
/∅/  
/ji/  (),

/mpa/ 
(), 

NFUT IRR (/FUT) PST PSTIRR

(/PRSL)
SG 1 ŋuɾan SG 1 ŋuɾu ŋuɾini ŋuɾi
ŋ-u-[]-n ŋ-u-[]-∅ ŋ-u- ŋ-u-
[]- []-∅
ni
2 t̪uɾan 2 t̪uɾu t̪uɾini t̪uɾi
t̪- . . . t̪- . . . t̪- . . . t̪- . . .
3 wuɾan / 3 kuɾu / puɾu wuɾini wuɾi
kuɾan k- . . . / p- . . . w- . . . w- . . .
w- . . . / k- . . .

INCL 1+2 t̪uɾan INCL puɾu t̪uɾini t̪uɾi
t̪-u-[]-n 1+2 p-u-[]-∅ t̪-u- t̪-u-
[]-ni []-∅
PL/ 1 ŋumpan PL 1 ŋuɻu ŋuɳi ŋuji
DU ŋ-u-[, ŋ-u-[]-∅ ŋ-u-[, ŋ-u-[,
]-n ]-ɳi ]-∅
2 numpan 2 nuɻu nuɳi nuji
n- . . . n- . . . n- . . . n- . . .
3 pumpan / 3 kuɻu / puɻu puɳi puji
kumpan k- . . . / p- . . . p- . . . p- . . .
p- . . . / k- . . .

DU 1 ŋa ŋuɳe ŋuje
ŋ-a-[]-∅ ŋ-u-[, ŋ-u-[,
]-ɳe ,
]-∅
2 na nuɳe nuje
n- . . . n- . . . n- . . .
3 ka / pa puɳe puje
k- . . . / p- . . . p- . . . p- . . .

/nu/ ‘(7)’ INNER /nu/  /nnu/ :

STEM: /ni/ : /nni/ :, :
/nuj/ : /nna/ :, : 
 /nne/ :, : , :
/na/ : 
NFUT (/PRSL) IRR (/FUT) PST PSTIRR

SG 1 ŋunuŋam SG 1 ŋunu ŋuna ŋuni

ŋ-u-[]-ŋam ŋ-u-[]-∅ ŋ-u-[]-∅ ŋ-u-[]-∅
2 t̪unuŋam 2 t̪unu t̪una t̪uni
t̪- . . . t̪- . . . t̪- . . . t̪- . . .
3 nuŋam / 3 kunu / punu na* nuj
kunuŋam k- . . . / p- . . . ∅- . . . ∅-u-[]-∅
∅- . . . / k- . . .
INCL t̪unuŋam INCL punu t̪una t̪uni
1+2 t̪-u-[]-ŋam 1+2 p-u-[]-∅ t̪-u-[]-∅ t̪-u-[]-∅
PL/DU 1 ŋunnuŋam PL 1 ŋunnu ŋunni ŋunni
ŋ-u-[]-ŋam ŋ-u-[]-∅ ŋ-u-[, ŋ-u-[,]-∅
]-∅
2 nunnuŋam 2 nunnu nunni nunni
n- . . . n- . . . n- . . . n- . . .
3 punnuŋam / 3 kunnu / punnu punni punni
kunnuŋam k- . . . / p- . . . p- . . . p- . . .
p- . . . / k- . . .
DU 1 ŋunna ŋunna ŋunne
ŋ-u-[, ŋ-u-[, ŋ-u-[,,
]-∅ ]-∅ ]-∅
2 nunna nunna nunne
n- . . . n- . . . n- . . .
3 kunna / punna punna punne
k- . . . / p- . . . p- . . . p- . . .

Note: Street (1987) in addition lists a variant /nuɻa/ use.feet.3.. This variant does not appear in our
corpus data.

/la/ ‘(26)’ INNER /la/ /lla/ :

STEM:
NFUT (/PRSL) IRR (/FUT) PST PSTIRR

SG 1 ŋilam SG 1 ŋila ŋila ŋilaŋi

ŋ-i-[]-m ŋ-i-[]-∅ ŋ-i-[]-∅ ŋ-i-[]-ŋi
2 t̪ilam 2 t̪ila t̪ila t̪ilaŋi
t̪- . . . t̪- . . . t̪- . . . t̪-i-[]-ŋi
3 dilam / kilam 3 kila / pila dila dilaŋi
d- . . . / k- . . . k- . . . / p- . . . d- . . . d-i-[]-ŋi
INCL 1+2 t̪ilam INCL pila t̪ila t̪ilaŋi
t̪-i-[]-m 1+2 p-i-[]-∅ t̪-i-[]-∅ t̪-i-[]-ŋi
PL/DU 1 ŋillaŋam PL 1 ŋilla ŋilla ŋillaŋi
ŋ-i-[]-ŋam ŋ-i-[]-∅ ŋ-i-[]-∅ ŋ-i-[]-ŋi
2 nillaŋam 2 nilla nilla nillaŋi
n- . . . n- . . . n- . . . n- . . .
3 pillaŋam / killaŋam 3 killa / pilla pilla pillaŋi

p- . . . / k- . . . k- . . . / p- . . . p- . . . p- . . .
DU 1 ŋilla ŋilla ŋillaŋi
ŋ-i-[]-∅ ŋ-i-[]-∅ ŋ-i-[]-ŋi
2 nilla nilla nillaŋi
n- . . . n- . . . n- . . .
3 killa / pilla pilla pillaŋi
k- . . . / p- . . . p- . . . p- . . .

/ɾa/ ‘. (36)’ INNER /ɾi/ 

STEM: /ɻi/ :
/∅/ :
NFUT (/PRSL) IRR (/FUT) PST PSTIRR

SG 1 ŋiɾim SG 1 ŋiɾi ŋiɾi ŋiɾini

ŋ-i-[]-m ŋ-i-[]-∅ ŋ-i-[]-∅ ŋ-i-[]-ni
2 t̪iɾim 2 t̪iɾi t̪iɾi t̪iɾini
t̪- . . . t̪- . . . t̪- . . . t̪- . . .
3 diɾim / kiɾim 3 kiɾi / piɾi diɾi diɾini
d- . . . / k- . . . k- . . . / p- . . . d- . . . d- . . .
INCL 1+2 t̪iɾim INCL piɾi t̪iɾi t̪iɾini
t̪-i-[]-m 1+2 p-i-[]-∅ t̪-i-[]-∅ t̪-i-[]-ni
PL/DU 1 ŋim PL 1 ŋiɻi ŋi ŋiɻi
ŋ-i-[]-m ŋ-i-[]-∅ ŋ-i-[]-∅ ŋ-i-[]-∅
2 nim 2 niɻi ni niɻi
n- . . . n- . . . n- . . . n- . . .
3 pim / kim 3 kiɻi / piɻi pi piɻi
p- . . . / k- . . . k- . . . / p- . . . p- . . . p- . . .
DU 1 ŋiɻi ŋi ŋiɻi
ŋ-i-[]-∅ ŋ-i-[]-∅ ŋ-i-[]-∅
2 niɻi ni niɻi
n- . . . n- . . . n- . . .
3 kiɻi / piɻi pi piɻi
k- . . . / p- . . . p- . . . p- . . .

Acknowledgements

This research is funded by the Australian Research Council Centre of Excellence for
the Dynamics of Language (Project ID: CE140100041). We are greatly indebted to the
people of Wadeye, Australia, who have generously shared their knowledge of
Murrinhpatha with us. We also thank Peter Arkadiev and Francesco Gardani for
inviting us to present at the workshop which led to this volume, and for their
comments on our original submission. Bill Forshaw, Jeff Parker, and an anonymous
reviewer also provided insightful comments, as did audience members of the
‘Morphological Complexity’ workshop at Societas Linguistica Europaea (SLE), 2015.
We dedicate this chapter to the late Chester Street, whose detailed documentation
work revealed the extraordinary complexity of Murrinhpatha verbs.

4
Overabundance resulting from
language contact
Complex cell-mates in Gurindji Kriol
Felicity Meakins and Sasha Wilmoth

4.1 Introduction

One of the oft claimed results of language contact is the reduction of morphological
complexity. For example, syncretism, allomorphic simplification, the difficulty of
transferring morphemes, and increased paradigmatic regularity are all observed
outcomes of contact-induced change (e.g., McWhorter 1998; Myers-Scotton 2002;
Janse & Tol 2003; Gardani 2008). These processes reduce the expression of
morphological features, for example case, tense/aspect/mood (TAM), gender,
and number; and the complexity of relationships between cells in paradigms
expressing these features. In this sense, these changes represent an absolute
decrease in the number of morphosyntactic distinctions that a language makes
both in terms of the internal structure of words and their arrangement into
inflectional classes. This type of morphological complexity has been termed
‘complexity of exponence’ (Anderson 2015a: 20) or ‘E(numerative) complexity’
(Ackerman & Malouf 2013: 433; see also section 1.3.1 in the Introduction to this
volume). Such changes can be quantified as a measure of average paradigm
entropy, that is, the degree of uncertainty in predicting the content of a particular
cell in a paradigm (Ackerman et al. 2009; Ackerman & Malouf 2013; Parker &
Sims, Chapter 2, this volume).
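As a rough illustration of the entropy measure just mentioned, the following minimal sketch computes the conditional entropy of one paradigm cell given another and averages it over all ordered pairs of cells. The function names, cell labels, and toy exponents are hypothetical and are not drawn from any language discussed in this volume; the calculation is only in the spirit of Ackerman & Malouf (2013) and should not be read as a reimplementation of their procedure.

```python
from collections import Counter
from itertools import permutations
from math import log2


def conditional_entropy(pairs):
    """Return H(target | known) in bits for (known, target) exponent pairs."""
    joint = Counter(pairs)
    known = Counter(k for k, _ in pairs)
    n = len(pairs)
    # H(T|K) = -sum over (k, t) of p(k, t) * log2 p(t | k)
    return -sum((c / n) * log2(c / known[k]) for (k, _), c in joint.items())


def average_paradigm_entropy(paradigms):
    """Mean conditional entropy over all ordered pairs of distinct cells.

    `paradigms` maps each lexeme to a dict from cell label to exponent.
    """
    cells = sorted(next(iter(paradigms.values())))
    scores = [
        conditional_entropy([(p[known], p[target]) for p in paradigms.values()])
        for known, target in permutations(cells, 2)
    ]
    return sum(scores) / len(scores)


# Hypothetical toy system: four lexemes, two cells (labels are placeholders).
toy = {
    "lex1": {"SG": "-a", "PL": "-i"},
    "lex2": {"SG": "-a", "PL": "-i"},
    "lex3": {"SG": "-a", "PL": "-u"},
    "lex4": {"SG": "-o", "PL": "-e"},
}
print(average_paradigm_entropy(toy))
```

In this toy system the plural exponent fully predicts the singular one, but not vice versa, so the average lies between zero and the entropy of the harder direction.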
One area of complexity, which Anderson (2015a: 22) notes as having received
less attention in the morphological literature, is variation within the cells of a
paradigm, for example ‘dived’ and ‘dove’, which are different word forms of the
past tense of {DIVE} in English. Thornton (2011) calls this type of complexity
‘overabundance’. Overabundance refers to multiple forms being realized within
the same cell in a paradigm, or lexemes with ‘cell-mates’, as Loporcaro quips
(see Loporcaro & Paciaroni 2011: 420 and Loporcaro, Chapter 6, this volume).
Thornton observes that variation between cell-mates may be subject to sociolin-
guistic and syntactic-semantic conditions.

Felicity Meakins and Sasha Wilmoth, Overabundance resulting from language contact: Complex cell-mates in Gurindji Kriol
In: The Complexities of Morphology. Edited by: Peter Arkadiev and Francesco Gardani, Oxford University Press (2020).
© Felicity Meakins and Sasha Wilmoth.
DOI: 10.1093/oso/9780198861287.003.0004

In this chapter, we demonstrate that overabundance can increase in situations
of language contact, and that it therefore represents an increase in E-complexity due to
the proliferation of exponents, in this case, cell-mates. Perhaps more interestingly,
we also suggest that overabundance represents an increase in I(ntegrative) com-
plexity, that is, increased within-cell variation makes it harder for speakers ‘to
make accurate guesses about unknown forms of words based on exposure to
known forms’ (Ackerman & Malouf 2013: 436; see also section 1.3.2 in the
Introduction to this volume). Usually I-complexity refers to how speakers are
able to surmise a word form in one cell in a paradigm based on other forms in the
same paradigm. In this chapter, we show how overabundance requires speakers to
make calculated choices about forms based on features beyond the paradigm.
We also show that the I-complexity of overabundance can be measured using
generalized linear mixed models (GLMM) which probabilistically measure the
use versus non-use of a feature (dependent variable) against semantic, grammat-
ical, and information structure features in a clause (independent variables or
predictors) and their interactions, within a cluster of idiolects (random variable)
(Pinheiro & Bates 2000; Baayen 2008; Marschner 2011). The relative importance
of the predictors can then be determined using dominance analysis (Azen &
Traxel 2009).
We present a case study of the development of overabundance in the subject-
marking system of an Australian mixed language, Gurindji Kriol, and claim that
this dimension of complexity is the result of language contact. Furthermore, we
assess whether this complexity has stabilized in second-generation child speakers
of Gurindji Kriol. This complexification and subsequent stabilization due to
contact is reflected experimentally in Berdicevskis & Semenuks (Chapter 11,
this volume).
Overabundance in Gurindji Kriol manifests itself as optional case marking and
involves variation within a cell, that is, the use or non-use of a case suffix where the
grammatical role of the nominal is unaffected by non-use (cf. McGregor &
Verstraete 2010). This pattern is shown in sequential clauses in (1) where the
subject is marked in the first clause and unmarked in the second clause.¹

(1) Warlaku na bi-ngku bin jeij-im im

dog  bee-  chase- 3.
dat mukmuk-Ø bin jeij-im dat karu na
the owl-  chase- the child 
‘The bees chased the dog and the owl chased the child.’
(BP: 9yrs: FM13_35_3e: Frog story: 2:10min)

¹ In all examples, Gurindji elements are given in italics, Kriol in plain font and subjects are bolded.

Meakins (2009, 2015) shows that optional subject marking developed as a result
of contact between Gurindji and Kriol whereby the Gurindji ergative marker was
retained in the process of the formation of the mixed language, Gurindji Kriol,
but became optional and was later reanalysed as nominative marking when it
also came to mark intransitive subjects. In this respect, overabundance devel-
oped in the nominative cell of the case paradigm where an alternation now exists
between the forms -ngku/-tu and a zero morph (or nothing, depending on one’s
theoretical approach). Variation is driven by a number of semantic, syntactic,
and information structure features including transitivity and word order
(Meakins 2009; Meakins & O’Shannessy 2010). This optional case marking
system requires speakers of Gurindji Kriol to constantly monitor the clause
and its place in the discourse to make decisions about whether to overtly express
subject marking or not. Thus in this chapter, we make the case that over-
abundance in Gurindji Kriol is an example of a contact-induced change,
which involves the complexification of an inflectional paradigm rather than its
simplification.
In particular, we examine the further development of overabundance in subject
marking using new data from Gurindji children to determine whether the com-
plexity in the case paradigm has stabilized or whether complexification is on-
going. Changes in overabundance are quantified along two dimensions using
different quantitative methods: (i) the change between generations of Gurindji
speakers in the contribution of different predictors to the use of subject marking is
shown through GLMM (Marschner 2011); and (ii) generational differences in the
relative contribution of the different factors are demonstrated using dominance
analysis (Azen & Traxel 2009).

4.2 Dimensions and measures of morphological

complexity in language contact

Numerous studies have shown instances of the reduction of morphological

complexity, particularly in inflectional paradigms, in situations of language con-
tact (see Miestamo et al. 2008 for a recent collection of papers). There are a
number of dimensions which can be affected by simplification processes.
Fundamentally, languages that have morphology are considered to be more
complex than languages which do not, that is, isolating languages (Sapir 1921;
Anderson 1992) (see section 1.2 in the Introduction to this volume). Extreme
cases of language contact such as creolization have also been shown to have a
radically reductive effect on inflectional morphology (see Miestamo et al. 2008 for
a recent collection of papers, and Henri, Stump, & Tribout, Chapter 6, this
volume, and McWhorter, Chapter 10, this volume, for further discussions).²
Similarly, inflectional morphology is rarely borrowed or switched into the gram-
matical frame of another language (Myers-Scotton 2002; Aikhenvald & Dixon
2006; Matras & Sakel 2007; Gardani 2008).
Where inflectional morphology remains in situations of language contact,
different dimensions of complexity are affected. In particular what Anderson
(2015a: 20) terms the ‘complexity of exponence’ or Ackerman & Malouf (2013:
433) call ‘E(numerative) complexity’ often undergoes reduction. For example
syncretism, allomorphic simplification, and increased paradigmatic regularity
are all observed outcomes of contact-induced change and language obsolescence
(Dorian 1978; Gal 1989; Janse & Tol 2003). All of these processes reduce the
exponence of morphological features such as case, TAM, gender, and number, and
the complexity of relationships between cells within paradigms expressing these
features. At the extreme end, these features gather up their morphological skirts
and step out of paradigms and into periphrastic constructions, thereby transform-
ing from synthetic forms into analytic forms (see de Groot’s 2008 study of
Hungarian in contact for a recent example). Paradigmatic complexity can be
measured as ‘entropy’ which captures the degree of predictability of forms in a
paradigm (Ackerman et al. 2009; Ackerman & Malouf 2013). Entropy has been
used to measure the relative complexity of different languages (see also Stump &
Finkel 2016 for related work); however, it can also be used to measure changes in
complexity across time within the same language (see Mansfield and Nordlinger,
Chapter 3, this volume, for a case study of Murrinhpatha).
As Anderson (2015a: 22) has noted, a dimension of complexity which has
received less attention in the morphological literature is variation within the cells
of a paradigm, for example the ‘dived’ and ‘dove’ examples given in section 4.1—
and many more examples of co-existing regular and irregular past tense and plural
forms in English. Thornton (2011) calls the exponence of multiple forms in the
same cell in a paradigm ‘overabundance’. Overabundance (which can be thought
of as morphological ‘cell-mates’) is defined as ‘a cell in a paradigm . . . filled by two
or more synonymous forms which realize the same set of morpho-syntactic
properties’ (Thornton 2011: 2). She uses the Italian verb paradigm to demonstrate
how variation between forms is motivated by different phonological and
syntactic-semantic conditions.
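Thornton’s definition can be made concrete with a small data-structure sketch: a paradigm is represented as a mapping from cells to sets of forms, and a cell is overabundant when its set contains more than one form. The representation, the function name, and the cell labels below are illustrative assumptions, not part of Thornton’s account; the forms are those of the English ‘dived’/‘dove’ example.

```python
def overabundant_cells(paradigm):
    """Return the cells of a paradigm that are filled by two or more forms."""
    return {cell: forms for cell, forms in paradigm.items() if len(forms) > 1}


# English DIVE; the cell labels are illustrative placeholders.
dive = {
    "present": {"dive"},
    "past": {"dived", "dove"},  # two cell-mates realize the same properties
    "present participle": {"diving"},
}
print(overabundant_cells(dive))
```

What such a representation leaves out is exactly what the rest of the chapter is concerned with: the conditions under which a speaker selects one cell-mate rather than the other.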
Thornton’s examples of overabundance mostly involve cases of language
change and the regularization of inflectional paradigms. In this scenario, an
irregular form co-exists with a newer regularized form. Processes of regularization
are one source of variants. We argue that contact with another language provides
another source of variants. It is common for multiple forms from different

² Although, see a number of surveys (Plag 2003a, 2003b; Roberts & Bresnan 2008) and counter-
surveys (DeGraff 2005; Parkvall 2008; Bakker et al. 2011; Henri & Kihm 2015) in response to this claim.
languages to co-exist with their use determined by other features in the clause. To
give another example from English, possession is expressed by the s-genitive
(<Saxon), the of-genitive (also Germanic, but its rise in usage is attributed to
contact with Norman French) and also by an innovative double genitive, such as
‘the book of John’s’. The choice of form largely depends on possessor animacy, the
weight of the possessive phrase, topicality, and definiteness (see Kreyer 2003; Abel
2006; O’Connor et al. 2013; Payne 2013 for overviews). The development of the
double genitive and the rise in usage of the of-genitive occurred during the
formation of Middle English as a result of contact with Norman French and
represents ‘overabundance’ and a complexification in the expression of
possession.
Variation in the expression of particular functions can be probabilistically
modelled. For example, O’Connor et al. (2013) use logistic regression to demon-
strate the factors which drive the differential use of the s-genitive versus the of-
genitive in English. GLMM provide a more appropriate procedure for measuring
the use versus non-use of a linguistic feature such as a type of possessor marking
against other features such as animacy and end-weight. The advantage of GLMM
over normal logistic regression is the use of random variables such as ‘speaker’,
which means the model is able to take into account differing degrees of contri-
bution of data to a corpus and the fact that speakers behave more like themselves
than other speakers (i.e., idiolectal variation) (Baayen 2008; Pinheiro & Bates
2000; Marschner 2011). GLMM analysis is commonly used in Probabilistic Syntax
studies to quantify grammatical variation (e.g., Bresnan 2007; Meakins &
O’Shannessy 2010; Bresnan & Ford 2013) and is increasingly replacing
Varbrul/Rbrul in quantitative sociolinguistics.
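The contrast between ordinary logistic regression and a GLMM can be stated schematically. The notation below is an illustrative assumption rather than that of the studies cited: $y_{ij}$ is 1 if token $i$ from speaker $j$ carries the variant in question and 0 otherwise, $\mathbf{x}_{ij}$ is the vector of predictors for that token (for instance animacy or end-weight), $\boldsymbol{\beta}$ holds the fixed-effect coefficients, and $u_j$ is a by-speaker random intercept. The normal distribution of $u_j$ is the conventional assumption for such models.

```latex
\begin{align*}
\text{logistic regression:}\quad
  & \log\frac{P(y_{ij}=1)}{1-P(y_{ij}=1)}
    = \beta_0 + \boldsymbol{\beta}^{\top}\mathbf{x}_{ij} \\
\text{GLMM (random intercept):}\quad
  & \log\frac{P(y_{ij}=1 \mid u_j)}{1-P(y_{ij}=1 \mid u_j)}
    = \beta_0 + \boldsymbol{\beta}^{\top}\mathbf{x}_{ij} + u_j,
    \qquad u_j \sim \mathcal{N}(0, \sigma_u^2)
\end{align*}
```

The additional term $u_j$ is what allows the model to absorb idiolectal variation and unequal amounts of data per speaker.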
Although GLMM analysis has not been previously used to measure complexity,
we suggest it provides a useful measure of complexity. In this case, what is being
measured is not E-complexity but rather I-complexity, that is, the predictability of
word forms based on other features. In the case of overabundance, the relevant
features go beyond the paradigm to other features of the clause or discourse. In
particular, we suggest the R² value is a useful metric for measuring both the overall
I-complexity of a form which is variably realized within a cell, and the relative
contribution of predictors to its use. In regression models, R² is a measure of how
well the independent variables predict the variable use of the dependent variable.
In the case of mixed models, two R² values can be calculated—conditional
R-squared (R²C) and marginal R-squared (R²M) (Nakagawa & Schielzeth 2013:
136). R²C calculates variance based on both fixed effects (independent variables or
predictors) and random effects, and therefore takes account of all factors includ-
ing speaker variation, which are contributing to variation in the data set. The level
of I-complexity of overabundance or within-cell variation is measured by the
number of semantic, grammatical and information structure features (predictors)
required to reach a reportable R²C value or to increase an R²C (while not over-fitting
a model). The more variables a speaker needs to take into account when making
online decisions about how to express variable linguistic features, the higher the
morphological complexity.
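The relation between the two R² values can be sketched from the variance components of a fitted model. The following is a minimal sketch along the lines of Nakagawa & Schielzeth (2013), assuming a binomial model with a logit link and no additive overdispersion term; the function name and the variance components are invented for illustration and do not come from the models reported in this chapter.

```python
from math import pi


def r2_glmm(var_fixed, var_random, var_residual=pi ** 2 / 3):
    """Return (marginal R², conditional R²) from variance components.

    var_fixed    -- variance of the fixed-effect linear predictor
    var_random   -- summed variance of the random effects (e.g. speaker)
    var_residual -- residual variance; pi^2/3 is the distribution-specific
                    value for a binomial model with a logit link
    """
    total = var_fixed + var_random + var_residual
    r2_marginal = var_fixed / total
    r2_conditional = (var_fixed + var_random) / total
    return r2_marginal, r2_conditional


# Invented variance components, for illustration only.
r2m, r2c = r2_glmm(var_fixed=1.2, var_random=0.5)
print(f"R2M = {r2m:.3f}, R2C = {r2c:.3f}")
```

Because the conditional value adds the random-effect variance to the numerator, it can never be lower than the marginal value; the gap between the two indicates how much of the variation is attributable to differences between speakers.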
The R² value is also useful in determining the relative contribution of the
individual predictors to the variable use of a particular linguistic feature. Using
a method called dominance analysis, the R²M value is calculated for the individual
predictors as a proportional contribution to the overall R²M value of the mixed
model in order to determine which are the strongest predictors of the use of a
particular linguistic feature. Where the R²C value in GLMM analysis provides an
overall assessment of the complexity of overabundance, dominance analysis
provides a more nuanced picture of the individual contribution of linguistic
features to this complexity.
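The logic of dominance analysis can be sketched as follows: for each predictor, the gain in R² from adding it to a model is averaged over every subset of the remaining predictors, first within each subset size and then across sizes. This is a minimal sketch of general dominance weights, not the procedure used for the results in this chapter; the predictor names are placeholders and the R² values are invented.

```python
from itertools import combinations


def general_dominance(r2, predictors):
    """Average incremental R² of each predictor over all subsets of the others.

    `r2` maps a frozenset of predictor names to the R² of the model
    containing exactly those predictors.
    """
    weights = {}
    for p in predictors:
        others = [q for q in predictors if q != p]
        level_means = []
        for k in range(len(others) + 1):
            gains = [
                r2[frozenset(subset) | {p}] - r2[frozenset(subset)]
                for subset in combinations(others, k)
            ]
            level_means.append(sum(gains) / len(gains))
        weights[p] = sum(level_means) / len(level_means)
    return weights


# Invented R² values for every subset of three placeholder predictors.
t, w, a = "transitivity", "word_order", "animacy"
r2_table = {
    frozenset(): 0.00,
    frozenset({t}): 0.20, frozenset({w}): 0.10, frozenset({a}): 0.05,
    frozenset({t, w}): 0.26, frozenset({t, a}): 0.23, frozenset({w, a}): 0.13,
    frozenset({t, w, a}): 0.30,
}
weights = general_dominance(r2_table, [t, w, a])
shares = {p: v / sum(weights.values()) for p, v in weights.items()}
print(weights)
print(shares)
```

The weights sum to the R² of the full model, so dividing each by that total gives the proportional contribution of each predictor.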
A similar measure, referred to as ‘factor weights’, is used in Varbrul/Rbrul and
has been used to determine the influence of one language on another in situations
of language contact. For example, Meyerhoff (2009) uses factor weights to gauge
substrate influence of Tamambo on the variable expression of subject pronom-
inals in the variety of Bislama spoken on Malo Island, specifically whether null
subjects in Bislama are more likely if the referent was human and topical, as is the
case in Tamambo. Meyerhoff finds that, although the forms of subject pronouns
are different in Tamambo and Bislama, the relative effect of humanness and
topicality in Tamambo pronoun usage has been transferred in the development
of Bislama. In a similar vein, dominance analysis can be used to compare the
relative importance of variables between two languages in contact or across
generations, as in the case study presented in this chapter.
In the following sections, we describe optional subject marking in Gurindji Kriol
as an instance of overabundance, and argue that the within-cell variation represents
an instance of increased complexity in a situation of language contact.

4.3 Optional subject marking in Gurindji Kriol

Gurindji Kriol is a mixed language spoken in a number of Aboriginal communities³

in the Victoria River District of northern Australia, which is located around 900
kilometres southwest of Darwin. It is now the main language of Gurindji people
in Daguragu and Kalkaringi, Bilinarra people in Pigeon Hole, and Ngarinyman
people in Yarralin. Figure 4.1 shows the location of these communities and
Aboriginal groups. The data for this chapter comes from Daguragu and, to a
lesser extent, Kalkaringi.

³ Aboriginal communities in Australia are similar to many Indian reservations in the United States,
in that the majority of residents are Indigenous.

Figure 4.1. Traditional languages and Aboriginal communities of the Victoria River
District
Source: Meakins & Nordlinger (2014: xxxiii)

Gurindji Kriol is derived from Gurindji (Ngumpin-Yapa, Pama-Nyungan), the

traditional Australian language of the region, which is also closely related to
Bilinarra and Ngarinyman; and Kriol, the English-lexifier creole language spoken
across much of northern Australia. Gurindji Kriol combines the lexicon and
structure of these two languages, with the noun phrase structure including case
markers, such as -ngku ‘nominative (ex-ergative)’ in (2) and (3), largely derived
from Gurindji; and the verb phrase structure including the TAM auxiliaries, such
as bin ‘past’ in (2) and (3), mostly reflecting that of Kriol.⁴

(2) WB an LD an nyuntu-Ø, yumob bin jayijayi

 and  and 2- 2  chase
jurlaka na lingi, WB-ngku baldan na
bird  persistent - fall.over 
‘WB, LD and you chased a bird with persistence. WB falls over then.’
(Meakins 2015: 190)

(3) a. Dat mukmuk bin kuli la=im

the owl  attack =3.
‘The owl attacked (the boy).’
b. Dem bi-ngku kuli la=im dat warlaku=warla
those bee- attack =3. the dog=
‘(And) the bees really went for the dog instead.’ (Meakins 2009: 82)

Of relevance for this chapter is the alternation in the nominative cell of the
Gurindji Kriol case paradigm between zero and -ngku (and its consonant-final
allomorph -tu). This alternation is an example of what Thornton terms ‘over-
abundance’, that is, zero and -ngku/-tu can be analysed as cell-mates in the case
paradigm because the use and non-use of subject marking does not affect the
grammatical function of the stem. For example, in (2) and (3), the subject is
unmarked in the first clause and marked in the second clause, but both are
unambiguously subjects.
Overabundance in the expression of subject marking in Gurindji Kriol is the result
of language contact, and involves the complexification of an inflectional paradigm
rather than simplification. Taken together, Meakins (2009, 2015) and Meakins &
O’Shannessy (2010) argue that the subject marker originated in the Gurindji
ergative marker, which was grammatically obligatory. During the formation of the
mixed language, the ergative marker came into contact with Kriol,
which has a nominative system with argument differentiation performed by
word order and some pronoun forms rather than case marking. This contact had
two main effects on the argument marking system of the emergent mixed
language: (i) the Gurindji ergative marker became optional in Gurindji Kriol,
and acquired additional discourse-marking properties, specifically highlighting
the agentivity of subjects; and (ii) a change in case alignment occurred where the
ergative marker was extended to intransitive subjects thereby being reanalysed
as a nominative marker via a stage of optional ergativity.

⁴ The structure of Gurindji Kriol is described in detail elsewhere (Meakins 2011, 2013).

This example of overabundance contrasts with other changes in Gurindji Kriol
which can be described as ‘simplification’ or a decrease in E-complexity, for example
allomorphic reduction in subject marking from seven forms to two (Meakins
2011: 23, 30), as shown in Table 4.1; and syncretism in the marking of transitive
and intransitive subjects, which precipitated a shift from an ergative system to a
nominative system (Meakins 2015). This change has progressed further in a small
group of children who now show syncretism between subject and possessive
marking, and therefore an emergent relative system, as shown in Table 4.2.
Unlike these processes of reduction and syncretism, overabundance in subject
marking represents a process of complexification, both of E-complexity and
I-complexity. Speakers no longer simply apply case marking obligatorily to
subjects but instead have an increase in forms to manage (E-complexity). In
addition they must attend to other syntactic, semantic, and pragmatic elements
in the clause (I-complexity) and the broader discourse context to use the case
marker appropriately.
Table 4.1. Allomorphic reduction in subject marking in Gurindji Kriol

          Gurindji      Gurindji Kriol
v-final   -ngku         -ngku
          -rlu
c-final   -kulu         -tu
          -tu, -rtu
          -tu, -rtu
          -ju

Table 4.2. Comparison of case systems and allomorphy across three generations
(syncretisms within generations bolded)

          Gurindji                     Gurindji Kriol               Gurindji Kriol
                                       generation 1                 generation 2
          ergative¹          dative²   nominative³     dative⁴      relative
v-final   -ngku, -lu, -ku    -wu       -ngku           -wu, -yu     -ngku
c-final   -tu, -ju, -kulu    -ku       -tu             -ku, -tu     -tu

Notes: ¹(Meakins et al. 2013: 20–1); ²(Meakins et al. 2013: 22); ³(Meakins, 2011: 26); ⁴(Meakins, 2011: 23).

This relationship between the use of subject marking and other elements in the
clause was quantified in Meakins (2009; and later in a comparison with Light
Warlpiri in Meakins & O’Shannessy 2010) using a GLMM analysis for a set of
1,917 transitive subjects produced by adult speakers of Gurindji Kriol. The use of
the subject marker was tested against ten sociolinguistic, grammatical, and semantic
variables. It was found that the ergative marker was significantly more likely to
appear if the agent was inanimate, positioned post-verbally, and in conjunction
with a co-referential pronoun. In addition, the use of the ergative marker significantly
decreased when the verb was marked with continuous aspect and the event
denoted by the verb had not come to completion. These results were interpreted as
an indication that, although subject marking still had some argument disambiguation
functions, it had acquired discourse properties; specifically, its presence
highlighted the agentivity of a subject nominal. Similar analyses exist for other
languages with optional ergative marking such as Gooniyandi and Kuuk
Thaayorre (see McGregor 2010 for an overview). This analysis was later extended
to intransitive subjects in Meakins (2015). Whether this variability has
stabilized in the second generation of Gurindji Kriol speakers or has undergone
further shift is discussed in the following section.

4.4 Changes in the complexity of subject marking

This section builds on previous probabilistic observations about the variable use of
the subject (ex-ergative) marker in Gurindji Kriol by remodelling the original
dataset and augmenting it with data from intransitive clauses (cf. Meakins 2015).
It then uses data from the second generation of Gurindji Kriol speakers to quantify
changes in the use of subject marking. Any difference between Gurindji adults and
children is assumed to represent a language change scenario (cf. the apparent-time
hypothesis; Labov 1963).
Overabundance is presented as a new dimension of morphological complexity,
particularly in situations of language contact and change. Two aspects of over-
abundance are modelled using different quantitative methods: (i) the contribution
of different factors to the use of subject marking across time is shown through
mixed models (Marschner 2011); and (ii) the relative contribution of the different
factors is demonstrated using dominance analysis (Azen & Traxel 2009). We
begin by examining the use of subject marking in the adult data and then compare
it with the children’s data.

4.4.1 Data

The data for this study are 3,575 instances of transitive and intransitive subjects
from fifty adult Gurindji Kriol speakers (18–35-year-olds) and 2,975 instances of
transitive and intransitive subjects from fifty-three child Gurindji Kriol speakers
(8–14-year-olds). The speakers represent around 20% of the Gurindji population
at Kalkaringi/Daguragu. The data are derived from eighty hours (57,179 clauses) of
transcribed and (mostly) translated recordings which are sound-linked in CLAN
and %mor-coded. That is, each morpheme is coded according to information
relevant to this study, including transitivity, animacy, part of speech, etc., as
shown in the third line of (4):

(4) Dat yawarta=ma leg i bin kil-im fens-tu
    ‘The fence hit the horse’s leg.’ (CE: FHM014.cha: 3:32min)

    Dat                  yawarta                       =ma
    the                  horse                         =TOP
    dem|dat&k=the        n:animal|yawarta&g=horse      suf:top|ma&g=TOP

    leg                  i                             bin
    leg                  3SG.S                         PST
    n:bp|leg&k=leg       wpro|i&3SG/S&k=he/she/it\P    v:aux|bin&PAST&k=PST

    kil                  -im                           fens
    hit                  -TR                           fence
    v:tran|kilim&k=hit   suf|im&TR&k=TR                n:inanimate|fens&k=fence\S

    -tu
    -NOM
    case:erg|tu&g=ERG
Data were extracted into a tab-delimited file using a Python script which identified
subjects and coded the rest of the clause according to the predictors under
investigation. The script was also able to ‘look back’ in transcripts and identify
the previous full subject nominal and whether or not it was marked, as an
indication of priming. These variables are discussed in the following section.
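
The extraction step described above can be illustrated with a minimal Python sketch. It is not the script used for the study. It assumes that each %mor token consists of a category, a `|` separator, a stem, and `&`-separated features, and that ergative-derived case markers carry a category beginning with `case:erg`. The names `MorToken`, `parse_mor_token`, `is_subject_marker`, `Subject`, `add_priming`, and `to_tsv_row` are hypothetical, as is the three-column record layout.

```python
from dataclasses import dataclass
from typing import List, NamedTuple, Optional, Tuple


class MorToken(NamedTuple):
    """One parsed %mor token (hypothetical representation)."""
    category: str
    stem: str
    features: Tuple[str, ...]


def parse_mor_token(token: str) -> MorToken:
    """Split a token such as 'n:animal|yawarta&g=horse' into its parts."""
    category, _, rest = token.partition("|")
    stem, *features = rest.split("&")
    return MorToken(category, stem, tuple(features))


def is_subject_marker(token: str) -> bool:
    """Assumption: the subject (ex-ergative) marker is coded 'case:erg'."""
    return parse_mor_token(token).category.startswith("case:erg")


@dataclass
class Subject:
    """One full subject nominal in transcript order."""
    speaker: str
    marked: bool                   # True if the nominal carries -ngku/-tu
    primed: Optional[bool] = None  # marking of the previous full subject


def add_priming(subjects: List[Subject]) -> List[Subject]:
    """'Look back': copy the marking of the previous full subject nominal
    onto each token; the first token has no predecessor (None)."""
    previous: Optional[bool] = None
    for subject in subjects:
        subject.primed = previous
        previous = subject.marked
    return subjects


def to_tsv_row(subject: Subject) -> str:
    """Render one record as a tab-delimited line (Y/N, NA if unknown)."""
    def yn(value: Optional[bool]) -> str:
        return "NA" if value is None else ("Y" if value else "N")
    return "\t".join([subject.speaker, yn(subject.marked), yn(subject.primed)])
```

A real extraction would also need to decide what counts as a full subject nominal and where transcript boundaries reset the look-back; both are left out of this sketch.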

4.4.2 Procedure

Two GLMMs with a logistic link function (glmr; glm2 package in R)⁵ (Marschner
2011) were applied to the adult and child data in turn to see what affected the use
of subject marking in Gurindji Kriol across time. Separate dominance analyses
were then run on the different models to test the relative effects of the variables on
the use of subject marking (Azen & Traxel 2009).⁶, ⁷

⁵ http://cran.r-project.org/web/packages/glm2/glm2.pdf
⁶ Note that the adult and child data were not run in a single model. Separate models were required
for the dominance analysis to tease out the relative effects of the different independent variables or
predictors in the adult data set versus the child data set.
⁷ No R package yet exists for dominance analysis although the preliminary functions have been
developed by Claudio Bustos and are available on https://github.com/clbustos/dominanceAnalysis. In
this chapter, we performed the calculations manually through a series of R² calculations as discussed in
section 4.4.3.

All tokens were coded for a dependent variable (use of subject marking), for
example present in (5) and (7), and absent in (6). Each token was also coded for
six independent variables: (i) clause transitivity, for example transitive in (5) and
(6), and intransitive in (7); (ii) the relative order of the subject and verb, for
example VS in (5), and SV in (6) and (7); (iii) the animacy of the subject,
for example inanimate in (5), and animate in (6) and (7); (iv) the presence vs.
absence of a co-referential pronoun in the clause, for example present in (5) and
(7), and absent in (6); (v) the actualization of the event, for example underway in (6),
not yet happened in (7), and completed in (5); and (vi) the marking vs. non-marking
of a previous subject.

(5) Dat yawarta=ma leg i bin kil-im fens-tu

the horse= leg 3.  hit- fence-
‘The fence hit the horse’s leg.’ (CE: FHM014.cha: 3:32min)

(6) Dat gel-Ø bin hold-im im lungkarra-karra

the girl-  hold- 3. cry-
‘The girl held him as he was crying.’ (CE: FHM014.cha: 3:39min)

(7) Dat kajirri-ngku i garra partaj

The old.woman- 3.  climb
‘The old woman will climb up.’ (CS: FM14 40 2b: 1:40min)

Each variable was categorical and coded binary, that is, Y/N, in/
animate, or SV/VS. Speaker was coded as a random effect (Figure 4.2).

• Fixed effects:
  – Dependent:
    • Presence of subject marking
      o Y, N
  – Independents:
    • Transitive
      o Y, N
    • Relative order of subject and verb
      o SV, VS
    • Animacy of subject
      o animate, inanimate
    • Presence of coreferential pronoun
      o Y, N
    • Actualization of event
      o Y, N
    • Priming of subject marking
      o Y, N
• Random effect:
  – Speaker

Figure 4.2. Fixed and random effects used to measure the use vs. non-use of subject
marking in Gurindji Kriol
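
The coding scheme in Figure 4.2 can be written down as a small data structure. The sketch below is illustrative only: the variable names, the choice of which level counts as 0 and which as 1, and the `encode` helper are assumptions, not the coding used in the study. Example (5) is coded from the description in the text, on the further assumption that a completed event counts as actualized; its priming value is not given in the text and is therefore omitted.

```python
from typing import Dict, Tuple

# Each variable has exactly two levels; the first is encoded as 0, the second as 1.
LEVELS: Dict[str, Tuple[str, str]] = {
    "marked": ("N", "Y"),            # dependent variable
    "transitive": ("N", "Y"),
    "order": ("SV", "VS"),
    "animacy": ("animate", "inanimate"),
    "coreferential": ("N", "Y"),
    "actualized": ("N", "Y"),
    "primed": ("N", "Y"),
}


def encode(token: Dict[str, str]) -> Dict[str, int]:
    """Map each coded level of a token to 0 or 1.

    Raises KeyError for an unknown variable and ValueError for a level
    that is not one of the two permitted values.
    """
    encoded = {}
    for name, value in token.items():
        if name not in LEVELS:
            raise KeyError(f"unknown variable: {name}")
        encoded[name] = LEVELS[name].index(value)
    return encoded


# Example (5): marked, transitive, VS, inanimate subject,
# co-referential pronoun present, event completed.
example_5 = {
    "marked": "Y",
    "transitive": "Y",
    "order": "VS",
    "animacy": "inanimate",
    "coreferential": "Y",
    "actualized": "Y",
}
```

Speaker identity would be kept alongside such a record as a grouping label rather than encoded as a binary predictor, since it enters the model as a random effect.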

The variables were chosen based on the previous analyses in Meakins
(2009) and Meakins & O’Shannessy (2010). Two more variables were
added: transitivity (cf. Meakins 2015) and priming. Note that it may seem
odd to include SV word order and transitivity in the same analysis given that
there is a relationship between these variables based on argument disambiguation,
and word order is irrelevant to the intransitive clauses. Nonetheless we
have put these predictors in the same analysis because, although argument
disambiguation may have been the original source of the variation, there is
also variation in the SV order of intransitive clauses, and as will be shown in
the analysis, this relationship is significant.
The GLMM analysis is appropriate for data in which the dependent variable is
binary (that is, the subject marker either appears or doesn’t), so a normal distribution
of data points is not possible and a logistic link function is required. The
independent variable levels are also categorical, that is, Y/N, in/animate,
rather than a numeric range. This analysis also takes into account both fixed
effects and random effects in one procedure. In particular, the specification of
‘speaker’ as a random effect means the model takes into account that speakers
contribute unequally to the data under analysis (with differing numbers
of tokens) and that individual speakers behave more like themselves than
other speakers. Dominance analysis allows an assessment of the relative importance
of the predictors in regression analysis. Different types of dominance
analyses exist. Here we adopt a method appropriate to categorical variables
developed by Azen & Traxel (2009) which measures the relative effect of the
independent variables according to their contribution to the overall R² value of
the model.
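
To make the role of the logistic link concrete, the following sketch converts between probabilities and log-odds and computes an unadjusted log odds ratio for transitivity from the adult counts reported in Table 4.3 (1,110 marked and 834 unmarked transitive subjects; 269 marked and 1,362 unmarked intransitive subjects). This is a raw comparison rather than the mixed model itself: it ignores the other predictors and the speaker random effect, so it is not expected to reproduce the model estimate exactly. The function names are illustrative.

```python
import math


def logit(p: float) -> float:
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def inv_logit(x: float) -> float:
    """Inverse of the logit: maps any real number to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def log_odds_ratio(marked_a: int, unmarked_a: int,
                   marked_b: int, unmarked_b: int) -> float:
    """Log of (odds of marking in group a) / (odds of marking in group b)."""
    return math.log((marked_a / unmarked_a) / (marked_b / unmarked_b))


# Adult counts from Table 4.3.
transitive_rate = 1110 / (1110 + 834)
intransitive_rate = 269 / (269 + 1362)
raw_log_odds_ratio = log_odds_ratio(1110, 834, 269, 1362)

if __name__ == "__main__":
    print(f"transitive: {transitive_rate:.3f}, intransitive: {intransitive_rate:.3f}")
    print(f"unadjusted log odds ratio: {raw_log_odds_ratio:.3f}")
```

A regression coefficient on the logit scale is a difference in log-odds of this kind, adjusted for the other terms in the model, which is why a binary outcome can be modelled without assuming normally distributed data points.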

4.4.3 Results

4.4.3.1 Adults
The occurrence of subject marking for fifty adult speakers of Gurindji Kriol
in 3,575 clauses according to the predictors is given in Table 4.3.
The output of the GLMM for the adult data is given in Table 4.4. The significant
results are bolded. The analysis shows that subject marking occurs 39%
(n=1379) of the time,⁸ but is significantly more likely if the clause is transitive
(p<0.001), the subject occurs after the verb (p<0.001), the previous nominal is
also marked (p<0.001), a co-referential pronoun is present (p<0.001), and the
event is actualized (p<0.001). The model explains a good amount of variation in
the data set (R²c=0.41).⁹ Each of these significant variables will be discussed in
section 4.4.4. Note that animacy was not significant, which differs from the
earlier studies by Meakins (2009) and Meakins & O’Shannessy (2010).

Table 4.3. Occurrence of subject marking in adult Gurindji Kriol speakers according to predictors

        Transitive    SV Order      Animate       Priming       Actualized    Corefer       TOTAL
NOM     no     yes    VS     SV     A      I      no     yes    no     yes    no     yes
no      1362   834    166    2030   2039   157    1742   454    1742   454    1221   975    2196
yes     269    1110   249    1130   1286   93     680    699    1109   270    471    908    1379
%       16     57     60     36     39     37     28     60     39     37     28     48     39

Table 4.4. Output of generalized linear mixed model analysis on 3,575 tokens

Random effects    Name           Variance    Std. Dev.
Speaker           (Intercept)    0.308       0.555

Analysis conducted on 3,575 grammatical subjects, fifty speakers

Fixed effects     Estimate    Std. Error    z value    p value
(Intercept)       1.65944     0.17978       9.231      < 0.001
Transitive        2.09956     0.09244       22.712     < 0.001
SV word order     0.96422     0.13336       7.230      < 0.001
Animate           0.02516     0.16971       0.148      0.882
Co-referential    0.78885     0.09220       8.556      < 0.001
Primed            1.30133     0.08999       14.460     < 0.001
Actualized        0.57165     0.10760       5.313      < 0.001

The GLMM model provides information about which variables significantly
predict the use of subject marking. Dominance analysis is required to determine
the relative effect of the significant predictors, that is, which variables contribute
the most to the use of ergative marking. Dominance analysis for logistic regression
was developed by Azen & Traxel (2009) and measures the relative contribution of
each predictor to the R²m value (=0.36 in this model), that is, the marginal R² value
which is the variance explained when the random effect is not included in the
numerator. R²m is used rather than R²c because we are only concerned with the
relative contribution of fixed effects to the model, not the additional contribution
of random effects.

⁸ Note that this figure is lower than the 66.5% reported in Meakins (2009; based on 1,917 overt
transitive subjects across different genres) and the 64% reported in Meakins & O’Shannessy (2010; based
on 612 overt transitive subjects in narratives). The reason for this difference is partly that
intransitive clauses have been included. But even the transitive clauses alone show a lower use of
ergative marking (57%) compared with the earlier studies. This is a result of the data extraction
procedures. In the first two studies, transitive subjects including unmarked subjects were extracted
manually from an unannotated corpus whereas in the present study, transitive subjects were
extracted using a Python script across the same corpus, which is now coded for the variables
under investigation. It is highly likely that many unmarked transitive subjects were simply missed
in the first studies due to the manual nature of data extraction. This would have artificially inflated
the frequency of subject marker use.

⁹ Recall that conditional R² (rather than a marginal R²) calculates variance based on both fixed and
random effects and therefore takes account of all factors which are contributing to variation in the data
set (2009). R²c was calculated using the MuMIn package in R (Bartoń 2015).
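
As a rough check on how the marginal and conditional R² values relate, the sketch below assumes the latent-scale definition for a logit-link model with a single random intercept, in which the residual variance is fixed at π²/3. The chapter does not state which variant was reported, so this is an assumption. Under it, the fixed-effect variance can be backed out from R²m and the speaker variance, and R²c follows. The function names are illustrative; the inputs are the reported values (adults: R²m = 0.36, speaker variance 0.308; children: R²m = .113 for the full model in Table 4.8, speaker variance 0.7514).

```python
import math

# Assumed latent-scale residual variance for a logit link.
LOGIT_RESIDUAL_VARIANCE = math.pi ** 2 / 3


def marginal_r2(fixed_variance: float, random_variance: float) -> float:
    """Variance explained by the fixed effects only."""
    total = fixed_variance + random_variance + LOGIT_RESIDUAL_VARIANCE
    return fixed_variance / total


def conditional_r2(fixed_variance: float, random_variance: float) -> float:
    """Variance explained by fixed and random effects together."""
    total = fixed_variance + random_variance + LOGIT_RESIDUAL_VARIANCE
    return (fixed_variance + random_variance) / total


def fixed_variance_from_marginal(r2_marginal: float, random_variance: float) -> float:
    """Solve marginal_r2(...) = r2_marginal for the fixed-effect variance."""
    return r2_marginal * (random_variance + LOGIT_RESIDUAL_VARIANCE) / (1.0 - r2_marginal)


adult_fixed = fixed_variance_from_marginal(0.36, 0.308)
adult_r2c = conditional_r2(adult_fixed, 0.308)

child_fixed = fixed_variance_from_marginal(0.113, 0.7514)
child_r2c = conditional_r2(child_fixed, 0.7514)

if __name__ == "__main__":
    print(f"adults:   R2c = {adult_r2c:.3f}")
    print(f"children: R2c = {child_r2c:.3f}")
```

Under these assumptions the recovered values fall close to the reported R²c of 0.41 for adults and 0.28 for children, which is consistent with the numerator differing only by the speaker variance.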

The second column of Table 4.5 shows the R²m values, which were individually calculated
using the Multi-Model Inference (MuMIn) package in R (Bartoń 2015). In
the second, third, fourth, and fifth group, the additive effect of each variable on the
R²m was calculated, that is, the figures in column 2 are the increasing R²m value of
the model as more variables are added to the model (X₁ → X₁ X₂ → X₁ X₂ X₃ . . . ).
The remaining columns show the additional contribution of the predictors to the
R²m value of the model. For example, SV order (X₂) contributes .037 to a model
which has only SV order and Priming as predictors ((X₂ X₃) - X₃). The k rows
show the average predictive value of the variables. For example, the average
predictive value of SV order at k = 1 is .085, that is, .201 + .037 + .100 + .000 divided by the
number of other predictors, that is, 4 ((X₁ + X₃ + X₄ + X₅) / 4). These values are discussed
in more detail in Azen & Traxel (2009: 327–9). The results of the dominance
analysis show the relative predictive power of the variables: Transitive (.166) >
Priming (.085) > Co-referential pronoun (.046) > SV order (.034) > Actualized
(.025), that is, if the clause is transitive, this is the most powerful predictor of
subject marking. If a subject marker is used in a previous clause which has a full
subject, this is the next most powerful predictor of the use of subject marking in
the clause, etc.
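
The averaging procedure described above can be sketched in Python. The code is a generic implementation of general dominance (average additional contribution within each subset size, then averaged across sizes) and is not the manual calculation carried out for the chapter. For brevity it is applied to three predictors only, using the adult R²m values for the corresponding subsets in Table 4.5, so the resulting figures are not the five-predictor values reported in the text. The function name and the predictor labels are illustrative.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable


def general_dominance(r2: Dict[FrozenSet[str], float],
                      predictors: Iterable[str]) -> Dict[str, float]:
    """Average additional contribution of each predictor to R².

    `r2` maps every subset of predictors (including the empty set)
    to the R² of the model containing exactly that subset.
    """
    predictors = list(predictors)
    result = {}
    for target in predictors:
        others = [p for p in predictors if p != target]
        level_means = []
        for k in range(len(others) + 1):
            # Gain from adding `target` to each model with k other predictors.
            gains = [
                r2[frozenset(subset) | {target}] - r2[frozenset(subset)]
                for subset in combinations(others, k)
            ]
            level_means.append(sum(gains) / len(gains))
        result[target] = sum(level_means) / len(level_means)
    return result


# Adult R²m values for the subsets of X1 (transitive), X2 (SV order)
# and X4 (priming), taken from Table 4.5.
R2M = {
    frozenset(): 0.0,
    frozenset({"transitive"}): 0.199,
    frozenset({"sv_order"}): 0.028,
    frozenset({"priming"}): 0.092,
    frozenset({"transitive", "sv_order"}): 0.229,
    frozenset({"transitive", "priming"}): 0.287,
    frozenset({"sv_order", "priming"}): 0.128,
    frozenset({"transitive", "sv_order", "priming"}): 0.322,
}

scores = general_dominance(R2M, ["transitive", "sv_order", "priming"])

if __name__ == "__main__":
    for name, value in sorted(scores.items(), key=lambda item: -item[1]):
        print(f"{name}: {value:.3f}")
```

A useful property of this measure is that the scores sum to the R² of the full model, which gives a quick consistency check on any manually assembled table of subset values.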

4.4.3.2 Children
The occurrence of subject marking for fifty-three child speakers of Gurindji Kriol
in 2,975 clauses according to the predictors is given in Table 4.6. The
output of the GLMM for the child data is given in Table 4.7. The significant results
are bolded. The analysis shows that subject marking occurs 43% (n=1283) of
the time (which is higher than in the adults), and is more likely if the clause is
transitive (p<0.001), the previous nominal is also marked (p<0.001), and a
co-referential pronoun is present (p<0.001). The model explains a reasonable amount
of variation in the data set (R²c=0.28). Note that, like the adult data, animacy was
also not significant, but unlike the adult data, word order and event actualization
were also not significant. These differences from the adult data will be discussed in
section 4.4.4.
A dominance analysis was performed and included the non-significant vari-
ables of actualized and SV word order simply to draw parallels with the adult data.
The results, shown in Table 4.8, show the relative predictive power of the vari-
ables: Transitive (.049) > Priming (.041) > Co-referential pronoun (.013) >
Actualized (.008—non-significant) > SV order (.008—non-significant). Note
that the relative order of the significant variables is the same as for adults, that
is, Transitive > Priming > Co-referential pronoun.

Table 4.5. Relative effect of the significant predictors according to dominance analysis

                           Additional contribution of fixed effects
Subset model       R²m     Transitive  SV order  Coreferential  Priming  Actualized
                           (X1)        (X2)      (X3)           (X4)     (X5)
k = 0 average              .199        .028      .054           .092     .000
X1                 .199    -           .201      .194           .195     .208
X2                 .028    .030        -         .011           .036     .028
X3                 .054    .048        .037ᵃ     -              .054     .055
X4                 .092    .088        .100      .092           -        .092
X5                 .000    .009        .000      .001           .000     -
k = 1 average              .044        .085ᵇ     .075           .071     .096
X1X2               .229    -           -         .033           .093     .008
X1X3               .248    -           .014      -              .084     .011
X1X4               .287    -           .035      .045           -        .009
X1X5               .208    -           .029      .051           .088     -
X2X3               .065    .197        -         -              .098     .001
X2X4               .128    .194        -         .035           -        .000
X2X5               .028    .209        -         .038           .100     -
X3X4               .146    .186        .017      -              -        .001
X3X5               .055    .204        .011      -              .055     -
X4X5               .092    .204        .036      .055           -        -
k = 2 average              .199        .024      .043           .086     .005
X1X2X3             .262    -           -         -              .080     .010
X1X2X4             .322    -           -         .020           -        .010
X1X2X5             .237    -           -         .035           .095     -
X1X3X4             .332    -           .010      -              -        .010
X1X3X5             .259    -           .013      -              .083     -
X1X4X5             .296    -           .036      .046           -        -
X2X3X4             .163    .179        -         -              -        .000
X2X3X5             .066    .206        -         -              .097     -
X2X4X5             .128    .204        -         .035           -        -
X3X4X5             .147    .185        .016      -              -        -
k = 3 average              .194        .019      .034           .089     .008
X1X2X3X4           .342    -           -         -              -        .016
X1X2X3X5           .272    -           -         -              .086     -
X1X2X4X5           .332    -           -         .026           -        -
X1X3X4X5           .342    -           .016      -              -        -
X2X3X4X5           .163    .195        -         -              -        -
k = 4 average              .195        .016      .026           .086     .016
X1X2X3X4X5         .358    -           -         -              -        -
Overall average            .166        .034      .046           .085     .025

ᵃ (X2 X3) - X3
ᵇ (X1 + X3 + X4 + X5) / 4

Table 4.6. Occurrence of subject marking in child Gurindji Kriol speakers according to predictors

        Transitive    SV Order      Animate       Priming       Actualized    Corefer       TOTAL
NOM     no     yes    VS     SV     A      I      no     yes    no     yes    no     yes
no      1194   498    101    1591   1615   77     1288   404    1489   203    1054   638    1692
yes     653    630    71     1212   1236   47     576    707    1120   163    658    625    1283
%       35     56     41     43     43     38     31     64     43     45     38     49     43

Table 4.7. Output of generalized linear mixed model analysis on 2,975 tokens

Random effects    Name           Variance    Std. Dev.
Speaker           (Intercept)    0.7514      0.8668

Analysis conducted on 2,975 grammatical subjects, fifty-three speakers

Fixed effects     Estimate    Std. Error    z value    p value
(Intercept)       1.45894     0.24122       6.048      < 0.001
Transitive        1.01318     0.08926       11.351     < 0.001
SV order          0.23995     0.19073       1.258      0.20838
Animate           0.31779     0.21602       1.471      0.14125
Co-referential    0.31724     0.09360       3.389      < 0.001
Primed            0.98983     0.09040       10.949     < 0.001
Actualized        0.18714     0.13185       1.419      0.15581

4.4.4 Discussion

The overall question posed by this chapter is whether a change in the complexity
in the expression of subject marking has occurred across two generations of
Gurindji Kriol speakers. This question is set against the backdrop of broader
theoretical questions about how to measure complexity in cases of overabundance,
and whether all language contact leads to simplification. Taken together, these
questions allow us to determine whether changes have taken place in
subject marking in Gurindji Kriol, and why these changes might have occurred.
The question of whether there has been a change in complexity of subject
marking was modelled using GLMM analysis. The results show three predictors
in common for adults and children. First, transitive subjects such as in (8) are
significantly more likely to be marked than intransitive subjects such as in (9).
Second, whether a nominal subject is marked primes the appearance of the nominative in
the next occurrence of a nominal subject. An example is given in (10) of
sequential clauses containing nominal subjects with overt nominative marking.
Third, subject marking is more likely when a co-referential pronoun is present,
as shown in (11) in comparison with (12), which does not have a co-referential
pronoun.

(8) Warlaku-ngku bait-im marluka leg-ta

dog- bite- old.man leg-
‘The dog bites the old man on the leg.’ (SS: FHM051: 1:37min)

(9) Dat warlaku bin kutij nyantu-ranyj

the dog  stand 3-
‘The dog stood on its own.’ (CE: FHM014: 2:24min)

Table 4.8. Relative effect of the significant predictors according to dominance analysis

                           Additional contribution of fixed effects
Subset model       R²m     Transitive  SV order  Coreferential  Priming  Actualized
                           (X1)        (X2)      (X3)           (X4)     (X5)
k = 0 average              .047        .000      .006           .048     .000
X1                 .047    -           .047      .041           .056     .047
X2                 .000    .047        -         .001           .001     .000
X3                 .006    .041        .007      -              .008     .006
X4                 .048    .056        .046      .050           -        .049
X5                 .000    .047        .000      .000           .001     -
k = 1 average              .048        .025      .023           .017     .026
X1X2               .047    -           -         .006           .057     .000
X1X3               .047    -           .006      -              .064     .005
X1X4               .104    -           .000      .007           -        .001
X1X5               .047    -           .000      .005           .058     -
X2X3               .007    .046        -         -              .050     .051
X2X4               .049    .055        -         .008           -        .000
X2X5               .000    .047        -         .058           .049     -
X3X4               .056    .055        .002      -              -        .000
X3X5               .006    .046        .052      -              .050     -
X4X5               .049    .056        .000      .007           -        -
k = 2 average              .051        .010      .015           .055     .010
X1X2X3             .053    -           -         -              .059     .017
X1X2X4             .104    -           -         .008           -        .001
X1X2X5             .047    -           -         .023           .058     -
X1X3X4             .111    -           .001      -              -        .001
X1X3X5             .052    -           .018      -              .060     -
X1X4X5             .105    -           .000      .007           -        -
X2X3X4             .057    .055        -         -              -        .001
X2X3X5             .058    .012        -         -              .000     -
X2X4X5             .049    .056        -         .009           -        -
X3X4X5             .056    .056        .002      -              -        -
k = 3 average              .045        .005      .012           .044     .005
X1X2X3X4           .112    -           -         -              -        .001
X1X2X3X5           .070    -           -         -              .043     -
X1X2X4X5           .105    -           -         .008           -        -
X1X3X4X5           .112    -           .001      -              -        -
X2X3X4X5           .058    .055        -         -              -        -
k = 4 average              .055        .001      .008           .043     .001
X1X2X3X4X5         .113    -           -         -              -        -
Overall average            .049        .008      .013           .041     .008

(10) Najan kujarra-ngku dei bin gon jeij-im im

another two- 3.  go chase- 3.
ankaj yapakayi najan kujarra-ngku na rarraj
poor.thing small another two-  run
‘Another two went chasing the poor little thing. Two more run then.’
(RR: FM009.A: 6:11min)

(11) Jintaku warlaku-ngku i bin bait-im im

one dog- 3.  bite- 3.
marluka la leg-ta
man  leg- 
‘One dog bit a man on the leg.’ (AC: FHM052: 1:58min)

(12) Dat warlaku bin bait-im im leg-ta dat marluka

the dog  bite- 3. leg- the man
‘The dog bit the man on the leg.’ (SS: FHM065: 4:53min)

Adults had two more significant predictors of subject marking than children:
word order, that is, Gurindji Kriol-speaking adults are more likely to
mark subjects when they occur after the verb, as shown in (13) as opposed to (14);
and event actualization, that is, subjects were less likely to be marked when the
event was not actualized, as demonstrated in (15), which uses the potential
auxiliary, and (16), which has a verb marked continuative.

(13) I=m put-im jumok tebul-ta igin dat kajirri-ngku

3.= put- smoke table- too the woman-
‘The woman puts the smokes on the table.’ (LS: FHM066: 0:19min)

(14) Dat kajirri i=m put-im jumok jiya-ngka

the woman 3.= put- smoke chair-
‘The woman puts the smokes on the chair.’ (CA: FHM127: 2:24min)

(15) Dat karu-ma mirlarrang-jawung i garra jarrwaj

The child- spear- 3.  spear
im jamut
3. turkey
‘The child will shoot the turkey with a spear.’ (RR: FHM061: 3:10min)

(16) Dat warlaku i bin hard-im-bat-karra nyanuny wartan

the dog 3.  hurt--- 3. paw
‘The dog hurt his paw.’ (DO: FM15_55_1b: 1:42min)

In terms of E-complexity, both the adult system and the child system display
overabundance, while traditional Gurindji uses subject marking obligatorily.
Nonetheless the adult Gurindji Kriol system requires attention to a greater
number of variables to make decisions about the application of subject marking.
Thus the subject marking system seems to have complexified, in the sense of
I-complexity, at the point of contact with the genesis of the mixed language
(represented in the adult speech), then simplified in the next generation. The
child system seems to be a refined version of the adult system. Of the three
variables in common, the relative predictive power of variables is the same:
transitivity > priming > use of co-referential pronoun. For two of those
predictors, priming and co-referential pronoun, subject-marking usage seems
stable across the generations. For adults, 60% of primed subjects are marked
compared with 28% of unprimed subjects; for children, 64% of primed subjects
compared with 31% of unprimed subjects. Similarly for adults, 48% of subjects
with co-referential pronouns are marked compared with 28% of subjects without
co-referential pronouns; and for the children, 49% of subjects with co-referential
pronouns compared with 38% of subjects without co-referential pronouns. Thus
the influence of priming and the use of co-referential pronoun seem quite stable
diachronically. On the other hand, transitivity, which is the strongest predictor of
subject marking for both adults and children, shows larger differences across the
generations. For adults, 57% of transitive subjects are marked compared with 16% of intransitive
subjects; for children, 56% of transitive subjects compared with 35% of intransitive
subjects.
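
The percentages in this comparison can be recomputed from the counts in Tables 4.3 and 4.6. The sketch below does so for the three predictors shared by adults and children and reports, for each, the gap in marking rate between the two levels of the predictor. The dictionary keys and function names are illustrative; each entry is a pair of (marked, unmarked) counts copied from the tables.

```python
from typing import Dict, Tuple

Counts = Dict[str, Tuple[int, int]]  # level -> (marked, unmarked)

ADULT: Counts = {
    "transitive": (1110, 834), "intransitive": (269, 1362),
    "primed": (699, 454), "unprimed": (680, 1742),
    "coref": (908, 975), "no_coref": (471, 1221),
}

CHILD: Counts = {
    "transitive": (630, 498), "intransitive": (653, 1194),
    "primed": (707, 404), "unprimed": (576, 1288),
    "coref": (625, 638), "no_coref": (658, 1054),
}


def rate(marked: int, unmarked: int) -> float:
    """Percentage of subjects carrying the subject marker."""
    return 100.0 * marked / (marked + unmarked)


def gap(counts: Counts, level_a: str, level_b: str) -> float:
    """Difference in marking rate (percentage points) between two levels."""
    return rate(*counts[level_a]) - rate(*counts[level_b])


PAIRS = [("transitive", "intransitive"), ("primed", "unprimed"), ("coref", "no_coref")]

if __name__ == "__main__":
    for a, b in PAIRS:
        print(f"{a} vs {b}: adults {gap(ADULT, a, b):.1f}, "
              f"children {gap(CHILD, a, b):.1f} percentage points")
```

Comparing gaps rather than single percentages makes it easier to see which predictors behave alike in the two generations and which have shifted.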
We argue that differences in the importance of transitivity, coupled with the
loss of SV order as a predictor of subject marking in the children’s speech, are the
results of decreasing contact with Gurindji. First, the subject marking in Gurindji
Kriol finds its origins in the Gurindji ergative marker, which marked only
transitive subjects. Many members of the first generation of Gurindji Kriol
speakers only used subject marking for transitive subjects, although it was clearly
beginning to spread to intransitive subjects. For child speakers of Gurindji Kriol,
this spread to intransitive subjects is much more entrenched, suggesting that the
original influence of the Gurindji ergative pattern is waning. Second, the loss of
SV order as a significant variable reinforces the argument that there is decreasing
contact with the Gurindji system. In general, SV order is more dominant for child
speakers (only 5% of transitive clauses show VS order compared with 12% for adult speakers),
reflecting the Kriol system of argument disambiguation. For adult speakers,
ergative marking is more likely in VS clauses, which reflects the continuing
interplay of the Gurindji and Kriol systems of argument disambiguation. This
influence has been lost in child speakers.

4.5 Concluding remarks

This study has shown that complexification occurred in the area of subject
marking in Gurindji Kriol in the intense contact period which saw its genesis.
Subject marking was borrowed from Gurindji and transformed from obligatory
to variable marking, leading to a situation of overabundance, that is, a
proliferation of cell-mates (E-complexity). Overabundance required speakers to
monitor other linguistic features in the clause and discourse more broadly
(transitivity, SV order, the marking of the previous nominal subject, the presence
of a co-referential pronoun, and event actualization), rather than just the phonological
composition of the stem, as is the case in Gurindji (I-complexity). Another
generation on, only three of these variables are now relevant: transitivity, the
presence of a co-referential pronoun, and priming. We argue that changes in the
relative importance of transitivity and SV order in the children’s speech, and
therefore simplification in the exponence of overabundance, are the result of
decreasing contact with Gurindji.
This chapter demonstrates that language contact does not always lead to the
simplification of morphology, and in the case of overabundance, complexity,
that is, the degree of variation in the expression of a form within the cell of a
paradigm, can be a result of language contact. In the situation outlined by this
chapter, the intense contact between Gurindji and Kriol argument marking
systems which led to the formation of Gurindji Kriol also saw the development
of a system of subject marking which was derived from Gurindji but was more
complex than the obligatory marking system of Gurindji. The new generation of
Gurindji Kriol speakers has less access to Gurindji, that is, there are fewer speakers
of Gurindji in their linguistic environment and they have had fewer years of
exposure to Gurindji than the adult speakers. The result has been a simplification
of overabundance where the system is no longer an interplay between the
Gurindji and Kriol systems of argument disambiguation (i.e., SV order no
longer predicts subject marking), and there is an increase in the marking of
intransitive subjects, which is far removed from the function of the original
Gurindji ergative marker.

Acknowledgements

The data collection (see section 4.4.1) was funded by the Aboriginal Child Language
(ACLA) project from 2004 to 2007; the Jaminjungan and Eastern Ngumpin DoBeS
project from 2007 to 2008 (available in the DoBeS archives: http://dobes.mpi.nl/projects/jaminjung/);
a Hans Rausing Endangered Languages Project from 2008 to
2010 (IPF0134; available in the ELAP archive: http://elar.soas.ac.uk/deposit/0273); an
Australian Research Council APD project from 2009 to 2012 (DP0985024); and an
Australian Research Council DECRA project from 2014 to 2017 (DE140100854). As
well as Cassandra Algy, a number of language consultants were instrumental in the
collection of data: Samantha, Lisa, Rosie & Leanne Smiler, Cecelia Edwards, and
Ronaleen & Anne-Marie Reynolds. We are also grateful for the support of Appen, in
particular to Simon Hammond for technical support.

5
Derivation and the morphological
complexity of three French-based creoles
Fabiola Henri, Gregory Stump, and Delphine Tribout

5.1 Introduction

The claim of creole simplicity is pervasive in linguistics. This claim harks back to
the nineteenth-century view that linguistic complexity correlates with the prop-
erties of a language’s inflectional morphology and with its age (DeGraff 2001).
According to this view, isolating languages are ‘primitive’ in comparison with
synthetic languages, whose morphology is taken as evidence of heightened com-
plexity. Modern creolistic literature abounds with such assumptions. Creoles are
seen as newborn languages that emerge from rudimentary pidgins embodying a
break in the transmission of the lexifier. As such, they constitute a kind of
transition between primitive pidgin ‘protolanguages’ and mature languages
(Bickerton 1981). Complementing this view of creoles as ‘young’ languages are
comparisons with ‘complex’ languages that purportedly reveal creoles to be ‘the
world’s simplest grammars’ on the grounds that they exhibit no, or at most, insig-
nificant vestiges of the lexifier’s system of inflectional marking (Seuren & Wekker
1986; Bickerton 1988; McWhorter 2001; Parkvall 2008; Bakker 2014; among others).
As has been argued elsewhere (DeGraff 2001; Mufwene 2008; Blasi et al. 2017),
these assertions rest upon several controversial assumptions that may be ques-
tioned on empirical, theoretical, and sociohistorical grounds. In the domain of
morphology, for example, the received view that creoles are maximally isolating
has been decisively disconfirmed by unequivocal evidence of inflectional morph-
ology in many creoles (Kihm 1994; DeGraff 2001; Bakker 2003; Baptista 2003a,
2003b; Roberts & Bresnan 2008; among others). It is true that a creole may exhibit
less morphology than its lexifier,¹ but does this entail that it is less complex?

¹ Studies relating to the morphological complexity of creoles usually rely on comparisons with the
lexifiers rather than with the contributing substrates. A combination of factors has given rise to this
preference. First, the formation of a creole usually involves one contributing lexifier, but may involve
several substrates whose contributions to the creole’s formation are hard to evaluate in terms of
proportion. In the absence of adequate historical documentation, we cannot always attribute particular
contributions to particular substrate languages. Even so, we can definitely affirm that the substrates of

Fabiola Henri, Gregory Stump, and Delphine Tribout, Derivation and the morphological complexity of three French-based
creoles In: The Complexities of Morphology. Edited by: Peter Arkadiev and Francesco Gardani, Oxford University Press (2020).
© Fabiola Henri, Gregory Stump, and Delphine Tribout.
DOI: 10.1093/oso/9780198861287.003.0005

Morphological complexity is often equated with numerousness—of morphs,

categories, processes, or paradigm cells; but this is not the only way of measuring
complexity, nor is it in general the most enlightening way (Ackerman & Malouf
2013; Stump 2017).
In this chapter, we draw upon an alternative conception of morphological
complexity which we apply to a language’s system of derivational morphology.
Drawing on a precise analysis of their deverbal derivation, we argue that three
French-based creoles (Mauritian, Guadeloupean, and Haitian) display an unex-
pected degree of morphological complexity. We detail our conception of mor-
phological complexity in section 5.2, and in section 5.3, we discuss the issue of
creole simplicity. In section 5.4, we examine the morphology of French (the lexifier
language of these creoles) and that of the creoles themselves. In section 5.5, we
define our theoretical framework. Finally, section 5.6 presents our new analysis of
deverbal nominalizations in Mauritian, Guadeloupean, and Haitian.

5.2 Morphological complexity

Various perspectives have informed recent discussions of the notion of linguistic
complexity (Dahl 2004; Hawkins 2004; Miestamo et al. 2008; Sampson et al. 2009;
Newmeyer & Preston 2014; and Baerman et al. 2015a). On the one hand, the
complexity of a linguistic phenomenon may be seen in psycholinguistic terms as
the extent of the difficulties that it poses for a language’s learners and users. On the
other hand, complexity may be seen in more absolute terms as an independently
measurable property of the language system itself, separable, in principle, from
issues of acquisition, production, and processing (though no doubt correlated
with them in discoverable ways). Moreover, linguistic complexity is logically of at
least two types (Ackerman & Malouf 2013): a linguistic phenomenon’s enumera-
tive complexity depends on how many categories (of whatever type) it employs; its
integrative complexity, by contrast, depends on the idiosyncrasy of the inter-
actions among those categories. A language’s morphology can exhibit complexity
in a variety of ways. The most intensively studied kinds of complexity involve
either the morphotactics of individual word forms (whose enumerative complex-
ity is a function of degree of synthesis and degree of fusion; Schlegel 1808;
Humboldt 1836; Sapir 1921; Greenberg 1960; Bickel & Nichols 2013) or the
structure of whole inflectional paradigms (whose integrative complexity is a
function of the predictability of a paradigm’s word forms; Moscoso del Prado
Martín et al. 2004; Ackerman et al. 2009; Milin et al. 2009; Ackerman & Malouf

Caribbean creoles differ from those of Indian Ocean creoles. Moreover, creolistics has a history of
Eurocentrism, which has favoured the comparison of creole grammars with the more familiar
grammars of their Indo-European lexifiers.

2013; Stump & Finkel 2013). But a language’s morphology may exhibit other
kinds of complexity as well.²
Here, we are concerned with the integrative complexity of a language’s morph-
ology as reflected by the interaction of a lexeme’s inventory of forms with its
participation in deverbal derivation. In general, a language’s derivational morph-
ology may exhibit complexity in two different dimensions. In order to distinguish
these, it is useful to distinguish not only between a derivational relation’s base lexeme and
derived lexeme, but also between the relation’s base stem and derived stem, that is, the
specific stems of the base and derived lexemes whose morphology participates in
the formal expression of their derivational relation. Thus, the derivational relation
of the base lexeme THIEF to the derived lexeme THIEVISH is formally expressed by
means of the relation of the base stem thiev- to the derived stem thievish. Given
these distinctions, the first dimension of a derivational relation’s complexity is that
of the predictability of the base lexeme’s base stem; the second dimension is that of
a base stem’s restrictedness in the morphology of the base lexeme. Consider first
the dimension of base-stem predictability.
In discussing this dimension, we make the uncontroversial assumption
(Aronoff 1994; Stump 2001) that a lexeme L has a stem set whose members
serve in the definition of both (i) the inflected word forms constituting L’s
inflectional paradigm; and (ii) the stem sets of lexemes derived from L. In general,
we assume that a lexeme’s stem set may include both free and bound stems. On
this assumption, the complexity of a particular derivational relation depends on
which member of the base lexeme’s stem set is its base stem in that relation. In the
simplest cases—those whose complexity is of degree 0—the base stem for a base
lexeme L in a particular derivational relation is the only member of L’s stem set.
From this endpoint of maximal simplicity, successively greater degrees of com-
plexity can be calibrated. In cases of derivation exhibiting complexity of degree 1
or 2, the base lexeme in a particular derivational relation possesses more than one
stem, only one of which serves as its base stem in that relation. In cases exhibiting
complexity of degree 0 or 1, the base lexeme’s base stem is predictable; in cases
exhibiting complexity of degree 2, the base lexeme’s base stem is unpredictable.
Thus, instances of derivation may evince three degrees of increasing complexity,
as in Figure 5.1.
This first notion of complexity calls to mind those approaches to complexity
based on information theory (Arkadiev & Gardani, Chapter 1, this volume); in
such approaches, complexity arises from a lack of predictability among a system’s
parts. In assessing complexity of this sort in a system of inflection classes, the
parts at issue are an inflectional paradigm’s cells (cf. Parker & Sims, Chapter 2,
this volume); here, by contrast, the parts at issue are those members of a base

² See Stump (2017) for a discussion of the wide range of possible measures of morphological
complexity.

Complexity   Is base lexeme’s base stem   Degree   Example                         Cardinality of base
             in R predictable?                                                     lexeme’s stem set

low          yes                          0        boy → boyish                    1

 ↕           yes                          1        man (~ men) → mannish           >1
                                                   goose (~ geese) → goosish

high         no                           2        self ~ selve(s) → selfish BUT   >1
                                                   thief ~ thieve(s) → thievish

Figure 5.1. Degrees of complexity in the predictability of a base lexeme’s base stem
in a particular derivational relation R

lexeme’s stem inventory available for the definition of a derived lexeme’s stem
inventory. By this criterion, the derivational relation between boy and boyish is
least complex, since the stem on which boy-ish is based is the only available choice,
the sole stem of boy; the derivational relation between man and mannish (or
between goose and goosish) is more complex, since the stem on which mann-ish
(or goos-ish) is based is not the only available choice, though it does conform to a
general pattern favouring the use of the singular form’s stem; and the relation
between thief and thievish is most complex, since the stem on which thiev-ish is
based is not the only available choice and actually fails to conform to the general
pattern favouring the use of the singular form’s stem.
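
The three degrees of base-stem predictability just described can be restated procedurally. The following Python sketch is purely illustrative: the function name `predictability_degree`, the orthographic stem spellings, and the `default_stem` argument (standing for the stem that the general pattern would select, assumed here to be the stem of the singular form) are simplifying assumptions introduced for the example, not part of the analysis itself.

```python
def predictability_degree(stem_set, base_stem, default_stem):
    """Return 0, 1, or 2 for the predictability of a base stem.

    stem_set     -- all stems of the base lexeme (hypothetical spellings)
    base_stem    -- the stem actually used in the derivational relation
    default_stem -- the stem the general pattern would select
                    (assumption: the stem of the singular form)
    """
    if base_stem not in stem_set:
        raise ValueError("base stem must belong to the lexeme's stem set")
    if len(stem_set) == 1:
        return 0  # the sole stem is the only available choice
    if base_stem == default_stem:
        return 1  # several stems, but the general pattern selects the right one
    return 2      # several stems, and the general pattern fails


# Each entry: (stem set, base stem used in derivation, stem of singular form)
examples = {
    "boy -> boyish": ({"boy"}, "boy", "boy"),
    "man -> mannish": ({"man", "men"}, "man", "man"),
    "goose -> goosish": ({"goose", "geese"}, "goose", "goose"),
    "thief -> thievish": ({"thief", "thiev"}, "thiev", "thief"),
}

degrees = {name: predictability_degree(*args) for name, args in examples.items()}
for name, degree in degrees.items():
    print(f"{name}: degree {degree}")
```

On this simplified encoding, only the relation between thief and thievish receives degree 2, because the stem it uses departs from the default choice.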
The second dimension of a derivational relation’s integrative complexity is that
of base-stem restrictedness. Where X is the particular member of a lexeme L’s
stem set that serves as L’s base stem in a particular derivational relation, how
restricted a role does X play in the morphology of L? In the simplest cases (e.g.,
that of English grass → grassy), X is L’s only stem and therefore has an unrestricted
role in the morphology of L. In more complex cases (e.g., that of English
leaf [~ leave(s)] → leafy), a base lexeme L’s base stem in a particular derivational
relation is only used in the realization of certain cells in L’s inflectional paradigm,
so that its role in L’s inflectional morphology is restricted according to the
morphosyntactic property set to be realized. In the most complex cases (e.g.,
that of English louse /laʊs/ → lousy /laʊzi/), a base lexeme L’s base stem is ‘hidden’
to the extent that it has no role at all in the inflection of L but is reserved for
defining the stems of some or all lexemes deriving from L. This second dimension
of complexity is schematized in Figure 5.2, where we again distinguish three
degrees of complexity.
This second notion of complexity is qualitative in the sense that it equates
complexity with deviation from a canonical ideal (cf. Nichols, Chapter 7, this
volume)—specifically, it equates complexity with deviation from a canonical
pattern in which the stem that defines a derived lexeme’s form also defines the

Complexity   Degree   Role of X in L’s morphology                      Example

low          0        Unrestricted because X is L’s sole stem          grass → grassy

 ↕           1        In the inflection of L, X is restricted to       leaf [~ leave(s)] → leafy
                      the realization of certain morphosyntactic
                      property sets

high         2        X is not used in the inflection of L, but        louse /laʊs/ → lousy /laʊzi/
                      is restricted to the definition of stems of
                      derivatives of L

Figure 5.2. Degrees of complexity in the restrictedness of stem X in the morphology
of lexeme L, where X serves as L’s base stem in a particular derivational relation

base lexeme’s inflected forms. By this criterion, the derivational relation between
grass and grassy is least complex, since the stem on which grass-y is based is
employed in both inflected forms of grass; the derivational relation between leaf
and leafy is more complex, since the stem on which leaf-y is based is only
employed in one of the inflected forms of leaf; and the relation between louse
and lousy is most complex, since the stem on which lous-y is based isn’t employed
in either of the inflected forms of louse.
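
The second dimension can be sketched in the same way. In the hypothetical Python function below, a lexeme's inflection is represented as a mapping from paradigm cells to the stems that realize them. The function name `restrictedness_degree`, the cell labels, and the stem spellings are assumptions made for illustration, and the sketch identifies 'sole stem' with 'the only stem used in inflection', which is a simplification.

```python
def restrictedness_degree(inflected_stems, base_stem):
    """Return 0, 1, or 2 for the restrictedness of a base stem.

    inflected_stems -- dict mapping each paradigm cell to the stem
                       realizing it (hypothetical representation)
    base_stem       -- the stem used in the derivational relation
    """
    used_in_inflection = set(inflected_stems.values())
    if base_stem not in used_in_inflection:
        return 2  # 'hidden' stem: reserved for derivation only
    if used_in_inflection == {base_stem}:
        return 0  # unrestricted: the lexeme's sole stem
    return 1      # restricted to some cells of the paradigm


grass = restrictedness_degree({"SG": "grass", "PL": "grass"}, "grass")
leaf = restrictedness_degree({"SG": "leaf", "PL": "leav"}, "leaf")
louse = restrictedness_degree({"SG": "laʊs", "PL": "laɪs"}, "laʊz")

print(grass, leaf, louse)
```

The three calls correspond to the grass, leaf, and louse examples of Figure 5.2 and return increasing degrees of restrictedness.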

5.3 Creole simplicity

According to Seuren (1998: 292–3), ‘if a language has a Creole origin it is SVO, has
TMA particles, [and] has virtually no morphology’. Claims of this kind reflect an
ideology about creoles that finds its origin in the eighteenth century, when creoles
were described as ‘corrupt’ and ‘deficient’ compared to exemplary grammars such
as that of Latin. These deficiencies were presumed to result from the inability of
Africans to acquire the grammatical intricacies of European languages (Bertrand-
Bocandé 1849; Baissac 1880; see also Meijer & Muysken 1977 for discussion).
With the advent of generative grammar, Bickerton (1981) formulated the
Language Bioprogram Hypothesis, a theory that sees the process of creolization
as the complexification of a pidgin that creole children are exposed to. A pidgin,
according to Bickerton, is an unstable form of communication that results from a
simplification of the lexifier language by adults during the process of second-
language acquisition. The contact languages emerging from this sort of process
come closest to revealing Universal Grammar in its naked form, embodying ‘the
world’s simplest grammars’ (McWhorter 2001).³

³ Although McWhorter’s (2001) claim is about creoles, both pidgins and creoles are generally
characterized as simple languages (Romaine 1988). Bickerton’s (1988) hypothesis, however, ranks
pidgins as the simpler of the two, since pidgins are not systematic. On his view, it is as an effect of
UG that a pidgin is creolized. Research has cast doubt on this generalization. Rich inflection can be

Simplification, as evoked in creole studies, is often associated with morphology,
particularly inflectional morphology: a creole is identified as a type of language
that exhibits semantically regular derivational affixation but no inflectional affix-
ation (McWhorter 1998). More generally, McWhorter (1998, 2001, 2011, among
others) claims that simplification of inflectional morphology is an effect of a ‘break
in transmission’. In Chapter 10, this volume, McWhorter elaborates on the
hypothesis that ‘radical analyticity’ in creoles, Sinitic, Niger-Congo, and some
Austronesian languages stems from the drastic elimination of inflection, in par-
ticular, contextual inflection during extensive adult acquisition. This peculiar kind
of ‘unnatural’ change is not comparable to the processes of grammaticalization
witnessed in languages like English or French. While these are more
analytic than their ancestors, both of these languages retain agreement and
complex expression of inherent inflection via root allomorphy. A similar claim
is made by Grant (2009), who posits that simplicity is a reduction in the allomor-
phy found in the lexifier’s system to a sufficient extent that the emerging pidgin/
creole shows no inflectional marking. However, the evidence does not support
either of these conceptions of linguistic simplification. Contra McWhorter,
Palenquero does show agreement in adnominal adjectives (Schwegler 2013) and
even if many creoles have lost gender and number agreement, they have innovated
new contextual morphology most certainly influenced by their substrate lan-
guages: all varieties of Melanesian Pidgin feature a transitivity marker which is
suffixed to an English inherited lexicon (1).

(1) a. bild > bild-im haos
       build > build-TR house
    b. pei > pe-im skul yuniform
       buy > buy-TR school uniform
    c. let > let-em yu go
       let > let-TR you go (Arika 2012)

French-based creoles spoken in the Indian Ocean all exhibit contextual inflection
(see section 5.6.1.1 on Mauritian). As for the question of allomorphy, the
approach we adopt in the next sections is that languages do not merely eliminate
allomorphy. What appears in a new system in terms of forms is heavily dictated by
frequency and the identification of paradigmatic patterns that will subsequently
serve to make new forms. Such a perspective doesn’t warrant the existence of a
prior pidgin. As Mufwene (2008) points out, a closer examination of the facts
shows that creoles do not evolve from pidgins but rather from the approximation

found in pidgins, even more so than in some creoles (Bakker 2003). If a creole develops through the
nativization of a pidgin, as the Language Bioprogram Hypothesis holds, we would expect the creole to
be more complex than the pidgin from which it develops.

of a non-standard variety of the lexifier. Indeed, a recent study on the emergence
of creole languages questions whether the existence of a pidgin is a necessary
precursor to creolization, and suggests that contrary to common belief, emerging
creoles are not typologically distinct from other languages (Blasi et al. 2017).
In addition, the input for language learners in purely spoken settings differs
radically from that of guided settings, since an inflectional paradigm’s perceptible
distinctions are very different in speech and writing. Syncretism in a lexeme’s
paradigm is much more pervasive in speech than in writing. In spoken French,
only three forms are distinguished in the present indicative of first-conjugation
verbs (e.g., /mɑ̃ʒ/ eat.PRS.SG/3PL ~ /mɑ̃ʒɔ̃/ eat.PRS.1PL ~ /mɑ̃ʒe/ eat.PRS.2PL),⁴
making the form-function relationships quite opaque in purely spoken settings
(cf. section 5.4.1). And while some forms, like the simple past (passé simple), are
rare altogether in colloquial French, others, like the periphrastic future, are
preferred over synthetic forms (Abouda & Skrovec 2015, 2017). This is also true
of gender and number agreement, which is less perceptible in spoken French than
in written French.
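
The contrast between spoken and written distinctions can be made concrete by grouping the cells of a paradigm according to the form that realizes them. The Python sketch below does this for the present indicative of manger ‘to eat’. The helper name `cells_by_form` and the cell labels are assumptions introduced for illustration; the spoken forms are those cited above, and the written forms are the standard orthographic ones.

```python
from collections import defaultdict


def cells_by_form(paradigm):
    """Group paradigm cells by the form that realizes them."""
    groups = defaultdict(list)
    for cell, form in paradigm.items():
        groups[form].append(cell)
    return dict(groups)


# Present indicative of manger 'to eat'
spoken = {
    "1SG": "mɑ̃ʒ", "2SG": "mɑ̃ʒ", "3SG": "mɑ̃ʒ",
    "1PL": "mɑ̃ʒɔ̃", "2PL": "mɑ̃ʒe", "3PL": "mɑ̃ʒ",
}
written = {
    "1SG": "mange", "2SG": "manges", "3SG": "mange",
    "1PL": "mangeons", "2PL": "mangez", "3PL": "mangent",
}

spoken_groups = cells_by_form(spoken)
written_groups = cells_by_form(written)

print(len(spoken_groups), "distinct spoken forms")
print(len(written_groups), "distinct written forms")
```

Speech distinguishes three forms across the six cells, with a single form covering the whole singular and the third person plural, whereas writing distinguishes five.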
The stark differences between spoken French and the French of more guided
settings are clearly revealed by Cajun French, which derives from varieties of
spoken French dating from the period of colonialism both in the Americas and in
the Indian Ocean. Cajun French features extensive use of periphrastic expressions
comparable to those observed in the creoles. Such periphrasis allows differences of
tense, aspect, and mood (TAM) to be expressed without differences in synthetic
morphology; the form of the main verb manger ‘to eat’ remains unchanged in
periphrastic expressions such as vous-autres est après manger ‘you (PL) are eating’
and vous-autres va manger ‘you (PL) will eat’. Thus, verb paradigms in Cajun
French distinguish fewer synthetic forms than their counterparts in standard
French.
French-based creoles are likewise outgrowths of spoken French; as such, they
have not drastically simplified the French inflectional system, but have instead
developed a native verb alternation that resembles one salient in spoken forms of
the lexifier (Bonami et al. 2013). This is in line with recent empiricist approaches
that reject the language innateness hypothesis and favour an integrative view of
second-language acquisition according to which language learning relies on mul-
tiple factors, including innate learning abilities, prior knowledge of first language,
social setting, and perceptual and statistical mechanisms (see also Saffran et al.
1996 and Tomasello 2000).⁵ Finally, there is also the logical problem of language

⁴ In French, the first conjugation constitutes the largest conjugation as well as the most regular and
productive.
⁵ Other research on the emergence of language also suggests that aside from the human genetic
endowment for language acquisition, human beings possess a mathematical or computational
component for language creation and complexification (Hauser et al. 2002; Fitch & Hauser 2004; Gervain &
Mehler 2010).

simplification with regard to what has been identified as foreigner talk (Ferguson
1971). Foreigner talk refers to a simplified version of a language used by native
speakers when addressing non-natives; the omission of inflections is widespread
in these varieties (Hock & Joseph 1996). In any case, future creole speakers clearly
have no prior knowledge of the lexifier language before acquisition, which raises the
question of how they could have simplified it. These observations crucially
support the view that the input was already simplified.
The morphological complexity of creoles has generally been evaluated based on
comparisons with their lexifier languages using traditional views of morphology. It
must be said at the outset that the extent of a creole’s morphological complexity
cannot simply be equated with the extent to which it mirrors complex patterns in
the lexifier language; otherwise, as will be argued below, dimensions of complexity
in the creole that have no counterpart in the lexifier language may simply be
overlooked. This point is all the more crucial given that complexity can be
measured in more than one way.
Under a morpheme-based approach, a creole’s lexifier can be argued to be
morphologically complex because it distinguishes a large number of inflected
words, a large number of affixes, and, perhaps also, a large number of morphological
processes. By these measures, the morphology of the creole under comparison
appears much less complex. These measures, however, imply a particular
conception of what constitutes morphology. In the generative-transformational
tradition, it has been customary to see periphrasis as a syntactic construct; but
periphrasis has recently been argued to function as a kind of inflectional exponence
on a par with synthetic varieties of exponence (see Bonami 2015 and the references
cited therein). Under the assumption that not all morphology is synthetic morph-
ology, creole morphology takes on a higher degree of complexity, with larger arrays
of morphosyntactic properties, larger paradigms, and larger inventories of inflec-
tional exponents (Henri 2010; Kihm 2014; Henri & Kihm 2015).
Nevertheless, as we noted in section 5.2, the complexity of a system is not
simply enumerative; morphological complexity does not simply reduce to the
cardinality of its morphosyntactic properties, the size of its paradigms, or the
variety of its inflectional resources (Bonami et al. 2015). Even if creole inflectional
systems are smaller on average⁶ than those of their lexifiers, they exhibit a
comparable degree of integrative complexity. For example, Henri (2010) shows
that in Mauritian, the complementary environments in which a verb’s long and
short alternants appear cannot be characterized in morphological, syntactic, or
information-structural terms by complementary natural classes of properties

⁶ Verbs in both Mauritian and French exhibit alternating forms, but a Mauritian verb’s synthetic
paradigm is limited to two cells, neither of whose forms exhibits true affixation or any coherent
morphosyntactic content (Henri 2010); in French, by contrast, a verb’s synthetic paradigm exhibits
fifty-one cells, combinations of up to three inflectional affixes (e.g., i-r-i-ons ‘(we) would go’)
and arguably six morphosyntactic features (Bonami & Boyé 2003, 2007).

(cf. section 5.6.1). Mismatches of this kind have been argued to be an indicator of
integrative complexity (see Stump 2017: 70–1 and the references cited there).
Mauritian is likewise more complex when it comes to interpredictability, that is,
the difficulty of predicting one form based on knowledge of another (Henri 2010;
Bonami & Henri 2010; Bonami et al. 2011). Luís (2014) also shows that Indo-
Portuguese creoles exhibit different types of form-meaning mismatches in their
inflectional system. Korlai, for example, presents both class-specific syncretism
and paradigmatic opacity that affect morphosyntactic transparency (Bonami et al.
2013; Luís 2014). Comparable mismatches are found in other Portuguese-based
creoles spoken in Africa (Kihm 2014).
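
Interpredictability, understood as the difficulty of predicting one form from knowledge of another, is often quantified in information-theoretic work by conditional entropy. The Python sketch below shows the general calculation only; it is not the procedure of any of the studies cited above. The function names are hypothetical, and the pattern labels (`A1`, `B1`, and so on) are invented toy data with no empirical status.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy in bits of a distribution given as raw counts."""
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def conditional_entropy(pairs):
    """H(Y | X) in bits for a list of (x, y) observations.

    Uses the identity H(Y | X) = H(X, Y) - H(X).
    """
    joint = Counter(pairs)
    marginal = Counter(x for x, _ in pairs)
    return entropy(joint) - entropy(marginal)


# Invented toy lexicon: each pair is (shape of form A, shape of form B).
# Knowing pattern A1 fixes the shape of B; knowing A2 leaves two options open.
opaque = [("A1", "B1"), ("A1", "B1"), ("A2", "B2"), ("A2", "B3")]
transparent = [("A1", "B1"), ("A1", "B1"), ("A2", "B2"), ("A2", "B2")]

h_opaque = conditional_entropy(opaque)
h_transparent = conditional_entropy(transparent)

print(f"opaque system:      {h_opaque:.2f} bits")
print(f"transparent system: {h_transparent:.2f} bits")
```

A value of zero means that the second form is fully predictable from the first; higher values indicate greater uncertainty and, on this view, greater integrative complexity.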

5.4 Verb inflection: from French to French-based creoles

Creoles are usually claimed to retain few if any of their lexifier’s inflectional
distinctions. In French-based creoles, this reduction has led to systems in which
each verb has at least a short form (SF) and a long form (LF); systems of this kind
are said to be characteristic of French-based creoles spoken in the Indian Ocean,
and in the Americas, of Louisiana Creole and Haitian. The formal distinction
between a verb’s SF and LF is claimed to be a syntactically-conditioned shape
alternation in Isle de France creoles—Seychellois, Rodriguais, Chagossian, and
Mauritian⁷—but not in Reunionese (Corne 1982; Seuren 1990; Syea 1992). Corne
(1982) argues for a typological difference between Reunionese and Isle de France
creoles on the basis of their verbal systems. Isle de France creoles’ verb alternations
are said to have been influenced by Bantu alternations while those of Reunionese
are reconciled with the assumption that it is merely a variety of French.⁸, ⁹

⁷ These languages are said to be varieties of the same creole, namely Mauritian, for reasons
linked to colonization. Indeed, the Seychelles used to be part of British Mauritius together with
Rodrigues and the Chagos. Rodrigues remains a Mauritian dependency, while the sovereignty of the
Chagos is still under dispute.
⁸ Depending on the verb, mesolectal varieties of Reunionese exhibit up to five inflected forms,
expressing distinctions of tense and aspect. For example, the verb ‘eat’ has the three inflected forms
mâz, mâze, and mâzra, with the third one being restricted to negative future-tense contexts. Irregular
verbs like ‘come’ exhibit five inflected forms, for example viê, vne, viê(n)ra, vni, vnir, where the future
tense form viê(n)ra is again restricted to negative contexts and where there is a distinction between a past
participle form vne and an infinitive vnir (Corne 1982). Corne (1982) further notes that those forms are
unstable to the extent that the past tense, the past participle and the infinitive are interchangeable.
Wittmann & Fournier (1987) present a severe critique of Corne’s data and analysis, drawing attention to
a range of problems. They argue that his analysis is observationally inaccurate and theoretically
questionable (given, e.g., the disparate range of factors that must be assumed to condition the proposed
phonological rules; see also Henri 2010); that the analysis is not obviously informed by current thought
on the usual motivations for regular sound changes; that the analysis is not compatible with reasonable
assumptions about the uniformity of diachronic processes effecting language change; and that his
assumption that Mauritian and Reunionese have fundamentally different histories is highly question-
able.
⁹ Klingler (2003) and Rottet (1992) also assume that verb alternation in Louisiana Creole is
reminiscent of French, making Louisiana Creole a plausible variety of French.

Following Baker (1972), Corne argues that the LF/SF alternation affects 70% of the
Mauritian verb lexicon and that SFs are derived by truncation of the LF’s final
vowel under conditions that are syntactically and semantically determined.
Chaudenson (2003), Veenstra & Becker (2003), Veenstra (2009), and others
defend an alternative analysis according to which Mauritian inherits its long
and short forms from a French verb’s infinitive and third-person singular present
indicative forms (respectively) but without inheriting their corresponding func-
tions. This development, they argue, is based on universals at play during second-
language learning. Veenstra (2009: 110) further hypothesizes that the LF/SF
alternation is at first phonologically conditioned but that it gradually becomes
grammaticalized so that the appearance of a verb’s SF is conditioned by a
following complement. As discussed in section 5.6.1.1, the distribution of the
Mauritian alternation is much more complex than what Veenstra assumes (see
also Henri 2010). The function of the alternation seen in Mauritian—he says—
might reflect Bantu influence, since the conjoint and disjoint verb forms found
in Makhuwa and other Eastern Bantu languages exhibit similar functions. While
the hypothesis is plausible, it raises the question of the Bantu contribution in
Haitian, which shows an alternation associated with a more or less parallel
function. According to DeGraff (2001: 75), the distinction in Haitian is subject
to prosodic or morphosyntactic constraints. Verb alternations are, according to
DeGraff (2001), manifestations of inflectional morphology, with a verb’s SF
arising from its LF by subtractive morphology in the context of a following
complement.
The evidence that we present below suggests that verb-stem alternations are
characteristic of all French-based creoles to a greater or lesser degree. While the
form of such alternations and the functions that they serve are innovated in each
individual creole, they are nevertheless relatable to the existence of comparable
though distinct alternations in the verb morphology of the lexifier. We advocate a
theory of creole genesis that includes unguided second-language acquisition as
one of the key components of creolization. In addition, we believe that there are a
number of additional factors that may influence the emergence of a creole; these
include frequency, salience, ease of perception, transparency, invariance, and
congruence (see also Corne 1982; Mufwene 2008).

5.4.1 Properties of the French verbal paradigm

As mentioned in section 5.2, the French verbal system is highly unpredictable and
therefore unlikely to remain unchanged in French-based creoles (Bonami et al.
2013). Standard written French distinguishes three conjugation classes of synthetic
paradigms consisting of a total of fifty-one cells expressing TAM, person,
number, and gender. The first conjugation is the productive class, into which

Table 5.1. Patterns of syncretism in the French paradigm (Bonami et al. 2013)

              LAVER   FINIR    RENDRE   CUIRE   POUVOIR   DIRE
PRS/IMP.2PL   lave    finise   rɑ̃de     kɥize   puve      dit
IPFV.SG/3PL   lave    finise   rɑ̃de     kɥize   puve      dize
INF           lave    fini     rɑ̃dʁ     kɥiʁ    puvwaʁ    diʁ
PST.PTCP      lave    fini     rɑ̃dy     kɥi     py        di
PRS.SG        lav     fini     rɑ̃       kɥi     pø        di
PRS.3PL       lav     finis    rɑ̃d      kɥiz    pœv       diz
SBJV.SG/3PL   lav     finis    rɑ̃d      kɥiz    pɥis      diz

loans and neologisms are integrated, as opposed to the non-productive second
and irregular third conjugation.
As Table 5.1 shows, French verb paradigms exhibit extensive syncretism: in the
first conjugation, many of a verb’s forms have one of two shapes, distinguished
only by the presence or absence of a final /e/, for example /mɑ̃ʒe/ ~ /mɑ̃ʒ/
(Chaudenson 2003; Veenstra & Becker 2003; Henri 2010). The French Xe ~
X alternation decidedly resembles the long-short alternation seen in French-based
creoles, although, as we argue, the creole alternation cannot be seen as purely
inherited (see section 5.4.2). In eighteenth-century French, final ‘r’ became
unpronounced in second-conjugation infinitives and in third-conjugation
infinitives ending in /iʁ/ (though not those ending in /iʁə/, such as écrire
‘to write’); this means that in the expression of the paradigm cells listed in
the left-hand column of Table 5.1, only three forms were distinguished in the
second conjugation, as Bonami et al. (2013) observe.
Various factors tend to maximize the use of the syncretic forms in Table 5.1. In
both spoken and written corpora, instances of the Xe ~ X pattern of /mɑ̃ʒe/ ~
/mɑ̃ʒ/ constitute more than 89% of forms (Bonami et al. 2013). In spoken French,
the periphrastic future formation, involving the combination of the ancillary
lexeme ‘go’ with an infinitive form (as in (2a), with syncretic /mɑ̃ʒe/), is
overwhelmingly preferred to the synthetic formation in (2b). Similarly, the use
of PRS.1PL forms with subject nous (nous mangeons ‘we’re eating’) tends, in
colloquial French, to be supplanted by that of indefinite PRS.3SG forms with
subject on (on mange ‘one is eating’, with syncretic /mɑ̃ʒ/).

(2) a. Il va manger.
3SG go.3SG eat.INF
‘He will eat.’
b. Il mangera.
3SG eat.FUT.3SG
‘He will eat.’

Bonami et al. (2013) also note that French verb forms are often ambiguous with
respect to their inflection-class membership. For instance, /pɛɲe/ serves as the
PRS.2PL form for both the first-conjugation verb peigner ‘comb’ and the
third-conjugation verb peindre ‘paint’. Thus, certain differences in form may be
widely recurrent even if they don’t stem from a single inflection-class
difference. If creolization is at all sensitive to factors such as frequency,
salience, and perception,
we expect to find an LF/SF distinction in creole verbs as a reflection of the wide
recurrence of a comparable distinction in the lexifier (Bonami et al. 2013; see also
Corne 1999; DeGraff 2001).

5.4.2 French-based creoles

Verb alternations are observable across the French-based creoles, though the
number of verbs exhibiting such alternations varies from one creole to another.
Verbs in Guadeloupean are customarily described as being invariable. For
example, Hazaël-Massieux (2002: 71) claims that Guadeloupean doesn’t show
any real inflection, and distinctions between two forms of the same lexeme, like
the distinction between fè /fɛ/ and fèt /fɛt/ ‘to do’, are French borrowings and are
purely exceptional. A similar type of description is provided by Ehrhart (1993:
158), who maintains that Tayo, a French-based creole spoken in New Caledonia,
behaves like American creoles (with the exception of Louisiana Creole) in having
only a few verbs with more than one form, such as mete /mete/ ~ met /met/ ‘to
put’, balaj /balaj/ ~ balaje /balaje/ ‘to sweep’, kouver /kuvɝ/ ~ kouvri /kuvʁi/ ‘to
cover’.
Granting the limited nature of verb alternations in these two creoles, we
nevertheless believe that even here, the role of such alternations in a creole’s
grammar cannot be ignored. When forms of a verb alternate, they exhibit sys-
tematic distributional differences. Moreover, the incidence of such alternations is
important as a feature shared by the French-based creoles; it constitutes a com-
mon aspect of their development from French, but also a significant dimension of
innovative divergence among the creoles themselves.
We claim that the verb alternations found in the French-based creoles were in
all cases shaped by but not necessarily inherited from their lexifier, pace
Chaudenson (2003), Veenstra & Becker (2003), and Veenstra (2009). Consider
the Mauritian verb forms shown in Table 5.2. The examples suggest that the
alternation stems from a single French form from which a second form is
independently innovated. The source form in French is very often the infinitive
but may instead be some other form. For example, Mauritian /kone/ ‘to know’,
though imported as a long form, stems not from the infinitive connaître but from
the . connai(t/s) (itself a ‘short form’ in French). For syncretic forms like
dwa ‘to owe’, there are two possibilities: either they are integrated as LFs (as in the

Table 5.2. Comparison of INF and PRS.3SG forms in French with long and short
forms in Mauritian
French                 Mauritian
INF        PRS.3SG     LF      SF      Gloss
ale        va          ale     al      ‘go’
vəni(ʁ)    vjɛ̃         vini    vin     ‘come’
sɔʁti(ʁ)   sɔʁ         soɚti   soɚt    ‘exit/go out’
dəvwa(ʁ)   dwa         dwa     dwa     ‘owe’
konɛtʁ     kone        kone    konn    ‘know’
aswa(ʁ)    asjɛ        asize   asiz    ‘sit’

Table 5.3. Sample comparison of long and short forms in four French-based creoles
Reunionese       Louisiana Creole   Guadeloupean     Haitian          Gloss
LF      SF       LF      SF         LF      SF       LF      SF
ale     al       ale     aleᵃ       ale     ay       ale     al/ay    ‘go’
vɛne    vɛn      vini    vinᵇ       vini    vin      vini    vin      ‘come’
soɚti   soɚrt    sɔɾti   sɔɾ        sɔti    sɔt      sɔti    sɔt      ‘exit/go out’
save    sav
konɛt   kone     kɔnɛ̃    kɔnɛ̃       kɔnɛt   kɔnɛt    kɔnɛ̃    kɔn      ‘know’

Notes:
ᵃ Louisiana Creole has a short form /al/ alternating with a longer form /ale/ meaning ‘to haul/pull’. The
suppletive French form /va/ (3SG.PRS) also appears in some French-based creoles as an irrealis
marker: va in Mauritian and Louisiana Creole. In Reunionese Creole a form /sava/, possibly
lexicalized from the agglutination of the demonstrative with the 3SG.PRS form of the verb ALLER,
is used in a number of impersonal constructions. Armand (2014) describes it as an auxiliary.
ᵇ In addition to /vin/, both Mauritian and Louisiana Creole have the form /vjɛ̃/. But in both languages,
this is a late borrowing and the two forms are used interchangeably.

case of kone) and the syncretic SFs are derived from them, or they enter the
paradigm as SFs from which the corresponding LFs are derived. Notice also
the case of Mauritian asiz ‘to sit’: its French source is evidently the feminine
past participle assise, which is imported as a Mauritian SF from which the
corresponding LF asize is then derived.
Together with Louisiana Creole, French-based creoles spoken in the Indian
Ocean show a more extensive pattern of alternation than the New Caledonian
creole Tayo and the creoles of the French West Indies. Table 5.3 illustrates
alternations from Reunionese, another French-based creole spoken in the Indian
Ocean, and from Louisiana Creole, Guadeloupean, and Haitian, all spoken in the
Americas. In our view, it is likely that verb alternations in these varieties
started out as a sandhi
alternation that was subsequently exapted to serve one or another function in each
individual creole.
While we focus here only on three French-based creoles, Mauritian,
Guadeloupean, and Haitian, we hypothesize that verb-form alternations in all
French-based creoles are unequivocally more complex than has previously been
acknowledged (see, e.g., Ehrhart 1993; Hazaël-Massieux 2002; Bernini-
Montbrand et al. 2013). As we show in the following section, this complexity is
revealed by the creoles’ processes of deverbal derivation.
In discussing deverbal derivation in these creoles, we will draw upon the
following useful distinctions:

(i) In most cases, an LF may be seen as consisting of a stem plus a particular
vowel; we refer to the stem in this combination as an LF-STEM.
• In many instances, a verb’s LF-stem is simply the verb’s SF, as in the
case of Haitian or Mauritian ‘come’: LF vini, LF-stem/SF vin.
• Occasionally, a verb’s LF ends in a consonant that is absent from the
verb’s SF. Here, too, the LF-stem may be equated with the SF, as in the
case of Haitian ‘do/make’: LF fèt, LF-stem/SF fè.
(ii) In some cases, there is a relation of SYNCRETISM between a verb’s LF and
its SF; that is, there is a single form that the grammar of the language treats
as both an LF and an SF.
• In such cases, the syncretized forms may have the vowel-final morphology
of a typical LF, in which case the LF-stem is distinct from the
SF. In cases of this kind, the LF-stem may have the status of a hidden
stem of the sort discussed in section 5.2 above; we call this a HIDDEN
LF-STEM. As we will see (section 5.6.3.2), the Haitian verb ‘chat’ has koze as
both its LF and its SF, with koz as a hidden LF-stem.
• But there are also cases in which a verb’s syncretized LF and SF have the
shape of a typical SF; in such cases, one can assume that the LF, the SF
and the LF-stem are all alike, as in the case of Mauritian ‘drink’, whose
LF, SF, and LF-stem are all bwar.
(iii) Finally, a verb may have a hidden stem that is distinct from its LF, its SF,
and its LF-stem; we call this a SPECIAL HIDDEN STEM. In Mauritian, for
example, the verb ‘drink’ has bwar as its LF, SF, and LF-stem, but also has
the special hidden stem biv- appearing in nominalizations such as biver
‘drinker’.
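
The distinctions in (i)–(iii) can be summarized in a small data-structure sketch. It is offered only as an illustration: the class name `VerbEntry` and its field names are hypothetical, the LF-stem and any special hidden stem are listed by hand rather than derived, and orthographic forms stand in for phonological ones.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class VerbEntry:
    """Hypothetical lexical entry; every field is listed by hand."""
    gloss: str
    lf: str                                     # long form
    sf: str                                     # short form
    lf_stem: str                                # stem of the LF
    special_hidden_stem: Optional[str] = None   # e.g. Mauritian biv-

    @property
    def syncretic(self) -> bool:
        # (ii): one form serves as both LF and SF
        return self.lf == self.sf

    @property
    def has_hidden_lf_stem(self) -> bool:
        # The LF-stem counts as hidden when it surfaces as neither LF nor SF
        return self.lf_stem not in (self.lf, self.sf)


# Examples taken from the text above
come = VerbEntry("come", lf="vini", sf="vin", lf_stem="vin")          # Haitian/Mauritian
do_make = VerbEntry("do/make", lf="fèt", sf="fè", lf_stem="fè")       # Haitian
chat = VerbEntry("chat", lf="koze", sf="koze", lf_stem="koz")         # Haitian
drink = VerbEntry("drink", lf="bwar", sf="bwar", lf_stem="bwar",
                  special_hidden_stem="biv")                          # Mauritian
```

On this encoding, the four cases differ only in which of the two properties hold and in whether a special hidden stem is listed.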

5.5 Approaches to derivation

Our analysis is based on the theoretical framework of lexeme-based morphology
(Matthews 1972; Aronoff 1994) where the lexeme is defined as a lexical entity
abstracted away from the syntactic contexts in which it may appear; a lexeme
belongs to a lexical category, has semantic content, and is realized by one or more
word forms through which it participates in syntax. In inflectional languages, a
lexeme is usually associated with a collection of stems used to form the inflected
forms that can be inserted into sentences. For instance, the French verbal lexeme
BOIRE ‘to drink’ has a stem /byv/ upon which are built the inflected forms /byvɔ̃/
(buvons ‘we drink’), /byve/ (buvez ‘you (PL) drink’), /byvɛ/ (buvais ‘I was
drinking’), etc., and a stem /bwa/ from which are formed the homophonous word forms
/bwa/ (bois ‘you (SG) drink’, boit ‘s/he drinks’). Stems such as /byv/ and /bwa/ are
morphomic in the sense of Aronoff (1994): they participate in formal alternations
whose conditioning cannot be coherently characterized in semantic,
morphosyntactic, or phonological terms but must be seen as purely morphological
in its motivation.
For French verbs, Bonami & Boyé (2002, 2003) propose a stem space with
twelve slots; this is a kind of matrix within which each verb’s full inventory of
stems is uniformly specifiable. The stem slots are linked to one another by default
implicative rules, so that for a regular verb, there is a slot whose stem suffices to
determine the stems in all of the other slots in that verb’s stem space. An irregular
verb is a lexeme whose stem space includes at least one stem that overrides a
default implicative rule. Extending this idea, Bonami et al. (2009) show that a
thirteenth stem is needed to account for deverbal lexemes suffixed with the action
nominalizer -ion, the adjectivalizer -if, or the agent nominalizers -eur/-rice. Thus,
both rules of inflection and rules of derivation draw upon a lexeme’s stem space;
an individual stem may, however, be accessible to rules of only one type; for
instance, the thirteenth stem proposed by Bonami et al. (2009) is hidden to
inflection, being accessible only to rules of derivation, as in Table 5.4.

Table 5.4. Stem space of FORMER ‘to form’, FINIR ‘to finish’, and DÉFENDRE ‘to defend’
#    Stem use                      FORMER    FINIR    DÉFENDRE
1    imperfect, pres. 1PL/2PL      fɔʁm      finis    defɑ̃d
2    present 3PL                   fɔʁm      finis    defɑ̃d
3    present SG                    fɔʁm      fini     defɑ̃
4    present participle            fɔʁm      finis    defɑ̃d
5    imperative 2SG                fɔʁm      fini     defɑ̃
6    imperative 1PL/2PL            fɔʁm      finis    defɑ̃d
7    pres. subjv. SG & 3PL         fɔʁm      finis    defɑ̃d
8    pres. subjv. 1PL/2PL          fɔʁm      finis    defɑ̃d
9    infinitive                    fɔʁme     fini     defɑ̃d
10   future, conditional           fɔʁm      fini     defɑ̃d
11   simple past, past subjv.      fɔʁma     fini     defɑ̃di
12   past participle               fɔʁme     fini     defɑ̃dy
13   hidden stem                   fɔʁmat    finit    defɑ̃s
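
The idea of a stem space whose slots are filled by default rules unless a lexically listed stem overrides them can be sketched as follows. This is a toy illustration only: the rules in `TOY_DEFAULTS` are read off the first column of Table 5.4 (‘to form’) and derive every slot directly from slot 1; they are an assumption made for this sketch, not the implicative rules actually proposed by Bonami & Boyé, and the function names are hypothetical.

```python
from typing import Callable, Dict, List

StemSpace = Dict[int, str]

# Toy defaults (an assumption of this sketch): each slot is derived from slot 1,
# following the pattern of the 'to form' column of Table 5.4 only.
TOY_DEFAULTS: Dict[int, Callable[[str], str]] = {
    **{slot: (lambda s: s) for slot in (2, 3, 4, 5, 6, 7, 8, 10)},
    9: lambda s: s + "e",     # infinitive
    11: lambda s: s + "a",    # simple past, past subjunctive
    12: lambda s: s + "e",    # past participle
    13: lambda s: s + "at",   # hidden stem (derivation only)
}


def fill_stem_space(listed: StemSpace) -> StemSpace:
    """Complete a stem space from slot 1 plus any lexically listed stems."""
    base = listed[1]
    space = {1: base}
    for slot, rule in TOY_DEFAULTS.items():
        # A listed stem overrides the default; otherwise apply the rule.
        space[slot] = listed.get(slot, rule(base))
    return dict(sorted(space.items()))


def overridden_slots(full: StemSpace) -> List[int]:
    """Slots whose stem differs from what the toy defaults predict."""
    base = full[1]
    return [slot for slot, rule in sorted(TOY_DEFAULTS.items())
            if full[slot] != rule(base)]


former = fill_stem_space({1: "fɔʁm"})
defendre = fill_stem_space({1: "defɑ̃d", 3: "defɑ̃", 5: "defɑ̃", 9: "defɑ̃d",
                            11: "defɑ̃di", 12: "defɑ̃dy", 13: "defɑ̃s"})
```

Under these toy rules, listing slot 1 alone reproduces the whole ‘to form’ column, whereas ‘to defend’ needs listed stems in slots 3, 5, 9, 11, 12, and 13. Because the defaults cover one pattern only, the sketch says nothing about which French verbs count as regular in the authors' sense.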

At least five members of a French verb’s stem set are available as base stems in
instances of deverbal nominalization (Bonami et al. 2009; Tribout 2012). Deverbal
nouns in -age have stem 1 as their base stem (e.g., /netwaj/: NETTOYAGE
‘cleaning’); deverbal nouns in -ment generally have stem 2 as their base stem
(/netwɑ/: NETTOIEMENT ‘cleaning’, /ʒonis/: JAUNISSEMENT ‘yellowing’); and the
base stem of a deverbal noun arising by conversion may be stem 3 (/dɑ̃s/: DANSER
‘to dance’ → DANSE ‘dance’), stem 12 (/ɑʁive/: ARRIVER ‘to arrive’ → ARRIVÉE
‘arrival’), or the hidden stem 13 (/defɑ̃s/: DÉFENDRE ‘to defend’ → DÉFENSE
‘a defense’).
The selection of a deverbal derivative’s base stem is not uniquely determined by
phonological or grammatical criteria. For example, there are instances in which
more than one of a verb’s stems serves as a base for conversion, as in the case
of PLONGER ‘to dive’, whose derivatives include PLONGE ‘dishwashing’ (whose
stem /plɔ̃ʒ/ is stem 3 of PLONGER) and PLONGÉE ‘diving’ (whose stem /plɔ̃ʒe/ is
stem 12 of PLONGER). More importantly, base-stem selection has no correlation
with the semantics of the derived nominal: nominalizations expressing action,
result, agent, instrument, or location vary unpredictably with respect to which of
the base lexeme’s five possible stems serves as their base stem.
Given the dimensions of complexity discussed in section 5.2, we claim that
French derivational relations contribute substantially to the morphological
complexity of French. In particular:

(i) base-stem predictability in the definition of deverbal nominalizations in
French exhibits the highest degree of complexity (degree 2 in Figure 5.1); and
(ii) where X is a verbal lexeme L’s base stem in a particular derivational
relation, the restrictedness of X in L’s morphology may evince the highest
degree of complexity (degree 2 in Figure 5.2).

5.6 Derivational relations in French-based creoles

We now turn to the description and analysis of derivation in Mauritian,
Guadeloupean, and Haitian; in each case, we preface this discussion with a brief
overview of the function of long and short verb forms in the creole under scrutiny.

5.6.1 Mauritian

5.6.1.1 Function of verb forms in Mauritian

In Mauritian, verbs alternate between a short and a long form. Most verbs (70%)
have morphologically distinct forms but some (30%) have syncretic long and short
forms (Henri 2010); the verbs in Table 5.5 are representative of the different
observed cases. Contrary to previous assumptions (e.g., those of Corne 1982),

the alternation is not phonologically predictable and shows an intricate
distribution that encodes morphological, syntactic, and information-structure
oppositions (Henri 2010).

Table 5.5. Verb alternations in Mauritian
Verb          SF        LF
‘to think’    pans      panse
‘to stay’     res       reste
‘to buy’      aste      aste
‘to ask’      demann    demande
‘to amend’    amand     amande
‘to snore’    ronf      ronfle
‘to drink’    bwar      bwar

In syntax, a verb’s SF is used in the presence of a non-clausal complement (3) as
opposed to the LF, which appears in the absence of any complement (4a). LFs also
appear with verbs that select clausal complements (4b), have an extracted
complement (4c) or are followed by an adjunct (4d).

(3) Toulezour, mo pans mo fami.

everyday, 1. think. 1. family
‘Everyday, I think about my family.’

(4) a. Zan ronﬂe.

John snore.
‘John snores.’
b. Mo panse ki tou dimoun intelizan.
1. think. that every person intelligent
‘I think that everybody is intelligent.’
c. Se mo fami ki mo panse.
It 1. family that 1. think..
‘It’s my family that I think about.’
d. Zan ronﬂe gramatin
John snore. morning
‘John snores in the morning.’

However, a verb’s LF may appear where its SF would otherwise be expected
under certain discourse conditions. In counter-assertions, the LF is interpreted
as an exponent of Verum Focus: using the LF evokes and denies the converse
of the proposition making up the content of the clause (Henri et al. 2008;
Henri 2010).

(5) a.  : To pa pans to fami zame!

1.  think. 1. family never
‘You never think about your family!’
 : Mo panse mo fami!
1. think. 1. family
‘I do think about my family.’
b.  : To pa fer seki to anvi isi!
2.  do. what 2. want. here
‘You don’t do what you want here!’
 : Mo panse kouma mo le kan mem.
1. think. how 1. want. still
‘I still think like I want to.’

Similarly, post-verbal constituents that are usually construed as adjuncts, while
ordinarily inducing the use of the LF, can appear with SFs if and only if those
post-verbal constituents are focused; this is true of locatives, instrumentals,
temporal adjuncts, and adjuncts of degree, frequency, and manner.

(6) a.  : Kot to manze dan zedi?

where 2. eat.  Thursday
‘Where do you eat on Thursdays?’
 : Mo manz rozil dan zedi!
1. eat. Rose-Hill on Thursday
‘I eat in Rose-Hill on Thursdays’
b.  : Ar ki to manze?
 what 2. eat.
‘What do you eat with?’
 : Mo manz ar lame.
1. eat. with hand
‘I eat with my hands.’
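
The distribution illustrated in (3)–(6) can be restated as a small decision procedure. The sketch below is a simplification under stated assumptions: the names `Follower`, `VerbContext`, and `select_form` are hypothetical, an extracted complement is treated as nothing following the verb, and Verum Focus is checked first, an ordering that the examples suggest but that is not spelled out in the text.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Follower(Enum):
    """What immediately follows the verb (hypothetical coding)."""
    NONE = auto()                    # no complement, or complement extracted
    NONCLAUSAL_COMPLEMENT = auto()
    CLAUSAL_COMPLEMENT = auto()
    ADJUNCT = auto()


@dataclass(frozen=True)
class VerbContext:
    follower: Follower = Follower.NONE
    follower_focused: bool = False   # relevant for post-verbal adjuncts
    verum_focus: bool = False        # counter-assertion, as in (5)


def select_form(ctx: VerbContext) -> str:
    """Return 'SF' or 'LF' for a Mauritian verb in the given context."""
    if ctx.verum_focus:
        # Assumption of this sketch: Verum Focus takes priority
        return "LF"
    if ctx.follower is Follower.NONCLAUSAL_COMPLEMENT:
        return "SF"
    if ctx.follower is Follower.ADJUNCT and ctx.follower_focused:
        return "SF"
    return "LF"
```

For instance, `select_form(VerbContext(Follower.NONCLAUSAL_COMPLEMENT))` corresponds to pans in (3), while adding `verum_focus=True` corresponds to panse in the reply in (5a).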

Finally, both the short and the long form are used in lexeme-formation processes
such as reduplication (Henri 2010, 2012). A derived verb formed by reduplication
itself has both an SF and an LF; as the examples in Table 5.6 show, the derived
verb’s SF is a doubling of the base verb’s SF while its LF is the base verb’s SF
combined with its LF.
Heterogeneous distributional patterns such as those of a Mauritian verb’s short
and long forms can be characterized as morphomic (Henri forthcoming), a
property that has been argued to contribute to a system’s integrative complexity
(Aronoff 1994). As we now show, Mauritian derivations are as integratively
complex as those of French.

Table 5.6. Reduplication in Mauritian
Base lexeme                     Reduplicated derivative lexeme
SF       LF        Gloss        SF               LF                Gloss
pans     panse     ‘think’      pans-pans        pans-panse        ‘think episodically’
manz     manze     ‘eat’        manz-manz        manz-manze        ‘nibble’
res      reste     ‘stay’       res-res          res-reste         ‘stay occasionally’
demann   demande   ‘ask’        demann-demann    demann-demande    ‘ask occasionally’
bwar     bwar      ‘drink’      bwar-bwar        bwar-bwar         ‘sip’
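
The reduplication pattern in Table 5.6 is regular enough to state as a two-line function. This is a minimal sketch: the function name `reduplicate` is hypothetical, and the hyphen simply follows the orthography used in the table.

```python
from typing import Tuple


def reduplicate(sf: str, lf: str) -> Tuple[str, str]:
    """Return the (SF, LF) of the reduplicated verb.

    The derived SF doubles the base SF; the derived LF is the base SF
    followed by the base LF.
    """
    return f"{sf}-{sf}", f"{sf}-{lf}"


think = reduplicate("pans", "panse")
drink = reduplicate("bwar", "bwar")
```

A base verb with syncretic forms, such as bwar, yields a reduplicated verb whose forms are again syncretic.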

5.6.1.2 Derivational relations in Mauritian

As we have seen, verbs in Mauritian have two basic forms: an SF and an LF. In
instances of deverbal nominalization, verbs vary according to whether their base
stem is their SF or their LF, as the examples in Table 5.7 show. A deverbal
nominalization’s base stem may also be a special hidden stem, as in the case of
biv in Table 5.7. In some instances, it is not immediately clear whether a deverbal
nominalization’s base stem is an LF or an SF: in cases in which a nominalizing
suffix begins with a vowel, the base stem lacks a final vowel, either because it is an
SF (or possibly even a hidden LF-stem) or because it is an LF that has undergone a
(morpho)phonological process of elision serving to avoid vowel hiatus. Other
cases, however, are not ambiguous in this way. In the morphology of the verb
‘to remain’, for example, the LF reste has a t but the SF res does not; in view
of this fact, the nominalization restan ‘leftovers’ likely involves elision of the LF
reste. Conversions in general are unambiguous with respect to their choice of base
stem. Moreover, they show that derived nominal lexemes have the same kinds of
meanings (action, result, location) whether their stem arises from a verb’s LF or its
SF; thus, a base lexeme’s base stem is not, in itself, predictable in Mauritian.
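
The ambiguity just described can be made concrete by checking which analyses of a suffixed noun are compatible with a verb's forms. The sketch below rests on several assumptions: orthographic strings stand in for phonological forms, the vowel inventory in `VOWELS` is a rough stand-in, the function name is hypothetical, and the possibility of a hidden LF-stem as base is not modelled.

```python
from typing import List

VOWELS = set("aeiouéè")  # rough orthographic stand-in (assumption)


def compatible_bases(lf: str, sf: str, suffix: str, noun: str) -> List[str]:
    """List the base-stem analyses that yield `noun` from `suffix`."""
    analyses = []
    if sf + suffix == noun:
        analyses.append("SF")
    if lf + suffix == noun:
        analyses.append("LF")
    # Prevocalic elision: the LF loses its final vowel before a vowel-initial suffix
    if lf[-1] in VOWELS and suffix[0] in VOWELS and lf[:-1] + suffix == noun:
        analyses.append("LF with elision")
    return analyses


restan = compatible_bases("reste", "res", "an", "restan")
danser = compatible_bases("danse", "dans", "er", "danser")
```

Here `restan` admits only the elided-LF analysis, since res + an would give resan, whereas `danser` is compatible with both an SF base and an elided LF base.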
A verb’s derived nominal stem is not always inherited from the lexifier
language. Derived nominals like DANSE (stem /dɑ̃se/) ‘dancing’ or LOUKE (stem
/luke/) ‘peep’ do not exist in French and thus cannot be inherited. As Mauritian
innovations, these nouns demonstrate that derivation is a productive process from
a qualitative perspective (i.e., the process is still available to form new nouns).
Deverbal nominalizations in Mauritian involve base stems that are both variable
and unpredictable (Table 5.7): base stems may be LFs, SFs, special hidden
stems, and perhaps also hidden LF-stems; in some instances they are comparable
in complexity to deverbal nominalizations in French. In particular, base-stem
predictability in the definition of deverbal nominalizations in Mauritian exhibits
complexity of degree 2 (see again Figure 5.1).
Because the grammar of Mauritian defines complementary syntactic distributions
for a verbal lexeme’s LF and SF, both of these function as inflected forms and
neither, therefore, is hidden. But we also identified instances where a special
hidden stem is used in the formation of derived nouns. As a consequence,

Mauritian derivations exhibit a degree of base-stem restrictedness similar to that
of French (see again Figure 5.2).

Table 5.7. Deverbal nominalizations in Mauritian
Verb                          Base stem                    Noun by conversion        Noun by suffixation
‘to dance’                    LF danse                     danse ‘dancing; ball’
                              SF dans                      (la)dans ‘dance’
                              LF or SF*                                              dans-er/ez* ‘dancer’
‘to peep’                     LF louke                     louke ‘peep’
                              SF louk
                              LF or SF*                                              louk-er* ‘peeping Tom’
‘to stroll’                   LF chake
                              SF chak                      chak ‘stroll’
                              LF or SF*                                              chak-er* ‘stroller’
‘to remain’                   LF reste                                               rest-an ‘leftovers’
                              SF res                       (le)res ‘rest’
‘to drink’                    LF/SF bwar                   bwar ‘drink’
                              special hidden stem biv                                biv-er ‘drinker’, labiv-et ‘bar’
‘to insult’                   LF/SF kamoufle               kamoufle ‘insults’
[‘to cover with insults’]     hidden LF-stem kamoufl                                 kamouflaz* ‘episode of insulting’
In a given row, each nominalization has that row’s verb form as its base stem.
*An asterisk marks a derived stem that is morphologically ambiguous, involving either
(a) a base stem that is an SF or hidden LF-stem or (b) a base stem that is an LF whose
final vowel undergoes prevocalic elision.

5.6.2 Guadeloupean

5.6.2.1 Function of verb forms in Guadeloupean

Guadeloupean shows significantly fewer verbs having distinct long and short
forms compared to Mauritian. We propose that the grammar of Guadeloupean,
like that of Mauritian, makes essential reference to a grammatical distinction
between long and short forms, but that the Guadeloupean lexicon differs from
that of Mauritian insofar as most verbs exhibit syncretism between their long and
short forms. We have identified thirty-four verbs having morphologically distinct
short and long forms, based on a sample of 1,824 verbs extracted from two
dictionaries (Tourneux & Barbotin 2008; Bernini-Montbrand et al. 2013);
Table 5.8 provides a sample of verbs having distinct long and short forms.
As is the case in Mauritian, LFs alternating with a morphologically distinct SF
usually end in a vowel in Guadeloupean, specifically e and i, but with more

members in the i class (four members in Mauritian vs. ten in Guadeloupean).

Table 5.8. Verb alternations in Guadeloupean
Verb          SF         LF
‘to look’     gay/gad    gadé
‘to put’      mèt        mété
‘to know’     sav        savé
‘to hold’     ken        kenbé
‘to come’     vin        vini
‘to do’       fè         fèt
‘must’        fo         falé
‘to give’     ba(n)      bay

Guadeloupean also presents two cases in which a verb’s LF ends in a consonant
that is absent from its SF—fèt /fɛt/ ~ fè /fɛ/ ‘to do’ and bay /baj/ ~ ba(n) /ba/ or
/bɑ̃/ ‘to give’; neither is found in Mauritian. Given the restrictedness of the
phenomenon in Guadeloupean, one might think of Guadeloupean verb alternations
as irregularities in a system in which verbs usually exhibit only a single form
and in which alternations that do arise can be argued to be phonologically
systematic, conforming to a small number of patterns ranging from the truncation
of a final segment or syllable (mété /mete/ ~ mèt /mɛt/ ‘to put’; fouté /fute/ ~ fou
/fu/ ‘give’) to a combination of final truncation with nasal spread (défandi /defɑ̃di/
~ défann /defɑ̃n/ ‘to defend’) or nasal shift (kenbé /kɛ̃be/ ~ ken /kɛn/ ‘to hold’).
There are also instances of partial suppletion, as in alternations such as gadé /gade/
~ gay /gɛ/ ‘to look’ or falé /fale/ ~ fo /fo/ ‘must’.
Our view is that the difference between the Guadeloupean verb system and that
of Mauritian is a difference of degree, not of kind. In particular, we assume that in
the grammars of both languages, long and short verb forms are systematically
distinguished but that the two forms are syncretic in some cases; this syncretism is
more widespread in Guadeloupean than in Mauritian, but that is a lexical fact
rather than a fact of grammar. This perspective entails that in both languages, LFs
possess a systematic cluster of properties distinct from that possessed by SFs—that
a verb exhibiting distinct long and short forms is not an irregular verb whose
forms possess their own peculiar distributional idiosyncrasies, but fits into a larger
pattern. The simplest assumption is that this larger pattern is common to all verbs,
but that a verb’s conformity to the pattern is often obscured by the same kind of
poverty of forms as characterizes English verbs such as hit, spread, and cost (which
exhibit a single form for the infinitive, the non-3SG present, the past, and the past
participle).
Guadeloupean verb alternation codes an aspectual distinction, where SFs are
usually interpreted as referring to single events (as in (7a)–(14a)) and LFs as
referring to multiple events (as in (7b)–(14b)). In the absence of other TAM
markers, the long and short alternants may also express tense contrasts: in (7a) the
SF expresses present tense, while in (7b), the LF expresses past tense (or passé
composé). (Guadeloupean resembles Louisiana Creole in this respect.)

(7) a. An ken ni ba-’w.

1. hold. 3 -’2.
‘I hold it for you.’ (single event)
b. An kenbé-’y ba-’w.
1. hold.-’3. -’2.
‘I held it for you.’ (multiple events)

When SFs are combined with the progressive marker ka, the interpretation is that of
what might be called a ‘progressive completive’, as in (8a); but the combination of
an LF with ka (as in (8b)) instead has a prospective reading, in which a multiplicity
of future events, potentially but not necessarily completed, is understood.

(8) a. A(n) ka vin.

1.  come.
‘I’m coming all the way.’ (‘progressive completive’)
b. A(n) ka vini.
1.  come.
‘I’m planning to come.’ (prospective)

Similarly, SFs with the irrealis marker ké or the past tense marker té may have a
single event interpretation; the SF sav ‘know’ in (9a) has a single event
interpretation, and the SF mèt ‘put’ in (10a) may receive either a single event
or multiple events interpretation. By contrast, LFs combine with ké and té to
express multiple events, as in (9b) and (10b).

(9) a. An pé ké sav konté.

1.   know. count.
‘I won’t know how to count (on that occasion).’
b. An pé ké savé konté.
1.   know. count.
‘I won’t know how to count (in general).’

(10) a. I té mèt pima adan.

3.  put. pepper inside
‘He/She put pepper in it (on that occasion / in general).’
b. An té mété pima adan.
1.  put. pepper inside
‘I put pepper in it (in general).’

This contrast is of course not obvious in cases in which the long and short forms
are syncretized. The data in (11) exemplify syncretic verbs exhibiting meanings
that are ambiguous between the single-event and the multiple-event
interpretations. However, no prospective reading is available in (11a). Speakers
typically use kay¹⁰ instead of ka to express the prospective in these contexts.

(11) a. An mangé kribich.

1. eat./ crawfish
‘I eat/ate crawfish.’ (present single-event/past multiple-event)
b. A(n) ka(y) dòmi.
1.  sleep
‘I am sleeping.’ (‘progressive completive’ or prospective)
c. Timoun-la té chanté on bel chanson
child-  sing./  beautiful song
‘The child sang a beautiful song.’ (past single- or multiple-event)
d. Pon moun pé ké bougé.
no person   move./
‘No one will move.’ (irrealis single- or multiple-event)

A subclass of verbs shows different constraints: SFs of the verbs ‘to peep’, ‘to
look’, and ‘to put/give/leave’ are only used as imperatives, as in (12); these reflect
a more direct borrowing from French, with the exception of the form gay /gɛ/ (12b),
apparently a creole neologism. A comparable behaviour is seen with fè ~ fèt ‘to do/make’,
whose short and long forms discriminate between the active and the passive/causative,
as in (13).

(12) a. Fou sa la!

put. this here
‘Put this here!’ (rude)
b. Gay bonda-la-sa!
look. ass--
‘Look at this ass!’

(13) a. Manman a-’w ka fè mangé.

Mother -’2.  make. food
‘Your mother is making food.’
b. Mangé ka fèt.
food  make.
‘Food is cooking.’

Finally, the verb ‘to give’ features semantic contrasts but also sandhi effects.
With non-pronominal objects, we find both the form bay and ba combined with
the irrealis marker ké, with the former form encoding an irrealis single-event

¹⁰ The form kay probably derives from the contraction of the TAM marker ka with ay (from the
short form of the verb ‘to go’).
meaning (as in (14a)) and the latter encoding an irrealis multiple-event meaning
(as in (14b)). With pronominal noun phrases, the form ba precedes a vowel-initial
pronoun (14c) and ban, a nasal-initial pronoun (14d).

(14) a. An ké bay on tap.

1.  give.  slap
‘I’ll slap you (on that occasion).’
b. An ké ba on tap.
1.  give.  slap
‘I’ll slap you (in general).’
c. An ba-’w li.
1. give.-’2. 3
‘I give/gave it to you.’
d. Jan ban mwen tout lajan a-’y.
John give. 1. all money -’3.
‘John gives/gave me all his money.’

5.6.2.2 Derivational relations in Guadeloupean

Guadeloupean shows less verb alternation than Mauritian. When Guadeloupean
verbs do have both an LF and an SF, deverbal nominalization seems to favour
the LF as the verb’s base stem. Verbs having syncretic forms also give rise to
deverbal nominalization. Both cases are illustrated in Table 5.9. Like Mauritian,
Guadeloupean exhibits derived nominals that do not exist in French (e.g., GADÉ,
GOUMÉ, CHOMÉ in Table 5.9); such innovations reveal that deverbal nominalization
is qualitatively productive in Guadeloupean.
Guadeloupean grammar defines distinct syntactic distributions for a verbal
lexeme’s long and short word forms; for some verbs, these are distinct forms
(e.g., vini / vin ‘to come’) though for most, the two forms are syncretized. But even
for verbs that do not exhibit a distinct SF, there is sometimes evidence for a
distinct LF-stem with its own special distribution.
A large number of verbs that lack distinct long and short forms have a present
participle formed by means of a suffix -an; the examples in (15) illustrate.
Examples of this sort exhibit an ambiguity similar to that observed for
Mauritian in section 5.6.1.2: either -an attaches to the verb’s LF-stem or it attaches
to the verb’s LF with prevocalic elision of the LF’s final vowel.

(15) ́ ‘to lie’ →  ‘lying’
́ ‘to fight’ →  ‘fighting’
́ ‘to mix’ →  ‘mixing’
́ ‘to drink alcohol’ →  ‘drinking’
Table 5.9. Deverbal nominalizations in Guadeloupean

Verb                           Verb form            Noun by conversion     Noun by suffixation
vini ‘to come’                 LF vini              vini ‘arrival’
                               SF vin
sòti ‘to go out’               LF sòti              sòti ‘outing’
                               SF sòt
gadé ‘to look’                 LF gadé              gadé ‘look’
                               SF gad
babouké ‘to constrain’         LF babouké
                               LF-stem babouk       babouk ‘constraint’    babouk-aj* ‘halt’
goumé ‘to fight’               LF goumé             goumé ‘fight’
badiné ‘to joke around’        LF badiné
                               LF-stem badin                               badin-è* ‘joker’, badin-aj* ‘joke’
chomé ‘to have fun’            LF chomé             chomé ‘party’
                               LF-stem chom                                chom-aj* ‘party’
pwofité ‘to take advantage’    LF pwofité
                               LF-stem pwofi(t)     pwofi ‘benefit’        pwofit-asyon* ‘benefit’
poupoulé ‘to tease’            LF poupoulé
                               LF-stem poupoul                             poupoul-man ‘teasing’

In a given row, each nominalization has that row’s verb form as its base stem.
*An asterisk marks a derived stem that is morphologically ambiguous, involving
either (a) a base stem that is an LF-stem or (b) a base stem that is
an LF whose final vowel undergoes prevocalic elision.

Several operations of deverbal nominalization exhibit a similar pattern in
Guadeloupean; these include the operation of -è /ɛ/ suffixation, which forms
agent nouns, and the operations of -aj and -asyon suffixation, which form action
nouns; these operations are exemplified in Table 5.9, with additional examples in
(16)–(18).¹¹ Here, too, the derivational suffix joins with either a verb’s LF-stem or,
with elision, its LF.

(16) ́ ‘to cuddle’ → ̀ ‘cuddler’
́ ‘to stroll’ → ̀ ‘stroller’
(17) ́ ‘to exchange’ →  ‘exchange’
́ ‘to unite’ →  ‘union’

¹¹ The sufﬁxal derivatives in Table 5.9, in (15)–(17), and in (20) are cited from Villoing & Deglas
(2016).

(18) ̀́ ‘to annoy’ → ̀ ‘annoyance’
 ‘to follow’ →  ‘pursuit/chase’

Villoing & Deglas (2016) pursue the assumption that such derivations involve a
sandhi operation by which vowel hiatus is avoided through the prevocalic elision
of an LF’s final é. Additional evidence, however, reveals that at least some cases
cannot be attributed to prevocalic elision but must be seen as involving direct
suffixation to a verb’s LF-stem.
Consider, for example, the operation of -man suffixation, by which action
nouns such as those in (19) are derived.

(19) ́ ‘to tease’ →  ‘teasing’
́́ ‘to hurry up/start moving’ → ́ ‘moving/activating’
́́ ‘to separate’ → ́ ‘separation’

As these examples show, deverbal nouns suffixed with -man also lack the final é of
the verb’s LF. Here, however, the absence of the final é cannot be attributed to
hiatus avoidance, since the suffix begins with a consonant. Moreover, nouns such
as , ́, and ́ have no counterparts in French
and so cannot simply be inheritances from the lexifier. The only explanation is that
they are productively formed in Guadeloupean through the direct suffixation of
-man to a verb’s LF-stem. Moreover, Occam’s Razor favours the assumption that all
of the operations in (15)–(19) involve direct suffixation to a verb’s LF-stem.
By maintaining a distinction between a verb’s SF and its LF-stem, we can arrive at
a straightforward account of deverbal nominalizations such as those in (20) as well
as denominal verb derivations such as those in (21). On one hand, the deverbal
nominalizations in (20) are conversions of a verb’s LF-stem to a noun; by contrast,
the derivations in (21) are conversions of a noun to a verb’s LF-stem, to which the
suffixal formative for a verb’s LF then attaches. This account contrasts with that of
Villoing & Deglas (2016), who regard the derivations in (20) and (21) as involving
processes of suffixation that induce elision rather than processes of conversion.

(20) ́ ‘to flirt’ →  ‘a flirt’
́ ‘to offend’ → ̀ ‘an insult’
́ ‘to stroll’ →  ‘a stroll’

(21)  ‘zouk’ → ́ ‘to dance zouk’
̀ ‘Christmas’ → ́́ ‘to celebrate Christmas’
 ‘drizzle’ → ́ ‘to drizzle’
 ‘refuge’ → ́ ‘to take refuge’

Our analysis assumes the coexistence of deverbal nominalizations whose base
stem is a verb’s LF (e.g., goumé ‘to fight’ → goumé ‘fight’) with those whose
base stem is a verb’s LF-stem (e.g., LF-stem bas ‘to flirt’ → bas ‘flirt’). This analysis
predicts that a particular verb may give rise to two derived nominal stems, one
based on the verb’s LF, the other on its LF-stem. This prediction is indeed borne
out: ́ ‘to win’ has two derived nominals, ́ ‘victory’ (whose stem is
the verb’s LF) and  ‘win’ (whose stem is the verb’s LF-stem).
In summary, we assume that every verb has an LF-stem, even if it doesn’t
exhibit distinct long and short word forms; for those that do, the SF shares the
form of the LF-stem. Postulating an LF-stem for every verb offers a unified
analysis of both denominal verb derivation and deverbal nominalization (whether
by conversion or by the addition of a derivational suffix).
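
The analysis summarized above can be illustrated with a minimal sketch. It assumes that each verb is stored with both its LF and its LF-stem (the forms are taken from Table 5.9) and that a nominalization selects one of the two as its base stem; the names `Verb`, `VERBS`, and `nominalize` are hypothetical and serve only this illustration. The hyphen in the output marks the morpheme boundary, as in Table 5.9.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Verb:
    """A verbal lexeme with its long form (LF) and its LF-stem."""
    lf: str
    lf_stem: str


# Forms as listed in Table 5.9; the dictionary itself is an illustrative assumption.
VERBS = {
    "babouké": Verb(lf="babouké", lf_stem="babouk"),
    "badiné": Verb(lf="badiné", lf_stem="badin"),
    "chomé": Verb(lf="chomé", lf_stem="chom"),
    "poupoulé": Verb(lf="poupoulé", lf_stem="poupoul"),
}


def nominalize(verb: Verb, base: str, suffix: Optional[str] = None) -> str:
    """Form a deverbal noun from the chosen base stem.

    base   -- "LF" or "LF-stem": which stem serves as the base
    suffix -- a derivational suffix such as "man" or "aj"; None means conversion
    """
    if base == "LF":
        stem = verb.lf
    elif base == "LF-stem":
        stem = verb.lf_stem
    else:
        raise ValueError(f"unknown base stem type: {base}")
    # Conversion leaves the stem unchanged; suffixation adds the suffix directly,
    # with no elision rule, as proposed for -man.
    return stem if suffix is None else f"{stem}-{suffix}"


print(nominalize(VERBS["chomé"], "LF"))                 # conversion of the LF
print(nominalize(VERBS["babouké"], "LF-stem"))          # conversion of the LF-stem
print(nominalize(VERBS["poupoulé"], "LF-stem", "man"))  # suffixation to the LF-stem
```

The sketch generates candidate forms only; which base stem a given nominalization actually selects is lexically determined, which is the point of the degree-2 complexity in base-stem predictability discussed below.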
On this account, Guadeloupean derivation shows a degree of complexity equivalent
to that of French and Mauritian with respect to base-stem predictability. In
Guadeloupean, a verbal lexeme’s base stem is its LF in some cases and its LF-stem in
others; thus, base-stem predictability in the definition of deverbal nominalizations
exhibits complexity of degree 2. By contrast, it is not clear that Guadeloupean
deverbal nominalizations ever have a hidden form as their base stem; not even a
verb’s LF-stem can be claimed to be hidden in view of its use in the formation of a
present participle, an inflected form. Guadeloupean deverbal nominalizations there-
fore exhibit a base-stem restrictedness whose complexity is no higher than degree 1.

5.6.3 Haitian

5.6.3.1 Function of verb forms in Haitian

Only twelve out of 2,657 verbs excerpted from Valdman et al. (2007) alternate
between a long and a short form (Table 5.10). The alternation is, according to
Alleyne (1996), the result of a phonological reduction, or more precisely that of a
syllabic reduction (Cadely 1994).
The function of the alternation shows some similarities with both Mauritian and
Guadeloupean. DeGraff (2001) argues that truncation occurs when verbs are
followed by non-pronominal objects (22a) but fails when the verb is in sentence-
ﬁnal position (22b), has an extracted object (22c) or is followed by an adjunct (22d).

(22) a. Mari gen kouraj.

Marie have. courage
‘Marie has courage.’ (DeGraff 2007)
b. Tonton Bouki ap ale.
uncle Bouki  go.
‘Uncle Bouki is leaving.’
c. Konbyen dan Tonton Bouki genyen?
how_much tooth uncle Bouki have.
‘How many teeth does uncle Bouki have?’
d. Le klosh ape sone aster.
the bell  ring. now.
‘The bells are ringing now.’ (Roberts 1999)

Table 5.10. Verb alternations in Haitian

Verb             SF       LF
‘to go’          al       alé
‘to look’        gad      gadé
‘to go out’      sòt      sòti
‘to come’        vin      vini
‘to have’        gen      genyen
‘to do/make’     fè       fèt
‘to give’        ba(n)    bay

Notice that the behaviour in (22b) is also attested in Guadeloupean with the verb
ale. The opposition fèt/fè ‘to do/make’ also occurs in both creoles. In addition,
DeGraff (2001) claims that LFs are used for emphasis. He concludes that verb
alternations in Haitian are an instance of inflectional morphology whose realization
is determined by phonological phrasing and argumenthood.

5.6.3.2 Derivational relations in Haitian

Deverbal nominalization is evidently productive from a qualitative perspective in
Haitian, since a number of derived nominal stems have no counterpart in French,
for example those in (23).

(23) a.  ‘to run’ →  ‘the action/result of running’
b.  ‘to lie’ →  ‘the action/result of lying’ (Lefebvre 1998)

Because very few verbs in Haitian exhibit an overt inflectional alternation between
long and short forms, there are few cases of derivation where one can readily
identify the choice of one alternant over the other. When cases of this sort do
occur (typically in conversions), they involve the LF in some instances and the SF
in others, as in Table 5.11.
Suffixal derivation of nouns from verbs often involves a vowel-initial suffix, as
in (24); the existence of a sandhi rule eliminating vowel hiatus by means of stem-
final vowel truncation might (as in Guadeloupean) be claimed to allow such
derivatives to be based on a verb’s LF. But as in Guadeloupean, the noun-forming
suffix -man does not create vowel hiatus; its appearance in post-consonantal
positions therefore cannot be attributed to elision, but must be seen as the effect
of direct suffixation to a verb’s LF-stem. In some cases (e.g., (25)), the resulting
nominalization has no counterpart in French, and so cannot be seen as a direct
inheritance from the lexifier. We must therefore assume that as in Guadeloupean,
a Haitian verb’s LF-stem sometimes participates directly in the workings of its
derivational morphology.
Table 5.11. Deverbal nominalizations in Haitian

Verb                             Verb form                    Noun by conversion                         Noun by suffixation
vini ‘to come’                   LF vini                      vini ‘arrival’
                                 SF vin
alé ‘to go’                      LF alé                       alé ‘departure’
                                 SF al
sòti ‘to go out’                 LF sòti                      sòti ‘going out’
                                 SF sòt
gadé ‘to see’                    LF gadé
                                 SF gad                       gad ‘look’
genyen ‘to win, to gain’         LF genyen
                                 SF gen                       gen ‘gain’
djòle ‘to chat’                  LF djòle
                                 hidden LF-stem djòl                                                     djòl-è* ‘talker’
tranché ‘to cut up, to slice’    LF tranché                   tranché ‘labor pain, shoemaker’s knife’
                                 hidden LF-stem tranch                                                   tranch-man ‘pain’
bati ‘to build’                  LF bati
                                 special hidden stem batis                                               batis-man ‘construction (action)’

In a given row, each nominalization has that row’s verb form as its base stem.
*An asterisk marks a derived stem that is morphologically ambiguous, involving
either (a) a base stem that is a hidden LF-stem or (b) a base stem that is an LF whose
final vowel undergoes prevocalic elision.

(24) a.  ‘to bet’ →  ‘a bet’
b. ̀ ‘to chat’ → ̀̀ ‘talker’ (Lefebvre 1998)

(25)  ‘to chat’ →  ‘a chat’¹² (DeGraff 2003)

VN compounds might seem to afford a parallel argument, since the verb in such
compounds often appears to be an LF-stem; for example,  ‘break’,  ‘break’,
and  ‘walk’ all seem to be represented by their LF-stems in the compounds
̀ ‘a destructive individual’ (Fr. -), ̀ ‘hard question’ (Fr. -
̂), and  ‘stoop, steps to a house’ (Fr. -); but these compounds
all apparently originate in French, and evidence of the productivity of
exocentric VN compounds is in general lacking in Haitian (Lefebvre 1998: 345).

¹² Nominalizations similar to kozman include for instance ajoutman ‘addition’, frapman ‘knocking’
and pledman ‘discussion, quarrel’, which are absent in contemporary French but found in Medieval
French. DeGraff (2003: 69) rightly argues that these might have been inherited from regional
varieties spoken in the colonies in the seventeenth century.
A final parallel between Haitian and Guadeloupean pertains to denominal
verbs. Verbs are apparently derived from nouns by means of a suffix -e, which
sometimes produces verb forms having no counterpart in French. (The examples
in (26) illustrate.) But as in Guadeloupean, these can instead be seen as instances
of N→V conversion whose output is a verb’s LF-stem (in which case -e has the
role of an LF-forming verb suffix); here again, distinguishing a verb’s LF-stem
from its SF affords a more streamlined account of derivation.

(26) a.  (stem pansyon) ‘thought, anxiety’
→  (LF pansyon-e) ‘to think, to ponder’
b.  (stem makak) ‘stick’
→  (LF makak-e) ‘to hit with a stick’
c.  (stem bourik) ‘donkey, work horse’
→  (LF bourik-e) ‘to work like a dog’
d. ̀ (stem tèk) ‘a hit (in marbles)’
→ ̀ (LF tèk-e) ‘to hit a marble’
(Lefebvre 1998; DeGraff 2003)

It is clear that at least some Haitian verbs possess special hidden stems. Each of the
verbs in (27) has a special hidden stem used in derivation (e.g., with the nominalizing
suffix -man: vomis-man) but not in inflection. The productivity of this
pattern of alternation is attested by the fact that it gives rise to derivatives
having no counterpart in French, as in (28).

(27)  ‘to vomit’  ‘vomiting’

 ‘to refresh’  ‘refreshment’
 ‘to cool’  ‘cooling’

(28)  ‘to build’  ‘construction (action)’¹³

 ‘to ﬁnish’  ‘end’
̀/̀ ‘to thank’ ̀ ‘thanking’

Thus, relations of deverbal nominalization in Haitian are comparable in complexity
to those of Mauritian and French. The base stem in deverbal nominalization is
the LF for some verbs, the SF for others, the LF-stem for others, and a special
hidden stem for still others. Base-stem predictability in the definition of deverbal
nominalizations therefore attains complexity of degree 2. And given that the base
stem in some deverbal nominalizations is a special hidden stem, base-stem
restrictedness in the definition of these nominalizations likewise exhibits
complexity of degree 2.

Table 5.12. Complexity of derivational relations in French, Mauritian, Guadeloupean, and Haitian

                                                    French   Mauritian   Guadeloupean   Haitian
Degree of complexity in base-stem predictability    2        2           2              2
Degree of complexity in base-stem restrictedness    2        2           1              2

¹³ Finissement and bâtissement can be found in Medieval French, but not *remercissement.

5.7 Conclusion

In this chapter, we have presented criteria for assessing the integrative complexity
of a morphological system’s derivational relations, and we have applied these
criteria in an analysis of derivational relations in Mauritian, Guadeloupean, and
Haitian. We have demonstrated that each of these languages possesses deverbal
nominalizations that are not a mere inheritance from the lexifier language but
must be seen as the effect of a productive process within the creole itself.
Moreover, we have shown that the derivational relations in these creoles attain
the same degree of complexity as those of the lexifier; our
results are summarized in Table 5.12.
When a verb L is the base lexeme in a derivational relation, the identity of L’s
base stem in L’s stem set is not, in general, predictable either in French or in
Mauritian, Guadeloupean, or Haitian; moreover, the status of L’s base stem in the
definition of L’s morphology may be as peripheral in Mauritian and Haitian as in
French. These results challenge the extreme simplicity that has so often been
attributed to creole morphology. We hypothesize that as further work is done on
the morphology of creole languages, other sorts of derivational processes will be
found to exhibit a comparable level of integrative complexity.

Acknowledgements

We would like to thank Jean-Michel Benjamin for his input on the Guadeloupean data.

6
Simplification and complexification in
Wolof noun morphology and morphosyntax
Michele Loporcaro

6.1 Introduction

In this chapter, I will describe how Wolof noun morphology has become simplified,
compared with the system that can be reconstructed for a previous stage through
comparison with other Atlantic languages (the subdivision of the Niger-Congo
family to which Wolof belongs). On the other hand, I will also show that, in some
respects, Wolof noun morphology and especially morphosyntax has become more
complex—more complex than in previous stages of the language and also more
complex than usually assumed in the literature—acquiring new irregularities.
The Wolof—and Atlantic—facts will be scrutinized against the background of
recent research on linguistic complexity. Since the study is about the grammatical
system and does not adduce any psycholinguistic evidence (from language usage
and/or processing), I will be addressing what the relevant literature (e.g., Dahl
2004: 39; Miestamo 2008: 27; Sinnemäki 2008: 72; Lindström 2008: 217) labels
‘absolute complexity’, not what is sometimes called ‘relative complexity’ (Kusters
2008: 4–8), that is, memory cost/difficulty (Hawkins 2007).
The chapter is organized as follows: in section 6.2, I introduce the language and
its classification; in section 6.3, I present the basics of the Wolof noun class system,
which is then placed in its Atlantic context in section 6.4.¹ In section 6.5, I will
briefly introduce the distinction between complexity and morphological richness
(as defined in the literature on morphological complexity I take as a point of
reference, in particular Baerman et al. 2010; 2015b; 2017; Dressler 2011) and
how complexity and richness relate to morphological type, to then move on to
considering complexifying changes which have affected Wolof noun morphology,
changing many aspects of what must have earlier been a coherently agglutinative
system into a system which, in addition to many properties going towards the
isolating type, also developed some inflectional irregularities normally found in
inflecting-fusional languages. The section also compares similar developments in
other Atlantic languages, while section 6.6 addresses complexification in the
paradigm of agreement targets. Finally, in section 6.7, I discuss whether the diachronic
dynamics of change observed in the language may be explained in external terms,
considering the sociolinguistic setting of the language and the nature of the speech
community in which it is spoken.

¹ While the data from other Atlantic languages are drawn from the available literature, for Wolof
available sources are complemented with first-hand data from the variety of Mbakke (Mbacke), lying
about 150 kilometres east of Ndakaaru/Dakar, in the territory of the traditional kingdom of Bawol
which is part of the Wolof heartland, the area on whose dialects the standard variety of Wolof is based.
These were collected in cooperation with Cheikh Anta Babou, to whom I am indebted, and are
presented in more detail in Babou & Loporcaro (2016). Glossing obeys the Leipzig glossing rules: in
addition,  indicates class marker (without numbering for Wolof, since contrary to other Niger-
Congo languages mentioned in the chapter, there is no agreed-on numbering of noun classes in studies
on Wolof).

Michele Loporcaro, Simplification and complexification in Wolof noun morphology and morphosyntax. In: The Complexities of
Morphology. Edited by: Peter Arkadiev and Francesco Gardani, Oxford University Press (2020). © Michele Loporcaro.
DOI: 10.1093/oso/9780198861287.003.0006

6.2 Wolof and Atlantic languages

Wolof is the native language of between four million (Lewis et al. 2015) and 4.5 million
(Leclerc 2015) speakers, and the main inter-ethnic lingua franca among the thirteen million
inhabitants of Senegal. It is also spoken in Gambia (about 226,000 speakers),
where it is the second most spoken language after Mandinka, Mali (62,000
speakers), Mauritania (around 16,400 speakers), and Guinea Bissau, as well as
in migrant communities in Europe (France, Italy, and Spain) and the USA (mainly
New York City).²
The evidence to establish change in Wolof is twofold: on the one hand, the
language has been described thoroughly since the early nineteenth century (cf.
Dard 1825, 1826, Boilat 1858, Kobès 1869, etc., with some news on relevant
aspects of its structure available since as early as the late sixteenth century: cf.
Doneux 1978: 45), so that changes leading to the present situation can be followed
through the extant documents and descriptions. On the other hand, transcending this
limited time-depth requires reconstruction, and this poses problems since the classification of
Wolof within the Northern Atlantic branch of Niger-Congo is debated: the
traditional view considers Wolof most narrowly related to Fula, and places
Wolof/Fula, together with Seereer, in a Senegambian subdivision of Atlantic (cf.
Sapir 1971: 47f; followed by Wilson 1989: 87f; Childs 2004, 2010: 36, etc.), while
Doneux (1978: 43–5) and Segerer (2010: 4f) propose alternatively that the closest
relative to Wolof is the Ñuun (also: Bagnoun, Bainuk, Baïnounk) language/dialect
cluster (straddling Casamance, in Southern Senegal, the north of Guinea-Bissau,
and Gambia), and Pozdniakov (2015: 58) lists Fula/Seereer, Buy/Nyun, and Wolof
as three different branches of Northern Atlantic. Be that as it may, all the
languages mentioned display a better-preserved noun class system of the Niger-Congo
type than Wolof, a fact that must be kept in mind when reconstructing past
changes leading to the grammatical system observed today.

² Occasionally, one comes across much lower figures in the literature: see, for example, Njie (1982:
16), reporting slightly more than one million speakers (‘le wolof se parle en Gambie et au Sénégal par
un peu plus d’un million de personnes’). Higher figures (e.g., the 7.5 million reported by Perrin 2012:
11) are given by authors not drawing the distinction between native/L1 and vehicular/L2 usage of
Wolof.

6.3 Wolof noun classes: the basics and the received view

In the rich literature on Wolof, the language is invariably described as featuring ten
noun classes (henceforth abbreviated NCs), eight singular and two plural, marked on
determiners and other noun modifiers occurring adnominally as well as pronominally.³
A complete list of the usually assumed classes is given on the horizontal dimension
in (1), while (1a)–(1d) exemplify the larger list of class-marked function words:

(1)  

NC marker                      b-    g-    k-    j-    l-    m-    s-    w-    y-    ñ-
a. proximal definite article   bi    gi    ki    ji    li    mi    si    wi    yi    ñi
b. distal definite article     ba    ga    ka    ja    la    ma    sa    wa    ya    ña
c. proximal demonstrative      bii   gii   kii   jii   lii   mii   sii   wii   yii   ñii
d. distal demonstrative        bee   gee   kee   jee   lee   mee   see   wee   yee   ñee
etc.
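
The regularity of the paradigm in (1) can be made explicit with a short sketch: each form is simply a class marker followed by an invariant vowel ending. The sketch assumes only the four function words shown in (1a)–(1d); the names `ENDINGS` and `build_paradigm` are hypothetical, and the "etc." of the original list is not covered.

```python
# Class markers as listed in (1): eight singular classes and two plural classes.
SINGULAR_MARKERS = ["b", "g", "k", "j", "l", "m", "s", "w"]
PLURAL_MARKERS = ["y", "ñ"]

# Vowel endings read off rows (1a)-(1d).
ENDINGS = {
    "proximal definite article": "i",
    "distal definite article": "a",
    "proximal demonstrative": "ii",
    "distal demonstrative": "ee",
}


def build_paradigm():
    """Return {function word: {class marker: form}} for all ten classes."""
    markers = SINGULAR_MARKERS + PLURAL_MARKERS
    return {
        label: {marker: marker + ending for marker in markers}
        for label, ending in ENDINGS.items()
    }


paradigm = build_paradigm()
print(paradigm["proximal definite article"]["b"])  # bi
print(paradigm["distal demonstrative"]["ñ"])       # ñee
```

That the whole table follows from concatenation alone reflects the agglutinative character of these function words; the sketch says nothing about which class a given noun selects.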

Taking the proximal definite article, the following examples illustrate NC contrasts:

(2) a. xarit b-i b. góor g-i

friend -. man -.
‘the friend’ ‘the man’
c. nit k-i d. jëkkër j-i
person -. husband -.
‘the person’ ‘the husband’
e. ndongo l-i f. njëngtéef m-i
disciple -. sorcerer -.
‘the disciple’ ‘the sorcerer’
g. soxna s-i h. far w-i
honourable lady -. lover/ﬁancé -.
‘the honourable lady’ ‘the lover/ﬁancé’

i. xarit/jëkkër/ndongo/njëngtéef/soxna/far y-i
friends/husbands/disciples/sorcerers/ladies/lovers -.
‘the friends/husbands/disciples/sorcerers/ladies/lovers’
j. góor/nit ñ-i
man/person -.
‘the men/persons’

³ Cf., for example, Boilat (1858: 11ff); Rambaud (1898: 11); Delafosse (1927: 30f); Labouret (1935:
46); Gamble (1957: 134); Sauvageot (1965: 72–4); Stewart & Gage (1970: 392); Sapir (1971: 75); Irvine
(1978: 43); Thiam (1987: 9); Fal et al. (1990: 17); Mc Laughlin (1997: 2); Munro & Gaye (1997: ix);
Becher (2001: 42); Ndiaye (2004: 26); Camara (2006: 11); Diouf (2009: 153); Guérin (2011: 84); Tamba
et al. (2012: 895); Torrence (2013: 16); Pozdniakov & Robert (2015: 548). The notion ‘noun class’ is
used in different ways by different authors, within and beyond African language studies (see the
discussion in Babou & Loporcaro 2016: 4–6).

As usual in Atlantic languages, there is a disproportion between classes, in several
respects: (a) a disproportion with respect to number, as there are eight singular
classes as opposed to only two classes traditionally recognized for the plural: yi
plurals ((2i)), and ñi plurals ((2j)); and (b) an imbalance in numerosity. The
exhaustive list of ñi plurals (eleven lexemes in all, all denoting humans) is the
following:

(3) gaa/gan/géer/gor/góor/jaam/jigéen/
people/guest/non-casted/free man/man/slave/woman/
mag/maggat/ndaw/nit ñi
adult/old person/youngster/person .-.
‘the people/guests/non-casted/free men/men/slaves/women/adults/old
people/youngsters/persons’
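
Because the list in (3) is exhaustive and only two plural classes are recognized, the choice of plural article under the received view reduces to a lookup with a default. The following is a minimal sketch under that assumption; the names `NI_PLURALS` and `plural_article` are hypothetical, and the syntactic rules that substitute yi for other plural markers under certain conditions are deliberately ignored.

```python
# The eleven lexemes listed in (3) as taking the ñi plural (all denote humans).
NI_PLURALS = {
    "gaa", "gan", "géer", "gor", "góor", "jaam",
    "jigéen", "mag", "maggat", "ndaw", "nit",
}


def plural_article(noun: str) -> str:
    """Return the proximal definite plural article selected by a noun.

    Nouns on the closed list take ñi; every other noun takes the default yi.
    """
    return "ñi" if noun in NI_PLURALS else "yi"


print(plural_article("nit"))    # on the closed list
print(plural_article("xarit"))  # default class
```

The asymmetry between a closed list of eleven items and an open default is what the text calls an imbalance in numerosity.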

All the rest of the nouns take yi in the plural ((2i)). Likewise, in the singular the bi
class in (2a) accounts for the vast majority of nouns, and has been constantly
attracting new members, as schematized in (4) (based on Becher 2001: 42–52):

(4) incidence of the bi class among singular nouns:

a. Nineteenth-century rural:   44%                   Dard (1825), Kobès (1875)
b. Twentieth-century rural:    64%                   Irvine (1978: 51)
c. Today, urban/Dakar:         ‘for the most part’   Tamba et al. (2012: 894, n. 5)
d. Today, urban/Banjul:        90%                   Becher (2001: 47f)

Its incidence has grown from less than 50% in nineteenth-century rural Wolof to
near generalization in the contemporary urban language. As a result, the agree-
ment pattern selected by most nouns in all varieties of Wolof is the one in (5)
(singular bi/plural yi):⁴

⁴ This is the default agreement class (consisting of the two default NCs for singular and plural), both
in lexical and in syntactic terms: lexically, loanwords are assigned bi/yi class membership (cf. Rambaud
1898: 22; Stewart & Gage 1970: 392; Guérin 2011: 83); syntactically, there are rules substituting yi for
other plural markers under certain conditions (cf. Babou & Loporcaro 2016: 16, 31f).

(5) a. buur b-i noppi na/*na-ñu . . . moom . . .

/king .-. ready .3/-3 3
‘the king is ready; he . . . ’
b. wuur y-i noppi na-ñu/*na . . . ñoom . . .
/king .-. ready -3/.3 3
‘the kings are ready; they . . . ’

Note that class-agreement is marked exclusively on determiners (boldfaced in
(5)), while adjectives (which really are stative verbs in Wolof) do not mark class
contrasts. Verb auxiliaries and pronouns mark person and number, not class.

6.4 Wolof within the Atlantic context

Thus, Wolof has moved far away from the pervasiveness of agreement typically
observed in Niger-Congo, including Atlantic languages. Compare the Fula examples
in (6), where the word for ‘king’ is class-marked itself and controls class-agreement
on adjectives and function words; or the Baïnounk examples in (7), with class-
agreeing demonstratives, adjectives, and numerals; or those from Diola-Fogny in (8),
with class-agreement also on the verb (again, class markers are boldfaced for clarity):

(6) Pular, Fuuta Jaloo (Guinea; Diallo 2010: 80f):

a. lan-ɗo maw-ɗo mo yiiɗ-en on ko janan-o
king-. old-. . see-.1  be foreigner-.
‘the old king we saw is a foreigner’
b. lan-ɓe maw-ɓe ɓe yiiɗ-en ɓen ko janan-ɓe
king-. old-. . see-.1  be foreigner-.
‘the old kings we saw are foreigners’

(7) Baïnounk, Gubaher; Ñuun (Casamance, Senegal; Cobbinah 2010: 186)

a. bә-kәr ba-m-ba / bә-kәr-әŋ ba-naːk-aŋ
-chicken -.- / -chicken- -two-
‘this chicken’ ‘two chickens’
b. feːbi fa-dikaːm / feːbi-ɛŋ fa-naːk-aŋ
goat -female / goat- -two-
‘female goat’ ‘two goats’

(8) Diola-Fogny (Casamance, Senegal; Sapir 1965: 24, 90)

a. bu-bәːr-ә-b bә-mәk-ә-b bu-lɔlɔ
9-tree--9 9.-big--9 9-fall
‘the big tree fell’
b. u-bәːr-ә-w wә-mәk-ә-w u-lɔlɔ
8-tree--8 8.-big--8 8-fall
‘the big trees fell’

Since this pervasiveness of class marking on nouns and different agreement targets
is a property reconstructed for Niger-Congo, and for Atlantic, Wolof must have lost it.
This amounts to a loss of complexity under the view, maintained by Dahl (2004: 10)
among others, that redundancy adds to complexity:

the spread-out of information from a segment of the signal to its neighbours

means that the mapping from input to output—and thus the system as such—
becomes more complex. (Dahl 2004: 10)

Indeed, most of the changes in noun morphology and morphosyntax from
Atlantic to Wolof produced simplification, in one way or the other: there has
been loss of redundancy in agreement (as readily apparent from comparison of (5)
with (6)–(8)), and reduction in the number of NCs (Proto-Atlantic had about
fifteen NCs; Doneux 1975: 114), which amounts to loss in constitutional
complexity, in Rescher’s (1998: 9) terms. We have also seen (in (4)–(5)) that there is a
trend towards the generalization of the default NCs.
This is the kind of change the literature on Wolof tends to focus on. However,
there were also changes which made the system more complex, leading to the rise
of (previously absent) morphological irregularity (in static morphology, in
Dressler’s (2011: 161) terms), both on nouns (with the rise of inflectional classes
(ICs), untypical for agglutinating languages), and on agreement targets (rise of
defective and otherwise irregular paradigms). These are the changes on which
I am going to focus in what follows.

6.5 Complexification in Wolof noun inflection, against the background of Atlantic noun class systems

6.5.1 Morphological complexity vs. morphological richness

Niger-Congo languages on the whole have agglutinative morphology. In an
ideally agglutinating language, as pointed out, for example, by Dressler (2011:
160), we expect to find less complexity than in languages of the inflecting-fusional
type:

Strongly inflecting-fusional languages have a sizeable amount of morphological
richness, but also many unproductive patterns, i.e. additional morphological
complexity. Strongly agglutinating languages have much more morphological richness,
but ideally no unproductive morphological patterns, a situation nearly completely
obtained by Turkish. (Dressler 2011: 160)⁵

⁵ As is well known, in Turkish ‘there are no inflectional classes’ (Wurzel 1989: 74).


To recognize this, though, one has to distinguish complexity from richness of
inflection:

I agree with Baerman et al. (2010: §1) that the size of a paradigm is not a
primary criterion of complexity; it is [ . . . ] a criterion of morphological richness
dependent on the importance of inflectional morphology in the morphology–syntax
interface. (Dressler 2011: 160)

Under this view, the morphology of an ideally agglutinating language is rich, not
complex. To mention just one crucial aspect, relevant for the present discussion,
such a language, lacking inflectional classes, lacks ‘the additional structure
imposed by inflectional morphology, above and beyond its dedicated task of
expressing syntactic and semantic distinctions’ (Baerman et al. 2010: 1).
As a final remark to this section, note that the use of notions such as ‘agglutin-
ating’ and ‘inflecting-fusional’ in morphological typology has been criticized, most
influentially by Haspelmath (2009), who analyses what he calls the ‘Agglutination
Hypothesis’ into three distinct indexes (the Cumulation, the Alternation, and the
Suppletion Index) and takes it to be falsified by the fact that, on the whole, the
languages in his sample score differently on the three. A language displaying
one-to-one correspondence between form and meaning in inflectional morphology scores
lower on the Cumulation Index than languages allowing for one-to-many
correspondences. The ‘Alternation Index’, on the other hand, assigns 0 to languages
‘which exhibit complete stem invariance’, and higher values to languages showing
more ‘stem alternations, that is, the (co-)expression of morphological categories by
changing, rather than adding to, the stem’ (Haspelmath 2009: 17). The ‘Suppletion
Index’, finally, is ‘defined as the average percentage of subcategories (per category-
system) that exhibit affix suppletion’ (Haspelmath 2009: 22).
Note that the only Niger-Congo language in the sample (Swahili) scores 0.1 on
the Cumulation Index, while a paramount instance of an agglutinating language
such as Turkish (Haspelmath 2009: 23) scores 0. Both Swahili and Turkish also
score 0 on the Alternation Index. On the Suppletion Index, on the other hand,
Turkish scores 23/100 and Swahili 28/100, which is far from 0 (Nivkh) but much
closer to it than to the score reached by a typically ‘inflecting-fusional’ language
like Latin (84/100).
Thus, despite the scepticism Haspelmath airs about the usefulness of the ‘agglu-
tinating’ vs. ‘inflecting-fusional’ distinction, his own data show that it is far from
odd to qualify languages such as Turkish or Swahili as consistently agglutinating, for
the purposes of the present study. More broadly, Haspelmath’s line of argument
seems to be at odds with the notion itself of a ‘type’, whose legitimacy cannot be
called into question by pointing to empirical objects which poorly fit the ideal
instantiation of it, however defined, given that ‘linguistic types’ are ‘ideal constructs
which natural languages approach to various degrees’ (Dressler 2005: 7).
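
The comparison of Suppletion Index scores made above can be checked with a few lines of arithmetic. The following Python sketch is purely illustrative: it uses only the four scores quoted in the text from Haspelmath (2009), while the dictionary `SUPPLETION_INDEX` and the helper `nearer_pole` are names introduced here for the example and belong to no cited work.

```python
# Suppletion Index scores (out of 100) as quoted in the text from Haspelmath (2009).
SUPPLETION_INDEX = {"Nivkh": 0, "Turkish": 23, "Swahili": 28, "Latin": 84}


def nearer_pole(language: str, pole_a: str, pole_b: str) -> str:
    """Return whichever of the two reference languages has the closer score."""
    distance_a = abs(SUPPLETION_INDEX[language] - SUPPLETION_INDEX[pole_a])
    distance_b = abs(SUPPLETION_INDEX[language] - SUPPLETION_INDEX[pole_b])
    return pole_a if distance_a < distance_b else pole_b


for language in ("Turkish", "Swahili"):
    # Nivkh stands for the zero-suppletion pole, Latin for the fusional pole.
    print(language, "is closer to", nearer_pole(language, "Nivkh", "Latin"))
```

The sketch only restates the numerical comparison; it says nothing about how the index itself is computed.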

6.5.2 The emergence of inflectional classes in Wolof

Like Niger-Congo in general, Wolof too has agglutinating morphology, but this is
today the case only in the verb, since the noun has become almost completely
invariable, as reflected in the white dot (meaning ‘no distinct plural form’) on the
WALS map 33 on noun plurality (Dryer 2013), a fact remarked on ever since the
earliest descriptions of Wolof.⁶ However, while such remarks and the WALS white
dot are accurate for the overwhelming majority of Wolof nouns, uninflectedness
has not yet triumphed completely. In fact, one of the rare lexemes still preserving
two distinct forms, that is, buur ‘king’, has already been displayed in (5). The same
is the case for about twenty nouns (listed in (9)), whose singular and plural differ
because of an alternation in the initial consonant:⁷

(9)      Singular       Plural         Gloss

    a.   mbaam mi       baam yi        ‘donkey’
         mbootaay mi    bootaay yi     ‘piggyback’
         ndono li       dono yi        ‘heritage’
         ndab li        dab yi         ‘utensil’
         ndënd mi       dënd yi        ‘drum’
         ngàttaan mi    gàttaan yi     ‘short one’
    b.   mbagg mi       wagg yi        ‘shoulder’
    c.   baaraam bi     waaraam yi     ‘finger’
         boroom bi      woroom yi      ‘owner’
         buur bi        wuur yi        ‘king’
         buy bi         wuy yi         ‘baobab fruit’
    d.   pepp mi        fepp yi        ‘grain’⁸
    e.   këf ki         yëf yi         ‘thing’

⁶ On noun invariability in Wolof, see the early remarks by Dard (1826: 14): ‘Mais si le nom n’est
pas suivi de la préposition ou, on ajoute après ce nom les articles ya, yi, you, sans jamais rien changer
dans son orthographe’ [‘But if the noun is not followed by the preposition ou, one adds after this
noun the articles ya, yi, you, without ever changing anything in its orthography’]. Similarly, Boilat
(1858: 7) points out: ‘En Wolof, les noms ne changent pas de terminaison dans les différentes
combinaisons que leur fait éprouver le discours, pas même en passant du singulier au pluriel’ [‘In
Wolof, nouns do not change ending in the different combinations in which discourse places them,
not even when they change from singular to plural’]. Thus, ‘le substantif est invariable’ [‘the noun is
invariable’] (Boilat 1858: 11).
⁷ The alternations—as described in Sauvageot (1965: 74); Diagne (1971: 79); Diouf (2009: 155);
Camara (2006: 7–8), etc.—may take different forms, illustrated in (9). The proximal form of the definite
article—already seen in (1)–(2)—is added after each word form, to indicate that the two occur in
distinct environments (thus glosses expand to ‘the x/the x’s right here’).
⁸ Camara (2006: 8) also reports pan/fan ‘day/days’, showing the same p-/f- consonant alternation as
in (9d). However, this paradigm is no longer attested in Mbakke Wolof, where the formerly plural form
fan has generalized and is used for singular as well: for example, benn fan jàll na ‘one day has passed’.
The lexeme fan is also reported as invariable in Fal et al.’s (1990: 70) dictionary: fan wi ‘the day’/
ñaari fan ‘two days’. The older singular form pan still occurs only in the fixed expression weer-u benn
pan ‘the first day of the month’ (literally ‘crescent-. one day’).

    f.   bët bi         gët yi         ‘eye’
         bëñ bi         gëñ yi         ‘tooth’
    g.   loxo bi        yoxo yi        ‘hand, arm’
    h.   waa ji         gaa ñi         ‘guy’

For most lexemes, this difference today is only optional since—with the sole
exception of këf ‘thing’—the singular form may, and indeed tends to, be used in
plural contexts, while the reverse is not the case (see Guérin 2011: 85; Babou &
Loporcaro 2016: 10). Once uninflectedness is generalized, noun morphology will
have become simplified again, but as long as paradigms such as those in (9)
survive, they represent an increase in morphological complexity, determined by
changes which introduced morphological irregularity of the sort familiar from
inflecting-fusional languages: in other words, the data in (9) are evidence for the
occurrence of (residual) inflectional classes in Wolof. Note also that free variation
in the plural cell of those noun lexemes determines overabundance (Thornton
2011; Meakins & Wilmoth, Chapter 4, this volume), that is, variation between two
cell-mates (Loporcaro & Paciaroni 2011: 420), thus contributing to a local increase
in complexity, if only ephemeral, on the way towards simplification.

6.5.3 Agglutinative noun-class morphology and inflectional classes in other Atlantic languages

The initial consonant alternations defining these inflectional classes are the last
remnants of two distinct but intertwined processes which are observed—with
varying degrees of regularity—in the neighbouring Atlantic languages, and spe-
cifically, in those to be considered as representative comparator languages from
the North Atlantic branch under either classification hypothesis for Wolof (see
section 6.2), that is, either Fula and Seereer or Ñuun. The two processes are one
morphological (NC-prefixation), the other morphonological (initial consonant
mutation). Integration of initial consonant mutation into the NC system is an
innovation that is currently reconstructed for Proto-Northern Atlantic (see
Pozdniakov 2015: 60), even if not preserved in all daughter languages: in Ñuun
languages, ‘the system is barely operative now, but can be partly reconstructed’
(Wilson 2007: 86), and the same is true of Wolof, as discussed in (18)–(19) below.
In Fula and Seereer, by contrast, the consonant mutation system itself and its
interaction with NCs are well-preserved.
As an illustration consider the word koor ‘man’ in Seereer-Siin (or Siin-
Gandum, the most conservative variety of Seereer in this respect, spoken in the
Sine region of Senegal; see Faye 2013: 3, 9). This nominal root may occur, with
distinctive morphology, in several of the sixteen NCs of the language (see Mc
Laughlin 2000: 336), eleven of them displaying overt class prefixes, five lacking
them, all selecting class-marked enclitic determiners, thus generating word forms
such as the following (see also Mc Laughlin 1997: 6):

(10) Seereer-Siin: koor ‘man’
     o-   koor-oxe      Class 1 singular
          goor-we       Class 2 plural
     o-   ŋgoor-oɴɢe    Class 12 diminutive singular
     fo-  ŋgoor-ne      Class 13 diminutive plural
     a-   ŋgoor-ale     Class 3b augmentative singular
     (-)man-

In (10), one observes consonant mutation on the stem-initial consonant, exemplified
here with koor ‘man’, which appears as koor, goor, or ngoor, depending on the
class: ‘Stem-initial consonant mutation in Seereer-Siin is morphologically condi-
tioned by noun class in nouns and dependent adjectives’ (Mc Laughlin 2000: 335).
Fula, on the other hand, has twenty-one to twenty-five NCs, depending on the
dialect,⁹ and, having lost all NC prefixes, contrasts NCs by means of suffixes,¹⁰ on
nouns as well as on agreement targets, resulting in very elaborate paradigms. The
initial consonant of both stems and suffixes is subject to mutation, whose effects are
exemplified in (11)–(12) with data from the dialect of Gombe (Northern Nigeria),
excerpted from the detailed account offered by Arnott (1970: 79–109):

(11) Fula, Gombe, N. Nigeria (Arnott 1970: 87). Suffix grades, lexically selected
     (invariable stems):
     Grade A          Grade B     Grade C      Grade D          Class  Gloss (grammatical)
     ɓoy-re           leemuu-re   tummu-de     loo-nde          9      ‘x’
     ɓoy-e            leemuu-je   tummu-ɗe     loo-ɗe           24     ‘x’s’
     ɓoy-el           leemu-yel   tummu-gel    loo-ŋgel         3      ‘small x’
     ɓoy-um           leemu-yum   tummu-gum    loo-ŋgum         5      ‘worthless little x’
     ɓoy-on           leemu-hon   tummu-kon    loo-kon          6      ‘small x’s’
     ɓoy-a            leemu-wa    tummu-ga     loo-ŋga          7      ‘big x’
     ɓoy-o            leemu-ho    tummu-ko     loo-ko           8      ‘big x’s’
     ‘baobab fruit’   ‘orange’    ‘calabash’   ‘storage pot’           Gloss (lexical)
     ɓoy-             leemu(u)-   tummu-       loo-                    Stem

The horizontal dimension shows grade alternation in suffixes, while on the
vertical dimension an arbitrary selection of NCs is offered for illustration. For
nominal stems, the grade depends on the class, which in turn correlates largely

⁹ For the Senegalese variety of Pulaar, Mc Laughlin (1997: 7) describes twenty-one NCs, while
twenty-two are reported for the one described by Sylla (1982: 31) and twenty-five for the Gombe dialect
(Northern Nigeria) described by Arnott (1970: 75).
¹⁰ This ‘affix renewal’ occurs not only in North Atlantic, as also in ‘at least one language of South
Atlantic, Kisi, the normally prefixed NCMs [= noun class markers] are suffixed’ (Childs 2009: 117; see
Childs 1983 and the recent discussion by Di Garbo 2014: 80).

(though not perfectly, see Arnott 1970: 73) with the semantics, as shown in the
gloss column on the right-hand side: thus, for instance, class 24 hosts
word forms which are plural to class 9; class 3 is the corresponding diminutive
singular, which pluralizes in turn as class 6; class 5 is diminutive/pejorative;
and so on.
For suffixes, by contrast, the grade is lexically selected by the (lexical specifica-
tion of the) stem. The data in (11) exemplify invariable stems, where only class
suffixes vary according to the class-dependent consonant grade, while the noun
stem stays the same because its initial consonant is an invariable one, not involved
in consonant mutations, observed here only on suffixes. Thus, for instance, in
class 9 the forms -re, -de, -nde, marking different grades, are related morphono-
logically via mutation with each other, and are selected by the individual noun
lexemes so that, for example, ‘baobab fruits’ cannot be *ɓoy-je/-ɗe (i.e., cannot
take plural class 24 suffixes of grades B–D) because of lexical specification.
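
The lexical selection of suffix grades just described can be pictured as a two-step lookup: the stem fixes the grade, and class plus grade fix the suffix. The Python sketch below is only an illustration under stated assumptions: the forms are copied from (11), but the names `SUFFIXES`, `STEM_GRADE`, and `inflect` are hypothetical, and the stem leemu(u)- is left out because its vowel length varies across classes.

```python
# Illustrative sketch: data from (11); all identifiers are hypothetical.

# Class suffix allomorphs, indexed by noun class and then by grade.
SUFFIXES = {
    9:  {"A": "re", "B": "re",  "C": "de",  "D": "nde"},
    24: {"A": "e",  "B": "je",  "C": "ɗe",  "D": "ɗe"},
    3:  {"A": "el", "B": "yel", "C": "gel", "D": "ŋgel"},
    5:  {"A": "um", "B": "yum", "C": "gum", "D": "ŋgum"},
    6:  {"A": "on", "B": "hon", "C": "kon", "D": "kon"},
    7:  {"A": "a",  "B": "wa",  "C": "ga",  "D": "ŋga"},
    8:  {"A": "o",  "B": "ho",  "C": "ko",  "D": "ko"},
}

# The grade is stored with each stem: it is a lexical property,
# not something computed from the stem's phonological shape.
STEM_GRADE = {"ɓoy": "A", "tummu": "C", "loo": "D"}


def inflect(stem: str, noun_class: int) -> str:
    """Build a word form from a stem and a noun class via the stem's grade."""
    grade = STEM_GRADE[stem]
    return f"{stem}-{SUFFIXES[noun_class][grade]}"


for noun_class in (9, 24):
    print(noun_class, [inflect(stem, noun_class) for stem in STEM_GRADE])
```

Because the grade is looked up rather than derived, forms such as *ɓoy-je or *ɓoy-ɗe simply cannot be generated for a grade A stem, which is the point made in the text.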
The nouns in (12), by contrast, exemplify what Arnott (1970: 93) calls ‘variform’
stems (only some consonant alternations are displayed here, as selected by
grades A, C, and D; in other words, (12) displays an arbitrary selection, not only
of noun classes, but also of grades and consonant alternations; the reader is
referred to Arnott’s description for a full account of the intricacies of this
fascinating system):

(12) Fula, Gombe, N. Nigeria (Arnott 1970: 98). Consonant alternation in noun
     stems of different grades:
     Grade A      Grade A    Grade C    Grade D     Suffix grade (selected)
     r/d/nd       w/b/mb     w/g/ŋg     y/g/ŋg      C-alternation on stem
                                                    Class  Gloss (grammatical)
     dim-o        beer-o     gor-ko     gim-ɗo      1      ‘x’
     rim-ɓe       weer-ɓe    wor-ɓe     yim-ɓe      2      ‘x’s’
     dim-el       beer-el    gor-gel    gim-ŋgel    3      ‘small x’
     dim-um       beer-um    gor-gum    gim-ŋgum    5      ‘worthless little x’
     ndim-on      mbeer-on   ŋgor-kon   ŋgim-kon    6      ‘small x’s’
     ndim-a       mbeer-a    ŋgor-ga    ŋgim-ŋga    7      ‘big x’
     ndim-o       mbeer-o    ŋgor-ko    ŋgim-ko     8      ‘big x’s’
     ‘free man’   ‘host’     ‘man’      ‘person’           Gloss (lexical)
     rim-         weer-      wor-       yim-               Stem

For instance, the first two stems rim- ‘free man’ and weer- ‘host’ select the same
class suffixes (both grade A) but differ in the initial consonant, while the other
two, wor- ‘man’ and yim- ‘person’, select allomorphs of the class suffixes which
differ from each other, apart from some syncretisms (seen in classes 6 and 8).
Thus, for instance dim-o, gor-ko and gim-ɗo all display what is morphologically
the same class 1 suffix, but in different allomorphs.

In other words, what we have here is different inflectional classes, in spite of the
overall agglutinating character of Fula morphology. As regards the selection of the
forms of each NC suffix, the Fula situation is closer to that of an inflecting-fusional
language like Italian, with ICs, than to that of a strongly agglutinating language
like Turkish, without inflectional classes, as schematized in (13):

(13) inflectional classes in Fula?                a. Turkish   b. Fula   c. Italian

     i.  alternative forms of the inflections         +                      –
         are related phonologically
     ii. alternative forms of the inflections         +            –         –
         are selected phonologically

Turkish has no inflectional classes, since the alternants of each affix are selected
phonologically (e.g., ev/ev-ler ‘house/-s’ vs. yol/yol-lar ‘trip/-s’, with plural -ler/-lar
depending on the front/backness of the root vowel), while Italian has them, because
nouns such as can-e/can-i ‘dog/-s’ and lup-o/lup-i ‘wolf/wolves’ take different singular
endings, not derivable from each other phonologically ((13i)), due to lexical
specification ((13ii)).¹¹ In Fula too, ‘there seems no advantage in treating all
suffixes of each class as morphophonemic variants of a single class suffix’
(Arnott 1970: 68). In fact, while in some cases one observes, between different
suffix grades, alternations that could be accounted for through independently
valid morphonological rules of the language (e.g., the alternation between voiced
and voiced prenasalized stops between Grades C–D in Classes 3, 5, or 7), this
cannot be generalized, since, for example, in Class 1 -ko (Grade C) and -ɗo (Grade
D) are not related morphonologically. Thus, Fula differs in this respect from an
ideally agglutinative language such as Turkish and rather resembles Italian, where
inflections are selected depending on inflectional class (a lexeme-inherent purely
morphological property) and are not derived by morphonological rule from one
another. In sum, there is no alternative but to recognize the occurrence of
inflectional classes in Fula too, though this—as highlighted in Babou &
Loporcaro (2016: 44)—is a descriptive notion which is hardly used in the gram-
mars of Atlantic languages.
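
The contrast between phonological and lexical selection drawn above can be sketched in Python. This is a minimal illustration under stated assumptions, not an analysis of either language: the function and table names, as well as the class labels "o/i" and "e/i", are hypothetical; the Turkish rule is reduced to front/back harmony on the last vowel and ignores lexical exceptions; and only the nouns cited in the text are covered.

```python
# Illustrative sketch only: all names and class labels below are hypothetical.

FRONT_VOWELS = set("eiöü")
BACK_VOWELS = set("aıou")


def turkish_plural(noun: str) -> str:
    """Select -ler or -lar from the noun's last vowel (phonological selection)."""
    for char in reversed(noun):
        if char in FRONT_VOWELS:
            return f"{noun}-ler"
        if char in BACK_VOWELS:
            return f"{noun}-lar"
    raise ValueError(f"no vowel found in {noun!r}")


# Italian: the endings depend on a class stored with each lexeme,
# which cannot be read off the stem's phonological shape.
ITALIAN_ENDINGS = {
    "o/i": {"sg": "o", "pl": "i"},
    "e/i": {"sg": "e", "pl": "i"},
}
ITALIAN_LEXICON = {"lup": "o/i", "can": "e/i"}


def italian_form(stem: str, number: str) -> str:
    """Select the ending via the lexeme's inflectional class (lexical selection)."""
    inflection_class = ITALIAN_LEXICON[stem]
    return f"{stem}-{ITALIAN_ENDINGS[inflection_class][number]}"


print(turkish_plural("ev"), turkish_plural("yol"))
print(italian_form("can", "sg"), italian_form("lup", "sg"))
```

The design difference is the point: `turkish_plural` needs no lexicon at all, whereas `italian_form` fails for any stem whose class has not been listed, just as an inflectional class must be learned lexeme by lexeme.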
More generally, Atlantic languages offer interesting evidence for the rise of
inflectional classes within an agglutinating system.¹² This applies also to the Ñuun

¹¹ Here, an editorial comment asked: ‘why not analyse -o/-e as part of the stem truncated before
plural -i?’. This corresponds to Scalise’s (1983: 293–4) vowel deletion rule, and the alternative between
the two is indeed a handbook topic in Italian morphology: the reader is referred to Thornton (2005:
160), who shows that this readjustment rule becomes superfluous under a word and paradigm
approach to morphology.
¹² An anonymous reviewer comments that, with the present discussion, ‘The author seems to
suggest that inflectional classes of nouns are an innovation in the history of individual languages’.
Actually, one must recognize ICs for previous stages of Atlantic languages: as observed in n. 16, the
same mechanisms of consonant gradation responsible for IC-contrasts in Fula are currently assumed

language/dialect cluster, the alternative closest relevant comparator languages for
Wolof under the second classification hypothesis in section 6.2. For Baïnounk, as
shown for different dialects by Sauvageot (1967), Bao Diop (2015; on Baïnounk
Gunyamolo) and Cobbinah (2010; on Baïnounk Gubaher), one has to assume
inflectional classes, since not all nouns are subject to singular and plural formation
via NC prefixes. Rather, in Baïnounk Gubaher (spoken in the village of Djibonker,
south of Ziguinchor, in the Casamance), analysed by Cobbinah (2010: 182–7), only
one subset of the noun lexemes forms singular and plural prefixally ((14a)), while
another substantial subset displays suffixal plurals formed with a default suffix -Vŋ,
and divides into a group with plural suffix only ((14b)) and a mixed group
combining a plural class-marked prefix and the plural class-neutral suffix ((14c)):¹³

(14) Baïnounk Gubaher (Cobbinah 2010: 182–7)

     a. prefixal class marking, paired for SG and PL: for example, ra-maːsix
        ‘crab’ / PL ɟa-maːsix
     b. no prefix in the SG; PL suffix (class-neutral -Vŋ): for example, bəːb
        ‘father’ / PL bəːb-əŋ ‘fathers, old men’
     c. prefixal class marking in the SG; PL with prefix and class-neutral suffix:
        bə-kər ‘chicken’ / PL bə-kər-əŋ

While (14a) mirrors the inherited Niger-Congo noun inflection, the rest is the
product of a series of innovations (e.g., the prefixes occurring in type (14c) nouns
‘do not occur as singular prefixes in the paired prefixed groups or if so then
only very rarely’; Cobbinah 2010: 186), which makes the recognition of different
inflectional classes, as schematized in (14), necessary, even if the combination
of morphs in noun word forms largely stayed agglutinative, rather than fusional,
in nature.
This evidence could be multiplied, another case in point being, for example,
Diallo’s (2010; 2014: 151–81) study of the adaptation of borrowed Mande nouns
leading to the creation of inflectional classes (not present in the native lexicon) in
Fuuta-Jaloo Pular, the Fula variety spoken in the Fuuta-Jaloo area in Guinea. This
shows that all over the area a trend towards the creation of allomorphy in nominal
paradigms (and new inflectional class distinctions) is observed.

for earlier stages of Wolof as well. However, this is orthogonal to the fact that new morphological
irregularities, defining (new types of) ICs, can be shown to have arisen, as is the case with the stem
alternations in (9), which define (residual) ICs (a) of a kind different from that reconstructed for earlier
stages of Atlantic, and (b) that are not usually recognized in the literature, before Babou & Loporcaro
(2016).

¹³ Pozdniakov (2015: 79–82) reviews pluralizing suffixes (-Vn/ŋ) from different Atlantic languages
suggesting that they may be etymologically related with the plural class marker for humans reflected in
Wolof as ñ-.

6.5.4 The complexification of Wolof noun inflection

Thus, as seen in section 6.5.3, Wolof is not the only Atlantic language to have
developed morphological irregularities of the kind found in fusional languages.
Since such irregularities add to morphological complexity (section 6.5.1), one
must recognize that even the morphological system of Wolof, much less rich
than those seen in section 6.5.3, has developed new forms of complexity.
Recapitulating so far, the marking of NC contrasts in the North Atlantic
languages considered above can be summarized as follows (after Mc Laughlin
1997: 7, with one small modification):¹⁴

(15) Class markers in some North Atlantic languages (Mc Laughlin 1997: 7, revised)
    
  
a. Seereer-Siin √ √ √
b. Fula √ √ √
c. Wolof (traces) (traces) √

As seen for Fula in (11)–(12), in this language consonant mutations and suffixation
(which replaced prefixation in the affix renewal process: see n. 10) are involved in
lexically conditioned allomorphy defining inflectional classes. Some remnants of this
situation persist in Wolof ((15c)), though this has neither class prefixes nor class-
marked clitics nor suffixes but, in its present state, marks NC only on determiners.
These remnants are the singular/plural alternations in (9), which concerned many
more lexemes in the nineteenth century, as shown in (16), listing lexemes which now
have lost consonant alternation but still had it according to nineteenth-century sources:

(16) Becher (2001: 50f): nouns with allomorphy in Boilat (1858) and Kobès (1875)
     Singular      Plural         Gloss       Singular/plural today (Fal et al. 1990)
     banta bi      wanta yi       ‘stock’     bant bi/yi ‘bit of wood’
     badoolo mi    wadoolo yi     ‘peasant’   baadolo bi/yi
     bakan bi      wakan yi       ‘nose’      bakkan bi/yi
     bopa bi       gopa yi        ‘head’      bopp bi/yi
     garab gi      yarab yi       ‘tree’      garab gi/yi

Further language-internal evidence comes from the indefinite article, which is the
only noun determiner to occur categorically in pre-nominal position (while

¹⁴ The modification consists in indicating the occurrence of traces of earlier prefixes for Wolof: see
(9) as well as the diachronic data in (16)–(17). In particular, I am non-committal about Mc Laughlin’s
distinction between ‘clitic determiners’ and ‘independent determiners’, a distinction one anonymous
reviewer finds fault with: ‘I have serious doubts about the validity of the distinction between “clitic
determiners” and “independent determiners”.’

definite article and demonstratives normally follow the noun, though demonstratives
can also be preposed), and the only one to display the class marker after, rather than
before, its class-invariable part. According to Doneux (1975: 49), this doubly exceptional
distribution arose via reanalysis of earlier prefixes, reconstructed as seen in (17a):

(17) Doneux (1975: 49): Wolof prenominal indefinite article < former class prefix on noun
     a. a-b sëriñ    ‘a healer’     < *a-b-sëriñ
     b. sëriñ b-i    ‘the healer’   < *bi-sëriñ b-i (bixirim, AD 1594; Ferronha 1994: 24f)

Converging documentary evidence for earlier prefixes, seen in (17b), comes from
a Portuguese voyager, who—writing in 1594—calls bixirim what is today sëriñ b-i
‘the healer’, which is evidence, as Doneux comments, ‘qu’un préfixe (probablement
figé) était encore utilisé à cette époque’ [‘that a (probably frozen) prefix was still
in use at that time’] (Doneux 1975: 45). While in this
lexeme, as in most Wolof nouns, the prefix has been simply dropped, one may
argue that some of today’s irregular singular/plural alternations in Wolof (seen
above in (9)) show the traces of former class prefixes, which have become fused with
the stem, as observed also in other Atlantic languages.¹⁵ Among those irregular
alternations, some others come instead from consonant mutations, which are
regularly involved in NC inflection in other Atlantic languages (cf. (15a–b) and
the examples above in (10)–(12)). In Wolof, consonant mutation is still regular in
some derivational processes, such as diminutive or deverbal noun formation:

(18) a. diminutive formation:
        garab gi ‘the tree’          →  ngarab si ‘the little tree’
        janq bi ‘the little girl’    →  njanq si ‘the very little girl’
     b. deverbal noun formation:
        digël ‘advise’               →  ndigël li ‘the advice’
        jang ‘study’                 →  njang mi ‘the education/knowledge’

The overall mutation pattern, as observed in today’s derivational morphology, is as follows:

(19) Wolof consonant mutations (Mc Laughlin 1997: 4):
     a. base/non-diminutive       b    d    j    g    s    x    ʔ
     b. derivative/diminutive     mb   nd   nj   ng   c    q    k
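
The mutation pattern in (19) amounts to a mapping over stem-initial consonants, which the following Python sketch makes explicit. It is offered only as an illustration: the names `MUTATION` and `mutate_initial` are hypothetical, the function operates on orthographic forms, the ʔ/k pair is left out because the glottal stop is not assumed to be written, and the accompanying change of class marker seen in (18) (e.g. gi to si) is not modelled.

```python
# Illustrative sketch: pairs copied from (19); identifiers are hypothetical.
# The pair ʔ/k is omitted because the sketch works on orthographic forms.
MUTATION = {"b": "mb", "d": "nd", "j": "nj", "g": "ng", "s": "c", "x": "q"}


def mutate_initial(word: str) -> str:
    """Apply the derivational mutation to the stem-initial consonant, if any."""
    initial, rest = word[0], word[1:]
    if initial in MUTATION:
        return MUTATION[initial] + rest
    # Initials outside the mutation series are returned unchanged.
    return word


# Bases taken from (18).
for base in ("garab", "janq", "digël", "jang"):
    print(base, "->", mutate_initial(base))
```

Each output form differs from its base only in the initial consonant, and each pair in the mapping is homorganic, which is the property exploited in the argument about (9).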
In noun inflection, however, there is no regular mechanism of consonant mutation,
contrary to Seereer-Siin and Fula ((15a–b)), but inflectional alternations that are
nowadays irregular, such as (9a–d) and maybe (9h), can be interpreted as remnants thereof.¹⁶
Conversely, alternations such as këf/yëf ((9e)) must go back to original prefixes, as
suggested by alliteration with the class-marked determiners, while they cannot
possibly come from consonant mutation because, as seen also in (18)–(19), this only
involves homorganic consonants in all Atlantic languages: ‘the range of variation,
called a , is always restricted to homo-organic consonants, e.g. f/p/mp’ (Sapir
1971: 65). By the same token, one can argue that, e.g., pepp mi/fepp yi (9d) ‘the grain/-s’
may have arisen as an instance of a nowadays lost type of consonant mutation.
Summing up, the regular mechanisms occurring elsewhere in the noun inflection
of other Atlantic languages (consonant mutation and class prefixation) have
been conflated into a synchronic system for which one has no other choice but to
assume (residual) inflectional classes, that is, that kind of morphological
complexity usually occurring in inflecting-fusional languages.

¹⁵ This has been remarked by many scholars: cf. Pozdniakov & Robert (2015: 551) for a recent
recapitulation. As for other Atlantic languages, see, for example, Cobbinah (2010: 189) on the so-called
‘literal alliterative concord’ in Baïnunk: ‘the disputed elements [ . . . ] are archaic noun class morphemes
in different stages of fusion with the stem’.

6.6 Complexification in Wolof: paradigmatic irregularity in some agreement targets

Concluding section 6.4, I mentioned changes which led to the rise of morphological
irregularity also in the paradigm of agreement targets: in fact, in the indefinite
article, some defective and otherwise irregular paradigms have been created in
Wolof, which are not inherited from Proto-Atlantic. This boils down to an increase
in formulaic complexity (descriptive and generative), in Rescher’s (1998: 9) terms.
To see this, however, we have to abandon morphology proper and consider
morphosyntax, since agreement is a crucial criterion to establish the irregular
paradigms I will be concerned with. The agreement facts at stake crucially involve
the recognition (as in Babou & Loporcaro 2016) of two additional NCs in the plural
(boldfaced in (20b)) with respect to the current view ((1), repeated here in (20a)):

(20) a. Wolof: eight singular and two plural classes (traditional analysis):
                   Singular                           Plural
        NC marker  b-  g-  k-  j-  l-  m-  s-  w-     y-  ñ-
     b. Wolof: eight singular and four plural classes (Babou & Loporcaro 2016):
                   Singular                           Plural
        NC marker  b-  g-  k-  j-  l-  m-  s-  w-     y-  ñ-  j-  s-

The singular/plural pairings of NCs traditionally recognized, even in the
most accurate treatments available before Babou & Loporcaro (2016), are schematized
in (21a–b) (from Guérin 2011: 84, who highlights that most

¹⁶ See Pozdniakov (1993: 85) and Pozdniakov & Robert (2015: 552f) for a reconstruction of the set of
initial consonant mutations—richer than the one still observed today in (19)—involved in NC-related
alternations in an earlier stage of Wolof.

singular classes combine with both the traditionally recognized plurals, thus
resulting in (21b) rather than (21a)), while (21c) schematizes Babou &
Loporcaro’s (2016) account:¹⁷

(21) [Figure: singular/plural pairings of Wolof NCs. (a) Expected pairings and (b) Observed pairings (singular k-, g-, j-, m-, s-, l-, b-, w-; plural ñ-, y-); (c) Observed pairings (Babou & Loporcaro 2016).]

¹⁷ Singular/plural pairings of NCs define distinct genders: cf. Corbett’s (1991: 190f) analysis of
Wolof and Fula.

Note preliminarily that several of the pairings in (21), as well as several of the NCs
themselves, are established based on small numbers of lexemes. This ‘inquorate’
character (in Corbett’s 1991: 170–5 terms) is however a normal situation in
Atlantic languages, as remarked by Ferry & Pozdniakov (2001: 166):

Il est faux de penser que chaque appariement de classe nominale, faiblement
représenté, refléterait un figement ou la disparition de préfixes ayant existé. Les
langues atlantiques se caractérisent par un trait particulier: on y rencontre
souvent une classe spéciale ne comportant que deux ou trois noms ou même
un seul. [ . . . ] Chaque langue atlantique présente au moins un mot ayant un
accord statistiquement rare, irrégulier, qui traduit une notion sélectionné et
marquée dans cette culture précise.

[It is wrong to think that each weakly represented NC pairing reflects the fixation or
the disappearance of prefixes that once existed. Atlantic languages are characterized
by a particular feature: in these languages, one often comes across a special class
featuring no more than two or three nouns, or even just one. [ . . . ] Each Atlantic
language displays at least one word that has a statistically rare, irregular agreement
pattern, which conveys a notion that is selected and marked in that particular culture.]

Thus, if a consistent syntactic behaviour, distinct from that of other NCs, can be
identified for a set of nouns, however small, this must count as evidence to
establish a separate NC. This is what Babou & Loporcaro (2016) did for two
additional NCs, the plural classes ji and si. These are homophonous with two
singular classes, but must be kept distinct from them because they differ in the
agreements they trigger. This is a principle of method that holds in general and is
standardly applied also in studies of the Atlantic languages. For example, consider
Arnott’s (1970: 72) account of the two homophonous ko classes of Gombe Fula
(classes 20 and 8), one singular, one plural, distinguished by agreement:

There are two ko classes (8 and 20), with agreement marked by -o, -ho, -ko, ko-,
ko elements, etc.; but they are distinguished (i) by the different category of initial
consonant in full nominals (F-category in class 20, N-category in class 8 [ . . . ]),
and (ii) by the different pattern of agreement with verbal radicals [ . . . ], class 20
being a singular class requiring F- or P-category initial in the verbal radical, while
class 8 is a plural class requiring N-category initial in the radical, e.g.:

    20  huɗo ko’o wonnake           this grass has got spoiled
but  8  mbinndirko ko’o mbonnake    these big pens have got spoiled

Exactly the same happens in Wolof, where what are really two pairs of distinct
classes have previously been conflated, disregarding the evidence from verb
agreement. This is in fact the only morphosyntactic diagnostic, independent
from class-marker assignment, allowing one to assess the difference between
singular and plural NCs in Wolof. Applying the agreement test, it is easy to see
that what has been previously lumped together into one class, the si NC, indeed
consists of two distinct NCs. On the one hand, si is selected by singular nouns such
as soble ‘onion’, in (22a), whose plural is soble yi ((22b)):

(22) a. soble s-i baax na / *na-ñu

onion .-. good .3 / -3
‘the onion is good’
b. soble y-i baax na-ñu / *na
onion .-. good -3 / .3
‘onions are good’

On the other hand, other nouns that select si, viz. those in (23a), take plural verb
agreement (while, of course, when used in the singular the same nouns take
another class marker, as in (23b)):

(23) a. Séeréer s-i jekk na-ñu / *na

Seereer .-. handsome -3 / .3
‘the Seereers are handsome’
sëriñ s-i ñów na-ñu / *na
healer .-. arrive -3 / .3
‘the healers have arrived’
b. Séeréer b-i jekk na / *na-ñu
Seereer .-. handsome .3 / -3
‘the Seereer is handsome’
sëriñ b-i ñów na / *na-ñu
healer .-. arrive .3 / -3
‘the healer has arrived’

The same can be repeated for plural ji (jeeg/janq ji ‘the women/little girls’, (24b)),
which is distinct from singular ji, seen in (2d) and exemplified again in (24c):

(24) a. jeeg/janq b-i sonn na / *na-ñu

lady/little girl .-. tired .3 / -3
‘the lady/little girl is tired’
b. jeeg/janq j-i sonn na-ñu / *na
lady/little girl .-. tired -3 / .3
‘the ladies/little girls are tired’
c. jigéen j-i sonn na / *na-ñu
woman .-. tired .3 / -3
‘the woman is tired’
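
The agreement test applied in (22)–(24) can be restated as a small procedure: record, for each noun, the class marker on the definite article together with the number agreement on the verb, and flag every marker that co-occurs with both singular and plural agreement. The following Python sketch is only an illustration; the data are the forms cited in (22)–(24), while the function names and the tuple layout are assumptions made for the example.

```python
from collections import defaultdict

# (noun, class marker on the definite article, verb agreement), from (22)-(24)
observations = [
    ("soble", "s", "sg"),
    ("soble", "y", "pl"),
    ("Séeréer", "s", "pl"),
    ("sëriñ", "s", "pl"),
    ("Séeréer", "b", "sg"),
    ("sëriñ", "b", "sg"),
    ("jeeg", "b", "sg"),
    ("janq", "b", "sg"),
    ("jeeg", "j", "pl"),
    ("janq", "j", "pl"),
    ("jigéen", "j", "sg"),
]


def agreement_by_marker(rows):
    """Collect the verb-agreement values attested with each class marker."""
    grouped = defaultdict(set)
    for _noun, marker, number in rows:
        grouped[marker].add(number)
    return {marker: sorted(numbers) for marker, numbers in grouped.items()}


def conflated_markers(rows):
    """Markers attested with both sg and pl agreement hide two distinct NCs."""
    return sorted(
        marker
        for marker, numbers in agreement_by_marker(rows).items()
        if len(numbers) > 1
    )


print(conflated_markers(observations))
```

On these data the procedure singles out j and s, the two markers for which the text argues that a singular and a plural class have been lumped together.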

The fact that these are plurals has been overlooked in the literature on Wolof up to
now because traditionally plurals such as Séeréer si and jeeg ji have been called
‘collective’, in the wake of Sauvageot’s (1965: 73) inﬂuential statement:

A l’opposition de nombre singulier/pluriel, s’ajoute celle du collectif. Ce dernier a
pour particularités a) de ne pas posséder d’expression propre le distinguant du
singulier; b) de ne pas avoir de correspondant pluriel.

[To the singular/plural number contrast, one has to add that of collective. The
peculiarities of the latter are: a) it does not possess a dedicated expression
distinguishing it from the singular, b) it has no corresponding plural.]

There are indeed other African languages (also within the Atlantic family) for
which it is justified to assume a separate value of the category ‘number’,
traditionally called ‘collective’ (cf., e.g., Sapir 1965: 61, 64, on Diola-Fogny) or
‘collective plural’:

In addition to the ﬁrst plural, used with countable nouns, many nouns can
combine with a second plural, which is a collective plural for non-countable
quantities, or non-speciﬁed numbers of entities (Cobbinah 2010: 184)

The author, describing Baïnounk Gubaher, refers to triplets such as the following:

(25) a. ra-maːsix ran-de

-crab .-big
‘big crab’
b. ɲa-maːsix ɲa-naːk
-crab .-two
‘two crabs’ (count plural)
c. ɟa-maːsix ɟa-ŋaːn
-crab .-.
‘those crabs’ (collective plural)

Alternative terminologies include ‘pluriel limité ≠ illimité’ (Sauvageot 1967: 227 on
Baïnounk Gunyamolo) or ‘greater plural’ vs. unmarked plural (Corbett 2000: 31):
A potentially interesting case of a language with a greater plural is Banyun [ . . . ].
Nouns typically have singular and plural, distinguished by preﬁxes of the type
shared by many Niger-Kordofanian languages [ . . . ]. In addition there is a greater
plural (which Sauvageot calls ‘unlimited’) [ . . . ] which Sauvageot suggests is used
when the number cannot be counted or the speaker feels it unnecessary.¹⁸

¹⁸ To illustrate, Corbett (2000: 31) cites the paradigm bu-sumɔl ‘snake’ singular ≠ i-sumɔl ‘snakes’
plural ≠ ba-sumɔl ‘snakes’ greater plural.

Unlike in these languages, however, in Wolof there is never a three-way contrast
of the kind observed, for example, in Baïnounk Gubaher ((25)), and verb
agreement guarantees that the contrast is binary, singular vs. plural.
The same two pairs of NCs newly recognized in (21c)—singular vs. plural ji and
si—are crucial to illustrate the rise of morphological irregularities observed in the
paradigm of the indeﬁnite article. Its regular formation is schematized with three
nouns from different classes in (26b), compared with that of the deﬁnite article ((26a)):

(26) definite vs. indefinite article formation in Wolof

              ‘dog’              ‘cat’                ‘jackal’
              sg       pl        sg        pl         sg        pl
     a. def   xaj bi   xaj yi    muus mi   muus yi    till gi   till yi
     b. indf  ab xaj   ay xaj    am muus   ay muus    ag till   ay till

The indefinite article, as shown above in (17a), is the only determiner in which the
class marker follows the class-invariable part, thus becoming the final consonant.
As exemplified in (26), and schematized in (27a), in the regular case there is
a correspondence between this final consonant and the initial one occurring as a
class marker in other determiners. In addition, however, as illustrated in (27b–c),
there are two irregular patterns:

(27) a. regular determiner paradigm
              sg      pl
        def   C1-i    C2-i
        indf  a-C1    a-C2
        agreement classes (= sg./pl. pairings of NCs):
        bi/yi, ki/yi, gi/yi, mi/yi, si/yi, wi/yi

     b. irregular determiner paradigm
              sg      pl
        def   C1-i    C2-i
        indf  a-C1    a-y
        agreement classes (= sg./pl. pairings of NCs):
        ki/ñi, gi/ñi, mi/ñi, si/ñi, bi/ñi, bi/ji, bi/si

     c. defective determiner paradigm
              sg      pl
        def   C1-i    C2-i
        indf  *       a-y
        agreement classes (= sg./pl. pairings of NCs):
        ji/yi, ji/ñi, li/yi, li/ñi

Paradigm (27b) shows a deviation from the regular formation by which the
class-marking consonant yields to y- in the indefinite plural, while in (27c) the
indefinite article paradigm is defective, lacking the singular form. In the available
literature, the occurrence of ay instead of expected a-C₂ is usually recognized for
ñi plurals, seen in (3) above:

(28) a-y/*a-ñ nit/góor/jigéen/mag/ndaw/gan

-. person/man/woman/adult/youngster/guest
‘(some) persons/men/women/adults/youngsters/guests’

In addition to the pairings involving ñi plurals, however, the list in (27b) also
includes the two ‘new’ plural NCs in (20b). In fact, as illustrated in (29b)–(30b),
plural ji and si both select the default class marker -y in the indeﬁnite article, on a
par with ñi, while indeﬁnite plural *aj and *as do not occur:

(29) a. a-b jeeg/janq ñów na/*na-ñu

-. lady/little girl arrive .3/-3
‘a lady/little girl has arrived’
b. a-y/*a-j jeeg/janq ñów na-ñu/*na
-. lady/little girl arrive -3/.3
‘some ladies/little girls have arrived’

(30) a. a-b sàmm/Séeréer/sëriñ ñów na/*na-ñu

-. shepherd/Seereer/healer arrive .3/-3
‘a shepherd/Seereer/healer has arrived’
b. a-y/*a-s sàmm/Séeréer/sëriñ ñów na-ñu/*na
-. shepherd/Seereer/healer arrive -3/.3
‘some shepherds/Seereers/healers have arrived’

This provides a further argument against the traditional analysis of Wolof
NCs in (20a), because singular si and singular ji, the classes with which our
two ‘new’ plural classes were earlier confused, do not behave in the same
way. Rather, the singular si class forms the indefinite article regularly, as seen
in (31a), while singular ji, as shown in (31b), exemplifies the other type of
irregularity observed in the paradigms of the indefinite article, that is, defect-
iveness ((27c)):

(31) a. a-s soxna /gor ñów na/*na-ñu

-. honourable lady /free man arrive .3/-3
‘an honourable lady/a free man has arrived’
b. *a-j/*a-y jigéen/yaay/jabar ñów na/*na-ñu
-. woman/mother/wife arrive .3/-3
intended: ‘a woman/mother/wife has arrived’

In fact, it is not possible at all to form the indeﬁnite article from this class.
In order to convey the same meaning, one has to have recourse to suppletion
and use instead the (regularly class-marked) form of the numeral C-enn ‘one’,
as shown in (32a). This defectiveness also concerns the li class, or the li/yi and
li/ñi pairings listed in (27c), as exempliﬁed in (32b) by ndab and ndaw,
respectively:

(32) a. j-enn/*a-j/*a-b jigéen/yaay/jabar ñów na/*na-ñu
.-one/-. woman/mother/wife arrive .3/-3
‘a/one woman/mother/wife has arrived’
b. l-enn/*a-l/*a-b ndab/ndaw
.-one/-. dish/youngster
‘one/a dish/youngster’

The scheme in (33) recapitulates the different kinds of irregularity found in the
paradigm of the indeﬁnite article ((33d)), compared with two regular function words,
highlighting (in boldface) the differences between singular and plural ji and si:¹⁹

(33) Irregularity in the indefinite article in Wolof

                      singular                                          plural
a. class marker       b-    g-    k-    j-    l-    m-    s-    w-      y-    ñ-    j-    s-
b. def article        bi    gi    ki    ji    li    mi    si    wi      yi    ñi    ji    si
c. numeral ‘one’      benn  genn  kenn  jenn  lenn  menn  senn  wenn    yenn  ñenn  jenn  senn
d. indf article       ab    ag    ak    *     *     am    as    aw      ay    ay    ay    ay
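
The pattern summarized in (33) can also be stated procedurally. The sketch below, in Python, derives the three function words from a class marker and a number value: the definite article and the numeral ‘one’ are fully regular, whereas the indefinite article is neutralized to ay in every plural class and has no form for singular j- and l-. The forms are those of (33); the function names and the use of `None` for a missing form are assumptions made purely for illustration.

```python
SINGULAR_MARKERS = ["b", "g", "k", "j", "l", "m", "s", "w"]
PLURAL_MARKERS = ["y", "ñ", "j", "s"]

# Singular classes for which no indefinite article exists, as in (27c)
DEFECTIVE_SINGULAR = {"j", "l"}


def definite_article(marker):
    """Regular: class marker + -i, as in (33b)."""
    return marker + "i"


def numeral_one(marker):
    """Regular: class marker + -enn, as in (33c)."""
    return marker + "enn"


def indefinite_article(marker, number):
    """Irregular: ay in all plurals, no form for singular j- and l- (33d)."""
    if number == "pl":
        return "ay"
    if marker in DEFECTIVE_SINGULAR:
        return None  # defective cell; the numeral 'one' is used instead
    return "a" + marker


for marker in SINGULAR_MARKERS:
    print("sg", marker, definite_article(marker),
          numeral_one(marker), indefinite_article(marker, "sg"))
for marker in PLURAL_MARKERS:
    print("pl", marker, definite_article(marker),
          numeral_one(marker), indefinite_article(marker, "pl"))
```

The number value has to be passed in separately because the markers j and s occur in both a singular and a plural class, which is precisely the distinction highlighted in (33).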

To conclude, not only change in noun inflection but also change in agreement
target morphology has created new irregularities in Wolof, which add to com-
plexity in a way that had largely gone unnoticed under the traditional—but,
arguably, incorrect—view of Wolof NCs in (20a). This ‘local complexification’,
which yields a more realistic view of Wolof morphology and morphosyntax, can
be viewed as an ‘accident’ along a path in which the overall tendency is, for noun
morphology, from agglutinating towards isolating: not only are the inherited
prefixed NC markers long gone, but also the inflectional irregularities (stem
alternations) seen in (9), partly arisen from them, are on their way to disappear-
ing.²⁰ In other areas of inflectional morphology, while the verb maintains its
agglutinating structure, pronominal and adnominal agreement targets either
stay agglutinative (cf., e.g., (33b–c)) or develop paradigmatic irregularities, as
seen for the indefinite article in (27b–c), of the kind linguists usually associate
with inflecting-fusional type morphology.²¹ Unlike those in noun morphology,
which are in the process of vanishing, the irregularities in the indefinite
article are stable as long as the NC system is stable. This, however, is no longer
the case in contemporary urban varieties, which leads us to the last section.

¹⁹ Pozdniakov & Robert (2015: 565) provide a similar scheme, without the two plural classes ji and
si, and marking a blank for both neutralization (occurrence of ay for ñ- plurals as well as for y- plurals)
and defectiveness (non-existence of forms for singular j- and l-).
²⁰ In this transitional stage, however, as argued while concluding section 6.5.2, variation between
two cell-mates in the plural adds to overall paradigm complexity.
²¹ That verb and noun inﬂection can differ, in this respect, within one and the same language, ‘and
develop diachronically in typologically different directions’ (Dressler 2005: 7) has been shown by much
work on morphological typology (see, e.g., Haspelmath 2009: 25).

6.7 External explanatory factors for structural simplification

Even once the local increase in complexity in noun and determiner morphology
addressed above has been recognized, it remains true that, on the whole, Wolof
morphology is both less rich and less complex than that of the closely related
Atlantic languages mentioned above (and, hence, of the reconstructible common
ancestor, under either of the alternative classiﬁcations in section 6.2). This
impoverishment/simpliﬁcation, resulting in a ‘restricted system’ (Pozdniakov
& Robert 2015), may be traced back to external factors. In fact, Wolof, a
vehicular non-native language for a substantial share of its users, is a typical
case of a language spoken in an ‘exoteric niche’ (in Lupyan & Dale’s 2010
terms) or in a ‘Type 2 community’, or ‘an extreme “generalized outsider com-
munity” ’ (in Kusters’ 2008: 14 terms). The literature on linguistic complexity
has addressed the consequences for morphology that are often observed
when the percentage of non-native speakers becomes substantial, concluding
that languages spoken in such communities are expected to simplify their
morphology:

we may conjecture that when a language splits, and one variety becomes more
like a Type 1, and the other like a Type 2 community, we expect that the latter
becomes simpler in its inﬂectional morphology. (Kusters 2008: 15)

As McWhorter (2007: 2) puts it, ‘that heavy second-language acquisition
decreases structural complexity is thoroughly intuitive to most linguists’ (see
also McWhorter, Chapter 10, this volume). By contrast, a language spoken
in a tightly-knit local community by small numbers of speakers may be a
favourable setting (as argued by Trudgill 2004b, 2009) for better maintenance of
linguistic complexity. If one compares Wolof with Seereer, this seems to provide
an explanatory framework, as the latter has slightly more than one million
speakers in Senegal and Gambia, and its inflectional morphology remains sub-
stantially richer and more complex than Wolof’s (see (10)). However, this is far
from yielding a deterministic explanation, as one easily realizes considering that
Fula’s inflectional morphology, as seen in (11)–(12), remains both richer and
more complex than Wolof’s in spite of the language being spoken by over twenty-two
million people spread over eighteen countries.
Nonetheless, there is a crucial sociolinguistic fact about Wolof, concerning
language attitude and prestige hierarchies, that may be invoked as a precondition
of the observed simplification. For this language, in fact, the (conservative)
linguistic norm as reflected in school grammars and dictionaries, which is often
associated elsewhere with the maintenance of complexity, does not go hand in
hand with linguistic and social prestige. Rather, in the Wolof speech community
speaking correctly is not prestigious, and this holds true both in rural, socially
traditional areas and in urban ones. In traditional society, as shown by the
seminal study by Irvine (1978) and much subsequent work in sociolinguistics
(or the ethnography of speaking), linguistic elaboration and correctness, in
keeping with the conservative norm, are associated with griots, low-caste language
specialists, regarded as socially inferior in comparison with the ‘géer (“nobles”—
farmers, administrators, religious leaders)’ (Irvine 1978: 39). Irvine’s noble
informants in general speak in what is considered a less accurate way, involving
differences at all levels—as listed by Irvine (2011: 43–5)—from prosody (e.g.,
ﬂustering style, as opposed to clear voice) to syntax (e.g., incomplete phrase
structure, false starts). Simplifying the noun-class system ﬁts into this picture,
through what Irvine (1978: 41) labels an ‘appropriate-error strategy’, which
crucially involves the generalization of the default class markers bi/yi.
The same tendency is observed in urban Wolof as well, as seen in (4c-d) (cf., e.g.,
Mc Laughlin 2001: 158). Here, the overall strategy to achieve linguistic prestige
differs from what is observed in traditional rural social contexts: it is particularly
language mixing and extensive borrowing, especially from French in Dakar, which
serves the purpose. But all in all, rural and urban society converge, as Irvine (2011:
63f) remarks, in determining higher prestige for ‘bad’, incorrect language:

le ‘mauvais’ wolof urbain a quelque chose en commun avec le ‘mauvais wolof’
des hautes castes rurales. Dans les deux endroits, la ‘plus belle langue wolof’ n’est
pas attribuée aux gens les plus hauts placés.
[‘bad’ urban Wolof has something in common with the ‘bad Wolof’ of rural high
castes. In the two settings, the ‘most beautiful Wolof language’ is not attributed to
the highest-placed persons.]

Thus, that of Wolophones is not only a Type 2 community, with many non-native
speakers, but also a community in which native speakers themselves, in both
traditional and urban contexts, tend to adopt, as prestigious, modes of linguistic
behaviour favouring simplification, a fact that can plausibly be invoked as an
explanatory factor for the overall structural simplification of morphology and
morphosyntax that Wolof has undergone, compared with its ancestor within the
Atlantic language family.

Acknowledgements

Thanks to the editors and two anonymous reviewers for comments and constructive
criticism on a previous draft, as well as to Cheikh Anta Babou for joint ﬁeldwork on
Wolof. Usual disclaimers apply.

II
THE CROSSLINGUISTIC
PERSPECTIVE

7
Canonical complexity
Johanna Nichols

7.1 Introduction

Of the various ways of measuring linguistic complexity (see the Introduction to
this volume; and Sinnemäki 2011), this chapter focuses on what I will call
enumerative complexity (EC) and canonical complexity (CC). EC is also known
as taxonomic complexity (Miestamo et al. 2008), resources (Dahl 2004), economy
(Kusters 2008), the principle of fewer distinctions (Di Garbo & Miestamo in press;
defining non-complexity), inventory complexity (my previous work), and other
terms. It is based on assessing the number of elements in an inventory or values in
a system, for some domain or domains such as the number of phonemes, genders,
tenses, derivation types, alignments, word orders, etc. It has been widely used in
typological surveys, chiefly of phonological complexity (Shosted 2006, Hay &
Bauer 2007, Nichols 2009, Donohue & Nichols 2011; Bickel & Nichols 2013 for
inflectional complexity of verbs), but it has disadvantages. It is straightforward to
survey for well-defined and consistently described subsystems such as the phon-
eme inventory, but guaranteeing comparability of categories elsewhere can raise
problems. For example, is it meaningful to compare the sizes of case inventories
when a language with few or no cases probably uses adpositions to the same end?
Are the number of contrasting members of a (vertically arranged) paradigm and
the number of potentially co-occurring morphemes in a templatic structure both
inventories and to be compared in the same way? Importantly, EC is not the kind
of complexity that figures most interestingly in studies investigating correlations
between linguistic complexity and sociolinguistic history, notably Trudgill (2011)
and Dahl (2004); there it is non-transparency, not inventory sizes, that is relevant.
The other type used here is close to what is known as descriptive complexity or
Kolmogorov complexity: the amount of information required to describe a system.
This is a better measure and captures well the non-transparency relevant to
learnability and sociolinguistic effects, but it is problematic to measure and
compare. Canonicity¹ theory (Corbett 2007, 2013a, 2015, and others), though

¹ Henceforth I use that term to refer to the theory and its body of exemplar studies, since it is used in
the foundational literature, but canonicality when I need to nominalize the adjective canonical (since
only canonicality is possible in my English).

Johanna Nichols, Canonical complexity In: The Complexities of Morphology. Edited by: Peter Arkadiev and Francesco Gardani,
Oxford University Press (2020). © Johanna Nichols.
DOI: 10.1093/oso/9780198861287.003.0007

not a complexity measure in itself, can be used as a good approximation to
descriptive complexity and is straightforwardly measurable and comparable
(Nichols 2019; see Audring 2017 for a similar approach). The theory aims at
improving definitions and technical understanding of linguistic notions. It defines
a logical space (for a linguistic concept or structure or system) by determining the
central, or ideal, position in that space for each dimension and the whole set of
dimensions, and kinds of departures from that ideal. An element is non-canonical
to the extent that it departs from the ideal. Essential to defining the ideal position
is the structuralist notion of biuniqueness, or ‘one form, one function’: any
departure from that ideal is non-canonical. Such departures decrease transparency
between function and form or underlying and surface, so the extent or number of
non-canonical patterns in a system can also be used as a measure of its non-
transparency. The literature of canonicity theory offers a good deal of work on
morphological paradigms, which makes it a straightforward matter to identify the
non-canonical elements in a paradigm. The approach has the further advantage of
being well-grounded in morphological theory yet applicable on its own without
requiring adoption of an entire formal framework.
To avoid cumbersome terms like non-canonicality-based complexity or non-
canonicity-based complexity, I will use the simpler if less logical phrase CC.²
Measuring CC is straightforward in principle: define types of systems and sub-
systems so as to maximize crosslinguistic comparability, and count the number of
non-canonical patterns or elements found in each, for each language.
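
The counting step can be pictured as a simple tally. The Python sketch below assumes a hypothetical flat record format, one row per language, POS, and category with the number of non-canonical patterns found there, and sums the rows per language. The language names and counts are invented placeholders, not survey data, and the function name is likewise an assumption.

```python
from collections import Counter

# Hypothetical rows: (language, POS, category, non-canonical patterns found).
# All values are invented placeholders for illustration only.
records = [
    ("LangA", "noun", "case", 2),
    ("LangA", "verb", "person", 1),
    ("LangB", "noun", "gender", 0),
    ("LangB", "verb", "TAM", 3),
]


def canonical_complexity(rows):
    """Sum the non-canonical patterns recorded for each language."""
    totals = Counter()
    for language, _pos, _category, count in rows:
        totals[language] += count
    return dict(totals)


cc = canonical_complexity(records)
print(cc)
```

Keeping POS and category in each row, even though the total ignores them, leaves room for breaking the same tally down by subsystem.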
Both EC and CC are what I will call structural measures of complexity: ones
that are based on structural analysis and comparison. (Calculations using the
measures can of course vary from classic typological method to computational
method.) There are non-structural methods as well: for example, various kinds of
complexity can be recovered computationally from text and lexical corpora (e.g.
Bentz et al. 2017, using entropy in parallel corpora), or by measuring the differ-
ence in size between compressed and uncompressed copies of a corpus (Juola
1998; Ehret & Szmrecsanyi 2016). However, adequate corpora do not always exist,
and the computational know-how or resources required may not be within reach
of, say, a fieldworker or historical linguist who wants to attribute a complexity
level to one language or describe relative complexity among a few languages.
Furthermore, automatically extracted measures and variables are not constrained
to reflect best practices in linguistic analysis and comparison, a fact that reduces
their validity and could eventually cut linguistic analysis entirely out of defining
linguistic complexity, thereby cutting linguistics out of an important segment of

² Or perhaps it is logical. Canonicity theory is concerned with whether linguistic elements are
canonical or not, while the goal in this fragment of complexity theory is to describe types of complexity.
In that theory, presumably the ideal in a space of complexity is maximal complexity, so in that sense
‘CC’ is logical.

Big Data work. Independently of those considerations, typology needs more than
one kind of complexity measure.
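
For readers unfamiliar with the compression-based, non-structural approach mentioned above, the following Python sketch gives a bare-bones illustration of the general idea: compare the size of a text before and after compression. It is not a reimplementation of the cited studies; the use of `zlib`, the ratio as the reported quantity, and the two toy strings are all assumptions made for the example.

```python
import zlib


def compression_ratio(text):
    """Compressed size divided by raw size; lower means more redundancy."""
    raw = text.encode("utf-8")
    if not raw:
        raise ValueError("text must not be empty")
    return len(zlib.compress(raw, 9)) / len(raw)


# Toy inputs (invented): one highly repetitive, one with more varied tokens.
repetitive = "ba " * 200
varied = " ".join(f"w{i}x{i * 7 % 13}" for i in range(120))

print(compression_ratio(repetitive))
print(compression_ratio(varied))
```

The repetitive string compresses far better than the varied one, which is the kind of difference such measures exploit. What the ratio says about a real language depends heavily on corpus size, orthography, and text type, which is part of why the text treats these measures with caution.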
To address these various needs and possibilities, this chapter proposes a
method for measuring CC (section 7.2) and presents results of a survey showing
that CC yields results that are revealing and do not duplicate those from EC but
complement them to make a stronger combined measure (section 7.3).

7.2 Method

7.2.1 Samples

For the CC measure, I used a partly convenience and partly diversity-based
sample of 113 languages, seeking coverage of some families and areas, and fairly
good coverage of northern Eurasia and North America, plus thinner coverage of
the rest of the world. The southern lands (Africa, Australia-New Guinea-Oceania,
South America) are thinly covered, South Asia not at all, and Southeast Asia by
only two languages.³ In addition to coverage, sample languages were chosen for
comprehensiveness and quality of descriptions. The sample languages are listed in
Appendix 7.3.
For the EC survey I drew on the mostly diversity-based set of 226 languages that
has grown from Nichols (2009), using the 105 of those languages that are also
found in the CC sample.
The combined complexity measure is the sum of the other two, available for
only the 105 languages of the sample intersection.
Where comparisons of the two kinds of complexity are at issue, I used only the
105-language sample intersection. Those involving only CC use the full 113
languages. There are also some comparisons of families and areas, using subsets
of the sample.
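
The relation between the samples can be made concrete with a short Python sketch: the combined measure is defined only for languages present in both surveys and is the sum of their two scores. The language names and scores below are invented placeholders, the function name is an assumption, and no normalization of the two scales is shown.

```python
def combined_complexity(ec_scores, cc_scores):
    """Sum EC and CC for the languages that occur in both samples."""
    shared = ec_scores.keys() & cc_scores.keys()
    return {lang: ec_scores[lang] + cc_scores[lang] for lang in sorted(shared)}


# Invented placeholder scores, for illustration only.
ec_scores = {"LangA": 10, "LangB": 7, "LangC": 12}
cc_scores = {"LangA": 3, "LangB": 5, "LangD": 4}

combined = combined_complexity(ec_scores, cc_scores)
print(combined)
```

Languages found in only one of the two samples (here LangC and LangD) drop out of the combined measure, just as the eight CC-only languages do in the survey.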

7.2.2 Survey objects

This study addresses only morphological complexity and specifically inflectional
morphology. I surveyed a set of morphological typological variables across seven
inflectional categories and three lexical classes (or parts of speech, henceforth
POS)—nouns, independent pronouns, verbs—and counted the number of

³ The denser coverage of the northern hemisphere is intentional, as I planned to test some of the
geographical distributions hypothesized in section 7.3. The coverage of the southern hemisphere is
thinner than planned because the survey proved more labour-intensive than anticipated and could not
be fully completed as projected.

non-canonical patterns in inflectional paradigms for each category and each POS.
The set of categories is a sample, chosen because the categories are generally
well understood and well described (in particular, grammars make it relatively
straightforward to determine whether a category is present or absent and, if
present, what its values are) and because they are present in enough languages to
make frequency comparisons meaningful. This section describes, first, the
inflectional categories surveyed, then the variables. The survey data consist of
(1) a text report on each language, which includes any definitions of categories
and variables required, discussion of any coding decisions, and the sources used;
and (2) a database page for each language, showing the number of non-canonical
patterns in each intersection of POS and category. The text reports discuss but do
not fully replicate the information available in grammars; sometimes they include
scans of published paradigms. Appendix 7.1 lists the
categories and variables, and Appendix 7.2 gives the sum of entries in each
intersection of categories and variables, across the whole sample. Appendix 7.3
lists the sample languages. The entire database will be included in some future
release of the Autotyp database (Bickel et al. 2017 is the current release).
The inflectional categories surveyed are:

• Case. Dependent marking of argument roles. Only the core roles of A, S, O,
G, and T, as well as Poss (possessor) were surveyed.
• Gender. Lexically specified agreement categories of nouns, usually covert on
the noun itself and necessarily made overt in agreement. Only noun gender is
surveyed, and not pronoun gender as in English he, she, it.
• Number. Only singular and plural were surveyed. For nouns, presence vs.
absence of number marking was entered, but the plural paradigms for any
inflectional categories of nouns (typically case, gender, possessive marking)
were not surveyed.
• Person. Only 1-2-3 singular inflectional paradigms were surveyed; for inde-
pendent pronouns, only first and second persons (singular and plural).
Inclusive and exclusive, where they exist, are both included. Person inflection
on nouns is possessive inflection; on verbs it is argument indexation.⁴ Where
independent personal pronouns have a generic pronominal base and mark
person only in the form of the regular inflectional person markers, person is
counted as an inflectional category. Examples from Ainu are in (1); the same
person prefixes are also verb indexes and possessive markers. In languages
like those of Europe, person in pronouns is a lexical category and does not
enter into this survey at all.

⁴ Indexation is defined as in Nichols (1992: 48–9): marking on dependent or head of a category of
the other, involving copying of relevant grammatical features from one member to the other. It is
opposed to registration, which notes the presence of the other member and its type but does not copy
features. (Nichols 1992 described only indexation and registration of dependents on heads, but in fact
both can go either way: see Nichols & Lander in press.)

(1) Ainu (isolate; Japan and formerly Sakhalin) independent pronouns
(Shibatani 1990: 30–1; see Bugaeva 2012: 471 for slightly different forms
from Southern Hokkaido dialects.)
         Singular   Plural
    1    ku-ani     a-oka
    2    e-ani      eci-oka
    3    Ø-ani      Ø-oka

• Person-number. Person and number are so often co-exponential (portmanteau
or otherwise opaquely fused) in inflectional paradigms that person-number
was treated as a separate single category (see Appendix 7.2). Most
languages with possessive inflection of nouns signal both the number of the
possessor and the number of the possessed noun, using a dedicated plural
affix for the number of the noun and co-exponential person-number mark-
ing for possessor indexation. (Sometimes the dedicated plural affix is pro-
miscuous in the sense of Leer (1991), indicating plurality of either noun or
possessor or both.) If there is a separate, dedicated marker of possessor
number, however, that is entered separately as number.
• Classifier. Following Fedden & Corbett (2017), I use this term to comprise
numeral classifiers as well as what they argue are second gender categories in
languages like Mian (Ok family, New Guinea) and several Amazonian
languages (e.g., Yagua, Yaguan family) but are called classifiers by tradition
or for convenience, since it is useful to distinguish classifiers from the other,
more canonical, gender category. For the present survey, the decision
whether an inflectional category is gender or classifier is less important
than ensuring that it is included somewhere; what figures at this early stage
is the total non-canonical points per language, not their distribution across
categories, POS, and variables.
Classifiers were counted if a classifier is (more or less) obligatory for
many or all nouns in contexts of quantification, and possible for most
numerals. More precisely, I consider occurrence with numeral classifiers
to be an inflectional property of nouns, while the number and predictabil-
ity of classifiers are properties of classifiers (and not surveyed here since
they are not among the three lexical classes targeted here).⁵ For most
classifier systems, the contexts of usage extend beyond phrases containing
numerals, and while some are primarily numeral classifier systems, for
others (particularly languages of Amazonia, e.g. Kwaza: Van der Voort
2006) the contexts considerably exceed those of prototypical numeral
classifiers. Only for Mian (Fedden 2011) have I treated what are called

⁵ Numeral classiﬁer systems often recruit regular nouns to the system, and in their capacity as
regular lexical nouns they are of course covered in this survey.

classifiers as a gender category and added entries for their number and
unpredictability. Thus the six classifiers of Mian are entered as a noun
category with six inherent values, all unpredictable, while the 150 or more
classifiers of Kwaza, like, for example, the ~50 of Mandarin, do not appear
in this database and do not contribute to the EC of noun inflection or to
non-canonicality in the form of unpredictability.
• Tense/aspect/mood (TAM). The survey seeks the most basic synthetic
present-like and aorist-like tense categories. (In terms of aspect these tend
to be imperfective and perfective respectively.) If one or both is absent, as it
is, for example, in Mawng (Iwaidjan, northern Australia), which has only a
future/non-future tense opposition, the closest basic tense opposition is used
(future and non-future in Mawng). If the language has no inflectional tense
(as Mandarin does not), basic imperfective and perfective are used if the
language has inflectional aspect; otherwise there is no entry for the TAM
category.
• General. Some of the variables are inherently difficult to ascribe to some
particular category. Examples are the numbers of stems per lexeme and stem
classes per language. They are entered as general rather than as pertaining to
paradigms of particular categories (usually with a comment in the data
report). Again, for the present survey the exact placement of an entry is
less important than ensuring that it is included somewhere and contributes
to the total.

For each language the database records for each category whether it is present
or absent (a yes/no, or 1/0, classification).
The variables surveyed are the following.⁶ For all of them the number, or the
presence vs. absence, of non-canonical patterns was entered for each of the survey
categories just listed. For what was counted as non-canonical see below. For every
variable and every category and value, irregular words, lexically specifiable excep-
tions, and small closed classes are disregarded. Sizable minority classes, and classes
that are open or specifiable as a class, are counted. For example, if possessive
inflection applies only to kin terms, or even only to consanguineal kin terms, this
is counted as a class.

[1] Inflectional classes. In the terms of Bickel & Nichols (2007) these are instances
of formative flexivity: classes distinguished by different sets of inflectional morphemes
(e.g., suffixes). Not all grammars explicitly account for the number of
declension or conjugation classes, and those that do often mix together, or at least
fail to distinguish, formative flexivity and stem flexivity (variable [5] below), so
deciding whether a class involves formative flexivity or stem flexivity often
requires analysis and justification (laid out in data reports).

⁶ Variables are numbered in square brackets and examples in ordinary parentheses.

[2] Unpredictability of any inflectional classes. Sometimes the inflectional class
of a noun is predictable from semantics, gender, phonology, or some other
property, but often it is not. For example, the declension classes of conservative
Indo-European languages like Latin or Russian are not predictable overall (though
in each language there are some clusters of semantically similar words in each
class). For Russian, it is possible to predict gender from declension class with fair
accuracy, but not vice versa (Corbett 1982); close analyses like Corbett’s are not
usually available, so my practice was to regard classes as unpredictable unless the
grammar claimed otherwise and gave good grounds for the claim.
The number of inflectional classes and the number of unpredictable ones are
matters of EC, not CC. They are removed from some of the calculations here as
indicated below.
[3] Inherent categories. This applies primarily to gender classes of nouns,
which are marked by agreement on other words and are usually covert on the
noun. (Overt indication of gender on the noun itself does occur in a number of
languages, e.g. Bantu, or to some extent Nakh-Daghestanian. In such languages
gender was recorded as an inflectional category of nouns and its number of
inflectional classes and their unpredictability were recorded.) Where classifiers
are lexically specified for the noun (as is usually said to be the case for Mandarin,
e.g. Chao 1968: 589–93), they are also coded as inherent. The alternative is
relatively flexible choice of classifiers per noun depending on semantic properties.
[4] Unpredictability of inherent categories. Gender classes can be predictable
for some or all genders. Here the question asked is how many of the gender classes
are predictable (largely or entirely, i.e. for most or all of their nouns). Predictability
is sometimes described as phonological, but usually as semantic. Every language
with gender in the sample, and nearly every language on earth with gender, has
predictable gender for nouns referring to humans, which are usually masculine or
feminine depending on the sex of the referent but sometimes belong to a general
human category.⁷ What is counted here is not predictability but unpredictability,
since that is non-canonical.
Counting the number of unpredictable classes amounts to EC, and it also
contributes to a rapidly inflating scale.⁸ Instead of counting classes I have used
the following values for applicability:

⁷ I know of only one language where human nouns have arbitrary gender: Uduk (Koman, Africa;
Killian 2015), where the cutoff point for gender predictability is set even higher on the animacy
hierarchy: it is predictable for first and second person pronouns but not for human nouns.
⁸ Cole (1967) describes most of the non-human gender classes of the Bantu language Luganda
(which number fourteen singular-plural concord pairs by his count) as ‘miscellaneous’ (these number
ten), a large number for one cell of this survey. Most Bantu grammars describe the classes as having a
semantic basis with some unpredictable members, but in languages with only one description the
decision on predictability has to be taken at face value.

(2) Applicability thresholds

0 Applies to none or very few of the words in the class (here, nouns in the
gender class).
1 Applies to an appreciable minority of the words in the class, and/or the
set of words is open or definable rather than requiring enumeration.
2 Applies to all or most of the words in the class.

and the following semantic criteria:

Human nouns. Unpredictability of their gender by the above values.

Non-human nouns. Unpredictability of their gender by the above values.
Human gender cross. Non-human nouns are found in human gender classes or
vice versa.

As a further note, most languages with a sex-based gender opposition for human
nouns also apply it to a few non-human animate nouns, typically large and
important domesticates. This kind of individual lexical exception falls under
value 0 of the applicability scale.
Table 7.1 shows a few languages and how they are treated in this classification.
Languages with a zero score have no unpredictable gender classes, either because
their gender is entirely predictable (Avar) or because they have no gender, either
of nouns (English) or of pronouns (Finnish).
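
The scoring just described can be sketched in a few lines of Python. This is only an illustration, assuming that a language's gender-unpredictability total is the plain sum of its three criterion scores (human, non-human, cross), each on the 0 to 2 applicability scale in (2). The class and field names are hypothetical, and the values are those given for three of the example languages in Table 7.1.

```python
from dataclasses import dataclass

# Applicability scale from (2): 0 = none or very few, 1 = appreciable minority
# or an open/definable set, 2 = all or most.
VALID_SCORES = {0, 1, 2}


@dataclass(frozen=True)
class GenderUnpredictability:
    """Hypothetical record for one language; field names are illustrative."""
    human: int      # unpredictability of gender for human nouns
    non_human: int  # unpredictability of gender for non-human nouns
    cross: int      # non-human nouns in human gender classes or vice versa

    def __post_init__(self):
        for score in (self.human, self.non_human, self.cross):
            if score not in VALID_SCORES:
                raise ValueError(f"score must be 0, 1, or 2, got {score}")

    def total(self) -> int:
        # Assumption: the total is a plain sum of the three criteria.
        return self.human + self.non_human + self.cross


examples = {
    "IE": GenderUnpredictability(human=0, non_human=2, cross=2),
    "Avar": GenderUnpredictability(human=0, non_human=0, cross=0),
    "Uduk": GenderUnpredictability(human=2, non_human=2, cross=2),
}

for name, record in examples.items():
    print(name, record.total())
```

Keeping the three criteria as separate fields, rather than storing only the total, makes it possible to recompute totals if the weighting of a criterion is later revised.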

[5] Number of stems per lexeme. This is what Bickel & Nichols identify as stem
flexivity: declension or conjugation classes based on changes in the stem, such as
ablaut, extensions, or allomorphy conditioned by the survey categories. For example,
in Nakh-Daghestanian languages, many or most nouns have distinct nominative and
oblique stems in the singular, with the oblique stem formed by adding an extension
suffix (Kibrik 1991, 2003). This is coded as two stems per lexeme. In English and
other Germanic languages, the sizable but minority class of strong verbs has different
stems, marked by ablaut, in the two survey tense categories (English sits, sat); this is
also two stems per lexeme. A word or class is counted if it involves all, most, or a
sizable or open subset of the relevant words, following the thresholds in (2).
[6] Number of stem classes per language. The Nakh-Daghestanian languages
with extensions in oblique stems mostly have two stems per lexeme, but the
number of oblique extension suffixes ranges from one to over a dozen in different
languages. This, plus the (usually minority) class of nouns with a single stem, is the
total number of stem classes per language. Following the criteria in (2), the
number entered in the database is the number of such classes that are sizable,
productive, and/or open.
[7] Unpredictability of those stem classes (per language), by the same criteria
as for [2] and [4] above.

Table 7.1. Gender unpredictability for some example languages

             IE   Ingush   Avar   Bantu   Nama   BGW   Uduk*   English*   Finnish

H:       0    0        0      0       0      0     2       0          0
Non-human:   2    2        0      1       2      1?    2       0          0
Cross:       2    1        0      0       2      0     2*      0          0
Total:       4    2        0      1       4      1     6       0          0

Notes: Languages: IE: Generic conservative Indo-European (e.g., Latin, Russian). Three genders:
masculine (M), feminine (F), neuter (N). The neuter gender contains relatively few nouns, so most
non-human nouns are M or F, arbitrarily classified. Ingush: Nakh-Daghestanian (Caucasus). There is a
dedicated gender for human males, a gender containing human females and some inanimates (though
if the survey counted singular-plural gender pairings these would be different genders as plurals have
different genders for human females and non-humans), and two non-human genders with arbitrary
membership. Avar: Nakh-Daghestanian (Caucasus). There are three genders with total semantic
predictability: M (human males), F (human females), N (all else). Bantu: Subbranch of Benue-Congo
(Africa). Generic entry applicable to most Bantu languages including Luganda in this survey. There is a
dedicated human gender and a number of non-human genders (the number varies among languages)
which most descriptions present as having a semantic core or prototype plus a limited number of
arbitrary members. Usually there are also a few dedicated genders for such things as non-finites or
particular deverbal derived nouns. Nama (Khoekhoe): There are two genders, M and F, containing all
human males and all human females respectively, and other nouns are arbitrarily divided between
M and F. BGW (Bininj Gun-Wok; Gunwingguan, northern Australia): M and F genders contain all
human nouns plus some arbitrary members. The other genders also have a semantic core and some
arbitrary members. Uduk (Koman; Africa): Two genders; all nouns arbitrarily classified; first and
second person pronouns have predictable gender (all have gender 2). English: No noun gender. Finnish:
No gender of either nouns or pronouns.
* Not in sample. For Uduk, see footnote 7 above in text.

[8] Arguments indexed. The number of core arguments indexed on the verb,
counted for the verb type with the most core arguments. The maximum number
of core arguments possible is three (A, G, and T), but not all languages have
ditransitives, and for those that do not the maximum is two. Arguments indexed
are counted only for simple clauses without valence-related derivations such as
causatives or applicatives.
[9] Co-exponence, that is, portmanteau, cumulative, or otherwise opaquely
fused marking of categories. Examples are the gender-number-case suffixes of
nouns and adjectives in conservative Indo-European languages. Co-exponence
violates the one-form-one-function tenet of canonicality, as one form has three
functions (marking gender, number, and case). A language is coded as having co-
exponence if all, most, or a sizable minority of its words in the relevant categories
(e.g., nouns and their case paradigms) have co-exponent markers; it is so coded for
all of the categories involved (e.g., for Indo-European, gender, number, and case).
[10] Syncretisms: identical formatives in two or more categories that are non-
identical elsewhere in the language. Consider the German articles in (3):

(3) Definite articles in German (syncretism patterns numbered with subscripts)

             M     F      N      Plural
Nominative   der   die₁   das₁   die₁
Accusative   den   die₁   das₁   die₁
Dative       dem   der₂   dem    den
Genitive     des   der₂   des    der

What is counted is not individual syncretic endings or words but patterns of
syncretism. In the German examples, feminine, neuter, and plural paradigms
display the same pattern of nominative-accusative syncretism; dative-genitive of
feminines is another. German has two syncretism patterns here. In German, case
and gender are marked on determiners, of which the articles are the most
frequent. Where categories are marked on articles but not the nouns themselves,
they are still coded as noun categories, though also as wordhood discrepancies
(variable [14] below).
The database lists the number of syncretism patterns per category, but the
counts and totals in section 7.3 below use only presence vs. absence of syncretism
per category, as explained under variable [17] below.⁹
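
The difference between counting syncretic forms and counting syncretism patterns can be illustrated with a short sketch. It assumes a simplified reading of the definition: a pattern is a pair of cases whose forms coincide within one paradigm, and each such pair is counted once however many paradigms show it. The function name and data layout are hypothetical; the forms are those in (3).

```python
from itertools import combinations

CASES = ["Nominative", "Accusative", "Dative", "Genitive"]

# German definite articles from (3), one list of forms per paradigm,
# ordered as in CASES.
PARADIGMS = {
    "M": ["der", "den", "dem", "des"],
    "F": ["die", "die", "der", "der"],
    "N": ["das", "das", "dem", "des"],
    "Plural": ["die", "die", "den", "der"],
}


def syncretism_patterns(paradigms, cases):
    """Return {pattern: [paradigms showing it]}, where a pattern is a pair
    of cases whose forms are identical within one paradigm."""
    patterns = {}
    for name, forms in paradigms.items():
        for i, j in combinations(range(len(cases)), 2):
            if forms[i] == forms[j]:
                patterns.setdefault((cases[i], cases[j]), []).append(name)
    return patterns


patterns = syncretism_patterns(PARADIGMS, CASES)
for pattern, where in patterns.items():
    print(pattern, where)
print("patterns:", len(patterns))
```

On this input the sketch finds the nominative-accusative pattern in three paradigms and the dative-genitive pattern in one, that is, two patterns. It does not check that the two cases are distinct elsewhere in the language, and a syncretism spanning three or more cases would be reported as several pairs.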
[11] Allomorphy. Defined elsewhere in linguistics as two different forms for a
single morpheme or paradigmatic cell, conditioned grammatically or lexically but
not phonologically; phonological conditioning is not counted here since it can be
considered automatic. An example is nouns of masculine gender in most Slavic
languages, which have different accusative endings for animate and inanimate
nouns. For example, three cases of Russian masculine nouns:

(4)          ‘brother’   ‘table’

Nominative   brat-Ø      stol-Ø
Accusative   brat-a      stol-Ø
Genitive     brat-a      stol-a

There is one allomorphy here in noun case inflection (accusative -a vs. -Ø), and
also two patterns of case syncretism.¹⁰

⁹ Syncretism is clearly non-canonical (Corbett 2013a, 2007, and other works), as it makes for non-
biuniqueness, but reviewers and audience members often object that syncretism does not increase the
amount of information required to describe a language. This shows that canonical and Kolmogorov
complexity are not identical; it is the only respect I am aware of in which they are different. I believe the
difference arises because Kolmogorov complexity is concerned only with the information required to
describe the text as string alone and not the full text including its message. For the message even at the
minimal level of determining which case is intended as in (3), resolving syncretism requires bringing in
additional information.
¹⁰ There are debates in the Slavistic literature as to whether animacy is an additional gender
category, or for that matter a subgender or supergender. It is also sometimes called a case split or a
gender split, but I have not tried to distinguish allomorphy from splitting.

I did not encounter examples where it was difficult to decide whether something
was allomorphy (within one category) or a syncretism (in paradigms that do
not have that allomorphy), but there may be such cases. If so, the important thing
is to enter it somewhere, for this survey in which total numbers of non-canonical
points are compared.
[12] Position discrepancies. In some languages the forms of a single category are
distributed between two different positions, e.g. Pazar Laz (Kartvelian, Turkey;
Öztürk & Pöchtrager 2011: 485) subject person agreement in verbs (present tense):

(5) Pazar Laz subject agreement morphemes

1 v-/p-/p’-/b-
2 Ø-
3 -s

First person is a prefix, second person zero (presented as a prefix because the
object prefixes that compete hierarchically for the same slot have an overt 2
object form), and third person a suffix. Discrepant position is analogous to
different forms for one category (albeit the forms are slots rather than morphemes),
hence non-canonical.
[13] Category discrepancies. I used this variable to account for infrequent
examples like verb inflection in many Slavic languages, which have agreement
for person-number in the non-past tense and gender-number in the past tense.
The survey category is TAM rather than just one tense; if there were only one
survey tense there would be no discrepancy. In these languages verbs were coded
as having the categories of person-number, gender, and TAM, with a category
discrepancy for TAM.
[14] Wordhood discrepancies. These are discrepancies between such statuses as
independent word, clitic, affix, and non-linear marking such as ablaut, within a
single paradigm. For example, in Slovene, singular pronouns have both tonic and
clitic forms but plural ones have no clitic forms; in Bulgarian, Romanian, and
Ossetic, subject indexation is suffixal while object indexation uses clitics.
Languages like German or Mian (Ok, New Guinea) have noun gender marked by
articles; this is a wordhood violation for gender not as an inherent category but as an
agreement category (in languages without the wordhood violation it is usually
marked affixally, as with the noun class prefixes of nouns in Bantu languages).
[15] Partial marking: Only some of the otherwise eligible words inflect for the
category. An example is gender in Nakh-Daghestanian languages, which is generally
marked by prefixation or initial consonant mutation of the verb, but not for
all verbs (the verb roots that do take it range in different languages from about
30% to the great majority of verbs). Another example is number: probably all
languages that have number inflection on nouns apply it only to some nouns.
Most common is drawing the line between count and mass nouns, with mass
nouns taking no number marking, but it is also fairly common to find the line
drawn between animate and inanimate or human and non-human nouns. I did
not code number as a partial category for any of these: for count vs. mass nouns it
is clearly due to semantics, and for the distinctions higher up there is a case to be
made that those are semantically akin to the count/mass distinction. Some
languages have only a handful of nouns that make number distinctions, for
example Yurok (Algic, California), where only nine nouns, not representing a
coherent semantic group, form plurals (Robins 1958: 23); these are my only cases
of non-semantic, purely lexically specified, plural marking, but the nouns involved
are too few in number to count in this survey.
Partial marking is not common in the survey languages; Nakh-Daghestanian
gender contributes most of the examples.
[16] Multiple marking. Even rarer among the survey languages is marking of
an inflectional category more than once in a wordform. For example, Bardi
(Nyulnyulan, Australia) marks person-number on verbs with person enclitics,
and can add an optional additional person-number enclitic to mark plurality of
the object; this amounts to marking person twice. Yurok has A and O agreement
in person-number, and in some verb classes and categories one-argument verbs
fill both slots and thereby mark subject person-number twice (Robins 1958: 69ff).
[17] Other. This entry column handles the occasional uncertainty in classification,
but primarily contains calculations of the number of categories or dimensions
involved in co-exponential marking. Noun inflectional paradigms of Indo-
European languages preserving the original design of co-exponential gender-
number-case inflection abound in such non-canonical phenomena as syncretisms,
unpredictable declension classes, unpredictable gender classification, human
crossgender, and others. (For some illustrations, see Nichols 2019.) These give
them extremely high CC values if the number of syncretism patterns is counted,
and this skews comparisons. Therefore I coded not the number of such patterns
but the number of categories involved in them, treating those as dimensions of
freedom within which syncretism might appear. Similarly, for complex systems of
verb argument indexation where person-number and role (A, O) are marked by
co-exponential and often opaque markers, I counted the number of categories
involved (usually person-number and role, sometimes also gender).¹¹ This procedure
levels out the possible complexity ranges of case-inflecting languages like
Indo-European and complex head-marking languages like many in the Americas.
But even with the obvious heavy contributors neutralized, section 7.3 shows that
the languages of western Eurasia still reach overall higher CC levels than even the
polysynthetic languages of the Americas. I judge this high level to be
non-artifactual as measured, implying less opacity for polysynthetic inflection than
for co-exponential case inflection; indeed, the notable complexity of polysynthetic
languages lies not so much in inflectional non-transparency as in their
templatic ordering, mix of lexical and inflectional categories, and sheer number of
grammatical categories, and not primarily in the transparency or
non-transparency of their core argument marking. This is also what the comparison
of CC and EC levels in section 7.3 below implies: polysynthetic languages have
more categories and slots, not more opacity.¹²

¹¹ Recognizing role as involved in the categories is also a way of accounting for the mix of direct and
hierarchical marking of person in such systems.

The variables are summarized in Appendix 7.1.¹³
CC is what I call a composite variable: one that can be stated as a single
typological variable (in this case, the CC value) but that consists of a
number of separately defined variables. These subvariables are not a random set of
variables and not just a thematically related set but the total set of grammatical
phenomena that cover the categories and POS and each of which defines some
aspect of non-canonicality. They are not drawn from an existing database, and in
fact only one of them—the number of arguments indexed—is a variable presently
in the Autotyp database.

7.3 Results

Appendix 7.4 is a graphic display of the levels of CC in the sample languages,
separately for the CC total involving all datapoints and the one omitting those
datapoints that enumerate categories (and are therefore a leak of EC into the CC
count). They are similar except in absolute values. On either one, the sample
languages can be described as spanning the complexity range from Mandarin
(lowest) to Skolt Saami (highest).
The rest of this section tries out CC by comparing how well CC and EC perform
in tests for various kinds of correlations.

7.3.1 CC and enumerative complexity

There is no correlation between CC and EC (linear correlation coefficient -0.023;
p = 0.819, Spearman’s rank correlation test, two-tailed). This means that they can be
used as independent typological variables.
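
For readers who want to reproduce this kind of test, a rank correlation can be computed as follows. This is a minimal sketch, assuming that the coefficient is Spearman's rho calculated as a Pearson correlation over average ranks (ties share the mean of their ranks). The function names are illustrative, the scores are invented rather than taken from the survey, and no significance test is included.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1 upward, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # positions are 0-based, ranks 1-based
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation; undefined (division by zero) if either input is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def spearman_rho(xs, ys):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences of at least 2 values")
    return pearson(average_ranks(xs), average_ranks(ys))


# Made-up scores for five hypothetical languages, not data from the survey.
cc_scores = [12, 25, 31, 40, 57]
ec_scores = [9, 4, 11, 7, 10]
print(round(spearman_rho(cc_scores, ec_scores), 3))
```

A coefficient near zero, as reported above for CC and EC, means that knowing a language's rank on one measure says almost nothing about its rank on the other.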

¹² Differential complexity of noun vs. verb inflection and head vs. dependent marking, and
measuring the complexity of hierarchical patterns and polysynthetic structure, will be covered in a
separate paper. At that point the dimensions of co-exponential marking will be given a term and a
separate dedicated variable.
¹³ The variables used for EC are much as defined in Nichols (2009). Publication of an updated
version of that list is planned for the next year or two.

7.3.2 Complexity and gender

Nichols (2019) found that there was no correlation between EC and the
presence of gender in a language, concluding that gender and the well-known
complexity of many gender systems are not simply byproducts of overall
complex morphology. I replicated that study on the smaller and different
language set used here, and using a correlation test, with the same result:
there is no correlation between EC and presence of gender. For CC, there is a
slight positive correlation but it is far from significant (correlation coefficient
0.089, p = 0.233).¹⁴

7.3.3 Geography: continents and areas

I calculated the mean CC for a number of areas and families, and asked whether
the range of mean ± 1 standard deviation for each area overlapped with others,
using the breakdowns in Table 7.2. Ranges for local areas and families are in
Table 7.2. Figure 7.1 gives a graphic display.
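
The overlap check can be sketched as follows. This is an illustration only: it assumes the sample standard deviation (the text does not say whether the sample or population formula was used), and it borrows the CC values of two groups from Table 7.3 below simply to have concrete numbers; the function names are hypothetical.

```python
from statistics import mean, stdev


def one_sd_range(values):
    """Return (mean - 1 SD, mean + 1 SD), using the sample standard deviation."""
    m, sd = mean(values), stdev(values)
    return (m - sd, m + sd)


def ranges_overlap(a, b):
    """True if the closed intervals a and b share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


# CC values as listed in Table 7.3, used here only as sample input.
groups = {
    "Samur sphere": [27, 41, 36, 41],
    "Uto-Aztecan": [21, 36, 39, 27],
}

ranges = {name: one_sd_range(values) for name, values in groups.items()}
for name, (low, high) in ranges.items():
    print(f"{name}: {low:.1f} to {high:.1f}")
print(ranges_overlap(ranges["Samur sphere"], ranges["Uto-Aztecan"]))
```

With only four values per group the standard deviations are large, which is one reason small groups tend to overlap.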
Non-overlap of the ranges means significantly different populations.
Macrocontinents and continents overlap each other considerably, which means that
the largest groups all represent the same population. Of the local areas, the Circum-
Baltic has a very large standard deviation, that is, very little areality, and overlaps

Table 7.2. Areal and family breakdown

Macrocontinents: Africa, Eurasia, Australasia (Australia, New Guinea, Oceania), Americas
Selected continents: Western Eurasia (to the Urals), North Asia (Siberia and northern
Central Asia), North America, Central and South America
Local areas: Balkan, Caucasus, Circum-Baltic, North Inner Asia (non-Pacific Siberia and
northern Central Asia), North Pacific Rim (coastal and near-coastal from Japan to northern
California)
Families: Balto-Slavic, Uralic, Nakh-Daghestanian, Tungusic, Uto-Aztecan

Notes: Figure 7.1 shows the mean CC ± 1 standard deviation for all groups. Northern continents
(Eurasia, North America), the Caucasus, and the Uralic and Nakh-Daghestanian families are well-
sampled; other areas and families are compiled opportunistically from languages in the sample and are
less well covered.

¹⁴ For these calculations, to avoid circularity the points contributed by gender were subtracted from
the total complexity. (If that is not done, CC yields a highly significant but spurious correlation. EC
does not, because the contribution of gender to its total is much less than for CC.) For CC I use the two-
tailed value since I had no advance expectation about whether or how CC might correlate with gender.

[Figure 7.1: four panels (CC: Macrocontinents; CC: Continents; CC: Areas; CC: Families), each showing CC on a vertical scale of 0 to 60 for the groups listed in Table 7.2.]

Figure 7.1. Mean CC ± 1 standard deviation for three areal breakdowns and selected
families
Notes: Groups are defined in Table 7.2. The mean and range for the entire sample are very similar to
those for Africa.

most others. The Caucasus has a relatively large standard deviation (unsurprisingly,
as its languages range from the fairly simple Lezgi to the very complex Ingush and
Khinalug), and its status as an area is debated (con: Tuite 1999, pro: Chirikba 2008;
I side with Tuite). The other three are well-known areas and have small standard
deviations and little or no overlap. Mean complexity levels differ considerably
among the areas, suggesting that regression to some neutral complexity level is
not a consequence of areality.
The five families show relatively little overlap. Uralic, one of the older and more
widely distributed families and the most thoroughly surveyed here, has a large
standard deviation. The others have clearer family profiles.
Overall, then, continents and macrocontinents are not greatly different from
one another or from world totals while local areas and families are more discrete
from each other and for the most part internally fairly consistent in their complexity
levels. These figures are very preliminary; in particular, standard deviations
will probably shrink as the sample adds more members per area and family,
reducing overlaps.

7.3.4 Large-scale geography

Since a number of typological variables form worldwide east-to-west clines in the
northern latitudes (low in Europe, high in eastern North America, or vice versa;
Nichols 2017), I tested whether CC and EC have such a distribution. Figure 7.2
plots complexity (CC or EC) against longitude in a series of graphs. (Longitude is
universal longitude, not split into east and west but continuous from 0° to 360°.)
The plots all have the same design: the vertical scale is the number of CC or EC
points and the horizontal scale is longitude, running from west to east as in
Figure 7.1. (The plot begins 10° west of Greenwich so that westernmost Europe
and Africa will be counted with those continents and not with the Americas.)
A worldwide cline will show up as a pronounced overall upward or downward
slope to the pattern of dots. Each graph has a trendline showing slope, which can
be regarded as indicating the approximate magnitude of difference between west
and east. (The trendline is calculated on the rectangular plot used here, i.e. on a
flat-earth model with parallel longitude lines, so for the real earth it has no precise
meaning. The visible differences between slopes in different plots do, however,
make for a useful comparison that may be graphically clearer than the raw pattern
of dots. Statistical significance is not calculated on the plot but on the actual
ranked longitude values and does not have the flat-earth problem.) Figure 7.2(a)
shows CC values running much higher in the west (the left side) than in the east
(the right side), and there is a pronounced though not steep downward slope.
Figure 7.2(b) plots only the languages in the northern continents;¹⁵ the slope is
similar. For both the correlation of CC with longitude is highly significant.
Figure 7.2(c) plots only the southern languages; the pattern is much more dispersed
and the slope noticeably less steep, and there is no significant correlation.
The interpretation is that (as with several other variables, surveyed in Nichols
2017) there is a worldwide west-to-east gradient, in this case with higher values in
the west and lower values in the east, and it is stronger in the northern continents
than in the south.¹⁶
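
The conversion from conventional signed longitude to the continuous west-to-east axis used in these plots can be sketched as below. The function name and the default start of 10° west are assumptions based on the description above, and the example longitudes are approximate.

```python
def axis_longitude(lon_deg, start=-10.0):
    """Map a signed longitude (-180..180, east positive) onto a continuous
    west-to-east axis that begins at `start` degrees and spans 360 degrees."""
    return (lon_deg - start) % 360 + start


# Approximate longitudes, chosen only to illustrate the mapping.
for place, lon in [("western Ireland", -9.5), ("Caucasus", 45.0),
                   ("Pacific coast of North America", -130.0)]:
    print(place, axis_longitude(lon))
```

Eastern-hemisphere longitudes are unchanged, western-hemisphere longitudes are shifted up by 360°, and the strip between 10°W and Greenwich keeps its negative values so that it plots at the left edge with Europe and Africa.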
Due to the sample structure and the composition of the western Eurasian
linguistic population, much of the strength of the CC correlation comes from
Indo-European languages. To counter their impact, I tested the sample with the
four outliers at the upper left of Figure 7.2(a) removed (three are Slavic languages:
Russian, Sorbian, Slovene; but highest of all is Skolt Saami, a Uralic language).
Impact on the slope and significance was negligible.

¹⁵ Northern continents are Eurasia and North America. Southern ones are Africa, Australia-New
Guinea, and Central and South America.
¹⁶ In Figures 7.2(a)–(b), what appear to be dense vertical stacks of dots at some places are regions
that are densely sampled and/or have high linguistic diversity at a similar longitude: at left, at about 45°,
the Caucasus; at right, at about 230°, the Pacific coast of North America.

[Figure 7.2: three scatterplots with trendlines, CC on the vertical axis and longitude on the horizontal axis (ticks from -10 to 290): CC x longitude, whole sample (n = 113), p = 0.00001; CC x longitude, northern continents (n = 82), p = 0.00002; CC x longitude, southern continents (n = 31), p = 0.104 (n.s.).]

Figure 7.2. Complexity x longitude

Notes: Longitude (horizontal axis) runs from the Atlantic coast of Europe and West Africa on the left to
the Atlantic coast of North and South America on the right: (a) CC x longitude, all languages; (b)
northern continents; (c) southern continents. EC shows a highly significant correlation in the opposite
direction, with lower values in Europe and Africa and higher values in the Americas (p = 0.0011,
confirming what was reported, using a different sample and values, in Nichols 2009).

A map of languages and their CC levels (to be continuously expanded as work on
this project proceeds) is at https://lingconlab.github.io/opacity_Johanna/index.html

7.3.5 Sociolinguistics

Dahl (2004) and Trudgill (2011) show that what Trudgill calls sociolinguistic isolation
tends to allow languages to grow more complex over time, while sociolinguistically
expansive languages tend to simplify. Sociolinguistic isolation means that a language
absorbs little or no immigrant or language shifting population, so that nothing hinders
the further growth of complexity. An expansive language (this is not Trudgill’s term;
I take it from Janhunen 2008) absorbs appreciable numbers of adult L2 learners, and
their influence tends to simplify the language. This section describes the four language
groups in this chapter’s sample for which enough is known of the history of expansion
and non-expansion to permit predictions about relative complexity levels. The groups
and the complexity levels are listed in Table 7.3.

Table 7.3. Complexity values for four historical groups of languages

                                          CC     EC    CC + EC

(a) Avar sphere      Andic mean           28     10    38
                     Avar                 36     10    46
                     Hinuq (Tsezic)       33      9    42
                     Hunzib (Tsezic)      49     11    60
                     Lak                  42     10    52
                     Ic’ari Dargwa        45     15    60
                     Tsakhur (Lezgian)    41      9    50
(b) Samur sphere     Lezgi                27      4    31
                     Udi                  41      7    48
                     Archi                36     11    47
                     Tsakhur              41      9    50
(c) Slavic           Russian              57      8    65
                     Lower Sorbian        56.5
                     Slovene              51     11    62
                     Bulgarian            43     11    54
(d) Uto-Aztecan      Pipil                21      7    28
                     Hopi                 36     11    47
                     Cupeño               39     12    51
                     Tümpisa Shoshone     27     12    39

• Altitude in the Caucasus. In mountain ranges with a central crest, languages
generally spread uphill from the economically more important lowlands to more
isolated highland communities, which are dependent on the lowlands for trade,
commerce, and winter pastures (Nichols 2005, 2013). Highlanders know lowland
languages but rarely vice versa; this makes uphill language spread possible
and downhill spread unlikely, and likewise for diffusion of individual forms,
categories, etc. That is, downhill languages are expansive while uphill ones are
more sociolinguistically isolated. Thus we expect higher complexity in highland
languages. Nichols (2013) finds a correlation between EC and altitude in the
Daghestanian branch of the Nakh-Daghestanian family, and Nichols (2016)
finds a stronger correlation using non-transparency of just gender marking.
Nichols & Bentz (2018) show that a correlation of altitude with complexity is a
significant worldwide tendency on several different measures. The sample used
here is smaller but yields similar results. Both CC and EC correlate appreciably
with altitude, and combined CC+EC yields a notably strong correlation for the
small sample (Figure 7.3).

[Figure 7.3: three scatterplots of altitude (metres, 0–3000, vertical axis) against complexity (horizontal axis): (a) CC x altitude in Daghestan; (b) EC x altitude in Daghestan; (c) Combined CC+EC x altitude in Daghestan]

Figure 7.3. Complexity and altitude in Daghestan (eastern Caucasus) for the three
complexity counts

• Spreads and isolation in the Caucasus: The Avar sphere. The eastern
Caucasus is compactly settled by the 40+ descendants of the Daghestanian
branch of Nakh-Daghestanian; Daghestanian may be about as old as
Indo-European. The eastern Caucasus has been inhabited by settled
food producers for some 8,000 years. For at least the last few millennia the
highland populations have followed an uncommon kind of transhumance:
the entire working-age male population leaves the highlands for the winter
half of the year, taking livestock to markets and winter pastures and usually
finding seasonal work or maintaining businesses in lowland cities. There is
what seems to have been a long-standing centre of language spread in the
northeastern Caucasus and foothills, dominated from at least c.1000 CE
by the Sarir Kingdom. The canyons of the Avar Koisu, Andi Koisu, and their
confluence in the Sulak were the avenues of trade and transhumant migra-
tion for most of Daghestan, and large markets formed in the Sulak lowlands.
The language spoken at and near the confluence—in recent historical times,
Avar—had major economic importance and was the language of work and
everyday life for half of the year for much of the male population of
Daghestan. This has led to contact effects among the languages of western
Daghestan, including a distinctive structural type marked among other
things by highly transparent gender systems, lack of verbal prefixation, and
of course many Avar loans.
Three episodes of uphill spreading can be traced in the Avar sphere (Nichols
in prep.): most recently Avar, earlier Andic, still earlier Tsezic. These three
make up one branch of Daghestanian, with this structure: [ Tsezic [ [Andic]
Avar ] ].¹⁷ Avars apparently became rulers in the Sarir Kingdom on its
conversion to Islam (at which point it became the Avar Khanate), and the
final battles for control between Andi and Avar took place only in the
seventeenth to eighteenth centuries (Aglarov 1988: 24). Avar has been an
expansive language, serving as lingua franca along the Andi Koisu for about
three centuries and along the Avar Koisu for probably somewhat longer; it
has spread well uphill and spilled over the crest to Georgia and Azerbaijan,
but patchily, with many non-Avar enclaves. Andic is probably about 1,500
years old, during most of which time it has been expansive and its daughters
have spread uphill; their settlement of the Andi Koisu is compact. Tsezic may
have separated some 3,000 years ago in an earlier uphill spread; Tsezic
languages are now at the uppermost highlands of both the Avar Koisu system
and the Andi Koisu. The Andic languages can be expected to show more
pronounced effects of spreading than Avar does. The western Tsezic languages
(Hinuq in this sample) have been under strong Andic and Avar influence;

¹⁷ Avar is one language, Andic a close-knit group of about ten, and Tsezic a more disparate group of five.

the eastern Tsezic languages (Hunzib in this sample) had less Andic contact
and held winter pastures not in the Avar-Andic lowlands but in Georgia to the
south. Consistent with this history, Hinuq has considerable Avar influence and
a very Andic-like grammar; Hunzib is markedly different, with southeastern
Daghestanian-like traits. At the edge of the Avar sphere, the isolate branch Lak
was not part of the Avar Khanate but used the same trade and transhumance
routes and shows Avar lexical and grammatical influence; isolated in a high-
land plateau, it has no known history of spreading. Beyond Lak are the Dargwa
languages, for which the Caspian coastal cities and trade routes were import-
ant, lessening Avar influence. To the south of Avar, languages of the Lezgian
branch are spoken along the southeast-flowing Samur and its tributaries, and
Tsakhur is at the high end of this line of communication and also at the high
end of the Koisu-Sulak line. The sample here includes representatives of most
of these stages. Thus we expect the descending order of spread effects along
and near the Andi Koisu and Avar Koisu systems shown in (6):

(6) Languages of the Avar sphere and their sociolinguistic histories

Andic languages (long expansive; decomplexification expected)
Avar (recently expansive; some decomplexification expected)
Hinuq (early expansion, much subsequent Avar-Andic contact)
Hunzib (early expansion, less Avar-Andic contact)
Lak (isolated, but fairly large and unified)
Ic’ari Dargwa (isolated, fairly small)
Tsakhur (isolated, small; complexification expected)

Table 7.3(a) shows the complexity values. CC conforms very well to this
scale; the only non-conformities are Hinuq, which clusters with Andic as is
unsurprising, and Ic’ari (Dargwa), which belongs to the Caspian coastal
sphere. EC is not very informative. The combined total is again in good
conformity (unsurprisingly, as it adds the fairly uniform EC scores to the CC
scores). For the Avar sphere and its periphery, then, CC reflects the socio-
linguistics of spreading and isolation better than EC does, and the combined
measure differs little from the CC scores.
• The Samur sphere. The delta of the Samur River, which drains the southeast
Caucasus and flows into the Caspian Sea, is a highly productive agricultural
region and long a nexus of trade and tax collection along the East Caspian
commercial route. It is the second most important avenue (after the Sulak)
for transhumant migration. The Lezgian branch, an old and diversified
branch of Nakh-Daghestanian, originated in this vicinity and spread both
uphill and into the Alazani valley in eastern Georgia and the lower Kura
valley in northern Azerbaijan. The sample contains four Lezgian languages,

two in the highlands and two in the lowlands. Lezgi is a large, expansive,
and inter-ethnic language centred on the lower Samur and nearby. Udi,
which descends from a probably expansive inscriptional language of the
early to mid first millennium (Caucasian Albanian [Gippert et al. 2009] is
its ancestor), has since shrunk to three isolated enclaves in Azerbaijan and
Georgia. Archi, noted for its morphological quirks (Corbett 2013b; Bond
et al. 2016), and the complex Tsakhur are isolated at high ends of river
canyons and have no known history of spread (apart from reaching the
highlands in the first place, however that happened). The complexity
figures in Table 7.3(b) reflect this history well. Tsakhur has much higher
CC than the rest; Archi has higher EC; lowland Lezgi, with its known
history of expansion, is low on both counts. Udi is mixed, high on CC
and lower on EC, suggesting that CC complexifies faster than EC after the
end of expansion.
For both Caucasus surveys, EC picks out as most complex one language
that is isolated at a high end with connections in more than one direction
(Ic’ari Dargwa, Archi), CC appears to reflect spreading more than isolation,
and the combined total gives a workable unified complexity scale that
correlates reasonably well with altitude and isolation.
• Slavic. Of the four Slavic languages in the sample, Russian has a long
history of expansion and absorption of Baltic and Finnic populations;
Sorbian reflects the leading edge of the Proto-Slavic expansion (c. sixth
to ninth centuries) but has been sociolinguistically isolated and receding
since then (largely absorbed by the German expansion); Slovene remains
close to the homeland and has no known history of expansion other
than uphill spread into the Austrian and Slovene Alps; Bulgarian belongs
to the Balkan Sprachbund and has undergone drastic structural changes
as a result, including loss of cases and thereby of the case-number-
gender co-exponence that makes Slavic noun declension so complex.
Complexity levels (Table 7.3(c)) are not greatly different for the lan-
guages preserving case inflection, while Balkanized Bulgarian is much
less complex.
• Uto-Aztecan. The Uto-Aztecan family is probably 5,000 years old and has
undergone a gradual spread from a probably northern Mexican homeland
followed by two large recent spreads: in the south, ancestral Nahuatl spread
with the Aztec expansion and empire beginning in the thirteenth century,
and in the north the Numic branch spread rapidly from the Sierra Nevada
foothills across the Great Basin beginning in approximately the same time
frame (Fowler 1972; Miller 1983; Madsen & Rhode 1994; Hill 2001, 2010;
Merrill 2012). The languages in the sample, south to north, are Pipil
(Nicaragua), Hopi (Arizona), Cupeño (southeastern California), and
Tümpisa Shoshone (east central California). Pipil is the southernmost
descendant of Aztec and probably the surviving language of a military garrison.

Cupeño is an isolated small language spoken in an eastern Sierra Nevada
oasis with no history of expansion. Hopi is a pueblo language that gives some
evidence of early admixture with a more southern Uto-Aztecan language
(Merrill 2012) but has a long history of isolation. Tümpisa Shoshone is from
the Numic branch. Table 7.3(d) shows that the complexity levels of expan-
sive Pipil and Tümpisa Shoshone are lower than those of Hopi and Cupeño,
as predicted. The difference is mostly due to CC, consistent with what is
suggested by the Avar sphere.
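
The Uto-Aztecan comparison can be made explicit with a little arithmetic. The following is a minimal Python sketch, not part of the original analysis: it copies the CC and EC values from Table 7.3(d), groups the languages as the paragraph above does (Pipil and Tümpisa Shoshone as expansive, Hopi and Cupeño as isolated), and compares group means. The names UTO_AZTECAN, group_means, and gaps are illustrative assumptions.

```python
from statistics import mean

# Values copied from Table 7.3(d): language -> (CC, EC).
UTO_AZTECAN = {
    "Pipil": (21, 7),
    "Hopi": (36, 11),
    "Cupeño": (39, 12),
    "Tümpisa Shoshone": (27, 12),
}
# Grouping follows the prose of the Uto-Aztecan paragraph.
EXPANSIVE = ("Pipil", "Tümpisa Shoshone")
ISOLATED = ("Hopi", "Cupeño")


def group_means(languages):
    """Return mean CC, EC, and CC+EC for the named languages."""
    cc = mean(UTO_AZTECAN[name][0] for name in languages)
    ec = mean(UTO_AZTECAN[name][1] for name in languages)
    return {"CC": cc, "EC": ec, "CC+EC": cc + ec}


expansive = group_means(EXPANSIVE)
isolated = group_means(ISOLATED)
# Positive gap = isolated languages score higher on that measure.
gaps = {key: isolated[key] - expansive[key] for key in expansive}

for key, gap in gaps.items():
    print(f"{key}: isolated {isolated[key]} vs expansive {expansive[key]} (gap {gap})")
```

On these four languages the isolated group exceeds the expansive one by 13.5 points on CC but only 2 on EC, which is the sense in which the difference is mostly due to CC.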

Though these samples are small, the results are generally consistent with
predictions of higher complexity for sociolinguistically isolated communities.
CC appears to be the better mirror of sociolinguistic history, and EC points in
the same direction but unevenly. Nonetheless, combined CC + EC tends to yield
very good correlations with present and prehistoric sociolinguistics: sociolinguis-
tically isolated languages are more complex and expansive languages less complex.
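
One way to put a number on how well the Avar-sphere values follow the expected scale in (6) is a rank correlation. The sketch below is an illustration only; the chapter reports no such statistic, and treating the order in (6) as a numeric scale from 1 to 7 is an assumption made here. It computes Spearman's rho (Pearson correlation on average ranks) between scale position and each count in Table 7.3(a). The helper names average_ranks and spearman are hypothetical.

```python
from math import sqrt

# Rows are in the order of scale (6), most spread-affected first.
# Values are (CC, EC) from Table 7.3(a).
AVAR_SPHERE = [
    ("Andic mean", 28, 10),
    ("Avar", 36, 10),
    ("Hinuq", 33, 9),
    ("Hunzib", 49, 11),
    ("Lak", 42, 10),
    ("Ic’ari Dargwa", 45, 15),
    ("Tsakhur", 41, 9),
]


def average_ranks(values):
    """Rank values from 1 upward, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of ranks start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def spearman(xs, ys):
    """Spearman's rho: Pearson correlation computed on average ranks."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    spread = sqrt(sum((a - mx) ** 2 for a in rx) * sum((b - my) ** 2 for b in ry))
    return cov / spread


positions = list(range(1, len(AVAR_SPHERE) + 1))
cc = [row[1] for row in AVAR_SPHERE]
ec = [row[2] for row in AVAR_SPHERE]
combined = [c + e for c, e in zip(cc, ec)]

rho = {
    "CC": spearman(positions, cc),
    "EC": spearman(positions, ec),
    "CC+EC": spearman(positions, combined),
}
for name, value in rho.items():
    print(f"{name}: rho = {value:.2f}")
```

On the Table 7.3(a) figures this gives roughly 0.64 for CC, 0.04 for EC, and 0.68 for CC+EC. With only seven data points these values are descriptive at best, but they match the observation that CC tracks the scale while EC is not very informative.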

7.4 Discussion and conclusions

To summarize, CC makes something very similar to informational (or Kolmogorov)
complexity straightforwardly measurable using standard structural analysis and
well-worked-out theoretical principles. I hope it will make it possible for any linguist to
measure and compare the complexity of other languages. The initial hope for CC as
first attempted (Nichols 2015) was that it would replace and improve on EC
and be more cost-effective. It turned out to be neither a replacement nor
less labour-intensive, but it is a useful complement; combining the two can give a very
serviceable complexity measure which, as intended, is capable of reflecting
sociolinguistic history and shows interesting geographical distributions.
This chapter has laid out a method for describing and measuring CC in
inflectional morphology, as a set of seventeen separate variables which for this
first attempt were simply added together without weighting. These represent a
well-defined and crosslinguistically well-represented subset of inflectional morph-
ology; for both CC and EC, in order to make surveys manageable in time cost,
inflectional morphology must be sampled rather than covered fully.
In a survey of just over a hundred languages, CC and EC proved to be
independent of each other and, independently or combined, give quite revealing
results. In terms of geography, CC and EC values both follow worldwide east-west
clines in the upper northern latitudes (as do all other composite variables I have
surveyed). The continents surveyed all have similar means and ranges of diversity
in their complexity values; local areas can vary more, and families can differ still
more. For an area with a large range of values, one can question whether it is

genuinely an area (though a good answer will require surveying more than
complexity). Both EC and CC correlate positively with altitude, a geographical
factor that is not the cause of complexity levels but reflects the sociolinguistics of
isolation. The four families adequately represented in the sample all display
some positive correlation of complexity with sociolinguistic isolation, supporting
principles advanced in historical linguistics and sociolinguistics.
The definitions and coding used here were arrived at using the autotypologizing
principle (Bickel & Nichols 2002) of no fixed ontology and constant redefining
and recoding as the categories emerge from analysis of more and more languages.
Arriving at the current typology has been very labour-intensive, making this pilot
survey inordinately time-consuming. By now, though, the typology has stabilized
to the point that language surveys themselves are not unduly labour-intensive.
This line of inquiry can be improved by expanding the sample to give all
continents and areas coverage as dense as that achieved here for
northern Eurasia and North America, and by covering thoroughly a larger number of
families and local areas. Methods of weighting the variables, and different calcu-
lations using different combinations of variables, need to be proposed and tested;
among other things this will give firm grounding to comparisons of the relative
complexity of Indo-European noun inflection and polysynthetic verb inflection.
For stem classes and inflectional classes, which as mentioned are rarely distin-
guished in grammars, we need improved and consistent descriptive coverage.
We also need consensus definitions and criteria for characterizing the numbers
of conforming and non-conforming members of classes that have some semantic
or other basis, such as gender classes; descriptions like ‘miscellaneous’ (compos-
ition of a class), ‘predictable’ (class membership), ‘arbitrary’, etc., are not consist-
ently used. The applicability thresholds used here (section 7.2.2)—few or no
members predictable, a sizable minority predictable, most or all predictable—
seem workable but require some quantification, however approximate, of the class
membership and openness.
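
As an illustration of what an approximate quantification of these thresholds could look like, the sketch below maps the share of predictable members in a class onto the three levels named above. The cutoffs are assumptions made purely for illustration: a majority boundary at one half and a hypothetical minority_floor of 10 per cent. Neither comes from the survey, and the function name is likewise hypothetical.

```python
def predictability_level(predictable, total, minority_floor=0.1):
    """Map a class's share of predictable members onto the three-way scale.

    minority_floor is an illustrative cutoff, not a value from the survey.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= predictable <= total:
        raise ValueError("predictable must lie between 0 and total")
    share = predictable / total
    if share > 0.5:
        return "most or all predictable"
    if share >= minority_floor:
        return "a sizable minority predictable"
    return "few or no members predictable"


# Hypothetical class of 40 nouns with varying numbers of predictable members.
for n in (0, 3, 12, 35):
    print(n, predictability_level(n, 40))
```

Any real application would need cutoffs argued for on empirical grounds, and would also have to decide how to count members of open classes.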
Inflectional paradigms are ideally suited to an approach like this one. The same
approach works well for some domains of derivational morphology but not all.
For phonology and syntax and probably some derivational morphology,
non-transparency will probably need to be described with a measure of the
distance between underlying and surface.
I see this kind of study as moving linguistics in the direction of the data
sciences. Variables that form geographically very large patterns, or that correlate
with such things as sociolinguistics, expansions, and other human population
developments raise the prospects of multifactorial interdisciplinary collaboration.
A single variable surveyed in a 113-language sample is not what one would call Big
Data, but behind the convenient single number representing the CC value lie
seventeen variables surveyed across three POS and eight categories—a total of
over 200 datapoints per language or over 20,000 for the hundred-language sample.
Massive scope, making possible close comparison with the differently distributed

data of other fields, might require some 400–500 languages plus similarly massive
data for a few other composite variables or many simple ones. Creating such a
resource is an ambitious but entirely feasible project.

Appendix 7.1 Categories and variables used here

For definitions and discussion, see section 7.2.2.

Variables
* = entries are number of categories in the paradigm; others are presence vs.
absence (calculated as 1 and 0 in total complexity figures).
1. Inflection classes*
2. Unpredictability of inflection classes*
3. Inherent categories*
4. Unpredictability of inherent classes*
5. Stems per lexeme*
6. Stem classes per language*
7. Unpredictability of stem classes*
8. Arguments indexed
9. Co-exponence
10. Syncretisms
11. Allomorphy
12. Position discrepancies
13. Category discrepancies
14. Wordhood discrepancies
15. Partial marking
16. Multiple marking
17. Other
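
The unweighted total described in this chapter can be sketched as follows. This is a simplified illustration, not the survey's own procedure: it assumes one entry per variable, whereas the survey records entries per grammatical category, and the optional weights argument is a hypothetical hook for the weighting schemes that remain to be proposed. The example language is invented.

```python
# Starred variables in the list above enter as counts of categories.
COUNTED = (
    "Inflection classes",
    "Unpredictability of inflection classes",
    "Inherent categories",
    "Unpredictability of inherent classes",
    "Stems per lexeme",
    "Stem classes per language",
    "Unpredictability of stem classes",
)
# The remaining variables enter as presence (1) vs. absence (0).
PRESENCE = (
    "Arguments indexed",
    "Co-exponence",
    "Syncretisms",
    "Allomorphy",
    "Position discrepancies",
    "Category discrepancies",
    "Wordhood discrepancies",
    "Partial marking",
    "Multiple marking",
    "Other",
)


def cc_total(entries, weights=None):
    """Sum one language's entries; variables missing from entries count as 0."""
    weights = weights or {}
    total = 0
    for name in COUNTED:
        total += weights.get(name, 1) * int(entries.get(name, 0))
    for name in PRESENCE:
        total += weights.get(name, 1) * (1 if entries.get(name) else 0)
    return total


# Hypothetical language, invented purely for illustration.
example = {
    "Inflection classes": 3,
    "Unpredictability of inflection classes": 1,
    "Stems per lexeme": 2,
    "Co-exponence": True,
    "Syncretisms": True,
    "Allomorphy": False,
}
print(cc_total(example))
print(cc_total(example, weights={"Inflection classes": 2}))
```

With all weights left at 1 the function reproduces simple addition; passing a weights mapping shows how alternative calculations could be compared on the same entries.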
Grammatical categories surveyed here
Case. Case marking of A S O G T and Poss only.
Gender. Noun gender only.
Number. Singular and plural only.
Person. 1-2-3 singular inflectional paradigms; 1-2 singular and plural for
independent personal pronouns.
Person-number, where these two are co-exponential.
Classifier. Chiefly numeral classification; but used for a second set of gender
categories in languages with two gender systems (here, only Mian).
TAM. The most basic synthetic present-like and aorist-like tense categories,
where distinguished; where lacking, two other basic tense categories; where
there is no tense, no entry.
General. Where a variable cannot easily be attributed to any one category.

Appendix 7.2 Cell totals per category and variable

Variable                      Case   Gender   Number   Person  Pers-No  Classifier      TAM  General    Total

Inflection categories          166       53      199       36      131           7      108        0      700
Inflection classes             206       44      164       36      185           5      122       62      824
Unpredictability                 8       26       25        8       41           0        4        9      121
Inherent categories              0      152       11        0        0          15        0        0      178
Unpredictability                 0      110        4        0       37           6        4        9      170
Stems per lexeme                56        3       10        4        7           1       27      320      428
Stem classes per lg.            49        2       11        7       13           1       47      405      535
Unpredictability                 5        2        2        0        9           0       26      108      152
Arguments indexed                0        0        0        0        0           0        0      178      178
Fusions                          4       28        2        4      106           0        5       21      170
Syncretisms                     54       25        7        5       43           0        1        4      139
Overlaps                        13        0        0        2        1           0        0        0       16
Allomorphy                      57        6       11        9       21           0        8        6      118
Position discrepancies           7        7        5        6       15           1        3        5       49
Category discrepancies           0        4        2        0        2           0        0        2       10
Wordhood discrepancies          13        4        1        1       10           0        0        7       36
Partial marking                  1       27        2        0        3           1        0        0       34
Multiple marking                 0        1        1        2       11           0        0        0       15
Other                            0        0        2       12       43           0        0        4       61
TOTAL                          638      494      459      132      678          37      355     1140     3934
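
The marginal totals of this table can be checked mechanically. The sketch below is an editorial illustration: it transcribes the cells as printed here, labels the three Unpredictability rows by the row they follow (labels added here for disambiguation only), and reports any row or column whose printed total differs from the sum of its cells.

```python
COLUMNS = ["Case", "Gender", "Number", "Person", "Pers-No", "Classifier", "TAM", "General"]

# (row label, eight category cells, printed row total), as printed above.
ROWS = [
    ("Inflection categories", [166, 53, 199, 36, 131, 7, 108, 0], 700),
    ("Inflection classes", [206, 44, 164, 36, 185, 5, 122, 62], 824),
    ("Unpredictability (inflection classes)", [8, 26, 25, 8, 41, 0, 4, 9], 121),
    ("Inherent categories", [0, 152, 11, 0, 0, 15, 0, 0], 178),
    ("Unpredictability (inherent categories)", [0, 110, 4, 0, 37, 6, 4, 9], 170),
    ("Stems per lexeme", [56, 3, 10, 4, 7, 1, 27, 320], 428),
    ("Stem classes per lg.", [49, 2, 11, 7, 13, 1, 47, 405], 535),
    ("Unpredictability (stem classes)", [5, 2, 2, 0, 9, 0, 26, 108], 152),
    ("Arguments indexed", [0, 0, 0, 0, 0, 0, 0, 178], 178),
    ("Fusions", [4, 28, 2, 4, 106, 0, 5, 21], 170),
    ("Syncretisms", [54, 25, 7, 5, 43, 0, 1, 4], 139),
    ("Overlaps", [13, 0, 0, 2, 1, 0, 0, 0], 16),
    ("Allomorphy", [57, 6, 11, 9, 21, 0, 8, 6], 118),
    ("Position discrepancies", [7, 7, 5, 6, 15, 1, 3, 5], 49),
    ("Category discrepancies", [0, 4, 2, 0, 2, 0, 0, 2], 10),
    ("Wordhood discrepancies", [13, 4, 1, 1, 10, 0, 0, 7], 36),
    ("Partial marking", [1, 27, 2, 0, 3, 1, 0, 0], 34),
    ("Multiple marking", [0, 1, 1, 2, 11, 0, 0, 0], 15),
    ("Other", [0, 0, 2, 12, 43, 0, 0, 4], 61),
]
PRINTED_COLUMN_TOTALS = [638, 494, 459, 132, 678, 37, 355, 1140]
PRINTED_GRAND_TOTAL = 3934

# Rows whose printed total differs from the sum of their cells.
row_mismatches = [
    (label, printed, sum(cells)) for label, cells, printed in ROWS if sum(cells) != printed
]
# Columns whose printed total differs from the sum of their cells.
computed_columns = [sum(column) for column in zip(*(cells for _, cells, _ in ROWS))]
column_mismatches = [
    (name, printed, computed)
    for name, printed, computed in zip(COLUMNS, PRINTED_COLUMN_TOTALS, computed_columns)
    if printed != computed
]
grand_total_from_rows = sum(printed for _, _, printed in ROWS)

print("row mismatches:", row_mismatches)
print("column mismatches:", column_mismatches)
print("grand total from rows:", grand_total_from_rows, "printed:", PRINTED_GRAND_TOTAL)
```

On the figures as transcribed, every row total and the grand total agree, while the Case column sums to 639 against a printed 638. Whether that one-unit difference lies in a cell or in the printed total cannot be determined from this text.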

Appendix 7.3 Sample

Classification and geography of the 113 sample languages.

* = languages with only CC data and no EC data. Languages where the stock
name is identical to the language name are isolates.

Language                Stock                Continent        Area

Fula                    N. Atlantic          Africa
Lango                   Nilotic              Africa
Luganda                 Benue-Congo          Africa
Jamsay                  Dogon                Africa
Fur                     Fur                  Africa
Haro                    Ta-Ne Omotic         Africa
Somali                  Cushitic             Africa
Dahalo                  Cushitic             Africa
Nama                    Juu                  Africa
Basque                  Basque (isolate)     W Eurasia
German                  Germanic             W Eurasia        Circum-Baltic
Russian                 Indo-European        W Eurasia        Circum-Baltic
Lithuanian              Indo-European        W Eurasia        Circum-Baltic
Sorbian *               Indo-European        W Eurasia        Circum-Baltic
Slovene                 Indo-European        W Eurasia
Bulgarian               Indo-European        W Eurasia        Balkan
Romanian                Indo-European        W Eurasia        Balkan
Albanian                Indo-European        W Eurasia        Balkan
Greek                   Indo-European        W Eurasia        Balkan
Ossetic                 Indo-European        W Eurasia        Caucasus
Kabardian               West Caucasian       W Eurasia        Caucasus
Ingush                  Nakh-Daghestanian    W Eurasia        Caucasus
Avar                    Nakh-Daghestanian    W Eurasia        Caucasus
Karata                  Nakh-Daghestanian    W Eurasia        Caucasus
Tindi *                 Nakh-Daghestanian    W Eurasia        Caucasus
Godoberi                Nakh-Daghestanian    W Eurasia        Caucasus
Hinuq                   Nakh-Daghestanian    W Eurasia        Caucasus
Hunzib                  Nakh-Daghestanian    W Eurasia        Caucasus
Lak                     Nakh-Daghestanian    W Eurasia        Caucasus
Ic’ari                  Nakh-Daghestanian    W Eurasia        Caucasus
Udi                     Nakh-Daghestanian    W Eurasia        Caucasus
Tsakhur                 Nakh-Daghestanian    W Eurasia        Caucasus
Lezgi                   Nakh-Daghestanian    W Eurasia        Caucasus
Archi                   Nakh-Daghestanian    W Eurasia        Caucasus
Khinalug                Nakh-Daghestanian    W Eurasia        Caucasus
Svan                    Kartvelian           W Eurasia        Caucasus
Pazar Laz               Kartvelian           W Eurasia        Caucasus
Saami (Kildin)          Uralic               W Eurasia
Finnish                 Uralic               W Eurasia
Mordvin                 Uralic               W Eurasia
Mari                    Uralic               W Eurasia
Hungarian               Uralic               N Asia
Khanty (E.)             Uralic               N Asia
Khanty (N.) *           Uralic               N Asia
Nganasan                Uralic               N Asia
Tundra Nenets           Uralic               N Asia
Ket                     Yeniseian            N Asia
Evenki                  Tungusic             N Asia
Even *                  Tungusic             N Asia
Udehe                   Tungusic             N Asia
Nanai                   Tungusic             N Asia
Manchu                  Tungusic             N Asia
Yakut                   Turkic               N Asia
Chuvash *               Turkic               N Asia
Mongolian               Mongolic             N Asia
Yukagir (Tundra)        Yukagir              N Asia
Ainu                    Ainu (isolate)       N Asia
Nivkh                   Nivkh (isolate)      N Asia
Itelmen                 Chukchi-Kamchatkan   N Asia
Chukchi                 Chukchi-Kamchatkan   N Asia
Aleut                   Eskimo-Aleut         N Asia
Mandarin                Sino-Tibetan         S&SE Asia
Paiwan                  Austronesian         S&SE Asia
Bininj Gun-Wok          Gunwingguan          Australia
Mawng                   Iwaidjan             Australia
Bardi                   Nyulnyulan           Australia
Diyari                  Pama-Nyungan         Australia
Kuniyanti               Bunuban              Australia
Djingulu                Mindi                Australia
Mian                    Ok                   New Guinea
Usan                    Madang               New Guinea
Tawala                  Austronesian         New Guinea
Yimas                   Lower Sepik          New Guinea
Koiari                  Koiarian             New Guinea
Central Alaskan Yup’ik  Eskimo-Aleut         North America    N Pacific Rim
Zuni                    Zuni (isolate)       North America
Acoma                   Keresan              North America
Lakhota                 Siouan               North America
Kiowa                   Kiowa-Tanoan         North America
Hupa                    Athabaskan           North America    N Pacific Rim
Cree                    Algic                North America
E. Pomo                 Pomoan               North America    N Pacific Rim
Seneca                  Iroquoian            North America
Thompson                Salish               North America    N Pacific Rim
Yurok                   Algic                North America    N Pacific Rim
Karok                   Karok (isolate)      North America    N Pacific Rim
Nuuchahnulth *          Wakashan             North America    N Pacific Rim
Tümpisa Shoshone        Uto-Aztecan          North America
Yokuts                  Utian                North America
Maidu                   Maiduan              North America
Southern Sierra Miwok   Miwokan              North America
Wappo                   Yuki-Wappo           North America
Wishram                 Chinookan            North America    N Pacific Rim
Nez Perce               Klamath-Sahaptian    North America
Klamath                 Klamath-Sahaptian    North America
Chimariko               Chimariko (isolate)  North America    N Pacific Rim
Cupeño                  Uto-Aztecan          North America
Koasati                 Muskogean            North America
Hopi                    Uto-Aztecan          North America
Jamul Tiipay            Yuman                North America
Pipil                   Uto-Aztecan          Central America
Tzutujil                Mayan                Central America
Cayuvava                Cayuvava             South America
Movima                  Movima               South America
Kashibo-Kakataibo       Panoan               South America
Jaqaru                  Aymaran              South America
Aymara                  Aymaran              South America
Huallaga Quechua        Quechua              South America
Mapudungun              Mapudungun           South America
Kwaza                   Kwaza (isolate)      South America
Paez                    Paesan               South America

Appendix 7.4 CC levels in the survey languages

(a) Including count of categories (though this approximates EC). Lowest, in

increasing order: Mandarin, Diyari, Manchu, Lango. Highest, in increasing
order: Ket, Lower Sorbian, Russian, Skolt Saami. The scale is 9–68; median=mean
(arrow) is 32.
(b) Excluding count of categories to give a more strictly CC total. Lowest, in
order: Mandarin, Manchu=Diyari, Lango=Klamath=Kashibo-Kakataibo. Highest:
Russian, Lower Sorbian, Slovene, Skolt Saami. The scale is 7–60; mean 26.4,
median (arrow) 25.
[Figure: CC levels in the survey languages, in two panels: (a) including count of categories; (b) not including count of categories]

8 The complexity of grammatical gender and language ecology
Francesca Di Garbo

8.1 Introduction

This chapter is a qualitative investigation of the sociohistorical correlates of

diachronic change in the domain of grammatical gender agreement. I define
grammatical gender systems as systems of nominal classification that presuppose
agreement marking and thus highly grammaticalized patterns of inflection, often
involving shared exponence with other nominal categories (e.g., number), syn-
cretism, and other types of coding asymmetries. In languages with grammatical
gender, nouns are assigned to different classes. These categorizations are not
necessarily, or not only, encoded on nouns. On the contrary, gender marking is
displaced on words that are engaged in a morphosyntactic relationship with nouns
(e.g., adnominal modifiers, verbs, pronouns) and whose inflections point at the
gender of the noun.
During the last couple of decades, a number of studies have brought qualitative
and quantitative evidence in support of the idea that the evolution of morpho-
logical complexity (both at the syntagmatic and paradigmatic level) is sensitive to
sociohistorical dynamics concerning language population (see, among others,
Lupyan & Dale 2010; Trudgill 2011; Bentz & Winter 2013; Bentz et al. 2015).
Complexities in certain domains of morphology represent a challenge for the
adult learner and tend to be eroded with the increase of the number of adult
learners at a given point in the history of a speech community. This adaptive
response of language structures to social factors has also been claimed to be crucial
to understanding how gender systems change through time and how they are
distributed worldwide (Trudgill 1999; Nichols 2003; McWhorter 2007). For a
number of language families around the world (e.g., Indo-European and Niger-
Congo) grammatical gender can be reconstructed as a feature of the proto-
language, and as one of the most long-lived. Yet, even though stable at the
family-level, the gender systems of individual languages within a gendered family
may undergo reduction and loss due to language-internal processes of morpho-
phonological erosion and/or reanalysis that, at least in some cases, pair up with a
situation of prolonged contact and bilingualism with languages lacking gender

Francesca Di Garbo, The complexity of grammatical gender and language ecology In: The Complexities of Morphology.
Edited by: Peter Arkadiev and Francesco Gardani, Oxford University Press (2020). © Francesca Di Garbo.
DOI: 10.1093/oso/9780198861287.003.0008

(on the role of language contact in the loss of grammatical gender, see the recent
study by Igartua 2019; for a broader discussion of loss of morphology and
imperfect language learning, see the contributions by McWhorter, Chapter 10,
and Berdicevskis & Semenuks, Chapter 11, both in this volume). It has also been
observed that gender systems tend to cluster geographically and to be best
preserved in languages surrounded by other languages with gender (Nichols
1992, 2003). Thus languages that undergo complete gender loss are expected to
be neighbours with each other or to have languages without gender as their closest
neighbours (Nichols 2003: 299–304).
While instances of gender reduction and loss under contact situations are
relatively well documented in the literature, the role of language contact in the
rise of gender systems has, so far, been poorly explored, and scholars generally
agree that gender systems very seldom arise within language families that
normally lack gender (Nichols 2003: 308). This is directly connected with the fact
that full-fledged gender marking systems are commonly associated with rather
pervasive patterns of agreement, which are notoriously unlikely to be borrowed
(for a similar argument, see Igartua 2019: 209). However, recent research (Stolz
2012, 2015; Di Garbo & Miestamo 2019) shows that elementary patterns of gender
agreement may emerge as a result of borrowing of noun phrases from contact
languages with gender, and that, albeit rare, these types of systems are spread
across unrelated languages and in different areas of the world.
Existing research on the stability and evolution of gender systems under contact
situations focuses either on the decline or on the rise of gender systems, and the
two processes are rarely discussed together. Here I argue that, in order to fully
understand to what extent morphological complexity in the domain of
grammatical gender is tied to factors pertaining to the social history of a speech
community, a comprehensive survey of the evolutionary dynamics of gender
systems (focusing not only on loss and emergence, but also on reduction and
expansion) is in order. In addition, given that, by definition, gender systems are
bound to the existence of productive agreement patterns (Corbett 1991), I contend
that complexification and simplification in the morphological encoding of gender
distinctions must be primarily studied through the analysis of agreement pat-
terns.¹ Within contact linguistics, it is generally assumed that contact-induced loss
or emergence of agreement presupposes long-term contact, heavy borrowing and/
or extensive bilingualism between speech communities (Thomason 2001: 71).
However, to date, and to the best of my knowledge, there have been no studies
that systematically tackle the issue of which factors may account for the occur-
rence of these opposite patterns of change, agreement loss and emergence, under

¹ Focusing on patterns of gender agreement does not, of course, mean underestimating the
importance that nominal gender marking has in languages that display it (for a more thorough
discussion, see section 8.2).

allegedly similar sociohistorical scenarios. The present study attempts to fill this
gap by investigating loss of gender agreement in language families characterized
by the presence of this feature and, conversely, the emergence of gender
agreement in languages with no inherited gender systems. Besides loss and emergence,
I also study the reduction and expansion of gender agreement patterns within
gendered language families. With respect to sociohistorical variables, the study
especially focuses on language contact dynamics, with particular attention to
asymmetries between the populations in contact, both in terms of the demo-
graphic structure (population size) and prestige differences.
The chapter is structured as follows. In section 8.2, I discuss in what respects
gender systems, as a grammatical and functional domain, can be relevant to the
study of morphological complexity. The sampling methodology and data collec-
tion procedure are outlined in section 8.3. In section 8.4, I provide an overview of
the patterns of language change attested in the data set, and illustrate their
geographic distribution in section 8.5. Section 8.6 discusses the sociohistorical
factors that are associated with the patterns of change attested in the languages of
the sample. A summary of the results and some concluding remarks are given in
section 8.7.

8.2 Grammatical gender and morphological complexity

Recent research on linguistic complexity and the typology of gender systems

(Audring 2014; Di Garbo 2016) suggests that three dimensions of variation can
be relevant to a typologically informed, descriptive² account of the complexity of
gender systems:

• The number of gender distinctions, under the assumption that the higher the
number of distinctions, the more complex the gender system.
• The number and nature of assignment rules, under the assumptions that: (a)
a gender system where gender assignment is both semantic and formal is
more complex than a system where gender assignment is only semantic or
only formal, and (b) a gender system with flexible assignment is more
complex than a system with rigid assignment.
• The pervasiveness of gender marking, under the assumption that the higher
the number of word classes and syntactic domains that are subject to gender
marking, the more complex the gender system.

² In this chapter, the notion of descriptive, absolute complexity is kept distinct from the notion of
difﬁculty. Under the former approach, complexity is operationalized in terms of description length
(Dahl 2004; Miestamo 2008). Under the latter approach, complexity is a measure of difﬁculty and costs
in language learning and use (Kusters 2003). For a discussion of these and related topics, see Arkadiev
& Gardani, Chapter 1 in this volume.

The suggested dimensions are based on established typological parameters for

the classification of gender systems, but do not exhaust all possible ways in which
gender systems may vary, and they can be in turn broken down into a number of
subdimensions. For a detailed analysis of how the complexity of gender systems
can be further differentiated, see Audring (2017, 2019).
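The three dimensions listed above can be read as independent axes of comparison rather than as a single score. The following minimal sketch, written in Python, compares two gender systems dimension by dimension. The class `GenderSystem`, the function `compare`, and the systems `X` and `Y` are hypothetical and serve only as illustration; they do not describe any language of the sample.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GenderSystem:
    """Hypothetical record of the three dimensions discussed in section 8.2."""
    name: str
    n_genders: int             # dimension 1: number of gender distinctions
    assignment: frozenset      # dimension 2a: subset of {"semantic", "formal"}
    flexible: bool             # dimension 2b: flexible vs. rigid assignment
    marked_domains: frozenset  # dimension 3: word classes/domains marking gender


def _sign(x, y):
    """Return 1 if x > y, -1 if x < y, 0 otherwise."""
    return (x > y) - (x < y)


def compare(a, b):
    """Compare two systems per dimension: 1 means `a` is more complex than `b`."""
    return {
        "distinctions": _sign(a.n_genders, b.n_genders),
        "assignment_types": _sign(len(a.assignment), len(b.assignment)),
        "flexibility": _sign(a.flexible, b.flexible),
        "pervasiveness": _sign(len(a.marked_domains), len(b.marked_domains)),
    }


# Two invented systems, used only to exercise the comparison.
X = GenderSystem("X", 3, frozenset({"semantic", "formal"}), False,
                 frozenset({"article", "adjective", "pronoun"}))
Y = GenderSystem("Y", 2, frozenset({"semantic"}), False,
                 frozenset({"pronoun"}))

if __name__ == "__main__":
    for dimension, result in compare(X, Y).items():
        print(dimension, result)
```

The comparison deliberately returns one value per dimension, since the text does not propose a way of aggregating the dimensions into a single measure.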
While the first and second dimensions of the proposed complexity metrics are
not directly linked to morphological complexity, the third dimension (pervasive-
ness of gender marking) directly hinges on morphology. Gender marking presup-
poses the existence of morphology that is dedicated to the expression of gender.
This applies both to nominal gender marking (also known as overt gender) and to
non-nominal gender marking (also known as gender agreement). If we consider
non-nominal gender marking first, grammatical gender systems can be associated
with morphological complexity both syntagmatically and paradigmatically. At the
syntagmatic level, patterns of gender agreement are sets of inflections that may
occur on various entities within an utterance (e.g., articles, adjectives, demonstratives,
verbs, personal pronouns) and that point to one of the multiple classes to which
nouns can be assigned (e.g., in Italian, the masculine and feminine classes). At the
paradigmatic level, each of the items that carry gender inflection in a language
typically possesses as many forms as there are gender values to be distinguished,
and the number of available forms is even higher if, for instance, a language
expresses gender distinctions both in the singular and in the plural. In Italian
(Indo-European, Romance),³ the form of the definite article varies between il/lo,
la, i/gli, le, depending on whether the noun marked as definite is masculine
singular, feminine singular, masculine plural, or feminine plural.⁴ Moving on to
overt gender marking, in several languages, gender marking is not only restricted
to agreement but also affects nominal morphology, with gender distinctions being
overtly marked on nouns. Overt gender marking features higher syntagmatic
complexity, inasmuch as it increases the number of word classes where gender
is flagged within an utterance. It also increases paradigmatic complexity, in that it
leads to higher lexical diversity, given that each noun may in principle have as
many forms as there are gender values to be distinguished. Nominal gender
marking is, for instance, very pervasive in Atlantic-Congo gender systems, as
illustrated in (1) with an example from the Bantu language Chichewa.

³ In this chapter, language classiﬁcation is based on Glottolog (Hammarström et al. 2019).

⁴ This type of morphological paradigmatic complexity is defined by Bentz et al. (2015: 2) as an
instance of lexical diversity, which they describe as the ‘distribution of word forms or word types’ that
languages ‘use to encode essentially the same information’. In the domain of definiteness marking,
Italian exhibits higher lexical diversity than, say, English, because different forms of the definite articles
are used depending on the gender and number values of nouns, whereas definite articles in English are
gender and number invariant.

(1) Gender marking in Chichewa (Atlantic-Congo, Bantu; Kiso 2012: 18)

chi-nkhanira cha-chi-kazi chi-ku-dzi-kanda
7-scorpion -7-female 7.---scratch
‘The female scorpion is scratching itself.’

In (1), the markers of class 7 (the singular form of gender 7/8 in Chichewa) occur
on the adnominal modifier, the verb, and the noun itself.
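The paradigmatic and syntagmatic sides of gender marking described in this section can be made concrete with a small count. The sketch below, in Python, uses only the Italian article forms listed in the text and the gloss line of example (1) as printed; the function names `paradigm_size` and `words_marking` are hypothetical, and the English paradigm is reduced to the single invariant form mentioned in footnote 4.

```python
import re

# Forms of the Italian definite article as listed in the text,
# keyed by (gender, number). Other allomorphs are not considered here.
ITALIAN_DEFINITE = {
    ("M", "SG"): ["il", "lo"],
    ("F", "SG"): ["la"],
    ("M", "PL"): ["i", "gli"],
    ("F", "PL"): ["le"],
}

# English definite article: invariant for gender and number.
ENGLISH_DEFINITE = {("-", "-"): ["the"]}


def paradigm_size(paradigm):
    """Return (number of paradigm cells, number of distinct forms)."""
    cells = len(paradigm)
    forms = len({form for cell in paradigm.values() for form in cell})
    return cells, forms


# Gloss line of example (1) as printed; only the class numeral is used.
CHICHEWA_GLOSS = "7-scorpion -7-female 7.---scratch"


def words_marking(gloss, value):
    """Count the words whose gloss contains `value` as a separate element."""
    count = 0
    for word in gloss.split():
        if value in re.split(r"[-.]", word):
            count += 1
    return count


if __name__ == "__main__":
    print(paradigm_size(ITALIAN_DEFINITE))
    print(paradigm_size(ENGLISH_DEFINITE))
    print(words_marking(CHICHEWA_GLOSS, "7"))
```

Counting cells and distinct forms separately keeps apart the number of gender and number values that are distinguished and the number of word forms used to express them.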
The relationship between nominal and non-nominal (agreement-based) gender
marking is not trivial. In some languages, as is the case in Bantu, nominal and
non-nominal marking can have similar means of expression from the point of
view of the phonological appearance of the morphemes used to encode gender
distinctions. However, this formal correspondence may only apply to parts of the
system rather than to all nouns and all agreement targets. In addition, nominal
marking and agreement marking may have different sources and undergo differ-
ent types of diachronic developments. For instance, as is also the case in Bantu
languages, animacy-based marking may develop in the domain of agreement
without affecting nominal marking. Thus, in languages that have both nominal
and agreement-based marking of gender distinctions, it is important to consider
these as two separate dimensions that may, but need not, interact with each other.
In this chapter, I restrict my focus to patterns of change in the domain of
agreement marking and their effect on the complexity of gender systems. The
reason behind this choice is twofold. On the one hand, while agreement marking
is definitional to gender (there is grammatical gender only if there is displaced
marking of classificatory distinctions through agreement), nominal marking is not
(many languages mark gender distinctions only via agreement). On the other
hand, while agreement marking directly hinges on inflectional morphology, in
that gender agreement targets obligatorily inflect for gender, nominal gender
marking resides more in the domain of lexicalized distinctions and/or word
formation rules, which can be argued to be less central to morphological com-
plexity. The patterns of change in the domain of agreement marking that the study
focuses on are presented and discussed in section 8.4.

8.3 Method and data

8.3.1 Sampling methodology and variables in focus

The study is based on a sample of 36 languages distributed among 15 sets of

closely related languages. Each language set contains two to three languages with
the exception of Chamorro, a language isolate within the Austronesian family, and
the mixed language Michif. The geographical distribution and genealogical afﬁli-
ation of the sample languages are shown in Figure 8.1. Even though language sets

Figure 8.1. The language sample
Legend (genealogical units): Balto-Slavic, Bantu, Basque, Chamorro, Central Gunwinyguan, Germanic, Ghana-Togo-Mountain, Greek, Insular Celtic, Iranian, Khasian, Lezgic, Mek, Michif, Thebor.
Note: See also Di Garbo & Miestamo (2019).

from at least five of the world’s six macro-areas are represented in the sample, the
data set is largely skewed towards Eurasia. The reason behind this bias is twofold.
First, along with Africa, Eurasia is one of the areas of the world where gender
systems are most frequent. Second, for many of the Eurasian genealogical units
included in the sample, diachronic developments in the domain of nominal
morphology have been studied with the support of historical-comparative data,
and the social history of many of these speech communities is also relatively well
documented. The languages of Eurasia thus qualify as an appropriate starting
point to explore the evolutionary dynamics of morphological complexity in the
domain of gender marking and their sociohistorical correlates. At least one
genealogical unit for each of the other macro-areas (except for South America) has been
added. A complete list of the languages sampled for each of the genealogical units
is given in Appendix 8.1.
Each language set consists of one conservative language and at least one
innovative language with respect to gender agreement marking, with the exception
of the Thebor (Bodic) languages Shumcho and Jangshung, both of which
represent instances of emerging gender agreement patterns within the family.
Languages within one and the same set can be mutually intelligible with each
other (as in the case of Kelasi and Kafteji within the Northwestern Iranian set), or
more distantly related (as in the case of Nalca and Eipo within the Mek set). The
patterns of language change accounted for are: loss, reduction, emergence and
expansion in the domain of gender agreement. These are compared with either the
retention of gender agreement (in case of reduction, loss and expansion) or with
its absence (in case of emerging gender agreement). These diachronic processes
are investigated by examining the morphosyntactic domains of gender marking in
a language (e.g., attributive modiﬁers, predicates, pronouns), and the way in which

these vary across genealogically related languages: what are the word classes that
inﬂect for gender in language X as opposed to the closest relatives Y and Z? Do all
targets of gender agreement mark the same kind of gender distinctions or is there
a split between, say, adjectives, articles and demonstratives distinguishing between
masculine, feminine and neuter gender, and personal pronouns distinguishing
between animate and inanimate gender? The relevance of these questions for the
understanding of the complexity of gender systems is discussed in section 8.4.
In addition to representing more or less conservative languages in the domain
of gender agreement, the sampled language sets and the individual languages
within each set were selected so as to capture diversity at the sociohistorical
level. In this respect, variables such as demography, domains of use, and
history of contact were considered.
This sampling methodology, which aims to capture both structural and sociohistorical
diversity within sets of closely related languages, has already been
applied to studies of the relationship between language structures and social
structures. An example of this approach is the study of morphosyntactic complexity
and language contact by Maitz & Németh (2014), which investigates morphosyntactic
complexity in four varieties of German chosen to represent three different
sociohistorical profiles: one standard and relatively high contact language
(Standard German), two contact languages (the pidgin Kiche Duits and the creole
Unserdeutsch), and one low contact variety typically learned as L1 only (Cimbrian).

8.3.2 Data collection

Data were collected by means of a questionnaire, which was sent out to experts on
individual languages, as well as from descriptive resources. For those
languages for which questionnaire responses could not be obtained, I used the
questionnaire as a guideline to conduct more informal consultations with lan-
guage experts and to gather information from descriptive resources.
The questionnaire consists of two parts. Part 1 focuses on language ecology and
language contact and aims at capturing information on the present and past
geographical and sociohistorical environment in which a given language is/was
used, with a set of ﬁne-grained questions ranging from demography to domains of
language use, issues of language identity and prestige, code switching practices
and language contact in the past.⁵ Part 2 focuses on grammatical gender and aims
at capturing information on number and type of gender distinctions, gender
assignment rules, the morphology and syntax of gender marking and the
diachrony of a given gender system. The questionnaire is based on two different

⁵ Not all of these questions could be answered for all languages in the sample.

pre-existing typological questionnaires. Part 1 is based on John Bowden’s ques-

tionnaire on language contact in East Nusantara (Eastern Indonesia). Part 2 is
based on Greville Corbett’s questionnaire on gender and number.⁶

8.4 Patterns of change under study: an overview

Here I provide an overview of the patterns of language change that appear to

foster the reduction, loss, expansion and emergence of gender agreement in the
languages of the sample. I ﬁrst discuss patterns of reduction and loss, moving to
emergence and expansion thereafter. I also discuss how each of the patterns in
focus may contribute to the increase and/or decrease of aspects of morphosyn-
tactic complexity in the domain of gender marking. A description of patterns and
contexts of change in each of the sampled languages is given in Appendix 8.1. For
a detailed discussion of the patterns of change attested in the languages of the
sample and summarized herein, see Di Garbo & Miestamo (2019).

8.4.1 Reduction and loss of gender marking

The reduction and loss of gender agreement in the languages of the sample may
result from two distinct processes of language change: (1) morphophonological
erosion and (2) redistribution of agreement patterns. Under morphophonological
erosion, gender marking is eroded or disappears as a result of sound changes that
lead to the loss of segmental morphology. Under redistribution of agreement
patterns, one gender agreement pattern spreads at the expense of others, leading
to the partial or complete neutralization of gender distinctions. Both processes
exhibit properties of directionality, but the preferred directionalities differ under
one or the other process: morphophonological erosion is found to often spread
from the domain of attributive modifiers whereas the redistribution of gender
agreement patterns often has its onset in the domain of anaphoric pronouns.
An example of partial loss of gender marking as a result of morphophonological
erosion is Standard Swedish (Indo-European, North Germanic). In Standard
Swedish, two different systems of gender distinctions are attested. Within the
noun phrase, the language distinguishes between two genders: the Common
Gender and the Neuter Gender, en person ‘a person’ and ett hus ‘a house’. This
distinction is marked on definite and indefinite articles, demonstrative modifiers,
and adjectives. In the domain of third person pronouns, a Masculine/Feminine

⁶ Both questionnaires can be freely accessed through the repository for ‘Typological tools for ﬁeld
linguistics’ from the website of the former Department of Linguistics at the Max Planck Institute for
Evolutionary Anthropology in Leipzig (http://www.eva.mpg.de/lingua/tools-at-lingboard/questionnaires.php).

Table 8.1. Third person pronouns in standard Swedish

Hum. and Higher Anim.    M            F            P⁷
                         han ‘he’     hon ‘she’    de ‘they’
Inanim.                  C            N            P
                         den ‘it’     det ‘it’     de ‘they’

type of gender distinction is marked if the pronoun antecedent is a human or a

higher animate. If the pronoun⁷ antecedent is an inanimate noun, the Common/
Neuter gender distinction, which is active elsewhere, applies. This split is illus-
trated in Table 8.1.
This split in the domain of gender marking is the result of the merger between
masculine and feminine inflections on adnominal modifiers, which occurred through
a combination of various morphophonological processes, such as the erosion and loss
of the masculine suffix -er from the inflectional paradigm of strong adjectives, the
loss of the masculine suffix -r before the definite suffix in the nominative form of
the noun, and the loss of final consonant length in the inflectional paradigm of the
definite suffixes (Duke 2010: 652–4). Many nonstandard varieties of Swedish, such as
Elfdalian Swedish, still retain the tripartite distinction between Masculine, Feminine,
and Neuter Gender all throughout the gender marking system.
Complete loss of gender inflections as a result of morphophonological erosion
is attested in the Northwestern Iranian language Kelasi. Kelasi’s closest genea-
logical and geographic neighbour, Kafteji, still retains productive masculine and
feminine gender agreement patterns. Lack of gender marking in Kelasi and
presence of gender marking in Kafteji are exemplified in (2) and (3), respectively.

(2) No gender agreement in Kelasi (Northwestern Iranian; Stilo 2019: 45)

a. <img>m œm<img>d-e ziœ-Ø ní-œ.
this P.N-. son-. .-3
‘This (or ‘he’) is not Ahmahd’s son.’
b. <img>m œm<img>d-e dét-Ø ní-œ.
this P.N-. daughter-. .-3
‘This (or ‘she’) is not Ahmahd’s daughter.’

(3) Masculine and feminine gender agreement in Kafteji (Northwestern Iranian;

Stilo 2019: 45)
a. <img>m-Ø œm<img>d-ə zeœ-Ø ní-œ.
this-. P.N-. son-. .-3.
‘This (or ‘he’) is not Ahmahd’s son.’

⁷ The Masculine/Feminine distinction is also marked in the accusative and genitive forms of the
pronoun. Cf. honom (3..) vs. henne (3..), and hans (3..) vs. hennes (3..).

b. <img>m-œ œm<img>d-ə dét-œ ne-áya.

this-. P.N-. daughter-. .-3.
‘This (or ‘she’) is not Ahmahd’s daughter.’

As examples (2) and (3) show, utterances in Kelasi and Kafteji look (and sound)
practically the same and the two languages are highly mutually intelligible. One of
the few striking structural differences between the two languages is, in fact, the
presence of gender inflections in Kafteji (in the form of zero-marked Masculine
and marked Feminine) and its complete absence in Kelasi. Stilo (2019) describes
loss of gender in Kelasi as the result of morphophonological erosion in the domain
of nominal inflection, whereby the possibility to omit overt gender marking on
nouns in certain morphosyntactic contexts triggers the systematic erosion of
gender marking elsewhere. However, no information is given about the ordering
of loss of gender inflection on the various agreement targets.
Loss of gender by the redistribution of agreement patterns is attested, among
other languages, in Cappadocian Greek (Indo-European, Greek), where it results
from the generalization of neuter agreement to all instances of masculine and
feminine gender agreement (Karatsareas 2009, 2014). Comparative evidence from
closely related dialects, such as Pontic Greek, allows us to infer how the process of
redistribution took place. In Pontic Greek, grammatically masculine and feminine
nouns denoting inanimate entities trigger neuter agreement on all agreement
targets but the prenominal articles. This is shown in (4), with the example of
the inanimate feminine noun pórta ‘door’, which triggers neuter agreement on the
past participle anixtón ‘open’, but feminine agreement on the prenominal definite
article i.

(4) Argyroúpolis Pontic (Indo-European, Greek; Karatsareas 2014: 79)

i pórta (...) móno ímoson óran estéknen anixtón
.. door.. (...) only half.. hour.. stay..3 open..
‘The door would stay open for only half an hour.’

Conversely, in Standard Modern Greek, the same controller noun selects

feminine agreement on all targets.

(5) Standard Modern Greek (Indo-European, Greek; Karatsareas 2014: 80)

i pórta móno misí óra émene anixtí
.. door.. only half.. hour. stay..3 open..
‘The door stayed open for only half an hour.’

In Pontic Greek, the redistribution of the neuter gender agreement pattern is

semantically motivated. Neuter agreement is associated with inanimate referents,
and inanimate nouns select neuter agreement irrespective of their grammatical

gender. In Cappadocian Greek, where the generalization of the neuter agreement

pattern has taken over, no trace is left of this semantically based redistribution.
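The three Greek varieties discussed above differ in how the agreement value of a target is selected. The following Python sketch restates the descriptions given in the text as a single selection rule. The function `agreement_value` and the labels used for targets are hypothetical, and the rule is a simplification that covers only the facts reported here (Karatsareas 2009, 2014), not the full agreement systems of these varieties.

```python
def agreement_value(variety, lexical_gender, animate, target):
    """Return the gender value ("M", "F" or "N") marked on an agreement target.

    variety        -- "Standard Modern Greek", "Pontic" or "Cappadocian"
    lexical_gender -- grammatical gender of the controller noun
    animate        -- whether the controller denotes an animate entity
    target         -- label of the agreement target, e.g. "prenominal article"
    """
    if variety == "Standard Modern Greek":
        # The controller selects its own gender on all targets.
        return lexical_gender
    if variety == "Pontic":
        # Inanimate masculine and feminine nouns trigger neuter agreement
        # on all targets but the prenominal article.
        if (not animate and lexical_gender in ("M", "F")
                and target != "prenominal article"):
            return "N"
        return lexical_gender
    if variety == "Cappadocian":
        # Neuter agreement has been generalized to all controllers.
        return "N"
    raise ValueError(f"unknown variety: {variety}")


if __name__ == "__main__":
    # pórta 'door': grammatically feminine, inanimate (examples (4) and (5)).
    for variety in ("Standard Modern Greek", "Pontic", "Cappadocian"):
        values = [agreement_value(variety, "F", False, target)
                  for target in ("prenominal article", "participle")]
        print(variety, values)
```

Stated this way, the Pontic rule is the only one of the three that refers to both the type of target and the animacy of the controller, which is the property that the discussion of complexity in this section builds on.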
Morphophonological erosion and agreement redistribution are not two mutually
exclusive processes. An example of a heavily reduced gender agreement system
where both morphophonological erosion and redistribution of agreement patterns
are at play is Karleby Swedish (Indo-European, North Germanic), a variety of
Swedish spoken in the town of Karleby, which is located in the Finnish region of
Ostrobothnia. In Karleby Swedish, gender inflections have been lost everywhere
except for the unbound form of the definite articles and the personal and
demonstrative pronouns, all of which still inflect as masculine or feminine, but
only when the controller nouns denote human beings (Hultman 1894: 229;
Huldén 1972: 47). Similarly, gender marking has undergone severe reduction
and near-loss across different varieties of Tamian Latvian. According to the
recent analysis by Wälchli (2017), the erosion of gender distinctions started out
with the loss of short vowels in final syllables. This occurred first on nouns,
leading to the neutralization of the masculine and feminine distinction in the
accusative plural form, and later extended to agreement marking, starting from
the demonstratives. This initial process of morphophonological erosion was
followed by multiple processes of redistribution in other domains of gender
marking, which led to the generalization of the masculine agreement pattern at
the expense of the feminine. Traces of feminine marking are still found, to
different extents and different degrees of productivity, in nearly all varieties of
Tamian Latvian.
For a sociohistorical analysis of these developments, see section 8.6.
While it can be assumed that complete loss of gender agreement marking is a
straightforward process of morphosyntactic simplification which decreases the
overall number of grammatical meanings that must be expressed in a given
morphosyntactic context (e.g., on adnominal modifiers, anaphoric pronouns,
predicates), partial losses and redistributions of gender marking are harder to
classify as straightforward simplification. Here I base my assessment of morpho-
syntactic complexity in reducing gender systems on recent work by Audring
(2017), where different aspects of gender marking are broken into a multidimen-
sional space of variation.
Partial loss of gender marking as a result of morphophonological erosion can
pave the way to split gender agreement systems such as the one attested in
Standard Swedish. Here, not all targets of gender marking are sensitive to the
same type of gender distinctions: the personal pronouns make a sex-based dis-
tinction that is not found in the domain of adnominal modification. Furthermore,
sex-based marking on personal pronouns is conditional, and only occurs if the
pronoun’s antecedent is a human being or a higher animate. According to the
complexity metric proposed by Audring (2017), split gender agreement systems
and conditional gender marking feature higher complexity than absence thereof.

This is captured by two different dimensions of Audring’s metric, both pertaining

to the domain of ‘target complexity’:

(6) a. Matching values < Mismatching values

b. Targets match controller in value < Targets do not match controller in
value
(Audring 2017: 63–4)
In (6), the symbol ‘<’ reads as ‘is less complex than’. Accordingly, the statement
in (a) entails that gender systems where different agreement targets align with
different types of gender distinctions are more complex than gender systems
where all targets match with respect to the gender values that they are able to
flag through their inflectional morphology. Mismatching values across types of
targets impact the complexity of gender marking systems both at the paradig-
matic and syntagmatic level. Paradigmatically, the inventory size of available
gender markers differs across target types. Syntagmatically, given one and the
same controller noun, different gender values may be indexed by different
targets. For instance, given the Swedish noun kvinna ‘woman’, adnominal
modifiers will index that the noun belongs to the Common Gender while third
person pronouns will index that the referent of the noun is a female. In both
cases, this leads to longer description length, and thus higher complexity. The
statement in (b) targets the relationship between controller nouns and agree-
ment targets in languages that exhibit split systems of gender agreement. In
Standard Swedish, nouns are lexically Common or Neuter in gender, while
natural gender distinctions do not feature in the range of lexicalized gender
values. Masculine/Feminine marking on personal pronouns indexes referential
properties of nouns that do not match the inherent gender of the controller
noun. Once again, this mismatch engenders higher descriptive complexity than
its absence.
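The split between adnominal and pronominal gender marking in Standard Swedish can be illustrated with a small sketch in Python. It uses only the forms cited in this section (the indefinite articles en and ett, the pronouns of Table 8.1, and the nouns kvinna and hus). The class `Noun` and the functions below are hypothetical, and the fallback to lexical gender when the sex of the referent is not specified is a simplifying assumption made for the sketch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Noun:
    form: str
    lexical_gender: str                 # "C" (Common) or "N" (Neuter)
    higher_animate: bool                # human or higher animate referent
    referent_sex: Optional[str] = None  # "M" or "F" when relevant


INDEFINITE_ARTICLE = {"C": "en", "N": "ett"}
PRONOUN = {"M": "han", "F": "hon", "C": "den", "N": "det"}


def adnominal_value(noun):
    """Adnominal targets index the lexical gender of the controller."""
    return noun.lexical_gender


def pronominal_value(noun):
    """Pronouns index sex for humans and higher animates, else lexical gender."""
    if noun.higher_animate and noun.referent_sex in ("M", "F"):
        return noun.referent_sex
    # Assumption of this sketch: without a specified sex, use lexical gender.
    return noun.lexical_gender


def is_mismatch(noun):
    """True if the two target types index different values for the same noun."""
    return adnominal_value(noun) != pronominal_value(noun)


kvinna = Noun("kvinna", "C", True, "F")  # 'woman'
hus = Noun("hus", "N", False)            # 'house'

if __name__ == "__main__":
    for noun in (kvinna, hus):
        print(noun.form,
              INDEFINITE_ARTICLE[adnominal_value(noun)],
              PRONOUN[pronominal_value(noun)],
              is_mismatch(noun))
```

The need for two separate selection functions, one of which is conditional on properties of the referent, is a simple way of seeing why such a system requires a longer description than one in which all targets share a single set of values.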
A similar reasoning applies to gender marking systems in which agreement
patterns undergo partial redistribution. For instance, in the Pontic Greek example
presented in (4), the generalization of the neuter agreement pattern is constrained
by type of target and type of controller. With nouns that are grammatically
masculine and feminine, but semantically inanimate, neuter agreement is selected
on all agreement targets but prenominal definite articles, which still inflect
according to the grammatical gender of nouns. The statements in (6) also encompass
these types of phenomena. More generally, according to Audring (2017),
nouns with mismatching lexical and referential semantics (also known in the
literature as hybrid nouns), such as the masculine and feminine inanimate nouns
of Pontic Greek, are likely to trigger inconsistent agreement patterns and thus
feature higher complexity in the domain of gender marking than other nouns.
This is formalized in (7):

(7) Consistent controller < Hybrid controller

(Audring 2017: 60)

8.4.2 Emergence and expansion of gender marking

Moving on to emergent patterns of gender agreement, in the languages of the

sample these are associated both with language-internal developments and with
language contact. Contact-induced emergent gender systems, which are most
central to the objectives of the present study, are attested in Chamorro
(Austronesian), Lekeitio Basque (Basque), Shumcho and Jangshung (Thebor).
These emergent patterns of gender marking presuppose borrowing of inflected
forms, which in turn become at least semi-productive in the recipient languages.
Irrespective of the specific contact scenarios from which they originate, these
borrowed patterns of gender agreement appear to share a number of character-
istics. First of all, the encoding of gender distinctions is, in all attested cases,
noun-phrase internal, with adnominal modifiers as the sole target of gender
marking. Second, the use of gender inflection is highly lexically restricted. In
Shumcho, for instance, only adjectives borrowed from neighbouring Indo-
Aryan languages carry productive gender inflections. Example (8) illustrates
masculine and feminine suffixal marking on adjectival modifiers of Indo-
Aryan origin in Shumcho.

(8) Masculine and feminine gender inﬂections on adjectival modiﬁers in

Shumcho (Thebor; Huber 2011: 68)
a. <img>ar-a /laʈ-a phobəlaŋkh
beautiful- deaf- man
‘beautiful/deaf man’
b. <img>ar-e /laʈ-e/-i jobəlaŋkh
beautiful- deaf- woman
‘beautiful/deaf woman’

Furthermore, all instances of borrowed gender agreement in the sample

are strictly semantic in nature, in that they index referential properties of
discourse referents such as animacy and natural gender. In Shumcho, with
nouns denoting male human beings, speakers systematically select the masculine
form of the gender-inﬂecting adjectives; all other nouns (denoting female
human beings and everything else) trigger feminine agreement (Huber 2011:
75, based on elicited data). In natural discourse, sex-based semantic gender
assignment is also extended to domestic animals (Christian Huber, personal
communication).
According to the metric by Audring (2017), purely semantic systems of gender
assignment feature lower complexity than systems of gender assignment that are

based both on semantic and formal rules. This is reﬂected in the statement in (9),
about number of rules.

(9) Single type of rule < Several types of rule

(Audring 2017: 65)

The borrowed gender marking systems attested in the sample qualify as simple
also with respect to amount of formal marking, in that gender marking occurs
only in one syntactic domain and, more speciﬁcally, on one type of agreement
target, adnominal modiﬁers. This is captured by the statement in (10).

(10) Single domain < Several domains

(Audring 2017: 64)
Conversely, such emergent systems of gender marking rank high on Audring’s
complexity scale with respect to the criteria of productivity and obligatoriness.
These systems are not fully productive because they only apply to a restricted
portion of the lexicon, that is, only to those adnominal modiﬁers that are
borrowed from the contact language. They are not fully obligatory either in that,
in all attested cases, their usage varies to a large extent across speakers and genres.
According to Audring (2017: 62), low productivity and optionality increase
description length and thus the overall complexity of gender systems. This is
captured by the statements in (11).

(11) a. Gender marking is fully productive < Only a subset of lexical items per
agreement target mark gender
b. Gender marking is obligatory < Gender marking is optional
(Audring 2017: 60)

The patterns of borrowed gender marking attested in the languages of the

sample feature a combination of complexity increasing and decreasing processes
that cannot be subsumed under one dimension of analysis.
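The mixed profile of the borrowed gender systems can be laid out explicitly. The sketch below, in Python, encodes the statements in (9), (10), and (11) as pairs of poles (less complex, more complex) and places the borrowed systems on each scale as described in the text. The names `SCALES`, `BORROWED_SYSTEM`, `profile`, and `is_mixed` are hypothetical, and the pole labels are shortened paraphrases of Audring's (2017) wording.

```python
# Each scale lists its two poles as (less complex, more complex).
SCALES = {
    "assignment rules": ("single type of rule", "several types of rule"),    # (9)
    "domains": ("single domain", "several domains"),                         # (10)
    "productivity": ("fully productive", "only a subset of lexical items"),  # (11a)
    "obligatoriness": ("obligatory", "optional"),                            # (11b)
}

# Position of the borrowed gender systems on each scale, following the text.
BORROWED_SYSTEM = {
    "assignment rules": "single type of rule",
    "domains": "single domain",
    "productivity": "only a subset of lexical items",
    "obligatoriness": "optional",
}


def profile(system):
    """Map each scale to "lower" or "higher" complexity for the given system."""
    result = {}
    for dimension, (low, high) in SCALES.items():
        value = system[dimension]
        if value not in (low, high):
            raise ValueError(f"{value!r} is not a pole of {dimension!r}")
        result[dimension] = "lower" if value == low else "higher"
    return result


def is_mixed(system):
    """True if the system is lower on some scales and higher on others."""
    return len(set(profile(system).values())) > 1


if __name__ == "__main__":
    for dimension, level in profile(BORROWED_SYSTEM).items():
        print(dimension, level)
    print(is_mixed(BORROWED_SYSTEM))
```

Keeping the scales separate, instead of summing them, mirrors the point made in the text that these developments cannot be subsumed under one dimension of analysis.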
Coming to the expansion of gender agreement morphology, the languages of
the sample do not lend themselves to a uniform account of this grammatical development.
On the one hand, in some of the sampled languages gender agreement expansion
is the result of purely language-internal grammaticalization paths that increase the
number of syntactic domains in which gender inﬂections appear. This is, for
instance, the case of the Austroasiatic languages Khasi and Pnar as compared
with their close relative Lyngngam. In Lyngngam, gender is only marked on
pronouns and deictic bases, whereas in Khasi and Pnar gender marking also
occurs on nouns, adnominal modiﬁers, and in Khasi even on verbs. On the
other hand, in at least two of the sampled languages, the expansion of gender

agreement morphology appears to be rather tied to external events, such as

mechanisms of language transmission under intense language contact, as in the
case of the mixed language Michif, or issues of language planning and standard-
ization, as in the Bantu language Makanza Lingala. These latter cases will be
discussed in detail in section 8.6. Whatever the language-specific paths that lead
to the expansion of domains of gender marking, their impact on the overall
complexity of the gender system is always the same. Based on the statement in (10),
an increase in the amount of formal marking entails higher complexity of the gender
system.

8.5 Distribution of the patterns of change and clustering effects within Eurasia

The distribution of the patterns of change attested in the languages of the sample
is presented in Figure 8.2.
Given the limited size of the sample, it is not possible to formulate any
generalization on the relative frequencies of the observed patterns of change;
these, however, tend to be represented evenly within the data set.
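The counts reported in the legend of Figure 8.2 can be checked and converted into shares of the sample with a few lines of Python. The variable names are hypothetical; the figures are those given in the legend.

```python
from fractions import Fraction

# Number of languages per pattern, as reported in the legend of Figure 8.2.
COUNTS = {
    "Emergence": 5,
    "Loss": 7,
    "Expansion": 6,
    "Reduction": 8,
    "Retention": 8,
    "Lack": 2,
}
SAMPLE_SIZE = 36

# The six categories should partition the whole sample.
assert sum(COUNTS.values()) == SAMPLE_SIZE

# Exact share of the sample for each pattern.
SHARES = {pattern: Fraction(n, SAMPLE_SIZE) for pattern, n in COUNTS.items()}

if __name__ == "__main__":
    for pattern, share in SHARES.items():
        print(f"{pattern:<10} {COUNTS[pattern]:>2}/{SAMPLE_SIZE}  {float(share):.1%}")
```

With a sample of this size the shares are descriptive only, in line with the caveat above about generalizing from the relative frequencies.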
One striking fact about the geographic spread of the phenomena under study is
that, within Eurasia (where the majority of the sampled language sets come from),
instances of complete or near-complete loss and of emergence of gender agree-
ment tend to cluster around language family edges, that is, within geographic
areas in which languages belonging to families with a strong bias towards the
presence of gender systems (e.g., different branches of Indo-European) are in
contact with languages belonging to families that are biased towards the absence
of gender (e.g., varieties of Basque, different Turkic and Finnic languages). The
configuration of these language family edge zones within Eurasia, and the patterns
of change observed in the domain of gender agreement across individual languages
within these zones, are presented in Table 8.2.

Figure 8.2. Patterns of change in the language sample
Legend: Emergence = 5/36; Loss = 7/36; Expansion = 6/36; Reduction = 8/36; Retention = 8/36; Lack = 2/36
Note: See also Di Garbo & Miestamo (2019).

Table 8.2. Clustering of patterns of change at language-family edges within Eurasia

Target genealogical unit | Contact genealogical unit | Pattern of change | Relevant languages
Greek | Turkic | Loss | Cappadocian Greek
Balto-Slavic | Finnic | Loss | Tamian Latvian
Lezgic | Turkic | Loss | Aghul, Udi
North Germanic | Finnic | Near-loss | Karleby Swedish
Northwestern Iranian | Turkic | Loss and expansion | Kelasi, Kafteji
Basque | Ibero-Romance | Emergence | Lekeitio Basque
Thebor | Indo-Aryan | Emergence | Shumcho, Jangshung

One exception to this trend is Irish. Frenda (2011) reports ongoing reduction in
the domain of gender agreement in contemporary urban varieties of Irish, which
he classifies as non-native, and which he compares with more conservative
varieties documented through recordings from the 1960s and classified as native.
The study provides diachronic corpus evidence that, over the past forty years,
gender distinctions in the domain of personal pronouns have been reorganized
and restructured around a purely semantic type of opposition between ‘female
referents’ vs. ‘everything else’, with the masculine pronouns systematically being
selected not only to refer to semantically and grammatically masculine nouns (as
attested in conservative varieties), but also to semantically inanimate and gram-
matically feminine nouns. This semantic realignment of gender distinctions in the
pronominal domain is explained by Frenda (2011) as one of the many conver-
gence phenomena that Irish is undergoing under the influence of the dominant
language, English, and as a result of language attrition. Through this process, at
least in the pronominal domain, the gender system of present-day Irish has
converged with the pronominal gender system of English, which is also organized
on the basis of semantic oppositions involving natural gender. According to
Frenda (2011: 20), the major difference between pronominal gender in English
and present-day Irish varieties is that ‘masculine pronouns in Irish fulfil the
functions of both the neuter and masculine pronouns in English’. The contact
situation observed in Irish thus differs from those surveyed in Table 8.2 in that the
contact language (which is genealogically, albeit distantly, related to Irish) does
have gender, albeit a very reduced type of system, with agreement patterns
restricted exclusively to the pronominal domain.
In addition to clustering around language family edges, within Eurasia, outlier
languages (i.e., languages that have lost or developed patterns of gender agreement
as opposed to their closest relatives) often appear to be neighbours of each
other. For instance, of the four Nakh-Daghestanian languages that have completely
lost gender (Aghul, Lezgi, Southern Tabasaran, and Udi⁸), all but one
(Udi) are spoken within the same territorial pocket, and at the border with
Azerbaijani (Turkic) speaking communities. Udi, on the other hand, does not
have close relatives as geographic neighbours, being surrounded mostly by
Georgian and Azerbaijani speaking communities (Nichols 2003: 303). Similarly,
in Northern India, the borrowing of gender agreement patterns from Indo-Aryan
languages is attested both in Shumcho and its close relative Jangshung.
These clustering effects within Eurasia align with the observations by Nichols
(1992, 2003): the stability of gender systems is a matter of both diachronic
persistence within language families and areal convergence within geographic
zones. Clustered distributions outside Eurasia cannot be veriﬁed on the basis of
the present data set, which is skewed towards Eurasian languages. While the
reasons for this bias in the structure of the language sample are explained in
section 8.3, this is an important issue that calls for further investigation.

8.6 The evolution of gender agreement systems and language ecology

Two broad domains of sociohistorical variation are surveyed in the attempt
to unravel the types of external factors that may be associated with specific
developments in the evolution of gender agreement systems (see also section
8.3): (1) language contact dynamics, with particular attention to demographic
and prestige differences between the languages in contact; (2) language policies.
The ﬁndings are discussed in section 8.6.1 (demography and prestige dynamics),
section 8.6.2 (language policies), and section 8.6.3 (gender marking as an identity
marker). More detailed information concerning contexts of change in each of the
sampled languages is given in Appendix 8.1.

8.6.1 Demographic factors in gender agreement loss and emergence

In line with what has been observed in previous literature on contact-induced
change in the domain of agreement morphology, in the languages of the sample, a
clear connection between loss and emergence of gender agreement and language
contact dynamics is observed only in situations of prolonged contact and extensive
bilingualism. Let us consider some examples, starting with the loss of gender
agreement.

⁸ Lezgi and Southern Tabasaran are not included in the sample.

Innovations in gender agreement morphology are attested in all Asia
Minor Greek dialects⁹ except Silliot Greek. As demonstrated by Karatsareas (2009,
2014) on the basis of historical-comparative evidence, the onset of these diachronic
developments in the domain of gender agreement precedes the intensification
of contact with Turkish. However, only in Cappadocian, the variety with
the longest and most intensive record of contact with Turkish, and the one which
was geographically most isolated from the rest of the Greek-speaking communities,
has gender agreement been completely lost. An illustration of noun-phrase-internal
agreement morphology in Cappadocian Greek, as compared with
Standard Modern Greek, is given in example (12).

(12) Loss of gender agreement in Cappadocian Greek (Indo-European, Greek; Karatsareas 2014: 79–80)
a. Axó Cappadocian
t spitçú ta ndix(u)s xtizména
.. house.. . wall. built.
‘The walls of the house (are) built.’
b. Standard Modern Greek
i tíçi ine xtixméni
.. wall.. be..3 built..
‘the walls are built.’
c. Sílata Cappadocian
to íra ívran to qapadiméno
. door. ﬁnd..3 3. closed.
‘They found the door shut.’
d. Standard Modern Greek
i pórta móno misí óra émene anixtí
.. door.. only half.. hour. stay..3 open..
‘The door stayed open for only half an hour.’

Axó Cappadocian has no traces of gender agreement. In (12a), the only
grammatical information that is encoded through agreement is number: the
noun for ‘wall’ is inflected as plural and triggers plural agreement on the past
participle. On the other hand, in Standard Modern Greek the noun for ‘wall’ is
both masculine and plural and triggers masculine and plural agreement on the
past participle (12b). Similarly, in (12c), from Sílata Cappadocian, the noun for
‘door’ only triggers singular agreement while in (12d), from Standard Modern
Greek, it triggers both feminine and singular agreement.¹⁰ As discussed in section
8.4.1, loss of gender agreement in Cappadocian Greek resulted from the generalization
of the former neuter agreement pattern to all instances of masculine and
feminine agreement. As Karatsareas (2009: 224) puts it, ‘these processes were
probably aided and accelerated by Cappadocian–Turkish bilingualism and subsequent
cross-linguistic influence from Turkish’ (see also Karatsareas 2014: 99).

⁹ The label Asia Minor Greek dialects is used in the literature to refer to a set of varieties of Greek
that, prior to the population exchange that occurred between Greece and Turkey in 1923–4, used to be
spoken in Turkey (Asia Minor). Within the sample, Asia Minor Greek dialects are represented by
Cappadocian, Pontic, and Rumeic.

Moving on to contact-induced emergence of gender agreement patterns, in
Chamorro (an independent branch within the Austronesian language family),
Spanish feminine and masculine inflectional endings have been borrowed along
with borrowed adjectival modifiers. These inflectional endings are used by
Chamorro speakers to encode a ‘feminine vs. everything else’ type of opposition
in agreement with nouns of both Spanish and Chamorro origins. Inflecting
adjectives are exclusively of Spanish origin. Examples are given in (13).

(13) Patterns of gender agreement in Chamorro (Austronesian)

a. Feminine Gender (Stolz 2012: 106)
i bunit-a guaiyayon yan ti tulaikayon na palao’an
 nice- love. and  exchange.  woman
‘the beautiful, lovable and not exchangeable woman’
b. Non-Feminine Gender (Stolz 2012: 107)
i bibu, bunit-u yan guagan na kareta
 fast nice- and expensive  car
‘The fast, pretty, and expensive car’

While (13a) illustrates the use of feminine agreement with a noun denoting a
female entity, in (13b) the Spanish masculine agreement pattern is reanalysed as a
marker of ‘everything else’ as opposed to the feminine. It is worth noting that the
controller of non-feminine agreement in (13b), kareta ‘car’, is grammatically
feminine in Spanish. This testifies to the fact that even though the morphological
means through which gender-like distinctions are encoded in Chamorro are
copied from Spanish, their use is not, and rather hinges on Chamorro-specific
assignment rules. Chamorro is spoken in the Northern Mariana Islands, which
counted as Spanish territory between 1665 and 1899. According to Stolz (2012:
104), the use and influence of Spanish in the Mariana Islands was at its highest
between the early and mid nineteenth century ‘when a strong current of
Hispanisation put the survival of Chamorro at stake’. As a result of this influence,
and in spite of the fact that Spanish almost completely disappeared from the
linguistic landscape of the Mariana Islands after World War I, present-day
Chamorro retains traces of this heavy process of hispanization at the lexical
and, to a lesser extent, also at the grammatical level. The gender agreement
patterns exemplified in (13) are part of this heritage.

¹⁰ Examples (12c) and (12d) are not translation equivalents. Nevertheless, they are sufficient to show
absence of gender contrast in Cappadocian and presence of feminine gender marking in Standard
Modern Greek.

In addition to aligning with previously documented tendencies in contact-
induced loss and emergence of agreement, the distribution of the patterns of
change surveyed in this study suggests that, given similar contact situations in
terms of degree and duration of contact, one additional factor that appears to be
associated with the direction of the observed changes (towards either loss or rise
of gender agreement patterns) is the asymmetric nature of the relationship
between the languages in contact, both in terms of proportions of speakers and
prestige dynamics between populations in contact. In the following, I use the
notion of dominant language to refer either to the language that, in a given
contact situation, has the highest number of speakers, or to the one that is
more prestigious.
On the one hand, patterns of change in the domain of gender agreement
tend to proceed towards reduction and loss when the dominant language in a
given contact zone lacks grammatical gender or displays an already reduced
gender system. This is, for instance, the case of Cappadocian Greek under the
influence of Turkish. Gender reduction and loss can also occur as a result of
language shift when the non-native speakers of a given language
outnumber its native speakers. This is what Thomason (2015)
defines as shift-induced interference. In the languages of the sample, a clear-cut
instance of shift-induced interference is Tamian Latvian, where loss of gender
is typically explained as one of the results of Livonian speakers shifting to
Latvian.
On the other hand, asymmetries in the structure of the population and/or in the
prestige dynamics between the languages in contact also account for the emer-
gence of gender agreement patterns under language contact. In such cases,
extensive borrowing in the nominal domain (involving both nouns and adnom-
inal modifiers) from dominant languages with grammatical gender may lead to
the emergence of marginal instances of gender agreement in languages that are
otherwise devoid of gender. In the languages of the sample, this is, for instance, the
case of Chamorro.
To the best of my knowledge, the effect of language dominance, as defined in
this chapter, on changes related to the evolution of gender systems has gone
unnoticed so far. In the languages sampled for this study, this effect can be
observed in nearly all the genealogical units for which the complete or near-complete
loss and the emergence of gender agreement patterns can reliably be classified as a
result of contact-induced language change.¹¹ An overview is given in Table 8.3;
illustrations follow.

Table 8.3. Direction of change and asymmetries in the structure of the population and/or prestige dynamics

Languages | Change | Demographically dominant/more prestigious contact language(s) | Gender in the dominant language(s)
Aghul, Udi (Lezgic) | Loss | Azerbaijani (Turkic), Georgian (Kartvelian) | Absent
Cappadocian Greek (Greek) | Loss | Turkish (Turkic) | Absent
Karleby Swedish (North Germanic) | Near loss | Finnish (Finnic) | Absent
Igo (Ghana-Togo-Mountain) | Loss | Ewe (Gbe) | Absent
Irish (Insular Celtic) | Reduction | English (West Germanic) | Present (reduced system)
Tamian Latvian (Balto-Slavic) | Loss | Livonian and Estonian (Finnic) | Absent
Chamorro (Chamorro) | Emergence | Spanish (Romance) | Present
Lekeitio Basque (Basque) | Emergence | Spanish (Romance) | Present

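
The association summarized in Table 8.3 can be restated as a simple consistency check. The sketch below is only an illustration under stated assumptions: the rows are transcribed from the table, whereas the two helper functions (expected_direction and observed_direction) and their labels are hypothetical simplifications introduced here, not part of the original analysis. It collapses loss, near loss, and reduction into a single direction, and treats a reduced gender system in the dominant language like an absent one, as the discussion of dominance above suggests.

```python
# Rows of Table 8.3: (language(s), change, gender in the dominant language(s)).
TABLE_8_3 = [
    ("Aghul, Udi", "Loss", "Absent"),
    ("Cappadocian Greek", "Loss", "Absent"),
    ("Karleby Swedish", "Near loss", "Absent"),
    ("Igo", "Loss", "Absent"),
    ("Irish", "Reduction", "Present (reduced system)"),
    ("Tamian Latvian", "Loss", "Absent"),
    ("Chamorro", "Emergence", "Present"),
    ("Lekeitio Basque", "Emergence", "Present"),
]


def expected_direction(dominant_gender: str) -> str:
    """Hypothetical helper: direction predicted from the dominant language."""
    if dominant_gender == "Present":
        return "emergence"
    # Covers "Absent" and "Present (reduced system)".
    return "reduction or loss"


def observed_direction(change: str) -> str:
    """Hypothetical helper: collapse the table's change labels into two directions."""
    return "emergence" if change == "Emergence" else "reduction or loss"


# Languages whose observed change departs from the expected direction.
mismatches = [
    language
    for language, change, dominant_gender in TABLE_8_3
    if expected_direction(dominant_gender) != observed_direction(change)
]

print(mismatches)
```

For the eight rows of the table the list of mismatches is empty, which simply restates that the table contains no counterexample; it says nothing about languages outside the sample.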
The standard variety of Igo, a Kwa language of the Ghana-Togo-Mountain
group¹² spoken in Ghana and Togo, displays a bipartite gender system based on
the semantic opposition between animate and inanimate nouns. The gender
system of standard Igo is heavily reduced compared with that of the related
language Sεlεε, also included in the sample. Sεlεε has a non-sex-based gender
system with eight distinct agreement classes (Agbetsoamedo 2014). Gblem-Poidi
(2007: 57) mentions that the original gender system of Igo was based on
eleven non-sex-based distinctions, which are still traceable in oral religious texts
(prayers to the ancestors). The historical developments leading to the complete
semantic restructuring of gender in Igo are, however, not described in detail.
In addition to the animacy-based restructuring, in recent years
the morphology of gender agreement in spoken Igo has undergone massive erosion, and
significant differences in the use of gender agreement patterns can be observed
across the last three generations of speakers. While older speakers still productively
mark gender distinctions through segmental and tonal morphology,
younger speakers tend to omit segmental agreement markers and either retain their
tones as floating tones or lose all traces of agreement marking altogether. This
contrast between conservative (segmental and tonal) and reduced agreement
morphology is illustrated in example (14), for the Animate and Inanimate gender.

¹¹ The Thebor languages Shumcho and Jangshung, where suffixes are productively used to encode
gender distinctions on adjectival modifiers of Indo-Aryan origin (as illustrated in example (8)), are
excluded from Table 8.3. This is because the specifics of the linguistic area in which the two languages are
spoken cannot be clearly characterized in terms of dominance and prestige dynamics between languages
in contact. Even though Hindi is the dominant lingua franca of the area, contact and multilingualism
between Tibetan and Indo-Aryan languages in this area go beyond the influence of Hindi, both
historically and in the dynamics of present-day interactions between neighbouring languages.
¹² The question of whether Ghana-Togo-Mountain languages constitute an independent genealogical
grouping within Kwa or rather an areal grouping of genealogically more distantly related
languages is still debated. See Blench (2009) for a discussion.

(14) Patterns of gender agreement in Igo (Atlantic-Congo, Ghana-Togo-Mountain; adapted from Gblem-Poidi 2007: 58)
a. Segmental agreement patterns for Animate Singular (obsolete)
Ùŋò ù-we ù-ma ù ɖa túsà
person ..-all ..- ..  be.afraid
ú kὲ
.. go.away
‘Any person who is afraid can just go away.’
b. Tonal agreement patterns for Animate Singular (current)
Ùŋò wèe màa ɖàa túsàá ú
person ..-all ..-  be.afraid ..
kὲ
go.away
‘Any person who is afraid can just go away.’
c. Segmental agreement patterns for Inanimate Singular (obsolete)
ɔti ki-le ani kì-màa ɔwε ɖàa mu
tree .- as .- you . see.over.there
aɖuù kì na gbɔ agaga
   break quickly
‘A tree as the one you see over there breaks easily.’
d. No agreement marking left for Inanimate Singular, neither segmentally
nor tonally (current)
ɔti le ani màa ɔwε ɖàa mu aɖuu na
tree . as . you . see.over.there  
gbɔ agaga
break quickly
‘A tree as the one you see over there breaks easily.’

Gblem-Poidi (2007) describes this ongoing pattern of change as one of the results
of the process of language attrition that Igo is undergoing under the inﬂuence of
Ewe, the dominant second language of the area, which is genealogically related to
Igo (they both belong to the Kwa family), but lacks grammatical gender.
According to the author, this development needs to be understood in the
context of a highly bilingual society, where grammatical structures in the minority
language (Igo) have begun to align with those of the dominant second language
(Ewe), and particularly so for the younger speakers. The onset of this fast
unfolding change, which goes hand in hand with a series of other innovations
affecting the vowel harmony system and the numeral system of Igo, can be dated
back to the beginning of the twentieth century (Honorine Gblem-Poidi, p.c.).
Similar sociohistorical contingencies are observed in all the other languages of
the sample for which reduction and loss of gender agreement are associated with
a situation of prolonged and intensive contact with dominant genderless lan-
guages (see Table 8.3). The pace at which these developments seem to rise and
spread varies, however, from language to language. In contact situations that
involve not only extensive, long-term bilingualism, but also language attrition,
rates of change are faster. This is, for instance, the case of Igo and Irish, where
reduction of gender agreement morphology is reported to have taken place
within the space of a couple of generations (Gblem-Poidi 2007; Frenda 2011).
In other contexts, diachronic developments fostering the reshufﬂing, reduction,
and, in the most extreme cases, the loss of gender agreement may spread over a
larger time span (as, for instance, in the case of Karleby Swedish), and the onset
of these patterns of language change may even precede the intensiﬁcation of
contact and the establishment of bilingual practices with the dominant gender-
less language (as in the case of Cappadocian Greek and closely related Asia
Minor Greek dialects).
Coming to the emergence of gender agreement patterns via borrowing, in at
least two of the relevant sampled languages (see Table 8.3), the use of these
constructions is reported to be subject to a considerable amount of intraspeaker
variation (Stolz 2012, with respect to Chamorro), and to be avoided in formal
registers (Jose Ignacio Hualde, p.c., with respect to Lekeitio Basque). The marking
of gender distinctions in adjectives borrowed from Spanish (e.g., altu/alta ‘tall.M/tall.F’)
is widely attested across different varieties of Basque spoken by Basque/
Spanish bilinguals, and is reported to be rather frequent in spoken registers.
Lekeitio Basque (a variety spoken 53 kilometres away from Bilbao) is rather
unique among Basque varieties in that verbs derived from Spanish adjectives
through Basque word formation strategies maintain the overt coding of the
masculine/feminine distinction. This is shown in example (15).

(15) Deadjectival verbs indexing natural gender in Lekeitio Basque (Hualde et al. 1994: 109)
a. morenotu = ‘to become tanned (a male)’ derived from moréno ‘dark (male)’
b. morenatu = ‘to become tanned (a female)’ derived from moréna ‘dark (female)’
c. majotu = ‘to become handsome (a male)’ derived from májo ‘handsome (male)’
d. majatu = ‘to become handsome (a female)’ derived from mája ‘handsome (female)’

Nevertheless, although this construction is rather common in informal speech,
Lekeitio Basque speakers tend to avoid it in formal registers (Jose
Ignacio Hualde, p.c.).
Intraspeaker and across-register variation in the use of borrowed patterns of
gender agreement could be explained by the fact that these constructions are
not pervasive enough in the grammar of the recipient languages, and thus not
established in use. On the other hand, the avoidance of these constructions in
formal registers could be interpreted as a means of signalling the speaker’s own
attitude towards the contact language in a specific communicative context (see
section 8.6.3 for further discussion). The issue of degree of productivity across
registers and idiolects is crucial for assessing the extent to which languages with borrowed
gender agreement patterns should be classified as gendered or genderless (see
Stolz 2012, 2015 for similar considerations).

8.6.2 Language contact, language policies, and the expansion of gender agreement

As mentioned in section 8.4, rather language-specific processes of change are
responsible for the expansion of domains of gender agreement in the languages of
the sample. In at least two cases, sociohistorical factors pertaining to demography,
language planning, and standardization appear to be relevant for understanding the
contexts in which gender agreement expansion unfolds. These are the mixed
language Michif and Makanza Lingala (Bantu).
Michif is a French- (Indo-European, Romance) and Cree- (Algic, Algonquian)
based mixed language spoken in Western Canada and adjacent areas in the US,
and whose survival is currently under threat (Bakker 2013: 158). It originated
during the ﬁrst decades of the nineteenth century (Bakker 2013: 158) from the
intermarriage between French Canadian fur traders and Amerindian women,
who, most likely, spoke Cree as a second language and had other indigenous
languages as their mother tongue. Michif has become a textbook case in the
literature on the dynamics of language contact and language evolution due to
the fact that it exhibits a sharp split between the grammar and lexicon of the noun
phrase (which, with the exception of demonstratives, are entirely French-based)
and the grammar and lexicon of the verb phrase (which are entirely Cree-based).
This impacts gender agreement morphology in that, as opposed to many other
contact languages with no grammatical gender or highly reduced gender systems,
Michif has a fully grammaticalized gender system that is more than just a copy of
the French or the Cree system. The gender system of Michif is, in fact, a combined
version of the French and Cree gender systems, with an opposition between
domains of encoding of gender distinctions that runs through the division
between French-based and Cree-based nominal and verbal morphosyntax.
Gender agreement patterns and gender assignment rules within the noun phrase
are sex-based (as in French). Gender agreement patterns and gender assignment
rules on demonstratives and verbs are animacy-based (as in Cree). As a conse-
quence, all Michif nouns have two lexical genders: either masculine or feminine,
and either animate or inanimate.
The gender agreement system of Michif is illustrated in example (16). (16a)
exempliﬁes a noun that is masculine according to the French system and animate
according to the Cree system. (16b) exempliﬁes a noun that is feminine according
to the French system and inanimate according to the Cree system.

(16) Gender agreement in Michif (Bakker 1997: 92)

a. li za:br mIšItI-w
.. tree be.big-3
‘The tree is big.’
b. la bwεt mIša-w
.. box be.big.-3
‘The box is big.’

Example (17) illustrates the two other possible combinations, masculine-inanimate
(17a) and feminine-animate (17b). Unfortunately, it was not possible
to recover examples that would show these combinations across the same set of
agreement targets. While (17a) illustrates agreement on the definite article and the
demonstrative, (17b) exemplifies gender marking on the definite article and the
verb. For a more detailed discussion see Bakker (1997).

(17) Gender agreement in Michif (Bakker 1997: 87)

a. nIja u:ma mũ: papji
1 .. my.. paper
‘This is my paper.’
b. la žyma: ki:aja:w-e:w <img> pči pul<img>
... mare -have-.3!3 ... little foal
‘The mare had a foal.’

As shown in the examples, gender marking in Michif is characterized by
systematic mismatches in the gender values that are syntagmatically encoded by
different agreement targets for one and the same controller noun. As discussed in
section 8.4, according to the complexity metric proposed by Audring (2017),
mismatches of this type, whereby different targets inflect according to different
systems of gender distinctions, entail an increase in the descriptive complexity of
a gender system.
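
The double classification of Michif nouns can be made concrete with a small data-structure sketch. It is an illustration only, under stated assumptions: the noun forms and their gender values are taken from examples (16) and (17), whereas the class MichifNoun, the mapping TARGET_SYSTEM, and the function agreement_value are hypothetical names introduced here, and only the three target types named in the text (definite article, demonstrative, verb) are modelled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MichifNoun:
    """A noun carrying two lexical genders, one per source system."""
    form: str
    gloss: str
    sex_gender: str      # "masculine" or "feminine" (French-based system)
    animacy_gender: str  # "animate" or "inanimate" (Cree-based system)


# Which gender system each agreement target follows, per the description above.
TARGET_SYSTEM = {
    "article": "sex",
    "demonstrative": "animacy",
    "verb": "animacy",
}


def agreement_value(noun: MichifNoun, target: str) -> str:
    """Return the gender value that the given target type agrees with."""
    if TARGET_SYSTEM[target] == "sex":
        return noun.sex_gender
    return noun.animacy_gender


# Nouns from examples (16) and (17), with the gender values stated in the text.
NOUNS = [
    MichifNoun("za:br", "tree", "masculine", "animate"),
    MichifNoun("bwεt", "box", "feminine", "inanimate"),
    MichifNoun("papji", "paper", "masculine", "inanimate"),
    MichifNoun("žyma:", "mare", "feminine", "animate"),
]

for noun in NOUNS:
    values = {target: agreement_value(noun, target) for target in TARGET_SYSTEM}
    print(noun.form, values)
```

The sketch makes the mismatch explicit: for one and the same controller, the article reports a sex-based value while the demonstrative and the verb report an animacy-based one, so no single gender value describes all of a noun's agreement targets.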
Lingala is a Bantu language descendant of the Bangala pidgin, which was
spoken in the Bangala state post on the northwestern banks of the Congo River
and developed from the river trade language Bobangi. Kinshasa Lingala is the variety
of Lingala which is structurally closer to the two ancestor contact languages and
nowadays also the most prestigious variety spoken in the Democratic Republic of
Congo. The gender system of Kinshasa Lingala is rather atypical compared to that of a
‘canonical’ Bantu language. Kinshasa Lingala has only two genders, Animate and
Inanimate; there is no gender agreement within the noun phrase, and only pronouns
and verbs inflect for gender. Nouns are marked by number-sensitive prefixes, the
historical remnants of the original Bantu gender marking system, which is however no
longer productive at the level of agreement marking. Animate and inanimate gender
agreement as marked on subject prefixes is illustrated in examples (18) and (19).

(18) Animate gender agreement in Kinshasa Lingala (Atlantic-Congo, Bantu; Meeuwis 2013: 30)
Mw-ana a-ko-kweya
1-child 3.--fall
‘The child will fall.’

(19) Inanimate gender agreement in Kinshasa Lingala (Atlantic-Congo, Bantu; Bokamba 1977: 188)
a. Mu-nkanda e-ko-kweya
3-book 3.--fall
‘The book will fall.’
b. Ndako mɔkɔ e-ko-kweya
9.house one 3.--fall
‘The house will fall.’

This reduced animacy-based gender system is one of the many structural
legacies of the process of pidginization that is at the origin of Lingala and its
ancestor languages. On the other hand, the Makanza variety of Lingala, traditionally
spoken in the northwestern Congo region, has a gender system that corresponds to the
traditional profile of Bantu gender systems. According to the description by Boeck
(1904), the Makanza Lingala gender system has seven distinguished sets of singular
and plural noun class pairings, and seven distinguished agreement classes. Different
kinds of attributive modiﬁers, relative markers, pronouns, and verbs agree in gender
with nouns. Gender assignment is non-sex-based. The gender system of Makanza
Lingala is illustrated in (20); the same nouns and verb phrases as in (18) and (19) are
used as a means of comparison between the two varieties.

(20) Gender agreement in Makanza Lingala (Atlantic-Congo, Bantu; Bokamba 1977: 187)
a. Mw-ánaa a-ko-kweya
1-child 1--fall
‘The child will fall.’
b. Mu-nkanda mu-ko-kweya
3-book 3--fall
‘The book will fall.’
c. Ndako yɔkɔ e-ko-kweya
9.house one 9--fall
‘The house will fall.’

As illustrated in (20), gender agreement patterns in Makanza Lingala match the
system of gender distinctions overtly coded on nouns and are not based on
animacy. Thus, inanimate nouns taking two different noun class markings (as
in (20b) and (20c)) trigger two different patterns of gender agreement. This
system of gender marking, at first sight conservative, is not the result of a process of
natural language transmission, but rather the product of a massive campaign of
language planning and standardization, which was implemented by the Scheutist
missionaries who arrived at Nouvelle-Anvers (the former Bangala Station) between
1901 and 1902. The missionaries acknowledged the importance of Bangala/
Lingala as the dominant language of communication in the area, but decided to
eliminate what they deemed ‘rudimentary’ and ‘broken’ pidginized structures
(Meeuwis 2013: 25), and to make it look more like a ‘proper’ Bantu language. The
systematic reintegration of long lost gender distinctions and of their correspond-
ing inflections on a variety of word classes, which otherwise no longer inflected for
gender, was part of this standardizing endeavour. This engineered variety of
Lingala did not spread outside the northwestern Congo region, but is still spoken
in the area in the form designed by the missionaries ‘or in forms close to it’
(Meeuwis 2013: 26).
The gender system of Makanza Lingala features higher complexity than
the system of Kinshasa Lingala with respect to at least two of the dimensions
of the complexity metric by Audring (2017). First, gender assignment is
both formal and semantic, as opposed to Kinshasa Lingala where it is purely
semantic (see (9)). Second, gender marking appears in far more domains than
in Kinshasa Lingala, where it is restricted to the pronominal and verbal domain
(see (10)).

8.6.3 The symbolic function of gender agreement morphology

Kusters (2003: 38–9) classifies the functions that language fulfills in its contexts of
use under two main types: the communicative function, which encompasses
instances of language use as a means of depicting states of affairs and communi-
cating them to the hearer as clearly and as efficiently as possible, and the symbolic
function, which encompasses the use of language and language structures as a
means of communicating and reinforcing group identity and speakers’ attitudes.
That speakers may intentionally manipulate language structures and, more
specifically, inflectional morphology to ‘establish, or emphasize, the group’s identity
as distinct from that of other groups’ is also pointed out by Thomason (2015:
38–41), who further substantiates her claim with an example of symbolic use of
grammatical gender marking. In Uisai, a dialect of Buin (South Bougainville),
gender assignment is systematically reversed as compared with the other closely
related dialects: nouns that are masculine in any of the closely related dialects are
always feminine in Uisai, and vice versa. Thomason (2015) interprets this differ-
ence as a remarkable instance of how morphological distancing between closely
related languages can be used as a strategy to construct and index linguistic (and
group) identity.
Within the languages of the sample, there are at least two cases in which gender
agreement morphology in contact situations may be argued to fulfill a symbolic
function in the sense outlined above.
The first case, already discussed in section 8.6.1, is Lekeitio Basque, where the
use of patterns of noun-adjective gender agreement borrowed from Spanish is
widely attested in informal speech, but carefully avoided in formal registers. It
seems plausible to read this tendency as a way in which, in specific
communicative contexts, speakers of Lekeitio Basque index ‘proper’ language use
by purposely distancing themselves from the influence of the contact language,
which is otherwise very pervasive, especially in the speech of bilinguals. It remains
an open question whether this interpretation can be extended to the analysis of
intraspeaker and across-register variation in the other languages of the sample
where borrowed gender agreement patterns are attested.
The second possible instance of symbolic use of gender agreement morphology
in the languages of the sample brings us back to the Northwestern Iranian
languages of the Tatic subgroup, Kelasi and Kafteji, which were briefly discussed
in section 8.4. Kelasi and Kafteji are very close geographic and genealogical
neighbours; they are spoken less than 12 kilometres apart from each other and
are completely mutually intelligible. One of the few striking structural differences
between the two languages is the presence of a gender system in Kafteji¹³ and its
complete absence (due to morphological erosion) in Kelasi. This was illustrated in
examples (2) and (3). Both Kaftejis and Kelasis speak, on average, four languages:
their own native variety plus Rudbari (Northwestern Iranian, Caspian), Gilaki
(Northwestern Iranian, Caspian), Persian (Southwestern Iranian), and in some
cases also Azerbaijani (Turkic) and, possibly, Talyshi (Northwestern Iranian,
Tatic). Interestingly, none of the contact languages has grammatical gender.

¹³ It is worth mentioning that, based on the classification proposed in this chapter, Kafteji counts as
an instance of expansion of gender agreement (and not just retention). In Kafteji, within the domain of
verbal morphology, gender inflections have extended to all singular persons of all past tenses of
intransitive verbs, and all tenses of the verbs for ‘be’ (Stilo 2019: 49–65). Within the sample, similar
developments, which are not generalized to all Northwestern Iranian languages with gender, are also
attested in Eshtehardi.

Even though contact with genderless languages can reasonably account for the
loss of gender in Kelasi, it remains to be explained why loss of gender does not
occur in Kafteji as well. One possibility would be to interpret the retention of gender
agreement in Kafteji as a sort of distancing strategy through which Kafteji speakers
set themselves apart from speakers of all neighbouring genderless languages (Stilo
2019: 75). While this is extremely difficult to prove based on the data at hand,
I would argue that the connection between gender marking and identity marking
should not be a priori ruled out as one of the factors at play in the distribution of loss
and maintenance of gender agreement in Kelasi and Kafteji. Fieldwork data and,
ideally, metalinguistic data from sociolinguistic interviews with speakers of these
languages would be needed in order to investigate this possibility further.
While the two cases briefly discussed in this section do not have any direct
bearing on a theoretical account of morphological and morphosyntactic complexity
in the domain of gender marking, they do provide relevant insights into how
patterns of gender marking can be manipulated by speakers in situations of
intense language contact. How these patterns of use may affect the evolution of
gender marking systems and their transmission over time is an open question,
which cannot be answered here.

8.7 Summary and concluding remarks

In this chapter, I presented the results of a crosslinguistic investigation of the
evolution of gender agreement systems and its ties to sociohistorical factors based
on a sample of thirty-six languages. Gender marking systems are typically highly
grammaticalized patterns of inflectional morphology, whose complexity varies
according to a multifaceted range of dimensions. Diachronic changes in the
domain of gender marking and their relevance to the overall complexity of a
gender system were discussed based on examples from the languages of the
sample. Associations between these patterns of change and a number of socio-
historical factors—ranging from demography to language policies and language
attitudes—were analysed in a qualitative fashion.
To my knowledge, this is the first study to investigate the stability of
gender agreement systems in contact situations by means of a comprehensive
account of multiple patterns of language change. In contrast to previous literature,
where the focus has been restricted to the study of either gender loss (for the most
part) or gender emergence (more rarely), four diachronic developments
(reduction, loss, expansion and emergence of gender agreement) were examined
here. The study uncovered a number of tendencies in the crosslinguistic
distribution of these patterns of change and their sociohistorical ties.
First, it was found that within Eurasia, radical reduction, loss, and emergence of
gender agreement tend to cluster around language family edges, that is, in areas
that are located at the crossroads between families characterized by the presence of
stable gender systems and families characterized by the absence of gender.
Second, both loss and emergence of gender agreement were found to occur in
linguistic areas with long historical records of intense language contact and
bilingual practices between diverse speech communities. However, the data
revealed that, given similar contact scenarios, asymmetries in the structure of
the bilingual population and/or in the prestige dynamics between the languages in
contact tend to favour one development over the other. Loss of gender
agreement tends to prevail under circumstances in which the demographically
dominant and/or more prestigious language lacks grammatical gender. On the
other hand, borrowing of gender agreement patterns may be favoured when the
demographically dominant and/or more prestigious language has grammatical
gender.
Third, and last, the data suggest that gender marking, which has often been
described as a redundant and seemingly afunctional phenomenon in grammar
(Trudgill 1999; McWhorter 2007), may in fact have important ties to the way in
which speakers and speech communities construe their linguistic identity in
opposition to that of their neighbours. This appears to be even more evident
when, as in the case of Makanza Lingala, gender distinctions and gender agreement
patterns that have been lost as a result of natural language evolution are
reintegrated through policies of language planning and standardization.
To better frame the relevance of these results, some words of caution
are also in order. The data presented in this chapter are based on qualitative
observations of a small crosslinguistic sample and no claims are made here on the
quantitative significance of these distributions. Moreover, observed associated
distributions between certain patterns of change in the domain of gender marking
and certain sociohistorical factors are not assumed to necessarily imply causation.
In some contexts—for example, the rise of gender agreement through borrowing
or the expansion of gender agreement through language planning—the causal
connection between grammatical changes and social factors is obvious. In other
cases—for example, the reduction and loss of gender agreement in the context of
highly multilingual linguistic areas or concomitantly with historical changes in the
prestige dynamics between languages in contact—the observed distributions often
result from a fine interplay between language-internal dynamics of change and
aspects of the social history of a given speech community. In such cases, no causal
relation was posited, unless this was explicitly argued for in the sources and by the
experts consulted.
The purpose of this investigation was not to establish systematic causal rela-
tionships between the patterns of language change and the sociohistorical factors
in focus, but rather to carry out an exploratory analysis of possible associations
between the two, as observed through a small crosslinguistic sample and via
qualitative analysis. The tendencies uncovered with this procedure could be
used as hypotheses to be further tested on larger datasets and with the support of
quantitative methods.

Appendix 8.1 Patterns and contexts of change in the languages of the sample

The Appendix lists patterns and contexts of change in the domain of gender agreement
for each of the sampled languages. Languages are grouped based on the macroarea and
genealogical unit they belong to. Genealogical units within each macroarea and
individual languages within each genealogical unit are listed alphabetically.

Language | Glottocode | Pattern of change | Context of change | Source

AFRICA

Bantu (Atlantic-Congo)
Kinshasa Lingala | ling1263 | Reduction. Gender agreement is based on animacy and restricted to anaphoric pronouns and argument markers on verbs. | Kinshasa Lingala is the direct descendant of the Bobangi and Bangala pidgins, which, as typical of contact languages, displayed heavily reduced gender agreement morphology. | Bokamba 1977; Meeuwis 2013
Makanza Lingala | ling1269 | Expansion. Seven non-sex-based agreement patterns have been reintroduced. Gender agreement is extensively marked within the noun phrase, on various types of pronouns, relative constructions and verbs. | The expansion of gender agreement morphology was implemented via language planning. | de Boeck 1904; Bokamba 1977; Meeuwis 2013

Ghana-Togo-Mountain (Atlantic-Congo)
Sεlεε | sele1249 | Retention | NA | Agbetsoamedo 2014
Igo | igoo1238 | Reduction, via the erosion of segmental gender agreement morphology. | Fast-unfolding change, language attrition due to pressure from Ewe. | Gblem-Poidi 2007; p.c.
Ikposo | ikpo1238 | Loss. No traces of gender agreement morphology are left in the language; some remnants of prefixal gender marking on nouns. No information about the diachrony of gender loss. | The language is in close contact with Ewe, and is also a language of wider communication within the Ghana-Togo-Mountain group. This might have played a role in the loss of gender. | Soubrier 2013; Ines Fiedler p.c.

AUSTRALIA

Central Gunwinyguan (Gunwinyguan)
Kunwinjku | gunw1252 | Retention | NA | Evans 2003
Kundjeyhmi | gunw1252 | Reduction. Neutralization of the opposition between Neuter and Vegetable gender (in favour of the Vegetable gender). | The context of change is not clear. | Evans 2003
Kune | gunw1252 | Loss. Complete loss of gender distinctions, with the masculine gender agreement pattern being generalized all over the system. | The context of change is not clear. What is known is that gender loss occurs all across the eastern Central Gunwinyguan varieties (like Kune). | Evans 2003

EURASIA

Balto-Slavic (Indo-European)
Latvian | latv1249 | Retention | NA | Balode & Holvoet 2001; Andra Kalnaca, p.c.
Tamian Latvian | latv1249 | Loss. Loss of gender is the result of morphophonological erosion and agreement redistribution. Masculine agreement patterns take over and gender distinctions are neutralized on a wide range of targets. Variation is found across different Tamian varieties. | The phenomenon is unanimously described as a result of substratum interference from Livonian and Estonian L1 speakers of Latvian as an L2. | Balode & Holvoet 2001; Koptjevskaja-Tamm & Wälchli 2001; Thomason 2015; Wälchli 2017

Basque
Standard Basque | basq1248 | Lack | NA | Hualde & de Urbina 2003; p.c.
Lekeitio Basque (Bisqay) | bisc1236 | Emergence. Feminine and masculine gender inflections are borrowed through the borrowing of nouns and adjectival modifiers from Spanish. | Contact-induced change. The construction is avoided in formal registers. | Hualde et al. 1994; p.c.

Greek (Indo-European)
Standard Modern Greek | mode1248 | Retention | NA |
Pontic Greek | pont1253 | Reduction. Animacy-based semantic agreement spreads: masculine and feminine inanimate nouns trigger neuter agreement on all targets except prenominal definite articles. | Prolonged isolation from mainland Greek varieties; innovation in the gender system precedes intensification of contact with Turkish. | Karatsareas 2009, 2014 (Q)
Rumeic Greek | mari1411 | Reduction. Neuter agreement with inanimate nouns is generalized to all agreement targets; the gender system is completely semanticized: Masculine = male entities; Feminine = female entities; Neuter = Inanimate. | Rumeic is a Pontic Greek variety spoken in Crimea. The gender agreement system of Rumeic can be seen as a direct offspring of the Pontic gender agreement system, where semantic agreement is still syntactically constrained. | Karatsareas 2009, 2014
Cappadocian Greek | capp1239 | Loss. Neuter agreement takes over the gender agreement system, leading to the complete loss of gender distinctions. | The Asia Minor Greek variety with the longest and most intense history of contact with Turkish. Innovations in several morphosyntactic domains along with gender. | Karatsareas 2009, 2014

Insular Celtic (Indo-European)
Irish | iris1253 | Reduction. Semantic agreement spreads in the domain of anaphoric pronouns: masculine pronouns are used for anaphoric reference to grammatically feminine, but semantically inanimate, nouns. | Fast-unfolding change; language attrition under the influence of English; convergence with the English pronominal gender system. | Frenda 2011
Irish (Ros Much) | conn1243 | Retention | NA | Frenda 2011

Khasian (Austroasiatic)
Khasi | khas1269 | Expansion. Gender marking on pronouns, deictic bases, pre-nominal clitics, verbs. | Language-internal development | Anne Daladier, p.c.
Lyngngam | lyng1241 | Retention. Gender marking on personal pronouns and deictic bases only. | NA | Anne Daladier, p.c.
Pnar | pnar1238 | Expansion. Gender marking on personal pronouns, deictic bases, pre-nominal clitics. | Language-internal development | Anne Daladier, p.c.

Lezgic (Nakh-Daghestanian)
Archi | arch1244 | Retention | NA | Michael Daniel, Nina Dobrushina (Q)
Aghul | aghu1253 | Loss. No information about the diachrony of gender loss. | Within Nakh-Daghestanian, gender loss is restricted to the Lezgic branch. Genderless Lezgic languages tend to be neighbours with each other and to share a long-term history of contact with genealogically unrelated languages that also lack gender (Azerbaijani; Georgian) | Nina Dobrushina (Q)
Udi | udii1243 | Loss. No information about the diachrony of gender loss | Udi does not have any genealogically related neighbours, but it is surrounded by languages that lack gender (Azerbaijani and Georgian). | Nichols 2003; Wolfgang Schulze (Q).

North Germanic (Indo-European)
Elfdalian | dic (ISO) | Retention | NA | Åkerberg 2012, Östen Dahl (Q)
Standard Swedish | swed1254 | Reduction. The Masculine/Feminine distinction is neutralized on agreement targets within the noun phrase. The distinction is retained on anaphoric pronouns. | Language internal process of erosion in the domain of nominal morphology. | Duke 2010; Mikael Parkvall (Q)
Karleby Swedish | oste1241 | Reduction. Gender marking has been lost via morphophonological erosion and agreement redistribution. Gender inflections are maintained on definite articles and personal pronouns, but only for nouns denoting human beings. | Gender reduction is described in the literature as a possible result of substratum interference from Finnish L1 speakers who spoke Swedish as an L2. | Hultman 1894; Huldén 1972

Northwestern Iranian (Iranian, Indo-European)
Eshtehardi | esht1238 | Expansion. Grammaticalization of gender inflections on 1st and 2nd person singular forms of copula verbs. | Language internal development. | Stilo 2019; Yarshater 1969
Kafteji | kaba1276 | Expansion. Grammaticalization of gender inflections on copula verbs and past forms of intransitive verbs. | Language internal development. Contact languages are all genderless; possibility that gender marking is perceived as identity marker cannot be ruled out. | Stilo 2019; p.c.
Kelasi | kaba1276 | Loss. Morphophonological erosion of gender marking on nouns and agreement targets. | Highly multilingual community. All contact languages but Kafteji are genderless. Contact-induced change is a possible explanation. | Stilo 2019; p.c.

Thebor (Bodic, Tibeto-Burman)
Shumcho | shum1243 | Emergence. Feminine and masculine gender inflections borrowed through the borrowing of nouns and adjectival modifiers from contact Indo-Aryan languages. | Contact-induced change. Long-term contact, multilingualism with Indo-Aryan languages. | Huber 2011, Christian Huber, p.c.
Jangshung | jang1254 | Emergence. Feminine and masculine gender inflections borrowed through the borrowing of nouns and adjectival modifiers from contact Indo-Aryan languages. | Contact-induced change. Long-term contact, multilingualism with Indo-Aryan languages. | Huber 2011, Christian Huber, p.c.

NORTH AMERICA

Mixed Language
Michif | mich1243 | Expansion. Noun-phrase sex-based and verb-phrase animacy-based gender. | Michif is a mixed language; patterns of gender agreement are a combination of the gender agreement systems of French and Cree. | Bakker 1997

PAPUNESIA

Chamorro (Austronesian)
Chamorro | cham1312 | Emergence. Feminine and masculine gender inflections borrowed through the borrowing of nouns and adjectival modifiers from Spanish. | Contact-induced change. Radical hispanization between early and mid-nineteenth century. | Stolz 2012

Mek (Nuclear Trans New Guinea)
Nalca | nalc1240 | Emergence | Language-internal development. | Wälchli (2018)
Eipo | eipo1242 | Lack | NA | Wälchli (2018)

Acknowledgements

I am very grateful to one anonymous reviewer, Peter Arkadiev, Francesco Gardani, and
Maria Konoshenko for constructive criticism and insightful comments. I am also
thankful to Kaius Sinnemäki for reading and commenting on previous versions
of this chapter. Any remaining mistakes and shortcomings are my sole responsibility.
This work is the outcome of a project on ‘Gender systems, grammatical complexity
and stability: A crosslinguistic study of language pairs’, funded by the Wenner-Gren
Foundations. Later financial support from the Anna Ahlström and Ellen Terserus’
Foundation is also gratefully acknowledged. The data set examined in the chapter is
the same as the one used by Di Garbo & Miestamo (2019).

9 Morphological complexity, autonomy, and areality in western Amazonia

Adam J. R. Tallman and Patience Epps

9.1 Introduction

Amazonian languages have presented us with numerous exceptions to wider
typological and theoretical generalizations. Yet the fact that many such features
appear relatively unexceptional within the Amazonian context itself raises further
questions, particularly regarding how they may be developed and maintained across
space and time. The morphological profiles of the region’s languages—and the ways
in which they may be characterized as ‘complex’—are a case in point.
Payne (1990) was among the first to draw attention to the conceptual challenges
posed by the verbal morphologies of many Amazonian languages. She referred in
particular to the flexible positioning of verbal affixes (with corresponding changes in
scope) and to the repetitions of the same suffix in a given string (where one is typically
more lexicalized with a root and another is acting more productively with respect to
the rest of the string). Payne observed that verb structure in these languages is best
understood in terms of ‘successively embedded layers of morpho-semantic structure’
(1990: 231), and raised an intriguing question: ‘Is morphology in these languages
more akin to syntax in an Indo-European language?’ (1990: 234).
This question, which we take as the focus of this chapter, bears directly on the
idea of morphological autonomy—that morphology is phenomenologically dis-
tinct from syntax (Anderson 1992, 2015a; Aronoff 1994; Booij 1997; Maiden 2005;
Blevins 2016b). As the Introduction to this volume points out, a robust cross-
linguistic evaluation of this assumption is non-trivial, and the arguments for
morphological autonomy themselves depend crucially on criteria that relate to
measures of morphological complexity, such as the existence of allomorphy,
deviations from biuniqueness, and principles of morphological combination
(Aronoff 1994; Booij 1997; Anderson 2015a). Many Amazonian languages
have been described as highly polysynthetic, that is, as having a high morpheme
per word ratio (e.g., Payne 1990; Dixon & Aikhenvald 1999), a feature often
characterized as indicating morphological complexity par excellence. However,
they contrast with typologically more ‘canonical’ cases of polysynthesis, in which
the word-internal organization of morphemes (morphotactics) can be easily
distinguished from the word-external or phrase-level organization of words
(syntax) (e.g., Sapir 1921; Mithun 1996, 1998; Anderson 2015a; Woodbury
2017), as observed by Payne. At the same time, our understanding of morphology
and its potential for autonomy is challenged by a lack of invariant or convergent
criteria for determining word segmentation and status across languages (Russell
1999; Schiering et al. 2010; Haspelmath 2011; van Gijn & Zúñiga 2014; Bickel &
Zúñiga 2017). As noted by Zúñiga (2017: 112), with reference to Boas’s observations
a hundred years before, challenges to a clear morphology-syntax divide are
particularly evident in indigenous languages of the Americas, which ‘suggest that
morpheme types are less clear-cut than ordinarily assumed, and that words are
more elusive than customarily thought.’

Adam J. R. Tallman and Patience Epps, Morphological complexity, autonomy, and areality in western Amazonia
In: The Complexities of Morphology. Edited by: Peter Arkadiev and Francesco Gardani, Oxford University Press (2020).
© Adam J. R. Tallman and Patience Epps.
DOI: 10.1093/oso/9780198861287.003.0009

Proponents of morphological autonomy recognize that some elements may be
only partially integrated into the morphological system (see Maiden 2013;
Blevins 2016b), although their theoretical importance is downplayed compared
to broader patterns of differentiation between morphology and syntax. Yet the fact
that such deviations are possible raises the question as to whether the morpholo-
gies of some languages are more or less autonomous from syntax than others—and
whether large-scale regions may share similar areal profiles in this domain. In this
chapter, we propose that the concept of morphological autonomy may be recon-
ceptualized as a typological index on which languages can vary, analogous to the
traditionally recognized indices of synthesis and fusion (Sapir 1921; Greenberg
1960), and that patterns of crosslinguistic variation in this domain may be areally
driven.
We consider the possibility of regional variation here with reference to western
Amazonian languages. We focus specifically on this region in light of its extensive
linguistic diversity, coupled with recent strides in building up our descriptive
knowledge of its languages, and the fact that the earlier observations regarding
Amazonian morphological profiles cited above made particular reference to this
region. Recent work supports a typological division between eastern and western
South America (Muysken et al. 2014), as suggested earlier by Payne (1990), with a
stronger tendency toward complex verbal morphology in the western zone, as well
as elaboration in particular domains, such as nominal classification and eviden-
tiality (see Epps & Michael 2017 for discussion). However, the broad east-west
division in fact suggests that Andean languages pattern much like those of western
Amazonia in many respects, such that western Amazonian languages are unlikely
to form an exclusive ‘type’ in their own right. We leave the question of defining the
precise geographic range of the phenomena we address here for future work and
more extensive surveys.
Distinctions between morphology and syntax hinge on historical processes
that derive new morphology both from syntax (Givón 1971) and from other
morphology (cf. Anderson 2015a), and which can be subsumed under the label
of grammaticalization. As Anderson (2015a: 24) points out, morphological
complexity, including the features that distinguish morphology from syntax,
accumulates over time. Crucially, there is a gradual component to these processes,
such that their early stages may be characterized by a gray area within which
morphology and syntax are closely associated. It is therefore plausible that where a
language is undergoing relatively rapid and pervasive processes of grammatical-
ization, these would foster fuzziness in the syntax-morphology distinction.
However, the opposite is also true: if a language exhibits very productive processes
of compounding and incorporation, for example, such that the same morphosyn-
tactic slots can be occupied by both open-class and closed-class elements, then this
would facilitate reanalysis and thus a rapid cycle of syntax to morphology.
Likewise, fluidity between morphology and syntax would make it easier for
innovative morphological forms to undergo extension to new morphosyntactic
contexts, thereby delinking from the constructions in which they originated, and
could even make it possible for such forms to maintain more and less tightly
bonded instantiations concurrently over time.
Payne (1990: 218) argues for such a historical interpretation of Amazonia’s
typologically unusual morphological profiles: It is the prevalence of compounding
and incorporation in the region, she proposes, that feeds ‘the gradual nature of the
development of bound derivational morphology from lexically free verb roots’
(1990: 230). Zúñiga (2017: 124) makes a similar observation, referring to mor-
phological forms in languages like Mapudungun (Araucanian) that ‘consist of
root-like elements of verbal origin that come in different forms and show different
degrees of grammaticalization’. Hup (Naduhupan) exemplifies a morphological
template that facilitates a rapid turnover from lexical root to morphological affix:
It combines a highly productive process of verb compounding with a series of
suffix ‘slots’, such that the penultimate slot(s) can be occupied by either verb roots
or suffixes (Epps 2008). Within the compound structure, final verbs develop
auxiliary functions, and from there are easily reanalysed as suffixes. Example (1)
illustrates this process: in constructions like this one, the form -hɔ̃- can be
understood as either the verb ‘make noise’ (a) or the non-visual evidential (b).
While in other constructions ambiguity is prevented by these morphemes’ asso-
ciations with different morphosyntactic slots, the grammaticalization of the evi-
dential from the verb must have arisen in contexts like this one (Epps 2005).

(1) Hup (Naduhupan; Epps 2008: 650)

himǔn=hɔb d’oʔ-d’әh-ʔáy hám, yúp
paxiuba.tree=hollow take-send-. go. that
nɔh-kәd-hi-hɔ̃́-ǎn
fall-pass-descend--
‘Go fetch a paxiuba-tree-hollow, a. that one that just fell, making noise.’
b. . . . that one that just fell (I heard it).’

An understanding of the historical processes evident in western Amazonian
languages must also make reference to language contact. The prevalence of
phenomena like compounding and incorporation that facilitate grammatical-
ization is almost certainly associated with diffusion, and in various cases the
elaboration of particular domains and the development of morphological material
to fill them are also demonstrably contact-driven (see, e.g., Aikhenvald 2002; Epps
2005; Seifart 2011). While our understanding of the effects of contact on
Amazonian languages is still very limited, there is no question that it has played
a significant role (and that much of it has likely involved children as learners,
though the degree to which adult acquisition figured in traditional contexts is not
well understood; cf. McWhorter, Chapter 10, this volume). There is considerable
evidence for diffusion within local zones; the Vaupés region is the best explored of
these, showing radical restructuring of multiple languages’ grammatical systems
(Aikhenvald 2002; Epps 2007a; Stenzel & Gomez-Imbert 2009; Stenzel 2013a;
inter alia), but evidence for contact has also been observed in the Guaporé-
Mamoré (Crevels & van der Voort 2008) and other areas, such as the
neighbouring Guianas (Carlin 2006) and Gran Chaco (Comrie et al. 2010;
Campbell 2012). Interestingly, much of this diffusion has involved the restructur-
ing of grammatical systems, but is accompanied by relatively low levels of direct
lexical borrowing (Epps forthcoming). More widespread diffusion is also evident;
proposals of widely shared typological characteristics have made reference to mor-
phological complexity, among other features, as noted above (Derbyshire 1987;
Derbyshire & Payne 1990; Payne 1990; Dixon & Aikhenvald 1999: 8; Muysken
et al. 2014). Further evidence of large-scale contact and areal effects includes phono-
logical patterns (Michael et al. 2014), widespread Wanderwörter and calques (Epps
2013; Haynie et al. 2014), and discourse practices (Beier et al. 2002).
In this chapter, our investigation of Amazonian morphological patterns in the
context of language contact and change is informed by Anderson’s (2015a)
schematization of morphological complexity, as summarized in Table 9.1.
Anderson defines two major categories, (1) system complexity and (2) complexity
of exponence (cf. Ackerman & Malouf’s (2013) partially comparable distinction
between ‘enumerative complexity’ and ‘integrative complexity’ referenced else-
where in this volume and discussed in the final Chapter 13 by Dahl). Each of these
categories can be evaluated on a number of parameters, and related to the
observations concerning Amazonian languages pointed out above. In the domain
of system complexity, (1a) the number of elements in a given system refers to the
overall number of morphological affixes, etc., in a language, but can also be
understood with reference to a given functional domain, such as the elaborate
classifier inventories mentioned above. Parameter (1b), the number of affixes per
word, relates to the observations concerning Amazonian polysynthesis and
challenges to determining wordhood status. Finally, (1c) principles of morphological
combination has to do with the degree to which morphological templatic structure
is arbitrary and/or distinct from that followed by the language’s syntax, and bears
directly on the observations relating to variable morpheme ordering and scope
brought up by Payne (1990). With respect to complexity of exponence, relevant
parameters include (2a) complexity in realization of individual elements (i.e.,
deviations from biuniqueness, e.g., circumfixes); (2b) complexity in inter-word
relations, involving the number and variety of paradigmatic patterns (relating to
phenomena such as syncretism and defectiveness); and (2c) complexity of
allomorphy. These parameters relate to the observation that Amazonian languages
tend toward a highly agglutinating profile with relatively little fusion (Payne 1990;
Dixon & Aikhenvald 1999).

Table 9.1. Anderson’s (2015a) schematization of morphological complexity

System Complexity                            Exponence Complexity
a. Number of elements in a system            a. Realization of individual elements
b. Number of affixes per word                b. Inter-word relations
c. Principles of morphological combination   c. Allomorphy

In the following discussion, we begin from the perspective of system complexity
by exploring a set of functional domains that tend to display morphological
proliferation across unrelated western Amazonian languages—nominal classifica-
tion, tense, evidentiality, and valence-adjusting (section 9.2). These domains were
chosen in light of their tendency to exhibit relative internal coherence (paradig-
maticity) coupled with elaborate repertoires, and to be reasonably well-described
for many languages of the region.¹ For each of these domains, we investigate how
the number of elements interacts with other measures of morphological complex-
ity and wordhood status, particularly regarding principles of morpheme combin-
ation and bondedness. We observe that high system complexity across these
grammatical domains tends to correspond to morphology-syntax fuzziness (or
analytic indeterminacy), and that in many cases this association can be linked to
historical processes of grammaticalization and language contact. In the following
section 9.3, we move from this relatively anecdotal perspective to a more system-
atic approach, in which we address morphological behaviour over a sample of
eleven western Amazonian languages. Here we focus primarily on exponence
complexity (EC), as realized by three measures: multiple expression (relating to
Anderson’s parameter of complexity in realization of individual elements, focus-
ing on infixes and circumfixes), number of allomorphs, and the presence of
suppletion (relating to complexity of allomorphy). We establish a metric for
gauging relative EC across languages, and compare this to criteria of wordhood
status (bound status, prosodic dependence, and contiguity). The correlation
between these two parameters provides a principled means of assessing relative
morphological autonomy across the languages.

¹ However, we note that other domains are also interesting candidates for such an investigation,
such as associated motion (Guillaume 2016). We hope to address more of these in future work.

Our exploration of these questions leads us to propose that western Amazonian
languages tend to conform to a similar general profile of morphological complexity,
which is closely associated with a fuzzy distinction between morphology and
syntax. This morphology-syntax indeterminacy can itself be argued to both result
from and facilitate processes of grammaticalization in these languages. These
processes in turn are associated with language contact, which may both directly
impel the restructuring of particular grammatical systems, and shape broader
areal typological preferences for productive compounding and incorporation,
which feed the rapid generation of new morphological material.

9.2 System complexity in western Amazonia

In this section, we investigate four grammatical domains in which extensive
morphological proliferation can be observed across unrelated languages, such
that the category is composed of a large or even open set of coded distinctions.
For each of the domains we consider here, namely nominal classification (section
9.2.1), tense (section 9.2.2), evidentiality (section 9.2.3), and valence-adjustment
(section 9.2.4), we also consider the degree to which they exhibit a fuzzy
morphology-syntax distinction, with a particular focus on principles of morpheme
combination and their relationship to wordhood status. As noted above, we take the
perspective that a relatively loose division between morphology and syntax may be
associated with the elaboration of morphological inventories in two ways: by
facilitating the grammaticalization of new elements on the one hand, and by
reflecting the ‘youth’ of those forms on the other in their relative regularity and
transparent association with lexical elements. Accordingly, we consider evidence
for the elaboration of each of these domains via grammaticalization, and the role
of contact as a potential driver of these processes. We note that our discussion in
this section is relatively anecdotal, and is guided largely by the availability of in-
depth descriptive and historical information regarding these languages, which is
still relatively limited across the region. A more systematic perspective is provided
in section 9.3.

9.2.1 Nominal classification

As we observe in section 9.1, nominal classification systems in western
Amazonian languages exhibit a high degree of morphological elaboration.
Characteristics associated with this complexity give these systems a distinct
typological profile in comparison to systems encountered elsewhere in the world,
and challenge their identification as either canonical noun class systems (i.e., more
morphological) or noun classifier systems (i.e., more lexico-syntactic; Payne 1987,
Derbyshire & Payne 1990, Aikhenvald 2000, Grinevald & Seifart 2004, Seifart &
Payne 2007, Krasnoukhova 2012).
A noteworthy feature of western Amazonian classifier systems is their tendency
toward very large or open sets of forms, as can be seen in Tatuyo:²

(2) Tatuyo (East Tukanoan; Gomez-Imbert 2007: 409)

-ro/to/~do  ‘general’           -~páí   ‘blade’
-ro/to~do   ‘oblong, concave’   -kaa    ‘row’
-ɨ/kɨ       ‘cylindrical’       -~toó   ‘bunch’
-a/ka       ‘rounded’           -bári   ‘parallel rows’
-wɨ         ‘tubular’           -rape   ‘cylindrical container’
-wo         ‘heap’              -rɨkɨ   ‘cutting’
-~we        ‘filiform’          -ope    ‘hole’
-~wa        ‘path’              -wíi    ‘house’
-ja         ‘river’             -peta   ‘port’
-~jo        ‘palm’              -wéhe   ‘manioc garden’
-rɨ         ‘pot’               etc. . . .

As is typical for such large systems in western Amazonian languages, some
classifiers in the Tatuyo inventory are formally identical to semantically parallel
lexical items (e.g., -wíi ‘:house’ and wíi ‘house’; labeled ‘repeaters’ by, e.g.,
Aikhenvald 2000) while others are semantically more abstract and formally less
clearly associated with a putative lexical source (e.g., -wɨ ‘tubular’, cf. pií ‘basket’).
As Gomez-Imbert points out, these differences point to a system whose members
have entered it at different times and have accordingly undergone different
degrees of grammaticalization. Such languages can be understood as having a
constructional classifier ‘slot’ that facilitates the absorption of new nouns into the
classifier inventory.

² The preceding tilde in the Tatuyo forms indicates morpheme-level nasalization.

A further characteristic of classifiers in these languages is their tendency to
perform both derivational functions (as in what are understood as more
typologically canonical classifier systems) and agreement functions (as in canonical
noun class/gender systems), thus making their assignment to one or the other
typological category problematic (see, e.g., Grinevald & Seifart 2004). Where
agreement functions are at play, ‘repeater’ (or otherwise more lexically
transparent) classifiers are typically less likely to co-occur with the corresponding lexical
noun than are more distinct and semantically abstract classifiers. This tendency
can be seen in languages like Movima (see also, e.g., Seifart 2005: 94 for Bora),
although the overt occurrence of the classifier is possible in both contexts, as
illustrated in (3a) (with the repeater classifier ‘animal’) and (3b) (with the more
abstract classifier ‘round’):

(3) Movima (Isolate; Haude 2006: 210–11)

a. chammo-maj-poy di’ po~poy-kwa
bush--animal  ~animal-
‘Animals which (are) forest animals.’
b. jayna don-ba:-ni as maropa
 rot-:- . papaya
‘The papaya is already rotten.’

Another typologically noteworthy feature of western Amazonian classifier
systems is their capacity for flexible assignment, such that the same set of
classifiers can occur in multiple morphosyntactic environments (e.g., with
numerals, demonstratives, verbs, etc.).³ Such systems have been termed ‘multiple’
or ‘multifunctional’ classifier systems (by Aikhenvald 2000 and Krasnoukhova
2012, respectively; see also Payne 1987). Example (4) from Tariana illustrates the
multifunctional potential of such a system:

(4) Tariana (Arawakan; Aikhenvald 2000: 204)

ha-dapana pa-dapana na-tape-dapana
.-: one-: 3-medicine-:
na-ya-dapana hanu-dapana heku
3--: big-: wood
na-ni-ni-dapana-mahka
3-make--:-..
‘This one big hospital of theirs has been made of wood’

³ We note that our use of the term ‘flexible assignment’ in this context should not be confused with
an alternative usage found in the literature on grammatical gender, referring to the possible assignment
of more than one gender to a given noun in order to convey different construals of the referent; for
example, feminine gender with inherently masculine inanimate nouns in Berber communicates a
diminutive meaning. Thanks to Francesca di Garbo for bringing this point to our attention.

The features of western Amazonian classification systems reviewed here can all be
understood as outcomes of a relatively porous boundary between morphology
and syntax, and highlight its relationship to processes of grammaticalization. The
existence of a large or open class of elements and flexible class assignment
are relatively syntactic characteristics, while the agreement function and the
presence of semantically abstract, phonologically reduced elements within the
inventory are more morphological. The phenomenon of ‘repeater’ classifiers, by
which virtually any noun may fill a classifier slot, facilitates the elaboration of the
system by easily sucking new elements into the classifier inventory. Once within
the inventory, they begin to undergo grammaticalization, and over time become
more morphological in their form and behaviour. Similarly, the close association
between the derivational function of classifiers and noun compounding
(particularly involving part-whole relationships like ‘banana-leaf’) provides a route for
classifier systems to emerge via the grammaticalization of generic ‘bound nouns’,
as argued by Epps (2007b) for Hup (Naduhupan) and Payne (2007) for Yagua
(Peba-Yaguan; see also Facundes 2000 for Apurinã (Arawakan) and Ospina 2002
for Yuhup (Naduhupan)).
These processes of grammaticalization are highly sensitive to contact. As
Aikhenvald (2000: 383) points out, ‘the more lexico-syntactic the noun categor-
ization is, the easier it is to diffuse’. The development of a classifier system from
noun compounding in Hup and Yagua is attributed in both cases to contact, from
Tukanoan and Boran languages respectively (Epps 2007b; Payne 2007). That
diffusion has been widespread is strongly suggested by the similarities across
western Amazonian classifier systems, and their contrast to canonical systems
elsewhere in the world. In the northwest area, Seifart & Payne (2007: 384–5) note
the ‘close correspondences of—sometimes very specific—nominal classification
structures across Tucanoan, Witotoan, Peba-Yaguan, and some Arawak
languages . . . [pointing to] widespread processes of areal diffusion,’ and van der
Voort (2005) makes a similar observation for the languages of the Guaporé-
Mamoré region in the southwest (see also Krasnoukhova 2012: 263). Similar
trends can be observed beyond the western Amazonian region as well. For
example, the occurrence of possessive classifiers (and particularly their association
with domesticated animals)—otherwise relatively rare in the Americas—is iden-
tified as an areal feature of the Chaco region (Comrie et al. 2010; Campbell 2012;
Ciucci 2014), and Aikhenvald (2000: 383, citing Aikhenvald & Green 1998)
likewise observes the diffusion of possessed classifier constructions in the north-
eastern part of South America, from Cariban into North Arawakan languages. We
may also compare the contact-driven emergence and loss of gender systems in
languages in Eurasia, as explored by Di Garbo (Chapter 8, this volume).
Further evidence of contact in classifier systems involves the more fine-
grained restructuring of existing systems to affect specific semantic values and
morphosyntactic patterns. In the Vaupés region, for example, Gomez-Imbert
(1996) demonstrates how Baniwa (Arawakan) influence has caused Cubeo (East
Tukanoan) to change its strategies for classifying animate entities, from prioritiz-
ing animacy (the Tukanoan pattern) to shape (the Arawakan pattern), whereas
Aikhenvald (2002) reports that Tariana (Arawakan) has experienced exactly the
opposite change under the influence of Tukano (East Tukanoan).
Finally, while the examples above have all dealt with the contact-driven restruc-
turing of native material to fit system-level templates, western Amazonian classifier
systems also provide evidence of direct borrowing of classifier forms. This
phenomenon is particularly striking in light of the general constraints against the
borrowing of form in the region, as mentioned in section 9.1 above; classifiers seem
to behave somewhat exceptionally in this context. Particularly noteworthy examples
include the loan of an extensive set of classifier etyma from Bora (Boran) into
Resígaro (Arawakan; Seifart 2011), the congruence of classifier forms among
apparently unrelated languages of the Guaporé-Mamoré region (van der Voort 2005;
see Table 9.2), and apparent Arawakan and East Tukanoan classifier forms in Kakua
(Kakua-Nukakan). A similar example can be seen not far away in the Chaco region,
where Pilagá (Guaycuruan) has borrowed classifier forms from Wichí (Matacoan)
(Comrie et al. 2010: 112). Motivation for the direct borrowing of classifier forms is
not clear, but may have to do with their heavy role in reference-tracking in
Amazonian languages, and the proneness of such elements of discourse
organization to borrowing crosslinguistically (see Matras 1998; Seifart 2011).

Table 9.2. Similar classifier forms in Guaporé-Mamoré languages (van der Voort 2005: 397)

‘bark’ ‘fruit’ ‘bone’ ‘tooth’ ‘liquid’ ‘round’ ‘thorn’ ‘porridge’ ‘powder’

Kwaza (isolate) -kalo -ko -su -mãi -mũ -tɛ -nĩ -mɛ̃ -nũ
Kanoê (isolate) -ko -mũ -tæ -nũ
Aikanã (isolate) -zu -mũj -mũ -ðãw -nũ
Arikapú (Macro-Jê) -nĩ -mrɛ̃ -nũ
Nambikwara (Namb.) -kalo -su³ -nũx³

9.2.2 Tense

Like nominal classification systems, tense-marking in western Amazonian
languages also exhibits a considerable degree of elaboration. Of the sixty-three
languages considered by Müller (2013), most distinguish three to four grammat-
ical tenses, and about half encode two remoteness distinctions in the past and at
least one in the future (Müller 2013: 68–9). As Müller (2013: 61) observes, this
proliferation of remoteness distinctions in the languages of this region is typolo-
gically extreme. In Dahl & Velupillai’s (2011) global sample of 222 languages, only
forty exhibit graded remoteness, and these cluster heavily in South America (as
well as in Papua New Guinea); of these, the two most extreme cases (with four or
more remoteness degrees) are likewise Amazonian (Chácobo (Panoan) and Yagua
(Peba-Yaguan)).
Shipibo-Konibo, like other Panoan languages, offers an example of an elaborate
inventory of remoteness degrees within its tense system, with seven distinctions:

(5) Shipibo-Konibo (Panoan; Valenzuela 2003: 284–5)

a. -yat~-ya ‘tomorrow’
-wan ‘early tomorrow’
-ibat~-iba ‘yesterday, a few days ago’
-yantan ‘two months ago’
-rabe ‘nine months to three years ago’
-kati(t) ‘distant, many years ago’
-ni ‘remote past’
b. moa-tian-ronki i-pao-ni-ke jema-bo yama
already-- do.I--- village-: exist.not
‘It is said that in remote times there were no villages.’

Alongside the tendency to proliferation, tense markers often display low select-
ivity and/or bondedness with respect to a host. In Hup, for example, the remote
and proximate tense markers appear only occasionally, and highlight a contrast
between the event time and the reference and/or utterance time. They also are
phonologically free elements in the clause, and while they usually follow the verb
complex, they may also associate with other constituents, and may even act as
a demonstrative element and head a noun phrase:

(6) Hup (Naduhupan; Epps 2008: 601)

pahá-wәd-ǎn n’ǔh, páy=pog páh yú-wәd-ә́h,
..--  bad=big . that--
‘húptok ‘ectragá-áy’ yúw-úh, ʔacúka-áh,’
manioc.beer go.bad- that- sugar-
nɔ́-ɔ̃́y páh yú-wәd-ә́h
say- . that--
‘As for that old guy [who I encountered] just now, he was such a jerk just
now, “sugar makes manioc beer bad,” he just said, that old guy.’

Similarly, while graded tense forms in Chácobo are normally bound, they project
their own prosodic word when not adjacent to a verb form (example (7)). Other
tense markers pattern more ostensibly with auxiliaries; some do not even need a head
verb in the same clause, such as the remote future auxiliary/enclitic (example (8)).

(7) Chácobo (Panoan; Tallman 2018)

a. kako sani=ʔi (ka=ʔitá=kɨ)Pwd
Caco fish= go=.=:
‘Caco went fishing [yesterday or two days prior]’
b. sani=ʔi (kaa)Pwd kako (=ʔitá=kɨ)Pwd
fish= go Caco =.=:
‘Caco went fishing [yesterday or two days prior].’

(8) Chácobo (Panoan; Tallman 2018)

yoʂa pi=no=tsi noʔó yonatí ʂ-í=ki
woman =while=2 1: helper .=:
‘If she is a woman she will be my helper.’

Such low selectivity is in keeping with an indeterminate morphology-syntax
distinction, which may in turn be associated—at least in some cases—with
processes of grammaticalization and diffusion. Hup’s graded tense distinctions
are almost certainly recently innovated, propelled by contact with Tukanoan
languages; while the proximate form páh can be reconstructed to the proto-
language as a general past marker, the distant past form j’ãh́ has a probable source
in the adverb ‘yesterday (or before)’ (see Epps 2007a, 2008). Tariana (Arawakan)
has likewise developed remote/distant past and future tense distinctions via
contact with Tukanoan (Aikhenvald 2002: 121). Additional examples can be
seen in the southwestern languages Aikanã (isolate) and Wari’ (Chapakuran), in
which the future tense has its source in a quotative construction (also evident in
the second-language Portuguese of Aikanã speakers; van der Voort 2016). Müller
(2013: 61) also observes that the elaborate remoteness degree system in Kokama-
Kokamilla (Tupi-Guaranian) resembles that of its Panoan neighbours, suggesting
diffusion. Finally, Müller (2013: 83–5) notes that tense markers tend to be very
divergent across the related languages in her sample (e.g., the Arawakan forms in
example (9)), pointing to extensive innovation. While the extent to which this
innovation might be due to diffusion is unclear, such heterogeneity is particularly
notable for past tense forms, which are observed by Wichmann & Holman (2009)
to be relatively stable crosslinguistically.

(9) Tense markers in Arawakan languages (Müller 2013: 75)

Baure      Tenseless
Apurinã    Suffix -ko ‘future’
Paresi     Enclitic -ite/te ‘future’; enclitic -ene/ne ‘past’
Tariana    Fused paradigm of present, past and evidentiality;
           suffixes -mhade and -de ‘future’
Yanesha’   Enclitic =cha’ ‘future’; immediate auxiliary o’ch ‘future’

9.2.3 Evidentiality

Evidentiality, here understood as the grammaticalized encoding of information
source, has been observed to be characteristic of the Americas, and particularly of
western Amazonia. Within South America, evidential marking was attested in
68% of the languages surveyed by de Haan (2013), with 27% having both direct
and indirect evidentials, in comparison to the worldwide figures of 57% and 17%,
respectively (see Müller 2013).
Evidential systems in western Amazonian languages are also among the most
elaborate in the world with respect to the number of distinct categories encoded.
Languages of the Vaupés region, for example, tend to exhibit as many as five or even
six categories, generally along the lines of visual, non-visual, inferred, assumed, and
reported (Malone 1988; Aikhenvald 2000; Epps 2005; Stenzel 2008; Silva 2012), as
illustrated by Hup in example (10). Other western Amazonian languages with
complex inventories include Nambikwara (Nambikwaran; e.g., Lowe 1999) and
Shipibo-Konibo (Panoan, Valenzuela 2003: 35–7; see also Aikhenvald 2004),
while still others have complex systems in which the distinction between evidenti-
ality and modality is not fully clear (Karo (Tupian), Gabas Jr. 1999; Andoke
(isolate), Landaburu 2005).

(10) Hup (Naduhupan, Epps 2008)

a. nácia pǽ-ǽy
boat go.upriver-
‘The boat is going upriver’ (default interpretation: I see it)
b. nácia pǽ-ǽy=hɔ̃
boat go.upriver-=-
‘The boat is going upriver’ (I hear it)
c. nácia pǽ-ǽy=cud
boat go.upriver-=
‘The boat is going upriver’ (I infer it from visual evidence, e.g., a wave
on the beach)
d. nácia pǽ-ǽy=mah
boat go.upriver--
‘The boat is going upriver’ (someone told me)
e. nácia pǽ-ní-h
boat go.upriver--
‘The boat is going upriver’ (I infer/assume)

Western Amazonian systems also frequently exhibit an association between
evidentiality and tense, sometimes encoded via portmanteau morphs. Matses offers a
typologically extreme example of joint elaboration in these domains (Table 9.3).⁴
While in some languages (including Matses and members of the Tukanoan
family) evidentiality and tense co-occur in obligatory suffixes, in others
evidentiality exhibits more syntax-like behaviour. In some cases this behaviour can be
linked to recent innovation—though notably this does not always involve a one-
way association between morphological bondedness and grammaticalization.

⁴ We note that some of these forms appear to be internally analysable, so not all are clearly
portmanteau.

Table 9.3. Evidentiality and tense in Matses (Panoan; Fleck 2007: 593)

Suffix       Gloss                           Temporal reference

Recent past:
-o           ‘recent past experiential’      immediate past to about 1 month ago
-ak          ‘recent past inferential’       immediate past to about 1 month ago
-aşh         ‘recent past conjecture’        immediate past to about 1 month ago
Distant past:
-onda        ‘distant past experiential’     about 1 month ago to about 50 years ago
-nëdak       ‘distant past inferential’      about 1 month ago to speaker’s infancy
-nëdaşh      ‘distant past conjecture’       about 1 month ago to no upper bound
Remote past:
-denne       ‘remote past experiential’      about 50 years ago to max. human life span
-ampik       ‘remote past inferential’       before speaker’s infancy
-nëdampik    ‘remote past conjecture’        before speaker’s infancy

In Hup, for example, the non-visual and inferential evidentials may appear as
either suffixes or enclitics (compare examples (1) and (10b) above). Interestingly,
however, it is the enclitic form that appears to have progressed the farthest along a
trajectory of grammaticalization. As noted above, the suffix forms of both eviden-
tials originated in a morphosyntactic slot that can be occupied by both com-
pounded roots and suffixes, and thus facilitates the reanalysis of the former to the
latter. Then, as argued in Epps (2005), the suffix moved out of the verb complex
via extension to non-verbal predicates and scopal widening, where it developed an
enclitic form that was able to reassociate with verbs in certain contexts. At this
point, the evidential had become fully distinct from the verb root, except in the
contexts where it still exhibited its earlier suffixed form.
In Nanti, on the other hand, the syntax-like behaviour of the reportive eviden-
tial probably does reflect its intermediate grammaticalization from verb to clause-
initial clitic. As described by Michael (2008), the Nanti reportive ke is transpar-
ently related to the verb root kem ‘hear’ and—like other Nanti verbs—is inflected
for person:

(11) Nanti (Arawakan; Michael 2008: 324)

no-ke i=ket-be-ak-a kemari.
1- 3S=pierce---. tapir
‘He wounded (that is, shot without killing) a tapir.’ (reportive)

As with tense, the high heterogeneity seen in evidential forms among related
Amazonian languages suggests recurrent innovation. Aikhenvald’s (2004: 275–84)
discussion of evidential development in a number of Amazonian languages
includes such sources as verbs (e.g., ‘seem’ > inference in Jarawara (Arawan);
‘hear, feel’ > non-visual in Tariana (Arawakan)), nouns (e.g., ‘noise’ > reportive in
Xamatauteri Yanomami (Yanomaman), possibly via noun incorporation), and
other morphology (e.g., declarative–indicative marker > direct evidential in
Shipibo-Konibo (Panoan); past tense markers > reportive/attested in Kamayurá
(Tupi-Guaranian)).
Whether or not contact is responsible for such innovations is often unclear; in
some cases, such as Nanti (Michael 2008), emergent evidential systems do not
appear to be directly contact-driven. However, Müller (2013: 227) observes the
regional clustering of Amazonian languages exhibiting evidentiality, as for
example in the Guaporé-Mamoré (Crevels & van der Voort 2008) and the
Vaupés regions, and evidentiality does appear to be relatively prone to diffusion
crosslinguistically (see, e.g., Aikhenvald 2004: 21). Surveys of Amazonian eviden-
tiality (Aikhenvald & Dixon 1998; Aikhenvald 2004: 292; Müller 2013: 228)
suggest multiple points of independent innovation, from which the phenomenon
has likely diffused more widely. Probably the clearest examples of contact-driven
elaboration of evidential systems come from the Vaupés, in which a number of
unrelated languages have undergone the grammaticalization of native forms to fill
a regionally defined set of categories; this is the case for Hup (see above), Tariana
(Arawakan, see Aikhenvald 2002: 117–29), and Kakua (Kakua-Nukakan; Bolaños
2016), among other languages.

9.2.4 Valence-adjusting

Complex valence-adjusting systems have been noted in Amazonian languages,
especially those of the western sub-Andean area (Wise 1990, 2002). Birchall
(2014) found that more than 50% of the South American languages in his sample
had morphological applicatives, and that these are concentrated in the west, where
some languages show particularly elaborate inventories. Also relevant is
Guillaume & Rose’s (2010) observation that a large number of Amazonian
languages exhibit a dedicated ‘sociative causative’, which specifies that the causer
participates in the action along with the causee, in addition to resources for
expressing more neutral causation. They propose that the sociative causative
may be an Amazonian areal feature in light of its apparent rarity elsewhere in
the world, and observe a historical relationship between the sociative causative
and applicative constructions.
Elaborate valence-adjusting morphology is especially evident in the sub-
Andean Arawakan languages, which stand out as having among ‘the most highly
developed systems of morphologically distinct applicative operations on earth’
(T. Payne 1997: 190, cited in Wise 2002: 335; see also Wise 1990; Danielsen 2007;
Valenzuela 2010). Such a system can be seen in Nomatsigenga:

(12) Nomatsigenga (Arawakan; Wise 1971, 2002)

a. -oko ‘with reference to’
-bi / -birí ‘because, for, why, because of’
-así ‘purposive [action done with some purpose in view], for’
-pí ‘with respect to, in relation to’
-an / -ant ‘instrumental’
-ben / -bin ‘for, benefactive’
-té ‘towards, against’
-ak / -akag ‘comitative/sociative causative’
b. i-samë-ko-k-e-ro i-gisere
3-sleep----3 3-comb
‘He went to sleep with reference to his comb.’
(e.g., he was making it and dropped it) (Wise 2002: 336)

As observed for the other grammatical domains discussed above, Amazonian
valence-adjusting systems often display a highly porous boundary between
morphology and syntax. In particular, valence-adjusting mechanisms in these
languages are often transparently derived from, or difficult to distinguish from,
incorporation (of postpositions or nouns). In Paresi (Arawakan), for example, at least
half a dozen different postpositions can be incorporated with valence-adjusting or
argument-rearranging functions (Brandão 2014: 276). A particularly interesting
case is the form kakoa, which Brandão (2014: 256–9) analyses as a reciprocal suffix
when it occurs inside the verb word, and as a comitative postposition when it is
juxtaposed to the right of a noun phrase. Both are fully productive, and both
moreover can co-occur in reciprocal constructions, in which the comitative
expresses one of the arguments involved in the reciprocal event:

(13) Paresi (Arawakan; Brandão 2014: 259)

wakoakare=kakoa Ø=aitsa-kakoa-ha minita hoka
Indian= 3=kill-- always 
kazaihera-ty-oa-heta
be.invisible?---
‘They were always fighting with each other, with the Nambikwara, and he
became invisible.’

Interestingly, the indeterminacy demonstrated by kakoa (which could be
regarded as one morpheme with low selectivity or two morphemes, one syntactically
and another morphotactically placed) does not appear to be due to recent
grammaticalization in Paresi. Wise (1990) reconstructs the form *khakh ‘recipro-
cal’ to Proto-Arawakan, but notes that both reciprocal and comitative functions
are widespread, and that reﬂexes of *khakh appear in both postpositional phrases
and in verb phrases in languages representing diverse branches of the family.
While she suggests that the form originated as a postposition on noun phrases and
entered the verb word via incorporation, she observes that a shift from reciprocal
to comitative function also appears to have occurred in some languages. It seems
likely that the indeterminacy exhibited by Paresi kakoa can be reconstructed to
Proto-Arawakan itself.
In verb-ﬁnal languages, the subtlety of the distinction between incorporation
and pre-verbal object placement can blur the syntax-morphology divide even
further. In Hup, for example, the ‘interactional’ (reciprocal) verbal preﬁx ʔũh-,
which originates in the incorporation of the noun ‘sibling’, can occur as a phono-
logically free element with an intervening object argument (see Epps 2008, 2010):

(14) Hup (Naduhupan; Epps 2008: 488)

hɨd ʔũ̌h nam nɔ́ʔ-ɔ́y
3  poison give-
‘They give poison to each other.’

The diachrony of valence-adjusting systems has in general not been widely
explored, either within Amazonia or beyond (see Haspelmath & Müller-Bardey
2004). However, as with the other domains considered here, the elaborate inven-
tories in sub-Andean Amazonia suggest an areal component. It is tempting to
speculate that the complex systems of applicatives in these languages (many of
which appear to originate in the incorporation of postpositions and other
elements) might represent the intersection of the complex verb morphology and
incorporating tendencies of western Amazonian languages with the proliﬁc case-
marking tendencies of Andean languages. Wise (2002: 341) also points out a
number of similar applicative and causative forms in unrelated sub-Andean
languages (e.g., Chayahuita (Cahuapanan) -të/-ta, Arabela (Zaparoan) -ta/-tia,
and Yagua (Peba-Yaguan) -ta/-tya), and van der Voort (2005: 400) observes
similar widespread forms in Guaporé-Mamoré languages (e.g., Kanoe (isolate)
-ta-/-to-, Kwaza (isolate) -ta-/-tia-, and Karo (Tupian) -ta-; see also Crevels & van
der Voort 2008: 167). Although these forms are very short, at least some of these
similarities may be due to direct borrowing. Otherwise, clear evidence for diffu-
sion in the grammaticalization of valence-adjusting morphology comes once
again from the detailed studies of contact in Vaupés languages Hup (Epps
2007a, 2010) and Tariana (Aikhenvald 2002: 113–16).

9.2.5 Summary

The studies we have reviewed thus far suggest that Amazonian languages tend to
display a high degree of morphological elaboration in particular grammatical
domains, and that many of these proliﬁc domains show evidence of restructuring
and diffusion across unrelated languages in particular geographic regions.
Moreover, in case after case, there is analytic indeterminacy between a morpho-
logical and a syntactic treatment of such elements. Often this indeterminacy can
be linked to grammaticalization—either as an outcome of a relatively recent
change from syntax to morphology or as a facilitator of developments in which
innovative morphological forms are delinked from the constructions in which
they originated, and extended to new morphosyntactic contexts. Notably, these
processes appear to involve movements toward both tighter and looser bonding of
morphological forms, rather than a more consistently one-way trajectory toward
afﬁxation, and in some cases freer and more bound instantiations of the same
morpheme appear to co-exist in a relatively stable fashion.
The ubiquity of such cases in the Amazonian context means that they cannot be
treated as categorically different from ‘normal’ cases. While comparable phenom-
ena are mentioned in theories that advocate or presuppose morphological auton-
omy, they are considered in such discussions to be unusual and of marginal
importance (e.g., Blevins 2006: 555). However, western Amazonian languages
suggest that at least in some regions of the world they may be the norm rather
than the exception, and that an index of the degree of autonomy from syntax
should be incorporated into the study of the complexity of morphological systems.
That said, the cases reviewed thus far provide only anecdotal evidence for this
perspective, focusing on individual elements within particular languages. In what
follows, we engage with the issue of morphological autonomy on a more global
level, addressing the broader morphological proﬁles within a sample of languages.

9.3 Exponence complexity and morphological autonomy

This section takes up the relationship between EC and the morphology-syntax
divide empirically in western Amazonian languages. We develop and demonstrate
a methodology that provides more globally oriented metrics of morphological
autonomy, focusing primarily on Anderson’s second category of morphological
complexity, ‘exponence complexity’ (see section 9.1). EC is a key element of the
distinction between morphology and syntax: Advocates of morphological auton-
omy maintain that while complex deviations from biuniqueness (allomorphy,
multiple exponence, morphomic structure, etc.) apply in the form-meaning map-
pings in morphology, these are rare or even absent at the syntactic level (Booij
1997, Anderson 2015a, Blevins 2016b; cf. Haspelmath 2011). Although the inves-
tigation is necessarily preliminary at this stage, we argue that the results lend
support to the view that low morphological autonomy is a robust feature of
languages in the western Amazon region.
Our approach can be summarized as follows. If EC is associated with morph-
ology, and morphology is concerned with the structure of words as at least
partially autonomous systems, then we should expect EC to correlate with other
criterial properties of words or parts of words, such as bound status, prosodic
dependence, and contiguity. We take the position that the strength and signiﬁ-
cance of these correlations can be used to assess the degree of discreteness between
morphology and syntax in a given language. Below, we show that the correlations
between EC and criterial wordhood properties are low and usually non-signiﬁcant
in the western Amazonian languages we sampled. This observation further sup-
ports our argument, presented anecdotally in section 9.2 above, that the languages
of this region tend to display a low degree of morphological autonomy.
The following sections provide a description of the languages considered in this
study (section 9.3.1), an overview of the properties of EC considered and a
statistical summary of its realization across these languages (section 9.3.2), and a
discussion of the correlations between EC and other wordhood criterial properties
(section 9.3.3).

9.3.1 Languages considered

Our sample consists of eleven western Amazonian languages from nine language
families (see Figure 9.1): Cavineña (Tacanan; Guillaume 2008), Chácobo (Panoan;
Tallman 2018), Hup (Naduhupan; Epps 2008), Jarawara (Arawan; Dixon 2004),
Kokama-Kokamilla (Tupi-Guaranian; Vallejos 2010), Kotiria (Tukanoan; Stenzel
2013b), Movima (isolate; Haude 2006), Paresi (Arawakan; Brandão 2014),
Ashéninka Perené (Arawakan; Mihas 2015), Tariana (Arawakan; Aikhenvald
2003b), and Urarina (isolate; Olawsky 2006). The three Arawakan languages
represent distinct branches of this family. The eleven languages are distributed
widely across western Amazonia, although some (in particular Hup, Kotiria, and
Tariana) are not geographically independent. We have focused on languages with
descriptions that are detailed enough for us to code wordhood properties and
properties of EC for a range of morphemes.
The concept of morphological autonomy developed in this chapter is a relative
one, which we quantify as an index that can vary from language to language.
Accordingly, we need a baseline for assessing how this index ranks in comparative
perspective. While this is a large-scale typological problem, we take a preliminary
step by comparing the Amazonian languages in our sample to Central Alaskan
Yup’ik (CAY; Eskimo-Aleut family). There are three reasons for choosing CAY as
a point of comparison: (i) it is a well-described language with a relatively com-
prehensive grammar and an extensive literature on its morphological and syntac-
tic structure; (ii) it is comparable to Amazonian languages in displaying a high
degree of system complexity in its morphology (i.e., it is a polysynthetic language);
and (iii) it diverges from Amazonian languages in that its morphological and
syntactic structures have been described as easily distinguishable from one
another on both syntagmatic and morphophonological grounds (Miyaoka 2012;
Woodbury 2017). While morphology and syntax are of course interwoven in CAY
(e.g., in incorporation), clear cases of indeterminacy in word segmentation do not
appear to be as ubiquitous as they are in many Amazonian languages (see
Miyaoka 2012: 18). Our hypothesis is that, in general, CAY will rank higher
than the western Amazonian languages on metrics of morphological autonomy,
reflecting the Amazonian areal tendency to make a fuzzier distinction between
words and phrases.

Figure 9.1. Western Amazonian languages sampled

Table 9.4. Number of morphemes coded in this study by language and functional
domain

Language          Valence  Tense  Evidentiality  Nominal classification  Total
Ash. Perené            16     14             14                      68     98
Tariana                34     29             14                      81    119
Jarawara                5     13              1                       0     19
Kotiria                 3      0             10                      21     34
Urarina                 6     15              3                       0     24
Movima                 32      8              2                     111    153
Hup                     9      5              5                      46     65
Cavineña               17      6              3                       0     26
Chácobo                38     20              4                       0     61
Paresi                 10     11              0                      11     32
Kokama-Kokamilla        6     19              4                       0     29
CAY                    13     10              2                       0     25

For the eleven western Amazonian languages and CAY, we coded a total of 685
morphemes for morphological and wordhood properties in the four domains of
grammar that have been discussed in this chapter: nominal classification, eviden-
tiality, tense, and valence-adjusting (Table 9.4).⁵ Morphemes were identified on
the basis of their function as grammatical elements associated with these domains
(i.e., elements that do not function exclusively as members of a major word class).
The variation in the number of elements per functional domain across the sample
certainly reflects typological differences among the languages, and may also reflect
differences in coverage across grammars regarding particular grammatical
domains. The differences between the languages sampled with respect to the total
number of morphemes coded make the interpretation of the statistical
significance of the correlations somewhat more tentative than it would be if the
totals were more equal. Further details concerning the coding methodology and the
metrics of morphological autonomy are provided in the following sections.

⁵ In general, non-linear and syncretic morphology was not evident in the data. Given that the
relationship between morphology and syntax is treated in global fashion in the literature, we did not
address possible variation in this regard among domains; however, this could be an interesting question
to consider in future work.

9.3.2 Exponence complexity

The types of EC considered in this study are listed in (15). Each type of EC was
coded as a binary or ordinal value for the morphemes coded in this study.

(15) a. Number of allomorphs (ordinal: 1 to 5)
     b. Suppletive allomorphy (binary: yes = 2, no = 0)
     c. Multiple expression (binary: yes = 1, no = 0)

The EC score we develop in this study is simply the sum of these three scores. For
instance, a morpheme that is realized by two allomorphs that are non-suppletive
(i.e., related by productive morphophonological rules) and do not involve multiple
expression will receive a score of 2; a morpheme that has two allomorphs that are
related through suppletion and do not involve multiple expression will receive a
score of 4. Below, we describe the process of measuring these variables, and
provide a justiﬁcation for the scoring techniques used in this study. We then
present an overview of EC scores across the languages considered in this study.
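
The scoring scheme in (15) can be sketched in a few lines of Python. This is only an illustration of the arithmetic described above, not the authors' own coding procedure; the class and function names (`MorphemeCoding`, `ec_score`) are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MorphemeCoding:
    """Hypothetical record of one morpheme's values for the variables in (15)."""
    n_allomorphs: int          # (15a) ordinal count, 1 to 5 in the sample
    suppletive: bool           # (15b) scored 2 if True, 0 otherwise
    multiple_expression: bool  # (15c) scored 1 if True, 0 otherwise


def ec_score(m: MorphemeCoding) -> int:
    """Return the EC score as the sum of the three component scores."""
    if m.n_allomorphs < 1:
        raise ValueError("a morpheme has at least one allomorph")
    suppletion_score = 2 if m.suppletive else 0
    multiple_score = 1 if m.multiple_expression else 0
    return m.n_allomorphs + suppletion_score + multiple_score


# The two worked cases from the text: two allomorphs, no multiple expression.
non_suppletive_pair = MorphemeCoding(2, suppletive=False, multiple_expression=False)
suppletive_pair = MorphemeCoding(2, suppletive=True, multiple_expression=False)

print(ec_score(non_suppletive_pair))  # 2
print(ec_score(suppletive_pair))      # 4
```

Under this scheme a single contiguous form with no allomorphy scores 1, which is the minimum possible value.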

• Number of allomorphs. This variable refers to a count of the segmental
allomorphs associated with a given morpheme. The number of allomorphs
and the presence/absence of suppletion (our second variable) together relate
to Anderson’s complexity measure of allomorphy (see section 9.1 above). It
should be noted that for this metric, we are simply concerned with counting
the allomorphs, whether they are morphophonologically conditioned or
suppletive (i.e., these are not distinguished here). We assume that a higher
number of allomorphs translates into higher EC, all other factors remaining
equal. The maximum number of allomorphs found in our data was ﬁve, but
the vast majority of morphemes only have one allomorph. Table 9.5 presents
the number of morphemes coded at each level for this variable.
An example of a morpheme with at least four allomorphs is the CAY
applicative ut~ul~us~uc (example (16)).

Table 9.5. Number of allomorphs per morpheme attested across the sample

Number of allomorphs    1    2    3    4    5
Number of morphemes   563  110    6    4    2

(16) CAY (Miyaoka 2012: 1132–3)

a. kis’-ut-aanga
sink-APL-.3.1
‘It [e.g. anchor] sank with me.’
b. AngunP(=E)=llu kis’-ul-luku kica-mA=S
man..sg.=and sink-APL-.3. anchor-.
‘The man sank along with the anchor’, i.e. the anchor sank along with
the man (entangled).
c. An-us-gu mikelnguq!
go.out-APL-.2.3 child..
‘You [] take the child out!’
d. unuaqu-uc-iiq-aaten
be.tomorrow-APL--.3.2
‘It will be tomorrow before you (sg.) are done.’ (lit. It [the dawn] will
come on you)

The opposite extreme can be seen in the Urarina causative, which displays no
variation in phonological form—it is always realized as -a:

(17) Urarina (Isolate; Olawsky 2006: 459–60)

a. kanʉ komasaj ʉ-a-anʉ
1 wife come-1-1/
‘I have brought my wife.’
b. tɕãe kanaanaj-ʉrʉ eno-a-e=lʉ
also child- enter-1-3/=
‘He also made the children enter.’

• Suppletive allomorphy. Suppletive allomorphy is considered one of the most
important defining properties for morphological status (cf. Haspelmath &
Sims 2010, inter alia). An example of suppletive allomorphy can be seen in
the tense-modal suffixes of Jarawara, the forms of which vary depending on
the gender of the subject; for the immediate past non-eyewitness tense-
modal suffix, the masculine form in (18a) is distinct from the feminine
form in (18b). Because there is no identifiable phonological rule that
accounts for the difference between the masculine and feminine forms and
generalizes beyond this particular pair of tense-modal suffixes, cases such as
these are coded as suppletive.

(18) Jarawara (Arawan; Dixon 2004: 206–7)

a. bahiS to-ke-hino
sun() -in.motion-..:
‘The sun is (surprisingly to me) going away [i.e., setting]’
b. baniS mee wina-tee-hani
animal() 3 live--..:
‘There were surprisingly many animals.’

The difﬁculty with suppletion in a study such as this one is that many (perhaps
most) linguists have an intuition that suppletive allomorphy is qualitatively
distinct from allomorphy based on productive morphophonological rules.
However, in order to calculate a global EC score we need to change this qualitative
intuition into a quantitative metric. To capture our view that suppletive
allomorphy contributes much more strongly to EC, suppletion is coded as a binary
variable, but one that is weighted relatively heavily (2 for morphemes that display
suppletive allomorphy, and 0 for those that do not). Thus a morpheme that displays
suppletion automatically has an EC score of at least 4 (number of allomorphs: at
least 2 + presence of suppletion: 2).

• Multiple expression. Multiple exponence, or deviations from biuniqueness, is
another measure of morphological complexity as defined by Anderson
(2015a; see section 9.1). Here we focus on discontinuous realizations of
form that correspond to a single unit of content, that is, infixes and circum-
fixes. We found no infixes in the languages considered in this study, and
there were only a few other cases of multiple expression, such as the
reflexive/reciprocal k(a)- . . . -ti in Cavineña:⁶

(19) Cavineña (Tacanan; Guillaume 2008: 271)

tudya=yatse ka-peta-ti-kware e=kwe e-jakwi=tsewe
then=1 -look.at-=. 1- 1-brother.in.law=
‘Then my brother-in-law and I looked at each other [wondering who of us
would know how to milk a cow].’
Multiple exponence was coded as a binary variable: morphemes like the Cavineña
reﬂexive/reciprocal would receive a score of 1 for expression in a discontinuous
fashion, whereas a one-form-one-meaning correspondence would receive a 0.

• A metric for gauging EC. We calculated a global measure of EC for each
morpheme by summing up the scores for the three EC criteria described
above. Accordingly, a morpheme that is realized as one contiguous form
with no allomorphy will receive an EC score of 1; typically morphemes that
receive such a score are described as syntactic elements (e.g., function words)
or as agglutinative morphemes. Higher EC scores are associated with forms
that deviate from biuniqueness in some way. Our coding was carried out
independently of the grammarian’s structural classification of the morpheme
in question; for example, we include auxiliaries used as analytic causatives as
well as morphological causatives. For this reason, it is unsurprising that a
relatively high percentage of morphemes in even a highly polysynthetic
language like CAY have an EC score of 1 (56%); this outcome simply reflects
the fact that elements Miyaoka (2012) regards as ostensibly syntactic
(temporal frame adverbs, evidential clitics, etc.) were coded alongside those he
treats as morphological elements. This strategy gets at precisely what we are
aiming for: we are interested in how morphology and syntax may (or may
not) be distinct in the languages in question, not just how morphemes that
grammarians have categorized as morphological correlate with indices of
morphological complexity.

⁶ Other examples involve the obligatory double-marking of a particular operation; for example, the
Tariana passive requires the co-occurrence of the prefix ka- (which elsewhere functions independently
as a ‘relative’ prefix) and the suffix -kana (Aikhenvald 2003b: 259). We did not consider other types of
deviation from biuniqueness (besides allomorphy and multiple expression) because they were found to
be very marginal in our data.

Individual morphemes score from 1 to 5 in EC across the languages in the
sample.⁷ The percentages of morphemes associated with each EC value in the
twelve languages considered are provided in Table 9.6.
Figure 9.2 gives a visual representation of these distributions in the form of
kernel density estimates of EC values for each of the twelve languages.

Table 9.6. Percentage of morphemes for each EC value across the languages sampled
(with average scores across all the morphemes for each language)

Language     Family        1     2      3     4    5     Average score
CAY          Eskimo-Aleut  56%   20%    4%    16%  4%    1.92
Cavineña     Tacanan       85%   11.5%  4.5%  0%   0%    1.27
Chácobo      Panoan        92%   8%     0%    0%   0%    1.08
Hup          Naduhupan     94%   6%     0%    0%   0%    1.06
Jarawara     Arawan        26%   5%     16%   53%  0%    2.95
Kotiria      Tukanoan      97%   3%     0%    0%   0%    1.06
Kokama       Tupian        58%   42%    0%    0%   0%    1.41
Movima       isolate       57%   5%     11%   27%  0%    2.08
Paresi       Arawakan      91%   6%     0%    3%   0%    1.16
Ash. Perené  Arawakan      91%   5%     2%    2%   0%    1.15
Tariana      Arawakan      91%   6%     1.5%  0%   1.5%  1.16
Urarina      isolate       83%   17%    0%    0%   0%    1.17
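
The average scores in the final column are weighted means of the EC values. As a rough illustration (not the authors' own computation, which presumably used the underlying morpheme counts rather than rounded percentages), the following Python sketch recovers the CAY average from its row; the function name `mean_ec` is hypothetical.

```python
def mean_ec(percentages):
    """Weighted mean EC score from a mapping {EC value: percentage of morphemes}.

    Dividing by the summed percentages (rather than by 100) keeps the result
    a proper weighted mean even when rounded percentages do not sum to 100.
    """
    total = sum(percentages.values())
    return sum(value * pct for value, pct in percentages.items()) / total


# CAY row of Table 9.6.
cay = {1: 56, 2: 20, 3: 4, 4: 16, 5: 4}
print(round(mean_ec(cay), 2))  # 1.92
```

Because the published percentages are rounded, averages recomputed this way for other rows may differ slightly from the values printed in the table.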

⁷ As seen in (15) above, the EC score according to our metric could be higher for any given
morpheme, but in our data set none go above 5.
Figure 9.2. Kernel distribution of densities across the languages of this study
[Twelve panels, one per language: Movima, Paresi, Tariana, Urarina, Hup, Jarawara, Kokama-Kokamilla, Kotiria, Ashéninka Perené, Cavineña, Central Alaskan Yup’ik, Chácobo. Horizontal axis: Exponence Complexity; vertical axis: Kernel Density Distribution.]

We note two points about the EC values across the languages of this study.
First, it is generally true that CAY morphemes are more evenly distributed across
the range of EC scores in comparison to the other languages—in other words, they
are less likely to cluster at any particular EC value, most notably 1 (the lowest).
This is to be expected based on current descriptions of CAY as highly
morphophonologically complex, such that affixal elements display a high degree
of word-internal adjustment (i.e., fusion; see, e.g., Fortescue 1992); a higher degree of
allomorphy will produce higher EC values. Second, and in contrast to CAY, the
western Amazonian languages sampled cluster predominantly around the lowest
EC value (1)—in keeping with the observation that languages of this region tend
to exhibit a highly agglutinative proﬁle. On the other hand, Movima, Jarawara,
and to a certain extent Kokama-Kokamilla display higher EC levels—a point we
return to below.
Despite the generalizations made here, we emphasize that a higher EC score
does not necessarily translate to a higher degree of morphological autonomy.
Higher morphological autonomy is only corroborated if EC correlates with other
criterial wordhood properties. In other words, morphological autonomy may be
manifested by high EC scores, but high EC scores may not be limited to autono-
mous morphology.
9.3.3 Criterial wordhood properties and morphological autonomy

Criterial wordhood properties refer to features that identify constituents or for-
matives as independent words versus parts of words. Each morpheme coded in the
database is coded with a value (either binary (0,1) or ordinal (0,1,2)) for each of
the criterial wordhood properties. Since each morpheme also has an EC value
associated with it, we can assess the correlation between EC level and wordhood
values. We investigate the following three criterial wordhood properties:

(20) a. Bound status (yes = 1, no = 0)
     b. Prosodic dependence (0 = never, 1 = sometimes, 2 = always)
     c. Contiguity (yes = 1, no = 0)

In what follows we provide a brief discussion of each of these criterial
wordhood properties and how they were coded. We then turn to measurements of
association between the wordhood properties and EC across the
languages in our sample.
According to our conception of morphological autonomy as a typological index
along which languages may vary, we propose that the morphological system of a
language can be more or less autonomous. However, we do not feel that we are in a
position to directly measure morphological autonomy, since it involves many
interacting criteria that need to be weighed against one another in a principled
way (although future research on this topic may make an overall global measure
more appropriate, as suggested by Haspelmath 2011). For this reason, we simply
provide statistical summaries of the correlations between EC levels and wordhood
criteria in the languages considered here.
Because the variables are binary or ordinal and not normally distributed,
we use rank statistics to assess the relationship between EC level and
criterial wordhood value across the languages. We use Kendall’s tau adjusted for
ties in the statistical analysis programme R (McLeod 2011).⁸ In contrast to other
rank correlation statistics like Spearman’s rho, Kendall’s tau is ideal for compari-
sons that involve many ties and small sample sizes. The data we gathered naturally
contains many ties because we are comparing variables that, at most, are quanti-
ﬁed from zero to ﬁve across a large sample of morphemes. Furthermore, as can
be seen from Table 9.1, we gathered fairly small samples of data, according to the
morphemes and constructions described in the grammars. We are concerned here
with effect size (i.e., correlation strength) as much as we are concerned with
statistical significance. The tau statistic can be read as a measure of the degree of
morphological autonomy that a relationship between EC and a criterial wordhood
property affords. For a given association, a strong positive correlation (a tau
coefficient that approaches 1) suggests a more robust distinction between morph-
ology and syntax; a weak or negative correlation (a tau coefficient close to or
below 0) suggests a more porous boundary between the two. In this study we judge
a correlation to be significant if the p-value is lower than 0.05.⁹

⁸ For an explanation of this methodology, including the concept of ties in rank statistics, see Kendall
& Gibbons (1992) and Gibbons (1993). For an introduction to using Kendall’s tau in R, see Field et al.
(2012: 225–6).
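
To make the rank statistic concrete, the following Python sketch computes a tie-adjusted Kendall coefficient directly from its definition. It assumes that the tie adjustment meant above is the common tau-b variant; it is not the R routine used in the study, it does not compute p-values, and the morpheme codings in `ec_levels` and `bound_status` are invented purely for illustration.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def kendall_tau_b(x, y):
    """Kendall's tau-b: (concordant - discordant) pairs, scaled with a tie correction."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
        # sign == 0: the pair is tied on x or y and counts for neither side
    n_pairs = len(x) * (len(x) - 1) // 2
    tied_x = sum(t * (t - 1) // 2 for t in Counter(x).values())
    tied_y = sum(t * (t - 1) // 2 for t in Counter(y).values())
    denominator = sqrt((n_pairs - tied_x) * (n_pairs - tied_y))
    if denominator == 0:
        return float("nan")  # undefined when one variable is constant
    return (concordant - discordant) / denominator


# Hypothetical codings for six morphemes of one language (not data from the study).
ec_levels = [1, 1, 1, 2, 2, 4]      # EC score per morpheme
bound_status = [0, 0, 1, 1, 1, 1]   # 1 = bound, 0 = free

print(kendall_tau_b(ec_levels, bound_status))
```

With so few distinct values on each variable, most pairs are tied on one variable or the other, which is why the denominator subtracts the tied pairs instead of using the raw number of pairs.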

• Bound status. Bound status is a classic criterion for wordhood (Bloomﬁeld
1933; Hockett 1958).¹⁰ Here we consider a morpheme bound if and only if it
fails the minimum free form test (and is not a primary content item, i.e., a
member of a major word class, such as a verb); otherwise it is considered free.
A morpheme or construction passes the minimum free form test if it can
stand alone as a single grammatical utterance. Crosslinguistically, bound
status tends to be associated with morphological elements, while free forms
are more syntactically relevant (Bloomﬁeld 1933: 207). Despite a tendency
toward lower EC, western Amazonian languages typically have a large
repertoire of bound forms. Example (21) illustrates a verb complex from
Chácobo: The only morpheme which can stand on its own is the verb root
oʂa ‘sleep’; all other morphemes are bound.

(21) Chácobo (Panoan; Tallman 2018)

a. oʂa-mis=tɨkɨn=kas=ʔitá=kɨ=rɨ́
sleep-===.=:=
‘What a shame that he only wanted to sleep yesterday.’
b. oʂa ‘asleep’
c. *-mis
d. *=tɨkɨn
e. *=ria
f. *=ʔitá
g. *=kɨ
h. *=rɨ

⁹ Of course, high p-values do not necessarily imply that there is no relationship between the EC
score and a wordhood property (the sample sizes are too small to afford such an interpretation). We
include the information regarding statistical significance for the reader who is interested in gauging
how reliable our results are on this point.
¹⁰ A number of authors have pointed out problems with the minimum free form test (Haspelmath
2011; Bickel & Zuñiga 2017), in particular that it identifies compounds as phrasal elements and certain
function words (determiners) as morphological elements. However, this test is not uniquely problem-
atic among wordhood tests, as Haspelmath’s (2011) systematic review demonstrates. Furthermore, the
test still provides useful information regarding morphological vs. syntactic status; for instance,
Haspelmath (2011: 40) points out that if an element passes the minimum free form test this provides
strong evidence that this element is not an affix. We see non-affixicality as an important criterion in
calculating overall morphological autonomy.
Table 9.7. Rank correlations between EC level and bound status values across
languages

Language     tau correlation  p-value
CAY           0.543            0.004
Kokama        0.449            0.017
Jarawara      0.329            0.139
Paresi        0.246            0.166
Tariana       0.189            0.038
Hup           0.183            0.143
Ash. Perené   0.119            0.231
Chácobo       0.099            0.445
Cavineña      0.085            0.671
Urarina       0.037            0.858
Kotiria      −0.342            0.050
Movima       −0.716           <0.005

We encode bound status as a binary variable. The morpheme in (21b) would
receive a score of 0, and all of the other morphemes (21c–h) receive a score of 1.
Table 9.7 provides the rank correlations across the languages of this study. The
tau correlation can be interpreted as an indicator of effect size, that is, of how strongly
associated EC level is with bound status in the language. In CAY and Kokama-
Kokamilla there are signiﬁcant positive correlations, with CAY coming out on top.
In Movima, however, there is a signiﬁcant and negative correlation, a point we
return to in section 9.3.4 below. Such measures of association are here considered
to be metrics of morphological autonomy.

• Contiguity. This criterion refers to whether a given formative is required to
occur directly adjacent to the morpheme it semantically combines with, or
can be separated from it by a free element. A lower degree of contiguity is
associated with a more syntactic status, while a higher degree of contiguity is
associated with a more morphological status (e.g., Mugdan 1994; Dixon and
Aikhenvald 2002). To illustrate the criterion of contiguity, we can make
reference to the Chácobo verb complex in example (21) above. According
to the minimum free form test, the verb complex in this example is a single
word-unit, but according to rules of contiguity it consists of at least ﬁve
different units, each of which can be separated from its neighbours by a full
noun phrase such as honi siri ‘old man’. Example (22) illustrates the possi-
bility of inserting this noun phrase at any of the points (a–e). Only the
antipassive -mis and the combination of the recent past and past tense
declarative =ʔitá=kɨ require contiguity.
(22) Chácobo (Panoan; Tallman 2018)

a. (honi siri) oʂa-mis b. =tɨkɨn c. =kas d. =ʔitá=kɨ
(man old) sleep- = = =.=:
e. =rɨ́
=
‘What a shame that the old man only wanted to sleep again yesterday.’

We coded the criterion of contiguity as a binary variable for a given morpheme.
Morphemes that can be separated by a free phrasal construct from the element
they associate with semantically receive 0 for contiguity (as in 21d–h). If the
morphemes require contiguity they receive a 1, as with antipassive -mis (21c).¹¹
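
As a rough sketch of how the codings just described could be stored, the Python fragment below records the bound-status and contiguity values that the text assigns to the forms listed in (21b–h). The record type `WordhoodCoding` is hypothetical, and leaving contiguity uncoded (`None`) for the verb root is an assumption of this sketch, since the text gives no contiguity value for it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class WordhoodCoding:
    """Hypothetical record of two of the wordhood variables in (20)."""
    form: str
    bound: int                 # (20a) 1 = fails the minimum free form test, 0 = free
    contiguous: Optional[int]  # (20c) 1 = requires adjacency, 0 = separable, None = not coded


# Forms as listed in (21b-h), with the values stated in the text.
chacobo_21 = [
    WordhoodCoding("oʂa", bound=0, contiguous=None),   # (21b) verb root (assumption: not coded)
    WordhoodCoding("-mis", bound=1, contiguous=1),     # (21c) antipassive
    WordhoodCoding("=tɨkɨn", bound=1, contiguous=0),   # (21d)
    WordhoodCoding("=ria", bound=1, contiguous=0),     # (21e)
    WordhoodCoding("=ʔitá", bound=1, contiguous=0),    # (21f)
    WordhoodCoding("=kɨ", bound=1, contiguous=0),      # (21g)
    WordhoodCoding("=rɨ", bound=1, contiguous=0),      # (21h)
]

bound_forms = [m.form for m in chacobo_21 if m.bound == 1]
separable_forms = [m.form for m in chacobo_21 if m.contiguous == 0]

print(len(bound_forms), len(separable_forms))  # 6 5
```

Pairing each such record with the morpheme's EC score yields the two columns needed for the rank correlations reported for each language.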
A language that displays a high degree of morphological autonomy is expected
to show a strong and positive correlation between EC level and morphemic
contiguity. Table 9.8 shows the rank correlations between EC level and contiguity
across the languages of this study. CAY, Jarawara, and Ashéninka Perené show
positive and signiﬁcant correlations between EC level and contiguity, with CAY
coming out on top. While Ashéninka Perené’s correlation is statistically
significant, the effect size is substantially lower than for CAY. Thus, on this
EC-contiguity metric, only CAY and Jarawara provide evidence for morphological autonomy.

• Prosodic dependence. For a given formative or construction, prosodic word
projection is prototypically associated with wordhood status. Incorporation
into an adjacent prosodic word is prototypically associated with afﬁx status
(Spencer & Luís 2012).

Table 9.8. Rank correlations between EC level and contiguity value across
languages

Language     tau correlation  p-value
CAY           0.594            0.002
Jarawara      0.532            0.016
Cavineña      0.305            0.121
Ash. Perené   0.236            0.018
Urarina       0.205            0.325
Chácobo       0.178            0.168
Tariana       0.131            0.150
Movima        0.101            0.190
Kotiria       0.030            0.862
Paresi       −0.053            0.739
Kokama       −0.139            0.462
Hup          −0.166            0.183

¹¹ A reviewer suggests that contiguity might be better treated as a three-way variable, with inter-
mediate status given to elements that require adjacency in some constructions but not in others. We
concur that this could be a productive approach to explore, but for the purposes of this study it was
found to be too difﬁcult to apply in a consistent way.

In some cases, the prosodic dependence of a given morpheme may vary
depending on its syntagmatic context. For example, the Jarawara auxiliary na
prosodically incorporates into the main verb, with which it forms a single phono-
logical word (23a), but projects its own independent prosodic word when it
combines with other afﬁxes (23b).¹²

(23) Jarawara (Arawan; Dixon 2004: 30)

a. amó+na
sleep+.
‘She sleeps.’
b. amo o-ná-habóne
sleep --.
‘I’m going to sleep.’

A similar situation occurs with tense morphemes in Chácobo, but the syntagmatic
contexts that license prosodic word projection or incorporation are different: In
this language, a tense morpheme prosodically incorporates into an adjacent verb
root (24a), but projects its own prosodic word when a subject NP intervenes
((24b), repeated from (7a–b) above).

(24) Chácobo (Panoan; Tallman 2018)

Finally, some grammatical formatives may always project their own prosodic
words, as exempliﬁed by the Jarawara ‘aspect/time lexeme’ hibati ‘completed’
(example (25); Dixon 2004: 223); see also the Hup recent past marker páh in (6) above:

(25) Jarawara (Arawan; Dixon 2004: 223)

Barako owa heta na-re-ka
name() 1. lease.from -..:-:
hibati jaa
 
‘Branco did lease [the ﬁshing waters] from me, but this arrangement is now
ﬁnished.’

¹² Dixon uses the symbol ‘+’ to indicate what he refers to as ‘a grammatical word boundary within a
phonological word’ (2004: 30).

Table 9.9. Rank correlations between EC level and prosodic dependence across languages

Language        tau correlation    p-value
CAY             0.543              *0.045
Kokama          0.491              0.083
Jarawara        0.426              *0.049
Ash. Perené     0.187              0.053
Tariana         0.155              0.076
Cavineña        0.151              0.443
Chácobo         0.129              0.307
Paresi          0.122              0.485
Urarina         0.107              0.595
Hup             0.018              0.884
Movima          0.089              0.236
Kotiria         0.274              0.111

Our scoring captures these three possible degrees of prosodic dependence. If a
formative always projects a phonological word, it receives a score of 2; if it never
projects a phonological word (i.e., it always phonologically incorporates), it
receives a score of 0. Formatives that do both receive a score of 1, as in the
Jarawara and Chácobo cases above.¹³
Table 9.9 provides the rank correlations for the languages considered in this
study. CAY displays the strongest correlation for the relationship between pros-
odic dependence and EC. Only two languages, CAY and Jarawara, display a
signiﬁcant and positive correlation.
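
The three-way prosodic dependence scoring described above can be sketched as a small function. This is an illustration only: the function name `prosodic_dependence_score` and the encoding of each formative as a list of per-context observations are assumptions made here, not the authors' coding procedure.

```python
def prosodic_dependence_score(projects_own_word):
    """Score a formative from observations of its behaviour across contexts.

    `projects_own_word` is a sequence of booleans, one per syntagmatic
    context (hypothetical encoding): True if the formative projects its own
    phonological word there, False if it incorporates into an adjacent one.
    """
    observations = list(projects_own_word)
    if not observations:
        raise ValueError("at least one observed context is required")
    if all(observations):
        return 2  # always projects its own phonological word
    if not any(observations):
        return 0  # never projects; always incorporates
    return 1      # does both, depending on context


# Illustrative encodings of the cases discussed in the text.
print(prosodic_dependence_score([False, True]))  # Jarawara auxiliary na
print(prosodic_dependence_score([True]))         # Jarawara hibati
```

Treating the score as ordinal (0 < 1 < 2) is what allows it to be entered into a rank correlation with EC level.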

9.3.4 Summary

By comparing measures of EC and wordhood status, we obtained a metric by
which to gauge the relative degree of morphological autonomy across our sample
of languages. The western Amazonian languages in our set show a relatively low
degree of morphological autonomy, in contrast to our geographic and typological
outlier, CAY, which scored much higher on all measures considered.
Despite the fact that the types of EC considered here (allomorphy, culminativ-
ity) have been described as unproblematic measures of morphological complexity

¹³ One might argue that prosodic independence is more a fact about the phonological or prosodic
component of grammar, rather than having anything to do with the morphology-syntax distinction.
However, the relevance of this criterion is evident in the problem of clitics. As Spencer & Luís (2012)
argue, the clitic can be understood as a ‘boundary category’—which calls into question the discreteness
of the components that it straddles (Croft 1991, 2001). From this perspective, a language with a greater
degree of isomorphism between phonological words and grammatical words would be understood as
having a higher degree of morphological autonomy.

(e.g., Anderson 2015a), our study illustrates that their status as necessarily mor-
phological cannot be assumed. This point is best exemplified by Movima, which
demonstrates a relatively high level of EC complexity in comparison to the other
languages in the sample, but coupled with a lower overall tendency for morphemes
to be dependent elements with respect to the wordhood criteria considered here,
particularly bound status. Similarly, while Jarawara comes closest to CAY in
displaying morphological autonomy via its relatively high correlations between
EC and the wordhood measures of contiguity and prosodic dependence, its
association between EC and bound status is weak and non-significant. The
Movima and Jarawara cases demonstrate that deviations from biuniqueness are
in principle orthogonal to the structural classification of form-meaning mappings
as either morphological or syntactic.

9.4 Conclusion

Our findings suggest that a relatively loose distinction between syntax and
morphology is an areal feature of western Amazonian languages (perhaps extend-
ing into neighbouring regions). In this chapter, we have presented evidence for
this view of Amazonian morphological profiles from two major angles. From the
perspective of system complexity, we addressed morphological behaviour across
four domains that show a tendency toward elaboration in western Amazonian
languages—nominal classification, tense, evidentiality, and valence-adjustment—
and for each explored the relationship between complexity and language contact
and change. Turning our focus to EC, we systematically evaluated aspects of this
domain against criteria associated with wordhood for a sample of eleven western
Amazonian languages, plus CAY as a point of contrast. In addition to showing
that the Amazonian languages all exhibit relatively low degrees of morphological
autonomy, our findings highlight the important point that factors associated with
morphological complexity are in fact not necessarily morphological: for two
Amazonian languages in our sample, high EC does not correlate strongly with
wordhood status. In future work, we hope to expand the typological scope of this
survey, in order to establish the degree to which Amazonian languages might
deviate from a more widely defined baseline relating to morphological autonomy,
and to arrive at a more precise understanding of the geographic distribution of
these patterns within and beyond South America.
The low degree of morphological autonomy in western Amazonia has import-
ant implications not only for our understanding of synchronic relationships
among linguistic subsystems, but also for our conception of diachronic processes
of contact and grammaticalization. As we have argued here, the porous nature of
the morphology-syntax distinction in Amazonian languages is associated with
other areal tendencies, such as productivity of compounding and incorporation,
that facilitate grammaticalization by creating a context in which lexical elements
are easily reanalysed as bound morphology. These processes in turn can feed the
elaboration of grammatical domains, particularly under the pressure of areal
diffusion. A fuzzy morphology-syntax distinction also allows for low selectivity
on the part of grammaticalizing morphological elements, through which they may
readily detach from the contexts in which they emerge and be extended to new
ones. These processes result in outcomes that are typologically unusual in broader
perspective; in particular, that morphologization might frequently involve a
decrease in bound status, and that more and less bound instantiations of particular
morphemes might be maintained over time, rather than representing only ﬂeeting
stages of a transition in progress.
In sum, a closer look at the morphological proﬁles of western Amazonian
languages invites a revision of current views of morphological complexity and
its relationship to processes of language contact and change. The Amazonian case
underscores the recognition that large-scale regional patterns may play an import-
ant role in shaping our vision of what is canonical or ‘normal’ in language, and
that a robust understanding of human language must take a range of diversity into
account.

Acknowledgements

Epps gratefully acknowledges funding from the University of Texas at Austin, as well as
earlier support from the National Science Foundation, Fulbright-Hays, and the Max
Planck Institute for Evolutionary Anthropology for work on Hup; Tallman thanks the
National Science Foundation and the Endangered Languages Documentation
Programme for supporting his work on Chácobo. We are grateful to the editors of this
volume for inviting us to contribute, and to Peter Arkadiev, Francesca di Garbo, Tony
Woodbury, and an anonymous reviewer for their suggestions.

III
THE ACQUISITIONAL
PERSPECTIVE

10
Radical analyticity as a diagnostic
of adult acquisition
John H. McWhorter

10.1 Introduction

I propose a hypothesis (cf. McWhorter 2016, 2019): that when a language is
radically analytic in comparison to its close relatives, this can be treated as an
indication that in the past, the language was acquired by a critical mass of adults,
rather than having always been passed down the generations intact. In previous
work (McWhorter 2007) I have argued that when a language within a family is
markedly more analytic than its sisters, it can be traced to extensive second-
language acquisition (e.g., English, Persian, Mandarin, Malay). Here, however, my
argument is more speciﬁc, extending this framework to whole families or even
Sprachbunds of languages not just relatively analytic, but extremely so.

10.1.1 Deﬁnition of radical analyticity

By radical analyticity, I refer to absence (or all but absence) of inflectional marking
indicated by affixation, tone, or vowel changes in quality or length. This must be
clearly distinguished from relative analyticity, which linguists often refer to as ‘analyticity’
in a kind of shorthand, such as Nurse (2007) referring to the amply inflected
Supyire (Gur, Niger-Congo) as ‘analytic’ in comparison to especially inflected
languages like those of Narrow Bantu.
My hypothesis distinguishes two kinds of language contact effects: transfer
and structural simplification (although the two are hardly mutually exclusive).
The role of transfer in language contact would seem self-evident and is richly
studied. However, the role of simplification in language contact has been studied
more in regard to pidgins and creoles than to less extremely simplified lan-
guages. Kusters (2003) and McWhorter (2007) were pioneering explorations of
this intermediate range in a crosslinguistic sense, continued by the now seminal
Trudgill (2011).

John H. McWhorter, Radical analyticity as a diagnostic of adult acquisition In: The Complexities of Morphology.
Edited by: Peter Arkadiev and Francesco Gardani, Oxford University Press (2020). © John H. McWhorter.
DOI: 10.1093/oso/9780198861287.003.0010

10.1.2 Radical analyticity worldwide

This presentation proposes that there are three main geographical clusters of
radically analytic languages with extensive adult acquisition in their histories.
The first is the few Niger-Congo languages that are radically analytic, such as
the Gbe languages, Yoruba, and Nupe (henceforth GYN), which my hypothesis
suggests would have arisen from an earlier Niger-Congo variety with ample
inflection. Yoruba’s near lack of inflectional morphology of any kind is
indicated here:

(1) Yoruba
Mo mú ìwé wá fún ẹ.
I take book come give you
‘I brought you a book.’ (Stahlke 1970: 63)

The second cluster is a few languages of Eastern Indonesia (Austronesian ones on
the island of Flores and a few on Timor) and some non-Austronesian ones on the
northern coast of the island of New Guinea (as documented by Paauw 2007).
Within Austronesian, adult acquisition is considered relatively uncontroversial for
various colloquial dialects of Malay/Indonesian (Grijns 1991; McWhorter 2007:
223–9), and for Tetun (Hull 1999: ix; Thomaz 2002). However, my proposal will
explain why we can infer a history of adult acquisition even for languages of this
region with no documented history, such as ones in central Flores like Rongga,
whose characteristic analytic structure is shown here:

(2) Ema ja’o weli kebaya toro.
father I buy dress red
‘My father bought a red dress.’ (Arka 2011: xviii)

or one of western Papua such as Abun:

(3) Men ben suk no nggwe yo, men ben suk sino.
we do thing  garden then we do thing together
‘If we do things at the garden, then we do them together.’ (Berry & Berry
1999: 23)

Finally, the Sinitic languages can be seen as revealing, in their radical analyti-
city, adult acquisition in their past (cf. McWhorter 2016). The radical analyticity
in language families neighbouring Sinitic, such as Hmong-Mien, Tai-Kadai, and
Mon-Khmer, is often treated as an areal ‘Sinosphere’ feature. I suggest that within
this language area, the radical analyticity, at least, traces to Sinitic. This recon-
struction is especially compelling given that Mon-Khmer languages are most
analytic where Chinese has had inﬂuence, and much less so where it has not,
among the Munda languages to the west and Aslian languages to the south. Under
this analysis, the question becomes how Sinitic itself reached a radically analytic
state in the first place; here I argue that adult acquisition is the most
plausible cause.
Under the analysis I will present, the GYN languages likely reached their state
as the result of waves of second-language acquisition as an earlier Niger-Congo
variety travelled southward towards the coast of the Bight of Benin. Various
isolates, as well as the Mande and Ijo groups that Dimmendaal (2011) has argued
not to be members of Niger-Congo, are likely remnants of the original language
distribution in upper west Africa. The Flores languages were likely affected
by invasions from Sulawesi (or possibly the aboriginal population of Homo
floresiensis). Hull (1998) makes a strong case that the Timor languages were
deeply impacted by an invasion from the island of Ambon, while Paauw (2007)
suggested that contact with Austronesian as its speakers migrated eastward
affected the languages in Papua. The reason for the analyticity (and in general
the radically isolating structure) of Old Chinese remains unknown, although
DeLancey (2011) and McWhorter (2016: 81–2) offer suggestions—under an
analysis which, we must recall, posits the nature of Old Chinese as an indication
of adult acquisition yet to be identified.

10.1.3 Application to this volume

In modern linguistics, many linguists are sceptical of the idea that the develop-
ment of even radical analyticity necessarily entails a loss of overall morphological
complexity. A guiding caveat is that what was once marked by an affix (or clitic)
can later be marked by a free morpheme, or even a process on some other level of
the grammar such as syntax (e.g., via word order). While this is true, any
assumption that this kind of replacement is somehow regular or even obligatory
in diachronic development is (i) logically unmotivated (i.e., for what reason or
purpose would grammars ‘compensate’ in this way towards an unspecified sine
qua non degree of structural complexity?); and (ii) empirically disproven (Shosted
2006 disproves that languages compensate for loss of complexity in one module by
gaining it in another).
Thus the development of radical analyticity is not a mere matter of a language
transforming its typology in a fashion independent of complexity. Rather, the
languages addressed in this chapter have lost, or all but lost, overt indication of
case marking and concord in any module. They do not mark these with free
morphemes. Moreover, while of course they have syntactic processes sensitive to
the distinction between, for example, subject and object, these are not as obligatorified
(in the terminology of Lehmann 1985) as affixal markers of these categories
tend to be, often qualifying as pragmaticized structures rather than
grammaticalized ones.

Similarly, noun class markers can indeed be replaced with free morphemes, as in
Wolof (Loporcaro, Chapter 6, this volume), and numeral classifiers in languages
such as many in East and Southeast Asia can be seen as functionally equivalent to
noun class marking (Grinevald & Seifart 2004). However, the free morphemes in
question neither vary for case—much less according to declensional classes indicat-
ing this case variantly—nor vary in form between modifiers and heads as affixal
noun class marking often does (Russian iz krasivyx ženščin ‘of the beautiful women’).
Similarly, while radically analytic languages indicate inherent inflectional cat-
egories such as tense and number with free morphemes, these free morphemes do
not occur in paradigmatic variants independent of semantics, in the vein of verb
conjugational affix paradigms. Furthermore, it would appear that affixation,
complete with the morphophonemic processes it encourages as well as distortions
into outright irregularity beyond, conditions much more irregularity—another
facet of complexity—than free morphemes do. The ‘irregular verb’ is quite rare in,
for example, Yoruba, Mandarin, and Rongga, where there are no affixal markers of
inherent inflection likely to drift into morphophonemic subrules, thorough
irregularity subject to no rule, and then utter suppletion.
Radical analyticity, that is, is less a change of type than an unravelling. Radically
analytic languages remain vastly complex in countless ways, as all languages are.
However, their radical analyticity does entail a significant degree of relative
simplification.

10.2 Adult acquisition versus ‘drift’

That is, I propose that we should no more question whether Yoruba, Rongga, or
Mandarin have extensive adult acquisition in their histories than we would
question whether the difference between Haitian Creole French and French (loss
of grammatical gender, verbal inflection, and much else) was due to extensive
adult acquisition:

(4) a. French
Ils n’ont pas de ressources qui puissent
3. -have   resource.  can.3
leur permettre de résister à la famine.
3. allow of resist to . famine
b. Haitian Creole
Yo pa gen resous ki pou pèmèt yo reziste anba
3  have resource  can allow 3 resist under
grangou.
famine
‘They didn’t have the resources that would allow them to hold off famine.’
(Ludwig et al. 2001: 164)

Indeed, specialists in second language acquisition and in language contact concur
that for adult acquirers, the target language’s inflectional morphology is especially
subject to elimination (cf. Pienemann 1998; Plag 2008), as the result of factors of
phonological and semantic transparency, of the same kind that condition hier-
archies of borrowability (cf. Thomason & Kaufman 1988; Matras 2009: 153–7).
Current orthodoxy assumes, however, that adult acquisition is but one of two
pathways via which a language might become radically analytic (cf. Thomason
2003: 242; Hyman 2004). That is, works such as Thomason (2003) and Hyman
(2004) are typical in their assumption that radical analyticity can also occur
grammar-internally as the result of the ‘drift’ process described by Sapir (1921),
in which a language’s grammar-internal changes (or even those of a number of
contiguous languages, such as those of much of Europe) coalesce upon a certain
general tendency, such as inflectional loss. The assumption is natural, given that
the loss of significant (if not radical) amounts of inflectional affixation is well-
known from the difference between modern and Old English, between the mod-
ern Mainland Scandinavian languages and Old Norse, and the general ‘drift’
towards analyticity identified by Sapir (1921).
Various treatments, however, have demonstrated that the above cases and
similar ones were, themselves, products of second language acquisition (cf.
Kusters 2003, McWhorter 2007, Trudgill 2011 for general treatments;
McWhorter 2002 on English; Trudgill 2011 on Scandinavian). There is currently
such a volume of studies of this kind that it becomes appropriate to explore a
certain theoretical economy in our theory of language diachrony and its relation-
ship to language contact. To wit: it is worthwhile to explore whether radical
analyticity can emerge only via adult acquisition, and therefore could be useful
as a window on the past of languages whose previous stages are otherwise lost to
history.
In sections 10.3, 10.4, and 10.5, I will present three aspects of radically analytic
languages that suggest that they owe their state to second language acquisition
rather than grammar-internal development. I will then address two prominent
proposals suggesting that radical analyticity could emerge without second-
language acquisition: (in section 10.6) Mufwene’s (2001) proposal that creole
languages’ analyticity is due simply to the analyticity of their source languages;
and (in section 10.7) Hyman’s (2004) proposal that Gbe, Yoruboid, and Nupe
reached their state via the evolution of a monosyllabic phonological template.
I must specify: my claim is not that any degree of adult acquisition of a language
must denude it of a radical amount of its inflectional affixation. Adult acquisition
has occurred in various degrees to, probably, most languages, and has varying
degrees of effect. My argument is that radical analyticity can be analysed as tracing
to an extreme degree of adult acquisition: Trudgill (2011: 57), for example,
suggests that the tipping point for stark inflectional loss begins when non-native
learners constitute 50% or more of the speech community.

10.3 Argument No. 1: contextual versus inherent inflection

One indication of radical analyticity’s roots in adult acquisition rather than ‘drift’
is that in languages that have reached such a state, one type of inflection is
eliminated entirely or virtually so, while another type is retained in the form of
free morphemes. This is typical of adult acquisition, but not of grammar-internal
change.
Booij (1993) distinguishes inherent inflection from contextual inflection.
Inherent inflection contributes meaning, driven by the speaker’s choice of what
they wish to communicate. It thus includes nominal number, tense, and aspect,
and is not required for syntactic grammaticality. This contrasts with contextual
inflection which, indicating features such as case and concord necessary to the
syntactic composition of the sentence, serves a grammatical function rather than contributing meaning.
Crucially, in creoles, the lexifier language’s inherent inflection is typically
preserved to a considerable extent in the form of free morphemes, such as pre-
verbal tense and aspect particles (even when the substrate languages were syn-
thetic, as was the case with many creoles; cf. section 10.6 below). However,
contextual inflection is typically not replaced in this fashion (Plag 2008; Luís
2009). In this Haitian sentence, French’s past tense inflection is replaced by the
free form te, but the nouns baay ‘thing’ and moun ‘people’ are not marked for
grammatical gender as their French equivalents are, nor is grammatical gender
marked on Haitian’s definite articles; also, pronouns such as li (here, ‘it’) are not
marked for case:

(5) Yo te suvèye baay sa-a pu anpèche moun vole li.
they  watch thing this- for prevent people steal it
‘They watched this thing in order to prevent people from stealing it.’
(Koopman & Lefebvre 1981: 203)

The facts are similar in pidgins, in which even as free morphemes, contextual
inflection is rare while inherent inflection is frequent (Roberts & Bresnan 2008).
As Plag (2008) notes, creoles’ retention of inherent rather than contextual
inflection is predictable from the hierarchical pathway of second-language acqui-
sition identified by Pienemann (1998), under which inherent morphology is more
easily accessible to the learner than contextual, and thus always acquired first.
In contrast, under ordinary grammar-internal change, contextual morphology is
much less fragile. For example,

1. French has lost Latin’s case inflections on nouns (first collapsing the oblique
cases into one and then losing even this distinction), but retains case
distinctions in pronouns, and concord within NP.
2. Pashto has lost much of the inflection in early Iranian languages, but
nevertheless retains ample case marking and concord.
3. Within Niger-Congo, while Wolof lacks the noun class prefix paradigm
typical of Bantu and even many of its own relatives within the Atlantic
subfamily, it has replaced them with postposed free morphemes (Torrence
2013: 16; cf. also Babou & Loporcaro 2016, Loporcaro, Chapter 6, this
volume), as shown in Table 10.1.
4. Modern Armenian dialects retain Indo-European case marking as well as
inflections distinguishing declensional classes; Albanian also retains case
marking as well as grammatical gender. Adult acquisition is not assumed to
have been significant in the timelines of either of these branches of Indo-
European, as opposed to in Romance and Germanic.
5. Georgian has retained the contextual inflection of Proto-Kartvelian over
several millennia.

These cases serve to illustrate, as Nichols (1992: 169) indicates, that ordinary
grammar-internal change poses no threat to contextual inflection. The contrast is
clear with the extent to which adult acquisition indeed does so.
As such, the fact that radically analytic languages like the GYN ones and those
of central Flores like Rongga retain free morphemes in the function of inherent
inflection, but eschew contextual morphology completely, suggests that they have
roots in non-native acquisition, under which learners had access to inherent
morphology rather than contextual because inherent inflection is more like
derivational morphology, that is, more ‘lexical’, and thus more salient to the non-native
learner. This distinction is the one reflected in borrowing as described by
Gardani (2008, 2012, 2018).
Thus, a Fongbe sentence like the one in example (6) below contrasts with a
Swahili one, as in (7), not only in encoding aspect with a free morpheme, but in lacking
either bound or free noun class morphology:

(6) Fongbe
Àvún ɔ́ nɔ hàn àɖú mὲ.
dog   bite tooth person
‘The dog bites people.’ (Lefebvre & Brousseau 2002: 266)

Table 10.1. Wolof noun class markers

Wolof       Gloss
xaj bi      the dog
gaal gi     the boat
ndap li     the pot
wax ji      the talk
jën wi      the fish
ndaw si     the young woman
saw mi      the urine
nit ki      the person

(7) Swahili
U-levi hu-ondoa akili.
-drunkenness -remove .sense
‘Drunkenness takes away sense.’ (Perrott 1950: 56)

In the same way, Central Malayo-Polynesian languages typically have
subject-marking concordial prefixes, as in Leti:

(8) Müani-ne püate ra-mtïètne.
man-and. woman. 3-sit.
‘The man and the woman sit.’ (Van Engelenhoven 2004: 243) ( =
indexical marker)

The languages in central Flores such as Rongga lack these preﬁxes, and case
marking, but mark tense and aspect with free morphemes:

(9) Ata gagi ngai ngaja.
person old  talk
‘The elders are talking.’ (Arka 2011: 56)

Because contextual morphology is usually discussed in reference to affixal languages,
it may seem unremarkable that the Chinese languages have very little
marking of case and grammatical relations. However, even a largely monosyllabic
language like Akha (Sino-Tibetan) marks ergativity with free morphemes:

(10) ŋà nɛ àjɔq áŋ áshì thì shì biq ma.

I  he  fruit one  give 
‘I gave him one fruit.’ (Hansson 2003: 243)
Therefore, the radically analytic languages I discuss resemble creoles not simply in
being analytic, but in also retaining a particular kind of morphology as free
morphemes while eschewing the other kind. In this, these languages can be seen
as harbouring evidence of adult acquisition.

10.4 Argument No. 2: analytic language as an unnatural state

Especially given how familiar it is to linguists that Modern English is so much
more analytic than Old English, it may seem unexceptionable that, by chance,
some languages might shed all of their inﬂectional afﬁxation.

10.4.1 Grammaticalization is unceasing

I will present three observations which, together, suggest that a language can only
lose all of its bound inflection via external intervention.
First, the emergence of new grammatical items via grammaticalization pro-
cesses, as well as reanalysis, is a constant in the life cycle of a language. More to the
point, there is no indication in the grammaticalization literature that the process
only operates in a subset of languages, or that the process is given to halting for
long periods. Grammaticalization can be taken as equivalent to the movement of
bodies in the theory of physics: just as stasis under this formulation is irregular, we
can assume that in language change, the cessation of grammaticalization indicates
the death of the language. To wit, grammaticalization is unceasing.
Second, following from this point is that there is no reason that while a
language were losing bound inflection, the development of new inflection via
grammaticalization would not be occurring simultaneously. Put differently, dia-
chronic theory knows no reason that there would be such a cessation. Moreover,
empirical evidence demonstrates its opposite. In Romance, the erosion of Latin’s
future marking suffixes was paralleled by the emergence of new ones from the
grammaticalization of habere ‘to have’ (as well as a new conditional marking
paradigm). Also, Italian developed new noun inflectional classes as original
ones were lost (Gardani 2013). In the Kartvelian language Svan, declension
marking suffixes proliferated amidst its loss of some of Common Kartvelian’s
original concord machinery (Harris 2004: 152–5). In Swahili, past marking prefix
li- grammaticalized from a locative verb as the Common Bantu equivalent
a- (Nurse 2008: 257) wore away (McWhorter 1994: 62–3). Affixes and paradigms
change function as often as they disappear (cf. Mukarovsky 1977: 32–5; Harris &
Campbell 1995; Good 2012a).
Third, following in turn from the above point, languages do not ‘cycle’ through
stages of radical analyticity followed by the development of new inflections which
eventually wear away such that the cycle begins again. That linguists sometimes
suppose so would seem to be due to a ‘folk’ interpretation of Hodge (1970) on
Egyptian, which actually showed a phase of relative analyticity, nothing approach-
ing radical. Meanwhile, no cycle through radical analyticity has been demon-
strated elsewhere. As Dahl (2004: 261–88) notes, the absence of such a cycle has
been explicitly noted in Afroasiatic, Uralic, and Altaic, and meanwhile specialists
in language groups worldwide report no such cycles.
In sum, grammaticalization is analogous to crocodiles’ and fishes’ teeth, which
are continually replaced throughout life. These animals do not ever reach a
toothless stage. If one were encountered toothless, we would know that this was
the result of an external disruption. We would neither venture that it was a normal
development nor expect it to develop a mouthful of new teeth overnight.
With the three above observations, grammars such as Yoruba, Mandarin, and
Rongga become puzzles. Adult acquisition is the only mechanism which has been
empirically documented to shave away all or almost all of a language’s bound
inflection. There is no documentation of radical analyticity’s emergence in East and
Southeast Asia, Indonesia, or West Africa: in all cases, the languages were already
radically analytic by the time they were committed to writing. I suggest that a solution to
the puzzle that these other languages pose is that they, too, were born of adult
acquisition.

10.4.2 Unstressed final syllables do not lead to the typology of Chinese

Two common conceptions must be addressed.

First, is withdrawal of stress from final (or initial) syllables a possible reason for
a language becoming radically analytic? Two answers beckon:

1. This account would neglect that bound affixation often includes vowel
changes within the root. A great deal of English’s inflectional morphology,
for example, is indicated with the root vowel changes in the past forms of
verbs. Even if destressing the final syllable had denuded English of all
inflectional suffixes, the vowel changes in the strong verb roots would
have remained.
2. Lack of stress on the final syllable is not as regularly destructive of inflectional
morphology as often supposed. Withdrawal of stress from the final syllable is
common in Indo-European, and usually the result has been languages that
have remained richly suffixed. Baltic and Slavic preserve a great deal of
Proto-Indo-European nominal morphology, and yet, for example, West
Slavic fixed its accent on the first syllable several centuries ago. Armenian
has fixed the accent on the penult, and yet retains a rich declensional system
and robust verbal inflection. A considerable degree of unaccented word-
final inflection has survived in Icelandic. In Celtic, when the accent was
retracted from endings, Goidelic (such as Irish and Scots Gaelic) retained
much verbal inflection and a degree of nominal. We must also consider the
Romance languages other than French, such as the Iberian languages and
Italian, in which unstressed inflectional suffixes are prolific and robust.

10.4.3 Inflection is more quickly lost than gained

The second conception we must address is a possible misinterpretation. My claim
that radical analyticity is an unnatural state for a language must not be taken to
mean that it is incompatible with human cognition. In fact, many would recon-
struct that language emerged uninflected (e.g., Comrie 1992). However, the
development of grammatical affixes is a slow process. As Dahl (2018) notes, a
mere few inflectional affixes are documented to have emerged in Europe over the
past 2,000 years.
Thus, while a single instance of disruption, such as the in-migration of a large
population of adult learners, can eliminate a language’s bound inflection (many
creoles) or vastly reduce it (English) in one stroke, the nature of grammaticalization
gives no reason to suppose that new affixes would emerge immediately afterwards.
In fact, theoretically, this is precisely what we would not expect.
Yet the radically analytic languages I have referred to do show signs of
grammaticalization, albeit the forms are not yet bound ones. This, too, is what
we would expect, and would find puzzling if absent. In Fongbe, an imperfective
marker wὲ has emerged, likely from a postposition, which in the modern language
could be treated as an inflection:

(11) Kɔkú ɖò àsɔ́n ɔ́ ɖù wὲ.

Koku be.at crab  eat 
‘Koku is eating the crab.’ (Lefebvre & Brousseau 2002: 96)

In Palu’e in Central Flores, a new first-person singular subject marking clitic has
developed (Donohue 2009).
In Mandarin, since the seventh century CE (Li & Thompson 1976), the marker
bǎ has emerged from the meaning take:

Nǐ bǎ jiǔ màn-màn-de hē.

you  wine slowly drink
‘You drink the wine slowly.’ (Li & Thompson 1981: 464)

In a future stage of Mandarin this, as well as other items that cleave closely to roots
such as nominalizer zi, could become bound morphemes.
Also, in Mandarin, the modern usage of numeral classifiers began developing in
the second century  (Norman 1988: 115–17), and diachrony has rendered
them quite often semantically unpredictable. Zhī is used with animals (although
only some of them) and birds, but is also used with eyes, hands, suitcases, and
boats. Tiáo is most immediately identified with long, thin things; less likely to
come to mind is that it is also used with proposal, voice, scheme, and ‘piece of
news’. Bǎ is used with things that one holds, such as knives and teapots, but also
with chairs—and the experience of aging (niánjì). As such, Gao (1998) notes that
Mandarin speakers’ mental representation of classifiers is subdivided between
three classes of association, one transparent, one prototypical (metaphorically
extended in a synchronically processible fashion) and one arbitrary. This can be
analysed as the emergence of grammatical gender, that is, contextual morphology.
(Cf. Grinevald & Seifart 2004 on the likeness of noun class marking and
grammatical gender.)

10.5 Argument No. 3: radical analyticity is rare

Because linguists tend to be familiar with analyticity from the textbook example of
Chinese, as well as from any acquaintance with creole languages, it can seem that
analyticity is as likely a state in a language as any other.
However, this is not true when it comes to, especially, radically analytic
languages. Outside of creole languages, where we take it as uncontroversial that
adult learning was the cause of the analyticity, radically analytic languages are
actually rare. Donohue & Denham (forthcoming), in their survey of the World Atlas of
Language Structures, find none outside of the areas I have cited.
If we treat Sinitic as about ten languages, Hmong-Mien as about twenty (a high
estimate according to most accounts), Tai-Kadai as about a hundred according to
Ethnologue, and treat about 130 of the 168 Austroasiatic languages tabulated by
Ethnologue while subtracting Munda and Aslian (again, yielding a likely high
tally), then in East and Southeast Asia there are about 260 radically analytic
languages. Furthermore, the analyticity of these can be treated as tracing to the
analyticity of Chinese alone (McWhorter 2016). In the meantime, outside of these
languages, the tally of radically analytic languages in Africa, Flores, Timor, and the
island of New Guinea is about three dozen at most.
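The arithmetic behind these tallies can be laid out explicitly. The following is a minimal Python sketch that uses only the approximate figures given above; the variable names are illustrative, the figure of 7,000 is the round number for the world's older languages, and counting the whole Sinosphere as a single areal case is a simplifying assumption made here for the calculation.

```python
# Approximate counts of radically analytic languages, as estimated in the text.
# All figures are rough (and deliberately generous) estimates, not exact counts.
sinosphere = {
    "Sinitic": 10,
    "Hmong-Mien": 20,
    "Tai-Kadai": 100,
    "Austroasiatic (excluding Munda and Aslian)": 130,
}
elsewhere = 36          # "about three dozen at most" (Africa, Flores, Timor, New Guinea)
world_languages = 7000  # round figure for the world's older (non-creole) languages

sinosphere_total = sum(sinosphere.values())
grand_total = sinosphere_total + elsewhere


def share(count, total=world_languages):
    """Return count as a percentage of total, rounded to one decimal place."""
    return round(100 * count / total, 1)


print(sinosphere_total)      # 260
print(grand_total)           # 296
print(share(grand_total))    # 4.2
# Assumption: the Sinosphere's analyticity counts as one areal development.
print(share(elsewhere + 1))  # 0.5
```

On these estimates, radically analytic languages make up roughly 4 per cent of the total, and about half of one per cent if the Sinosphere is counted once.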
How often the linguist encounters sentences of Mandarin, plus how familiar
creole languages have become within the field, can distort our sense of the bigger
picture. No radically analytic indigenous language appears ever to have been
reported in:

1. North America, South America, or Australia

2. The four families indigenous to all of Africa other than a tiny pocket of
languages in one of those families
3. Dravidian, Uralic, Altaic, the Caucasian families, Yeniseian, or any
‘Paleosiberian’ group
4. Indo-European.

A feature manifested in a mere few hundred of the world’s 7,000 older (as
opposed to creole) languages qualifies not as an ordinary result (‘Language
X simply lost its inflections’) but as an unusual circumstance. This is even more
the case if the feature manifests itself in only a few dozen of 7,000, which is the result if we
count the analyticity of the Sinosphere as an areal feature spread from Chinese. It
is clear that radical analyticity is not a state that a language reaches easily and, in
fact, everything we know about how languages transform over time makes it
difficult to see how such a state could occur in stepwise fashion.
However, the origin of radical analyticity in acquisition by adults is richly
observed and thoroughly predictable. The scientific benefit of cordoning off creoles
into this origin scenario while assuming an unspecified different one for other
radically analytic languages is unclear. This bifurcated approach would be appro-
priate if there were evidence that large-scale acquisition of a language by adults
was impossible before the emergence of the transatlantic slave trade in the
fifteenth century CE. Obviously, however, there is not. Rather, we could treat
creoles as revealing to us how other languages reached a state which, according to
observable processes of stepwise grammar-internal evolution, is a mystery.
In short, the common idea that a given language simply ‘lost its inflection’ is
less coherent than it seems. Lack of stress on final syllables vastly undershoots
what would be necessary for a language to reach a radically analytic state, and
languages are not empirically recorded to undergo such a process short of
extensive acquisition by adults.
I will finally discuss two counterproposals to my reasoning.

10.6 On claims dissociating creolization from ossified acquisitional capacity

Some creole specialists have attempted to dissociate even creolization from
the effects of adult acquisition. Mufwene (2001), Aboh & Ansaldo (2007), and
Aboh (2015) propose a theoretical economy of a different kind: that creole genesis
is simply a matter of language mixture, with simplification playing no more
significant a part in creoles’ birth than in how languages change elsewhere
worldwide. Mufwene, for example, proposes (2001: 80–105) that there is no
qualitative distinction between the emergences of standard English, African-
American Vernacular English, and Gullah Creole English: all were the result of
the mixture of features within the ‘ecology’ of the linguistic contexts in which they
emerged, analogously to the mechanisms of population genetics.
The idea that the association of creoles with pidginization has been a mistake
has become familiar among linguists, to the point that I must spell out that my
assumptions will not incorporate this proposal, often termed the ‘Feature Pool’
hypothesis. This hypothesis is motivated partly by a claim that while creoles’
analyticity—such as that of Sranan Creole English or Haitian Creole French—may
seem to contrast with European languages’ morphology, in actuality English is
only moderately inflected, spoken French is much less inflected than its written
version suggests, and meanwhile the substrate languages of many creoles are the
radically analytic ones mentioned above, such as Gbe and Yoruba.
The implication is that these creoles are analytic simply because their source
languages were: as Mufwene specifies (2009: 386), ‘The extent of morphological
complexity (in terms of range of distinctions) retained by a “contact language”
largely reflects the morphological structures of the target language and the par-
ticular languages that it came in contact with’.
However, an equal number of creoles are based on robustly inflected Iberian
languages, and/or have robustly inflected substrate languages such as Bantu, West
Atlantic, Nilo-Saharan, and even Austronesian languages, and yet are as analytic
as Sranan and Haitian. Linguists supporting the Feature Pool hypothesis have yet
to respond to such observations. For example, Palenquero Creole Spanish was
created by Kikongo speakers, so that both of the languages in the ‘pool’ were
heavily inflected:

(13) Kikongo (Bentley 1887:526) (8 = noun class 8 plural)

O ma-tadi ma-ma ma-mpembe ma-mpwena
 8-stone 8- 8-white 8-big
i ma-u ma-ma tw-a-mw-ene.
 8-that 8- we-them-see-

(14) Spanish
Est-a-s piedr-a-s grande-s y blanc-a-s
-- stone-- big- and white--
son las que hemos visto.
.3 ..  have.1 see..
‘These great white stones are those which we have seen.’

Palenquero is nevertheless a highly analytic language. The facts are similar for all of the
Portuguese-based creoles, as well as Nubi Creole Arabic and the Aboriginal
English-based creoles of Australia. Chinook Jargon creolized as well, and despite
its source languages all being richly inflected, the creole version was as analytic as
Sranan and Haitian (Grant 1996). Adherents of the Feature Pool hypothesis have
not responded to such observations, and it is difficult to see how their framework
could accommodate them.
In this presentation, therefore, I maintain on the basis of the argumentation
I have presented that adult acquisition does play a decisive and diagnostic role in
creole genesis. My aim is to extend this analysis to languages other than creoles.

10.7 On a phonological pathway to radical analyticity

Hyman (2004) proposes a grammar-internal diachronic pathway to radical analyticity.
He reconstructs that what caused the difference between verbs in the GYN
languages, usually monosyllabic or at most bisyllabic, and the heavily affixed ones
in Narrow Bantu languages was the development of a phonological template
disallowing verbs of more than two syllables.
I suggest that extensive adult acquisition is a preferable explanation for both the
GYN verb’s lack of inflection and its phonotactics.
For one, the process Hyman describes has been proposed, to my knowledge,
nowhere else. Hyman’s account, in that light, is more descriptive than explanatory.
That is: the literature on language change does not record it as a crosslinguistic
commonplace that languages permitting richly multisyllabic words gradually take
on a phonological ‘template’ limiting words to one or two syllables, with this treated
as an ordinary phonological development alongside processes such as nasalization
or resyllabification. I submit that an adult acquisition account has more explanatory
power.
Second, the templatic account contravenes the tendency for languages to resist
letting phonological processes eliminate grammatical morphemes. Hyman’s
account requires that speakers of a language ‘drifted’ into a disyllabic or mono-
syllabic restriction even on the pain of eliminating grammatically crucial affixes,
replacing them with free morphemes—despite linguists’ well-known findings that
speakers resist phonological erosion when it threatens grammatical morphemes
(cf. Guy 1991; Carstairs-McCarthy 2010). Counterproposals to some reported
cases of this morphologically conditioned sound change (Hill 2014) have not
disproven the tendency itself.
Third, pidginization, specifically, explains the GYN situation as well as a
templatic explanation, and even better, in proceeding from an empirically
observed phenomenon. To wit, the reason words might become radically, as
opposed to modestly, shorter in a language, to such a degree as to force a vast
restructuring of the grammatical system, is the language’s transformation by non-
native acquirers who are less likely to master lengthier words (as well as gram-
matical features). To the extent that the GYN languages restrict their verbs to a
maximum of two syllables, it is relevant that, as pidgin specialist Mühlhäusler
(1997: 140) puts it, ‘There appears to be a tendency in most stable Pidgins,
whatever their sub- and superstrata languages and whatever their jargon prede-
cessors, to favour open syllables and words of the canonical shape CVCV.’

10.8 Conclusion

My goal has been to demonstrate the arguments for, and advantages of, assuming
that radical analyticity traces solely to extensive adult acquisition. Under this
analysis, radical analyticity sparks a search for sociohistorical factors that would
entail such adult acquisition. The processes in question occurred before written
history (otherwise, they would long have been readily apparent) and therefore the
investigation of the relevant sociohistorical factors for the various clusters of
radically analytic languages is still in progress (McWhorter 2016, in preparation).
The advantage to my hypothesis is theoretical economy: rather than positing
two pathways to radical analyticity—one of them mechanically incommensurate
with what is known of how languages change—we could posit a single one. As a
result, radical analyticity could be treated as a clue to social history otherwise
difficult to reconstruct or even unrecoverable. We assume that the featherless bird
has been plucked, not that it has lost its feathers by chance. We might approach
the language devoid of bound inflection similarly, to the benefit of our models of
diachronic change and language contact.

11 Different trajectories of morphological overspecification and irregularity under imperfect language learning
Aleksandrs Berdicevskis and Arturs Semenuks

11.1 Introduction

11.1.1 Why study complexity?

In the Introduction, Arkadiev & Gardani (Chapter 1, this volume: 5) list the four most
important open questions in the study of morphological complexity. In our view,
the first three questions become important and interesting only as a means to
answer the fourth question, which could be reworded as ‘How is morphological
complexity related to socioecological factors?’. The true value of this question is
not even that it relates morphology and extralinguistic characteristics of the
environment in which the language is spoken, but that it makes complexity
more than a mere parameter of crosslinguistic variation. Complexity becomes a
parameter involved in explanatory theories, giving us the possibility to use it in
order to understand how language is structured. As was discussed in the
Introduction, in these theories complexity is a dependent variable, while socioecological
parameters are predictors. This means that if the theories are correct, we can
better understand why linguistic structures are distributed across languages the way
they are, how the processes of language change and social interaction are structured
and work together, and how language is organized and functions in the brain.
If not for this explanatory attempt, the first three questions from Arkadiev and
Gardani’s list (Can we define morphological complexity? Can we find an under-
standing of morphological complexity which would be applicable to all languages
and quantify this understanding? Can we compare and typologize languages in
terms of morphological complexity?) would, in our view, be better described as
brain teasers rather than research avenues. Brain teasers are not at all useless, but
given how notoriously difficult it is to address these particular questions, it would
hardly be possible to expect that the potential benefit of finding answers would
outweigh the required effort. Arkadiev and Gardani provide examples which

Aleksandrs Berdicevskis and Arturs Semenuks, Different trajectories of morphological overspecification and irregularity
under imperfect language learning. In: The Complexities of Morphology. Edited by: Peter Arkadiev and Francesco Gardani,
Oxford University Press (2020). © Aleksandrs Berdicevskis and Arturs Semenuks.
DOI: 10.1093/oso/9780198861287.003.0011
suggest that Lithuanian nominal inflection is more morphologically complex than
Turkish. Does this claim per se yield any new information compared to what can
be learned from descriptive grammars of these two languages?
The fourth question, however, changes everything. If we can explain which
factors likely have contributed to Lithuanian being more complex than Turkish,
then the game is worth the candle. That gives us the incentive to ponder about
what complexity is and to search for the means of operationalizing it. Our
contribution to this volume should be read with this value system in mind.

11.1.2 What is complexity?

Having this incentive to deal with all the questions from Arkadiev and Gardani’s
list, let us briefly outline what we mean by complexity in this chapter. As most
would agree, complexity is a multi-faceted phenomenon, and a language can be
complex in several different ways. This volume contains a variety of perspectives
on and approaches to complexity; see Dahl (Chapter 13, this volume) for an
overview. Trying to tackle all aspects of it simultaneously, however, is likely to
hinder progress rather than aid it. In order to usefully limit the scope of this
particular investigation, we will concentrate on two of the facets of complexity that
are, in our view, most crucial: overspecification and irregularity.
We define overspecification as overt and obligatory marking of a semantic
distinction that is not necessary for communication, following McWhorter’s
(2007: 21–8) understanding. The problem with this definition is that it is not at
all obvious what is necessary for communication. McWhorter makes inferences
about what is necessary by comparing the grammars of different languages. If
many of the world’s languages have neither subject-verb agreement nor any
apparent means to compensate for the lack of it, it seems reasonable to hypothe-
size that this feature is redundant and that languages that do possess it have
overspecified grammars.
A more direct way to find out what is necessary would be to run psycholin-
guistic experiments. MacWhinney et al. (1984), for instance, find that Italian
speakers do use the subject-verb agreement markers when establishing semantic
roles in a sentence. Note that this finding does not necessarily contradict the claim
that agreement is an instance of overspecification. That a feature is useful does not
mean it is necessary. Fortunately, in this chapter we will be dealing with an
artificial language where it is obvious what is overspecification and what is not
(see section 11.2).
Another facet of complexity we will discuss is irregularity (McWhorter 2007:
33–5). A linguistic system is irregular to the degree that it cannot be described by
exceptionless deterministic rules. A system that can be so described is, conversely,
predictable and consistent. Intuitively, it is usually quite obvious whether a linguistic
system is regular or not. Systems, however, can be irregular to very different
degrees. While in theory it is clear that the fewer rules are required to describe a
system, the simpler these rules are and the fewer exceptions they have, the more
regular the system is, in practice it is usually difficult to rank several irregular
systems in order of their (ir)regularity, and even more difficult to quantify it.
Again, for the artificial language in this chapter, this task is simpler than it would
be for real languages.
Other facets of complexity exist, but some of them are reducible either to
overspecification or irregularity, while others are, in our opinion, less ubiquitous
and salient. Importantly, overspecification and irregularity are not reducible to
each other. It is easy to imagine a system which has little or no overspecification
but is irregular, and it is equally easy to imagine a highly overspecified but fully
regular system (though these are not that frequent in real languages).
This understanding of complexity, however limited and simplified it is, enables
us to test specific hypotheses about the typology and diachrony of morphological
complexity.

11.1.3 How to study complexity?

Various hypotheses have been proposed to explain the distribution of morphological
complexity among the languages of the world. The ones that arguably have
the strongest empirical support and are most actively discussed in the
literature are those that suggest the existence of a causal link between a large
proportion of non-native speakers in the population and morphological simplifi-
cation (Dahl 2004; Wray & Grace 2007; McWhorter 2007 and Chapter 10, this
volume; Trudgill 2011; Dale & Lupyan 2012).
The evidence in favour of this hypothesis comes mostly from typological
surveys, though rigorous quantitative studies (e.g., Parkvall 2008; Szmrecsanyi &
Kortmann 2009; Bentz & Winter 2013; Bentz et al. 2015) are a minority among
them. Correlational studies of this kind are necessary, but not sufficient (Tily &
Jaeger 2011; Nettle 2012), as other types of evidence are required to demonstrate
and explain the causality (Ladd et al. 2015; Roberts 2018).
Experimental approaches, in particular iterated artificial language learning (IALL)
(Kirby et al. 2008), can be an efficient means to model the simplification and
complexification processes. In a typical IALL setting, a constructed mini-language
is learned by a participant within a limited amount of time, then this participant’s
linguistic output is used as linguistic input (i.e., training data) for the next participant,
and then the iteration is repeated. If the output of the participants in generation n
differs from their input, then the participants in generation n+1 will learn a changed
version of the language. This design enables us to observe language evolution in
miniature, as the language changes, being transmitted over ‘generations’.
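The transmission procedure just described can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the experimental software: the `learn` function, its `error_rate` parameter, and the example signals are all invented here, and a 'learner' that merely truncates a form with some probability stands in for a human participant.

```python
import random


def learn(input_language, error_rate, rng):
    """Toy learner (hypothetical): reproduces each signal, but with probability
    `error_rate` drops its final character, a crude stand-in for imperfect learning."""
    output = {}
    for meaning, signal in input_language.items():
        if len(signal) > 1 and rng.random() < error_rate:
            signal = signal[:-1]
        output[meaning] = signal
    return output


def run_chain(initial_language, n_generations, error_rate, seed=0):
    """Transmit a language along a chain: the output of generation n
    becomes the training input of generation n + 1."""
    rng = random.Random(seed)
    history = [initial_language]  # generation 0 is the initial input language
    for _ in range(n_generations):
        history.append(learn(history[-1], error_rate, rng))
    return history


# Invented toy language (NOT the experimental stimuli): meaning -> signal.
initial = {"one round animal": "segu", "many round animals": "segul"}
history = run_chain(initial, n_generations=10, error_rate=0.2)
print(len(history))  # 11: generation 0 plus ten generations of learners
```

With `error_rate=0.0` the language reaches the last generation unchanged; any non-zero rate lets changes accumulate, because each learner is trained on the previous learner's output rather than on the original language.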
The IALL approach does have its limitations. The possibility to observe language
change in the laboratory and to have the full control over the environment comes
at the price of naturalness. The artificial languages are by necessity small and
relatively simple, and the learning usually takes less than one hour. Nonetheless,
while the experimental results should be treated with due caution, they can be a
valuable complement to the typological surveys.
Suppose a typological study shows a correlation between the proportion of
non-native speakers and the absence of inflectional morphology, and suppose its data and
methods are completely reliable and trustworthy. Even in this best-case scenario,
we still do not know whether there really exists a causal link between non-native
acquisition and simplification (though we have good reasons to hypothesize that).
Moreover, we do not get an insight into how exactly adult acquisition facilitates
simplification (if it does). An iterated learning experiment can serve as a means
both to test the presence of the causal link and to identify a potential causal
mechanism.

11.1.4 Why does complexity decrease?

Bentz & Winter (2013: 3–4) list three potential mechanisms of contact-induced
case loss (which can be generalized to other instances of morphological simplifi-
cation): imperfect acquisition by adult learners; the tendency of native speakers to
reduce morphosyntactic complexity of their speech when talking to foreigners; the
tendency of loan words to combine with more productive inflections, forcing the
least productive ones out (Barðdal & Kulikov 2009). The first mechanism from
this list seems to be mainstream in the typological, sociolinguistic, and evolution-
ary literature (Nettle 2012). Indeed, in the literature on language acquisition, there
is a consensus that morphology is hard for non-native learners, and that concerns
both production and perception, both tutored and untutored learners (DeKeyser
2005: 6–7).
The main factor causing simplification then is presumed to consist in the
differences between native (child) and non-native (adult) language acquisition.
However, given this, another question arises: what aspects of these differences and
what conditions are necessary to cause simplification? How deep into these
differences do we have to delve in order to find a proper explanation? It is possible
that deep differences in cognitive biases between children and adults have to be
invoked, together with nuanced properties of social network structure or other
cognitive processes besides learning.
However, it is also possible that the answer lies on the surface: children can
(usually) master a language perfectly, while adults (usually) cannot (Bley-Vroman
1989: 43–4), and that by itself is enough to provoke simplification processes. It
seems safe to claim that imperfect learning is one of the driving forces behind
simplification. Can we go further and assume it is the only driving force? While
this hypothesis may be too simplistic, it is reasonable to start the search for
explanations and mechanisms by testing it.
In this chapter, we analyse the data from Berdicevskis & Semenuks (submitted),
one of the largest-scale (in terms of the number and the length of transmission
chains) IALL experiments so far that directly address linguistic complexity. In
Berdicevskis and Semenuks (submitted) we showed that imperfect language
learning by itself reduces overspecification. Here we focus on irregularity (see
section 11.1.2) and show that it behaves differently from overspecification. We also investigate
how the two facets of complexity interact with learnability of the language.
In section 11.2, we summarize the methodology of Berdicevskis & Semenuks
(submitted). In section 11.3 we describe the trajectory of overspecification, and in
section 11.4, that of irregularity. In section 11.5, we draw on the existing know-
ledge about language acquisition to explain the observed differences. In section
11.6, we conclude.

11.2 Materials and methods

In order to investigate whether imperfect learning could lead to higher rates of
morphological overspecification loss, we designed and ran an IALL experiment.
As mentioned in section 11.1.3, the approach provides the opportunity to model
language change in a controlled experimental setting.
Each transmission chain contained 10 generations, and each generation con-
sisted of a single participant. After the initial instructions, in the training stage of
the experiment the participants learned an artificial language, that is, learned to
match 16 ‘sentences’ to 16 stimuli pictures. After that, in the testing stage the
participants first matched sentences with their appropriate pictures and then
produced sentences that they considered to correspond to each of the individual
pictures. The set of all of the sentences that they produced in the last part of
the experiment was used as the learning input language for the next generation.
The initial artificial languages that we generated as input for all of the generation 1
participants contained a redundant agreement marker that was not necessary in
order to identify which picture corresponded to each sentence. In order to
investigate whether imperfect learning could lead to the loss of morphological
overspecification (in our case, the semantically redundant agreement marker),
the amount of time given to the participants to learn the language was manipulated
across three types of transmission chains. In the normal condition, the
participants in every generation received an amount of time that pilot experiments
suggested was sufficient to learn the language fully; in the temporarily interrupted
condition, the participants in generations 2–4 received less time to learn the language; and in the
OUP CORRECTED AUTOPAGE PROOFS – FINAL, 23/8/2020, SPi

288     

permanently interrupted condition chains all participants after the first generation
received less time. A more detailed description is given in section 11.2.2.
Before that, however, we want to note the apparent fact that the IALL approach
lacks ecological validity due to a variety of both quantitative and qualitative
differences between language learning in an experimental setting and in the real
world. Because of that, the claims that one makes based only on IALL experiments
need to be tempered. Taken as a piece of a larger picture, however, they provide
important supporting evidence and new perspectives on the questions of interest.
In the context of the current study, in particular, although we ultimately are
interested in differences between native and non-native acquisition, we are not
contrasting adult and child learners in our experiment. However, since we are
interested in whether the difference between normal and imperfect learning by
itself can be a sufficient cause for morphological simplification, we consider our
model to possess the necessary external validity.

11.2.1 Artificial language structure

Each of the sentences in the languages learned by the participants identified a
picture. We will refer to the set of all pictures as the languages’ meaning space (see
Figure 11.1). The meaning space had three dimensions, that is, three characteris-
tics that each of the sixteen pictures could be uniquely identified by: the agent
performing the action (round animal or square animal), the number of agents
(one or many) and the action being performed (no action, falling apart, growing
antlers or flying).
The structure of the initial input languages (we will refer to them as generation
0 languages) is represented in Figure 11.1.¹ The sentences in the languages trans-
parently mapped onto the meaning space: the noun stem identified the agent, the
plural marker (or its absence) identified the number of agents, and the verb stem
(or its absence) identified the action. Importantly, the agreement marker is
semantically redundant, in the sense that its omission would not affect the
identification of the correct picture in the meaning space – the picture is uniquely
specified by the other three morphemes. Thus, in the generation 0 languages the
agreement system is an instance of morphological overspecification. See Di Garbo
(Chapter 8, this volume) for a detailed study of the changes of gender-agreement
systems in a sample of real-world languages in relation to complexity.

¹ We used fifteen different isomorphic languages, as is common in IALL experiments. When
reporting results, however, we orthographically map all the languages we have onto the example
language in Figure 11.1: the ﬁrst letter of the word for the round animal in the chain’s generation 0
language becomes s, the second letter becomes e, and so on. This procedure makes the comparisons
between languages easier while preserving all the information about the changes.

Event          Number     Agent: round animal     Agent: square animal
none           singular   seg_N                   fuv_N
none           plural     seg_N-l_PL              fuv_N-l_PL
fall apart     singular   seg_N m_V-o_AGR         fuv_N m_V-i_AGR
fall apart     plural     seg_N-l_PL m_V-o_AGR    fuv_N-l_PL m_V-i_AGR
grow antlers   singular   seg_N r_V-o_AGR         fuv_N r_V-i_AGR
grow antlers   plural     seg_N-l_PL r_V-o_AGR    fuv_N-l_PL r_V-i_AGR
fly            singular   seg_N b_V-o_AGR         fuv_N b_V-i_AGR
fly            plural     seg_N-l_PL b_V-o_AGR    fuv_N-l_PL b_V-i_AGR

Figure 11.1. The meaning space of the experimental languages with the
corresponding sentences from an example generation 0 language
Notes: Subscript N denotes noun stems, V = verb stems, PL = plural marker, AGR = agreement marker.
Morphemes are hyphenated and subscripts are provided for clarity’s sake. Glosses for the meanings of
the sentences are provided in parentheses.
Source: Adapted with permission from Berdicevskis & Semenuks (submitted).
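
The structure of the example generation 0 language can be made concrete with a short sketch. The code below is illustrative only and is not the authors' experiment code; the names `sentence`, `MEANINGS`, `full`, and `stripped` are assumptions introduced here. It builds the sixteen sentences of Figure 11.1 and checks that each meaning still maps to a unique sentence once the agreement marker is removed, which is the sense in which the marker is redundant.

```python
from itertools import product

# Morphemes of the example generation 0 language (Figure 11.1).
NOUN = {"round": "seg", "square": "fuv"}
PLURAL = {"singular": "", "plural": "l"}
VERB = {"none": "", "fall apart": "m", "grow antlers": "r", "fly": "b"}
AGREEMENT = {"round": "o", "square": "i"}


def sentence(agent, number, event, with_agreement=True):
    """Build the sentence for one meaning; meanings without an event have no verb."""
    noun = NOUN[agent] + PLURAL[number]
    if event == "none":
        return noun
    verb = VERB[event] + (AGREEMENT[agent] if with_agreement else "")
    return f"{noun} {verb}"


# 2 agents x 2 numbers x 4 events = 16 meanings.
MEANINGS = list(product(NOUN, PLURAL, VERB))

full = {m: sentence(*m) for m in MEANINGS}
stripped = {m: sentence(*m, with_agreement=False) for m in MEANINGS}

# The marker is redundant if every meaning keeps a unique sentence without it.
agreement_is_redundant = len(set(stripped.values())) == len(MEANINGS)
print(full[("round", "plural", "fly")], agreement_is_redundant)
```

Dropping the noun stem or the plural marker instead would collapse distinct meanings onto the same sentence, which is why only the agreement marker counts as overspecification here.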

11.2.2 Experimental procedure

After the initial introductory instructions, the participants learned the language in
the training stage of the experiment. The stage consisted of a number of training
blocks interspersed with interim test blocks. In the training blocks, the partici-
pants saw all of the pictures from the meaning space, which were presented in a
random order and accompanied by the sentence corresponding to the picture in
the participants’ input language. Each picture-sentence pair remained on the
screen for four seconds, after which the next pair appeared. In the interim test
blocks the participants were shown one by one eight pictures randomly selected
from the meaning space and were asked to type in the corresponding sentences for
each of them. The instructions preceding the training block prohibited the
participants from taking any notes during the experiment.
In order to model the difference between normal and imperfect learning, we
manipulated the number of training and interim test blocks that the participants
received. Normal learner generation participants received six training blocks,
whereas imperfect learner generation participants received three blocks. In order
to investigate how the share of imperfect learners in a population would affect
the tendency to eliminate morphological overspeciﬁcation from the language
spoken by its members, we compared the development of generation 0 languages
in transmission chains in three different conditions: normal, temporarily inter-
rupted and permanently interrupted. Figure 11.2 illustrates the differences in the
numbers of normal and imperfect learner generations between the conditions.

(a) Normal transmission
→ L → L → L → L → L → L → L → L → L → L

(b) Temporarily interrupted transmission
→ L → S → S → S → L → L → L → L → L → L

(c) Permanently interrupted transmission
→ L → S → S → S → S → S → S → S → S → S

Figure 11.2. A schematic representation of the chains in the normal (a), temporarily
interrupted (b), and permanently interrupted (c) conditions
Notes: L = generations with long (full) learning time, S = generations with reduced learning time
(imperfect learners). Arrows denote languages transmitted between generations. The very first arrows
denote pre-generated input languages for the first generation learners.
Source: Reproduced with permission from Berdicevskis & Semenuks (submitted).

Since the experiment contained 15 generation 0 languages, each of which was
used once in each of the three experimental conditions, and each of the
transmission chains required 10 participants, we recruited and analysed the data
from a total of 450 participants (140 female, 310 male, mean age = 30.5, SD = 9.2).
The participants were recruited online and took part in the experiment on
a webpage created using the jsPsych JavaScript library (de Leeuw 2014).
Unbeknownst to the participants, the webpage assigned them to a new generation
in a randomly chosen transmission chain before the start of the experiment. The
experiment was conducted in Russian, and all of the participants self-reported
speaking Russian natively and being at least 16 years old. Because Russian has a
salient gender agreement system of its own, we could be sure the native language of
our participants would not push them to shed agreement in the experiment by itself.
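
The design described in this section can be summarized in a small sketch. It is a hypothetical reconstruction for illustration, not the code that ran the experiment; the function `schedule` and the constant names are assumptions. It encodes how many training blocks each generation receives in each condition and reproduces the participant count.

```python
N_GENERATIONS = 10        # generations (participants) per transmission chain
N_INITIAL_LANGUAGES = 15  # generation 0 languages, each used once per condition
LONG, SHORT = 6, 3        # training blocks for normal vs. imperfect learners
CONDITIONS = ("normal", "temporarily interrupted", "permanently interrupted")


def schedule(condition):
    """Return the number of training blocks for generations 1..10 of one chain."""
    if condition == "normal":
        reduced = set()
    elif condition == "temporarily interrupted":
        reduced = {2, 3, 4}
    elif condition == "permanently interrupted":
        reduced = set(range(2, N_GENERATIONS + 1))
    else:
        raise ValueError(f"unknown condition: {condition}")
    return [SHORT if g in reduced else LONG for g in range(1, N_GENERATIONS + 1)]


n_chains = N_INITIAL_LANGUAGES * len(CONDITIONS)
n_participants = n_chains * N_GENERATIONS

for condition in CONDITIONS:
    print(condition, schedule(condition))
print(n_chains, "chains,", n_participants, "participants")
```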

11.3 The trajectory of overspeciﬁcation

The normal transmission chains tended to preserve morphological overspecification
to a much greater extent than chains in either the temporarily or the
permanently interrupted transmission condition, thus supporting the hypothesis
that a larger share of imperfect learners in a population would lead to the loss of
morphological overspeciﬁcation in the language of that population. In this section,
we present a condensed description of some of the results from Berdicevskis &
Semenuks (submitted), complementing it with some additional observations.

11.3.1 Qualitative analyses

The qualitative analysis of the final languages revealed a general trend for the
structure of the languages to deteriorate. Several reasons could have led to this,
most likely the underestimated difficulty of learning the language even with six
training blocks and the absence of true communicative pressures in the experiment.
However, this deterioration of structure did not affect all aspects of the language
equally, nor was it equally likely in chains of all three conditions. The agreement
system was eroded by the participants much more often than the other morphological
aspects of the system, and this erosion of structure was less frequent in the chains
with normal transmission. Nonetheless, it is important to keep in mind that the
learning was not entirely perfect in the normal condition either. Thus, when speaking
about imperfect learning we will mean the degree of imperfect learning rather than
its presence or absence.
The system was fully preserved in just three languages, two of which were
generated in normal condition chains and one in a temporarily interrupted condi-
tion chain, and it was also almost fully preserved in three other languages, all of
which belonged to normal condition chains. An example of a final (generation 10)
language without any damage to the agreement system can be seen in Table 11.1.

As one can see, the last generation language fully preserves the generation 0
agreement system: -o is consistently used to mark agreement with seg, and -i with
fuv. The only deviation from the generation 0 language structure is the loss of the
verb root in one of the sentences of the language (gen. 0 segl ro => gen. 10 segl o);
however, this change still conserved the correct agreement suffix.
The system disappeared, in turn, in fourteen languages, three of which
belonged to the normal condition, five to the temporarily interrupted condition,
and six to the permanently interrupted condition. An example of a generation 10
language that has fully lost the agreement system can be seen in Table 11.2.
As Table 11.2 shows, the generation 10 language in this chain has fully lost the
-i agreement pattern used for fuv in the generation 0 language, and now uses -o in
all sentences, which now is more reasonably analysed as a part of the verb stems
than an agreement marker. One can also note that one of the noun stems changed
from fuv to fug, likely under the influence of seg.

Table 11.1. An example of a final language with a fully preserved agreement system

Event          Number   Gen 0, round animal    Gen 0, square animal   Gen 10, round animal   Gen 10, square animal
-              sg       seg                    fuv                    seg                    fuv
-              pl       segl                   fuvl                   segl                   fuvl
fall apart     sg       seg mo                 fuv mi                 seg mo                 fuv mi
fall apart     pl       segl mo                fuvl mi                segl mo                fuvl mi
grow antlers   sg       seg ro                 fuv ri                 seg ro                 fuv ri
grow antlers   pl       segl ro                fuvl ri                segl o                 fuvl ri
fly            sg       seg bo                 fuv bi                 seg bo                 fuv bi
fly            pl       segl bo                fuvl bi                segl bo                fuvl bi

Table 11.2. An example of a language with a fully lost agreement system

Event          Number   Gen 0, round animal    Gen 0, square animal   Gen 10, round animal   Gen 10, square animal
-              sg       seg                    fuv                    seg                    fug
-              pl       segl                   fuvl                   segl                   fugl
fall apart     sg       seg mo                 fuv mi                 seg mo                 fug mo
fall apart     pl       segl mo                fuvl mi                segl mo                fugl mo
grow antlers   sg       seg ro                 fuv ri                 seg ro                 fug ro
grow antlers   pl       segl ro                fuvl ri                segl ro                fugl ro
fly            sg       seg bo                 fuv bi                 seg bo                 fug bo
fly            pl       segl bo                fuvl bi                segl bo                fugl bo

In the other chains the initial agreement system substantially deteriorated, but
did leave some remnants in generation 10 languages, which made it difﬁcult to
precisely characterize the level of system erosion in a qualitative yet objective way.
Nevertheless, taking the above ﬁndings together, we can see that chains including
imperfect learner generations were more likely to completely shed the agreement
system and less likely to preserve it.

11.3.2 Quantitative analyses

Here we focus on a specific quantitative analysis which operationalizes morphological
overspecification in our artificial languages as the expressibility of the only
redundant feature, viz. verbal agreement. Expressibility is defined as the propor-
tion of pairs of sentences where meaning differs in (and only in) the agent, and
where the surface forms of the verbs are different. The concept can be easily
understood by means of Table 11.2. For every language, we ignore the first two
rows (as they have no verbal meanings) and then compare pairwise the two cells in
the other six rows: are the verbs the same or different? In generation 0, the verbs
are always different, and expressibility of agreement would equal 1. In generation
10, the verbs are always the same, and expressibility of agreement would equal 0.
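
The calculation just described can be sketched as follows, using the two languages of Table 11.2. This is a minimal illustration rather than the authors' analysis script; the helper names and the assumption that the verb is the second whitespace-separated word of a sentence are introduced here.

```python
def verb_of(sentence):
    """Assumption: the verb is the second whitespace-separated word, if present."""
    words = sentence.split()
    return words[1] if len(words) > 1 else ""


def expressibility(language):
    """Share of (round, square) sentence pairs whose verbs differ in surface form.

    `language` maps (event, number) to a (round-animal, square-animal) sentence
    pair and contains only the six rows that have a verbal meaning.
    """
    pairs = list(language.values())
    differing = sum(verb_of(r) != verb_of(s) for r, s in pairs)
    return differing / len(pairs)


# Generation 0 and generation 10 languages from Table 11.2.
GEN0 = {
    ("fall apart", "sg"): ("seg mo", "fuv mi"),
    ("fall apart", "pl"): ("segl mo", "fuvl mi"),
    ("grow antlers", "sg"): ("seg ro", "fuv ri"),
    ("grow antlers", "pl"): ("segl ro", "fuvl ri"),
    ("fly", "sg"): ("seg bo", "fuv bi"),
    ("fly", "pl"): ("segl bo", "fuvl bi"),
}
GEN10 = {
    ("fall apart", "sg"): ("seg mo", "fug mo"),
    ("fall apart", "pl"): ("segl mo", "fugl mo"),
    ("grow antlers", "sg"): ("seg ro", "fug ro"),
    ("grow antlers", "pl"): ("segl ro", "fugl ro"),
    ("fly", "sg"): ("seg bo", "fug bo"),
    ("fly", "pl"): ("segl bo", "fugl bo"),
}

print(expressibility(GEN0), expressibility(GEN10))
```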
As Figure 11.3 shows, although the expressibility of agreement declined in all
conditions, it declined to a lesser extent in the normal transmission chains. This
pattern is in accord with the qualitative findings reported above. As we mentioned
in section 11.3.1, learning is imperfect in all three conditions, but to a lesser degree
in the normal one.
Taken together, the results of the experiment provided experimental support
for the hypothesis that a large share of non-native learners in the population of
speakers of a language could lead to the simplification of the morphological
structure of that language. More specifically, the study showed that imperfect
learning of a language could lead to the loss of morphological overspecification.

11.4 The trajectory of irregularity

The initial languages used in the study described above are perfectly regular.
While the rule ‘change the verb form depending on the agent’ is redundant, it is
still a rule, deterministic and exceptionless, as are the other properties of the initial
languages. Irregularity in this setup is equal to zero and thus cannot decrease. At
ﬁrst glance, this setup cannot then be used to test any hypotheses about the
potential role of imperfect learning in regularization. Manual inspection of the
evolving languages, however, quickly reveals noticeable changes in irregularity.
Due to the reasons outlined above they always start with an increase, but some
transmission chains show less trivial patterns later. In this section, we present and
analyse these patterns.

[Figure 11.3: line plot of overspecification (y-axis, 0.00 to 1.00) against generation (x-axis, 0 to 10), with one line per transmission condition: normal, temporarily interrupted, permanently interrupted]
Figure 11.3. Change of the overspecification of agreement, as measured by
expressibility, over time
Note: Shaded regions denote the standard error.

Irregularity emerges because participants fail to learn or to apply a certain rule.
Most often, this is the agreement rule, and we will focus solely on the irregularity
of agreement (as we did with overspeciﬁcation in section 11.3).

11.4.1 Probability matching

While the participants often fail to learn the rule that governs the distribution of
the two agreement markers in the initial languages, they seldom ignore the fact
that there are two different markers. When a deterministic distribution rule is not
available to learners, they often resort to probability matching, that is, reproduce
the variants with approximately the same relative frequency as in the input
(Hudson Kam & Newport 2009; Smith & Wonnacott 2010: 447, ﬁgure 1), but
without a clear consistent rule for when to use which variant. Figure 11.4 dem-
onstrates that our participants do the same with the agreement markers. In all
three conditions, the mean relative frequency of the round-animal marker does
not deviate much from the initial 50% (and, consequently, the same is true for the
second marker). The narrow error bars show that relative frequencies in the
individual chains do not deviate much from 50% either (i.e., it is not the case
that the mean 50% is a result of half the chains using one marker in 100% of cases
and the other half in 0% of cases).

[Figure 11.4: line plot of the proportion of the round-animal agreement marker (y-axis, 0.00 to 1.00) against generation (x-axis, 0 to 10), with one line per transmission condition: normal, temporarily interrupted, permanently interrupted]
Figure 11.4. Relative frequency of the agreement marker which denoted the round
animal in the initial language of the chain
Note: Shaded regions denote the standard error.

Out of our forty-five chains, fourteen lose agreement completely (see section
11.3.1). Some of those completely replace one marker by another, as does the language
in Table 11.2, but this happens only in three chains; in the other chains both
markers get reanalysed as parts of the verb stems. The most common scenario is
represented in Table 11.3.
In the final language, all three verbs have only one form. Two (m- and b-)
preserve the original round-animal form with the -o ending, one (r-) preserves the
square-animal form (-i), thus making the relative frequencies of the markers 2/3
and 1/3, respectively. Out of the fourteen agreement-losing chains, nine arrive at
this frequency distribution at the end (counting both cases when it is the round-
animal marker that has frequency of 2/3 and when it is the square-animal one).
Analysis of all the individual chains confirms that while a few chains do replace
one marker by another completely or almost completely, most keep the propor-
tion not too far from 50% throughout all the generations.
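
As an illustration of how the marker proportions are obtained, the sketch below counts verb endings in the final language of Table 11.3. It is a hypothetical helper rather than the authors' code; `marker_frequencies` and its `markers` parameter are assumptions. Passing `markers=None` counts every verb ending that is present rather than only the original two.

```python
from collections import Counter
from fractions import Fraction

# The twelve verb forms of the generation 10 language in Table 11.3.
FINAL_VERBS = ["mo"] * 4 + ["ri"] * 4 + ["bo"] * 4


def marker_frequencies(verbs, markers=("o", "i")):
    """Relative frequency of each verb-final marker.

    With markers=None, every ending that occurs is counted in the denominator.
    """
    endings = Counter(verb[-1] for verb in verbs)
    counted = markers if markers is not None else tuple(endings)
    total = sum(endings[m] for m in counted)
    return {m: Fraction(endings[m], total) for m in counted}


print(marker_frequencies(FINAL_VERBS))
```

For this language the round-animal marker -o accounts for two thirds of the verb forms and -i for one third, matching the distribution described above.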

Table 11.3. A language with a fully lost agreement system

Event          Number   Gen 0, round animal    Gen 0, square animal   Gen 10, round animal   Gen 10, square animal
-              sg       seg                    fuv                    seg                    fuv
-              pl       segl                   fuvl                   segl                   fuvl
fall apart     sg       seg mo                 fuv mi                 seg mo                 seg mo
fall apart     pl       segl mo                fuvl mi                segl mo                fuvl mo
grow antlers   sg       seg ro                 fuv ri                 seg ri                 fuv ri
grow antlers   pl       segl ro                fuvl ri                segl ri                fuvl ri
fly            sg       seg bo                 fuv bi                 seg bo                 fuv bo
fly            pl       segl bo                fuvl bi                segl bo                fuvl bo
Note: seg instead of expected fuv in the third row is not a typo.

It should be noted that in some chains, verb endings different from the original
two emerge. If we calculate the denominator of the ratio as the number of all present
verb endings and not just the original two, the general picture does not change.

11.4.2 Irregularity and overspeciﬁcation

While the agreement markers continue to be present as elements of form, they
lose their connection to the meaning (without being replaced by another element).
In order to measure this trend, we pair up the twelve verb forms in the same way
as we did when measuring expressibility (see section 11.3.2) and compare the last
symbols in the verbs of every pair (manual analysis shows that if agreement is
expressed, it is almost always expressed by the last symbol). For every pair of
symbols, we calculate how often it occurs (out of six possible cases). Pairs where the
symbols are the same get lumped together, regardless of what the symbols actually
are. To quantify irregularity, we calculate the Shannon entropy of the probability
distribution and normalize it by the maximal entropy, see Equation (1).

(1) Irregularity = H(SC)/log₂(6), where SC is the probability distribution of
patterns of agreement expression

This measure is similar to Cuskley et al.’s (2015: 215) Sj measure, used to measure the
variability of sub-rules a participant uses in the formation of irregular past tenses.
Consider some examples. In the final language in Table 11.1 there is only one
pattern of agreement marking: {o, i}, and the same is true for the final language in
Table 11.2 (the same-symbol type). Both languages would get an irregularity score
of zero. So would the final language in Table 11.3: while there are two different
pairs {o, o} and {i, i}, they both fall under the same-symbol pattern. The language
in Table 11.4, however, is less regular. The strategy here is almost the same as in
Table 11.3 with two exceptions: the verb r- preserved the agent marking in singular,
the verb b- in plural. Hence, there are two patterns: the same-symbol pattern (four
cases) and {o, i} (two cases). The language gets an irregularity score of 0.36.

Table 11.4. A language with an irregular distribution of the agreement markers

Event          Number   Gen 0, round animal    Gen 0, square animal   Gen 10, round animal   Gen 10, square animal
-              sg       seg                    fuv                    seg                    fuv
-              pl       segl                   fuvl                   segl                   fuvl
fall apart     sg       seg mo                 fuv mi                 seg mi                 fuv mi
fall apart     pl       segl mo                fuvl mi                segl mi                fuvl mi
grow antlers   sg       seg ro                 fuv ri                 seg ri*                fuv ro*
grow antlers   pl       segl ro                fuvl ri                segl ro                fuvl ro
fly            sg       seg bo                 fuv bi                 seg bo                 fuv bo
fly            pl       segl bo                fuvl bi                segl bi*               fuvl bo*
Note: Cases where agreement is preserved (marked in bold in the original) are marked here with an asterisk.

Irregularity depends on the number of patterns (the more patterns, the higher
irregularity is) and the distribution of their probabilities (irregularity is highest if
all the patterns are equiprobable). Thus, the least irregular language (apart from
the fully regular one, which scores 0) would have two patterns, one of which
occurs only once, and would score 0.25. The most irregular language would have
six equiprobable patterns and score 1. However, this never happens in our data;
the highest observed score is 0.74 (it can be achieved, e.g., by having four patterns:
two that occur twice and two that occur once).
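
The scores quoted above can be checked with a short sketch of the measure in Equation (1). It is illustrative only and is not the authors' implementation; the function names are assumptions, as is the choice to treat a pair of differing final symbols as an unordered set, in line with the {o, i} notation.

```python
from collections import Counter
from math import log2


def pattern(round_verb, square_verb):
    """Pattern of one verb pair, based on the last symbol of each (non-empty) verb."""
    a, b = round_verb[-1], square_verb[-1]
    # All same-symbol pairs are lumped together, whatever the symbol is.
    return "same" if a == b else frozenset((a, b))


def irregularity(verb_pairs):
    """Shannon entropy of the pattern distribution, normalized by log2(6)."""
    counts = Counter(pattern(r, s) for r, s in verb_pairs)
    n = sum(counts.values())
    entropy = sum(c / n * log2(n / c) for c in counts.values())
    return entropy / log2(6)


# Generation 10 verb pairs (round, square) from Table 11.4.
TABLE_11_4 = [("mi", "mi"), ("mi", "mi"), ("ri", "ro"),
              ("ro", "ro"), ("bo", "bo"), ("bi", "bo")]
# Five same-symbol pairs and one deviating pair: the least irregular non-zero case.
ONE_EXCEPTION = [("mo", "mo")] * 5 + [("bo", "bi")]
# Four patterns with counts 2, 2, 1, 1 (hypothetical verb forms).
FOUR_PATTERNS = [("mo", "mo"), ("mo", "mo"), ("ro", "ri"),
                 ("ro", "ri"), ("bo", "ba"), ("bi", "ba")]

for pairs in (TABLE_11_4, ONE_EXCEPTION, FOUR_PATTERNS):
    print(round(irregularity(pairs), 2))
```

Under these assumptions the three example distributions give 0.36, 0.25, and 0.74, the values reported in the text.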
As can be seen in Figure 11.5, irregularity, unlike overspecification, increases
rather steeply at first in all three conditions and then starts oscillating around
what seems to be a plateau. In the permanently interrupted condition, there is a
rather steep decrease during the last two generations; in the other two conditions
the peak of irregularity is also closer to the middle (i.e., there is a slight decrease
towards the end), but the difference is small. It is, however, interesting to take a
look at the individual trajectories of irregularity and compare them to those of
overspecification. We do that in Figure 11.6.
In most chains, the initial changes in overspecification and irregularity go in
exactly opposite directions, that is, the two measures seem to be almost perfectly
negatively correlated. Sometimes this trend continues through all the generations
(see, e.g., chains 2 and 13). If, however, the overspecification decreases beyond 0.5,
the measures become positively correlated and subsequently change almost in
unison (see, e.g., chains 22 and 30).
[Figure 11.5: line plot of irregularity (y-axis, 0.00 to 1.00) against generation (x-axis, 0 to 10), with one line per transmission condition: normal, temporarily interrupted, permanently interrupted]
Figure 11.5. Change of irregularity, as measured by Shannon entropy, over generations
Note: Shaded regions denote the standard error.

This behaviour largely follows from the definition of the measures. There are
two states where the system is fully regular: complete overspecification and
complete absence of overspecification. If the system is closer to the first state
(overspecification > 0.5), almost any mutation would change the two measures in
different directions (if agreement is lost in one case out of six, it is a decrease in
overspecification, but an increase in irregularity), but if it is closer to the second state
(overspecification < 0.5), then the measures usually change in the same direction
(e.g., if the two remnants of agreement in the language in Table 11.4 disappear,
both overspecification and irregularity would go down to zero).
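
The two mutations mentioned in this paragraph can be traced numerically. The sketch below is a simplified illustration under stated assumptions, not the authors' code: overspecification is computed as the share of verb pairs with different surface forms, irregularity as the normalized entropy of last-symbol patterns (differing symbols treated as an unordered pair), and the verb pairs are typed in by hand.

```python
from collections import Counter
from math import log2


def overspecification(verb_pairs):
    """Share of (round, square) verb pairs whose surface forms differ."""
    return sum(r != s for r, s in verb_pairs) / len(verb_pairs)


def irregularity(verb_pairs):
    """Normalized entropy of last-symbol patterns (same-symbol pairs lumped)."""
    counts = Counter("same" if r[-1] == s[-1] else frozenset((r[-1], s[-1]))
                     for r, s in verb_pairs)
    n = sum(counts.values())
    return sum(c / n * log2(n / c) for c in counts.values()) / log2(6)


# Fully overspecified state, and the same state with agreement lost in one case.
full = [("mo", "mi")] * 2 + [("ro", "ri")] * 2 + [("bo", "bi")] * 2
one_lost = [("mo", "mo")] + full[1:]

# State of Table 11.4 (two remnants of agreement), and the state without them.
remnants = [("mi", "mi"), ("mi", "mi"), ("ri", "ro"),
            ("ro", "ro"), ("bo", "bo"), ("bi", "bo")]
no_remnants = [("mi", "mi")] * 2 + [("ro", "ro")] * 2 + [("bo", "bo")] * 2

for state in (full, one_lost, remnants, no_remnants):
    print(round(overspecification(state), 2), round(irregularity(state), 2))
```

In the first step overspecification falls while irregularity rises; in the second step both measures fall to zero together.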

11.4.3 Irregularity and learnability

For every generation (apart from the final ones) we estimate how learnable its
language is. The measure of learnability is transmission fidelity, which is obtained
by comparing the language of generation n with the language of generation n+1,
calculating the normalized pairwise Levenshtein distance between the sentences
with the same meanings and subtracting it from 1. We found that, unlike in most
other IALL experiments, learnability clearly decreases over time. If, however, we
look at the learnability as a function of overspecification, we find that it follows a
U-curve: high when overspecification is 1 and 0 (slightly higher at 0), but
noticeably lower at other values.

[Figure 11.6: 45 line-plot panels, numbered 1 to 45, one per transmission chain; x-axis: generation (0 to 10); y-axis: 0.0 to 1.0]
Figure 11.6. Change of overspecification (solid line) and irregularity (dashed line) in verbal agreement over generations in individual chains:
(a) normal condition; (b) temporarily interrupted condition; (c) permanently interrupted condition

[Figure 11.7: two panels, normal learners and imperfect learners; x-axis: irregularity (values 0, 0.25, 0.36, 0.39, 0.48, 0.56, 0.61, 0.69, 0.74); y-axis: learnability (0.00 to 1.00)]
Figure 11.7. Learnability as a function of irregularity

An obvious reason is that at intermediate overspecification values the system is
almost always irregular. In Figure 11.7, we represent learnability as a function of
irregularity (averaging across chains and conditions, but keeping normal and
imperfect learners separate). As irregularity increases, the learnability indeed
decreases, and the decrease is steeper for imperfect learners.
Thus, to go from a regular overspecified state (learnable) to a regular non-
overspecified state (more learnable), the system has to pass through an irregular
stage (less learnable). The irregular stage can only be avoided if the total loss of
overspecification occurs within one generation, which almost never happens.
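
The transmission fidelity measure defined at the start of this section can be sketched as follows. This is a minimal illustration, not the authors' implementation: the text does not state how the Levenshtein distance is normalized or aggregated, so dividing by the length of the longer sentence and averaging over meanings are assumptions, as are the function names.

```python
def levenshtein(a, b):
    """Edit distance between two strings (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute or match
        prev = curr
    return prev[-1]


def transmission_fidelity(lang_n, lang_next):
    """1 minus the mean normalized distance between sentences with the same meaning.

    Both arguments map meanings to sentences. Assumption: each distance is
    divided by the length of the longer of the two sentences.
    """
    distances = []
    for meaning, sent in lang_n.items():
        other = lang_next[meaning]
        longest = max(len(sent), len(other))
        distances.append(levenshtein(sent, other) / longest if longest else 0.0)
    return 1 - sum(distances) / len(distances)


# Toy example with two meanings (hypothetical labels).
parent = {"square, sg, fall apart": "fuv mi", "round, pl, grow antlers": "segl ro"}
child = {"square, sg, fall apart": "fug mo", "round, pl, grow antlers": "segl ro"}
print(round(transmission_fidelity(parent, child), 3))
```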

11.5 Discussion

That imperfect learning eliminates morphological overspecification is not surprising
and fits well with the predictions of the theories discussed in section 11.1.2. It is also
in accord with the knowledge accumulated by acquisition studies. While much is
still unknown about how exactly adult learners are different from child learners and
why it is so, it seems safe to claim that inflectional morphology is difficult for non-
native speakers and often absent in their speech (DeKeyser 2005: 6).

The observation that irregularity increases under imperfect learning, contrariwise,
seems to be at variance with the theories discussed in section 11.1.2. It is, however,
not unexpected from the language acquisition and language change perspective.
Loporcaro (Chapter 6, this volume) compares Wolof noun morphology to that
of other Atlantic languages (a subfamily of the Niger-Congo languages). In
Wolof, the system of marking noun classes through initial consonant mutations
typical of other Atlantic languages has largely eroded and has been restructured:
now noun class is marked on function words that modify the noun, for example
articles and demonstratives. However, certain nouns still show remnants of the
previous system and can be optionally marked for number through initial con-
sonant alteration. Thus, we see a pattern reminiscent of our experimental results—
certain systems of noun classification disappear (thus decreasing overspecifica-
tion), but leave irregular atavisms (thus increasing irregularity).
Clahsen et al. (2010) review evidence in favour of the claim that non-native
speakers are less sensitive to morphological structure. They underuse morpho-
logical decomposition and rely more on memorization and lexical storage, even of
the regularly inflected forms. This effect has been found also in highly proficient
non-native speakers who approach native-like performance (Neubauer &
Clahsen 2009). While memorization of separate forms per se does not imply
irregularity, it clearly creates a friendlier environment for its emergence than
does rule-driven form generation.
In this context, it is interesting to look at the finding that non-native speakers of
English produce significantly more irregular past-tense forms in a Wug-task than
native speakers (Cuskley et al. 2015). Cuskley et al., however, argue that the
irregularities are still rule-driven and follow the patterns that exist in the set of
real English irregular verbs. They hypothesize that the effect is explained by the
peculiarities of the non-native input, namely higher relative frequency of the
irregular verbs and their higher salience in the explicit instruction. The contro-
versial conclusion of Cuskley et al. (2015) is that despite the seeming preference
for irregularity, non-native speakers actually prefer rules over exceptions and
simplicity over complexity.
Our data lend modest support to Clahsen’s memorization vs. generation
account. The elimination of agreement implies that our learners fail to perform a
full-fledged morphological analysis of their input. Agreement gets affected more
than other features, probably because it is redundant and based on a long-distance
relationship (verb and agent), and both these factors can inhibit learning
(DeKeyser 2005). It is, however, difficult to say whether the normal learners
preserve more agreement because they are more sensitive to the morphological
structure or because they have more time to memorize the forms. We can only
claim that imperfect learning inhibits acquisition of rule-based distributions, but
cannot say how exactly it happens.

Usage of complex unproductive rules instead of simple productive ones is one
source of irregularity. Another one is the usage of probabilistic rules instead of
deterministic ones. Inconsistent probabilistic usage is typical for non-native
speakers (Johnson et al. 1996). Hudson Kam & Newport (2005, 2009) in a series
of ALL (but not I(terated)ALL) experiments show that if grammatical forms in
the linguistic input are used probabilistically, then adult learners usually reproduce
the inconsistencies, regularizing only the most infrequent ones in the most
complex cases. In one experiment, when adults did impose deterministic rules,
those were mostly ‘rules of omission which served to remove structure from the
language’ (Hudson & Newport 1999: 276). This means that just as with our
participants, those learners decreased overspecification but not irregularity.
Smith & Wonnacott (2010), however, show that regularization can occur if
weak individual biases of adult learners are amplified by iterated transmission. In
an IALL study, they show that transmission chains, but not isolate learners,
eliminate unpredictable variation (see also Smith et al. 2017 on how language
use affects bias amplification; Samara et al. 2017 on how sociolinguistic condi-
tioning affects language use by adults and children).
Although our chains are twice as long as Smith & Wonnacott’s (2010) five-
generation chains, we do not see any reliable overall decrease in irregularity (see
Figure 11.5). An important difference between the two studies, however, is that
Smith & Wonnacott’s participants received probabilistic, or truly unpredictable,
input. They saw several signals for exactly the same meaning, and those signals
could be different. In our study, the input is, strictly speaking, deterministic, since
every meaning is represented by one sentence. Thus, while it is possible that, for
instance, ‘fall apart’ will sometimes be denoted by mo and sometimes by mi, the
variation will not be fully unpredictable: it will always be possible to condition it
on something (e.g., agent or number).² This conditioning is likely to protect
variation from elimination.
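The contrast between unpredictable and conditioned variation can be made concrete with conditional entropy, one common way of quantifying how predictable a marker is given some context. The sketch below is only an illustration under stated assumptions: the observations are invented, the function and variable names are hypothetical, and the chapter does not state that this measure was used in the analysis.

```python
from collections import Counter, defaultdict
from math import log2


def conditional_entropy(pairs):
    """Return H(marker | context) in bits for (context, marker) pairs."""
    by_context = defaultdict(Counter)
    for context, marker in pairs:
        by_context[context][marker] += 1
    total = sum(sum(counts.values()) for counts in by_context.values())
    entropy = 0.0
    for counts in by_context.values():
        n = sum(counts.values())
        for k in counts.values():
            p = k / n
            # Weight each context by its share of all observations.
            entropy -= (n / total) * p * log2(p)
    return entropy


# Invented observations (meaning, agent, marker); not data from the experiment.
observations = [
    ("fall apart", "agent A", "mo"),
    ("fall apart", "agent A", "mo"),
    ("fall apart", "agent B", "mi"),
    ("fall apart", "agent B", "mi"),
]

# Conditioning on meaning alone: the marker looks unpredictable.
h_meaning = conditional_entropy([(m, mk) for m, a, mk in observations])
# Conditioning on meaning and agent: the marker is fully predictable.
h_meaning_agent = conditional_entropy([((m, a), mk) for m, a, mk in observations])
```

With these invented observations, the variation between mo and mi carries one bit of uncertainty when only the meaning is known, and none once the agent is also taken into account, which is the sense in which the variation is conditioned rather than unpredictable.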
Note that the conditioned variation is still difficult to learn, and participants
seldom manage to reproduce faithfully the conditioning ‘invented’ by a previous
generation. Instead of eliminating it completely, they replace it with their own
conditioning. It can be argued that the participants treat the input as at least
partly probabilistic (failing to learn the rule behind the distribution of markers,
they nonetheless match the frequencies of markers). The input, however, is not
complex enough to trigger regularization of the kind reported by Hudson Kam & Newport
(2005, 2009).
Another reason for the difference from Smith & Wonnacott’s results may be that
our languages are more complex and it is more difficult for learners to converge on
a regular pattern. In addition, the probability of random mutations that can make
a language deviate from the regular state is higher in our case (note that Smith &
Wonnacott filtered away certain random mutations that they deemed irrelevant
before passing the input on to participants).

² We are grateful to Kenny Smith for bringing this difference to our attention.

It should also be noted that while there is a clear difference between the
trajectory of overspecification in the normal condition and the interrupted conditions,
no such difference is found for irregularity. It may be that overspecification is
more sensitive to the degree of imperfect learning. The effect of irregularity on
learnability, however, seems to differ between normal and imperfect learners: as
irregularity increases, learnability decreases more steeply for the latter.

11.6 Conclusion

We show that during morphological simplification the trajectories of overspecification
and irregularity need not be the same and, moreover, are likely to be
different. Imperfect learning prevents speakers from acquiring certain morpho-
logical rules (especially those that are redundant or particularly difficult) and thus
causes a decrease in overspecification but an increase in irregularity. Interestingly, the
degree of imperfect learning seems to affect how much overspecification
decreases, but not how much irregularity increases.
The increase in irregularity, in turn, makes languages less learnable (this effect
is stronger for imperfect learners than for normal ones), unless all overspecifica-
tion is eliminated and the system reaches the non-overspecified regular state. Our
chains seldom reach this optimum, probably because the regularization bias is
relatively weak in our participants and the experimental setting suppresses it.

Acknowledgements

The experiment was funded by the Faculty of Humanities, Social Sciences and Education
at UiT, The Arctic University of Norway. AB was supported by the Norwegian
Research Council grant ‘Birds and Beasts’ (222506).
We are also grateful to the popular-science portal ‘Elementy’ and its editor-in-chief
Elena Martynova for advertising the experiment, to Tanja Russita for designing the
Epsilon fauna, to Kenny Smith and Peeter Tinits for commenting on an earlier version
of the chapter, and to Peter Arkadiev and Francesco Gardani for inviting us to
contribute to this volume.

12
Where is morphological complexity?
Marianne Mithun

12.1 Introduction

As linguists, we love discovering order in chaos. Grammatical complexity provides
us with puzzles to play with. An assumption underlying some theoretical models of
language has been that the most elegant formal description naturally matches
speaker knowledge. But closer attention to what speakers actually do raises the
question of whether complexity is in fact the same for the analyst, the speaker, and
the language learner. An examination of speech in languages displaying different
kinds of morphological complexity, spoken in language contact situations, suggests
that it is not.

12.2 What is complexity?

Dahl (2004, 2017) provides useful surveys of approaches to complexity, distinguishing
first agent-related or relative complexity from objective or absolute
complexity. Agent-related complexity refers to the effort a generalized outsider
needs to become acquainted with the system (Kusters 2008: 9). Objective com-
plexity refers to (i) the amount of information needed to specify the system
(Kolmogorov complexity); (ii) the length of the description of a set of regularities
or recurring patterns (the effective complexity of Gell-Mann 1994); or (iii) the
number of parts of a system and/or interactions (Miestamo 2008). Dahl further
distinguishes the linguistic material the measures are applied to. System complexity
pertains to what a learner must master in order to become proﬁcient in a language,
presumably including such things as rules and their exceptions. Structural com-
plexity pertains to the complexity of individual expressions, such as the depth of
maximal embedding in a sentence. Corpus complexity measures complexity over
samples of connected speech, such as the Greenbergian (1960) calculations of
degree of synthesis, or average number of morphemes per word.
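As a rough illustration of a corpus measure of this kind, the average number of morphemes per word can be computed from morphologically segmented text. The sketch below assumes that words are already segmented with hyphens for affix boundaries and equals signs for clitic boundaries, as in the examples later in this chapter, and that clitics count as part of the word. The function names are hypothetical, and the three sample words are taken from the Central Pomo examples purely to show the arithmetic.

```python
import re


def count_morphemes(word, boundaries="-="):
    """Count the morphemes in one segmented word.

    Assumes that each character in `boundaries` marks a morpheme boundary.
    """
    pieces = re.split("[" + re.escape(boundaries) + "]", word)
    return sum(1 for piece in pieces if piece)


def morphemes_per_word(words, boundaries="-="):
    """Average number of morphemes per word over a list of segmented words."""
    if not words:
        raise ValueError("need at least one word")
    return sum(count_morphemes(w, boundaries) for w in words) / len(words)


# One, two, and three morphemes respectively under the stated assumptions.
sample = ["mu:l", "čá-w", "čá-:la-w"]
ratio = morphemes_per_word(sample)
```

A real calculation would of course depend on analytic decisions that the sketch leaves out, such as what counts as a word and whether zero morphemes or fused forms are counted.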

Marianne Mithun, Where is morphological complexity? In: The Complexities of Morphology.
Edited by: Peter Arkadiev and Francesco Gardani, Oxford University Press (2020). © Marianne Mithun.
DOI: 10.1093/oso/9780198861287.003.0012

But morphological complexity is itself not a straightforward matter, as pointed
out by the editors of this volume in Chapter 1. To compare degrees of synthesis across
languages, one could measure the average number of morphemes per word over
comparable stretches of speech. Alternatively, one might compare the maximal
possible number of morphemes per word. In languages with templatic morphology,
one could count the number of slots within the templates, or the number of
morphemes per slot, or the coherence of functions of morphemes within slots.
One could count all slots, or only those which are obligatory. The fact that a
morphological structure is templatic could itself be viewed as adding complexity:
it would mean that morpheme order does not follow naturally from scopal
relations and must be stipulated. Discussions of morphological complexity usually
include form/function mappings as well. Deviations from one form : one function
correspondences have been cited as added complexity (Anderson 2015a). Such
phenomena would include fusion, suppletion, syncretism, dependence on lexical
classes, and elements with no discernible meaning.
The very existence of morphological complexity might seem to be counter-
productive, adding useless difficulty to the acquisition and use of language. But
even where the complexity seems arbitrary, the factors which produce it are
not. Perhaps the most important factor is cognitive. Frequently-recurring
sequences of meaningful elements eventually tend to become routinized and
stored in memory as chunks, as described by Bybee & Beckner (2015) and
many others. Over time, the formal and semantic salience of their individual
components fades for speakers, and their forms can erode. Another intriguing
possible factor in the development of complexity, raised by Dahl (Chapter 13,
this volume), Trudgill (2011, 2017), and Dale & Lupyan (2012), is the socio-
cultural context in which a language is used. Small communities, with dense
social networks which persist over long periods of time, might foster an
increase in complexity. If speakers interact regularly with a limited set of
interlocutors, the relative frequency of particular turns of phrase might
increase, setting the stage for routinization and just the kinds of grammatical-
ization processes that underlie complexity. Multilingualism within the commu-
nity might affect complexity as well, but in several possible ways. Intensive,
longstanding bilingualism might lead to an increase in complexity, as early
bilinguals replicate grammatical distinctions of each language in the other,
adding to the total number in each. If, on the other hand, the bilingualism
has a different profile, consisting, for example, of a substantial proportion of
untutored adult learners, there might be an overall decrease in complexity, as
second-language speakers systematically choose simpler, analytic constructions
over more complex, synthetic ones.
Here the fate of morphological complexity under contact is explored in two
languages with slightly different kinds of complexity. The data come from con-
versations among first-language speakers affected to varying degrees by contact.
The implications of the findings are then considered for our larger understanding
of morphology.

12.3 Central Pomo

Central Pomo is a language of the Pomoan family, indigenous to an area of
Northern California approximately 100 miles north of San Francisco. It shows a
certain degree of morphological complexity, but it would not be considered
polysynthetic in one narrow sense: arguments are not specified within the verb.
Verbs can show other kinds of morphological elaboration, however, including
specification of means/manner, location/direction, various kinds of verbal num-
ber, argument structure (causatives, reciprocals, passives), inchoatives, aspect,
dependency, and more. An example is in (1). Affixes are in bold. (All examples
here are taken from unscripted speech.)

(1) Central Pomo verb structure (Frances Jack, speaker p.c.)

Mu:l bašá ʔel ʔ-áʔ-č’i-n ʔe
that buckeye the ﬁngering-gather-.-.. 
‘When gathering buckeyes,
kúyq’a:l ʔe mu:l m-t ̯’á:-ka-w-aʔ-ya-w.
right.away  that heat-sense---.--
you have to cook them as soon as you get them.’

Complexity can be affected in a variety of ways by the sociocultural context in
which languages are spoken. Trudgill (2011) has proposed that small communi-
ties, with tightly-knit social networks and frequent interaction among small
numbers of participants, could foster the growth of complexity. Enhanced fre-
quencies of recurring expressions could result in routinization and morphologiza-
tion. Language contact can affect complexity in quite diverse ways. Early
bilingualism might increase complexity, as children, who have the least difficulty
in acquiring complex systems, replicate distinctions from one of their languages in
the other. Late bilingualism in a large proportion of a population might decrease
morphological complexity, as adult second-language speakers opt for more ana-
lytic forms of expression. Importantly, the encroachment of one language on
another might have a simplifying effect, as spheres of usage of the endangered
language and frequency of its use are reduced.
Northern California is a recognized linguistic area, with striking structural
parallels across the languages, including morphological distinctions. Communities
have always been small, and exogamy common, so a good proportion of children
were raised in bilingual households. The small communities and longstanding,
intense contact could well have contributed to morphological complexity.
One feature that is widespread across languages of the area is the specification
via verbal prefixes of means/manner/instrument (Mithun 2007). Examples of
their functions can be seen in (2) with the Central Pomo verb root t̯’é:č’ ‘stick
together’.

(2) Central Pomo means/manner prefixes

t̯’é:č’ ‘stick together, be alongside each other’
da-t̯’é:č’ ‘push on something that sticks in your hand’
ʔ-t̯’é:č’ ‘stick on with fingers, as chewing gum under table’
ma-t̯’é:č’ ‘step on a nail or something that sticks in your foot’
ča-t̯’é:č’ ‘sit on a thorn, put a patch on pants’
h-t̯’é:č’ ‘stick up a pole, pitchfork, shovel, in ground’
m-t̯’é:č’ ‘catch fire’
ph-t̯’é:č’ ‘hammer a nail into the wall, nail something on’
pha-t̯’é:č’ ‘something floating downriver gets stuck on bank’
s-t̯’é:č’ ‘while one is drinking, something gets into the mouth that doesn’t belong, like dirt or a bug’
ša-t̯’é:č’ ‘stick a support, as a box, next to something long, like fence posts stored upright for use’

Two of the prefixes seen in (1) also occur here: ʔ- ‘fine finger action’ and m-
‘involving heat’.
Another widespread feature is the specification of location and direction.
Central Pomo examples of such suffixes are in (3) with the verb čá- ‘run’.
(Perfective aspect is marked here with the suffix -w after vowels and glottal stop
after obstruents. Imperfective aspect here is -an.)

(3) Central Pomo directional suffixes

čá-w ‘run’ (one)
čá-:la-w ‘run down’
čá-:qač’ ‘run up (as up a hill)’
čá-č’ ‘run away’
čá-way ‘run against hither, as when a whirlwind came up to you’
čá-:ʔw-an ‘run around here and there’
čá-mli-w ‘run around it (tree, rock, house, pole)’
čá-mač’ ‘run northward’
čá-:q’ ‘run by, over (on the level), south’
čá-m ‘run over, on, across (as bridge)’

A third area of morphological elaboration in Central Pomo as well as in related
and unrelated but neighbouring languages is a set of suffixes and enclitics that
mark dependent clauses. The markers distinguish what speakers cast as elements
of a single larger event or state (same) and what they cast as related but distinct
events or states (different). In addition, the markers distinguish realis from
irrealis situations. For realis situations, simultaneous or overlapping events and
states are distinguished from those viewed as consecutive (sequential). Examples
of the realis same suffix -(i)n are in (4).

(4) Central Pomo dependent same

Mú:l ʔe mu:t̯uya, mó da-héle:č’-in
that  3. hole pulling-dig-
‘Then, they would dig a hole,
hó ʔmáhč’i-n,
hole build.fire-
build a fire,
mi: ʔ=mú:t̯uya lóq’ ts’aqʰáṭ m-čá-la-w-ač’-in,
there =3. thing greens -throw-horizontally--.-
throw green stuff in there,
mi: ʔ=mú:t̯uya mu:l šá ʔel m-ča-la-w-ač’-in,
there -3. that fish the -throw-horizontally--.-
throw those fish in there,
m-ṭ’á-:ka-w-ač’.
heat-sense---.
and cook them.’

Examples of the realis different enclitic =da are in (5).

(5) Central Pomo dependent different

Šé: ʔul ma, yém-aq-’=da
longtime already 2 old--=
‘In the future, when you are older
ʔá: čʰó-w=da,
1. not.exist-=
when I am no longer here,
ma ʔ-yá:q-an-ka-w=ʔkʰe
2. mentally-recognize-.--=
you will see.’

Speakers can vary in their packaging of events as same or different. Generally
the kinds of factors that enter into their decisions include continuity versus
discontinuity of topic, place, and time.
The first sustained contact between Pomoan speakers and a European language
was in the nineteenth century, when California was a part of Mexico. Contact with
Spanish resulted in the adoption of some nouns, primarily designating introduced
concrete objects, but it had little apparent effect on the morphological complexity
of Central Pomo or its neighbours.
During the twentieth century, schools were established in which children were
required to speak English, and many children were sent away to boarding schools
where they were forbidden to speak Central Pomo. One man born in 1912 recalled
that when he left the community at age 5, pretty much everyone spoke the
language. When he returned ten years later, almost no one used it on a daily basis.
All of the speakers represented here learned Central Pomo as their mother
tongue. All subsequently learned English as well, but they had varied histories. All
ultimately returned to live in Central Pomo communities.

(6) Central Pomo speakers cited here

Speaker 1: Fluent speaker (F), spoke Central Pomo on a daily basis
Speaker 2: Fluent speaker (F), away for a few years as a young woman,
           otherwise spoke Central Pomo on a daily basis
           (daughter-in-law of Speaker 1)
Speaker 3: Early acquisition of Central Pomo
           Fairly fluent speaker (F), left at age 18, away for 30 years,
           married to a non-speaker, occasional use of Central Pomo
Speaker 4: Early acquisition of Central Pomo
           Less fluent speaker (F), lived in community until age 13,
           returned 30 years later, widowed, occasional use
Speaker 5: Early incomplete acquisition of Central Pomo
           Halting speaker (F), language scorned by father, departure for
           boarding school age 5, rare use
Speaker 6: Some early acquisition of Central Pomo
           Son (M) of Speaker 1, older brother of Speaker 5, son of
           non-speaker, boarding school ages 5–15, rare use

The Central Pomo of Speakers 1 and 2 shows full fluency and articulateness.
That of the others provides some insight into potential effects of contact on
morphological complexity.

12.4 Obsolescence and morphological complexity

With reduction in language use, particularly in situations of contact with a less
synthetic language, we might expect a reduction in morpheme per word ratios.
One way to investigate this hypothesis is to compare the speech of individuals with
differing balances in their bilingualism. As noted, all of the speakers cited here
learned Central Pomo as a first language, then later learned English. For a
preliminary comparison of morphological complexity, the speech of Central
Pomo-dominant speakers was compared with that of now English-dominant
speakers during the same conversations, so that the topics of discussion, discourse
contexts, and social setting were constant. Calculations of morphemes per word
revealed surprising results. In one conversation, for example, both Speaker 2 and
Speaker 5 averaged precisely 1.44 morphemes per word! Other comparisons
yielded similar results.
The nature of the morphological complexity with varying contact effects differs
in several ways, however. One is underspecification of certain distinctions regularly
mentioned by the most Pomo-dominant speakers. Speaker 4, for example,
who was away from the community for some time and did not use the language
often after her return, made the comment in (7).

(7) Central Pomo direction: Speaker 4

[‘I walked out,]
ʔa: yhé-:n ht̯ow hčé-hče-w.
1. do-. from stagger-stagger-
but I was staggering.’

She used reduplication to describe her staggering, but Speaker 2 later commented,
as we were transcribing the recording, that a more Pomo-dominant speaker would
have used the verb in (8), specifying direction with the suffix -:ʔw- ‘around here
and there’.

(8) Central Pomo direction: Speaker 2

hihčé-:ʔw-an
stagger-around-.
‘was staggering around’

Whether or not the reduplicative strategy used by Speaker 4 is morphologically
simpler than the directional suffix construction suggested by the more fluent
Speaker 2 could be debated. Reduplication for iteration does occur elsewhere in
the language as a derivational process creating lexical items. To Speaker 2, it was
less idiomatic, and the perfective aspect less appropriate than the imperfective.
Speaker 4’s comment could be interpreted as an active innovative extension of
existing patterns, or the symptom of a more limited vocabulary.
Central Pomo verbs contain numerous kinds of number distinctions. One is
inflectional. Imperfective markers, as well as the other aspect suffixes derived from
them, obligatorily indicate subject number: basically -(a)du- for singulars and
-(a)č’i- for plurals. As speakers were discussing the special knowledge Pomo people
have about gathering seafood, fluent Speaker 2 made the first comment in (9). The
fact that she was describing multiple people was clear not just from the plural
pronoun mú:t̯uya ‘they’, but also the plural imperfective suffix -č’i- on the verb
‘know’, the distributive -t̯ay on ‘knowledgeable’ (since each was knowledgeable in
their own right), and the distributive -ay on ‘people’. When Speaker 4 echoed the
thought, she used the singular form of the verb ‘know’.

(9) Central Pomo number: Speakers 2, 4

2 Hínt̯il ʔ=mú:t̯uya šá:-t̯’a:ʔ-č’i-w
Indian =3 knowledge-sense-.-
‘Indians know that.’ . . .
Ma: šá:-t̯ay ʔe mu:l hínt̯il čá:č’-ay.
stuff knowing-  that Indian people-
‘They know things, Indians.’
4 Mm. ʔúda:w ma: šá:-t’a:ʔ-du-w.
lots stuff knowledge-sense-.-
‘[He] knows lots of stuff.’

Speaker 2 later commented that Speaker 4 made it sound like just one person is
smart. Speaker 4’s imperfective verb, a frequently-occurring one, was well-formed,
but inappropriately selected in this context.
The speech of less fluent speakers does show morphological complexity. On
another occasion, Speaker 4 offered the explanation in (10) with a complex verb.

(10) Central Pomo morphological complexity: Speaker 4

Mé:n=ʔt̯i ʔa: car čá-:ʔw-an-ka-w=ʔkʰe
so=but 1. run.-around-.--=
t̯ʰi-n ʔi-n.
not- be-
‘That’s why I don’t drive.’

The verb is certainly morphologically complex, but it is highly frequent. Speaker 4
did not assemble it online: she selected it as a fully-formed lexical item. The same
speaker used the verb in (11).

(11) Central Pomo morphological complexity: Speaker 4

ba:-yú:-čʰ-ma-w=ʔkʰe
orally-know--.-=
‘they will understand’

This verb, too, shows some morphological complexity, and it is well-formed. But
the context is revealing. It was part of a conversation among Speakers 2, 3, and 4.
(The full conversation was in Central Pomo. Just the translation of Speaker 3’s
remarks is presented here for context.)

(12) Central Pomo morphological complexity: Speakers 3, 2, 4

3 ‘My daughter says that we don’t want the White people to understand us.
That’s why we speak Indian.’
2 Mú:t̯uya ba-yú:-cʰ-ma-w=ʔkʰe ṱʰi-n.
3. orally-know---= not-.
‘They won’t understand.’
4 Pretty soon ba:yú:čʰmawʔkʰe.

‘Pretty soon they will understand.’

Speaker 4 was echoing the verb just used by Speaker 2.

Morphological complexity like this is not something speakers usually produce
online as they speak. Speakers know lexical items: they know which formations
exist and which do not, and for those that do, they know their specialized contexts
of use. Among the verbs in (2) above with means/manner prefixes was pʰ-ṭʰé:č’,
literally ‘by.swinging-stick’. When asked what this word means, Speaker 2 replied
‘hammer a nail into a wall’. Skilled speakers, who spend a major part of their day
in the language, have larger lexical inventories and an acute sense of the precise
contexts in which items are used. Their awareness of the components of morpho-
logically complex words varies, but in most cases the internal structure of words is
opaque to them. This is not altogether surprising. They rarely if ever saw the
language written, and many morphemes are no more than a single consonant.
Central Pomo contains a passive construction which functions to eliminate a
grammatical agent from the clause. The agent may be generic, unknown, or
unimportant, and it cannot be mentioned. The passive marker is a verbal suffix
-ya. It is added to both transitive and intransitive stems. An example from fluent
Speaker 2 is in (13). She was describing a conversation that had taken place at the
senior citizens’ center. The identity of the eaters was not important; the passive
clause simply served to locate the event.

(13) Central Pomo passive: Speaker 2

Béda maʔá: qa-wá-:ʔ-ya-w=da
here food biting-go-.--=.
‘When (people) were eating here
ʔi’=ma mu:l– Mitch=t̯o ṭʰe-l . . .
be- that Mitch= mother-
she– told Mitch’s mother . . . ’

Slightly less fluent Speaker 3 used a well-formed passive verb, but inappropriately.

(14) Central Pomo passive: Speaker 3

[‘He’s looking for a woman.’]
Má:t̯a-ya q’á:-ya-w ʔe.
woman- leave-- 
‘His wife he was left.’ (For ‘His wife left him.’)

As we later transcribed and translated the conversation, fluent Speaker 2 noted
that Speaker 3 should have either used the basic transitive verb q’á:w ‘left’ or not
mentioned the wife.
Less fluent Speaker 4 used passive verbs in (15).

(15) Central Pomo passive: Speaker 4

Lady oranges qó=de:-ya-w and
hither=carry--
‘A lady brought oranges and
needle qó=de:-ya-w.
hither=carry--
brought needles.’

Here, too, the passive verb forms are well-formed but incompatible with mention
of the agent, the lady. Both (14) and (15) indicate that the speakers were selecting
pre-formed words, rather than constructing them online as they spoke. Example
(15) also reflects a smaller lexical inventory. As Speaker 2 later noted, a better
choice for the first verb would have been qó=di-w, and for the second qó=be-w.
Different verb roots are used for carrying a single round item (de-), multiple round
items (di-), and long items carried horizontally (be-).
Speaker 4 did use some passive verbs appropriately, as in (16).

(16) Central Pomo passive: Speaker 4

Qʰá:p’-ṭ’á:-ya-w.
pity-feel--
‘Pitiful!’

This is a highly lexicalized, frequent expression.

The speech of less Pomo-dominant speakers differed in another way. As seen
earlier in examples (1), (4), and (5), the language contains a rich set of dependency
markers. Less fluent speakers tend to use less morphological clause combining, as
can be seen in (17).

(17) Central Pomo clause combining: Speaker 3

ʔa: E=t̯o čá-l=yo-w
1. E= house-to=go-
‘I go to E’s house (and)
hínt̯il ʔel ča:nó-d-an=ya mú:t̯u.
Indian the talk-.-.=. 3.
talk Indian to her.’

Speaker 2 later commented that the first verb should have been čályohdu-n,
ending in the realis same event dependency suffix, rather than the perfective -w,
yielding a sentence meaning ‘When I go to E’s house I talk Indian with her’.
Speaker 3’s prosody in (17) reflected this structure: she did not end the first clause
with a terminal fall in pitch or a significant pause.
This same speaker made the comment in (18).

(18) Central Pomo clause combining: Speaker 3

Mkʰé ba:ʔá čʰo-w
2. food not.exist-
‘Even when you don’t have food
ma mu:lt̯ayat̯’,
2. 3.
bú ʔel ma mu:l fry-č-in, beans ʔel fryč-in . . .
potato the 2. that fry--. beans the fry--.
you fry potatoes for them, fry beans . . . ’

Speaker 2 later commented that she herself would have used the dependent verb
form čʰó-w=da in the first clause, with the realis different event dependency
enclitic =da.
The puzzle remains as to why the joint conversation between fully fluent
Speaker 2 and struggling, English-dominant Speaker 5 should show exactly the
same morpheme per word ratio: 1.44. Speaker 2 actually spoke more during the
conversation, with twice as many words (tokens). Significantly, she used many
more different words. Speaker 5 used just nine different verbs (types), all but four
of them repetitions of verbs just used by Speaker 2.
Overall, there are two main differences between the speech of fully fluent
Speakers 1 and 2 on the one hand, and more English-dominant Speakers 3, 4, 5,
and 6 on the other. The first is lexical knowledge. Fluent speakers who spend more
time in the language know more words and lexicalized constructions. They can
thus make finer semantic distinctions, as with verbs specifying means/manner,
location/direction, and different kinds of carrying, all seen here. The second is that
fluent speakers have more alternatives for shaping the flow of information, with
passives, clause linkers, and discourse particles. A significant difference between
the two groups is in fact the use of discourse particles, which convey such
distinctions as source and certainty of information (hearsay, inference, etc.),
contrast with expectation versus common knowledge, and much more. Fully
fluent speakers use substantially more such particles. Since the particles are
monomorphemic, their pervasiveness lowers the average number of morphemes
per word.
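The arithmetic behind this point can be sketched with a toy example. All of the numbers and token labels below are invented for illustration and are not taken from the recorded conversations, and the function names are hypothetical. The sketch shows only that two speakers can share the same average number of morphemes per word while differing in how that average arises and in how many distinct words (types) they use.

```python
def mean_morphemes(morpheme_counts):
    """Average morphemes per word from a list of per-word morpheme counts."""
    return sum(morpheme_counts) / len(morpheme_counts)


def type_token_counts(tokens):
    """Return (number of distinct word types, number of word tokens)."""
    return len(set(tokens)), len(tokens)


# Invented speaker A: two complex verbs plus two monomorphemic particles.
speaker_a = [3, 3, 1, 1]
# Invented speaker B: four moderately complex words and no particles.
speaker_b = [2, 2, 2, 2]

ratio_a = mean_morphemes(speaker_a)
ratio_b = mean_morphemes(speaker_b)

# Invented token lists: speaker A uses four different words,
# speaker B repeats one word three times.
types_a, tokens_a = type_token_counts(["w1", "w2", "w3", "w4"])
types_b, tokens_b = type_token_counts(["w1", "w1", "w2", "w1"])
```

In this toy case both averages come out the same, even though speaker A produces more complex verbs and more distinct words, so the ratio alone does not distinguish the two speakers.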

12.5 Mohawk

Mohawk is a Northern Iroquoian language indigenous to the North American
Northeast, currently spoken in communities in Quebec, Ontario, and New York
State. It is prototypically polysynthetic. It is holophrastic in the narrow sense: one
word, a verb complete with pronominal arguments and predicate, can constitute a
full sentence. The verb in (19), for example, would be a complete sentence on
its own.

(19) Mohawk holophrasis: Ima Johnson, speaker

‘We were driving along and saw a sign advertising free feed with chickens.’
Kén: ne:’ ia’akiate’serehtínion’t ne:’ thí:.
ken: ne:’ i-a’-ak-i-ate-’sere-ht-inion-’t-e’ ne:’ thiken
here it.is --1.---drag- it.is that
-be.in--
here it is we two caused our dragger to be in there it is that
‘So we pulled in.’

There are three lexical categories in Mohawk, defined in terms of their internal
morphological structure: verbs, nouns, and particles. (Particles are monomor-
phemic, though they are sometimes compounded.) The morphological structures
are templatic; that of verbs is the most elaborate. The basic verb template is in
Figure 12.1.
Within the blocks of prepronominal prefixes and derivational suffixes there
are multiple slots. The prepronominal prefixes include a Contrastive, Coincident,
Partitive, Translocative, Factual, Duplicative, Irrealis, Future, Cislocative, and
Repetitive. The derivational suffixes include an Inchoative, Reversives,
Causatives, Instrumental Applicatives, Benefactive Applicatives, a Directional
Applicative, Distributives, Andatives, and Ambulatives. There are around sixty
pronominal prefixes, three aspect suffixes, and four final tense/mood suffixes.
Nearly all show phonologically and/or morphologically conditioned allomorphy.
As in many templatic systems, there are discontinuous dependencies among
morphemes. Certain verb roots require a Duplicative prefix (), for example. In
some cases, a semantic rationale can be discerned: the Duplicative can indicate
some kind of ‘two-ness’ or a change of state or position, though its occurrence is
lexicalized with each verb. In other cases, any semantic contribution has faded.
Some other verb roots require certain other prepronominal prefixes, in what are
now lexicalized combinations. Another discontinuous dependency holds between
inflectional prefixes and suffixes. The perfective aspect suffix, for example,
requires the presence of a Factual, Future, or Irrealis prepronominal prefix.

PRE-PRONOMINAL PREFIXES | PRONOMINAL PREFIXES | REFLEXIVE/MIDDLE | NOUN STEM | VERB ROOT | DERIVATIONAL SUFFIXES | ASPECT SUFFIXES | TENSE/MOOD

Figure 12.1. Mohawk verb template

12.5.1 Inflection

All verbs must contain a verb stem, an inflectional pronominal prefix identifying
the core arguments of the clause, and an inflectional aspect suffix.
There are three sets of pronominal prefixes: grammatical Agents, grammatical
Patients, and transitives, which are Agent>Patient combinations. A transitive
prefix can be seen in (20).

(20) Mohawk transitive pronominal prefix: Ima Johnson, speaker

Taionkhí:ion’ kítkit.
ta-ionkhii-on-’ kitkit
.-.>1-give- chicken
‘They gave us chickens.’

An assumption that still sometimes appears in the literature is that speakers
create inflection by rule, because no one could ever remember so many forms. For
Mohawk, the matter is not so simple. Even excellent speakers have differential
control over pronominal prefix–root combinations. Some combinations simply
occur more often than others: first person singulars are very frequent, for example,
while masculine duals are less so. Verb stems beginning with a are very frequent,
in good part because the middle voice prefix, which occurs at the beginning of
stems, has the shapes -at-/-ate-/-aten-/-an-/-ar-, while those beginning with the
vowel i are relatively rare. Under elicitation, speakers hesitate more with rarer
forms: rarer pronominal prefixes, rarer phonological contexts, rarer full words.
This does not mean of course that they cannot create new forms by analogy.

12.5.2 Derivation

As seen above, the verb allows for morphological expression of a number of
distinctions. Skilled Mohawk speakers tend to exploit these more than less
Mohawk-dominant speakers. An example of this precision can be seen in (21).
The fluent speaker cited above continued her account of the chicken adventure in
the course of a conversation with friends. She and her husband bought some
chickens and built a chicken coop. They enjoyed hearing the rooster crow in the
morning. But one morning the chickens were screaming more than usual. Her
husband suggested that something must be after them, and the couple went to
look. When the husband peered through a hole in the wall, he saw that something
had gotten ahold of one of the chickens, and it was screaming. She suggested he
get his gun. It was a weasel. It had already bitten the chicken on the leg. Her
husband took aim and shot. The weasel looked around, wondering what had
happened. As the wife continued her story, each time she mentioned an event that
had happened before, she included the Repetitive prefix sa- ‘again’ on the verb,
whether or not there was a separate particle á:re’ ‘again’.

(21) Mohawk Repetitive prefix sa-: Ima Johnson, speaker

Ó:nen á:re’ ne kwáh sahaié:na’ thi: kítkit.
onen are’ ne kwah sa-ha-iena-’ thiken kitkit
now again the just .-..-grab- that chicken
now again the just he re-grabbed that chicken
‘Still (again) he grabbed onto the chicken.
Ó:nen á:re’ nakwáh taonsaiohén:rehte’ thi: kítkit.
onen are’ nakwah t-a-onsa-io-henreht-e’ thiken kitkit
now again very.much ---.-yell- that chicken
now again very much did it re-yell that chicken
The chicken really screamed.
Ó:nen á:re’ sahate’sennón:ni’ ne rikstèn:ha.
onen are’ sa-ha-ate-’sennonni-’ ne ri-ksten=ha
now again .-..--aim- the 1>.-be.old=mdim
now again he re-aimed the I have him as old man
(Again) my husband took aim.
Thi:, . . weasel nen
thiken onen
that weasel then
That weasel
kwáh taonsahatkahtónnion’ ne á:re’.
kwah t-a-onsa-ha-at-kaht-onnion-’ ne are’
just ---..--look-- the again
just he re-looked around the again
just looked around (again)
Nok á:re’ taonsahatekhwá:ko’
ne ok are’ t-a-onsa-ha-ate-khw-ako-’
the too again ---..--meal-take-
and again he re-bite-took
And then he took (another) bite
thi: kitkit ne kahsinà:ke.
thiken kitkit ne ka-hsin-a’ke
that chicken the ..-leg-place
that chicken its leg place
out of the chicken on its leg.’

12.5.3 Noun incorporation

Noun incorporation, the compounding of a noun stem with a verb stem to form a
new verb stem, is pervasive in Mohawk, but it is a word-formation device.
Speakers generally know which forms are part of the lexicon of the language
and which could be but are not. Awareness of neologisms depends on the
productivity of individual noun and verb stems. Some noun stems are never
incorporated, some are sometimes incorporated, some are often incorporated,
and some occur only incorporated. Similarly, some verb stems never occur with an
incorporated noun, some occur in a few combinations with nouns, and some in
many. New combinations with less productive stems are more often noticed than
those involving highly productive ones. Often the language provides alternatives
for packaging information: a noun may occur as an independent word or incorp-
orated into a verb. The density of incorporation for discourse purposes generally
varies across speakers with the degree of language use.
Examples of noun incorporation can be seen in (22), part of a conversation
between a grandmother and her granddaughter as they were making meat pies.
The grandmother was a highly skilled speaker, who learned English only after she
went to school. The granddaughter heard Mohawk as a child, but spent most of
her daily life in English. (The entire conversation was in Mohawk, but just the free
translation is given for the first few lines to provide context.)

(22) Noun incorporation: Grandmother and granddaughter

GM: ‘Go get the wooden bowl.’
GD: ‘Wooden bowl?’
GM: ‘Wooden bowl.’
GD: ‘Wha–
GM: ‘You’ll use it to put the flour in.’
GM: Othè:sera’ ostòn:ha, sok, kén:ie’.
flour a little then fat
‘A little flour, and then, fat.
Tánon’, um,
And, um,
né: ní: ke-rákw-as
that the=1 1.-choose-
the I I prefer
n=en-ke-wist-á:wen-ht-e’.
the=-1.-fat-liquid--
I will fat melt
and I myself prefer to melt the fat.’

The grandmother first introduced the fat with the independent noun kén:ie’. Once
it was an established referent, she incorporated it: enke-wist-á:wenhte’ ‘I will fat
melt’. (Incorporated noun stems are not always the same as their independent
counterparts.) The combination ‘fat-melt’ is of course a common one. The
conversation continued.

(23) Noun incorporation: Grandmother and granddaughter

GD: To: ní:kon?
‘How much?’
GM: En, o: ní:se’ enhsanónhton’
en o: ne=ise’ en-hs-anonhton-’
ah oh the=2 -2.-think-
ah oh you you will think
ne: tho: ní:ioht
ne: tho: ni-io-ht
that there -.-be.so
it is there so it is
‘Ah, you’ll decide
tsi ní:kon – enhsena’tarón:ni’.
tsi ni-k-on en-hse-na’tar-onni-’
how --be.amount -2.-baked.goods-make-
how so it amounts you will baked.goods make
according to how many pies you’re making.’

At this point the pies were well-established referents, active in the consciousness of the
speakers, so it is no surprise that the noun stem -na’tar- ‘baked goods’ was incorpor-
ated. There was little need to highlight it. The verb ‘make’ is what could be called a
‘light verb’, not adding highly complex, new information. It is one that frequently
incorporates, and the combination ‘baked.goods-make’ = ‘bake’ is a common one.
As the conversation continued, the granddaughter introduced referents with
independent nouns, and the grandmother picked them up with incorporated
nouns.

(24) Noun incorporation: Grandmother and granddaughter

GD: Tánon’, o’wà:ron’, tánon’ ohnennà:ta’?
and meat and potato
‘And meat and potatoes?’
GM: En, tsi nikarì:wes ki: sarhá:re’ sok–
ah as so it is matter long this you are waiting then
‘Ah, while you’re waiting,
enhshennà:ton’, tánon’ teka’wahraríhton.
en-hs-henna’t-on-’, tanon’ te-ka-’wahr-a-ri-ht-on
-2.-potato-cook- and --meat--cooked--
you will potato cook and it is meat cooking
then you’ll cook the potatoes, and the meat is cooking.’

During this conversation, the grandmother talked more than the granddaughter,
with five times as many words (tokens) overall. But as in Central Pomo, perhaps
surprisingly, their average morpheme-per-word ratios were nearly identical: the
grandmother’s speech averaged 2.4 morphemes per word, and the granddaughter’s
2.3. As in the Central Pomo conversations, the skilled speaker, the grandmother,
used many more discourse particles, which are monomorphemic. The granddaugh-
ter used some highly lexicalized polymorphemic words, words that she clearly
selected as familiar chunks, and fewer particles. The speech of the two was overall
quite different, in many of the same ways as in Central Pomo. Skilled speakers like
the grandmother here spend more of their time in the language and simply know
more words and more constructions. They have more lexical items to choose from,
including verb stems with incorporated nouns, and more choices among construc-
tions for shaping the flow of information.
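
A morpheme-per-word average of this kind can be computed from any segmented transcript. The following is a minimal sketch in Python, not the procedure used for the figures above: it assumes that morpheme boundaries are marked with `-` or `=` as in the segmentation lines of the examples, and the function name `morphemes_per_word` is hypothetical. The input is the segmentation line of the third clause of (21).

```python
import re

def morphemes_per_word(segmented_line: str) -> float:
    """Average number of morphemes per word in a segmented line.

    Words are separated by whitespace; morpheme boundaries inside a word
    are assumed to be marked with '-' (affix) or '=' (clitic).
    """
    words = segmented_line.split()
    counts = [len([m for m in re.split(r"[-=]", w) if m]) for w in words]
    return sum(counts) / len(counts)

# Segmentation line from (21): three particles and two complex words.
line = "onen are’ sa-ha-ate-’sennonni-’ ne ri-ksten=ha"
print(morphemes_per_word(line))  # 11 morphemes / 5 words = 2.2

# The same two complex words without the particles average 4.0.
print(morphemes_per_word("sa-ha-ate-’sennonni-’ ri-ksten=ha"))
```

The second call illustrates the effect noted in the text: each monomorphemic particle added to an utterance pulls the average down, so heavy particle use can mask highly synthetic verbs.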

12.5.4 Processing

The crucial role of lexicalization in processing can be seen in interactions among
speakers of different dialects. There are six Mohawk communities, distributed
across Ontario, New York State, and Quebec. These are, from west to east,
Ohswé:ken, Wáhta’, Tehaientané:ken, Ahkwesáhsne, Kanehsatà:ke, and Kahnawà:ke.
Phonological differences among the dialects are relatively minor. Where speakers
in the west pronounce the affricate written <ts> as an alveopalatal before a high
front vowel or palatal glide, those in the east pronounce it as alveolar. Where some
speakers pronounce <r> as a retroflex flap, others pronounce it as a lateral [l].
Where speakers in the west continue the pronunciation of original *ty and *ky,
those at Ahkwesáhsne pronounce both as velar, and those at Kanehsatà:ke pro-
nounce both as alveopalatal.
Morphology is constant across the dialects. The morphological templates are
the same, as are the inventories of prefixes, roots, and suffixes. Principles of syntax
are also the same. Constituent order is purely pragmatically based.
Quite surprisingly, when a recording of an excellent speaker from Ohswé:ken,
the westernmost community, was played for skilled speakers in Kahnawà:ke,
the easternmost community, they had difficulty understanding him. The
barrier was not the individual morphemes, nor their patterns of combination,
which are essentially the same in all of the dialects, but vocabulary, the pre-
formed chunks. Over the past several centuries since their separation, different
lexical items have developed in the different communities. These Kahnawà:ke
speakers were not processing his speech morpheme by morpheme, but word
by word.

12.5.5 Native acquisition

An intriguing issue for the acquisition of languages with complex morphology is
how children first break into the system. They never hear verb roots or stems in
isolation; in fact Mohawk speakers themselves cannot isolate roots or stems
(unless they become linguists). Over the past several decades, there have been
relatively few children learning Mohawk as a first language (though that pattern is
beginning to change), so large-scale studies of acquisition have not been possible.
Some principles have emerged, however, from observation of a few children
acquiring the language (Mithun 1989).
The first is that the earliest stages of acquisition are phonologically based.
Children first extract the stressed syllable of words. This choice is actually useful.
Stress basically falls on the penultimate syllable, the second from the end (though
certain epenthetic vowels are passed over). The stressed syllable often coincides
with the root or part of it, so the children can often get their message across.
Progress remains phonologically based for a time: the child first adds the ultimate
syllable, producing two-syllable words, then the antepenult, etc. An example of
adult/child interaction can be seen in (25).

(25) Child Mohawk: Adult and child, 2;2

Adult Child
Wa’kéta’. Kéta’. ‘I’m putting them in.’
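
The phonologically based progression described above (stressed syllable first, then the final syllable, then syllables further to the left) can be sketched as follows. This is an illustrative simplification only: it assumes the word has been syllabified by hand, that stress is penultimate, and that no epenthetic vowels intervene; the function name `child_form` and the notion of numbered stages are hypothetical conveniences, not part of the analysis.

```python
from typing import Sequence

def child_form(syllables: Sequence[str], stage: int) -> str:
    """Return the portion of a word produced at a given stage.

    Stage 1: the stressed (penultimate) syllable only.
    Stage 2: the stressed syllable plus the final syllable.
    Stage 3 and up: one more syllable added to the left each time.
    Assumes penultimate stress and a word of at least two syllables.
    """
    if len(syllables) < 2:
        raise ValueError("expected a word of at least two syllables")
    if stage < 1:
        raise ValueError("stage must be 1 or greater")
    penult = len(syllables) - 2
    if stage == 1:
        return syllables[penult]
    start = max(0, penult - (stage - 2))
    return "".join(syllables[start:])

adult = ["wa’", "ké", "ta’"]   # Wa’kéta’ from (25), syllabified by hand
print(child_form(adult, 1))    # ké
print(child_form(adult, 2))    # kéta’ (the child's form in (25))
print(child_form(adult, 3))    # wa’kéta’
```

The sketch operates on syllables alone and has no access to morpheme boundaries, which is the point of the description: at this stage the child's forms are cut by prosody, not by morphology.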

Some later child versions of words are in (26).

(26) Child Mohawk

Adult Child
osahè:ta’ ahe:ta’ ‘beans’
ohiákeri iákeri ‘fruit juice’
tehotskà:hon otskà:hon ‘he’s eating’

What is at first astonishing about the Mohawk of young children is what
appears to be their allomorphic skill. The masculine singular agent pronominal
prefix ‘he’, for example, has the form ra- word-initially and -ha- word-internally,
except that it is basically -hr- after a stressed vowel or before the vowels o, on, e,
or en. When the following stem-initial vowel is i, this vowel merges with the a
of the pronominal prefix to the nasalized vowel en, ([ᴧ̨]), yielding allomorphs
ren-/-hen-/-hren-. (This fusion is characteristic of some other pronominal pre-
fixes, but not a general process throughout the language.) Otherwise, the final a of
the pronominal prefix is lost before another vowel. Another phonological process
involves coda h in stressed syllables: the laryngeal produces a distinctive high-fall
pitch contour (indicated orthographically with a grave accent) on that syllable,
then disappears, leaving vowel length. The masculine singular agent prefix thus has
the forms -hra-, ra-, -ha-, -hr-, r-, -hren-, -hen-, and ren-. And yet, children never
seem to make mistakes! At age 2 years and 10 months, one child easily asked
Ka’ wà:re’? ‘Where’s he going?’, never tripping over those complex phonological processes.

(27) Mohawk phonology

Ka’ wà:re’?
ka’ wa-hra-e-’
where -..-go-
‘Where is he going?’

Of course this is no surprise. The child knew the full question as a chunk; he did
not manufacture the word from underlying forms of morphemes, then apply
multiple phonological processes to arrive at a surface form.
The second person singular agent pronominal prefix ‘you’ is basically s-
word-initially, -hs- word-internally, with epenthetic -e- before stems beginning in n, r, or
w and certain consonant clusters. The basic form of the perfective aspect suffix is
glottal stop ’, with epenthetic -e- after consonants. As noted above, stress is
penultimate, with stressed vowels lengthened in open syllables, but epenthetic
vowels do not enter into the determination of stress. The child cited above in (27)
similarly came out with the exclamation in (28) below easily and perfectly, despite
the complexity of the processes that would go into building it from underlying
forms then applying a sequence of phonological rules.

(28) Mohawk phonology

Sótsi enhserá:kewe’!
sotsi en-hs-rakew-’
too -2.-wipe-
‘You’re going to erase too much!’

Of course the child learners did not emerge instantaneously with Mohawk
equivalent to that of adults. About the time they were producing three-syllable
words, they began to discover morphology, usually with a few more frequent
pronominal prefixes. (These immediately precede the verb stem.) From this point
on, acquisition was governed more by morphology than phonology. As seen
earlier, Mohawk speakers generally specify the direction of directed motion,
with a Translocative prefix i-/ie-/ia-/ia’-/iaha- ‘thither’ or a Cislocative prefix t-/
te-/ta-/-onta-/-onte-/-ont- ‘hither’. A Translocative prefix was seen earlier in (19)
in the verb i-a’akiate’serehtínion’t ‘we pulled in there’. At 2 years and 10 months,
the child cited in (27) and (28) generally omitted the directional prefixes.

(29) Mohawk direction

Child Adult version
enháhawe’ i-enháhawe’ ‘he will take it’
waháhawe’ i-aháhawe’ ‘he took it’

(The initial w of the factual prefix regularly disappears following the
Translocative.)
Negation is expressed in Mohawk, as in many languages, with a combination of
markers: the particle iáh plus an initial Negative prepronominal prefix te’- or
Contrastive prepronominal prefix th-/tha-/tha’-. This child used the analytic
marker iáh alone at this age.

(30) Mohawk negation

Iáh thí:ken rón:kwe ì:raks.
iah thiken r-onkwe i-hr-ak-s
not that .-person -..-eat-
‘That man doesn’t eat it.’

The adult version would include a negative prepronominal prefix on the verb:
te’-hr-ak-s > tè:raks. (Mohawk verbs must contain at least two syllables. If a verb
would otherwise be monosyllabic, a prothetic vowel i- is added at the beginning,
which bears stress.)
Overall, children learning Mohawk apparently first build vocabulary within
phonological length limitations, then begin to abstract morphological distinctions.
The fact that they so rarely make allomorphic errors suggests that they are not in
fact producing language by assembling underlying forms then applying sequences
of phonological rules. This accords well with the findings of Tomasello (2006 and
elsewhere) on acquisition: Children’s earliest acquisitions are concrete pieces of
language—words, complex expressions, or mixed constructions—because particu-
larly early in development they do not possess fully abstract categories and schemas.
Children construct these abstractions only gradually and in piecemeal fashion.
The strategies observed in children learning Mohawk as a first language differ
interestingly from those seen in adult second-language learners. In several of the
Mohawk communities, an extraordinary generation of young adults are develop-
ing an impressive competence in the language. They are becoming fluent, some-
thing that would have been considered an impossible dream only a short time ago.
These second-language learners show brilliant mastery of the complex morph-
ology, certainly making allomorphic mistakes along the way, but exquisitely tuned
in to the complexities involved. First-language speakers are delighted to see their
accomplishments, though, interestingly, they observe uniformly that these
second-language speakers continually create words that do not exist in the
language.

12.6 Implications for our models of morphology

Work by Blevins (2006, 2013, 2016a, 2016b), Pirrelli et al. (2015), and others
draws a distinction between Constructive and Abstractive models of morphology.
In Constructive frameworks, surface word forms are described as built up from
subword units, either in terms of substance or rules. In Abstractive frameworks,
the basic units of the grammatical system are surface word forms. Roots, stems,
and exponents are understood as abstractions over a lexicon of word forms.
Constructive perspectives underlie efficient linguistic descriptions, the kinds of
descriptions that are useful for both linguists and adult second-language learners.
They also fit well with what is seen in adult acquisition of Mohawk as a second
language, in particular allomorphy mistakes and the overgeneration of derived
forms. Such descriptions also provide measures of objective complexity in the
sense described by Dahl and others cited earlier.
Abstractive perspectives are word based, though it is recognized that words can
be internally structured into recognizable constituent parts. Constituent parts are
analysed as emergent from independent principles of lexical organization,
whereby full lexical forms are redundantly stored and mutually related through
entailment relations (Matthews 1991; Corbett & Fraser 1993; Pirrelli 2000; Burzio
2004; Booij 2010; all cited in Pirrelli et al. 2015: 142). It is significant that the
processing of a given form may be facilitated or inhibited by other, related forms.
This makes sense only if the related forms are available as elements of a speaker’s
mental lexicon (Taft 1979; Baayen et al. 1997; Schreuder & Baayen 1997; Hay
2001; de Jong 2002; Moscoso del Prado Martin 2003, cited in Blevins 2006; Blevins
2006: 535).
Abstractive models accord well with differences between highly fluent first-
language speakers of Central Pomo and Mohawk on the one hand, and English-
dominant first-language speakers on the other. One of the most salient differences
is that while less fluent speakers do use highly synthetic words if they are very
frequent or primed, they have a smaller inventory of choices. Their more limited
lexical inventories can result in some inappropriate lexical selections, both
inflected and derived, and fewer options for shaping information flow.
Abstractive models also accord well with the strong sense among both Central
Pomo and Mohawk speakers of whether a possible word exists and exactly when it
is used. They are in line with the problems even skilled speakers sometimes face in
attempting to process speech from other dialects. They would predict the variable
ability of speakers to isolate morphemes which never occur on their own as
independent words, the existence of discontinuous dependencies, and speakers’
differential facility in producing inflectional paradigms. Speakers can certainly
extend patterns of inflection by analogy on occasion, but rarer forms and com-
binations present greater challenges. Abstractive perspectives also accord with the
phonologically-based first-language acquisition strategies of Mohawk by young
children, and the rarity of allomorphy mistakes. Learners of that age face few
memory hurdles that would hinder the acquisition of large numbers of new lexical
items.
In the end, what is complex for the analyst is not necessarily complex for the
speaker or for the learner. Speakers of Central Pomo and Mohawk store most
polymorphemic words, or at least stems, as chunks. Allomorphic alternations do
not present serious difficulties when they are embedded in the chunks, a fact that
is easily observed in the absence of mistakes in frequent forms, but also in the
challenges presented by rare or novel combinations. Templatic structure may be
unmotivated for the analyst and thus viewed as additional complexity, but,
importantly, the routinization of structure it represents can result in fewer
decisions on the part of speakers. It can also facilitate the acquisition of new
lexical items, items which easily fit into an existing pattern.
Do the differences matter? The various types of complexity are all useful, but
for different purposes, and for that reason, it is important to recognize them. If our
goal is to delineate what is a possible language, we want to think about possible for
whom. Language is full of patterns, some no longer productive. As analysts we
care about all of them: they allow us to understand the otherwise arbitrary.
Speakers inherit the products of past patterns and happily use some without
abstracting over them. And it is learners and speakers who shape the language
according to their own knowledge.

IV
DISCUSSION

13
Morphological complexity and the
minimum description length approach
Östen Dahl

13.1 Introduction

Within the study of linguistic complexity, morphological complexity has a special
place due to the fact that morphology is the part of language where differences in
complexity between languages are most apparent. Morphological complexity also
seems to lend itself fairly easily to quantification. It is therefore natural that it should
attract the attention of linguists. The chapters in this volume show a variety of
approaches to morphological complexity which sometimes differ quite considerably
in the conceptual apparatus applied. In this concluding chapter, rather than try to
review each contribution separately, I will focus on some of the basic concepts used
by the authors. Sometimes this will demand going beyond the contributions to the
volume. I will start out by presenting briefly what I will call ‘the minimum
description length approach’ to complexity and then try to see how other concepts
of complexity applied in the chapters of the volume relate to it.

13.2 The minimum description length approach to complexity

I will take as my point of departure the idea that the complexity of an entity can be
understood as the amount of information needed to recreate or specify it—which
in most cases can be identified with the length of the shortest possible complete
description of it. This is often referred to as ‘Kolmogorov complexity’ or ‘algo-
rithmic information content’ and has its most natural application when applied to
strings (of symbols or characters): the Kolmogorov complexity of a string is the
inverse of its compressibility. Kolmogorov complexity is behind the ‘minimum
description length (MDL) principle’ which is said to build on the insight that ‘any
regularity in the data can be used to compress the data’ (Grünwald 2007), leading
to the conclusion that finding the best hypothesis for a given set of data means
finding the optimal way to compress it. As in Dahl (2004), I will here use the term
‘pattern’ rather than ‘regularity’, following Goertzel (1994) and Shalizi (2001).
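
The link between pattern and compressibility can be illustrated with an off-the-shelf compressor. The sketch below is only a rough proxy: Kolmogorov complexity itself is not computable, and the size of a zlib-compressed string gives no more than an upper bound that depends on the compressor. The two test strings and the function name `compressed_size` are illustrative assumptions.

```python
import random
import zlib

def compressed_size(text: str) -> int:
    """Size in bytes of the zlib-compressed UTF-8 encoding of text.

    This is only an upper-bound proxy: Kolmogorov complexity itself
    is not computable.
    """
    return len(zlib.compress(text.encode("utf-8"), 9))

patterned = "ab" * 500                      # one pattern repeated 500 times
rng = random.Random(0)                      # fixed seed for repeatability
patternless = "".join(rng.choice("ab") for _ in range(1000))

# Both strings are 1000 characters long, but only the first has a
# pattern that the compressor can exploit fully.
print(len(patterned), compressed_size(patterned))
print(len(patternless), compressed_size(patternless))
```

The string with an obvious pattern shrinks to a small fraction of the size of the patternless one, even though both have the same length and use the same two symbols.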

Östen Dahl, Morphological complexity and the minimum description length approach In: The Complexities of Morphology.
Edited by: Peter Arkadiev and Francesco Gardani, Oxford University Press (2020). © Östen Dahl.
DOI: 10.1093/oso/9780198861287.003.0013

The minimum description length principle is sometimes said to be a version of
Occam’s Razor, but it might equally well be called ‘Pāṇini’s razor’, given that he
and other Indian linguists in the first millennium BCE honoured a principle later
formulated as ‘Grammarians rejoice over the saving of half a short vowel as much
as over the birth of a son’,¹ which more directly addresses the issue of description
length. In modern linguistics, similar ideas have been discussed in terms of
‘descriptive economy’ or ‘parsimony’. But ‘minimum description length’ has
been explicitly addressed in computational approaches to morphology, the most
cited example being Goldsmith (2001).
I think the notion of minimum description length can also be helpful in
understanding some notoriously difficult concepts in linguistics. I will take sup-
pletion as an example. Corbett (2009) characterizes suppletion as ‘an outer limit of
inflection, the extreme of markedness and complexity’ but also approvingly quotes
the following statement from Mel’čuk (1994: 358), which does not refer to
complexity, as ‘a good definition of suppletion’: ‘For the signs X and Y to be
suppletive their semantic correlation should be maximally regular, while their
formal correlation is maximally irregular.’ But what is interesting here is rather the
explication in the cited work of Mel’čuk of what he means by ‘maximally irregu-
lar’, or as he says elsewhere, ‘minimally regular’. For his ‘rigorous definition’ of
suppletion, Mel’čuk introduces the auxiliary notion of co-representability. Two
units are co-representable if they can be derived from each other or from a
common source by rules of the language, and the condition on maximal irregu-
larity of form means that the signifiers of the units are not co-representable. No
particular conditions such as productivity or generality are put on the rules—
‘[t]he only factor that counts for there to be regularity is the presence of 
rules’. In a minimum description length approach, Mel’čuk’s account can be
interpreted as implying that suppletion involves the absence of a pattern or
regularity—a way of representing the data in a shorter way than by rendering it
literally. Thus, a suppletive form would have to be listed in the description of the
language. Notice that this excludes what is not explicitly precluded in Mel’čuk’s
account—a rule which applies to one form only. The point is that introducing
such a rule would normally involve an increase in description length that would
offset what is gained by shortening the specification of the suppletive form.
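
The trade-off just described can be made concrete with a toy calculation. The sketch below is an illustration under crude assumptions, not a proposal for how description length should be measured: cost is counted in written symbols, every rule carries an assumed fixed overhead (`RULE_OVERHEAD`), and the four English verbs are chosen only for familiarity.

```python
RULE_OVERHEAD = 2  # assumed fixed cost (in symbols) of stating any rule

def cost_listing(pairs):
    """Every base and every past form is written out literally."""
    return sum(len(base) + len(past) for base, past in pairs)

def cost_with_rules(pairs, suffix="ed", one_form_rule=False):
    """Bases are listed; regular pasts come from a single suffix rule.

    Irregular pasts are either listed with their base (default) or
    produced by a dedicated rule that must name its target and output.
    """
    total = RULE_OVERHEAD + len(suffix)            # the general rule
    for base, past in pairs:
        total += len(base)                         # base is always listed
        if past == base + suffix:
            continue                               # covered by the rule
        if one_form_rule:
            total += RULE_OVERHEAD + len(base) + len(past)
        else:
            total += len(past)                     # listed literally
    return total

verbs = [("walk", "walked"), ("talk", "talked"),
         ("jump", "jumped"), ("go", "went")]
print(cost_listing(verbs))                          # 36
print(cost_with_rules(verbs))                       # 22
print(cost_with_rules(verbs, one_form_rule=True))   # 26
```

Under these assumptions the general suffix rule pays for itself, whereas a rule dedicated to the single pair go/went is longer than simply listing went, because the rule has to mention both its target and its output in addition to the overhead.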

¹ The maxim (Sanskrit ardhamātrā lāghavena putrotsavaṃ manyante vaiyākaraṇāḥ) is often
quoted in the literature without a source. There is no known formulation of it from classical times.
In the form cited here, it derives from the treatise Paribhāṣenduśekhara by the nineteenth-century
Indian scholar Nagēśa or Nāgojībhaṭṭa, which was translated into English by the German Indologist
Franz Kielhorn (Kielhorn 1871). Incidentally, Occam’s Razor in its commonly cited form (entia non
sunt multiplicanda praeter necessitatem) is not found in the writings of William of Ockham but derives
from the seventeenth-century Irish philosopher John Punch.

13.3 The organization of morphology

When describing a set of objects, the most parsimonious way is often to separate
the information about their general properties from the information that is
specific to each member of the set. Descriptions of languages are traditionally
divided into ‘grammar’ and ‘lexicon’. So let’s see what that implies for
morphology.
We can see the goal of the morphological component of a grammar—or
‘morphology’ for short—as a tool to generate the set of all word forms, organized
in paradigms, in a language from a lexicon. Another way of putting this in the
spirit of the MDL principle is to regard the morphological component as a way of
compressing the set of paradigms. The morphology and the lexicon together
constitute the description of the word forms. The lexicon will consist of a set of
entries, which I shall call ‘lexical specifications’, containing the information
needed by the morphology to generate one particular paradigm, that is, on the
one hand, one or more basic forms or principal parts; on the other, membership in
inflection classes, genders, etc. I shall here assume that the lexicon contains no
other information.
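
The division of labour between lexicon and morphology described here can be sketched in a few lines. The example is a deliberately tiny fragment, assuming two cells of two Latin declension classes; the names `MORPHOLOGY`, `LEXICON`, and `paradigm` are hypothetical, and a realistic system would need principal parts, stem alternations, and many more cells.

```python
# Morphology: exponents per inflection class (a fragment of Latin declension).
MORPHOLOGY = {
    "I":  {"NOM.SG": "a",  "GEN.SG": "ae"},
    "II": {"NOM.SG": "us", "GEN.SG": "i"},
}

# Lexicon: one lexical specification per lexeme (basic form + class).
LEXICON = {
    "ROSE":  {"stem": "ros",   "cls": "I"},
    "LORD":  {"stem": "domin", "cls": "II"},
    "TABLE": {"stem": "mens",  "cls": "I"},
}

def paradigm(lexeme: str) -> dict:
    """Generate the paradigm of a lexeme from its lexical specification."""
    spec = LEXICON[lexeme]
    return {cell: spec["stem"] + ending
            for cell, ending in MORPHOLOGY[spec["cls"]].items()}

for lexeme in LEXICON:
    print(lexeme, paradigm(lexeme))
```

Adding a further class I noun costs only one short lexical specification, since its endings are already stated once in the morphology; this is the sense in which the morphological component compresses the set of paradigms.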
The total length of the morphology and the lexicon is thus indicative of the
complexity of the paradigms. But in speaking of morphological complexity we
have to sort out a few different components in this. Primarily, the morphological
complexity of a language would be the complexity of the morphological compo-
nent in the sense of the system that relates the lexicon with the set of paradigms.
To start with, although I have been speaking of a set of word forms and a set of
paradigms as if those things were equal, the difference between them is crucial.
Think of the paradigm as a table. Since there are a number of ways any given set of
word forms can be organized into a table, and the choice between them is
significant, it follows that there is information hidden in the organization of the
paradigm and consequently the paradigm is more complex than the set of word
forms. Furthermore, the paradigms belonging to lexical items of one part of
speech usually share a common structure. But this structure can be studied
independently of the system that relates paradigms and lexical specifications. So
paradigm organization can be seen as a component of its own.
Another problem is to what extent the lexicon is relevant to the question of
morphological complexity. On the one hand, to the extent that the morphological
component does not treat all lexical items equally, the lexicon will have to contain
information that makes that possible. On the other hand, if items are added to or
removed from the lexicon, the total length of the lexicon will change—and it
seems counter-intuitive that these changes should always influence the morpho-
logical complexity of a language. For this reason, it is rather the information
contained in the individual lexical specifications that is of interest.
As I said, to assess the complexity of a set of paradigms, we would have to
consider both the complexity of the lexicon and the complexity of the
morphology—both separately and taken together. As noted in Sagot & Walther
(2011; quoted by Parker & Sims, Chapter 2, this volume), it may sometimes be
possible to obtain a shorter total description length by changing the division of
labor of the two components.
It is possible to streamline the general picture somewhat. Instead of thinking of
the output of morphology as a set of paradigms, we can think of the morphology
as generating a set of annotated word forms—that is, each form comes with a
specification of its grammatical features. This way, the system becomes
symmetric—we can speak of input and output specifications and a set of rules
that relate them. The input and output specifications have specified formats and
contain terms taken from specific vocabularies: labels of inflectional classes and
values of inflectional features, respectively. The sizes or lengths of the specification
formats and vocabularies are part of the overall complexity of the linguistic
system, but they also influence the complexity of the rules of the morphological
component. What I have just said illustrates that it is not always clear how to draw
the boundaries of morphological complexity. In general, speaking of the com-
plexity of a component of the description of a language in isolation easily becomes
somewhat artificial, in my opinion even on a modular view of language structure.
I will return to this question below.
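
The symmetric picture described above (input specifications, output specifications, and rules relating them) can be sketched in a few lines of Python. This is only an illustration under invented assumptions: the names `LexicalSpec`, `AnnotatedForm`, `ENDINGS`, and `generate`, the two stems, and the endings themselves are hypothetical and are not taken from any of the chapters.

```python
from typing import NamedTuple


class LexicalSpec(NamedTuple):
    """Input specification: a basic form plus an inflection class label."""
    stem: str
    infl_class: int


class AnnotatedForm(NamedTuple):
    """Output specification: a word form plus its grammatical features."""
    form: str
    lexeme: str
    number: str


# Hypothetical rules of the morphological component:
# (inflection class, number value) -> ending.
ENDINGS = {
    (1, "sg"): "a", (1, "pl"): "e",
    (2, "sg"): "o", (2, "pl"): "i",
}


def generate(lexicon):
    """Relate every lexical specification to its annotated word forms."""
    return [
        AnnotatedForm(spec.stem + ENDINGS[(spec.infl_class, number)],
                      spec.stem, number)
        for spec in lexicon
        for number in ("sg", "pl")
    ]


# A two-entry toy lexicon (invented stems).
lexicon = [LexicalSpec("cas", 1), LexicalSpec("lup", 2)]
forms = generate(lexicon)
for annotated in forms:
    print(annotated)
```

In this toy setting, the vocabulary of class labels (1, 2) and feature values (sg, pl) fixes how many entries `ENDINGS` must have, which is one way of seeing why the sizes of the specification vocabularies also affect the size of the rule component.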

13.4 Notions of complexity represented in the volume

As noted above, the chapters in the volume differ in the notions of complexity that
are invoked. But they also differ in the extent to which they place these notions
within explicit frameworks.
The minimum description length approach to complexity is mentioned in the
chapters by Di Garbo, Chapter 8; Loporcaro, Chapter 6; Mithun, Chapter 12; and
Nichols, Chapter 7. But more salient in the volume is the approach of Ackerman &
Malouf (2013). Several chapters (Henri et al., Chapter 5; Parker and Sims,
Chapter 2; Mansfield and Nordlinger, Chapter 3; and Meakins and Wilmoth,
Chapter 4) draw on their distinction between two ‘dimensions in the analysis of
morphological complexity’, viz. ‘enumerative complexity’ or ‘E-complexity’ and
‘integrative complexity’ or ‘I-complexity’. This motivates discussing these con-
cepts in some detail, which I will do below.
A superficially somewhat similar dichotomy is that made by Nichols between
‘inventory complexity (IC)’ and ‘canonical complexity (CC)’, but while ‘IC’ and
‘enumerative complexity’ are fairly closely related, the second members of the
pairs bear little resemblance to each other. (There is a potential source of
confusion in that Nichols’s ‘IC’ is closer to Ackerman & Malouf’s ‘E-complexity’
than to their ‘I-complexity’.) Thus, Nichols’s ‘CC’ deserves a discussion of its own.
Mithun (Chapter 12, this volume) cites both minimum description length
complexity and the distinction between ‘Constructive’ and ‘Abstractive’ models
of morphology. Berdicevskis & Semenuks (Chapter 11, this volume) identify
‘irregularity’ and ‘overspecification’ as the two ‘facets of complexity’ they want
to focus on. Tallman & Epps (Chapter 9, this volume) rely on the taxonomy of
Anderson (2015a), with ‘system complexity’ and ‘exponence complexity’ as the
top-level categories.

13.5 Compositional complexity

In the introduction to Miestamo et al. (2008), the volume editors apply the
analysis of the notion of complexity in Rescher (1998) to linguistic complexity.
For Rescher, description length (in his terms, ‘descriptive complexity’) is just one
of several ‘modes of complexity’. Another is ‘compositional complexity’, which
relates to the constituent elements of a system and is subdivided into two
submodes: ‘constitutional complexity’—the number of elements, and ‘taxonomic
complexity’—their variety. Miestamo et al. (2008: viii) exemplify the former with
the number of ‘phonemes, inflectional morphemes, derivational morphemes,
lexemes’, and the latter with the variety of ‘phoneme types, secondary articula-
tions, parts-of-speech, tense-mood-aspect categories, phrase types’, etc.
Although there are no references to Rescher’s taxonomy (but see the editors’
Introduction, Chapter 1), notions close to ‘constitutional complexity’ show up in a
number of ways in the chapters of the volume, notably as one of the poles of the
dichotomies of Nichols and Ackerman & Malouf.
Nichols’s ‘IC’ is based on ‘assessing the number of elements in an inventory or
values in a system’, exemplified by ‘the number of phonemes, genders, tenses,
derivation types, alignments, word orders’. She identifies it with Miestamo et al.’s
(and thus indirectly Rescher’s) notion of ‘taxonomic complexity’. It may be noted
that some of the items in her list seem rather to belong to ‘constitutional
complexity’ in Rescher’s schema, illustrating that the borderline is somewhat
fuzzy.
Nichols also quotes the term ‘resources’ from Dahl (2004) in this context, which
is slightly problematic. In my book, I opposed ‘resources’ and ‘regulations’, saying
that intuitively, ‘resources determine what is possible or permitted, regulations
what is obligatory’, and noting that ‘the distinction is reminiscent of that between
grammar and lexicon but does not coincide with it’ (Dahl 2004: 41). The basic idea
was that resources are things that one can more or less freely choose from. The
primary examples are lexical items. As the quotation suggests, I did not primarily
think of the notion as applying to grammar. Many of the phenomena Nichols
enumerates are not freely chosen by speakers but rather show up as a consequence
of forced choices due to what I called regulations.
Later in the book (Dahl 2004: 42) I say that if one wants to characterize a
language with respect to its ‘resources’, the parameter that comes first to mind is
‘richness’. Dressler (2011), who is quoted by Loporcaro (Chapter 6, this volume),
also uses this term, characterizing the size of paradigms as a criterion of ‘richness’
rather than of ‘complexity’. However, Dressler defines ‘richness’ as ‘the amount of
productive morphological patterns’, associating complexity with unproductive
patterns, so his notion is different from mine (and apparently also from
Nichols’s ‘IC’).
Let me now turn to Ackerman & Malouf’s notion of E-complexity. It is not
quite clear what is supposed to go into it. The abstract says that E-complexity
‘reflects the number of morphosyntactic distinctions that languages make
and the strategies employed to encode them, concerning either the internal
composition of words or the arrangement of classes of words into inflection
classes’ (Ackerman & Malouf 2013: 429). The definition in the main text (2013:
433) is formulated in a somewhat roundabout way. The authors first note that
‘descriptive linguists often comprehensively catalogue the array of morphological
markers and patterns in a given language or languages’, making possible on the
one hand typological investigations of the types of information encoded in words
and taxonomies of formal strategies for encoding this information, on the other,
inferences by theoretical linguists about the bounds on possible word structures in
natural languages. ‘We refer to patterns found via this general cataloguing of
properties and their surface exponence for words in all of their variety as the
enumerative complexity or E-complexity of a morphological system.’ What is
unclear here is whether E-complexity is basically a count of distinctions and
patterns/strategies or something more. Later formulations in the paper do not
really solve this problem. On p. 434, we learn that ‘[o]ne salient dimension of
E-complexity is the number and nature of inflection classes in a language’, with the
word ‘nature’ suggesting that it is not only a question of counting. On the other
hand, on p. 437, it is said that paradigm-based models ‘reflect a measure of
E-complexity’ which is specified as ‘a greater number of possible exponents, inflec-
tional classes, and principal parts’. Likewise, on p. 451, ‘the same E-complexity’ is
equated with ‘the same number of declensions, paradigm cells, and allomorphs’, and
in a later work (Ackerman & Malouf 2016: 125), E-complexity is said to increase
with ‘(i) larger numbers of morphosyntactic properties a language contains,
(ii) greater numbers of allomorphic variants it uses to encode them, and
(iii) more inflectional classes that lexemes can be distributed over’.
The interpretation of enumerative complexity as being simply an inventory
count is clearly the one chosen by Henri et al. (Chapter 5, this volume): ‘a
linguistic phenomenon’s enumerative complexity depends on how many categor-
ies (of whatever type) it employs’ (p. 106). They seem to have the same thing in
mind when saying earlier (p. 106) that ‘[m]orphological complexity is often
equated with numerousness—of morphs, categories, processes, or paradigm
cells’. They also refer to Stump (2017), who is quite explicit on this point when
he describes the distinction introduced by Ackerman & Malouf: ‘a linguistic
phenomenon’s enumerative complexity depends on how many categories (of
whatever type) it employs . . . ’. Parker & Sims (Chapter 2, this volume) also refer
to enumerative complexity as ‘the number of inflection classes or the size of
paradigms’.
Ackerman & Malouf (2013), like the chapters in this volume that quote it,
tend to give the impression that the two notions of I-complexity and E-complexity
more or less exhaust the possible approaches to morphological complexity, and
that earlier work has been dominated by E-complexity. Thus, Ackerman & Malouf
say in a footnote (2013: 434): ‘For examples of efforts to identify and quantify E-
complexity, see, for example, Juola 1998, 2007, Sampson et al. 2010, Moscoso del
Prado Martín 2011.’ But the works listed here represent a variety of approaches to
linguistic complexity, including MDL-based ones. And it should be clear that E-
complexity cannot be identified with description length. A list of morphosyntactic
categories, inflection classes, and allomorphs is not yet a morphological descrip-
tion of a language.

13.6 Integrative complexity

Minimum description length approaches to complexity can be said to represent
‘objective’ (Dahl 2004) or ‘absolute’ (Miestamo 2008) understandings of the
notion in the sense that they concern properties of objects or systems that are
independent of concepts such as ‘difficulty’ or ‘cost’, which imply an ‘agent-
related’ (Dahl 2004) or ‘relative’ (Miestamo 2008) notion of complexity.
Ultimately, we want to understand how objective measures of linguistic complex-
ity are related to how difficult or costly different aspects of a language are for a
learner or a user, but in order to do that, we have to keep objective and agent-
related notions apart and not let them be conflated. When Ackerman & Malouf
(2013) say that their notion of ‘integrative complexity’ ‘reflects the difficulty that a
paradigmatic system poses for language users (rather than lexicographers) in
information-theoretic terms’, it invites the interpretation that they are doing
exactly that—conflating objective complexity and difficulty. A more charitable
understanding, however, is that their goal is to find an objective measure of
complexity that predicts the difficulty of a linguistic system, more specifically
the uncertainty that faces a speaker when inferring an unknown word form from
other forms in the same paradigm. The most important measure then becomes
‘the average uncertainty in guessing the realization of one randomly selected cell
in the paradigm of a lexeme given the realization of one other randomly selected
cell’. The central result of the study is that paradigmatic systems can have high
E-complexity as long as their I-complexity, measured as the average conditional
entropy of paradigms, is low.
The definition of average conditional entropy presupposes that the set of
possible realizations of the cell to be guessed is known and finite; otherwise the
entropy cannot be calculated. This condition is not fulfilled when some of the
possible realizations are suppletive. That is, the notion of conditional entropy
cannot be applied to cases such as English go and went. It may perhaps be argued
that those are precisely the situations where you have to know the paradigm in
advance so guessing is not possible anyway. But it restricts the applicability of the
notion to some extent.
The notions of ‘conditional entropy’ and ‘average conditional entropy’, as
applied to inflection templates, have some interesting mathematical properties
not discussed by Ackerman & Malouf. ‘Average conditional entropy’ involves
bidirectional predictability relations between cells in a paradigm template. These
turn out to be ‘entangled’ in that there is an upper bound on the sum of two
symmetric entropies, which has as a consequence that the average conditional
entropy of a paradigm can never exceed 50% of what Ackerman & Malouf call its
‘declension entropy’, that is, the surprisal of the inflection class membership of a
lexeme under the assumption that each inflection class is equally probable. I have
no formal proof of this claim,² but I have tested it for all possible value combin-
ations for sets of classes with sizes up to eight, where I had to stop due to
limitations on computer capacity. Concretely, this means that in a system with
eight declension classes and declension entropy equal to 3 (like the Greek one
exemplified in Ackerman & Malouf 2013), the average conditional entropy could
not be higher than 1.5. This fact should be taken into account when assessing the
actual average conditional entropies calculated by Ackerman & Malouf, as when
they (p. 442) say that the overall average conditional entropy for the eight Greek
declensions is 0.644 bits, which is equal ‘to a choice among . . . 1.56 equally likely
declensions’ or ‘slightly more than one’ declension. This is misleading in the sense
that no system with two declensions could ever have an average conditional
entropy higher than 0.5. Thus, if the entropy is 0.644, the system must have at
least three declensions. The low values for average conditional entropy found by
Ackerman & Malouf thus at least partly depend on mathematical necessity rather
than on anything else.

² But consider the simplest case: a system with two inflection classes and two inflectional forms, as
illustrated in Table 13.1. There are four logical possibilities in such a 2×2 matrix: (1) identity between
the rows in both columns; (2) identity in column 1 and no identity in column 2; (3) no identity in column 1
but identity in column 2; (4) no identity in either column. Case (1) can be disregarded, since it would mean
that there is really only one inflection class; the entropy is zero. In case (4), one form always gives full
information about the other, so the entropy is zero. In case (2), the cells in column 1 do not say anything
about the cells in column 2, so the entropy for each cell is equal to the choice between two items, that is
1 (=one bit). But since there is no choice in column 1, the entropy in the opposite direction is 0, which
gives an average of 0.5. Case (3) is analogous, but with the columns swapped; the average will again be 0.5.
Note further that adding a third column will not change anything for the following reason. Guessing is
always from one column to another, so we are always dealing with pairs of columns, in which guessing can
go in either direction. While a 2×2 matrix involves just one such pair, a 3×2 matrix with columns ABC
entails three pairs of columns: AB, AC, BC. But that makes such a matrix equivalent to three 2×2
matrices, and since, as we saw, a 2×2 matrix has a maximum average guessing entropy of 0.5, the value for
the 3×2 matrix is the same. And adding further columns gives an analogous result. Things get more
complicated when rows are added, but my computer simulation strongly suggests that the relation
between declension entropy and maximum average conditional entropy is constant.

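
One possible route to the 50% bound, offered here as a sketch and not as part of the chapter's own argument, uses two standard entropy inequalities. The assumptions are that the class of a lexeme is a random variable $C$, and that the realizations $A$ and $B$ of any two cells are fully determined by $C$; the symbols $A$, $B$, and $C$ are introduced only for this illustration.

```latex
\begin{align*}
H(A \mid B) + H(B \mid A) &= 2H(A,B) - H(A) - H(B) \\
  &\le H(A,B) && \text{since } H(A) + H(B) \ge H(A,B) \\
  &\le H(C)   && \text{since } (A,B) \text{ is a function of } C
\end{align*}
% Hence the mean of the two directions is at most H(C)/2, and averaging
% over all pairs of cells cannot exceed that value either.
% With eight equiprobable classes: H(C) = \log_2 8 = 3, so the bound is 1.5.
% The figure quoted from Ackerman & Malouf: 2^{0.644} \approx 1.56.
```

Under these assumptions the bound holds for any number of classes, and with equiprobable classes $H(C)$ is the declension entropy, which would agree with the values found by simulation for up to eight classes.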
It appears that integrative complexity, in the form of conditional entropy,
primarily depends on two factors: one is the extent to which forms ‘wear their
inflection class on their sleeve’, that is, are informative about their own inflectional
class, the other is the extent to which the distributions of allomorphs—or, more
generally, exponents—differ between forms and thus, in the words of Parker &
Sims (Chapter 2, this volume), increase the ‘extent to which the system inhibits
motivated inferences about the realized form of a lexeme, given one or more other
realized forms of the same lexeme’.
The dependence of conditional entropy on these factors means that its relationship
to minimum description length complexity is not straightforward. The
first factor—the informativity of a form about its inflection class membership—
means that there is an inverse relation between the diversity of forms in the
predicting cells and integrative complexity. Thus, lack of overt marking, which
will in general decrease description length, can actually increase integrative
complexity. Consider the hypothetical noun inflection templates in Table 13.1,
with the rows representing two inflectional classes.
The templates can be generated by the rules beneath the table.

Table 13.1. Hypothetical noun inflection templates

       (a)                 (b)
       sg    pl            sg    pl
  1    -∅    -e       1    -a    -e
  2    -∅    -i       2    -o    -i

Rules: (a) if plural then (if 1 then -e else -i) else -∅; (b) if plural then
(if 1 then -e else -i) else (if 1 then -a else -o).

Thus, (b) has a greater description length than (a). However, in (b), the singular
and plural markers are wholly predictable from each other, so the integrative
complexity is 0. In (a), on the other hand, the plural form cannot be determined
from the singular, which results in an average integrative complexity of 0.5—the
theoretic maximum—for the whole template.
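
The two values just given can be checked with a short calculation. The sketch below assumes, as in the discussion above, that every inflection class is equally probable and that the average runs over all ordered pairs of distinct cells; the function names and the dictionary encoding of the templates are hypothetical choices for this illustration and do not reproduce Ackerman & Malouf's own implementation.

```python
from collections import Counter
from itertools import permutations
from math import log2


def conditional_entropy(known, target):
    """H(target | known) in bits; each list holds one exponent per class,
    and all classes are taken to be equally probable."""
    n = len(known)
    joint = Counter(zip(known, target))
    marginal = Counter(known)
    return -sum((count / n) * log2(count / marginal[k])
                for (k, _), count in joint.items())


def average_conditional_entropy(template):
    """Mean of H(target | known) over all ordered pairs of distinct cells."""
    cells = list(template.values())
    pairs = list(permutations(cells, 2))
    return sum(conditional_entropy(a, b) for a, b in pairs) / len(pairs)


# Templates (a) and (b) of Table 13.1: one list per cell, one item per class.
template_a = {"sg": ["-∅", "-∅"], "pl": ["-e", "-i"]}
template_b = {"sg": ["-a", "-o"], "pl": ["-e", "-i"]}

avg_a = average_conditional_entropy(template_a)
avg_b = average_conditional_entropy(template_b)
declension_entropy = log2(2)  # two equiprobable classes: 1 bit

print(avg_a, avg_b, declension_entropy)
```

For template (a) the only uncertainty lies in guessing the plural from the singular (1 bit), while the opposite direction contributes nothing, so the mean over the two directions is half of the declension entropy.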
The second factor—the degree to which allomorph distributions differ—means
that a high average number of allomorphs—which would presumably lead to a
higher description length—does not necessarily lead to a high integrative
complexity. Thus, paradoxically, the situation we saw in (b), where all cells of the
paradigm are different from each other, will, irrespective of the size of the
paradigm, always mean that the integrative complexity is zero. But this is not so
strange if we realize that what integrative complexity really measures is the
amount of discordance between the classifications of the lexicon entailed by the
different columns in a paradigm.

13.7 Canonical complexity and transparency

Nichols’s concept of CC builds on the notion of canonicity developed above all by
Greville Corbett and his associates (see, e.g., Corbett 2007, 2013a, 2015). ‘CC’
should not be interpreted as ‘complexity in the canonical sense’, but rather, as
Nichols herself admits, as a ‘less logical’ alternative to the more cumbersome
‘noncanonicity-based complexity’, perhaps also paraphraseable as ‘degree of
noncanonicity’.
According to Nichols, canonicity theory ‘can be used as a good approximation
to descriptive complexity [i.e. minimum description length, Ö.D.] and is
straightforwardly measurable and comparable’ even if it is not a complexity measure in
itself. In Nichols’s words, ‘[i]t defines a logical space (for a linguistic concept or
structure or system) by determining the central, or ideal, position in that space
and kinds of departures from that ideal, and an element is non-canonical to the
extent that it departs from the ideal’. According to Corbett (2015: 149), canonicity
theory, or in his words, canonical typology, analyses and defines ‘phenomena that
are subject to variability (across and within languages), extracting the various
scales along which we characterize variability, and establishing the logical end-
point of these scales’, yielding theoretical spaces of possibilities, which once
established can be populated with real instances. Canonical instances are those
that match a full set of criteria and may therefore be infrequent or even non-
existent. This distinguishes canonicity from prototypicality with which it is easily
confused.
As the following quotation (Corbett 2015: 172) makes clear, phenomena are not
canonical or non-canonical tout court, but rather they are canonical or non-
canonical instances of some concept:

Just as, for instance, we say that suppletion is a noncanonical realization of
morphosyntactic specification, but can then specify canonical suppletion . . .
Similarly, inflection classes are themselves noncanonical, but we can go on to
establish criteria for canonical inflection classes . . .

It would appear that this creates a problem for the notion of CC, since we would
have to choose a concept to relate it to and also be rather cautious in doing so,
since for some concepts, such as suppletion, ‘more canonical’ actually means
‘more complex’. I cannot see that this issue is addressed in an explicit fashion in
Nichols’s Chapter 7, but since she says that she is concerned exclusively with
‘morphological complexity and specifically inflectional morphology’, it can be
assumed that the canonicity she is speaking of is ‘canonical inflection’ as under-
stood in Corbett’s (2015) paper.
There is still a catch here, though. In general, one would assume that a language
with minimal inflectional complexity would be one without any inflection at all, or
that a minimally complex inflectional class system would be one with no inflectional
differences between lexemes. Under a consistent canonical approach, however, it
would appear that isolating languages should not be seen as having zero inflectional
complexity (and thus being maximally canonical); rather, the notion of
inflectional complexity would not be applicable to them. So far as I can see,
Nichols’s sample does not contain any purely isolating languages (Mandarin is
the one that comes closest) so it is not apparent how she would treat them. But the
problem may show up again at another level. Thus, with regard to unpredictability
of gender, Nichols puts languages with entirely predictable gender together with
languages without gender—which maybe makes sense assuming that one is
looking at canonicity of inflection but not if what is at stake is canonicity of
gender.
Nichols notes one point where there is a discrepancy between Kolmogorov
complexity and CC—syncretism, that is, when two or more cells in a paradigm
share the same word form. She notes that syncretism does ‘not increase the
amount of information required to describe a language’. This may in fact be
made stronger: syncretism often makes it possible to shorten a description. But
syncretism will in general lead to violations of what Nichols refers to as ‘the
structuralist notion of biuniqueness, or “one form, one function”’,³ which Nichols
sees as central to canonicity, and thus syncretism increases CC. Likewise, Corbett
(2015: 152) says: ‘In the canonical situation, the inflectional material is different in
every cell of the lexeme. The major deviation here is syncretism; we have an
expectation of a given number of inflectional forms, while with syncretism two or
more of them are identical (two or more morphosyntactic specifications share a
single realization).’ Sometimes it seems that the choice of criteria for canonicity
relies on a demand for ‘proper behaviour’: if you have a distinction somewhere,
you had better have it everywhere. Whether that makes things more complex does
not really matter.

³ Cf. also the following statement by Mansfield & Nordlinger (Chapter 3, this volume): ‘Inflectional
allomorphy is a prototypical form of morphological complexity, introducing unpredictability into the
mapping of form to meaning’.

What Nichols calls ‘biuniqueness’ (like Tallman & Epps (Chapter 9, this volume),
who mention ‘deviations from biuniqueness’ as a criterion that relates
to measures of morphological complexity) is sometimes referred to as ‘transparency’,
notably in the work of Kees Hengeveld and his associates. Hengeveld &
Leufkens (2018) define ‘transparency’ as ‘a one-to-one relation between units of
meaning and units of form’. As for the relationship between this concept and
complexity, they say that ‘The difference is immediately evident from the fact that
languages may be complex yet transparent or simple yet opaque’; however, they
do not clarify what notion of complexity they have in mind except by giving
Turkish as an example, where the verbal morphology ‘is highly complex in the
sense that a single verbal word may contain a high number of different morphemes,
but also highly transparent in that every morpheme corresponds to one
fixed meaning’. This suggests that they are speaking of structural complexity
rather than system complexity.
There is another problem in identifying deviations from the one-to-one relation
between meaning and form with complexity, not addressed by the authors
mentioned above, that is crucial when it comes to crosslinguistic comparisons.
It concerns the identifiability of units of meaning and is particularly crucial in
inflectional morphology. The grammar of a language may force speakers to
express information that is not essential to their intended message. Thus, in a
language with gendered pronouns, it may not be possible to refer to a person
without revealing their gender. The consequence is that it is sometimes impossible
to translate a sentence from one language into another which conveys exactly the
same information, which makes it difficult to compare the languages with respect
to biuniqueness/transparency (see Dahl 2004: 80–6 for further discussion).
The notion of ‘overspecification’ is also relevant here. Following McWhorter
(2007: 21–8), Berdicevskis & Semenuks (Chapter 11, this volume) regard over-
specification as one of the most crucial facets of complexity, defining it as ‘overt
and obligatory marking of a semantic distinction that is not necessary for com-
munication’. Noting that ‘it is not at all obvious what is necessary for communi-
cation’, they mention McWhorter’s proposal to use crosslinguistic comparison to
determine what is necessary: if a distinction is not universally present in lan-
guages, it can be assumed not to be necessary for communication. However, as is
noted in Dahl (2004: 80), it is not possible to claim that a distinction is necessary
or unnecessary as such, since that has to depend on what information the speaker
wants to convey—the point is rather that a grammar may force speakers to express
some information whether they like it or not.

13.8 Overabundance

Meakins & Wilmoth (Chapter 4, this volume) focus on the phenomenon of
‘overabundance’, by which they mean ‘the exponence of multiple forms in the
same cell in a paradigm’, arguing that it represents an increase in integrative
complexity, ‘in that it requires speakers to make calculated choices about forms
based on features beyond the paradigm’. The particular problem studied is
optional subject marking in the mixed language Gurindji Kriol, more specifically
‘the alternation in the nominative cell of the Gurindji Kriol case paradigm between
zero and -ngku’. They identify three factors which govern the variation: (i)
transitivity; (ii) priming by a preceding subject in the discourse; (iii) presence of
a co-referential (crossreferential) pronoun. This obviously expands the domain
within which morphological complexity is considered. I think it may be ques-
tioned if this variation is to be treated within morphology at all; it looks similar to
other cases of differential argument marking and would naturally be seen as a
syntactic phenomenon. On the other hand, as I said above, seeing complexity only
from a module-internal perspective can be seen as artificial and may prevent us
from making relevant generalizations. In this case, we seem to be dealing with
phenomena that were discussed in Dahl (2004: 128–34) under the rubrics ‘pattern
competition’ and ‘pattern regulation’. I was mainly interested in what happens
during grammaticalization in a single language, but it seems that what I said can
be generalized to contact situations. My main point was that competition between
two patterns, whether lexical or grammatical, may lead to an increase in com-
plexity. As long as the patterns are in free variation, the increase is minimal (and
does not lead to any significant difficulty for learners and users), but there appears
to be a universal tendency towards regulation of the variation, which at the initial
stages shows itself merely in the form of tendencies.

13.9 Conclusion

The chapters of the volume that I have looked at here are those in which there is
explicit discussion of the basic notions relating to complexity employed in the
chapters. Time and space considerations do not allow me to comment on the
others, in spite of many of them being on topics that are of direct interest to me.
One reflection is that the study of morphological complexity still has quite some
way to go before there is a set of shared notions and standard works that everyone
refers to. Which approaches will prevail in the long run is obviously an open
question. It is notable that both the notion of minimum description length and
Ackerman & Malouf’s notion of integrative complexity are ultimately based on
information theory. It is not excluded that we will see other applications of this
theory in the future.

References

Abel, Jennifer (2006). ‘That crazy idea of hers: The English double genitive as a focus construc-
tion’, Canadian Journal of Linguistics 51(1): 1–14. doi:10.1017/S0008413100003790
Aboh, Enoch O. (2009). ‘Competition and selection: That’s all!’, in Enoch O. Aboh and
Norval Smith (eds), Complex Processes in New Languages. Amsterdam: John Benjamins,
317–44. doi:10.1075/cll.35.20abo
Aboh, Enoch O. (2015). The Emergence of Hybrid Grammars. Cambridge: Cambridge
University Press. doi:10.1017/CBO9781139024167
Aboh, Enoch O. and Umberto Ansaldo (2007). ‘The role of typology in language creation’,
in Umberto Ansaldo, Stephen Matthews, and Lisa Lim (eds), Deconstructing Creole.
Amsterdam: John Benjamins, 39–66. doi:10.1075/tsl.73.05abo
Abouda, Lotfi and Marie Skrovec (2015). ‘Du rapport entre formes synthétique et analy-
tique du futur. Étude de la variable modale dans un corpus oral micro-diachronique’,
Revue de Sémantique et Pragmatique 38: 35–57.
Abouda, Lotfi and Marie Skrovec (2017). ‘Du rapport micro-diachronique futur simple/
futur périphrastique en français moderne. Étude des variables temporelles et
aspectuelles’, Corela, HS-21. URL: http://corela.revues.org/4804
Ackerman, Farrell, James Blevins, and Robert Malouf (2009). ‘Parts and wholes: Implicative
patterns in inflectional paradigms’, in James P. Blevins and Juliette Blevins (eds), Analogy
in Grammar: Form and Acquisition. Oxford: Oxford University Press, 54–82.
Ackerman, Farrell and Robert Malouf (2013). ‘Morphological organization: The Low
Conditional Entropy Conjecture’, Language 89(3): 429–64. doi:10.1353/lan.2013.0054
Ackerman, Farrell and Robert Malouf (2015). ‘The No Blur Principle effects as an emergent
property of language systems’, Proceedings of the 41st Annual Meeting of the Berkeley
Linguistics Society. Berkeley, CA, 1–14. doi:10.20354/B4414110014
Ackerman, Farrell and Robert Malouf (2016). ‘Word and pattern morphology: An
information-theoretic approach’, Word Structure 9: 125–31. doi:10.3366/word.2016.0090
Agbetsoamedo, Yvonne (2014). ‘Noun classes in Sɛlɛɛ’, The Journal of West African
Languages 41: 95–124.
Aglarov, M. A. (1988). Sel’skaja obsčina v Nagornom Dagestane v XVII-načale XIX v.
Moscow: Nauka.
Aikhenvald, Alexandra Y. (2000). Classifiers: A Typology of Noun Categorization Devices.
Oxford: Oxford University Press.
Aikhenvald, Alexandra Y. (2002). Language Contact in Amazonia. Oxford: Oxford
University Press.
Aikhenvald, Alexandra Y. (2003a). ‘Mechanisms of change in areal diffusion: New morph-
ology and language contact’, Journal of Linguistics 39(1): 1–29. doi:10.1017/
S0022226702001937
Aikhenvald, Alexandra Y. (2003b). A Grammar of Tariana. Cambridge: Cambridge
University Press.
Aikhenvald, Alexandra Y. (2004). Evidentiality. Oxford: Oxford University Press.
Aikhenvald, Alexandra Y. and Robert M. W. Dixon (1998). ‘Evidentials and areal typology:
A case study from Amazonia’, Language Sciences 20: 241–57.
Aikhenvald, Alexandra Y. and Robert M. W. Dixon (eds) (2006). Grammars in Contact:
A Cross-Linguistic Typology. Oxford: Oxford University Press.
Aikhenvald, Alexandra Y. and Diana Green (1998). ‘Palikur and the typology of classifiers’,
Anthropological Linguistics 40: 429–80.
Åkerberg, Bengt (2012). Älvdalsk grammatik. Älvdalen: Ulum Dalska.
Albright, Adam and Bruce Hayes (2002). ‘Modeling English past tense intuitions with
minimal generalization’, in M. Maxwell (ed.), Proceedings of the 6th Meeting of the ACL
Special Interest Group in Computational Phonology July 2002. New Brunswick, NJ:
Association for Computational Linguistics, 58–69.
Albright, Adam and Bruce Hayes (2003). ‘Rules vs. analogy in English past tenses:
A computational/experimental study’, Cognition 90(2): 119–61.
Alegre, Maria and Peter Gordon (1999a). ‘Frequency effects and the representational status
of regular inflections’, Journal of Memory and Language 40(1): 41–61.
Alegre, Maria and Peter Gordon (1999b). ‘Rule-based versus associative processes in
derivational morphology’, Brain and Language 68(1–2): 347–54.
Allen, Shanley E. M. (2017). ‘Polysynthesis in the acquisition of the Inuit languages’, in
Michael Fortescue, Marianne Mithun, and Nicholas Evans (eds), The Oxford Handbook
of Polysynthesis. Oxford: Oxford University Press, 449–72.
Alleyne, Mervin (1996). Syntaxe historique créole. Paris: Editions Karthala.
Ambrazas, Vytautas, Emma Geniušienė, Aleksas Girdenis, Nijolė Sližienė, Dalija Tekorienė,
Adelė Valeckienė, and Elena Valiulytė (2006). Lithuanian Grammar. 2nd ed. Vilnius:
Baltos Lankos.
Ambridge, Ben and Elena V. M. Lieven (2011). Child Language Acquisition. Cambridge:
Cambridge University Press.
Anderson, Stephen R. (1992). A-Morphous Morphology. Cambridge: Cambridge University
Press.
Anderson, Stephen R. (2015a). ‘Dimensions of morphological complexity’, in Matthew
Baerman, Dunstan Brown, and Greville G. Corbett (eds), Understanding and Measuring
Morphological Complexity. Oxford: Oxford University Press, 11–26. doi:10.1093/acprof:
oso/9780198723769.003.0002
Anderson, Stephen R. (2015b). ‘The morpheme: Its nature and use’, in Matthew Baerman
(ed.), The Oxford Handbook of Inflection. Oxford: Oxford University Press, 11–34.
Arika, Ann Lindvall (2012). ‘Glimpses of the linguistic situation in Solomon Islands’. Paper
given at the 6th international conference on ‘Languages, E-Learning and Romanian
Studies’.
Arka, Wayan (2011). A Rongga-English Dictionary with English-Rongga Wordlist. Jakarta:
Penerbit Universitas Atma Jaya.
Arkadiev, Peter (2020). ‘Morphology in typology: Historical retrospect, state of the art,
and prospects’, in Mark Aronoff (ed.), Oxford Research Encyclopedia of Linguistics.
New York: Oxford University Press. doi:10.1093/acrefore/9780199384655.013.626
Arkadiev, Peter, Axel Holvoet, and Björn Wiemer (2015). ‘Introduction: Baltic linguistics—
State of the art’, in Peter Arkadiev, Axel Holvoet, and Björn Wiemer (eds), Contemporary
Approaches to Baltic Linguistics. Berlin: De Gruyter Mouton, 1–109.
Arkadiev, Peter and Marian Klamer (2019). ‘Morphological theory and typology’, in
Francesca Masini and Jenny Audring (eds), The Oxford Handbook of Morphological
Theory. Oxford: Oxford University Press, 435–54.
Armand, Alain (2014). Dictionnaire kréol rénioné français. Saint-André (Réunion): Epica.
Arnott, David Whitehorn (1970). The Nominal and Verbal Systems of Fula. Oxford:
Clarendon.
Aronoff, Mark (1994). Morphology by Itself: Stems and Inflectional Classes. Cambridge, MA:
The MIT Press.
Aronoff, Mark (1998). ‘Isomorphism and monotonicity: Or the disease model of morph-
ology’, in Steven Lapointe, Diane Brentari, and Patrick Farrell (eds), Morphology and Its
Relation to Phonology and Syntax. Stanford, CA: CSLI Publications, 411–18.
Aronoff, Mark (2015). ‘Thoughts on morphology and cultural evolution’, in Laurie Bauer,
Lívia Körtvélyessy, and Pavol Štekauer (eds), Semantics of Complex Words. Cham:
Springer, 277–88. doi:10.1007/978-3-319-14102-2_13
Aski, Janice M. (1995). ‘Verbal suppletion: An analysis of Italian, French and Spanish to go’,
Linguistics 33(3): 403–32. doi:10.1515/ling.1995.33.3.403
Atkinson, Mark, Kenny Smith, and Simon Kirby (2018). ‘Adult learning and language
simplification’, Cognitive Science 42(8): 2818–54. doi:10.1111/cogs.12686
Audring, Jenny (2014). ‘Gender as a complex feature’, Language Sciences 43: 5–17.
doi:10.1016/j.langsci.2013.10.003
Audring, Jenny (2017). ‘Calibrating complexity: How complex is a gender system?’,
Language Sciences 60: 53–68. doi:10.1016/j.langsci.2016.09.003
Audring, Jenny (2019). ‘Canonical, complex, complicated?’, in Francesca Di Garbo, Bruno
Olsson, and Bernhard Wälchli (eds), Grammatical Gender and Linguistic Complexity,
vol. I: General Issues and Specific Studies. Berlin: Language Science Press, 15–52. URL:
http://langsci-press.org/catalog/book/223
Azen, Razia and Nicole Traxel (2009). ‘Using dominance analysis to determine predictor
importance in logistic regression’, Journal of Educational and Behavioral Statistics 34(3):
319–47. doi:10.3102/1076998609332754
Baayen, R. Harald (2001). Word Frequency Distributions. Dordrecht: Kluwer Academic
Publishers.
Baayen, R. Harald (2007). ‘Storage and computation in the mental lexicon’, in Gonia
Jarema and Gary Libben (eds), The Mental Lexicon: Core Perspectives. Amsterdam:
Elsevier, 81–104.
Baayen, R. Harald (2008). Analyzing Linguistic Data: A Practical Introduction to Statistics
Using R. Cambridge: Cambridge University Press.
Baayen, R. Harald, Rochelle Lieber, and Robert Schreuder (1997). ‘The morphological
complexity of simplex nouns’, Linguistics 35: 861–77. doi:10.1515/ling.1997.35.5.861
Baayen, R. Harald, Petar Milin, Dusica Filipović Đurđević, Peter Hendrix, and Marco
Marelli (2011). ‘An amorphous model for morphological processing in visual compre-
hension based on naive discriminative learning’, Psychological Review 118(3): 438–81.
doi:10.1037/a0023851
Baayen, R. Harald, Lee H. Wurm, and Joanna Aycock (2007). ‘Lexical dynamics for low-
frequency complex words: A regression study across tasks and modalities’, The Mental
Lexicon 2(3): 419–63. doi:10.1075/ml.2.3.06baa
Babou, Cheikh Anta and Michele Loporcaro (2016). ‘Noun classes and grammatical gender
in Wolof’, Journal of African Languages and Linguistics 37(1): 1–57.
doi:10.1515/jall-2016-0001
Baechler, Raffaela (2017). Absolute Komplexität in der Nominalflexion. Berlin: Language
Science Press. URL: http://langsci-press.org/catalog/book/134
Baechler, Raffaela and Guido Seiler (eds) (2016). Complexity, Isolation, and Variation.
Berlin: De Gruyter.
Baerman, Matthew (2012). ‘Paradigmatic chaos in Nuer’, Language 88(3): 467–94.
doi:10.1353/lan.2012.0065
Baerman, Matthew (2016). ‘Seri verb classes: Morphosyntactic motivation and morpho-
logical autonomy’, Language 92(4): 792–823. doi:10.1353/lan.2016.0073
Baerman, Matthew, Dunstan Brown, and Greville G. Corbett (2005). The Syntax-
Morphology Interface: A Study of Syncretism. Cambridge: Cambridge University Press.
Baerman, Matthew, Dunstan Brown, and Greville G. Corbett (2010). ‘Morphological
complexity: A typological perspective’. Ms, Surrey Morphology Group, University of
Surrey. URL: http://epubs.surrey.ac.uk/814702/
Baerman, Matthew, Dunstan Brown, and Greville G. Corbett (eds) (2015a). Understanding
and Measuring Morphological Complexity. Oxford: Oxford University Press.
Baerman, Matthew, Dunstan Brown, and Greville G. Corbett (2015b). ‘Understanding and
measuring morphological complexity: An introduction’, in Matthew Baerman, Dunstan
Brown, and Greville G. Corbett (eds), Understanding and Measuring Morphological
Complexity. Oxford: Oxford University Press, 3–10.
Baerman, Matthew, Dunstan Brown, and Greville G. Corbett (2017). Morphological
Complexity. Cambridge: Cambridge University Press.
Baerman, Matthew, Greville G. Corbett, and Dunstan Brown (eds) (2010). Defective
Paradigms: Missing Forms and What They Tell Us. Oxford: Oxford University Press
and British Academy.
Baerman, Matthew, Greville G. Corbett, Dunstan Brown, and Andrew Hippisley (eds)
(2007). Deponency and Morphological Mismatches. Oxford: Oxford University Press
and British Academy.
Baissac, Charles (1880). Etudes du patois mauricien. Nancy: Imprimerie Berger-Levrault.
Baker, Philip (1972). Kreol: A Description of Mauritian Creole. Ann Arbor: Karoma.
Baker, Philip and Chris Corne (1982). Isle de France Creole: Affinities and Origins. Ann
Arbor, MI: Karoma.
Bakker, Peter (1997). A Language of Our Own: The Genesis of Michif, the Mixed Cree-
French Language of the Canadian Métis. Oxford: Oxford University Press.
Bakker, Peter (2003). ‘Mixed languages as autonomous systems’, in Yaron Matras and Peter
Bakker (eds), The Mixed Language Debate: Theoretical and Empirical Advances. Berlin:
Mouton de Gruyter, 107–50.
Bakker, Peter (2013). ‘Michif’, in Susanne Maria Michaelis, Philippe Maurer, Martin
Haspelmath, and Magnus Huber (eds), The Atlas and Survey of Pidgin and Creole
Languages, vol. 3: Contact Languages Based on Languages from Africa, Australia, and
the Americas. Oxford: Oxford University Press, 158–65.
Bakker, Peter (2014). ‘Creolistics: Back to square one?’, Journal of Pidgin and Creole
Languages 29: 177–94. doi:10.1075/jpcl.29.1.08bak
Bakker, Peter, Aymeric Daval-Markussen, Mikael Parkvall, and Ingo Plag (2011). ‘Creoles
are typologically distinct from non-creoles’, Journal of Pidgin and Creole Languages
26(1): 5–42. doi:10.1075/jpcl.26.1.02bak
Balode, Laimute and Axel Holvoet (2001). ‘The Latvian language and its dialects’, in Östen
Dahl and Maria Koptjevskaja-Tamm (eds), The Circum-Baltic Languages: Typology and
Contact, vol. 1: Past and Present. Amsterdam: John Benjamins, 3–40.
Bao Diop, Sokhna (2015). ‘Les classes nominales en nyun gunyamolo’, in Denis Creissels
and Konstantin Pozdniakov (eds), Les classes nominales dans les langues atlantiques.
Köln: Köppe, 371–405.
Baptista, Marlyse (2003a). ‘Inflectional plural marking in creoles and pidgins:
A comparative study’, in Ingo Plag (ed.), The Phonology and Morphology of Creole
Languages. Tübingen: Niemeyer, 315–32.
Baptista, Marlyse (2003b). ‘Number inflection in creole languages’, Interface 6: 3–26.
Becher, Jutta (2001). Untersuchungen zum Sprachwandel im Wolof aus diachroner und
synchroner Perspektive. University of Hamburg PhD dissertation.
Beier, Christine, Lev Michael, and Joel Sherzer (2002). ‘Discourse forms and processes in
indigenous lowland South America: An areal-typological perspective’, Annual Review of
Anthropology 31: 121–45. doi:10.1146/annurev.anthro.31.032902.105935
Bendor-Samuel, John Theodore (ed.) (1989). The Niger-Congo Languages: A Classification
and Description of Africa’s Largest Language Family. Lanham, MD: University Press of
America, by arrangement with the Summer Institute of Linguistics (SIL).
Bentley, W. Holman (1887). Dictionary and Grammar of the Kikongo Language. London:
Trübner & Co.
Bentz, Christian (2016). ‘The low-complexity-belt: Evidence for large-scale language contact
in human pre-history?’, in Sean G. Roberts, Christine Cuskley, Luke McCrohon, Lluís
Barceló-Coblijn, Olga Feher, and Tessa Verhoef (eds), The Evolution of Language:
Proceedings of the 11th International Conference (EVOLANG11). doi:10.17617/2.2248195
Bentz, Christian, Dimitrios Alikaniotis, Michael Cysouw, and Ramon Ferrer-i-Cancho
(2017). ‘The entropy of words—Learnability and expressivity across more than 1000
languages’, Entropy 19: 275. doi:10.3390/e19060275
Bentz, Christian and Aleksandrs Berdicevskis (2016). ‘Learning pressures reduce morpho-
logical complexity: Linking corpus, computational and experimental evidence’, in
Dominique Brunato, Felice Dell’Orletta, Giulia Venturi, Thomas François, and
Philippe Blache (eds), Proceedings of the Workshop ‘Computational Linguistics for
Linguistic Complexity (CL4LC)’. Osaka, Japan, 222–32.
Bentz, Christian and Morten H. Christiansen (2013). ‘Linguistic adaptation: The trade-off
between case marking and fixed word orders in Germanic and Romance languages’, in
Feng Shi and Gang Peng (eds), Eastward Flows the Great River: Festschrift in Honor of
Professor William S-Y. Wang on his 80th Birthday. Hong Kong: City University of Hong
Kong Press, 45–61.
Bentz, Christian, Annemarie Verkerk, Douwe Kiela, Felix Hill, and Paula Buttery (2015).
‘Adaptive communication: Languages with more non-native speakers tend to have fewer
word forms’, PLoS ONE 10(6): e0128254. doi:10.1371/journal.pone.0128254
Bentz, Christian and Bodo Winter (2013). ‘Languages with more second language learners
tend to lose nominal case’, Language Dynamics and Change 3: 1–27. doi:10.1163/
22105832-13030105
Berdicevskis, Aleksandrs, Çağrı Çöltekin, Katharina Ehret, Kilu von Prince, Daniel Ross,
Bill Thompson, Chunxiao Yan, Vera Demberg, Gary Lupyan, Taraka Rama, and
Christian Bentz (2018). ‘Using universal dependencies in cross-linguistic complexity
research’, in Marie-Catherine de Marneffe, Teresa Lynn, and Sebastian Schuster (eds),
Proceedings of the Second Workshop on Universal Dependencies (UDW 2018). Brussels:
Association for Computational Linguistics, 8–17.
Berdicevskis, Aleksandrs and Arturs Semenuks (submitted). ‘Imperfect language learning
reduces morphological overspecification: Experimental evidence’.
Bernini-Montbrand, Danièle, Ralph Ludwig, Hector Poullet, and Sylviane Telchid (2013).
Dictionnaire créole-français Guadeloupe, avec un abrégé de grammaire créole, un lexique
français-créole, les comparaisons courantes, les locutions et plus de 1000 proverbes. Paris:
Orphie.
Berry, Keith and Christine Berry (1999). A Description of Abun. Canberra: Pacific Linguistics.
Bertrand-Bocandé, Emmanuel (1849). ‘Notes sur la Guinée portugaise ou Sénégambie
méridionale’ [pt. 2], Bulletin de la Société de Géographie 12: 57–93.
Bickel, Balthasar, Goma Banjade, Martin Gaenszle, Elena Lieven, Netra Prasad Paudyal,
Ichchha Purna Rai, Manoj Rai, Novel Kishore Rai, and Sabine Stoll (2007). ‘Free prefix
ordering in Chintang’, Language 83(1): 43–73. doi:10.1353/lan.2007.0002
Bickel, Balthasar and Johanna Nichols (2002). ‘Autotypologizing databases and their use in
fieldwork’, in Peter Austin, Helen Dry, and Peter Wittenburg (eds), International LREC
Workshop on Resources and Tools in Field Linguistics, Las Palmas, 26–7 May 2002.
Nijmegen: Max Planck Institute for Psycholinguistics.
Bickel, Balthasar and Johanna Nichols (2005). ‘Inflectional synthesis of the verb’, in Martin
Haspelmath, Matthew Dryer, David Gil, and Bernard Comrie (eds), The World Atlas of
Language Structures. Oxford: Oxford University Press, 94–7.
Bickel, Balthasar and Johanna Nichols (2007). ‘Inflectional morphology’, in Timothy
Shopen (ed.), Language Typology and Syntactic Description, vol. 3: Grammatical
Categories and the Lexicon. Cambridge: Cambridge University Press, 169–240.
Bickel, Balthasar and Johanna Nichols (2013). ‘Inflectional synthesis of the verb’, in
Matthew Dryer and Martin Haspelmath (eds), World Atlas of Language Structures
Online. URL: http://wals.info/chapter/22
Bickel, Balthasar, Johanna Nichols, Taras Zakharko, Alena Witzlack-Makarevich, Kristine
Hildebrandt, Michael Rießler, Lennart Bierkandt, Fernando Zúñiga, and John B. Lowe
(2017). The Autotyp typological databases. Version 0.1.0. URL:
https://github.com/autotyp/autotyp-data/tree/0.1.0
Bickel, Balthasar and Fernando Zúñiga (2017). ‘The “word” in polysynthetic languages:
Phonological and syntactic challenges’, in Michael Fortescue, Marianne Mithun, and
Nicholas Evans (eds), The Oxford Handbook of Polysynthesis. Oxford: Oxford University
Press, 158–85.
Bickerton, Derek (1981). Roots of Language. Ann Arbor, MI: Karoma.
Bickerton, Derek (1984). ‘The language bioprogram hypothesis’, Behavioral and Brain
Sciences 7(2): 173–88. doi:10.1017/S0140525X00044149
Bickerton, Derek (1988). ‘Creole languages and the bioprogram’, in Frederick Newmeyer
(ed.), Linguistics: The Cambridge Survey, vol. 2: Linguistic Theory. Extensions and
Implications. Cambridge: Cambridge University Press, 268–84.
Birchall, Joshua (2014). Argument Marking Patterns in South American Languages.
Universiteit Nijmegen PhD dissertation.
Blasi, Damián E., Susanne Maria Michaelis, and Martin Haspelmath (2017). ‘Grammars are
robustly transmitted even during the emergence of creole languages’, Nature Human
Behaviour 1: 723–9. doi:10.1038/s41562-017-0192-4
Blench, Roger (2009). ‘Do the Ghana-Togo mountain languages constitute a genetic
group?’, The Journal of West African Languages 36(1–2): 19–36.
Blevins, James P. (2006). ‘Word-based morphology’, Journal of Linguistics 42(3): 531–73.
doi:10.1017/S0022226706004191
Blevins, James P. (2013). ‘Word-based morphology from Aristotle to modern WP (Word
and Paradigm models)’, in Keith Allan (ed.), The Oxford Handbook of the History of
Linguistics. Oxford: Oxford University Press, 375–95.
Blevins, James P. (2016a). ‘The minimal sign’, in Gregory Stump and Andrew Hippisley
(eds), The Cambridge Handbook of Morphology. Cambridge: Cambridge University
Press, 50–69.
Blevins, James P. (2016b). Word and Paradigm Morphology. Oxford: Oxford University
Press.
Blevins, James P., Petar Milin, and Michael Ramscar (2017). ‘The Zipfian paradigm cell
filling problem’, in Ferenc Kiefer, James P. Blevins, and Huba Bartos (eds), Perspectives
on Morphological Structure: Data and Analyses. Leiden: Brill, 141–58.
Bloomfield, Leonard (1914). ‘Sentence and word’, Transactions and Proceedings of the
American Philological Association 45: 65–75.
Bloomfield, Leonard (1933). Language. New York: Holt.
Blythe, Joe (2009). Doing Referring in Murriny Patha Conversation. University of Sydney
PhD dissertation.
Blythe, Joe, Rachel Nordlinger, and Nicholas Reid (2007). ‘Murriny Patha finite verb
paradigms’. Unpublished ms.
Boilat, David (1858). Grammaire de la langue woloffe. Paris: Imprimerie Impériale. URL:
http://babel.hathitrust.org/cgi/pt?id=wu.89012299343;view=1up;seq=11
Bokamba, Eyamba (1977). ‘The impact of multilingualism on language structures: The case
of Central Africa’, Anthropological Linguistics 19: 181–202.
Bolaños, Katherine (2016). A Grammar of Kakua. Utrecht: LOT.
Bonami, Olivier (2013). ‘Towards a robust assessment of implicative relations in inflec-
tional systems’. Paper given at the ‘Workshop on Computational Approaches to
Morphological Complexity’, Paris.
Bonami, Olivier (2015). ‘Periphrasis as collocation’, Morphology 25: 63–110. doi:10.1007/
s11525-015-9254-3
Bonami, Olivier and Sarah Beniamine (2015). ‘Implicative structure and joint predictive-
ness’, in Vito Pirrelli, Claudia Marzi, and Marcello Ferro (eds), Word Structure and
Word Usage: Proceedings of the NetWordS Final Conference, Pisa, Italy, March 30–April
1, 2015. Pisa: Institute for Computational Linguistics, National Research Council, 4–9.
Bonami, Olivier and Sarah Beniamine (2016). ‘Joint predictiveness in inflectional para-
digms’, Word Structure 9(2): 156–82. doi:10.3366/word.2016.0092
Bonami, Olivier and Gilles Boyé (2002). ‘Suppletion and dependency in inflectional
morphology’, in Frank van Eynde, Lars Hellan, and Dorothee Beermann (eds),
Proceedings of the 8th International Conference on Head-Driven Phrase Structure
Grammar. Stanford: CSLI, 51–70.
Bonami, Olivier and Gilles Boyé (2003). ‘Supplétion et classes flexionnelles dans la con-
jugaison du français’, Langages 15: 102–26.
Bonami, Olivier and Gilles Boyé (2007). ‘French pronominal clitics and the design of
Paradigm Function Morphology’, in Geert E. Booij, Luca Ducceschi, Bernard Fradin,
Emiliano Guevara, Angela Ralli, and Sergio Scalise (eds), On-line Proceedings of the Fifth
Mediterranean Morphology Meeting (MMM5) Fréjus, 15–18 September 2005. Bologna:
University of Bologna, 291–322.
Bonami, Olivier, Gilles Boyé, and Fabiola Henri (2011). ‘Measuring inflectional complexity:
French and Mauritian’. Paper given at the ‘Workshop on Quantitative Measures in
Morphology and Morphological Development’, San Diego.
Bonami, Olivier, Gilles Boyé, and Françoise Kerleroux (2009). ‘L’allomorphie radicale et la
relation flexion-construction’, in Bernard Fradin, Françoise Kerleroux, and Marc Plénat
(eds), Aperçus de morphologie du français. Saint-Denis: Presses Universitaires de
Vincennes, 103–25.
Bonami, Olivier and Fabiola Henri (2010). ‘Assessing empirically the complexity of
Mauritian Creole’. Paper given at the conference ‘Formal Approaches to Creole
Studies 2’, Berlin.
Bonami, Olivier, Fabiola Henri, and Ana R. Luís (2013). ‘Comparing sources of inflectional
morphology in Romance-based creoles’. Paper given at the workshop ‘Portuguese-based
Creoles in Perspective’, Coimbra.
Bonami, Olivier, Fabiola Henri, and Ana R. Luís (2015). ‘Making sense of morphological
complexity’. Paper given at the ‘SeePiCLa Meeting’, Lisbon.
Bond, Oliver, Greville G. Corbett, Marina Chumakina, and Dunstan Brown (eds) (2016).
Archi: Complexities of Agreement in Cross-theoretical Perspective. Oxford: Oxford
University Press.
Booij, Geert E. (1993). ‘Against split morphology’, in Geert E. Booij and Jaap van Marle
(eds), Yearbook of Morphology 1993. Dordrecht: Kluwer, 27–49. doi:10.1007/978-94-
017-3712-8_2
Booij, Geert E. (1997). ‘Allomorphy and the autonomy of morphology’, Folia Linguistica
31: 25–56. doi:10.1515/flin.1997.31.1-2.25
Booij, Geert E. (2010). Construction Morphology. Oxford: Oxford University Press.
Boyé, Gilles and Patricia Cabredo Hofherr (2006). ‘The structure of allomorphy in Spanish
verbal inflection’, Cuadernos de Lingüística del Instituto Universitario Ortega y Gasset 13:
9–24.
Bozic, Mirjana and William Marslen-Wilson (2010). ‘Neurocognitive contexts for mor-
phological complexity: Dissociating inflection and derivation’, Language and Linguistics
Compass 4(11): 1063–73. doi:10.1111/j.1749-818X.2010.00254.x
Brandão, Ana Paula B. (2014). A Reference Grammar of Paresi-Haliti (Arawak). University
of Texas at Austin PhD dissertation.
Bresnan, Joan (2007). ‘Is syntactic knowledge probabilistic? Experiments with the English
dative alternation’, in Sam Featherston and Wolfgang Sternefeld (eds), Roots: Linguistics
in Search of Its Evidential Base. Berlin: Mouton de Gruyter, 77–96.
Bresnan, Joan and Marilyn Ford (2013). ‘Predicting syntax: Processing dative constructions
in American and Australian varieties of English’, Language 86(1): 186–213. doi:10.1353/
lan.0.0189
Brown, Dunstan, Greville G. Corbett, Norman M. Fraser, Andrew Hippisley, and Alan
Timberlake (1996). ‘Russian noun stress and Network Morphology’, Linguistics 34(1):
53–107. doi:10.1515/ling.1996.34.1.53
Brown, Dunstan and Andrew Hippisley (2012). Network Morphology: A Defaults-Based
Theory of Word Structure. Cambridge: Cambridge University Press.
Burzio, Luigi (2004). ‘Paradigmatic and syntagmatic relations in Italian verbal inflection’, in
Julie Auger, J. Clancy Clements, and Barbara Vance (eds), Contemporary Approaches to
Romance Linguistics. Amsterdam: John Benjamins, 17–44.
Bybee, Joan L. (1985). Morphology: A Study of the Relation between Meaning and Form.
Amsterdam: John Benjamins.
Bybee, Joan L. (1995). ‘Regular morphology and the lexicon’, Language and Cognitive
Processes 10(5): 425–55. doi:10.1080/01690969508407111
Bybee, Joan L. (2007). Frequency of Use and the Organization of Language. Oxford: Oxford
University Press.
Bybee, Joan L. and Clay Beckner (2015). ‘Language use, cognitive processes, and linguistic
change’, in Claire Bowern and Bethwyn Evans (eds), The Routledge Handbook of
Historical Linguistics. London: Routledge, 503–18.
Bybee, Joan L. and Carol Lynn Moder (1983). ‘Morphological classes as natural categories’,
Language 59: 251–70. doi:10.2307/413574
Bybee, Joan and Dan I. Slobin (1982). ‘Rules and schemas in the development and use of the
English past tense’, Language 58(2): 265–89. doi:10.2307/414099
Cadely, Jean-Robert (1994). Aspects de la phonologie du créole haïtien. Université du
Québec à Montréal PhD dissertation.
Camara, Sana (2006). Wolof Lexicon and Grammar. Madison, WI: NALRC Press.
Cameron-Faulkner, Thea and Andrew Carstairs-McCarthy (2000). ‘Stem alternants as
morphological signata: Evidence from blur avoidance in Polish nouns’, Natural
Language and Linguistic Theory 18(4): 813–35. doi:10.1023/A:1006496821412
Campbell, Lyle (2012). ‘Typological characteristics of South American indigenous lan-
guages’, in Lyle Campbell and Verónica Grondona (eds), The Indigenous Languages of
South America: A Comprehensive Guide. Berlin: Mouton de Gruyter, 259–330.
Carlin, Eithne (2006). ‘Feeling the need: The borrowing of Cariban functional categories
into Mawayana (Arawak)’, in Alexandra Y. Aikhenvald and Robert M. W. Dixon (eds),
Grammars in Contact: A Cross-Linguistic Typology. Oxford: Oxford University Press,
313–32.
Carstairs, Andrew (1983). ‘Paradigm economy’, Journal of Linguistics 19(1): 115–28.
doi:10.1017/S0022226700007477
Carstairs, Andrew (1987). Allomorphy in Inflexion. London: Croom Helm.
Carstairs-McCarthy, Andrew (1994). ‘Inflection classes, gender, and the Principle of
Contrast’, Language 70(4): 737–88.
Carstairs-McCarthy, Andrew (1998). ‘How lexical semantics constrains inflectional allo-
morphy’, in Geert E. Booij and Jaap van Marle (eds), Yearbook of Morphology 1997.
Dordrecht: Springer, 1–24. doi:10.1007/978-94-011-4998-3_1
Carstairs-McCarthy, Andrew (2010). The Evolution of Morphology. Oxford: Oxford
University Press.
Chao, Yuen Ren (1968). A Grammar of Spoken Chinese. Berkeley, CA: University of
California Press.
Chaudenson, Robert (2003). La créolisation. Théorie, applications, implications. Paris:
L’Harmattan.
Childs, G. Tucker (1983). ‘Noun class affix renewal in Southern West Atlantic’, in Jonathan
D. Kaye, Hilda Koopman, Dominique Sportiche, and André Dugas (eds), Current
Approaches to African Linguistics II. Dordrecht: Mouton de Gruyter and Foris
Publications, 17–29.
Childs, G. Tucker (2009). ‘What happens when a language dies? Language change vs.
language death’, Studies in African Linguistics 38(2): 113–30.
Chirikba, Viacheslav A. (2008). ‘The problem of the Caucasian Sprachbund’, in Pieter
C. Muysken (ed.), From Linguistic Areas to Areal Linguistics. Amsterdam: John
Benjamins, 25–94.
Ciucci, Luca (2014). ‘Tracce di contatto tra la famiglia zamuco (ayoreo, chamacoco) e altre
lingue del Chaco. Prime prospezioni’, Quaderni del Laboratorio di Linguistica 13: 1–52.
Clahsen, Harald, Claudia Felser, Kathleen Neubauer, Mikako Sato, and Renita Silva (2010).
‘Morphological structure in native and nonnative language processing’, Language
Learning 60: 21–43. doi:10.1111/j.1467-9922.2009.00550.x
Cobbinah, Alexander (2010). ‘The Casamance as an area of intense language contact: The
case of Baïnounk Gubaher’, in Friederike Lüpke and Mary Raymond (eds), Documenting
Atlantic–Mande convergence and diversity. Special issue of the Journal of Language
Contact—THEMA 3: 175–202.
Cole, Desmond T. (1967). Some Features of Ganda Linguistic Structure. Johannesburg:
Witwatersrand University Press.
Comrie, Bernard (1989). Language Universals and Linguistic Typology. 2nd ed. Chicago:
University of Chicago Press.
Comrie, Bernard (1992). ‘Before complexity’, in John A. Hawkins and Murray Gell-Mann
(eds), The Evolution of Human Languages. London: Addison-Wesley, 193–211.
Comrie, Bernard, Lucía A. Golluscio, Hebe González, and Alejandra Vidal (2010). ‘El
Chaco como área lingüística’, in Z. Estrada Fernández and R. Arzápalo Marín (eds),
Estudios de lenguas amerindias, vol. 2: Contribuciones al estudio de las lenguas originarias
de América. Hermosillo, Sonora (Mexico): Editorial Unison, 85–130.
Corbett, Greville G. (1982). ‘Gender in Russian: An account of gender specification and its
relationship to declension’, Russian Linguistics 6(2): 197–232.
Corbett, Greville G. (1991). Gender. Cambridge: Cambridge University Press.
Corbett, Greville G. (2000). Number. Cambridge: Cambridge University Press.
Corbett, Greville G. (2007). ‘Canonical typology, suppletion, and possible words’, Language
83(1): 8–42. doi:10.1353/lan.2007.0006
Corbett, Greville G. (2009). ‘Suppletion: Typology, markedness, complexity’, in Patrick
O. Steinkrüger and Manfred Krifka (eds), On Inflection. Berlin: Mouton de Gruyter,
25–40.
Corbett, Greville G. (2013a). ‘Canonical morphosyntactic features’, in Dunstan Brown,
Marina Chumakina, and Greville Corbett (eds), Canonical Morphology and Syntax.
Oxford: Oxford University Press, 48–65.
Corbett, Greville G. (2013b). ‘The unique challenge of the Archi paradigm’, in Chundra
Cathcart, Shinae Kang, and Clare S. Sandy (eds), Proceedings of the 37th Annual Meeting,
Berkeley Linguistics Society: Special Session on Languages of the Caucasus, 52–67.
Corbett, Greville G. (2015). ‘Morphosyntactic complexity: A typology of lexical splits’,
Language 91(1): 145–93. doi:10.1353/lan.2015.0003
Corbett, Greville G. and Sebastian Fedden (2016). ‘Canonical gender’, Journal of Linguistics
52: 495–531. doi:10.1017/S0022226715000195
Corbett, Greville G. and Norman M. Fraser (1993). ‘Network Morphology: A DATR
account of Russian nominal inflection’, Journal of Linguistics 29(1): 113–42.
doi:10.1017/S0022226700000074
Corbett, Greville G., Andrew Hippisley, Dunstan Brown, and Paul Marriott (2001).
‘Frequency, regularity and the paradigm: A perspective from Russian on a complex
relation’, in Joan Bybee and Paul J. Hopper (eds), Frequency and the Emergence of
Linguistic Structure. Amsterdam: John Benjamins, 201–26.
Corne, Chris (1982). ‘A contrastive analysis of Reunion and Isle de France Creole French:
Two typologically diverse languages’, in Philip Baker and Chris Corne (eds), Isle de
France Creole: Affinities and Origins. Ann Arbor, MI: Karoma, 8–129.
Corne, Chris (1999). From French to Creole. London: University of Westminster Press.
Cotterell, Ryan, Christo Kirov, Mans Hulden, and Jason Eisner (2019). ‘On the complexity
and typology of inflectional morphological systems’, Transactions of the Association for
Computational Linguistics 7: 327–42. doi:10.1162/tacl_a_00271
Crevels, Mily and Hein van der Voort (2008). ‘The Guaporé-Mamoré Region as a Linguistic
Area’, in Pieter C. Muysken (ed.), From Linguistic Areas to Areal Linguistics. Amsterdam:
John Benjamins, 151–79.
Croft, William (1991). Syntactic Categories and Grammatical Relations: The Cognitive
Organization of Information. Chicago: University of Chicago Press.
Croft, William (2001). Radical Construction Grammar: Syntactic Theory in Typological
Perspective. Oxford: Oxford University Press.
Cruschina, Silvio, Martin Maiden, and John C. Smith (eds) (2013). The Boundaries of Pure
Morphology: Diachronic and Synchronic Perspectives. Oxford: Oxford University Press.
Cuskley, Christine, Francesca Colaiori, Claudio Castellano, Vittorio Loreto, Martina
Pugliese, and Francesca Tria (2015). ‘The adoption of linguistic rules in native and
non-native speakers: Evidence from a Wug task’, Journal of Memory and Language 84:
205–23. doi:10.1016/j.jml.2015.06.005
Dahl, Östen (2004). The Growth and Maintenance of Linguistic Complexity. Amsterdam:
John Benjamins.
Dahl, Östen (2009). ‘Increases in complexity as a result of language contact’, in Kurt
Braunmüller and Juliane House (eds), Convergence and Divergence in Language
Contact Situations. Amsterdam: John Benjamins, 41–52.
Dahl, Östen (2017). ‘Polysynthesis and complexity’, in Michael Fortescue, Marianne
Mithun, and Nicholas Evans (eds), The Oxford Handbook of Polysynthesis. Oxford:
Oxford University Press, 19–29.
Dahl, Östen (2018). ‘Grammaticalization in the languages of Europe’, in Bernd Heine and
Heiko Narrog (eds), Grammaticalization from a Typological Perspective. New York:
Oxford University Press, 79–96.
Dale, Rick and Gary Lupyan (2012). ‘Understanding the origins of morphological diversity:
The Linguistic Niche Hypothesis’, Advances in Complex Systems 15(3–4): 1150017.
doi:10.1142/S0219525911500172
Danielsen, Swintha (2007). Baure: An Arawak Language of Bolivia. Leiden: CNWS Publications.
Dard, Jean (1825). Dictionnaire français–wolof et français–bambara, suivi du dictionnaire
wolof–français. Paris: Imprimerie Royale.
Dard, Jean (1826). Grammaire wolofe ou méthode pour étudier la langue des noirs qui
habitent les royaumes de Bourba-Yolof, de Walo, de Damel, de Bour-Sine, de Saloume, de
Baole, en Sénégambie. Paris: Imprimerie Royale.
Daugherty, Kim G. and Mark S. Seidenberg (1994). ‘Beyond rules and exceptions:
A connectionist approach to inflectional morphology’, in Susan D. Lima, Roberta
L. Corrigan, and Gregory Iverson (eds), The Reality of Linguistic Rules. Amsterdam:
John Benjamins, 353–88.
de Boeck, Egide (1904). Grammaire et vocabulaire du Lingala, ou Langue du Haut-Congo.
Brussels: Polleunis-Ceuterick.
DeGraff, Michel (2001). ‘On the origin of creoles: A Cartesian critique of Neo-Darwinian
linguistics’, Linguistic Typology 5(2–3): 213–310. doi:10.1515/lity.2001.002
DeGraff, Michel (2003). ‘Against creole exceptionalism’, Language 79(4): 391–410.
DeGraff, Michel (2005). ‘Linguists’ most dangerous myth: The fallacy of creole exception-
alism’, Language in Society 34: 533–91. doi:10.1017/S0047404505050207
DeGraff, Michel (2007). ‘Haitian creole’, in John Holm and Peter L. Patrick (eds),
Comparative Creole Syntax: Parallel Outlines of Eighteen Creole Grammars, vol. 7 of
Westminster Creolistic Series. London: Battlebridge Publications, 101–26.
de Groot, Casper (2008). ‘Morphological complexity as a parameter of linguistic typology:
Hungarian as a contact language’, in Matti Miestamo, Kaius Sinnemäki, and Fred
Karlsson (eds), Language Complexity: Typology, Contact, Change. Amsterdam: John
Benjamins, 191–214.
de Haan, Ferdinand (2013). ‘Semantic distinctions of evidentiality’, in Matthew S. Dryer
and Martin Haspelmath (eds), The World Atlas of Language Structures Online. Leipzig:
Max Planck Institute for Evolutionary Anthropology. URL: http://wals.info/chapter/77
de Jong, Nivja Helena (2002). Morphological Families in the Mental Lexicon. Universiteit
Nijmegen PhD dissertation.
DeKeyser, Robert M. (2005). ‘What makes learning second-language grammar difficult?
A review of issues’, Language Learning 55: 1–25. doi:10.1111/j.0023-8333.2005.00294.x
de Leeuw, Joshua R. (2014). ‘jsPsych: A JavaScript library for creating behavioral experiments
in a Web browser’, Behavior Research Methods 47(1): 1–12. doi:10.3758/s13428-014-0458-y
Delafosse, Maurice (1927). ‘Les classes nominales en wolof’, in Festschrift Meinhof.
Sprachwissenschaftliche und andere Studien. Glückstadt: L. Friedrichsen, 29–44.
[Reprinted in Gabriel Manessy and Serge Sauvageot (eds) (1963). Wolof et Sérèr.
Études de phonétique et de grammaire descriptive. Dakar: University of Dakar Press,
25–42.]
DeLancey, Scott (2011). ‘On the origin of Sinitic’, in Zhuo Jing-Schmidt (ed.), Proceedings
of the 23rd North American Conference on Chinese Linguistics. Eugene: University of
Oregon, 51–64.
Derbyshire, Desmond (1987). ‘Morphosyntactic areal characteristics of Amazonian lan-
guages’, International Journal of American Linguistics 53: 311–26. doi:10.1086/466060
Derbyshire, Desmond and Doris Payne (1990). ‘Noun classification systems of Amazonian
languages’, in Doris Payne (ed.), Amazonian Linguistics: Studies in Lowland South
American Languages. Austin, TX: University of Texas Press, 243–71.
Derwing, Bruce L. (1990). ‘Morphology and the mental lexicon: Psycholinguistic evidence’,
in Wolfgang U. Dressler, Hans C. Luschützky, Oskar E. Pfeiffer, and John R. Rennison
(eds), Contemporary Morphology. Berlin: Mouton de Gruyter, 249–65.
Deutscher, Guy (2009). ‘ “Overall complexity”: A wild goose chase?’, in Geoffrey Sampson,
David Gil, and Peter S. Trudgill (eds), Language Complexity as an Evolving Variable.
Oxford: Oxford University Press, 243–51.
Diagne, Anna M., Sascha Kesseler, and Christian Meyer (eds) (2011). Communication wolof
et société sénégalaise. Héritage et création. Paris: L’Harmattan.
Diallo, Abdourahmane (2010). ‘Morphological consequences of Mande borrowings in Fula:
The case of Pular, Fuuta–Jaloo’, in Friederike Lüpke and Mary Raymond (eds),
Documenting Atlantic–Mande Convergence and Diversity. Special issue of the Journal
of Language Contact—THEMA 3: 71–85.
Diallo, Abdourahmane (2014). Language Contact in Guinea: The Case of Pular and Mande
Varieties. Köln: Köppe.
Di Garbo, Francesca (2014). Gender and Its Interaction with Number and Evaluative
Morphology: An Intra- and Intergenealogical Typological Survey of Africa. Stockholm
University PhD dissertation.
Di Garbo, Francesca (2016). ‘Exploring grammatical complexity crosslinguistically: The
case of gender’, Linguistic Discovery 14: 46–85. doi:10.1349/PS1.1537-0852.A.468
Di Garbo, Francesca and Matti Miestamo (2019). ‘The evolving complexity of gender
agreement systems’, in Francesca Di Garbo, Bruno Olsson, and Bernhard Wälchli
(eds), Grammatical Gender and Linguistic Complexity, vol. II: World-Wide
Comparative Studies. Berlin: Language Science Press, 15–60. doi:10.5281/
zenodo.3462778
Dimmendaal, Gerrit J. (2011). Historical Linguistics and the Comparative Study of African
Languages. Amsterdam: John Benjamins.
Diouf, Jean Léopold (2009). Grammaire du wolof contemporain. Edition revue et complétée.
Paris: L’Harmattan.
Dixon, Robert M. W. (2002). Australian Languages: Their Nature and Development.
Cambridge: Cambridge University Press.
Dixon, Robert M. W. (2004). The Jarawara Language of Southern Amazonia. Oxford:
Oxford University Press.
Dixon, Robert M. W. and Alexandra Y. Aikhenvald (1999). ‘Introduction’, in Robert
M. W. Dixon and Alexandra Y. Aikhenvald (eds), The Amazonian Languages.
Cambridge: Cambridge University Press, 1–22.
Doneux, Jean Léonce (1975). ‘Hypothèses pour la comparative des langues atlantiques’,
Africana Linguistica 6: 41–129.
Doneux, Jean Léonce (1978). ‘Les liens historiques entre les langues du Sénégal’, Réalités
africaines et langue française 7: 6–55.
Donohue, Mark (2009). ‘Flores languages’, in Keith Brown and Sarah Ogilvie (eds), Concise
Encyclopedia of Languages of the World. Oxford: Elsevier, 420–1.
Donohue, Mark and Tim Denham (to appear). ‘Becoming Austronesian: Mechanisms of
language dispersal across southern island Southeast Asia’, in David Gil and Antoinette
Schapper (eds), Austronesian Undressed.
Donohue, Mark and Johanna Nichols (2011). ‘Does phoneme inventory size correlate with
population size?’, Linguistic Typology 15(2): 161–70. doi:10.1515/lity.2011.011
Dorian, Nancy (1978). ‘The fate of morphological complexity in language death: Evidence
from East Sutherland Gaelic’, Language 54(3): 590–609.
Dressler, Wolfgang U. (2003). ‘Degrees of grammatical productivity in inflectional morph-
ology’, Italian Journal of Linguistics 15(1): 31–62.
Dressler, Wolfgang U. (2005). ‘Morphological typology and first language acquisition:
Some mutual challenges’, in Geert E. Booij, Emiliano Guevara, Angela Ralli, Salvatore
Sgroi, and Sergio Scalise (eds), Morphology and Linguistic Typology: On-line Proceedings
of the Fourth Mediterranean Morphology Meeting (MMM4), Catania, 21–23 September
2003, 7–20.
Dressler, Wolfgang U. (2011). ‘The rise of complexity in inflectional morphology’, Poznań
Studies in Contemporary Linguistics 47(2): 159–76. doi:10.2478/psicl-2011-0013
Dressler, Wolfgang U. (2019). ‘Natural morphology’, in Mark Aronoff (ed.), The Oxford
Research Encyclopedia of Linguistics. New York: Oxford University Press.
doi:10.1093/acrefore/9780199384655.013.576
Dressler, Wolfgang U. and Marianne Kilani-Schoch (2016). ‘Natural morphology’, in
Andrew Hippisley and Gregory Stump (eds), The Cambridge Handbook of
Morphology. Cambridge: Cambridge University Press, 356–89.
Dressler, Wolfgang U., Alona Kononenko, Sabine Sommer-Lolei, Katharina Korecky-Kröll,
Paulina Zydorowicz, and Laura Kamandulytė-Merfeldienė (2019). ‘Morphological rich-
ness, transparency and the evolution of morphonotactic patterns’, Folia Linguistica
s40(1): 85–106. doi:10.1515/flih-2019-0005
Dressler, Wolfgang U., Willi Mayerthaler, Oswald Panagl, and Wolfgang U. Wurzel (1987).
Leitmotifs in Natural Morphology. Amsterdam: John Benjamins.
Dressler, Wolfgang U., Sabine Sommer-Lolei, Katharina Korecky-Kröll, Reili Argus, Ineta
Dabašinskienė, Laura Kamandulytė-Merfeldienė, Johanna J. Ijäs, Victoria
V. Kazakovskaya, Klaus Laalo, and Evangelia Thomadaki (2019). ‘First-language acqui-
sition of synthetic compounds in Estonian, Finnish, German, Greek, Lithuanian, Russian
and Saami’, Morphology 29(3): 409–29. doi:10.1007/s11525-019-09339-0
Dryer, Matthew S. (2013). ‘Coding of nominal plurality’, in Matthew S. Dryer and Martin
Haspelmath (eds), The World Atlas of Language Structures Online. Leipzig: Max Planck
Institute for Evolutionary Anthropology. URL: https://wals.info/chapter/33
Dryer, Matthew and Martin Haspelmath (eds) (2013). The World Atlas of Language
Structures Online. Leipzig: Max Planck Institute for Evolutionary Anthropology. URL:
http://wals.info
Duke, Janet (2010). ‘Gender reduction and loss in Germanic: The Scandinavian, Dutch, and
Afrikaans case studies’, in Antje Dammel, Sebastian Kürschner, and Damaris Nübling
(eds), Kontrastive germanistische Linguistik. Hildesheim: Olms, 643–72.
Ehret, Katharina and Benedikt Szmrecsanyi (2016). ‘An information-theoretic approach to
assess linguistic complexity’, in Raffaela Baechler and Guido Seiler (eds), Complexity,
Isolation, and Variation. Berlin: de Gruyter Mouton, 71–94.
Ehrhart, Sabine (1993). Le créole français de St-Louis (le tayo) en Nouvelle-Calédonie.
Hamburg: Helmut Buske.
Epps, Patience (2005). ‘Areal diffusion and the development of evidentiality: Evidence from
Hup’, Studies in Language 29(3): 617–50. doi:10.1075/sl.29.3.04epp
Epps, Patience (2007a). ‘The Vaupés melting pot: Tukanoan influence on Hup’, in
Alexandra Y. Aikhenvald and Robert M. W. Dixon (eds), Grammars in Contact:
A Cross-Linguistic Typology. Oxford: Oxford University Press, 267–89.
Epps, Patience (2007b). ‘Birth of a noun classification system: The case of Hup’, in Leo
Wetzels (ed.), Language Endangerment and Endangered Languages: Linguistic and
Anthropological Studies with Special Emphasis on the Languages and Cultures of the
Andean-Amazonian Border Area. Leiden: Leiden University, 107–28.
Epps, Patience (2008). A Grammar of Hup. Berlin: Mouton de Gruyter.
Epps, Patience (2010). ‘Linking valence change and modality: Diachronic evidence
from Hup’, International Journal of American Linguistics 76(3): 335–56. doi:10.1086/
652792
Epps, Patience (2013). ‘Inheritance, calquing, or independent innovation? Reconstructing
morphological complexity in Amazonian numerals’, Journal of Language Contact 6:
329–57. doi:10.1163/19552629-00602007
Epps, Patience (2020). ‘Amazonian linguistic diversity and its sociocultural correlates’, in
Mily Crevels and Pieter C. Muysken (eds), Language Dispersal, Diversification, and
Contact: A Global Perspective. Oxford: Oxford University Press, 275–90.
Epps, Patience and Lev Michael (2017). ‘The areal linguistics of Amazonia’, in Raymond
Hickey (ed.), The Cambridge Handbook of Areal Linguistics. Cambridge: Cambridge
University Press, 934–63.
Evans, Nicholas (2003). Bininj Gun-Wok: A Pan-Dialectal Grammar of Mayali, Kunwinjku
and Kune. Canberra: Pacific Linguistics.
Facundes, Sidney da Silva (2000). The Language of the Apurinã People of Brazil. The State
University of New York at Buffalo PhD dissertation.
Fal, Arame, Rosine Santos, and Jean Léonce Doneux (1990). Dictionnaire wolof-français.
Paris: Karthala.
Falkenberg, Johannes (1962). Kin and Totem: Group Relations of Aborigines in the Port
Keats District. Oslo: Oslo University Press.
Faye, Souleymane (2013). Grammaire dialectale du seereer. Dakar: La maison du livre
universel E.L.U.
Fedden, Sebastian and Greville G. Corbett (2017). ‘Gender and classifiers as concurrent
systems: Refining the typology of nominal classification’, Glossa 2(1): 34.
doi:10.5334/gjgl.177
Feist, Timothy (2015). A Grammar of Skolt Saami. Helsinki: Suomalais-Ugrilainen Seura.
Feldman, Laurie B. (2000). ‘Are morphological effects distinguishable from the effects of
shared meaning and shared form?’, Journal of Experimental Psychology. Learning,
Memory, and Cognition 26(6): 1431–44. doi:10.1037//0278-7393.26.6.1431
Fenk-Oczlon, Gertraud and August Fenk (2008). ‘Complexity trade-offs between the
subsystems of language’, in Matti Miestamo, Kaius Sinnemäki, and Fred Karlsson
(eds), Language Complexity: Typology, Contact, Change. Amsterdam: John Benjamins,
43–65.
Fenk-Oczlon, Gertraud and August Fenk (2014). ‘Complexity trade-offs do not prove the
equal complexity hypothesis’, Poznań Studies in Contemporary Linguistics 50(2): 145–55.
doi:10.1515/psicl-2014-0010
Ferguson, Charles A. (1971). ‘Absence of copula and the notion of simplicity: A study of
normal speech, baby talk, foreigner talk, and pidgins’, in Dell Hymes (ed.), Pidginization
and Creolization of Languages. Cambridge: Cambridge University Press, 141–50.
Ferronha, António Luís (ed.) (1994). Tratado breve dos Rios de Guiné do Cabo-Verde. Feito
pelo Capitão André Álvares d’Almada. Ano de 1594. Lisboa: Grupo de Trabalho do
Ministério da Educação para as Comemorações dos Descobrimentos Portugueses.
Ferry, Marie-Paule and Konstantin Pozdniakov (2001). ‘Dialectique du régulier et de
l’irrégulier. Le système des classes nominales dans le groupe tenda des langues atlan-
tiques’, in Robert Nicolaï (ed.), Leçons d’Afrique. Filiations, ruptures et reconstitution de
langues. Un hommage à Gabriel Manessy. Louvain: Peeters, 153–67.
Fertig, David (2000). Morphological Change Up Close: Two and a Half Centuries of Verbal
Inflection in Nuremberg. Berlin: De Gruyter.
Field, Andy, Jeremy Miles, and Zoë Field (2012). Discovering Statistics Using R. London:
Sage.
Finkel, Raphael and Gregory Stump (2007). ‘Principal parts and morphological typology’,
Morphology 17(1): 39–75. doi:10.1007/s11525-007-9115-9
Finkel, Raphael and Gregory Stump (2009). ‘Principal parts and degrees of paradigmatic
transparency’, in James P. Blevins and Juliette Blevins (eds), Analogy in Grammar: Form
and Acquisition. Oxford: Oxford University Press, 13–53.
Finkel, Raphael and Gregory Stump (2013). Principal parts analyzer. URL:
http://www.cs.uky.edu/raphael/linguistics/analyze.html (accessed July 2016).
Fiorentino, Robert and David Poeppel (2007). ‘Compound words and structure in the
lexicon’, Language and Cognitive Processes 22(7): 953–1000. doi:10.1080/
01690960701190215
Fitch, W. Tecumseh and Marc D. Hauser (2004). ‘Computational constraints on syntactic
processing in a nonhuman primate’, Science 303(5656): 377–80. doi:10.1126/
science.1089401
Fleck, David (2007). ‘Evidentiality and double tense in Matses’, Language 83: 589–614.
doi:10.1353/lan.2007.0113
Forshaw, William (2016). Little Kids, Big Verbs: The Acquisition of Murrinhpatha Bipartite
Stem Verbs. University of Melbourne PhD dissertation.
Fortescue, Michael (1992). ‘Morphophonemic complexity and typological stability in a
polysynthetic language family’, International Journal of American Linguistics 58(2):
242–8. doi:10.1086/ijal.58.2.3519761
Fowler, Catherine S. (1972). ‘Some ecological clues to Proto-Numic homelands’, in Don
D. Fowler (ed.), Great Basin Cultural Ecology: A Symposium. Reno: Desert Research
Institute Publications in the Social Sciences, 105–21.
Frenda, Alessio (2011). ‘Gender in Irish between continuity and change’, Folia Linguistica
45: 283–316. doi:10.1515/flin.2011.012
Gabas Jr, Nilson (1999). A Grammar of Karo, Tupi (Brazil). University of California at
Santa Barbara PhD dissertation.
Gal, Susan (1989). ‘Lexical innovation and loss: Restricted Hungarian’, in Nancy Dorian
(ed.), Investigating Obsolescence: Studies in Language Contraction and Death.
Cambridge: Cambridge University Press, 313–31.
Gamble, David (1957). Elementary Wolof Grammar. London: Research Department
Colonial Office. [Reprinted in Gabriel Manessy and Serge Sauvageot (eds) (1963).
Wolof et Sérèr. Études de phonétique et de grammaire descriptive. Dakar: University of
Dakar Press, 131–61.]
Gao, Yongming (1998). Mental Representations of Chinese Numeral Classifiers. Lehigh
University PhD dissertation.
Gardani, Francesco (2008). Borrowing of Inflectional Morphemes in Language Contact.
Frankfurt am Main: Peter Lang.
Gardani, Francesco (2012). ‘Plural across inflection and derivation, fusion and agglutin-
ation’, in Lars Johanson and Martine I. Robbeets (eds), Copies versus Cognates in Bound
Morphology. Leiden: Brill, 71–97.
Gardani, Francesco (2013). Dynamics of Morphological Productivity: The Evolution of Noun
Classes from Latin to Italian. Leiden: Brill.
Gardani, Francesco (2015). ‘Affix pleonasm’, in Peter O. Müller, Ingeborg Ohnheiser, Susan
Olsen, and Franz Rainer (eds), Word-Formation. An International Handbook of the
Languages of Europe, vol. 1. Berlin: De Gruyter Mouton, 537–50.
Gardani, Francesco (2018). ‘On morphological borrowing’, Language and Linguistics
Compass 12(10): 1–17. doi:10.1111/lnc3.12302
Gardani, Francesco, Franz Rainer, and Hans Christian Luschützky (2019). ‘Competition in
morphology: A historical outline’, in Franz Rainer, Francesco Gardani, Wolfgang
U. Dressler, and Hans Christian Luschützky (eds), Competition in Inflection and
Word-Formation. Cham: Springer, 3–36. doi:10.1007/978-3-030-02550-2_1
Gblem-Poidi, Massanvi Honorine (2007). ‘Nominal classes and concord in Igo (Ahlon)’, in
Mary Esther Kropp Dakubu, George Akanlig-Pare, Kweku E. Osam, and Kofi K. Saah
(eds), Proceedings of the Annual Colloquium of the Legon-Trondheim Linguistics Project
10–20 January 2005, vol. 4. Legon: Linguistics Department, University of Ghana, 52–60.
Gell-Mann, Murray (1994). The Quark and the Jaguar: Adventures in the Simple and the
Complex. London: Little Brown.
Gell-Mann, Murray (1995). ‘What is complexity?’, Complexity 1(1): 16–19.
Gervain, Judith and Jacques Mehler (2010). ‘Speech perception and language acquisition in
the first year of life’, Annual Review of Psychology 61: 191–218. doi:10.1146/annurev.
psych.093008.100408
Gibbons, Jean Dickinson (1992). Nonparametric Measures of Association. Newbury Park,
CA: Sage.
Gippert, Jost, Wolfgang Schulze, Zaza Aleksidze, and Jean-Pierre Mahé (2009). The
Caucasian Albanian Palimpsests of Mount Sinai. Turnhout, Belgium: Brepols.
Givón, Talmy (1971). ‘Historical syntax and synchronic morphology: An archeologist’s
fieldtrip’, Proceedings of the Chicago Linguistic Society 7: 394–415.
Goertzel, Ben (1994). Chaotic Logic: Language, Thought, and Reality from the Perspective of
Complex Systems Science. Boston: Springer.
Goldsmith, John (2001). ‘Unsupervised learning of the morphology of a natural language’,
Computational Linguistics 27(2): 153–98. doi:10.1162/089120101750300490
Goldsmith, John (2011). ‘The evaluation metric in Generative Grammar.’ Paper presented
at the 50th anniversary celebration for the MIT Department of Linguistics.
Gomez-Imbert, Elsa (1996). ‘When animals become “rounded” and “feminine”: Conceptual
categories and linguistic classification in a multilingual setting’, in John J. Gumperz and
Stephen C. Levinson (eds), Rethinking Linguistic Relativity. Cambridge: Cambridge
University Press, 438–69.
Gomez-Imbert, Elsa (2007). ‘Tukanoan nominal classification: The Tatuyo system’, in Leo
Wetzels (ed.), Language Endangerment and Endangered Languages: Linguistic and
Anthropological Studies with Special Emphasis on the Languages and Cultures of the
Andean-Amazonian Border Area. Leiden: Leiden University, 401–28.
Good, Jeff (2012a). ‘How to become a “Kwa” noun’, Morphology 22: 293–335. doi:10.1007/
s11525-011-9197-2
Good, Jeff (2012b). ‘Typologizing grammatical complexities: Or why creoles may be
paradigmatically simple but syntagmatically average’, Journal of Pidgin and Creole
Languages 27(1): 1–47. doi:10.1075/jpcl.27.1.01goo
Good, Jeff (2015). ‘Paradigmatic complexity in pidgins and creoles’, Word Structure 8(2):
184–227. doi:10.3366/word.2015.0081
Good, Jeff (2016). The Linguistic Typology of Templates. Cambridge: Cambridge University
Press.
Grant, Anthony P. (1996). ‘The evolution of functional categories in Grande Ronde
Chinook Jargon: Ethnolinguistic and grammatical considerations’, in Philip Baker and
Anand Syea (eds), Changing Meanings, Changing Functions: Papers Relating to
Grammaticalization in Creole Languages. London: University of Westminster Press,
225–42.
Grant, Anthony (2009). ‘Admixture, structural transmission, simplicity and complexity’, in
Nicholas Faraclas and Thomas Klein (eds), Simplicity and Complexity in Creoles and
Pidgins. London: Battlebridge Publications, 125–52.
Green, Ian (2003). ‘The genetic status of Murrinh-patha’, in Nicholas Evans (ed.), The Non-
Pama-Nyungan Languages of Northern Australia. Canberra: Pacific Linguistics, 125–58.
Greenberg, Joseph H. (1954). ‘A quantitative approach to the morphological typology of
language’, in Robert F. Spencer (ed.), Method and Perspective in Anthropology: Papers in
Honor of Wilson D. Wallis. Minneapolis: University of Minnesota Press, 192–220.
Greenberg, Joseph H. (1960). ‘A quantitative approach to the morphological typology of
language’, International Journal of American Linguistics 26(3): 178–94. doi:10.1086/
464575
Grijns, Cornelis D. (1991). Jakarta Malay: A Multidimensional Approach to Spatial
Variation. Leiden: KITLV Press.
Grinevald, Colette and Frank Seifart (2004). ‘Noun classes in African and Amazonian
languages: Towards a comparison’, Linguistic Typology 8: 243–85. doi:10.1515/
lity.2004.007
Grünwald, Peter D. (2007). The Minimum Description Length Principle. Cambridge, MA:
The MIT Press.
Guérin, Maximilien (2011). Le syntagme nominal en wolof. Une approche typologique. Paris:
Université Sorbonne Nouvelle—Paris 3 MA thesis.
Guillaume, Antoine (2008). A Grammar of Cavineña. Berlin: Mouton de Gruyter.
Guillaume, Antoine (2016). ‘Associated motion in South America: Typological and areal
perspectives’, Linguistic Typology 20: 81–177. doi:10.1515/lingty-2016-0003
Guillaume, Antoine and Françoise Rose (2010). ‘Sociative causative markers in South
American languages: A possible areal feature’, in Franck Floricic (ed.), Essais de typologie
et de linguistique générale, Mélanges offerts à Denis Creissels. Lyon: ENS Éditions,
383–402.
Guy, Gregory (1991). ‘Explanation in variable phonology: An exponential model of mor-
phological constraints’, Language Variation and Change 3: 1–22. doi:10.1017/
S0954394500000429
Hale, Kenneth (1969). Walbiri Conjugations. Cambridge, MA: MIT.
Halle, Morris (1994). ‘The Russian declension: An illustration of the theory of Distributed
Morphology’, in Jennifer S. Cole and Charles Kisseberth (eds), Perspectives in Phonology.
Stanford: CSLI Publications, 29–60.
Hammarström, Harald, Robert Forkel, and Martin Haspelmath (eds) (2019). Glottolog 3.4.
Jena: Max Planck Institute for the Science of Human History. URL: https://glottolog.org
Hansson, Inga-Lill (2003). ‘Akha’, in Randy LaPolla and Graham Thurgood (eds), The
Sino-Tibetan Languages. London: Routledge, 236–51.
Harris, Alice (2004). ‘History in support of synchrony’, in Charles Chang, Michael
J. Houser, Yuni Kim, David Mortensen, and Mischa Park-Doob (eds), Proceedings of
the Berkeley Linguistics Society. Berkeley Linguistics Society, 142–59.
Harris, Alice (2017). Multiple Exponence. Oxford: Oxford University Press.
Harris, Alice and Lyle Campbell (1995). Historical Syntax in Cross-linguistic Perspective.
Cambridge: Cambridge University Press.
Haspelmath, Martin (2009). ‘An empirical test of the Agglutination Hypothesis’, in Sergio
Scalise, Elisabetta Magni, and Antonietta Bisetto (eds), Universals of Language Today.
Dordrecht: Springer, 13–29.
Haspelmath, Martin (2011). ‘The indeterminacy of word segmentation and the nature of
morphology and syntax’, Folia Linguistica 45(1): 31–80. doi:10.1515/flin-2017-1005
Haspelmath, Martin, Matthew Dryer, David Gil, and Bernard Comrie (eds) (2005). The
World Atlas of Language Structures. Oxford: Oxford University Press.
Haspelmath, Martin and Thomas Müller-Bardey (2004). ‘Valency change’, in Geert
E. Booij, Christian Lehmann, Joachim Mugdan, and Stavros Skopeteas (in collaboration
with Wolfgang Kesselheim) (eds), Morphology: A Handbook on Inflection and Word
Formation, vol. 2. Berlin: de Gruyter, 1130–45.
Haspelmath, Martin and Andrea D. Sims (2010). Understanding Morphology. 2nd ed.
London: Hodder Education.
Haude, Katharina (2006). A Grammar of Movima. Universiteit Nijmegen PhD dissertation.
Hauser, Marc D., Noam Chomsky, and W. Tecumseh Fitch (2002). ‘The faculty of
language: What is it, who has it, and how did it evolve?’, Science 298(5598): 1569–79.
doi:10.1126/science.298.5598.1569
Hawkins, John A. (2004). Efficiency and Complexity in Grammars. New York: Oxford
University Press.
Hawkins, John A. (2007). ‘Processing typology and why psychologists need to know about
it’, New Ideas in Psychology 25: 87–107. doi:10.1016/j.newideapsych.2007.02.003
Hawkins, John A. (2014). Cross-Linguistic Variation and Efficiency. Oxford: Oxford
University Press.
Hay, Jennifer (2001). ‘Lexical frequency in morphology: Is everything relative?’, Linguistics
39(6): 1041–70. doi:10.1515/ling.2001.041
Hay, Jennifer (2003). Causes and Consequences of Word Structure. New York: Routledge.
Hay, Jennifer and Laurie Bauer (2007). ‘Phoneme inventory size and population size’,
Language 83(2): 388–400. doi:10.1353/lan.2007.0071
Haynie, Hannah, Claire Bowern, Patience Epps, Jane Hill, and Patrick McConvell (2014).
‘Wanderwörter in languages of the Americas and Australia’, Ampersand 1: 1–18.
doi:10.1016/j.amper.2014.10.001
Hazaël-Massieux, Marie-Christine (2002). ‘Les créoles à base française: une introduction’,
Travaux Interdisciplinaires du Laboratoire Parole et Langage d’Aix-en-Provence (TIPA)
21: 63–86.
Hengeveld, Kees and Sterre Leufkens (2018). ‘Transparent and non-transparent languages’,
Folia Linguistica 52(1): 139–75. doi:10.1515/flin-2018-0003
Henri, Fabiola (2010). A Constraint-Based Approach to Verbal Constructions in Mauritian.
University of Mauritius and Université Paris Diderot PhD dissertation.
Henri, Fabiola (2012). ‘Attenuative reduplication in Mauritian’, in Enoch Aboh, Norval
Smith, and Anne Zribi-Hertz (eds), The Morphosyntax of Reiteration. Amsterdam: John
Benjamins, 203–34.
Henri, Fabiola (forthcoming). ‘Morphomic structure in Mauritian: On change, complexity
and creolization’, Morphology.
Henri, Fabiola and Alain Kihm (2015). ‘The morphology of TMA marking in creole languages:
A comparative study’, Word Structure 8(2): 248–82. doi:10.3366/word.2015.0083
Henri, Fabiola, Jean-Marie Marandin, and Anne Abeillé (2008). ‘Information structure coding
in Mauritian: Verum Focus expressed by long forms of verbs’. Paper presented at the
Workshop on Predicate Focus, Verum Focus, Verb Focus, University of Potsdam.
Hill, Jane H. (2001). ‘Proto-Uto-Aztecan: A community of cultivators in Central America?’,
American Anthropologist 103: 913–34. doi:10.1525/aa.2001.103.4.913
Hill, Jane H. (2010). ‘New evidence for a Mesoamerican homeland for Proto-Uto-Aztecan’,
PNAS 107(11): E33. doi:10.1073/pnas.0914473107
Hill, Nathan (2014). ‘Grammatically conditioned sound change’, Language and Linguistics
Compass 8: 211–29. doi:10.1111/lnc3.12073
Hippisley, Andrew, Marina Chumakina, Greville G. Corbett, and Dunstan Brown (2004).
‘Suppletion: Frequency, categories and distribution of stems’, Studies in Language 28(2):
387–418. doi:10.1075/sl.28.2.05hip
Hock, Hans Henrich and Brian D. Joseph (1996). Language History, Language Change, and
Language Relationship. Berlin: Walter de Gruyter.
Hockett, Charles F. (1947). ‘Problems of morphemic analysis’, Language 23(4): 321–43.
Hockett, Charles F. (1958). A Course in Modern Linguistics. New York: Macmillan.
Hodge, Carleton (1970). ‘The linguistic cycle’, Language Sciences 13: 1–7. [Reprinted in
Scott Noegel and Alan S. Kaye (eds) (2004), Afroasiatic Linguistics, Semitics, and
Egyptology: Selected Writings of Carleton T. Hodge, Bethesda, MD: CDL Press, 1–17.]
Hopper, Paul (1990). ‘Where do words come from?’, in William Croft, Keith Denning, and
Suzanne Kemmer (eds), Studies in Typology and Diachrony: Papers Presented to Joseph
H. Greenberg on his 75th Birthday. Amsterdam: John Benjamins, 151–60.
Hualde, José Ignacio, Gorka Elordieta, and Arantzazu Elordieta (1994). The Basque Dialect
of Lekeitio. Bilbo: Universidad del País Vasco/Euskal Herriko Unibertsitatea.
Hualde, José Ignacio and Jon Ortiz de Urbina (2003). A Grammar of Basque. Berlin:
Mouton de Gruyter.
Huber, Christian (2011). ‘Some notes on gender and number marking in Shumcho’, in
Gerda Lechleitner and Christian Liebl (eds), Jahrbuch des Phonogrammarchivs, vol. 2.
Göttingen: Cuvillier Verlag, 52–90.
Hudson, Carla L. and Elissa L. Newport (1999). ‘Creolization: Could adults really have
done it all?’, in Annabel Greenhill, Heather Littlefield, and Cheryl Tano (eds), Proceedings
of the 23rd Annual Boston University Conference on Language Development. Somerville:
Cascadilla Press, 265–76.
Hudson Kam, Carla L. and Elissa L. Newport (2005). ‘Regularizing unpredictable variation:
The roles of adult and child learners in language formation and change’, Language
Learning and Development 1(2): 151–95. doi:10.1080/15475441.2005.9684215
Hudson Kam, Carla L. and Elissa L. Newport (2009). ‘Getting it right by getting it wrong:
When learners change languages’, Cognitive Psychology 59(1): 30–66.
doi:10.1016/j.cogpsych.2009.01.001
Huldén, Lars (1972). ‘Genussystemet i Karleby och Nedervetil’, Folkmålsstudier 22: 47–82.
Hull, Geoffrey (1998). ‘The basic lexical affinities of Timor’s Austronesian languages:
A preliminary investigation’, Studies in the Languages and Cultures of East Timor 1:
97–174.
Hull, Geoffrey (1999). Standard Tetum-English Dictionary. Sydney: Allen & Unwin.
Hultman, Oskar Fredrik (1894). De östsvenska dialekterna. Helsinki: Svenska
landsmålsföreningen.
Humboldt, Wilhelm von (1836). Über die Verschiedenheit des menschlichen Sprachbaues
und ihren Einfluss auf die geistige Entwickelung des Menschengeschlechts. Berlin:
F. Dümmler.
Hyman, Larry M. (2004). ‘How to become a Kwa verb’, Journal of West African Languages
30: 69–88.
Igartua, Iván (2019). ‘Loss of grammatical gender and language contact’, Diachronica 36:
181–221. doi:10.1075/dia.17004.iga
Irvine, Judith (1978). ‘Wolof noun classification: The social setting of divergent change’,
Language in Society 7: 37–64. doi:10.1017/S0047404500005327
Irvine, Judith (2011). ‘Société et communication chez les Wolof à travers le temps et
l’espace’, in Anna M. Diagne, Sascha Kesseler, and Christian Meyer (eds),
Communication wolof et société sénégalaise. Héritage et création. Paris: L’Harmattan,
37–70.
Jakobson, Roman (1929). Remarques sur l’évolution phonologique du russe comparée à celle
des autres langues slaves. Praha: Jednota československých matematiků a fysiků.
Jakobson, Roman (1959). ‘On linguistic aspects of translation’, in Reuben A. Brower (ed.),
On Translation. Cambridge, MA: Harvard University Press, 232–9.
Jamieson, Carole Ann (1982). ‘Conflated subsystems marking person and aspect in
Chiquihuitlán Mazatec verbs’, International Journal of American Linguistics 48(2):
139–67. doi:10.1086/465725
Janda, Laura A. (1994). ‘The spread of athematic 1sg -m in the major West Slavic
languages’, The Slavic and East European Journal 38(1): 90–119. doi:10.2307/308549
Janhunen, Juha (2008). ‘Mongolic as an expansive language family’, in Tokusu Kurebito
(ed.), Past and Present Dynamics: The Great Mongolian State. Tokyo: Tokyo University
of Foreign Studies, Research Institute for Languages and Cultures of Asia and Africa,
127–37.
Janse, Mark and Sijmen Tol (eds) (2003). Language Death and Language Maintenance:
Theoretical, Practical and Descriptive Approaches. Amsterdam: John Benjamins.
Jespersen, Otto (1949). A Modern English Grammar on Historical Principles. London: Allen
& Unwin.
Joanisse, Marc F. and Mark S. Seidenberg (2005). ‘Imaging the past: Neural activation in
frontal and temporal regions during regular and irregular past-tense processing’,
Cognitive, Affective & Behavioral Neuroscience 5(3): 282–96.
Johnson, Jacqueline S., Kenneth D. Shenkman, Elissa L. Newport, and Douglas L. Medin
(1996). ‘Indeterminacy in the grammar of adult language learners’, Journal of Memory
and Language 35: 335–52. doi:10.1006/jmla.1996.0019
Joseph, Brian D. and Richard D. Janda (1988). ‘The how and why of diachronic
morphologization and demorphologization’, in Michael Hammond and Michael Noonan (eds),
Theoretical Morphology. New York: Academic Press, 193–210.
Joseph, John E. and Frederick J. Newmeyer (2012). ‘ “All languages are equally complex”:
The rise and fall of a consensus’, Historiographia Linguistica 39(2–3): 341–68.
doi:10.1075/hl.39.2-3.08jos
Juola, Patrick (1998). ‘Measuring linguistic complexity: The morphological tier’, Journal of
Quantitative Linguistics 5: 206–13. doi:10.1080/09296179808590128
Karatsareas, Petros (2009). ‘The loss of grammatical gender in Cappadocian Greek’,
Transactions of the Philological Society 107: 196–230. doi:10.1111/j.1467-968X.2009.01217.x
Karatsareas, Petros (2014). ‘On the diachrony of gender in Asia Minor Greek: The
development of semantic agreement in Pontic’, Language Sciences 43: 77–101.
doi:10.1016/j.langsci.2013.10.005
Kelly, Barbara, Gillian Wigglesworth, Rachel Nordlinger, and Joseph Blythe (2014). ‘The
acquisition of polysynthetic languages’, Language and Linguistics Compass 8(2): 51–64.
doi:10.1111/lnc3.12062
Kendall, Maurice and Jean Dickinson Gibbons (1990). Rank Correlation Methods. 5th ed.
Oxford: Oxford University Press.
Kibrik, Aleksandr E. (1991). ‘Organizing principles for nominal paradigms in Daghestanian
languages: Comparative and typological observations’, in Frans Plank (ed.), Paradigms:
The Economy of Inflection. Berlin: Mouton de Gruyter, 255–74.
Kibrik, Aleksandr E. (2003). ‘Nominal inflection galore: Daghestanian, with side glances at
Europe and the world’, in Frans Plank (ed.), Noun Phrase Structure in the Languages of
Europe. Berlin: Mouton de Gruyter, 37–112.
Kibrik, Andrej A. (2012). ‘What’s in the head of head-marking languages?’, in Pirkko
Suihkonen, Bernard Comrie, and Valery Solovyev (eds), Argument Structure and
Grammatical Relations: A Crosslinguistic Typology. Amsterdam: John Benjamins, 211–40.
Kielhorn, Franz (1871). The Paribhāṣenduśekhara of Nāgojībhaṭṭa (2 vols). Bombay: Indu-
Prakāsh Press.
Kihm, Alain (1994). Kriyol Syntax. Amsterdam: John Benjamins.
Kihm, Alain (2014). ‘Theories of morphology and theories of creole emergence: The inner
connection’. PAPIA, São Paulo, 24(1): 43–89.
Killian, Don (2015). Topics in Uduk Phonology and Morphosyntax. University of Helsinki
PhD dissertation.
Kirby, Simon, Hannah Cornish, and Kenny Smith (2008). ‘Cumulative cultural evolution in
the laboratory: An experimental approach to the origins of structure in human language’,
Proceedings of the National Academy of Sciences 105(31): 10681–6.
doi:10.1073/pnas.0707835105
Kiso, Andrea (2012). Tense and Aspect in Chichewa, Citumbuka and Cisena: A Description
and Comparison of the Tense-Aspect Systems in Three Southeastern Bantu Languages.
Stockholm University dissertation.
Klausenburger, Jurgen (1976). ‘(De)morphologization in Latin’, Lingua 40(4): 305–20.
doi:10.1016/0024-3841(76)90082-6
Klingler, Thomas (2003). If I Could Turn My Tongue Like That: The Creole of Pointe Coupee
Parish, Louisiana. Baton Rouge: Louisiana State University Press.
Kobès, Aloys (1869). Grammaire de la langue volofe. Ouvrage nouveau. Saint-Joseph de
Ngasobil: Imprimerie de la Mission.
Kobès, Aloys (1875). Dictionnaire volof-français. Saint-Joseph de Ngasobil: Mission
Catholique [cited from the new edition: Kobès, Aloys and Olivier Abiven (1923),
Dictionnaire volof-français. Nouvelle édition revue et considérablement augmentée par
le R. P. O. Abiven. Dakar: Mission Catholique].
Koopman, Hilda and Claire Lefebvre (1981). ‘Haitian Creole pu’, in Pieter C. Muysken
(ed.), Generative Studies on Creole Languages. Dordrecht: Foris, 201–21.
Koptjevskaja-Tamm, Maria and Bernhard Wälchli (2001). ‘The Circum-Baltic languages:
An areal-typological approach’, in Östen Dahl and Maria Koptjevskaja-Tamm (eds),
Circum-Baltic Languages, vol. 2: Grammar and Typology. Amsterdam: John Benjamins,
615–750.
Kortmann, Bernd and Benedikt Szmrecsanyi (eds) (2012). Linguistic Complexity: Second
Language Acquisition, Indigenization, Contact. Berlin: De Gruyter.
Krasnoukhova, Olga (2012). The Noun Phrase in the Languages of South America.
Universiteit Nijmegen PhD dissertation.
Kreyer, Rolf (2003). ‘Genitive and of-construction in modern written English:
Processability and human involvement’, International Journal of Corpus Linguistics 8
(2): 169–207. doi:10.1075/ijcl.8.2.02kre
Kusters, Wouter (2003). Linguistic Complexity: The Influence of Social Change on Verbal
Inflections. Utrecht: LOT.
Kusters, Wouter (2008). ‘Complexity in linguistic theory, language learning and language
change’, in Matti Miestamo, Kaius Sinnemäki, and Fred Karlsson (eds), Language
Complexity: Typology, Contact, Change. Amsterdam: John Benjamins, 3–22.
Labouret, Henri (1935). ‘Remarques sur la langue des wolof’, in Nicolas Leca (ed.), Les
pêcheurs de Guet N’dar. Paris: Larose, 16–27. [Reprinted in Gabriel Manessy and Serge
Sauvageot (eds) (1963). Wolof et Sérèr. Études de phonétique et de grammaire descriptive.
Dakar: University of Dakar Press, 45–56.]
Labov, William (1963). ‘The social motivation of a sound change’, Word 19: 273–309.
Ladd, D. Robert, Seán G. Roberts, and Dan Dediu (2015). ‘Correlational studies in
typological and historical linguistics’, Annual Review of Linguistics 1: 221–41.
doi:10.1146/annurev-linguist-030514-124819
Landaburu, Jon (2005). ‘Expresión gramatical de lo epistémico en algunas lenguas del norte
de Suramerica’, Proceedings of the Conference on Indigenous Languages of Latin America,
1–13. URL: lanic.utexas.edu/project/etext/llilas/cilla/landaburu2.pdf
Leclerc, Jacques (2015). L’aménagement linguistique dans le monde. URL:
http://www.axl.cefan.ulaval.ca/afrique/senegal.htm
Leer, Jeff (1991). ‘Evidence for a Northern Northwest Coast language area: Promiscuous
number marking and periphrastic possessive constructions in Haida, Eyak, and Aleut’,
International Journal of American Linguistics 57(2): 158–93.
doi:10.1086/ijal.57.2.3519765
Lefebvre, Claire (1998). Creole Genesis and the Acquisition of Grammar. Cambridge:
Cambridge University Press.
Lefebvre, Claire and Anne-Marie Brousseau (2002). Fongbe. Berlin: Mouton de Gruyter.
Lehmann, Christian (1985). ‘Grammaticalization: Synchronic variation and diachronic
change’, Lingua e Stile 20: 303–18.
Lewis, Geoffrey L. (2001). Turkish Grammar. 2nd ed. Oxford: Oxford University Press.
Lewis, M. Paul, Gary F. Simons, and Charles D. Fennig (eds) (2015). Ethnologue:
Languages of the World. 18th ed. Dallas, TX: SIL International. URL:
http://www.ethnologue.com
Li, Charles N. and Sandra A. Thompson (1976). ‘Development of the causative in Mandarin
Chinese: Interaction of diachronic processes in syntax’, in Masayoshi Shibatani (ed.), The
Grammar of Causative Constructions. New York: Academic Press, 477–92.
Li, Charles N. and Sandra A. Thompson (1981). Mandarin Chinese: A Functional Reference
Grammar. Berkeley, CA: University of California Press.
Lindström, Eva (2008). ‘Language complexity and interlinguistic difficulty’, in Matti
Miestamo, Kaius Sinnemäki, and Fred Karlsson (eds), Language Complexity: Typology,
Contact, Change. Amsterdam: John Benjamins, 217–42.
Loporcaro, Michele (2018). Gender from Latin to Romance: History, Geography, Typology.
Oxford: Oxford University Press.
Loporcaro, Michele, Francesco Gardani, and Alberto Giudici (forthcoming). ‘Contact-
induced complexification in the gender system of Istro-Romanian’. Journal of
Language Contact.
Loporcaro, Michele and Tania Paciaroni (2011). ‘Four gender-systems in Indo-European’,
Folia Linguistica 45(2): 389–434. doi:10.1515/flin.2011.015
Lowe, Ivan (1999). ‘Nambiquara’, in Robert M. W. Dixon and Alexandra Y. Aikhenvald
(eds), The Amazonian Languages. Cambridge: Cambridge University Press, 269–92.
Ludwig, Ralph, Sylviane Telchid, and Florence Bruneau-Ludwig (eds) (2001). Corpus créole.
Hamburg: Helmut Buske.
Luís, Ana R. (2009). ‘The loss and survival of inflectional morphology: Contextual vs.
inherent inflection in creoles’, in Sonia Colina, Antxon Olarrea, and Ana Carvalho (eds),
Romance Linguistics 2009. Amsterdam: John Benjamins, 323–36.
Luís, Ana R. (2014). ‘Inflectional structure without morphemes: Similarities between
creoles and non-creoles’, PAPIA, São Paulo, 24(2): 381–406.
Lüpke, Friederike and Mary Raymond (eds) (2010). Documenting Atlantic-Mande
Convergence and Diversity. Special issue of the Journal of Language Contact. THEMA 3.
Lupyan, Gary and Rick Dale (2010). ‘Language structure is partly determined by social
structure’, PLoS ONE 5(1): e8559. doi:10.1371/journal.pone.0008559
MacWhinney, Brian, Elizabeth Bates, and Reinhold Kliegl (1984). ‘Cue validity and
sentence interpretation in English, German, and Italian’, Journal of Verbal Learning and
Verbal Behavior 23(2): 127–50. doi:10.1016/S0022-5371(84)90093-8
Madsen, David and David Rhode (1994). Across the West: Human Population Movement
and the Expansion of the Numa. Salt Lake City, UT: University of Utah Press.
Maiden, Martin (2005). ‘Morphological autonomy and diachrony’, in Geert E. Booij and
Jaap van Marle (eds), Yearbook of Morphology 2004. Dordrecht: Springer, 137–75.
doi:10.1007/1-4020-2900-4_6
Maiden, Martin (2013). ‘ “Semi-autonomous” morphology? A problem in the history of the
Italian (and Romanian) verb’, in Silvio Cruschina, Martin Maiden, and John C. Smith
(eds), The Boundaries of Pure Morphology: Diachronic and Synchronic Perspectives.
Oxford: Oxford University Press, 24–44.
Maiden, Martin (2018). The Romance Verb: Morphomic Structure and Diachrony. Oxford:
Oxford University Press.
Maiden, Martin, John C. Smith, Maria Goldbach, and Marc-Olivier Hinzelin (eds) (2011).
Morphological Autonomy: Perspectives from Romance Inflectional Morphology. Oxford:
Oxford University Press.
Maitz, Péter and Attila Németh (2014). ‘Language contact and morphosyntactic
complexity: Evidence from German’, Journal of Germanic Linguistics 26(1): 1–29.
doi:10.1017/S1470542713000184
Malone, Terrell A. (1988). ‘The origin and development of Tuyuca evidentials’,
International Journal of American Linguistics 54: 119–40. doi:10.1086/466079
Manessy, Gabriel and Serge Sauvageot (eds) (1963). Wolof et Sérèr. Études de phonétique et
de grammaire descriptive. Dakar: University of Dakar Press.
Mansfield, John (2014). Polysynthetic Sociolinguistics: The Language and Culture of
Murrinh Patha Youth. Australian National University PhD dissertation.
Mansfield, John (2015a). ‘Consonant lenition as a sociophonetic variable in Murrinh Patha
(Australia)’, Language Variation and Change 27(2): 203–25.
doi:10.1017/S0954394515000046
Mansfield, John (2015b). ‘Morphotactic variation, prosodic domains and the changing
structure of the Murrinhpatha verb’, Asia-Pacific Language Variation 1(2): 163–89.
doi:10.1075/aplv.1.2.03man
Mansfield, John (2016). ‘Intersecting formatives and inflectional predictability: How do
speakers and learners predict the correct form of Murrinhpatha verbs?’, Word Structure
9(2): 183–214. doi:10.3366/word.2016.0093
Mansfield, John (2019). Murrinhpatha Morphology and Phonology. Berlin: De Gruyter
Mouton.
Marschner, Ian C. (2011). ‘glm2: Fitting generalized linear models with convergence
problems’, The R Journal 3(2): 12–15.
Marslen-Wilson, William D. (2007). ‘Morphological processes in language
comprehension’, in M. Gareth Gaskell (ed.), The Oxford Handbook of Psycholinguistics. Oxford:
Oxford University Press, 175–93.
Marzi, Claudia, Marcello Ferro, Ouafae Nahli, Patrizia Belik, Stavros Bompolas, and Vito
Pirrelli (2018). ‘Evaluating inflectional complexity crosslinguistically: A processing
perspective’, in Nicoletta Calzolari (ed.), LREC 2018: Eleventh International Conference on
Language Resources and Evaluation: May 7–12, 2018, Miyazaki, Japan. Paris: European
Language Resources Association ELRA, article n. 745.
Matras, Yaron (1998). ‘Utterance modifiers and universals of grammatical borrowing’,
Linguistics 36: 281–331. doi:10.1515/ling.1998.36.2.281
Matras, Yaron (2009). Language Contact. Cambridge: Cambridge University Press.
Matras, Yaron and Jeanette Sakel (eds) (2007). Grammatical Borrowing in Cross-Linguistic
Perspective. Berlin: Mouton de Gruyter.
Matthews, Peter H. (1972). Inflectional Morphology. Cambridge: Cambridge University
Press.
Matthews, Peter H. (1991). Morphology. 2nd ed. Cambridge: Cambridge University Press.
McGregor, William (2010). ‘Optional ergative case marking systems in a typological-
semiotic perspective’, Lingua 120: 1610–36. doi:10.1016/j.lingua.2009.05.010
McGregor, William and Jean-Christophe Verstraete (2010). ‘Optional ergative marking
and its implications for linguistic theory’, Lingua 120: 1607–9.
doi:10.1016/j.lingua.2009.05.009
Mc Laughlin, Fiona (1997). ‘Noun classification in Wolof: When affixes are not renewed’,
Studies in African Linguistics 26(1): 1–28.
Mc Laughlin, Fiona (2000). ‘Consonant mutation and reduplication in Seereer-Siin’,
Phonology 17: 333–63. doi:10.1017/S0952675701003955
Mc Laughlin, Fiona (2001). ‘Dakar Wolof and the configuration of an urban identity’,
Journal of African Cultural Studies 14(2): 153–72. doi:10.1080/13696810120107104
McLeod, A. Ian (2011). ‘Package “Kendall”. R package documentation’. URL:
https://cran.r-project.org/web/packages/Kendall/Kendall.pdf
McWhorter, John H. (1994). ‘From focus marker to copula in Swahili’, in Kevin E. Moore,
David Peterson, and Comfort Wentum (eds), Proceedings of the Berkeley Linguistics
Society, Special Session on Historical Issues in African Linguistics. Berkeley, CA: Berkeley
Linguistics Society, 57–66.
McWhorter, John H. (1998). ‘Identifying the creole prototype: Vindicating a typological
claim’, Language 74: 788–818. doi:10.2307/417003
McWhorter, John H. (2001). ‘The world’s simplest grammars are creole grammars’,
Linguistic Typology 5(2–3): 125–66. doi:10.1515/lity.2001.001
McWhorter, John H. (2002). ‘What happened to English?’, Diachronica 19: 217–72.
doi:10.1075/dia.19.2.02wha
McWhorter, John H. (2005). Defining Creole. New York: Oxford University Press.
McWhorter, John H. (2007). Language Interrupted: Signs of Non-Native Acquisition in
Standard Language Grammars. New York: Oxford University Press.
McWhorter, John H. (2008). ‘Why does a language undress? Strange cases in Indonesia’, in
Matti Miestamo, Kaius Sinnemäki, and Fred Karlsson (eds), Language Complexity:
Typology, Contact, Change. Amsterdam: John Benjamins, 167–90.
McWhorter, John H. (2011). Linguistic Simplicity and Complexity: Why Do Languages
Undress? Berlin: Walter de Gruyter.
McWhorter, John H. (2012). ‘Case closed? Testing the Feature Pool Hypothesis’, Journal of
Pidgin and Creole Languages 27: 171–82. doi:10.1075/jpcl.27.1
McWhorter, John H. (2016). ‘Is radical analyticity normal? Implications of Niger-Congo
and Sino-Tibetan for typology and diachronic theory’, in Elly van Gelderen (ed.), Cyclical
Change Continued. Amsterdam: John Benjamins, 49–91. doi:10.1075/la.227.03mcw
McWhorter, John H. (2018). The Creole Debate. Cambridge: Cambridge University Press.
McWhorter, John H. (2019). ‘The radically isolating languages of Flores: A challenge to
diachronic theory’, Journal of Historical Linguistics 9: 177–207. doi:10.1075/jhl.16021.mcw
Meakins, Felicity (2009). ‘The case of the shifty ergative marker: A pragmatic shift in the
ergative marker in one Australian mixed language’, in Jóhanna Barðdal and Shobhana
L. Chelliah (eds), The Role of Semantic, Pragmatic, and Discourse Factors in the
Development of Case. Amsterdam: John Benjamins, 59–91.
Meakins, Felicity (2011). Case Marking in Contact: The Development and Function of Case
Morphology in Gurindji Kriol. Amsterdam: John Benjamins.
Meakins, Felicity (2013). ‘Gurindji Kriol’, in Susanne Maria Michaelis, Philippe Maurer,
Martin Haspelmath, and Magnus Huber (eds), The Survey of Pidgin and Creole
Languages, vol. III: Contact Languages Based on Languages from Africa, Asia, Australia
and the Americas. Oxford: Oxford University Press, 131–9.
Meakins, Felicity (2015). ‘From absolutely optional to only nominally ergative: The life
cycle of the Gurindji Kriol ergative suffix’, in Francesco Gardani, Peter Arkadiev, and
Nino Amiridze (eds), Borrowed Morphology. Berlin: Mouton de Gruyter, 189–218.
Meakins, Felicity, Patrick McConvell, Erika Charola, Norm McNair, Helen McNair, and
Lauren Campbell (2013). Gurindji to English Dictionary. Batchelor, Australia: Batchelor
Press.
Meakins, Felicity and Rachel Nordlinger (2014). A Grammar of Bilinarra: An Australian
Aboriginal Language of the Northern Territory. Berlin: Mouton de Gruyter.
Meakins, Felicity and Carmel O’Shannessy (2010). ‘Ordering arguments about: Word order
and discourse motivations in the development and use of the ergative marker in two
Australian mixed languages’, Lingua 120(7): 1693–713. doi:10.1016/j.lingua.2009.05.013
Meakins, Felicity, Xia Hua, Cassandra Algy, and Lindell Bromham (2019). ‘Birth of a
contact language did not favor simplification’, Language 95(2): 294–332.
doi:10.1353/lan.2019.0032
Meeuwis, Michael (2013). ‘Lingala’, in Susanne Maria Michaelis, Philippe Maurer, Martin
Haspelmath, and Magnus Huber (eds), The Survey of Pidgin and Creole Languages, vol.
III: Contact Languages Based on Languages from Africa, Asia, Australia and the
Americas. Oxford: Oxford University Press, 25–33.
Meijer, Guus and Pieter C. Muysken (1977). ‘On the beginnings of pidgin and creole
studies: Schuchardt and Hesseling’, in Albert Valdman (ed.), Pidgin and Creole
Linguistics. Bloomington: Indiana University Press, 21–48.
Mel’čuk, Igor (1994). ‘Suppletion: Toward a logical analysis of the concept’, Studies in
Language 18: 339–410. doi:10.1075/sl.18.2.03mel
Merrill, William L. (2012). ‘The historical linguistics of Uto-Aztecan agriculture’,
Anthropological Linguistics 54(3): 203–60. doi:10.1353/anl.2012.0017
Meyerhoff, Miriam (2009). ‘Animacy in Bislama: Using quantitative methods to evaluate
transfer of a substrate feature’, in James Stanford and Dennis Preston (eds), Variation in
Indigenous Minority Languages. Amsterdam: John Benjamins, 369–96.
Michael, Lev (2008). Nanti Evidential Practice: Language, Knowledge, and Social Action in
an Amazonian Society. University of Texas at Austin PhD dissertation.
Michael, Lev, William Chang, and Tammy Stark (2014). ‘Exploring phonological areality in
the Circum-Andean region using a naive Bayes classifier’, Language Dynamics and
Change 4(1): 27–86. doi:10.1163/22105832-00401004
Miestamo, Matti (2008). ‘Grammatical complexity in a cross-linguistic perspective’, in
Matti Miestamo, Kaius Sinnemäki, and Fred Karlsson (eds), Language Complexity:
Typology, Contact, Change. Amsterdam: John Benjamins, 23–41.
Miestamo, Matti (2017). ‘Linguistic diversity and complexity’, Lingue e Linguaggio 16(2):
227–54.
Miestamo, Matti, Kaius Sinnemäki, and Fred Karlsson (eds) (2008). Language Complexity:
Typology, Contact, Change. Amsterdam: John Benjamins.
Mihas, Elena (2015). A Grammar of Alto Perené (Arawak). Berlin: De Gruyter Mouton.
Milin, Petar, Victor Kuperman, Aleksandar Kostić, and R. Harald Baayen (2009). ‘Words
and paradigms bit by bit: An information-theoretic approach to the processing of
paradigmatic structure in inflection and derivation’, in James P. Blevins and Juliette
Blevins (eds), Analogy in Grammar: Form and Acquisition. Oxford: Oxford University
Press, 214–52.
Miller, Wick R. (1983). ‘Uto-Aztecan languages’, in Alfonso Ortiz (ed.), Handbook of North
American Indians, vol. 10: Southwest. Washington, DC: Smithsonian Institution, 113–24.
Mithun, Marianne (1988). ‘System-defining structural properties in polysynthetic
languages’, Zeitschrift für Phonetik, Sprachwissenschaft und Kommunikationsforschung
41(4): 442–52.
Mithun, Marianne (1989). ‘The acquisition of polysynthesis’, Journal of Child Language 16:
285–312. doi:10.1017/S0305000900010424
Mithun, Marianne (1996). ‘General characteristics of North American Indian languages’, in
Ives Goddard (ed.), Handbook of North American Indians, vol. 17: Languages.
Washington, DC: Smithsonian Institution, 137–57.
Mithun, Marianne (1998). ‘Yup’ik roots and affixes’, in Osahito Miyaoka and Minoru
Oshima (eds), Languages of the North Pacific Rim, vol. 4. Kyoto: Kyoto University
Graduate School of Letters, 63–76.
Mithun, Marianne (2007). ‘Grammar, contact, and time’, Journal of Language Contact.
THEMA 1: 133–55.
Mithun, Marianne (2015). ‘Morphological complexity and language contact in languages
indigenous to North America’, Linguistic Discovery 13(2): 37–59.
Mithun, Marianne (2016). ‘Affix ordering: Motivation and interpretation’, in Andrew
Hippisley and Gregory Stump (eds), The Cambridge Handbook of Morphology.
Cambridge: Cambridge University Press, 149–85.
Miyaoka, Osahito (2011). A Grammar of Central Alaskan Yupik (CAY). Berlin: de Gruyter
Mouton.
Moscoso del Prado Martín, Fermín (2003). Paradigmatic Structures in Morphological
Processing: Computational and Cross-Linguistic Studies. University of Nijmegen PhD
dissertation.
Moscoso del Prado Martín, Fermín (2011). ‘The mirage of morphological complexity’, in
Laura Carlson, Christoph Hoelscher, and Thomas F. Shipley (eds), Proceedings of the
33rd Annual Conference of the Cognitive Science Society. Austin, TX: Cognitive Science
Society, 3524–9.
Moscoso del Prado Martín, Fermín, Aleksandar Kostić, and R. Harald Baayen (2004).
‘Putting the bits together: An information-theoretical perspective on morphological
processing’, Cognition 94(1): 1–18.
Mufwene, Salikoko S. (2001). The Ecology of Language Evolution. Cambridge: Cambridge
University Press.
Mufwene, Salikoko S. (2008). Language Evolution: Contact, Competition, and Change.
London: Continuum Press.
Mufwene, Salikoko S. (2009). ‘Restructuring, hybridization, and complexity in language
evolution’, in Enoch O. Aboh and Norval Smith (eds), Complex Processes in New
Languages. Amsterdam: John Benjamins, 367–400.
Mufwene, Salikoko S., François Pellegrino, and Christophe Coupé (eds) (2017). Complexity
in Language: Developmental and Evolutionary Perspectives. Cambridge: Cambridge
University Press.
Mugdan, Joachim (1994). ‘Morphological units’, in Ron Asher (ed.), The Encyclopedia of
Language and Linguistics. Oxford: Pergamon Press, 2543–53.
Mühlhäusler, Peter (1997). Pidgin and Creole Linguistics. London: University of
Westminster.
Mukarovsky, Hans (1977). A Study of Western Nigritic, vol. I. Wien: Institut für
Ägyptologie und Afrikanistik der Universität Wien.
Müller, Neele (2013). Tense, Aspect, Modality, and Evidential Marking in South American
Indigenous Languages. Utrecht: LOT.
Munro, Pamela and Dieynaba Gaye (1997). Ay Baati Wolof: A Wolof Dictionary. Revised
ed. Los Angeles: Department of Linguistics, UCLA.
Muysken, Pieter C., Harald Hammarström, Joshua Birchall, Swintha Danielsen, Love
Eriksen, Ana Vilacy Galucio, Rik van Gijn, Simon van de Kerke, Vishnupriya
Kolipakam, Olga Krasnoukhova, Neele Müller, and Loretta O’Connor (2014). ‘The
languages of South America: Deep families, areal relationships, and language contact’,
in Loretta O’Connor and Pieter C. Muysken (eds), The Native Languages of South
America. Cambridge: Cambridge University Press, 299–322.
Myers-Scotton, Carol (2002). Contact Linguistics: Bilingual Encounters and Grammatical
Outcomes. Oxford: Oxford University Press.
Nakagawa, Shinichi and Holger Schielzeth (2013). ‘A general and simple method for
obtaining R² from generalized linear mixed-effects models’, Methods in Ecology and
Evolution 4(2): 133–42.
Nash, David (1980). Topics in Warlpiri Grammar. Massachusetts Institute of Technology
PhD dissertation.
Ndiaye, Moussa D. (2004). Eléments de morphologie du wolof. Méthodes d’analyse en
linguistique. München: LINCOM Europa.
Nettle, Daniel (2012). ‘Social scale and structural complexity in human languages’,
Philosophical Transactions of the Royal Society B: Biological Sciences 367(1597):
1829–36. doi:10.1098/rstb.2011.0216
Neubauer, Kathleen and Harald Clahsen (2009). ‘Decomposition of inflected words in a
second language: An experimental study of German participles’, Studies in Second
Language Acquisition 31(3): 403–35. doi:10.1017/S0272263109090354
Newmeyer, Frederick J. and Laurel B. Preston (eds) (2014). Measuring Grammatical
Complexity. Oxford: Oxford University Press.
Nichols, Johanna (1986). ‘Head-marking and dependent-marking grammar’, Language
62(1): 56–119.
Nichols, Johanna (1992). Linguistic Diversity in Space and Time. Chicago: University of
Chicago Press.
Nichols, Johanna (2003). ‘Diversity and stability in language’, in Brian D. Joseph and
Richard Janda (eds), The Handbook of Historical Linguistics. Oxford: Blackwell, 283–310.
Nichols, Johanna (2005). ‘The origin of the Chechen and Ingush: A study in alpine
linguistic and ethnic geography’, Anthropological Linguistics 46: 129–55.
Nichols, Johanna (2009). ‘Linguistic complexity: A comprehensive definition and survey’,
in Geoffrey Sampson, David Gil, and Peter Trudgill (eds), Language Complexity as an
Evolving Variable. Oxford: Oxford University Press, 110–25.
Nichols, Johanna (2013). ‘The vertical archipelago: Adding the third dimension to linguistic
geography’, in Peter Auer, Martin Hilpert, Anja Stukenbrock, and Benedikt Szmrecsanyi
(eds), Space in Language and Linguistics. Berlin: Mouton de Gruyter, 38–60.
Nichols, Johanna (2015). ‘Complexity as non-canonicality: An affordable, reliable metric
for morphology’. Paper given at the 48th annual meeting of the Societas Linguistica
Europaea (SLE), Leiden.
Nichols, Johanna (2016). ‘Complex edges, transparent frontiers: Grammatical complexity
and language spreads’, in Raffaela Baechler and Guido Seiler (eds), Complexity, Isolation,
and Variation. Berlin: de Gruyter, 117–37.
Nichols, Johanna (2017). ‘Person as an inflectional category’, Linguistic Typology 21(3):
387–456. doi:10.1515/lingty-2017-0010
Nichols, Johanna (2019). ‘Why is gender so complex? Some typological considerations’, in
Francesca Di Garbo, Bruno Olsson, and Bernhard Wälchli (eds), Grammatical Gender
and Linguistic Complexity, vol. I: General Issues and Specific Studies. Berlin: Language
Sciences Press, 63–92.
Nichols, Johanna (in prep.). The languages of the Great Caucasus range.
Nichols, Johanna, Jonathan Barnes, and David A. Peterson (2006). ‘The robust bell curve of
morphological complexity’, Linguistic Typology 10(1): 96–106.
Nichols, Johanna and Christian Bentz (2018). ‘Morphological complexity of languages
reflects the settlement history of the Americas’, in Katerina Harvati, Gerhard Jäger,
and Hugo Reyes-Centeno (eds), New Perspectives on the Peopling of the Americas.
Tübingen: Kerns, 13–26.
Nichols, Johanna and Yury Lander (2020). ‘Head-dependent marking’, in Mark Aronoff
(ed.), Oxford Research Encyclopedia of Linguistics. New York: Oxford University Press.
doi:10.1093/acrefore/9780199384655.013.523
Njie, Codu Mbassy (1982). Description syntaxique du wolof de Gambie. Dakar: Nouvelles
Editions africaines.
Nordlinger, Rachel (2011). ‘Transitivity in Murrinh-Patha’, Studies in Language 35(3):
702–34. doi:10.1075/sl.35.3.08nor
Nordlinger, Rachel (2015). ‘Inflection in Murrinh-Patha’, in Matthew Baerman (ed.), The
Oxford Handbook of Inflection. Oxford: Oxford University Press, 491–519.
Nordlinger, Rachel (2017). ‘The languages of the Daly River region (Northern Australia)’,
in Michael Fortescue, Marianne Mithun, and Nicholas Evans (eds), The Oxford
Handbook of Polysynthesis. Oxford: Oxford University Press, 782–807.
Nordlinger, Rachel and Patrick Caudal (2012). ‘The tense, aspect and modality system in
Murrinh-Patha’, Australian Journal of Linguistics 32(1): 73–112.
doi:10.1080/07268602.2012.657754
Norman, Jerry (1988). Chinese. Cambridge: Cambridge University Press.
Nurse, Derek (2007). ‘Did the proto-Bantu verb have a synthetic or an analytic structure?’,
SOAS Working Papers in Linguistics 15: 239–56.
Nurse, Derek (2008). Tense and Aspect in Bantu. New York: Oxford University Press.
O’Connor, Catherine, Joan Maling, and Barbora Skarabela (2013). ‘Nominal categories and
the expression of possession: A cross-linguistic study of probabilistic tendencies and
categorial constraints’, in Kersti Börjars, David Denison, and Alan Scott (eds),
Morphosyntactic Categories and the Expression of Possession. Amsterdam: John
Benjamins, 89–121.
Olawsky, Knut (2006). A Grammar of Urarina. Berlin: Mouton de Gruyter.
Ospina Bozzi, Ana María (2002). Les structures élémentaires du Yuhup Maku, langue de
l’Amazonie Colombienne: Morphologie et syntaxe. Université Paris 7—Denis Diderot
PhD dissertation.
Öztürk, Balkız and Markus A. Pöchtrager (2011). Pazar Laz. München: LINCOM Europa.
Paauw, Scott (2007). ‘A North Papua linguistic area?’. Paper given at the ‘Workshop on the
Languages of Papua’, Manokwari.
Parker, Jeff (2016). Inflectional Complexity and Cognitive Processing: An Experimental and
Corpus-Based Investigation of Russian Nouns. The Ohio State University PhD
dissertation.
Parker, Jeff, Robert Reynolds, and Andrea D. Sims (to appear). ‘The role of language-
specific network properties in the emergence of inflectional irregularity’, in Andrea
D. Sims, Adam Ussishkin, Jeff Parker, and Samantha Wray (eds), Morphological
Typology and Linguistic Cognition. Cambridge: Cambridge University Press.
Parkvall, Mikael (2008). ‘The simplicity of creoles in cross-linguistic perspective’, in Matti
Miestamo, Kaius Sinnemäki, and Fred Karlsson (eds), Language Complexity: Typology,
Contact, Change. Amsterdam: John Benjamins, 265–85.
Payne, Doris L. (1990). ‘Morphological characteristics of lowland South American
languages’, in Doris L. Payne (ed.), Amazonian Linguistics: Studies in Lowland South
American Languages. Austin, TX: University of Texas Press, 213–41.
Payne, Doris L. (2007). ‘Source of the Yagua nominal classification system’, International
Journal of American Linguistics 73(4): 447–74. doi:10.1086/523773
Payne, John (2013). ‘The oblique genitive in English’, in Kersti Börjars, David Denison, and
Alan Scott (eds), Morphosyntactic Categories and the Expression of Possession.
Amsterdam: John Benjamins, 178–92.
Payne, Thomas (1997). Describing Morphosyntax. Cambridge: Cambridge University Press.
Perrin, Loïc-Michel (2012). L’expression du temps en wolof—langue atlantique parlée au
Sénégal. Köln: Köppe.
Perrott, D. V. (1950). Teach Yourself Swahili. New York: Random House.
Pienemann, Manfred (1998). Language Processing and Second Language Development:
Processability Theory. Amsterdam: John Benjamins.
Pinheiro, José C. and Douglas M. Bates (2000). Mixed-Effects Models in S and S-PLUS. New
York: Springer.
Pinker, Steven and Alan Prince (1988). ‘On language and connectionism: Analysis of a
parallel distributed processing model of language acquisition’, Cognition 28: 73–193.
doi:10.1016/0010-0277(88)90032-7
Pirrelli, Vito (2000). Paradigmi in morfologia. Un approccio interdisciplinare alla flessione
verbale dell’italiano. Pisa: Istituti Editoriali e Poligrafici Italiani.
Pirrelli, Vito, Marcello Ferro, and Claudia Marzi (2015). ‘Computational complexity of
abstractive morphology’, in Matthew Baerman, Dunstan Brown, and Greville Corbett
(eds), Understanding and Measuring Morphological Complexity. Oxford: Oxford
University Press, 141–66.
Plag, Ingo (2003a). ‘Introduction: The morphology of creole languages’, in Geert Booij and
Jaap van Marle (eds), Yearbook of Morphology 2002. Alphen aan den Rijn: Kluwer, 1–2.
doi:10.1007/0-306-48223-1_1
Plag, Ingo (2003b). Phonology and Morphology of Creole Languages. Tübingen: Niemeyer.
Plag, Ingo (2008). ‘Creoles as interlanguages: Inflectional morphology’, Journal of Pidgin
and Creole Languages 23: 114–35. doi:10.1075/jpcl.23.1.06pla
Plank, Frans (1986). ‘Paradigm size, morphological typology, and universal economy’, Folia
Linguistica 20(1–2): 29–48. doi:10.1515/flin.1986.20.1-2.29
Pozdniakov, Konstantin (1993). Sravnitel’naja grammatika atlantičeskich jazykov. Moscow:
Nauka.
Pozdniakov, Konstantin (2015). ‘Diachronie des classes nominales atlantiques.
Morphonologie, morphologie, sémantique’, in Denis Creissels and Konstantin
Pozdniakov (eds), Les classes nominales dans les langues atlantiques. Köln: Köppe,
57–102.
Pozdniakov, Konstantin and Stéphane Robert (2015). ‘Les classes nominales en wolof.
Fonctionnalités et singularités d’un système restreint’, in Denis Creissels and
Konstantin Pozdniakov (eds), Les classes nominales dans les langues atlantiques. Köln:
Köppe, 545–628.
Prasada, Sandeep and Steven Pinker (1993). ‘Generalisation of regular and irregular
morphological patterns’, Language and Cognitive Processes 8(1): 1–56.
doi:10.1080/01690969308406948
Pye, Br John MSC (1972). The Port Keats Story. Darwin: Colemans.
Rambaud, Jean-Baptiste (1898). ‘De la détermination en wolof’, Bulletin de la Société de
Linguistique de Paris 10: 122–36. [Reprinted in Gabriel Manessy and Serge Sauvageot
(eds) (1963). Wolof et Sérèr. Études de phonétique et de grammaire descriptive. Dakar:
University of Dakar Press, 11–24.]
Reid, Nicholas (1990). Ngan’gityemerri: A Language of the Daly River Region, Northern
Territory of Australia. Australian National University PhD dissertation.
Reintges, Chris (2015). ‘Increasing morphological complexity and how syntax drives
morphological change’, in Theresa Biberauer and George Walkden (eds), Syntax Over
Time: Lexical, Morphological, and Information-Structural Interactions. Oxford: Oxford
University Press, 124–45.
Rescher, Nicholas (1998). Complexity: A Philosophical Overview. New Brunswick, NJ:
Transaction Publishers.
Rhodes, Richard (1987). ‘Paradigms large and small’, Proceedings of the 13th Annual
Meeting of the Berkeley Linguistics Society. Berkeley, CA: Berkeley Linguistics Society,
223–34.
Rice, Keren (2011). ‘Principles of affix ordering: An overview’, Word Structure 4(2):
169–200. doi:10.3366/word.2011.0009
Roberts, Ian (1999). ‘Verb movement and markedness’, in Michel deGraff (ed.), Language
Change: Creolization, Diachrony, and Development. Cambridge, MA: The MIT Press,
287–328.
Roberts, Sarah J. and Joan Bresnan (2008). ‘Retained inflectional morphology in pidgins:
A typological study’, Linguistic Typology 12(2): 269–302. doi:10.1515/LITY.2008.039
Roberts, Seán (2018). ‘Chield: Causal hypotheses in evolutionary linguistics database’, in
Christine Cuskley, Molly Flaherty, Hannah Little, Luke McCrohon, Andrea Ravignani,
and Tessa Verhoef (eds), The Evolution of Language: Proceedings of the 12th
International Conference (EVOLANG12). doi:10.12775/3991-1.099
Robins, R. H. (1958). The Yurok Language: Grammar, Texts, Lexicon. Berkeley, CA:
University of California Press.
Romaine, Suzanne (1988). Pidgin and Creole Languages. London: Longman.
Rottet, Kevin J. (1992). ‘Functional categories and verb movement in Louisiana creole’,
Probus 4: 261–89. doi:10.1515/prbs.1992.4.3.261
Russell, Kevin (1999). ‘What’s with all these long words anyway?’, in Leora Bar-El, Rose-
Marie Dechaine, and Charlotte Reinholtz (eds), Papers from the Workshop on Structure
and Constituency in Native American Languages. Cambridge, MA: The MIT Press,
119–30.
Sadock, Jerrold (2017). ‘The subjectivity of the notion of polysynthesis’, in Michael
Fortescue, Marianne Mithun, and Nicholas Evans (eds), The Oxford Handbook of
Polysynthesis. Oxford: Oxford University Press, 99–114.
Saffran, Jenny R., Richard N. Aslin, and Elissa L. Newport (1996). ‘Statistical learning by 8-
month infants’, Science 274(5294): 1926–8. doi:10.1126/science.274.5294.1926
Sagot, Benoît and Géraldine Walther (2011). ‘Non-canonical inflection: Data, formalisation
and complexity measures’, in Cerstin Mahlow and Michael Piotrowski (eds), Systems and
Frameworks for Computational Morphology. Berlin: Springer, 23–45.
doi:10.1007/978-3-642-23138-4_3
Samara, Anna, Kenny Smith, Helen Brown, and Elizabeth Wonnacott (2017). ‘Acquiring
variation in an artificial language: Children and adults are sensitive to socially
conditioned linguistic variation’, Cognitive Psychology 94: 85–114.
doi:10.1016/j.cogpsych.2017.02.004
Sampson, Geoffrey, David Gil, and Peter Trudgill (eds) (2009). Language Complexity as an
Evolving Variable. Oxford: Oxford University Press.
Sapir, Edward (1921). Language: An Introduction to the Study of Speech. New York:
Harcourt, Brace & Co.
Sapir, J. David (1965). A Grammar of Diola–Fogny, a Language Spoken in the Basse-
Casamance Region of Senegal. Cambridge: Cambridge University Press.
Sapir, J. David (1971). ‘West Atlantic: An inventory of the languages, their noun class
systems and consonant alternation’, in Thomas Sebeok (ed.), Current Trends in
Linguistics, vol. VII: Linguistics in Sub-Saharan Africa. The Hague: Mouton, 44–112.
Sauvageot, Serge (1965). Description synchronique d’un dialecte Wolof. Le parler du Dyolof.
Dakar: Institut Français de l’Afrique Noire.
Sauvageot, Serge (1967). ‘Note sur la classification nominale en baïnouk’, in Gabriel
Manessy (ed.), La classification nominale dans les langues négro-africaines. Paris:
CNRS, 225–36.
Scalise, Sergio (1984). Morfologia lessicale. Padova: CLESP.
Schiering, René, Balthasar Bickel, and Kristine Hildebrandt (2010). ‘The prosodic word is
not universal, but emergent’, Journal of Linguistics 46: 657–710.
doi:10.1017/S0022226710000216
Schlegel, Friedrich von (1808). Über die Sprache und Weisheit der Indier. Ein Beitrag zur
Begründung der Alterthumskunde. Heidelberg: Mohr & Zimmer.
Schreuder, Robert and R. Harald Baayen (1997). ‘How simplex complex words can be’,
Journal of Memory and Language 37: 118–39. doi:10.1006/jmla.1997.2510
Schwegler, Armin (2013). ‘Palenquero structure dataset’, in Susanne Maria Michaelis,
Philippe Maurer, Martin Haspelmath, and Magnus Huber (eds), Atlas of Pidgin and
Creole Language Structures Online. Leipzig: Max Planck Institute for Evolutionary
Anthropology. URL: http://apics-online.info/contributions/48
Segerer, Guillaume (2010). ‘Isolates in Atlantic’. Paper given at the workshop ‘Language
Isolates in Africa’, 4 December, Lyon.
Seifart, Frank (2005). The Structure and Use of Shape-Based Noun Classes in Miraña (North
West Amazon). Universiteit Nijmegen PhD dissertation.
Seifart, Frank (2011). Bora Loans in Resígaro: Massive Morphological and Little Lexical
Borrowing in a Moribund Arawakan Language. Cadernos de Etnolingüística, Série
Monografias 2 [online publisher].
Seifart, Frank and Doris Payne (2007). ‘Nominal classification in the Northwest Amazon:
Issues in areal diffusion and typological characterization’, International Journal of
American Linguistics 73(4): 381–7. doi:10.1086/523770
Seuren, Pieter (1990). ‘Verb syncopation and predicate raising in Mauritian Creole’,
Linguistics 28(4): 809–44. doi:10.1515/ling.1990.28.4.809
Seuren, Pieter (1998). Western Linguistics: An Historical Introduction. Oxford: Blackwell.
Seuren, Pieter and Herman Wekker (1986). ‘Semantic transparency as a factor in creole
genesis’, in Pieter Muysken and Norval Smith (eds), Substrata versus Universals in Creole
Genesis. Amsterdam: John Benjamins, 57–70.
Shalizi, Cosma Rohilla (2001). ‘Causal architecture, complexity and self-organization in the
time series and cellular automata’. University of Wisconsin-Madison PhD dissertation.
Shannon, Claude E. (1948). ‘A mathematical theory of communication’, Bell System
Technical Journal 27(3): 379–423.
Shosted, Ryan (2006). ‘Correlating complexity: A typological approach’, Linguistic Typology
10(1): 1–40. doi:10.1515/LINGTY.2006.001
Silva, Wilson de Lima (2012). A Descriptive Grammar of Desano. University of Utah PhD
dissertation.
Sims, Andrea D. (2015). Inflectional Defectiveness. Cambridge: Cambridge University Press.
Sims, Andrea D. and Jeff Parker (2016). ‘How inflection class systems work: On the
informativity of implicative structure’, Word Structure 9(2): 215–39.
doi:10.3366/word.2016.0094
Sinnemäki, Kaius (2008). ‘Complexity trade-offs in core argument marking’, in Matti
Miestamo, Kaius Sinnemäki, and Fred Karlsson (eds), Language Complexity: Typology,
Contact, Change. Amsterdam: John Benjamins, 67–88.
Sinnemäki, Kaius (2011). Language Universals and Linguistic Complexity: Three Case
Studies in Core Argument Marking. University of Helsinki PhD dissertation.
Sinnemäki, Kaius (2014). ‘Global optimization and complexity trade-offs’, Poznań Studies
in Contemporary Linguistics 50(2): 179–95. doi:10.1515/psicl-2014-0013
Smith, Kenny, Amy Perfors, Olga Fehér, Anna Samara, Kate Swoboda, and Elizabeth
Wonnacott (2017). ‘Language learning, language use and the evolution of linguistic
variation’, Philosophical Transactions of the Royal Society B 372(1711): 20160051.
doi:10.1098/rstb.2016.0051
Smith, Kenny and Elizabeth Wonnacott (2010). ‘Eliminating unpredictable variation
through iterated learning’, Cognition 116(3): 444–9. doi:10.1016/j.cognition.2010.06.004
Soubrier, Aude (2013). Description de l’ikposso uwi. Lyon: Université Lumière Lyon 2
dissertation.
Spencer, Andrew and Ana R. Luís (2012). Clitics: An Introduction. Cambridge: Cambridge
University Press.
Stahlke, Herbert (1970). ‘Serial verbs’, Studies in African Linguistics 1: 60–99.
Štekauer, Pavol (2015). ‘The delimitation of derivation and inflection’, in Peter O. Müller,
Ingeborg Ohnheiser, Susan Olsen, and Franz Rainer (eds), Word-Formation: An
International Handbook of the Languages of Europe, vol. 1. Berlin: De Gruyter
Mouton, 218–35.
Stenzel, Kristine (2008). ‘Evidentials and clause modality in Wanano’, Studies in Language
32(2): 405–45. doi:10.1075/sl.32.2.06ste
Stenzel, Kristine (2013a). A Reference Grammar of Kotiria (Wanano). Lincoln, NE:
University of Nebraska Press.
Stenzel, Kristine (2013b). ‘Contact and innovation in Vaupés possession-marking
strategies’, in Patience Epps and Kristine Stenzel (eds), Cultural and Linguistic Interaction in
the Upper Rio Negro Region. Rio de Janeiro: Museu do Índio-FUNAI, 353–402.
Stenzel, Kristine and Elsa Gomez-Imbert (2009). ‘Contato linguístico e mudança linguística
no noroeste amazônico: O caso do Kotiria (Wanano)’, Revista da ABRALIN 8: 71–100.
Stewart, William Alexander and William W. Gage (1970). Notes on Wolof Grammar by
William A. Stewart. Adapted by William W. Gage, in Dakar Wolof: A Basic Course
prepared by Loren V. Nussbaum, William W. Gage, and Daniel Varre. Washington, DC:
Center for Applied Linguistics, 355–412.
Stilo, Donald (2019). ‘Loss vs. expansion of gender in Tatic languages: Kafteji (Kabatei) and
Kelāsi’, in Alireza Korangy and Behrooz Mahmoodi-Bakhtiari (eds), Essays on Typology
of Iranian Languages. Berlin: De Gruyter Mouton, 34–78. doi:10.1515/9783110604443-004
Stoll, Sabine, Balthasar Bickel, and Jekaterina Mažara (2017). ‘The acquisition of
polysynthetic verb forms in Chintang’, in Michael Fortescue, Marianne Mithun, and Nicholas
Evans (eds), The Oxford Handbook of Polysynthesis. Oxford: Oxford University Press,
495–514.
Stolz, Thomas (2012). ‘Survival in a niche: On gender-copy in Chamorro (and sundry
languages)’, in Martine Vanhove, Thomas Stolz, Aina Urdze, and Hitomi Otsuka (eds),
Morphologies in Contact. Berlin: Akademie-Verlag, 93–140.
Stolz, Thomas (2015). ‘Adjective-noun agreement in language contact’, in Francesco
Gardani, Peter Arkadiev, and Nino Amiridze (eds), Borrowed Morphology. Berlin:
Mouton de Gruyter, 269–301.
Street, Chester (1987). An Introduction to the Language and Culture of the Murrinh-Patha.
Darwin: Summer Institute of Linguistics.
Stump, Gregory (2001). Inflectional Morphology: A Theory of Paradigm Structure.
Cambridge: Cambridge University Press.
Stump, Gregory (2006a). ‘Heteroclisis and paradigm linkage’, Language 82(2): 279–322.
doi:10.1353/lan.2006.0110
Stump, Gregory (2006b). ‘Template morphology’, in Keith Brown (ed.), Encyclopedia of
Language & Linguistics. 2nd ed. Oxford: Elsevier, 559–63.
Stump, Gregory (2016). Inflectional Paradigms: Content and Form at the Syntax-
Morphology Interface. Cambridge: Cambridge University Press.
Stump, Gregory (2017). ‘The nature and dimensions of complexity in morphology’, Annual
Review of Linguistics 3(1): 65–83. doi:10.1146/annurev-linguistics-011415-040752
Stump, Gregory and Raphael A. Finkel (2013). Morphological Typology: From Word to
Paradigm. Cambridge: Cambridge University Press.
Stump, Gregory and Raphael A. Finkel (2015). ‘Contrasting modes of representation for
inflectional systems: Some implications for computing morphological complexity’, in
Matthew Baerman, Dunstan Brown, and Greville G. Corbett (eds), Understanding and
Measuring Morphological Complexity. Oxford: Oxford University Press, 119–40.
Syea, Anand (1992). ‘The short and long forms of verbs in Mauritian Creole: Functionalism
versus formalism’, Theoretical Linguistics 18: 61–97. doi:10.1515/thli.1992.18.1.61
Sylla, Yero (1982). Grammaire moderne du Pulaar. Dakar: Nouvelles éditions africaines.
Szmrecsanyi, Benedikt and Bernd Kortmann (2009). ‘The morphosyntax of varieties of
English worldwide: A quantitative perspective’, Lingua 119(11): 1643–63.
doi:10.1016/j.lingua.2007.09.016
Taft, Marcus (1979). ‘Recognition of affixed words and the word frequency effect’, Memory
& Cognition 7(4): 263–72. doi:10.3758/BF03197599
Taft, Marcus (2004). ‘Morphological decomposition and the reverse base frequency effect’,
The Quarterly Journal of Experimental Psychology 57(4): 745–65.
doi:10.1080/02724980343000477
Taft, Marcus and Sam Ardasinski (2006). ‘Obligatory decomposition in reading prefixed
words’, The Mental Lexicon 1(2): 183–99. doi:10.1075/ml.1.2.02taf
Tallman, Adam (2018). A Grammar of Chácobo, a Southern Pano Language of the Northern
Bolivian Amazon. University of Texas at Austin PhD dissertation.
Tamba, Khady, Harold Torrence, and Malte Zimmermann (2012). ‘Wolof quantifiers’, in
Edward Keenan and Denis Paperno (eds), Handbook of Quantification in Natural
Language. New York: Springer, 891–939.
Thiam, Ndiassé (1987). Les categories nominales en wolof. Aspects sémantiques. Dakar:
Centre de linguistique appliquée de Dakar.
Thomason, Sarah G. (2001). Language Contact: An Introduction. Washington, DC:
Georgetown University Press.
Thomason, Sarah G. (2008). ‘Pidgins/creoles and historical linguistics’, in Silvia
Kouwenberg and John Victor Singler (eds), Handbook of Pidgin and Creole Languages.
Malden, MA: Wiley-Blackwell, 242–62.
Thomason, Sarah G. (2015). ‘When is the diffusion of inflectional morphology not
dispreferred?’, in Francesco Gardani, Peter Arkadiev, and Nino Amiridze (eds), Borrowed
Morphology. Berlin: Mouton de Gruyter, 27–46.
Thomason, Sarah G. and Terence Kaufman (1988). Language Contact, Creolization, and
Genetic Linguistics. Berkeley, CA: University of California Press.
Thomaz, Luis Felípe (2002). Babel Loro Sa’e: O problema linguístico de Timor-Leste. Lisboa:
Instituto Camões.
Thornton, Anna M. (2005). Morfologia. Roma: Carocci.
Thornton, Anna M. (2011). ‘Overabundance (multiple forms realizing the same cell):
A non-canonical phenomenon in Italian verb morphology’, in Martin Maiden, John
C. Smith, Maria Goldbach, and Marc-Olivier Hinzelin (eds), Morphological Autonomy:
Perspectives from Romance Inflectional Morphology. Oxford: Oxford University Press,
359–82.
Thornton, Anna M. (2019). ‘Overabundance: A canonical typology’, in Franz Rainer,
Francesco Gardani, Wolfgang U. Dressler, and Hans Christian Luschützky (eds),
Competition in Inflection and Word-Formation. Cham: Springer, 223–58.
doi:10.1007/978-3-030-02550-2_9
Tily, Harry and T. Florian Jaeger (2011). ‘Complementing quantitative typology with
behavioral approaches: Evidence for typological universals’, Linguistic Typology 15(2):
497–508. doi:10.1515/LITY.2011.033
Timberlake, Alan (2004). A Reference Grammar of Russian. Cambridge: Cambridge
University Press.
Tinits, Peeter (2014). ‘Language stability and morphological complexity in situations of
language contact: An experimental paradigm’, in 19th International Congress of Linguists
Papers. Geneva: Département de Linguistique de l’Université de Genève.
Tomasello, Michael (2000). ‘First steps in a usage-based theory of language acquisition’,
Cognitive Linguistics 11: 61–82. doi:10.1515/cogl.2001.012
Tomasello, Michael (2006). ‘Acquiring linguistic constructions’, in Robert Siegler and
Deanna Kuhn (eds), Handbook of Child Psychology. New York: Wiley, 1860–2010.
Torrence, Harold (2013). The Clause Structure of Wolof: Insights into the Left Periphery.
Amsterdam: John Benjamins.
Tourneux, Henry and Maurice Barbotin (2009). Dictionnaire pratique du créole de
Guadeloupe. Paris: Karthala.
Tribout, Delphine (2012). ‘Verbal stem space and verb to noun conversion in French’,
Word Structure 5: 109–28. doi:10.3366/word.2012.0022
Trudgill, Peter (1983). ‘Language contact and language change: On the rise of the creoloid’,
in Peter Trudgill (ed.), On Dialect: Social and Geographical Perspectives. Oxford:
Blackwell, 102–7.
Trudgill, Peter (1997). ‘Typology and sociolinguistics: Linguistic structure, social structure
and explanatory comparative dialectology’, Folia Linguistica 31(3–4): 349–60.
doi:10.1515/flin.1997.31.3-4.349
Trudgill, Peter (1999). ‘Language contact and the function of linguistic gender’, Poznań
Studies in Contemporary Linguistics 35: 133–52.
Trudgill, Peter (2004a). ‘Linguistic and social typology: The Austronesian migrations and
phoneme inventories’, Linguistic Typology 8(3): 305–20. doi:10.1515/lity.2004.8.3.305
Trudgill, Peter (2004b). ‘The impact of language contact and social structure on linguistic
structure’, in Bernd Kortmann (ed.), Dialectology Meets Typology: Dialect Grammar from
a Cross-Linguistic Perspective. Berlin: Mouton de Gruyter, 435–51.
Trudgill, Peter (2009). ‘Sociolinguistic typology and complexification’, in Geoffrey
Sampson, David Gil, and Peter Trudgill (eds), Language Complexity as an Evolving
Variable. Oxford: Oxford University Press, 98–109.
Trudgill, Peter (2011). Sociolinguistic Typology: Social Determinants of Linguistic
Complexity. Oxford: Oxford University Press.
Trudgill, Peter (2017). ‘The anthropological setting of polysynthesis’, in Michael Fortescue,
Marianne Mithun, and Nicholas Evans (eds), The Oxford Handbook of Polysynthesis.
Oxford: Oxford University Press, 186–202.
Tuite, Kevin (1999). ‘The myth of the Caucasian Sprachbund: The case of ergativity’,
Lingua 108(1): 1–29. doi:10.1016/S0024-3841(98)00037-0
Ullman, Michael T. (2001). ‘The declarative/procedural model of lexicon and grammar’,
Journal of Psycholinguistic Research 30(1): 37–69. doi:10.1023/A:1005204207369
Ullman, Michael T. (2004). ‘Contributions of memory circuits to language: The declarative/
procedural model’, Cognition 92(1–2): 231–70. doi:10.1016/j.cognition.2003.10.008
Valdman, Albert, Iskra Iskrova, and Benjamin Hebblethwaite (2007). Haitian Creole-
English Bilingual Dictionary. Bloomington, IN: Indiana University Creole Institute.
Valenzuela, Pilar (2003). Transitivity in Shipibo-Konibo Grammar: A Typologically
Oriented Study. University of Oregon PhD dissertation.
Valenzuela, Pilar (2010). ‘Applicative constructions in Shipibo-Konibo (Panoan)’,
International Journal of American Linguistics 76: 101–44. doi:10.1086/652756
Vallejos Yopán, Rosa (2010). A Grammar of Kokama-Kokamilla. University of Oregon
PhD dissertation.
van der Voort, Hein (2005). ‘Kwaza in comparative perspective’, International Journal of
American Linguistics 71: 365–412. doi:10.1086/501245
van der Voort, Hein (2016). ‘Recursive inflection and grammaticalized fictive interaction in
the Southwestern Amazon’, in Esther Pascual and Sergeiy Sandler (eds), The
Conversation Frame: Forms and Functions of Fictive Interaction. Amsterdam: John
Benjamins, 277–302.
Van Engelenhoven, Aone (2004). Leti, a Language of Southwest Maluku. Leiden: KITLV
Press.
van Gijn, Rik and Fernando Zúñiga (2014). ‘Word and the Americanist perspective’,
Morphology 24: 135–60. doi:10.5167/uzh-99717
Vanhove, Martine (2001). ‘Contacts de langues et complexification des systèmes: Le cas du
maltais’, Faits de Langues 18: 65–74.
Veenstra, Tonjes (2009). ‘Verb allomorphy and the syntax of phases’, in Enoch Aboh and
Norval Smith (eds), Complex Processes in New Languages. Amsterdam: John Benjamins,
99–114.
Veenstra, Tonjes and Angelika Becker (2003). ‘The survival of inflectional morphology in
French-related creoles’, Studies in Second Language Acquisition 25: 285–306.
doi:10.1017/S0272263103000123
Villoing, Florence and Maxime Deglas (2016). ‘La formation de verbes dénominaux en
guadeloupéen. La part de l’héritage et de l’innovation’, 5ème Congrès Mondial de
Linguistique Française 2016, Tours, France. doi:10.1051/shsconf/20162708004
Wälchli, Bernhard (2017). ‘The incomplete story of feminine gender loss in Northwestern
Latvian dialects’, Baltic Linguistics 8: 143–214.
Wälchli, Bernhard (2018). ‘The rise of gender in Nalca (Mek, Tanah Papua): The drift
towards the canonical gender attractor’, in Sebastian Fedden, Jenny Audring, and
Greville Corbett (eds), Non-Canonical Gender Systems. Oxford: Oxford University
Press, 68–99.
Walsh, Michael (1976). The Murinypata Language of North-West Australia. Australian
National University PhD dissertation.
Walther, Géraldine (2017). ‘Paradigm realisation and the lexicon’, in Ferenc Kiefer, James
P. Blevins, and Huba Bartos (eds), Perspectives on Morphological Organization: Data and
Analyses. Leiden: Brill, 159–99.
Weinreich, Uriel, William Labov, and Marvin Herzog (1968). ‘Empirical foundations for a
theory of language change’, in Winfred Philip Lehmann and Yakov Malkiel (eds),
Directions for Historical Linguistics. Austin, TX: University of Texas Press, 95–198.
Wells, Rulon (1954). ‘Archiving and language typology’, International Journal of American
Linguistics 20(2): 101–7.
Wichmann, Søren and Eric W. Holman (2009). Temporal Stability of Linguistic Typological
Features. München: LINCOM Europa.
Wilson, William André Auquier (1989). ‘Atlantic’, in John Theodore Bendor-Samuel (ed.),
The Niger-Congo Languages: A Classification and Description of Africa’s Largest
Language Family. Lanham, MD: University Press of America, by arrangement with the
Summer Institute of Linguistics (SIL), 81–104.
Wilson, William André Auquier (2007). Guinea Languages of the Atlantic Group. Frankfurt
am Main: Peter Lang.
Wise, Mary Ruth (1971). Identification of Participants in Discourse: A Study of Aspects of
Form and Meaning in Nomatsiguenga. Norman, OK: Summer Institute of Linguistics of
the University of Oklahoma.
Wise, Mary Ruth (1990). ‘Valence-changing affixes in Maipuran Arawakan languages’, in
Doris Payne (ed.), Amazonian Linguistics: Studies in Lowland South American
Languages. Austin, TX: University of Texas Press, 89–116.
Wise, Mary Ruth (2002). ‘Applicative affixes in Peruvian Amazonian languages’, in Mily
Crevels, Simon van de Kerke, Sérgio Meira, and Hein van der Voort (eds), Current
Studies on South American Languages: Selected Papers from the 50th International
Congress of Americanists in Warsaw and the Spinoza Workshop on Amerindian
Languages in Leiden, 2000. Leiden: Research School of Asian, African, and Amerindian
Studies (CNWS), 329–44.
Wittmann, Henri and Robert Fournier (1987). ‘Interpretation diachronique de la
morphologie verbale du créole réunionnais’, Revue québecoise de linguistique 6(2): 137–50.
Woodbury, Anthony (2017). ‘Central Alaskan Yupik (Eskimo-Aleut): A sketch of
morphologically orthodox polysynthesis’, in Michael Fortescue, Marianne Mithun, and
Nicholas Evans (eds), The Oxford Handbook of Polysynthesis. Oxford: Oxford
University Press, 536–60.
Wray, Alison and George W. Grace (2007). ‘The consequences of talking to strangers:
Evolutionary corollaries of socio-cultural influences on linguistic form’, Lingua 117(3):
543–78. doi:10.1016/j.lingua.2005.05.005
Wurzel, Wolfgang U. (1989). Inflectional Morphology and Naturalness. Dordrecht: Kluwer.
Xanthos, Aris, Sabine Laaha, Steven Gillis, Ursula Stephany, Ayhan Aksu-Koç, Anastasia
Christofidou, Natalia Gagarina, Gordana Hrzica, F. N. Ketrez, Marianne Kilani-Schoch,
Katharina Korecky-Kröll, Melita Kovačević, Klaus Laalo, Marijan Palmović, Barbara
Pfeiler, Maria D. Voeikova, and Wolfgang U. Dressler (2011). ‘On the role of
morphological richness in the early development of noun and verb inflection’, First Language
31(4): 461–79. doi:10.1177/0142723711409976
Yarshater, Ehsan (1969). A Grammar of Southern Tati Dialects. The Hague: Mouton.
Zaliznjak, Andrei A. (1967). Russkoe imennoe slovoizmenenie. Moscow: Nauka.
Zaliznjak, Andrei A. (1977). Grammatičeskij slovar’ russkogo jazyka. Moscow: Russkij
jazyk.
Zúñiga, Fernando (2017). ‘On the morphosyntax of indigenous languages of the Americas’,
International Journal of American Linguistics 83(1): 111–39. doi:10.1086/689548
Zwitserlood, Inge (2003). ‘Word formation below and above little x: Evidence from sign
language of the Netherlands’, Nordlyd 31(2): 488–502.

Language Index

Abkhaz-Adyghean languages, see West Caucasian languages
Abun 268
Acoma 190
Aghul 208, 209, 213, 226
Aikanã 239, 241
Ainu 166–7, 190
Akha 274
Albanian 189, 273
Aleut 190
Algic languages 174, 190, 191, 216
Andoke 242
Apurinã 238, 241
Arabela 246
Araucanian languages, see Mapudungun
Arawakan languages 237–9, 241, 243–6, 248, 254
Archi 180, 184, 189, 226
Ashéninka Perené 248, 255, 259
Athabaskan languages 190
Atlantic languages 16, 136–60, 188, 273, 280, 303
Atlantic-Congo languages, see also Niger-Congo languages 196, 197, 214, 218, 223
Austroasiatic languages 206, 226, 278
Austronesian languages 110, 190, 197, 205, 211, 228, 268–9, 280
Avar 170, 171, 180, 182–3, 185, 189
Aymara 191
Aymaran 191

Bagnoun, Baïnounk, Bainuk, Banyun 137, 140, 148
Baïnounk Gubaher 140, 148, 155, 156
Baïnounk Gunyamolo 148, 155
Balto-Slavic languages 176, 177, 198, 208, 213, 224
Bantu languages 113, 114, 169, 171, 173, 196–7, 198, 207, 216, 217–19, 223, 267, 273, 275, 280–1
Bardi 174, 190
Basque 189, 198, 205, 208, 213, 215
  Lekeitio 205, 208, 213, 215–16, 220, 224
  Standard 224
Benue-Congo languages 171, 189
Berber 237
Bilinarra 86–7
Bininj Gun-Wok (BGW) 171, 190
Bislama 86
Bodic languages 198, 228
Bora 236, 239
Boran languages 238, 239
Bulgarian 173, 180, 184, 189
Bunuban languages 190
Buy/Nyun 137

Cariban languages 238
Cavineña 248, 250, 253, 254, 258, 259, 261
Cayuvava 191
Central Alaskan Yup’ik (CAY) 190, 248, 250–2, 254–5, 258, 259, 261, 262
Central Malayo-Polynesian languages 274
Central Pomo 308–16, 322, 326–7
Chácobo 239, 240–1, 248, 250, 254–5, 257–61, 263
Chamorro 197, 198, 205, 211–13, 215, 228
Chayahuita 246
Chimariko 191
Chinese, Mandarin, see Mandarin Chinese
Chinook Jargon 280
Chinookan languages 191
Chintang 13
Chiquihuitlán Mazatec 30
Chukchi 190
Chukchi-Kamchatkan languages 190
Chuvash 190
Common Slavic 28
Cree 190, 216–17, 228
Cubeo 238
Cupeño 180, 184–5, 191
Cushitic languages 189

Dahalo 189
Diola-Fogny 140, 155
Diyari 190, 191
Djingulu 190
Dogon languages 189

Eastern Pomo 190
Eipo 198, 228
Elfdalian, see Swedish, Elfdalian
English 13, 18, 26, 56, 81, 84, 85, 87, 108, 110, 125, 163, 166, 170, 171, 196, 208, 213, 225, 267, 271, 274, 276, 277, 279–80, 303, 310–11, 316, 320, 326, 332, 336
  African-American Vernacular 279
  Middle 85
  Old 52, 74, 271, 274
Eshtehardi 220, 227
Eskimo-Aleut languages 190, 248, 254
Even 190
Evenki 190

Finnish 170, 171, 189, 213, 227
Fongbe 273, 277
French 16, 25–26, 33, 74, 105–6, 110–17, 119–20, 122–4, 127–8, 130–5, 160, 216–17, 228, 270, 272, 276, 279
  Cajun 111
  Medieval 134
  Norman 85
French-based creoles 16, 105–6, 110, 113–14, 116–18, 120
Fula 137, 140, 144, 145–50, 152, 153, 159, 188
  Fuuta-Jaloo Pular 140, 148
  Gombe 145–6, 153
Fur 189

Gbe languages 213, 268, 271, 279
German 14, 171–2, 173, 184, 189, 199
Germanic languages 85, 170, 189, 198, 273
  North 200, 203, 208, 213, 227
  West 213
Ghana-Togo-Mountain languages 198, 213–14, 223–4
Godoberi 189
Gooniyandi, see also Kuniyanti 90
Greek 23, 30, 32, 62, 189, 198, 208, 210, 213, 225, 338
  Asia Minor dialects 210, 215, 225
  Cappadocian 202–3, 208, 210–13, 215, 225
  Pontic 202, 204, 225
  Rumeic 225
  Standard Modern 202, 210–11, 225
Guadeloupean Creole 16, 106, 116–18, 120, 124–32, 134–5
Gullah Creole English 279
Gunwingguan (Gunwinyguan) languages 171, 190, 198, 224
  Central 224
Gurindji 12, 82–3, 87–8, 102–3
Gurindji Kriol 12, 16, 81–3, 86–103, 343

Haitian Creole 16, 106, 113, 114, 117–18, 120, 131–5, 270, 272, 279–80
Haitian Creole English 279
Haro 189
Hinuq 180, 182–3, 189
Hopi 180, 184–5, 191
Huallaga Quechua 191
Hungarian 84, 189
Hunzib 180, 183, 189
Hup 232, 238, 240, 242–4, 246, 248, 250, 254–5, 258–9, 260–1, 263
Hupa 190

Icari 189
Icelandic 33, 276
Igo 213–15, 223
Ikposo 223
Indo-European languages 2, 106, 169, 171, 174, 178, 182, 186, 189, 193, 196, 200, 202, 203, 207, 210, 216, 224–5, 227, 230, 273, 276, 278
Indo-Portuguese creoles 113
Ingush 171, 177, 189
Insular Celtic languages 198, 213, 225
Inuit 13
Iranian languages 198, 227, 272
  Northwestern 201, 208, 220, 227
  Southwestern 220
Irish 208, 213, 215, 225–6, 276
  Ros Much 226
Iroquoian languages 190
  Northern 316
Italian 84, 147, 196, 275, 276, 284
Itelmen 190
Iwaidjan languages 168, 190

Jaminjung 87
Jamsay 189
Jamul Tiipay 191
Jangshung 205, 208, 209, 228
Jaqaru 191
Jarawara 243, 248, 250, 252, 254, 255, 258, 259–62
Jaru 87
Juu languages 189

Kabardian 189
Kafteji 198, 201–2, 208, 220–1, 227
Kakua 239, 244
Kamayurá 244
Kanoe 239, 246
Karata 189
Karo 242, 246
Karok 191
Karrangpurru 87
Kartvelian languages 173, 189, 213, 273, 275
Kashibo-Kakataibo 191
Kelasi 198, 201–2, 208, 220–1, 227
Keresan languages 190
Ket 190, 191
Khanty 189, 190
Khasi 206, 226
Khasian languages 198, 226
Khinalug 177, 189
Kikongo 280
Kiowa 190
Kiowa-Tanoan languages 190
Klamath 191
Klamath-Sahaptian languages 191
Koasati 191
Koiari 190
Kokama-Kokamilla 241, 248, 254, 255, 258–9, 261
Kotiria 248, 250, 254, 255, 258–9, 261
Kriol 12, 82, 83, 87–8, 102, 103
Kundjeyhmi 224
Kune 224
Kuniyanti, see also Gooniyandi 190
Kunwinjku 224
Kuuk Thaayorre 90
Kwa languages 213–14
Kwaza 167–8, 191, 239, 246

Lak 180, 183, 189
Lakhota 190
Lango 188, 191
Latin 56, 74, 109, 142, 169, 171
Latvian 212, 224
  Tamian 203, 208, 212, 213
Leti 274
Lezgi 177, 180, 184, 189, 209
Lezgic (Lezgian) languages 180, 183, 198, 208, 213, 226
Light Warlpiri 89
Lingala 218
  Kinshasa 218–19, 223
  Makanza 207, 216, 218–19, 222, 223
Lithuanian 2–4, 6, 10, 189, 284
Lower Sepik languages 190
Luganda 169, 171, 189
Lyngngam 206, 226

Madang languages 190
Maidu 191
Malngin 87
Manchu 190, 191
Mandarin Chinese 168, 169, 175, 190, 191, 267, 270, 276, 277–8, 341
Mande languages 148, 269
Mandinka 137
Mapudungun 191, 232
Mari 189
Marri Ngarr 53
Marri Tjevin 53
Matses 242–3
Mauritian Creole 16, 106, 110, 112–14, 116–18, 120–5, 128, 131, 134–5
Mawng 168, 190
Mayan languages 191
Mazatec 30, 62
Mek languages 198, 228
Mian 167–8, 173, 187, 190
Michif 197, 198, 207, 216–17, 228
Mindi languages 190
Miwokan languages 191
Mohawk 316–20, 322–5, 326–7
Mongolian 190
Mongolic languages 190
Mordvin 189
Movima 191, 236–7, 248, 250, 254–5, 258, 259, 261, 262
Mudburra 87
Murrinhpatha 15, 52–80, 84
Muskogean languages 191

Nakh-Daghestanian languages 169, 170, 171, 173–4, 176, 177, 181–3, 189, 209, 226
Nalca 198, 228
Nama 171, 189
Nambikwara 239, 242
Nanai 190
Nanti 243–4
Nez Perce 191
Ngaliwurru 87
Nganasan 190
Ngarinyman 86, 87
Niger-Congo languages 110, 136–8, 140–2, 143, 148, 155, 193, 267–9, 273, 303
Niger-Kordofanian languages, see Niger-Congo languages
Nilotic languages 188
Nivkh 142, 190
Nomatsigenga 244–5
Nubi Creole Arabic 280
Nupe 268, 271
Nuuchahnulth 191
Ñuun, see also Bagnoun 137, 140, 144, 147
Nyulnyulan languages 174, 190

Ok languages 167, 173, 190
Omotic languages 189
Ossetic 173, 189

Paez 191
Paiwan 190
Palenquero 110, 280
Pama-Nyungan languages 15, 87, 190
Panoan languages 191, 239–41, 242–4, 248, 254, 257, 259, 260
Paresi 241, 245–6, 248, 250, 254, 255, 258, 259, 261
Pazar Laz 173, 189
Pilagá 239
Pipil 180, 184–5, 191
Pnar 206, 226
Pomoan languages 190, 308, 310
Portuguese 113, 241, 280
Pular, see Fula

Quechuan languages 191

Romance languages 196, 208, 213, 216, 273, 275, 276
Romanian 173, 189
Rongga 268, 270, 273, 274, 276
Russian 15, 23, 25, 27–32, 34–51, 169, 171, 172, 178, 180, 184, 189, 191, 270, 291

Saami 62
  Kildin 189
  Skolt 175, 178, 191
Salish languages 190
Seereer, see Seereer-Siin
Seereer-Siin 137, 144–5, 149, 150, 159
Seneca 190
Seri 62, 66
Shipibo-Konibo 239–40, 242, 244
Shumcho 198, 205, 208, 209, 213, 228
Siin-Gandum, see also Seereer-Siin 144
Sinitic languages 110, 268–9, 278
Sino-Tibetan languages 190, 274
Siouan languages 190
Slovene 173, 178, 180, 184, 189, 191
Somali 189
Sorbian 178, 184, 189
  Lower 180, 191
Southern Sierra Miwok 191
Spanish 211–13, 215, 220, 225, 228, 280, 310
Sranan Creole English 279–80
Svan 189, 275
Swahili 142, 273–4, 275
Swedish 203, 204
  Elfdalian 201, 227
  Karleby 203, 208, 213, 215, 227
  Standard 200–1, 203–4, 227
Sεlεε 213, 223

Tamambo 86
Tariana 237, 238, 241, 244, 246, 248, 250, 253–5, 258–9, 261
Tatuyo 236
Tawala 190
Thompson 190
Tibeto-Burman languages 228
Tindi 189
Tok Pisin 6
Trans New Guinea languages 228
Tsakhur 180, 183–4, 189
Tukanoan languages 236, 238–9, 241, 242, 248
Tümpisa Shoshone 180, 184–5, 191
Tundra Nenets 190
Tungusic languages 176, 177, 190
Turkic languages 190, 208, 209, 213, 220
Turkish 2–3, 7, 10, 141, 142, 147, 210–13, 225, 284, 342
Tzutujil 191

Udehe 190
Udi 180, 184, 189, 208, 209, 213, 226
Uralic languages 176–8, 189–9, 275, 278
Urarina 248, 250, 252, 254–5, 258–9, 261
Usan 190
Uto-Aztecan languages 176, 177, 180, 184–5, 191

Wakashan languages 191
Wappo 191
Wari’ 241
Warlpiri 54–6, 57, 87
West Caucasian languages 189
Wichí 239
Wishram 191
Witotoan languages 238
Wolof 16, 136–41, 143–4, 148–60, 270, 273, 303
  Mbakke 136, 143

Xamatauteri Yanomami 244

Yagua 167, 238–9, 246
Yakut 190
Yanesha’ 241
Yeniseian languages 190, 278
Yimas 190
Yokuts 191
Yoruba 6, 268, 270, 276, 279
Yuhup 238
Yukagir 190
Yuki-Wappo languages 191
Yuman languages 191
Yurok 174, 191

Zuni 190

Subject Index

abstractive (models, frameworks, perspectives) 326–7
acquisition 12, 13–14, 17, 53, 57, 75, 288, 303, 311, 326–7
  first language, L1 13, 61, 323–5
  native, see acquisition, first language, L1
  non-native, see acquisition, second language, L2, adult
  second language, L2, adult 17, 111–12, 114, 267–82, 286, 326
actualization 92, 96, 101
adult acquisition, see acquisition, second language, L2, adult
agent 90
agglutinative, agglutinating morphology 3, 137, 141, 141–2, 143, 144–8, 158, 234, 255
agreement 173–4, 193–228, 236, 287, 288, 291–8, 303
  default 139
  redistribution of 200–4
  subject-verb 284
agreement targets 151–8
algorithmic information content 331
alignment 88
allomorphy 3, 7, 8, 9, 54–6, 57, 58, 59, 61–6, 68–70, 72, 75, 89, 110, 148, 149, 170, 172–3, 188, 230, 234, 247, 251, 252–3, 255, 261, 317, 326, 327
Amazonian languages 17, 167, 230–63
analogy 16, 26, 27, 52–4, 57, 61, 67, 70, 71–4, 75, 326
analyticity 17, 110, 267–82
Andean languages 231, 246
animacy 38, 39, 85, 90, 91, 92, 95, 96, 172, 174, 197, 199, 201–4, 205, 213, 214, 217, 218, 219, 238
argument relations 88, 90, 93, 102, 103
autonomous (or pure) morphology 6–7, 18, 24, 119, 147, 230–1, 235, 247–51, 255, 256–62
auxiliary 101
average conditional entropy, see entropy

bias amplification 304
bilingualism 193, 210, 211, 214, 215, 220, 222, 307, 308, 311
biuniqueness 9, 54, 164, 230, 234, 247, 253, 254, 262, 341–2
borrowing 12, 16, 127, 160, 194, 205, 209, 212, 215, 222, 233, 238–9, 246, 273
bound status 235, 248, 256, 257–8, 262, 263

canonical typology 108, 340–1
canonicality 163–92
canonicity 9, 10, 16, 24, 163–4, 236, 238, 340–1
case 2–3, 82–3, 87–90, 163, 166, 171–2, 174, 175, 184, 246, 272–3, 274, 286, 343
Caucasus 171, 176, 177, 178, 180, 181, 182, 183, 184
Chaco region 238, 239
Circum-Baltic area 176
class prefixation 151
classifier stem 53, 59–75
classifiers 61–3, 167–8, 169, 236–9
  numeral 270, 277
closed classes 52, 53, 59, 61, 66, 68, 71, 75
co-exponence 71, 171, 184
complexification 16, 82, 83, 85, 88, 89, 103, 109, 111, 136–60, 183, 194, 285
complexity:
  absolute (absolutive) 8, 24, 31, 106, 136, 195, 306, 337
  agent-related 306, 337
  canonical 163–92, 334, 340–2
  compositional 335–7
  constitutional 8–9, 141, 335
  corpus 306
  descriptive 9, 14, 151, 163–4, 195, 204, 217, 332, 335, 339, 340
  effective 6, 306
  enumerative (E-complexity) 8–9, 11, 24, 32, 56, 82, 85, 89, 102, 103, 106, 112, 163, 175, 233, 334, 335, 336–7
  exponence 233, 234, 247, 251–5, 335
  formal 8, 13–14
  generative 9, 151
  integrative (I-complexity) 11, 12–13, 16, 24–5, 27, 32, 56, 57, 59, 62, 65–6, 71, 75, 82, 85, 89, 103, 106–7, 108, 112–13, 122, 135, 233, 334, 335, 337–40, 343
  inventory (IC) 163, 334–6
  Kolmogorov 9, 163, 172, 185, 306, 331, 341
  modes of 335
  objective 8, 306, 326, 337
  paradigmatic 9, 84, 196
  relative 8, 84, 136, 164, 180, 306
  structural 159, 269, 306, 342
  syntagmatic 3, 196
  system 13, 17, 23, 27, 41–6, 46–8, 233, 234, 235–47, 248, 262, 306, 335, 342
  taxonomic(al) 9, 163, 335
compounding 14, 232, 233, 235, 238, 262, 320
conditional entropy, see entropy
conditioned variation 304
consonant mutation 62, 144, 145, 150–1, 173
constructive (models, frameworks, perspectives) 326, 335
contact-induced change 12, 81–103, 194, 205, 209–10, 211, 213, 244, 246, 286
contiguity 235, 248, 256, 258–9, 262
continuative aspect 101
conversion 120, 123, 124, 129, 130, 132, 133, 134
co-referential pronoun 90, 92, 95, 96, 99, 102, 103, 343
corepresentability 332
cost 8, 12, 13, 14, 24, 136, 185, 195, 337
creoles 2, 12, 16, 87, 105–6, 109–13, 113–14, 116–18, 135, 267, 271, 272, 277, 278–80
crosslinguistic tendency 33, 57
culminativity 261

declension entropy, see entropy
default agreement, see agreement
defectiveness 9, 30, 38, 42, 47, 48, 50, 157, 158, 234
definiteness 85
demography 199, 209, 216, 221
demorphologization 16, 52–4, 70–1, 74–5
dependent marking 166
Depth-of-Inference Contrast 32
derivation 7, 11, 13, 14, 107, 118–20, 131, 132, 134, 318–19, 335
deterministic input 304
difficulty, see cost
dominance 212, 213
dominance analysis 83, 86, 90, 91, 93, 95, 96, 97, 100
drift 270–2, 281
dual-route model 13

entropy 11, 27, 40–9, 55, 56–9, 65–6, 81, 84, 296–8, 338–40
  conditional 26, 32, 33, 40–1, 43–6, 47, 49, 57, 58, 66, 71, 338–40
  declension(al) 33, 338
equicomplexity hypothesis 2
ergative 83, 88, 89, 90, 93, 95, 102, 103
evidentiality 231, 232, 234, 241–4, 250, 254, 262
expansion (of gender marking) 200, 205–7, 216–19, 220, 222
exponence:
  cumulative 3, 8, 171
  multiple 174, 234, 247, 251, 253
  partial 173–4

frequency 13, 28, 67, 110, 114, 116, 294–5, 303, 307
  token 27, 36
  type 27, 33, 34, 42–3, 44, 46, 47

gender 166, 167, 169–74, 176–7, 193–228, 237, 238, 272
gender marking:
  emergence of 198, 200, 205–7, 209–16, 221–2, 238, 278
  erosion of 200, 203, 213, 220
  loss of 200–7, 209–16, 222
  reduction of 200–5, 208, 212, 215, 221, 222
generalized linear mixed models (GLMM) 16, 82, 83, 85–6, 91–6, 99
grammatical gender, see gender
grammaticalization 12, 110, 206, 231–5, 236, 237–8, 241–7, 262–3, 275–6, 277, 307, 343
greater vs. unmarked plural 155

idiolect 82, 85, 216
imperfect learning 194, 283–305
implicative structure 25, 30, 31–3, 41, 43–6, 49, 50
inanimate, see animacy
incorporation 59, 232, 233, 235, 238, 244, 245, 246, 250, 260, 262, 320–2
inflecting-fusional 137, 141, 142, 144, 147, 151, 158
inflection:
  contextual 110, 272–3
  inherent 110, 270, 272–3
inflection class 23–51, 54–6, 62, 107, 147, 168–9, 186, 333, 336
inflectional categories 60, 165–6, 175, 270
information-theoretic approach 8, 11, 24, 26, 27, 32, 40, 337
information theory 43, 107, 343
intergenerational change 67, 68, 89, 99, 102, 213–14, 215, 267, 287, 290, 293, 295, 297
interrupted transmission 12, 290, 291–2, 294, 295, 297, 298, 305
intersecting formative 26–7, 54, 61–6, 68, 70, 75
intransitive subjects 83, 88–90, 102, 103
irregular/irregularity 8, 13, 23–51, 84, 119, 125, 136, 137, 141, 144, 148, 149, 150, 151, 151–8, 270, 284–5, 293–302, 303–5, 332
isolating languages 2, 83, 105, 137, 158, 269, 341
iterated learning 285–6, 304

Kolmogorov complexity, see complexity

language attitude 159, 221
language contact 2, 12, 14, 15, 19, 50, 53, 81–103, 109, 182, 183, 193–5, 205, 209–10, 211, 213, 233, 235, 238, 241, 244, 246, 262, 263, 267, 269, 271, 280, 282, 286, 306, 308–9, 310–12, 343
language ecology 209–21
language evolution 222, 285
language genesis 12, 102, 114, 279–80
learnability 50, 163, 298–302, 305
lexeme-based morphology 118
lexical storage 27–9, 303
lexicalization 204, 315, 316, 317, 322
lexicon qua mental lexicon 13, 28, 29, 326, 333–4, 335
lexifier 87, 105, 109, 111–12, 114, 116, 123, 130, 132, 135
linguistic areas 213, 222, 308–9
linguistic correctness 160
Low (Conditional) Entropy Conjecture 11, 25, 32, 33, 45, 49, 71

Marginal Detraction Hypothesis 33, 34
memorization 53, 75, 303
minimum description length 9, 26, 306, 331–2, 334, 337, 340, 343
morpheme-to-word ratio 3
morphological decomposition 303
morphological richness 10, 136, 141–2, 336
morphome 11, 31, 119, 122, 247
morphophonological erosion 193, 200–3
multilingualism 12, 53, 213, 307

Natural Morphology 10, 12–13
naturalness, see Natural Morphology
Network Morphology 28, 62
neural networks 14
nominal classification 193, 231, 234, 235–9, 250, 262
North Pacific Rim 176–7
noun class 34, 49, 136, 138–40, 144–8, 150, 160, 173, 218, 219, 236, 270, 273, 303
noun incorporation, see incorporation
number 166, 167, 173–4
numeral classifier, see classifiers

obsolescence 84, 311–16
opacity 8, 10, 11, 75, 113, 174–5
overabundance 9, 24, 81–6, 88–90, 99, 102, 103, 144, 342–3
overspecification 284–5, 287, 288, 290, 291–3, 296–305, 335, 342

Pāṇini’s Principle 332
Paradigm Cell Filling Problem 55, 59, 61
paradigm organization 333
Paradigm Structure Conditions 11
paradigmatic layers 25, 29–31, 34, 39, 41–6, 48, 50, 54, 62
passive 314–16
pattern competition 343
pattern regulation 343
periphrastic construction 84
person 166, 167
pidgin 2, 12, 105, 109–11, 267, 272
pidginization 218, 219, 279, 281
portmanteau 8, 60, 167, 171, 242
possessive 85, 89
predictability 1, 11, 14, 26, 33, 39–40, 45, 47, 52–3, 55, 56–9, 65, 68–70, 71, 84, 85, 106, 107, 120, 123, 131, 135, 169, 171, 338
prestige 159, 160, 195, 199, 209, 212, 213, 222
priming 91, 92, 93, 96, 102, 103, 343
principal parts 32, 33, 333, 336
probabilistic input 304
Probabilistic Syntax 85
probability matching 294–6
processing 1, 12–14, 26, 53, 56, 61, 75, 106, 195, 204, 206, 322, 326
processing cost, see cost
productivity 10, 23, 28, 53, 60, 111, 114–15, 128, 130, 132, 134, 135, 141, 194, 201, 203, 205–6, 213, 216, 218, 232, 235, 245, 251, 253, 262, 286, 304, 320, 327, 332, 336
prosodic dependence 235, 248, 256, 259–61, 262
psycholinguistic approach 11, 13, 14

qualitative approach 8, 9–10
quantitative approach 8, 9

redundancy 8, 14, 141, 287, 288, 293, 303, 305
reduplication 122, 312
regression analysis 47, 85, 93, 95
regular/regularity 9, 13, 14, 23, 25–8, 34, 46–8, 81, 84, 144, 235, 285, 297, 302, 304, 305, 331–2
regulations 335–6
resources 163, 335–6
routinization 307, 308, 327

set-theory 26, 32
simplification 12, 59, 61, 67, 68–70, 71, 72, 83, 84, 88, 89, 99, 103, 109, 110, 112, 141, 144, 159–60, 194, 203, 267, 270, 279, 285–7, 288, 293, 305
sociocultural context 307, 308
socioecological parameters 19, 283
sociolinguistic isolation 180, 186
sociolinguistic typology 12, 53
sociolinguistics 85, 160, 180–5, 186
stem alternation 24, 25, 46, 114, 142, 148
stem class 38, 170, 186
stem flexivity 168–9, 170
stress:
  inflectional 23, 29, 30
  syllable 36, 276, 323
suffixation 124, 128, 129, 130, 132, 133, 149
suppletion 6, 8, 9, 26, 28, 33, 60, 65, 74, 117, 125, 142, 157, 234, 251, 252–3, 270, 307, 332, 338, 340–1
syncretism 6, 9, 29, 56, 81, 84, 89, 111, 113, 115, 116, 117, 120, 124, 125, 126–7, 128, 172, 173, 174, 194, 234, 250, 307, 341
synthesis 13, 106, 306
synthesis index 2, 231

templatic morphology 10, 232, 307, 317, 322
tense 234, 235, 239–41, 243–4, 250, 252, 260, 262
topicality 85, 86
transatlantic slave trade 279
transitive subjects 89, 90, 93, 95, 96, 99, 102
transmission fidelity 298
transparency 9, 10, 113, 114, 163–4, 175, 186, 340–2

U-curve 302
unpredictability 39, 52–4, 55–8, 60, 65, 70–1, 75, 168, 169, 170, 171, 341

valence-adjusting 234, 244–6, 250, 262
Vaupés region 233, 238, 242, 244

word formation 6, 7–8, 11, 197, 215, 320
word recognition 6, 14
word-and-paradigm framework 11
wordhood 172, 173, 234–5, 248, 250, 255, 256–61, 262
