Phonetics and Phonology in Multilingual Language Development
Ulrike Gut
University of Münster
Romana Kopečková
University of Münster
Christina Nelson
University of Münster
www.cambridge.org
Information on this title: www.cambridge.org/9781108994743
DOI: 10.1017/9781108992527
© Ulrike Gut, Romana Kopečková and Christina Nelson 2023
This publication is in copyright. Subject to statutory exception and to the provisions
of relevant collective licensing agreements, no reproduction of any part may take
place without the written permission of Cambridge University Press & Assessment.
First published 2023
A catalogue record for this publication is available from the British Library.
ISBN 978-1-108-99474-3 Paperback
ISSN 2634-1689 (online)
ISSN 2634-1670 (print)
Elements in Phonetics
First published online: July 2023
Author for correspondence: Ulrike Gut, [email protected]
1 Introduction
5 Results
6 Discussion
7 Conclusion
Appendix
References
1 Introduction
Foreign language instructors and language students are commonly fascinated
by the differences between the speech of native and non-native speakers of
a language. They may have their intuitions, possibly stemming from their own
experiences with teaching and learning foreign languages, about the challenges
involved in perceiving and producing the sounds of a new language. The
scientific study of the development of perception and production of sounds in
a foreign language has a seventy-year-long tradition, and numerous theoretical
models as well as empirical research have yielded important insights into
various factors that can influence the learning process. In the first decades of
research into phonological acquisition of foreign languages, the language that
was learned was usually referred to as a second language (L2), which was used
as an umbrella term for all foreign languages no matter how many and when
they were learned. With the important insight that not only the native language
(L1) but also the other languages a speaker knows can influence the phono-
logical acquisition of a target language, a terminological distinction has been
proposed between a learner’s first foreign language (L2) and their further
foreign languages (L3). In this Element, we will refer to the chronologically
first learned foreign language as the L2 and to languages that are learned
chronologically later as the L3. L3 learners will also be referred to as multilin-
gual learners.
The observation that a learner’s multiple languages interact has been widely
documented in both speech perception and production. This phenomenon,
referred to as cross-linguistic influence (CLI), has been shown in the acquisition
of target sounds (e.g., Amengual et al., 2019; Cabrelli & Pichan, 2019; Gabriel
et al., 2016; Kopečková, 2014, 2016; Kopečková et al., 2016; Llama et al.,
2010; Llama & Cardoso, 2018; Llama & López-Morelos, 2016; Morales Reyes
et al., 2017; Patience, 2018; Wrembel, 2014, 2015) and prosody (e.g., Cabrelli
& Rothman, 2010; Gut, 2010) as well as phonological processes (e.g., Cebrian,
2000; Sato, 1984). Related findings have also revealed that such an influence is
multidirectional in nature, i.e., that it concerns not only the influence of an
earlier acquired language (whether L1 or L2) on the later acquired language but
also vice versa (e.g., Aoki & Nishihara, 2013; Beckmann, 2012; Cabrelli, 2016;
Cabrelli & Rothman, 2010; Sypiańska, 2016, 2017). Research into L3 phonological
learning has further drawn attention to the possibility that more than
one language can act as the source of this influence at the same time, i.e.,
that CLI can be combined (see De Angelis, 2007; Wrembel, 2015). This possibility
critically distinguishes the learning constellation of L3 learners from that
of L2 learners. Multiple language knowledge
the phonetic and the phonological level, the latter model aspires to address both
the perception and the production development in an L2 at the phonetic level.
Both models were developed to predict the influence of an L1 on an L2 and
have so far not been extended to include potential influences of any additional
languages on the learner’s perceptual mappings of the target sounds.
Best’s Perceptual Assimilation Model (PAM: Best, 1995) and its adaptation
to L2 speech learning (PAM-L2: Best & Tyler, 2007; Tyler, 2019) propose that
the learner’s ability to perceive L2 sound contrasts depends on the degree to
which L2 sounds are assimilated to the exemplars of L1 sound categories. It
follows a direct realist or ecological approach to speech perception that argues
that listeners detect speech information in terms of articulatory gestures such as
the manner and place of articulation rather than on the basis of any mental
representation of sounds (Best, 1995; Fowler, 1986). In other words, the L2
learner engages in perceptually assimilating the articulatory gestures of target
sounds to the articulatory gestures of their L1 phonemes. A non-native sound
can be heard as a good or a poor example of an L1 phoneme (categorised), as
different from any particular L1 phoneme (uncategorised) or as a non-speech
sound (non-assimilated). Accordingly, different pairwise assimilation types
associated with different levels of L2 discrimination difficulty have been
proposed in the model (for a description of all assimilation types and their
predictions, see Tyler, 2021).
For example, both sounds of an L2 contrast can be judged as members of
a single L1 sound category. When both members of the non-native contrast are
phonetically and phonologically perceived as good or poor members of a single
L1 category (single-category assimilation), then the discrimination in the L2
will be very poor. An example of such a scenario for German learners of Polish
might be Polish /w/ and /v/ (as in łata /wata/ ‘patch’ and wata /vata/ ‘cotton
wool’), both of which may be perceived as members of the German category /v/
and therefore poorly discriminated at the beginning of L2 learning. However, if
one sound of the contrast is perceived as a much poorer member of the L1
category than the other (category-goodness assimilation), then discrimination
between these two non-native sounds may range from moderate to very good,
depending on how dissimilar the two non-native sounds are from the L1 sound
category (Shinohara et al., 2019). The PAM-L2 predicts that, with increasing L2
experience, new L2 categories can be formed as a result of the learner’s growing
L2 lexicon when noticing that there are minimally contrasting words in the
target language that are communicatively relevant. As will be explained in
Section 4, an interesting learning constellation arises in this connection for
German learners of Polish as an L3 and English as an L2. For these experienced
language learners, the /v-w/ contrast is shared in both of their foreign languages;
learned to read the L2 and, in the absence of spoken input, applied their L1
phonology to a considerable amount of L2 words via orthography (Tyler, 2019,
p. 617). The presence of previous language exposure can therefore hamper the
prediction of category acquisition. In any case, it is conceivable that L2 phon-
emes included in the single-category assimilation type and the category-
goodness assimilation type will be even more difficult to acquire in the
classroom setting, whereas the acquisition of the uncategorised L2 phonemes
in turn may be supported in such a learning environment, especially where L2
orthography is present and provides an unambiguous distinction.
As stated, neither the PAM nor the PAM-L2 makes specific predictions
regarding the development of speech perception by L3 learners, i.e., those
with prior exposure to additional foreign languages. In one of the
rare studies that extended the predictions of the PAM-L2 to the context of
multilingual phonological acquisition, Wrembel et al. (2019) found that multi-
linguals assimilate L3 vowel sounds to both L1 and L2 categories with
a preference for the latter, at least in their population of L1 German, L2
English and L3 Polish learners. The study also showed that even beginner L3
learners formed new L3 categories, distinguishing between highly similar L3
sibilant pairs that would typically follow the single-category assimilation pat-
tern. In other words, in terms of perceptual acquisition, beginner L3 learners
were argued to behave similarly to advanced L2 learners in that they were able
to discriminate sound contrasts predicted by the PAM-L2 to be challenging.
Unlike the PAM/PAM-L2, which assumes that perceivers extract speech
information at a gestural/articulatory level, Flege’s Speech Learning Model
(SLM: Flege, 1995, 2002, 2003) and its revised version (SLM-r: Flege &
Bohn, 2021) assume that perceivers form a long-term mental representation
for target sounds from acoustic-phonetic cues. According to the SLM/SLM-r,
this is a learning mechanism that is involved in L1 speech learning and that
remains intact throughout life. Yet monolingual-like performance is not
expected on the part of L2 learners as their L1 and L2 categories are stored
in the common phonological space, inevitably influencing each other (Flege,
1995, 2002; referred to as ‘common phonetic space’ in Flege & Bohn, 2021).
This regularly leads to category assimilation in which L1 and L2 sounds of
the learner are perceptually equated; a target language category is blocked
and a composite or merged L1–L2 category is then used to perceive and
produce both the L2 and the L1 sounds, leading to bidirectional cross-
linguistic influence. Consequently, the model predicts that learners who
have a merged L1–L2 category will not display monolingual-like perform-
ance as they are likely to produce sounds differently from monolinguals of
either language.
One of the main hypotheses of the SLM/SLM-r is that the greater the
perceived dissimilarity between L1 and L2 sounds, the greater the likelihood
that learners establish target-like categories and consequently perceive and
produce the target sounds more accurately. However, similar L2 sounds may
remain difficult to learn, as L2 learners might not be able to discern the subtle
phonetic differences between the L2 and L1 sounds and thus they will readily
map the target sounds onto L1 categories, perceiving and producing such L2
sounds in terms of those L1 categories. For example, German learners of Polish
are likely to easily detect the phonetic differences between the rhotic sounds of
their L1 and L2, the uvular fricative /ʁ/ and alveolar trill /r/, respectively.
Although the Polish rhotic may be initially difficult to produce, the likelihood
of L2 category formation for such a novel sound is predicted in the model to be
high. In contrast, the subtle phonetic differences and complex acoustic cues
applied in distinguishing between word-final voiced and voiceless obstruents
may be a particular learning challenge for German learners of English because
they can be expected to apply their voiceless L1 category (in this word-final
position) in perceiving and producing the target obstruents. It is to be noted in
this regard that the position of the target language categories within a word is
highly relevant, i.e., whether an individual sound occurs in the initial, medial or
final position. According to the model, it is at the level of position-sensitive
allophones that L1 and L2 categories are compared and refined with experience.
Similarly to the PAM-L2, the SLM/SLM-r assumes a continuous, slow
refinement of L2 learners’ perception of phonetic differences between their
L1 and L2 as a function of the quality and quantity of phonetic input obtained in
meaningful communication. For instance, Flege and Liu (2001) showed that, for
two groups of L2 learners who resided in the target language country for the
same amount of time (i.e., as measured by length of residence), it was only the
students and not the age-matched learners in other occupations who showed
improvement towards native-like performance. The authors argued that the
student group was exposed to a substantially greater amount of native-like L2
input, which enhanced their identification of word-final consonants.
In the original SLM, the ability to form new phonetic categories for an L2
sound was further predicted to be mediated by the age (of onset) of L2 speech
learning. Adult (late) L2 learners were predicted in the model to readily
subsume L2 sounds into L1 categories thanks to the assimilative power of
their well-established L1 sound categories, thus failing to form new categories
for some L2 sounds. Child (early) learners, in contrast, were predicted to be less
influenced by their evolving L1 sound system, and thereby more likely to form
separate categories for L2 sounds and thus to learn the target language sounds to
native-like levels. Considering the generality of this hypothesis and mainly the
universals (i.e., ‘if A then B’, but not the other way around), how frequently the
structure occurs across languages and the order in which children acquire the
structure in their L1.
The OPM proposes an initial dominant influence of the L1 in L2 speech
learning, the emergent influence of universals and finally the possibility of
target-like L2 speech production, especially for structures dissimilar to L1
and for less marked structures. Interestingly, Major (2001) claims that the
model is equally valid for multilinguals, for whom ‘L1’ can be replaced by
‘L1 plus L2/L3/Ln’. However, there is no proposal in the model for either the
mechanisms that trigger the influence of L1 (and L2) and universals or the role
of input in L2/L3 speech development.
At present, there is no model of speech learning from a multilingual perspec-
tive, largely because of our scarce knowledge about the concurrent phono-
logical development in a multilingual speaker’s different languages. The aim
of this Element is therefore to contribute to the investigation of multilingual
phonological development with new data, including all languages of the multi-
lingual speakers, both adolescents and adults, and in terms of both their speech
perception and production. We will trace the interaction of a range of phonological
features and of individual speakers’ languages with a view to offering
new insights into the L3 speech learning process.
the ‘traditional’ line of research with its focus on learner competence, the
development of the learners’ linguistic repertoire over time and the (universal)
stages and sequences of learning as having a macro perspective, while the ‘new’
approach focusses on the micro level. Research on the micro level concentrates
on specific situations of language use by a learner as well as the conditions and
factors that influence their language use in such situations, for example their
pronunciation of words when reading out a word list several times. It is assumed
that in every situation where language is used, language elements or skills might
be learned, which in turn contribute to the learner’s language development over
time. Hammarberg does not present the two approaches as exclusive but rather
as complementary and connected. He argues that it is the individual events of
language use that form the basis for evolving patterns and changes in the
linguistic repertoire. In other words, a learner’s language system develops
from repeated language use in specific situations, with the recurrent activation
of linguistic elements leading to their cognitive entrenchment, i.e., their being
stored in the speaker’s memory. Thus, the more a specific word, phrase or
pronunciation is used, the likelier it is to be remembered and used
again. Languages are thus pictured as adaptable systems that emerge from the
learner’s experiences in concrete events of language use.
another; for example, the development of the pronunciation of one sound might
support the accuracy of the pronunciation of another, and their development is
typically synchronous. By contrast, in competitive relationships between sub-
systems, development is asynchronous and growth in one subsystem will result
in the decline of another subsystem; for example, the perceptual accuracy of
a sound contrast in one language might increase alongside a temporary decline
in the perception of the same contrast in another language. A third option for
a relationship between different subsystems is that one subsystem needs to be in
place before another one can start developing, as proposed, for example, in
some theorising about the relationship between segmental perception and
production in L2 speech learning (Flege, 1995; Nagle, 2018).
Language development in the CDST framework is seen as dynamic, i.e., as
a complex system that changes over time. Changes can be caused by both internal
and external forces, and changes in one subsystem always give rise to changes
in the other subsystems. Thus, a learner’s motivation to learn vocabulary, their
language learning aptitude, the time spent on revision as well as their know-
ledge of other languages all can influence the learning of new words in
a language, which in turn may affect the learner’s ability to form more complex
sentences in this language and boost the vocabulary learning of another lan-
guage. Different types of change are presumed to occur: some are continuous,
others discontinuous and often chaotic (van Dijk & van Geert, 2007). Due to the
system’s constant interaction with its environment as well as its internal
changes, the system is observed to never stop changing. Rather, it will continu-
ally reorganise itself, changing its form from one relatively stable preferred state
or ‘attractor state’ to another. In summary, in the CDST conceptualisation,
language development is non-linear and unpredictable: in a complex system
with numerous interacting subsystems, even the tiniest changes to one subsys-
tem can cause the entire system to change. Changes in the different subsystems
can occur slowly or ‘in leaps and bounds’ and they can conspire to move the
entire system into a reorganisation process. However, there seem to be preferred
states and directions for the system (de Bot & Larsen-Freeman, 2011, p. 15)
such as a seeming stagnation of learning or general trends in language develop-
ment. Moreover, due to the complex interaction of the numerous variables and
subsystems, it is difficult if not impossible to predict the exact shape and
direction in which the complex system will develop.
CDST stresses the crucial role of the initial conditions, which may have far-reaching
effects on the processes of change of the system (de Bot & Larsen-Freeman,
2011, p. 10). This consideration is clearly most relevant for the context
of learning a new L3, where the initial condition will entail knowledge of both
the L1 and the L2. CDST does not, however, assume an endpoint of language
development and therefore avoids the term ‘acquisition’ that implies this. In the
CDST framework, it is acknowledged that linguistic skills can always grow and
decline and that learners undergo continuous developmental processes. Thus,
even the language system of a highly advanced learner and the first language of
a speaker are not assumed to remain stable but they are likely to undergo
changes due to external forces and the interaction of their subsystems.
Furthermore, in the CDST approach to language development, inter- and
intra-learner variability is treated as meaningful rather than an issue to be
controlled for in order to find significant differences between conditions (de
Bot et al., 2007; Lowie, 2011). It is considered essential for a description of the
process of language development and a sign of self-organisation of the
language system. Variability across learners (inter-speaker variability) occurs
due to their different initial conditions such as prior linguistic knowledge, and
cognitive and affective factors as well as the different influences of external
and internal variables, making it impossible to generalise from the language
learning process of one learner to the processes of others. In the CDST
approach, intra-speaker variability is seen as a driving force of development
and as an indicator for ongoing developmental processes (Spoelman &
Verspoor, 2010). It is assumed that the degree of intra-speaker variability
can help to explain how the language system of a learner changes over time:
low intra-speaker variability (e.g., highly consistent pronunciation of a word
across multiple occasions) is taken to indicate that the speaker’s language
system is relatively stable and settled into an attractor state, while a high
degree of variability is interpreted as an indication that developmental
changes are currently taking place. If two distinct variability patterns or
variability ranges can be identified in the individual’s learning trajectory,
commonly marked by an unexpectedly large peak or spike at the moment of
the transition, the learner’s language system is understood to have moved into
a different level of development (van Dijk & van Geert, 2007; Larsen-
Freeman & Cameron, 2008). The CDST approach is interested in identifying
and describing the transformative transition between such phases to appreci-
ate the different shapes a developmental change can take.
In summary, CDST and the traditional linear model of language ‘acquisition’
differ in their focus in the following ways. In CDST,
the points in time when productions settled into an ‘attractor state’ reflected by
invariable pronunciations as well as differences across learners in terms of both
variability and the point when the attractor state was reached.
Similarly, Yu and Lowie (2019) adopted a dense data collection procedure
and recorded the oral English performance of Chinese learners of English ten
times in weekly intervals over ten weeks. They showed that while at the group
level the learners improved both their lexical and syntactic complexity as well
as accuracy in oral English, individual learners showed unique patterns of
development which were argued to be influenced by the initial condition of
their learning path.
Working within a CDST framework not only requires different methods in terms
of data collection but also necessitates the use of different statistical methods that
are able to detect and compare the learners’ individual developments on different
timescales as well as major shifts in their language system. Studies based on the
assumption of a linear developmental process and aiming at discovering general
trends usually employ mean scores and standard deviations and assume a normal
distribution of their data when searching for the effects of individual factors
on the linguistic competence of learners and the developmental process. By
contrast, the research design of CDST-inspired studies yields longitudinal individ-
ual data collected at dense intervals in order to reveal any changes in language
development. (For a detailed overview of such methods, see Verspoor et al., 2011.)
The moving min-max graph, for example, is a descriptive data analysis
technique that visualises a learner’s dynamic developmental process and high-
lights the intra-learner variability while keeping the raw data visible (van Dijk
et al., 2011). This method uses a fixed window, a time frame comprising, for
example, five data points that continuously moves ahead one data point. Each
consecutive window thus largely overlaps with the preceding one, differing
only in the first and last data point. For each of these windows, the maximum
and minimum values are calculated, displaying the bandwidth of the observed
scores. This method thus shows the amount of variation at each period of time in
relation to sudden developmental ‘jumps’ that might reflect the internal reorgan-
isation of the language system (van Dijk et al., 2011, p. 76).
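The moving min-max procedure described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used by van Dijk et al. (2011): the function name moving_min_max, the default window of five data points and the example scores are all hypothetical.

```python
from typing import List, Tuple


def moving_min_max(scores: List[float], window: int = 5) -> List[Tuple[float, float]]:
    """Return (minimum, maximum) for every window of `window` consecutive scores.

    The window advances by one data point at a time, so consecutive windows
    overlap in all but their first and last data point.
    """
    if window < 1 or window > len(scores):
        raise ValueError("window must be between 1 and the number of data points")
    return [
        (min(scores[i:i + window]), max(scores[i:i + window]))
        for i in range(len(scores) - window + 1)
    ]


# Hypothetical accuracy scores from eight weekly sessions (invented values).
weekly_scores = [0.2, 0.3, 0.25, 0.6, 0.4, 0.7, 0.65, 0.9]
bands = moving_min_max(weekly_scores, window=5)
for start, (low, high) in enumerate(bands, start=1):
    print(f"sessions {start}-{start + 4}: min={low}, max={high}, bandwidth={high - low:.2f}")
```

Plotting the minimum and maximum series together with the raw scores shows the bandwidth of variation at each period, so that a sudden widening of the band can be inspected as a possible developmental 'jump'.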
Moreover, a moving correlation can be calculated to estimate the relationship
between two language subsystems. For this, as with the moving min-max technique,
a time window is defined and the correlation coefficient is calculated
between the two data sets for all points within this window. The window is then
moved along the data sets by one data point at a time and the correlation coefficient
is recalculated for each window position. This technique shows whether developments
of different features in a language system are synchronous or not and allows
the characterisation of such relationships as competitive or supportive.
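A moving correlation of this kind might be computed as in the following sketch. It assumes a Pearson correlation coefficient, which the description above does not specify; the function names, the window size and the two score series are hypothetical and serve only as illustration.

```python
from math import sqrt
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equally long series (NaN if one is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")  # correlation is undefined without variation
    return sxy / sqrt(sxx * syy)


def moving_correlation(x: Sequence[float], y: Sequence[float], window: int = 5) -> List[float]:
    """Correlate x and y within a window that advances one data point at a time."""
    if len(x) != len(y):
        raise ValueError("both series need the same number of data points")
    if window < 2 or window > len(x):
        raise ValueError("window must be between 2 and the number of data points")
    return [
        pearson(x[i:i + window], y[i:i + window])
        for i in range(len(x) - window + 1)
    ]


# Hypothetical weekly accuracy scores for one feature in the L2 and the L3.
l2_scores = [0.50, 0.55, 0.60, 0.62, 0.70, 0.75]
l3_scores = [0.30, 0.28, 0.35, 0.40, 0.38, 0.45]
coefficients = moving_correlation(l2_scores, l3_scores, window=4)
print([round(r, 2) for r in coefficients])
```

Under this reading, windows with clearly positive coefficients would point to a supportive relationship between the two subsystems and windows with clearly negative coefficients to a competitive one.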
A longitudinal study design was implemented to shed light on intra- and inter-
individual processes of the phonetic and phonological development of multilin-
gual speakers. A group of sixteen learners with a similar language profile
(L1 German, L2 English, L3 Polish) was recorded in all of their languages four
times (T1–T4) within the first year of L3 learning. In addition, in between T2
and T4, a small subset of these participants consisting of three learners recorded
themselves at home in their L2 and L3 every week for eighteen weeks (dense
data collection; DDC). All testing sessions were designed to elicit certain
phonetic and phonological features which were either shared by the languages
or not (see Section 4.1.4). This way, potential interactions between the language
subsystems over time can be revealed.
Figure 1 visualises the longitudinal data collection with four main group
testing times. It commenced a month after the start of L3 Polish lessons and
stretched over the first ten months of L3 learning (T1 at five weeks, T2 at ten
weeks, T3 at five months and T4 at ten months into L3 learning). Due to the
expectation that there would be less development in the learners’ L1 than in
their non-native languages, L1 production data was collected at T1 and T4 only,
along with other background measures such as the learners’ language attitudes
and cross-linguistic phonological awareness. In their L2 and L3, the participants
completed various production and perception tasks (see Section 4.1.3) at all four
group testing times. Between the third and seventh month of L3 learning, three
adults each participated in eighteen weekly dense testing sessions for which
they recorded themselves at home in their L2 and L3. Each week they also
provided information on their language learning activity and progress that
week. The general language learning profiles of these three as well as the other
participants are described in the following section.
4.1.2 Participants
Production task 2: Picture naming (PN). In this task, the participants were
presented with a series of simple pictures in a PowerPoint presentation (one
picture per slide). They were selected from a range of categories that language
learners are typically exposed to in the earlier stages of language learning such
as colours, animals and foods. Concerning the L3 stimuli, the respective Polish
teachers had been consulted to ensure that the vocabulary items had been
covered in class. This task was self-paced.
Perception task 2: ABX. While the forced-choice task was administered to test
the learners’ judgement of naturalness employing cross-language sound con-
trasts, ABX tasks test language-specific contrasts (Strange & Shafer, 2008).
Such timed discrimination tasks assess the learners’ ability to
distinguish between two minimally contrasting sounds within a language, as
for example /v/ and /w/ in English and Polish. Therefore, this perception task
was adopted for testing the phonemic contrast between /v/ and /w/ but not the
allophonic contrasts of the other three features investigated in the study. In the
ABX task, the participants listened to a pair of stimuli (AB) recorded by one
speaker. Subsequently, they heard a third stimulus (X) produced by a different
speaker which was either the same as the first (A) or the second one (B). The
participants then had to decide whether the final stimulus (X) was more like the
first (A) or the second one (B) by pressing a button on a button box (see Figure 2
for a visualisation of the task procedure). The trials were randomised so that the
minimal pairs included appeared in all possible combinations (ABA, ABB,
BAB, BAA). As with the FC task, the ABX task was administered
through E-Prime 2.0, which recorded accuracy and response times. If no response
was logged within 3,000 ms after the last stimulus, the trial was coded as
incorrect, and the experiment automatically proceeded with the next trial.
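The trial structure and the coding rule described above can be made concrete with a short sketch. It is not the E-Prime script used in the study: the class and function names, the response labels 'first' and 'second', the random seed and the example word pair are hypothetical. Only the four trial orders and the 3,000 ms response window are taken from the description.

```python
import random
from dataclasses import dataclass
from typing import List, Optional, Tuple

RESPONSE_WINDOW_MS = 3000  # responses later than this count as incorrect


@dataclass(frozen=True)
class ABXTrial:
    first: str   # stimulus heard first
    second: str  # stimulus heard second
    x: str       # third stimulus, identical in category to first or second


def build_trials(pair: Tuple[str, str]) -> List[ABXTrial]:
    """Return the four orders ABA, ABB, BAB and BAA for one minimal pair."""
    a, b = pair
    return [ABXTrial(a, b, a), ABXTrial(a, b, b), ABXTrial(b, a, b), ABXTrial(b, a, a)]


def randomise(trials: List[ABXTrial], seed: int = 0) -> List[ABXTrial]:
    """Return a shuffled copy of the trial list (the seed is an arbitrary choice)."""
    shuffled = list(trials)
    random.Random(seed).shuffle(shuffled)
    return shuffled


def is_correct(trial: ABXTrial, response: Optional[str], rt_ms: Optional[float]) -> bool:
    """Code one trial; a missing or late response is coded as incorrect."""
    if response is None or rt_ms is None or rt_ms > RESPONSE_WINDOW_MS:
        return False
    target = "first" if trial.x == trial.first else "second"
    return response == target


# Illustrative English minimal pair, not necessarily a stimulus from the study.
trials = build_trials(("vet", "wet"))
print(is_correct(trials[0], "first", 850.0))   # ABA answered 'first' in time
print(is_correct(trials[1], "second", None))   # no response logged
```

Accuracy per participant would then be the proportion of trials for which is_correct returns True, with timed-out trials lowering the score in the same way as wrong button presses.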
Figure 2 ABX task procedure (adapted from Nelson, 2020, p. 38). ISI = inter-stimulus
interval, RW = response window.
Production tasks dense data collection. The weekly dense data collection
included a picture naming, a delayed repetition and a different free speech
production task in the L3 every week, with the free speech tasks having been
carefully compiled in line with current topics and vocabulary from the participants’
L3 classes. For the L2 session, the free speech element was the only task
since the learners were much more fluent in English and were expected to
produce sufficient tokens of interest in this single task. To help the participants
to tune into the appropriate language mode, the free speech prompt was a short
video recording featuring the L2 research assistant (who also collected the L2
data at the group testing times, so they were familiar with the person and already
associated them with English). In the video, the research assistant briefly talked
about a specific topic and then asked an open-ended question related to that
topic which the participants were asked to respond to or talk about for about
one minute. Perception dense data in the L2 and L3 was also obtained, but it is
not included in our analyses here.
Feature | Tasks (production; perception) | Realisation in German (L1) | Realisation in (British) English (L2) | Realisation in Polish (L3)
/v-w/ contrast | DR, PN, ST; ABX | only /v/ exists | both /v/ and /w/ exist (but higher frequency of /w/ than /v/) | both /v/ and /w/ exist (but higher frequency of /v/ than /w/)
Final obstruents (de)voicing | DR, PN, ST; FC | always devoices final obstruents | typically maintains a distinction between voiced and voiceless final obstruents | usually devoices final obstruents
Vowel reduction | DR, PN, ST; FC | some degree of vowel reduction in unstressed syllables | clear reduction of nearly every unstressed syllable | no vowel reduction
Rhotics | DR, PN, ST; FC | voiced uvular fricative /ʁ/; sometimes voiced uvular trill [ʀ] | voiced postalveolar approximant /ɹ/ | voiced alveolar trill /r/; sometimes tap [ɾ] (the latter especially in fast speech)
The languages in the learners’ repertoire also differ in their realisation of coda
obstruents such as /b/, /d/, /g/ and /v/. While English retains a voicing contrast in
syllable-final positions (e.g., between bad vs. bat), final obstruents are always
devoiced in German and most of the time in Polish (Rubach, 1984; Smith et al.,
2007). Many L2 studies have reported that speakers of different L1 backgrounds
have difficulty learning the syllable-final voicing contrast (e.g., Cebrian, 2000;
Dmitrieva et al., 2010), and have accounted for their results by the markedness
status of final obstruent voicing. This theorising is in line with production-based
L2 speech learning models such as the OPM (Major, 2001). The SLM/SLM-r (Flege,
1995; Flege & Bohn, 2021) would predict the same outcome for L1 German speakers
perceiving and producing L2 English word-final obstruents, albeit on account of
their L1-based cue-weighting of vowel duration and/or closure voicing rather than
the markedness of the phonological process. For L3 Polish final obstruent
devoicing, the model would thus predict that L1 German speakers are facilitated in
learning devoiced word-final obstruents. The PAM/PAM-L2 (Best, 1995; Best & Tyler,
2007) focusses on the perception of sound contrasts and does not make any
predictions about phonological processes.
4.2.4 Rhotics
Rhotics are realised distinctly in the three languages of the learners in this study.
Standard German mainly uses voiced uvular fricatives /ʁ/ as well as some
uvular trills [ʀ] (Kohler, 1999). In most varieties of English, rhotics are realised
as a postalveolar approximant /ɹ/ or a retroflex approximant [ɻ], which can
sometimes be labialised to [ɹʷ] (Roach, 2004). Polish has both voiced alveolar
trills /r/ and taps [ɾ] as free-variant allophones, the latter of which is primarily
used intervocalically and in fast speech (Jassem, 2003). Thus, according to the
SLM/SLM-r (Flege, 1995; Flege & Bohn, 2021), the new L2 and L3 rhotic
sounds should be relatively easy to learn for L1 German speakers, at least in
terms of perceptual discrimination, although alveolar trills, in particular, can be
articulatorily challenging sounds to acquire (Catford, 2001). The PAM/PAM-L2
(Best, 1995; Best & Tyler, 2007) does not make any predictions for the percep-
tion of individual sounds but rather focusses on sound contrasts only.
The learners’ ability to distinguish between the two sounds perceptually, tested in
the ABX task (n = 24 for L2 and n = 24 for L3), was calculated as a per cent
accuracy score for each learner and testing time. For the three production tasks, the
sounds (n = 14 for L1, n = 24 for L2 and n = 14 for L3) were auditorily analysed,
transcribed and coded for accuracy by the three phonetically trained raters. Due to
the nature of the free speech task and because not all tokens were produced in the
picture naming task, some learners produced more or fewer tokens in total. The
report of the production results is limited to /w/ in the study, as this member of the
contrast was expected to be more likely to show developmental changes in the
multilinguals’ production (cf. Iverson et al., 2008).
The perception of final voiced and devoiced obstruents, tested in the FC task
(n = 13 for L2 and n = 16 for L3), was analysed in terms of accuracy percentage
for each multilingual learner at each testing time. For the production of (under-
lying) voiced final obstruents (n = 16 for L1, n = 17 for L2 and n = 16 for L3), as
tested in the three production tasks, three measures were analysed, mirroring the
method of previous studies on final obstruent (de)voicing (e.g., Smith et al.,
2007; Smith & Peterson, 2012):
learner and testing time. In the three production tasks (n = 17 for L1, n = 17 for L2
and n = 12 for L3), the ratio of vowel length in stressed versus unstressed syllables
was established. The duration of monophthongs (in ms) in stressed and unstressed
syllables of all correctly stressed words was marked using the onset and offset of
vowel formants as cues. The ratio was obtained by dividing the duration of an
unstressed vowel by the duration of the immediately preceding stressed vowel
within the same word (values of around 1 indicating same length of stressed and
unstressed vowels, values below 1 indicating reduction of unstressed vowels, and
values above 1 indicating longer duration of unstressed vowels). Moreover, it was
annotated for each word whether it was phrase-final or not, as unstressed vowels in
phrase-final words may be longer than those in non-final position within intonation
phrases. Both average production accuracy at each main testing time and change
point analyses of the weekly dense data were computed for this feature.
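
The ratio described above can be illustrated with a minimal sketch. The function names, the tolerance used for 'around 1' and the example durations are assumptions made for illustration only; they are not taken from the study.

```python
from statistics import mean


def vowel_duration_ratio(stressed_ms: float, unstressed_ms: float) -> float:
    """Unstressed vowel duration divided by the preceding stressed vowel duration."""
    if stressed_ms <= 0:
        raise ValueError("stressed vowel duration must be positive")
    return unstressed_ms / stressed_ms


def interpret_ratio(ratio: float, tolerance: float = 0.05) -> str:
    """Label a ratio; the tolerance for 'around 1' is an assumed value."""
    if abs(ratio - 1.0) <= tolerance:
        return "equal length"
    return "reduced" if ratio < 1.0 else "lengthened"


# Hypothetical tokens: (stressed vowel ms, unstressed vowel ms, phrase-final word?)
tokens = [(120.0, 60.0, False), (100.0, 98.0, False), (110.0, 132.0, True)]

ratios = [vowel_duration_ratio(s, u) for s, u, _ in tokens]
labels = [interpret_ratio(r) for r in ratios]

# Phrase-final words are kept apart because their unstressed vowels may be longer.
non_final = [vowel_duration_ratio(s, u) for s, u, final in tokens if not final]
mean_non_final = mean(non_final)
```

Keeping the phrase-final flag with each token makes it possible to compute the mean ratio with or without phrase-final words.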
4.3.4 Rhotics
The learners’ perception of naturalness of the different rhotics in L2 and L3, tested
in the FC task (n = 10 for L2 and n = 10 for L3), was calculated as an accuracy score
for each learner and testing time. The production data (n = 10 for L1, n = 14 for L2
and n = 10 for L3) collected in the three tasks were analysed auditorily: all
instances of the learners’ rhotic sound realisations were transcribed using a range
of IPA symbols and coded for accuracy. The production accuracy rate was
calculated in raw numbers as well as percentages for each learner.
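
As an illustration of how such accuracy scores can be tallied, the following sketch counts target-like tokens per learner and testing time and reports both the raw numbers and the percentage. The learner code, the data layout and the function name are hypothetical.

```python
from collections import defaultdict


def accuracy_by_learner_and_time(codings):
    """Return {(learner, time): (correct, total, per cent)} from coded tokens.

    `codings` is an iterable of (learner, testing_time, is_target_like) tuples.
    """
    counts = defaultdict(lambda: [0, 0])  # (learner, time) -> [correct, total]
    for learner, time, is_target_like in codings:
        cell = counts[(learner, time)]
        cell[0] += int(is_target_like)
        cell[1] += 1
    return {key: (c, t, 100.0 * c / t) for key, (c, t) in counts.items()}


# Hypothetical codings for one invented learner code, "L01".
codings = [
    ("L01", "T1", True), ("L01", "T1", False),
    ("L01", "T1", True), ("L01", "T1", True),
    ("L01", "T2", True), ("L01", "T2", True),
]
scores = accuracy_by_learner_and_time(codings)
```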
5 Results
First, we present an overview of the whole data set for perception and produc-
tion results separately. Figure 3 displays the perception accuracy of the /v-w/
contrast, the rhotics, final voiced obstruents and vowel reduction by the sixteen
learners in both their L2 English and their L3 Polish at T1, T2, T3 and T4 (for
the features rhotics, final voiced obstruents and vowel reduction no data was
collected in L3 Polish at T1). Only the perception of the /v-w/ contrast was
elicited via an ABX task, while the perception of the remaining features
was elicited via an FC task. Figure 3 shows that the perception accuracy varies across
the phonological/phonetic features and processes with the perception of the
rhotics having the highest and the perception of final voiced obstruents having
the lowest accuracy overall. Only for the perception of /r/ in L3 Polish and the
perception of final voiced obstruents in both L2 and L3 can differences between
the adolescents and the adult learners be observed. Overall, perception accuracy
of a particular phonological feature or process is very similar in both the
learners’ L2 and the L3. Variation across the individual speakers, however, is
evident for all features except the rhotics, where perception accuracy is above
60 per cent in their L2 English for all learners.
Figure 4 shows the mean values of accurate production of /w/, the rhotics and
vowel reduction (= the mean vowel ratio) by the sixteen learners in both their L2
English and L3 Polish at T1, T2, T3 and T4 (only T2 and T4 for vowel reduction).
The production results for final voiced obstruents are not reported at this point as
this feature was analysed in the study in terms of three different duration parameters
rather than an accuracy score. Both /w/ and the rhotics show distinct patterns from
each other but appear similar across both languages and both age groups: while /w/
is produced fairly accurately in L2 English and L3 Polish by most adolescents and
adults, the accuracy of the rhotics is much more diverse with individual learners at
ceiling level in both languages and others not producing them accurately at all. The
two age groups were also comparable in their production of vowel reduction in that
they showed some progress, in a target-like direction, with less reduced forms in
their L3 Polish than in their L2 English, as suggested by the score for mean vowel
duration ratio around 1.0 for L3 Polish. A high degree of inter-learner variability
was nevertheless also evident in the L3 Polish.

Figure 4 Mean accuracy of production of /w/, the rhotics and mean vowel
duration ratio by the learners in their L2 English and L3 Polish. Values of around
1.0 for mean vowel duration ratio indicate the same length of stressed and
unstressed vowels. Adult learners are represented by dotted lines. The group
mean values are indicated by the bold line.
Here, the results for each of our hypotheses (see Section 3.4) are presented.
Figures 3 and 4 show that the group average values depicted in bold do not
model well the learning processes of the individual learners. In fact, not a single
one of the sixteen learners shows a development that matches the group mean
value in any of the phonetic or phonological features in either language.
H2. Learners’ variability at the micro level influences and correlates with their
long-term phonetic and phonological development.
Figure 5 Variability of rhotic production for all learners in their three languages
at T1, T2 and T4 (no L1 data is available for T2).
tap [ɾ] in Polish (see Table 1 in Section 4.2) – ROGI18’s production of the Polish
rhotic varies between [r], [ɾ] and non-target ‘German’ [ʁ]. Similarly, the three
multilinguals BISC14, MESC03 and SMSC15 all demonstrate a non-variable
realisation of the rhotic in their L1 German but exhibit some variability in the
production of the rhotic in both L2 English and L3 Polish. Yet, while BISC14
produces non-target-like rhotics such as [ʁ] in both English and Polish at all data
collection points and SMSC15 produces exclusively target-like rhotics in
English only at T2 and never in Polish, MESC03 only produces some non-
target-like rhotics in English at T2 and T4. For EDMÜ06, variability occurs in
all three of his languages at T1 but is restricted to L3 Polish at T2 and T4. Thus,
for five out of the sixteen multilinguals, production of the rhotic in one language
is stable, which points towards an attractor state of this subsystem during the ten
months of our study.
Another interesting observation concerns potential changes in some of the
learners’ L1. Six of the multilinguals produce the voiceless velar fricative [x]
or a combined [ʁ/x] sound as the German rhotic at T4. In fact, half of the
learners produced such sounds in their L3 Polish in the very early stages of
learning, when attempting the trill [r] that they could not yet produce at all or
not consistently. This non-acceptable realisation of rhotics in Polish, which is
used in their L1 German at T4, can thus tentatively be interpreted as a special
kind of L3-to-L1 influence, i.e., regressive CLI from an earlier production in
the L3 to the L1 at a later stage of learning. This example highlights the
complexity of cross-linguistic interaction, where an ‘L1-coloured’ sound from
the learners’ L3 repertoire can find its way back into their L1. Similarly, the
production of uvular trills [ʀ] in German increases from T1 to T4. This can
also be argued to be evidence for regressive CLI, as the learners produce more
of the L1 variant with the L3-like manner of articulation. Hence, on this
feature, German and Polish seem to be more prone to mutual influence for
some of the learners, as their rhotics are distinct yet articulatorily closer to
each other than to the English approximant [ɹ]. The L1 rhotic sounds (and other
articulatorily similar L1 sounds) clearly seem to be the preferred basis for those
struggling with the Polish tap or trill, which in turn also seems to make their
L1 German more receptive to Polish-like sounds.
Figure 5 thus also shows highly individual relationships between variability
at the micro level, i.e., single realisations at one particular data collection point,
and long-term development at the macro level, i.e., a potentially stable state
after ten months of learning. Of those multilinguals that show target-like
pronunciation of the rhotic in L3 Polish at T4, three had target-like pronun-
ciations from T1 onwards (DIMO03, LYBO29, MESC03). SYLÜ08 pro-
duced both [r] and [ɾ] at T1, varied between [r], [ɾ] and [ʁ] at T2 and returned
the change it was 0.97. This change was detected with 95 per cent confidence at
level 4; the level indicates the importance of the change (the
lower the level, the higher the importance). ROGI18, in turn, showed no
significant changes in realising vowel reduction in L3 Polish but
a developmental shift in her L2 English instead. At around week 7, her vowel
reduction ratio dropped from 0.78 to 0.64, a change that occurred with
94 per cent confidence and was detected on a third pass of the data (level 3).
For both speakers, the identified developmental shifts exhibited a target-like
direction. Analyses of respective shifts as per standard deviation changes
yielded no significant results, which suggests that they were not preceded by
extreme levels of increased variability.
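
The text reports change points together with a confidence value and a level but does not spell out the computation. One widely used approach, assumed here purely for illustration, combines cumulative sums (CUSUM) of deviations from the mean with bootstrap resampling. The sketch below performs a single pass over an invented weekly series (corresponding to a level 1 change); the higher levels mentioned in the text would come from repeating the procedure on the segments either side of a detected change. All names and values are hypothetical.

```python
import random


def cusum_range(values):
    """Range (max minus min) of the cumulative sum of deviations from the mean."""
    mean = sum(values) / len(values)
    s = lo = hi = 0.0
    for v in values:
        s += v - mean
        lo = min(lo, s)
        hi = max(hi, s)
    return hi - lo


def change_index(values):
    """Index of the last observation before the estimated change (max |CUSUM|)."""
    mean = sum(values) / len(values)
    s, best_i, best_abs = 0.0, 0, -1.0
    for i, v in enumerate(values):
        s += v - mean
        if abs(s) > best_abs:
            best_i, best_abs = i, abs(s)
    return best_i


def change_confidence(values, n_boot=1000, seed=0):
    """Per cent of random reorderings whose CUSUM range is below the observed one."""
    rng = random.Random(seed)
    observed = cusum_range(values)
    shuffled = list(values)
    below = 0
    for _ in range(n_boot):
        rng.shuffle(shuffled)
        if cusum_range(shuffled) < observed:
            below += 1
    return 100.0 * below / n_boot


# Invented weekly vowel duration ratios with a drop after the seventh week.
weekly = [0.78, 0.80, 0.77, 0.79, 0.78, 0.76, 0.79,
          0.64, 0.65, 0.63, 0.66, 0.64, 0.63, 0.65]
last_week_before_change = change_index(weekly) + 1  # convert to 1-based week
confidence = change_confidence(weekly)
```

The same functions could be applied to a series of weekly standard deviations rather than means in order to look for changes in variability.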
Figure 9 presents selected results for significant changes in the duration
parameters of final obstruent (underlying) voicing in three speakers in their
L2 English or L3 Polish. It shows that all three speakers changed their consonant
duration (rather than vowel or consonant burst duration, which were also
measured) in realising final obstruents in their L2 English or L3 Polish. It is
notable that these shifts approach the expected target in that voiced realisations
are expected to show shorter consonant duration than devoiced realisations
(as manifested in REBA03’s production of shorter voiced final obstruents in
English, and SYLÜ08’s and ROGI18’s production of prolonged devoiced final
obstruents in Polish).
SYLÜ08 evidenced one significant shift in his realisation of L3 consonant
duration (with 99 per cent confidence, level 1 change) at around the same
time as his realisation of vowel reduction changed (see Figure 8), which
coincides with his move to Poland for a stay abroad. Interestingly, REBA03
evidenced two significant shifts in her production of L2 English consonants
in that she lengthened her word-final obstruents from around week 10 of the
data collection (97 per cent confidence, level 1 change) but five weeks later
returned to her original manner of realising the sounds (the average length
stabilising at 0.14 ms; 94 per cent confidence, level 2 change), which is
suggestive of the speaker’s preferred, ‘attractor’ length. The third speaker,
ROGI18, significantly changed her production of L3 Polish word-final
obstruents in terms of both consonant duration and burst duration at around
week 10 of the data collection (with 95 per cent and 97 per cent confidence,
respectively, both changes identified at level 1). An inspection of change
point analyses for standard deviation and the identified developmental shifts
indicated no significant results. There is thus little evidence in the present
dense data on vowel reduction and final obstruent (de)voicing for extreme
variability as an index of upcoming major developmental changes for the
three speakers.
Figures 10 and 11 display the accuracy of producing both /w/ and the rhotics
across the four data points for each speaker. A clear interaction can be seen
between the development of the two sounds in their L2 English for the four
speakers shown on the top row, DIMO03, REBA03, JUEB20 and SMSC15.
Although on different levels of accuracy (e.g., ceiling performance for
DIMO03 for both sounds but relatively low accuracy for /ɹ/ produced by
JUEB20), the production of those two sounds for these four speakers shows
parallel increases and decreases of accuracy from T1 to T4. No such relation-
ship could be found for the remaining twelve speakers, where the accuracy of
their L2 English /w/ and /ɹ/ production seems to develop independently,
though perhaps one might suggest some linkage between the two sounds for
SYLÜ08.

Figure 10 Relationship between accuracy of producing /w/ (grey line) and /ɹ/
(black line) in L2 English for the sixteen speakers.

Figure 11 Relationship between accuracy of producing /w/ (grey line) and /r/
(black line) in L3 Polish for the sixteen speakers.

There is little evidence for these sixteen speakers exhibiting parallel devel-
opment in producing /w/ and /r/ in their L3 Polish (see Figure 11). Many
speakers never pronounce Polish /r/ correctly during the first ten months of
learning this language, especially those shown on the last row, SISC11,
CHSC09, REBA03 and EDMÜ06, and also JUEB20 on the top row, while
BISC14 and JOHA09 only produce one token correctly at T3. Some tendencies
for parallel developments of the production of the two sounds are evident for
DIMO03 from T3 and LYBO24 at T4, but in general the two sounds do not
develop together in the L3 Polish.
Tables 2–5 show how many individual learners evidenced different patterns
of a perception-production relationship for L2 and L3 rhotics and /w/ at each of
the main testing times, as well as across the testing times (the last column in the
tables). The categorisation of these patterns is based on the calculation of
within-subject effect size (Cohen’s d) for perception and production differences
in mean scores (d < 0.7 = small to no effects, 0.7 ≤ d < 1.0 = medium effects and
d ≥ 1.0 = large effects; cf. Plonsky & Oswald, 2014). That is, a small within-
subject effect size was interpreted as suggesting no meaningful difference
between the production and perception scores and thus some evidence for a
co-evolution of the skills for a learner (‘perception = production’). Medium and
large within-subject effect sizes suggested a precedence of one skill over another
for a learner (either ‘perception > production’ or ‘production > perception’). If
one or both of the skills equalled or fell below a 50 per cent accuracy threshold,
this was interpreted as pointing at a ‘dissociation’ between a learner’s perception
and production, suggesting lack of ability and hence little direct relationship. For
an overview of individual learner trajectories and perception-production
relationships, see the Appendix.

L2 rhotic sounds

| | T1 (n=16) | T2 (n=16) | T3 (n=13) | T4 (n=16) | T1–T4 |
|---|---|---|---|---|---|
| perception = production | 11 | 8 | 5 | 10 | 6 |
| perception > production | 4 | 3 | 2 | 2 | 0 |
| dissociation | 1 | 5 | 6 | 4 | 3 |

The last column excludes those participants who showed an inconsistent pattern of
perception-production development across the testing times.

L3 rhotic sounds

| | T1 (no data) | T2 (n=16) | T3 (n=15) | T4 (n=16) | T2–T4 |
|---|---|---|---|---|---|
| perception = production | | 6 | 6 | 6 | 6 |
| perception > production | | 0 | 0 | 1 | 0 |
| dissociation | | 10 | 9 | 9 | 9 |

The last column excludes those participants who showed an inconsistent pattern of
perception-production development across the testing times.

L2 /w/

| | T1 (n=16) | T2 (n=16) | T3 (n=14) | T4 (n=16) | T1–T4 |
|---|---|---|---|---|---|
| perception = production | 13 | 12 | 11 | 13 | 10 |
| dissociation | 3 | 4 | 3 | 3 | 0 |

The last column excludes those participants who showed an inconsistent pattern of
perception-production development across the testing times.

L3 /w/

| | T1 (n=16) | T2 (n=16) | T3 (n=15) | T4 (n=16) | T1–T4 |
|---|---|---|---|---|---|
| perception = production | 13 | 14 | 11 | 14 | 10 |
| dissociation | 3 | 2 | 4 | 2 | 1 |

The last column excludes those participants who showed an inconsistent pattern of
perception-production development across the testing times.
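
The categorisation rule described above can be sketched as follows. The text does not state how the within-subject Cohen's d was computed, so the pooled standard deviation formula used here is an assumption, as are the function names and the example scores.

```python
from math import sqrt
from statistics import mean, stdev


def cohens_d(a, b):
    """Absolute Cohen's d with a pooled standard deviation (assumed formula)."""
    pooled = sqrt((stdev(a) ** 2 + stdev(b) ** 2) / 2)
    if pooled == 0:
        return 0.0 if mean(a) == mean(b) else float("inf")
    return abs(mean(a) - mean(b)) / pooled


def categorise(perception, production, threshold=50.0):
    """Classify one learner's perception-production relationship.

    Both arguments are lists of per cent accuracy scores (at least two each).
    """
    p, q = mean(perception), mean(production)
    # A skill at or below the 50 per cent threshold counts as a dissociation.
    if p <= threshold or q <= threshold:
        return "dissociation"
    # Effect sizes below 0.7 are treated as small to none.
    if cohens_d(perception, production) < 0.7:
        return "perception = production"
    return "perception > production" if p > q else "production > perception"


# Invented accuracy scores for three hypothetical learners.
similar = categorise([90, 85, 95, 90], [88, 90, 86, 92])
perception_ahead = categorise([90, 85, 95, 90], [70, 75, 65, 72])
low_production = categorise([80, 85, 90], [40, 45, 50])
```

Checking the threshold before the effect size mirrors the order in which the two criteria are applied in the description: a learner with a skill at or below 50 per cent is classed as a dissociation regardless of the size of the difference.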
6 Discussion
One aim of this study was to describe the phonetic and phonological development
of multilingual speakers within the framework of CDST and thereby both to test
some of the main tenets of this theory and to contribute to it. Our results
showed that, as hypothesised in H1, learning trends in phonetic and phono-
logical development based on group scores do not model well the learning
processes of individual learners. Confirming findings by van Dijk et al. (2011)
for the development of grammar, our data also showed, for each of the four
analysed phonetic and phonological features and for both modalities,
months and a period of dense data collection covering only eighteen weeks, our
data might not have been optimally suited for capturing long-term trends. Thus,
more research is needed to explore the relationship between variability at the
micro level and long-term developments.
Another aspect of variability and its relationship with the dynamicity of the
phonetic and phonological learning process that was explored in this study
concerns the hypothesis that extreme variability at one point in time signals
(upcoming) major developmental changes, as suggested by van Dijk and van
Geert (2007) and Evans and Larsen-Freeman (2020). No clear evidence for this
hypothesis could be found: while the analysis of the four observation times
spread across the first ten months of learning the L3 Polish showed that extreme
variability does occur, such as five different realisations of one speech sound at
one point in time, it is difficult to link them to major developmental changes. For
one learner, extreme variability preceded a restructuring of the L3 Polish
subsystem that then included realisations from the L1, L2 and L3; for another
learner, extreme variability was followed by a phase of reduced variability.
However, because the data collection points were spread out over
several weeks or even months, which was especially true for the L1 data
collection, it is not possible to say what developments occurred between them
and what effect the extreme variability may have had. Yet, even in the dense
data that included data collected at weekly intervals, no clear evidence was
found for increased variability before developmental changes. Future research
will have to investigate whether this applies only to phonetic and phonological
development or whether this seriously questions the suggestion by van Dijk and
van Geert (2007) and Evans and Larsen-Freeman (2020) that high variability
indicates developmental changes. In this connection, it will be important in
future research to carefully consider the criteria that characterise patterns of
discontinuity in the development of a particular feature under scrutiny (cf. van
Dijk & van Geert, 2007). The present results suggest that extreme variability,
understood as an unexpectedly large, local peak in the data, is not a prerequisite
for major developmental changes.
This study traced the development of various phonetic and phonological
features and, with the fourth hypothesis, aimed at exploring whether these
develop in interaction with each other or rather independently. To this end,
the development of the pronunciation of /w/ and the rhotics was compared in the
learners’ L2 English and also their L3 Polish. The results suggest that the
learning of the pronunciation of these two speech sounds proceeds largely
independently and thus stands in contrast to findings by Yu and Lowie (2019),
who found an interaction between the development of oral complexity and
accuracy in their L1 Chinese learners’ L2 English that developed from an
7 Conclusion
We have shown that employing longitudinal and dense data with a focus on
individual learners contributes important insights for describing and under-
standing multilingual phonetic and phonological development and that this
approach can complement traditional, more product-oriented research designs.
The new methods presented in Section 3 and applied in our own study are more
suited to showing dynamic development of phonology and phonetics in terms of
the interaction between different features as well as the interplay of the different
[Appendix tables not recoverable from the source. Titles: L2 English rhotics
(T1, T2, T3, T4); L3 Polish rhotics (T2, T3, T4); L2 English /w/ (T1, T2, T3, T4);
L3 Polish /w/ (T1, T2, T3, T4). Table note: *Adult learners.]
References
Amengual, M., Meredith, L., & Panelli, T. (2019). Static and dynamic phonetic
interactions in the L2 and L3 acquisition of Japanese velar voiceless stops. In
S. Calhoun, P. Escudero, M. Tabain, & P. Warren (Eds.), Proceedings of the
19th International Congress of Phonetic Sciences (pp. 964–8). Australasian
Speech Science and Technology Association Inc.
Ankerstein, C. A., & Morschett, R. (2013). Do you hear what I hear?
A comparison of phoneme perception in native and Saarlandian German
nonnative speakers of English. Saarland Working Papers in Linguistics, 4,
1–8.
Antoniou, M., Liang, E., Ettlinger, M., & Wong, P. (2015). The bilingual
advantage in phonetic learning. Bilingualism: Language and Cognition, 18
(4), 683–95.
Aoki, R., & Nishihara, F. (2013). Sound feature interference between
two second languages: An expansion of the feature hypothesis to the multi-
lingual situation in SLA. In C. Cathcart, I.-H. Chen, G. Finley et al. (Eds.),
Proceedings of the 37th Annual Meeting of the Berkeley Linguistics Society
(pp. 18–32). Berkeley Linguistics Society.
Aoyama, K., Guion, S. G., Flege, J. E., Yamada, T., & Akahane-Yamada, R.
(2008). The first years in an L2-speaking environment: A comparison of
Japanese children and adults learning American English. IRAL: International
Review of Applied Linguistics in Language Teaching, 46(1), 61–90.
https://doi.org/10.1515/IRAL.2008.003.
Archibald, J. (2009). Phonological feature re-assembly and the importance of
phonetic cues. Second Language Research, 25(2), 231–3.
https://doi.org/10.1177/0267658308100284.
Beckmann, E. (2012). ‘A gup of dea, please’: Crosslinguistic influence in
English and German word-initial stops produced by German L3 learners of
Dutch. Dutch Journal of Applied Linguistics, 1(2), 252–62.
https://doi.org/10.1075/dujal.1.2.06bec.
Best, C. T. (1995). A direct realist view of cross-language speech perception. In
W. Strange (Ed.), Speech perception and linguistic experience: Issues in
cross-language research (pp. 171–204). York Press.
Best, C. T., & Tyler, M. D. (2007). Nonnative and second-language speech
perception: Commonalities and complementarities. In O.-S. Bohn &
M. J. Munro (Eds.), Language experience in second language speech learn-
ing: In honor of James Emil Flege (pp. 13–34). John Benjamins Publishing.