
Language and Speech
2020, Vol. 63(3) 526–549
© The Author(s) 2019
Article reuse guidelines: sagepub.com/journals-permissions
DOI: 10.1177/0023830919866870
journals.sagepub.com/home/las

Article

I Scream for Ice Cream: Resolving Lexical Ambiguity with Sub-phonemic Information

Yoonjeong Lee
Department of Head and Neck Surgery, University of California, Los Angeles, USA

Elsi Kaiser
Louis Goldstein
Department of Linguistics, University of Southern California, USA

Abstract
This study uses a response mouse-tracking paradigm to examine the role of sub-phonemic
information in online lexical ambiguity resolution of continuous speech. We examine listeners’
sensitivity to the sub-phonemic information that is specific to the ambiguous internal open juncture
/s/-stop sequences in American English (e.g., “place kin” vs. “play skin”), that is, voice onset time
(VOT) indicating different degrees of aspiration (e.g., long VOT for “kin” vs. short VOT for
“skin”) in connected speech contexts. A cross-splicing method was used to create two-word
sequences (e.g., “place kin” or “play skin”) with matching VOTs (long for “kin”; short for “skin”)
or mismatching VOTs (short for “kin”; long for “skin”). Participants (n = 20) heard the two-word
sequences while looking at computer displays showing the second word in the left/right corner (“KIN”
and “SKIN”); their click responses and mouse movement trajectories were recorded.
Click responses show significant effects of the VOT manipulation, while mouse trajectories do not.
Our results show that stop-release information, whether temporal or spectral, can (mis)guide
listeners’ interpretation of the possible location of a word boundary between /s/ and a following
stop, even when other aspects in the acoustic signal (e.g., duration of /s/) point to the alternative
segmentation. Taken together, our results suggest that segmentation and lexical access are highly
attuned to bottom-up phonetic information; our results have implications for a model of spoken
language recognition with position-specific representations available at the prelexical level and also
allude to the possibility that detailed phonetic information may be stored in the listeners’ lexicons.

Keywords
Lexical ambiguity, internal open juncture, spoken word recognition, sub-phonemic information,
VOT

Corresponding author:
Yoonjeong Lee, Department of Head and Neck Surgery, University of California, Los Angeles, School of Medicine, 31-19
Rehabilitation Center, Los Angeles, CA 90095, USA.
Email: [email protected]

1  Introduction
Finding when new words begin in casual speech is not a trivial task for listeners, as word bounda-
ries may not be reliably marked in connected speech. Identifying words in the continuous stream
of speech becomes particularly challenging in cases where listeners hear impressionistically
homophonous sound strings such as “ice cream” and “I scream” in the famous song titled “I
scream, you scream, we all scream for ice cream.”
Several studies investigated the question of whether listeners are able to disambiguate such
approximately homophonous sequences and found that in lexical segmentation listeners often effi-
ciently exploit available phonetic correlates of the juncture between words (Gårding, 1967; Lehiste,
1960; Shatzman & McQueen, 2006). In her seminal study, Lehiste (1960) identifies distinct tem-
poral and/or spectral properties of near-juncture segments that help listeners disambiguate various
internal open juncture examples in American English (e.g., “keeps ticking / keep sticking,” “grade
A / gray day,” “plump eye / plum pie,” etc.). The availability of such information to listeners raises
the possibility that word segmentation can take place early in the recognition process, such that
candidate lexical items that are consistent with the correct segmentation are strongly activated.
Subsequent work by Christophe and colleagues supports this hypothesis in both French adults
and 13-month-old American infants (Christophe, Peperkamp, Pallier, Block, & Mehler, 2004;
Gout, Christophe, & Morgan, 2004). Their results indicate that listeners interpret a phonological
phrase (PP) boundary—the edge of an intermediate prosodic unit often composed of one or more lexical
words—as a lexical word boundary in online lexical access tasks. This is perhaps not surprising.
Aside from syntactic information, there are several robust phonetic correlates to PP boundaries
such as pre-boundary lengthening and pitch accent (Beckman & Pierrehumbert, 1986; Wightman,
Shattuck-Hufnagel, Ostendorf, & Price, 1992). Such robust information seems to help listeners
identify the location of a lexical word boundary that coincides with a PP boundary. However, the
comparison made in the study with American infants (Gout et al., 2004) is limited only to the case
where the potential boundary between the two target syllables is a PP boundary, a word boundary
with clear prosodic edge demarcation, compared to the case of no lexical word boundary at all
(e.g., “. . . highest pay performs . . .” vs. “. . . biggest paper forms . . .,” respectively). A poten-
tially more revealing case is the situation in which the intervening word boundary between lexical
items is not marked by clear phonetic correlates of prosodic structure, since in this case the acoustic
support for segmentation is more local and allophonic, and therefore weaker than in the case of a
phrasal boundary.
The phonetic correlates of a lexical word boundary are often sub-phonemic/allophonic but play
a significant role in helping listeners resolve lexical ambiguity. Broadly speaking, one of the most
effective phonetic correlates of a word boundary (across its prosodic contexts), especially in a situ-
ation where two alternative readings of an ambiguous two-word sequence are available, is the
duration of the edge (i.e., marginal) segments (in English, Lehiste, 1960; Nakatani & Dukes, 1977;
in Swedish, Gårding, 1967; in Dutch, Quené, 1992; Shatzman & McQueen, 2006; among others).
While previous studies have made important contributions to our understanding of the role of dura-
tion of marginal segments in speech recognition, it is presently unknown to what extent listeners
are sensitive to sub-phonemic information in the signal. How subtle can such information be
and still be exploited by listeners to modulate the lexical competition process? Will listeners, for
example, be able to exploit very subtle allophonic differences that correspond to only sub-featural
information of a single segment?
In this study, we use the case of /s/-stop sequences such as “keeps ticking / keep sticking” as a
testbed for whether the low-level phonetic information resulting from an allophonic relation is

available to listeners in lexical segmentation. The English /s/-stop sequences simultaneously pre-
sent marginal (/s/#/t/ vs. #/st/) and segmental ([th] vs. [t] of /t/) allophones identified in Lehiste
(1960). For example, differences in temporal properties of all the marginal (edge) segments (i.e.,
durations of /s/, /t/) congruently help listeners identify correct lexical word boundary locations. An
edge segment (e.g., /s/) is always longer at the beginning than at the end of a word. However, the
difference in /t/ such as in “keeps # ticking” and “keep # sticking” (# indicates a word boundary)
reflects an allophonic relation, the difference in aspiration degree of /t/. The aspiration difference
does result in a voice onset time (VOT) durational difference; VOT is longer in word-initial posi-
tion (/s/#/t/) than in word-medial position (#/st/). However, this durational difference is not exactly
at the segment’s edge1—the closure duration is the edge-most acoustic segment. Lehiste reported
that the presence or absence of aspiration is the most consistent phonetic difference between /s/#/C/
versus #/sC/ sequences (C indicates a voiceless stop consonant) in English. She further speculated
that different levels of aspiration of the voiceless stop can be effectively used by the listeners to
disambiguate such contrastive internal open juncture phrases. Christie (1974) partly confirmed this
speculation, by showing that listeners correctly perceive a syllable boundary within a synthesized
nonce-word sequence /asta/ (i.e., /as.ta/ vs. /a.sta/) based on silence and aspiration information of
the stop.
Moreover, no prior study on listeners’ perception of internal open juncture sequences maximally
controlled for confounds from syntactic, semantic, and prosodic sources. This
leaves open the possibility that the observed results might not have been due solely to the low-level
phonetic details explicitly confined to a word boundary (in the absence of any prosodic informa-
tion). The study presented here tests if listeners are sensitive to the local sub-phonemic information
of aspiration degree of stop consonants after /s/, when identifying real words in ambiguous seg-
mentation sequences (word boundary vs. no boundary) during online word recognition. If there are
two competing segmentation options that do not differ in their syntactic or semantic plausibility,
does sub-phonemic information (mis)guide lexical ambiguity resolution? We first recorded approx-
imately homophonous /s/-stop sequences in English (e.g., “place # kin” and “play # skin”) to see
how these sequences are phonetically realized in phrase-internal position. We then carried out a
listening task using a mouse-tracking paradigm (e.g., Spivey, Grosjean, & Knoblich, 2005), as no
prior work on the English internal open juncture recognition employed online measures during
speech processing. Our aim is to test whether listeners make use of aspiration information strictly
associated with a lexical word boundary (not coinciding with a prosodic phrasal boundary) when
building up the correct lexical representation of the spoken words in the ambiguous /s/-stop seg-
mentation sequences such as “place # kin” and “play # skin” in online segmentation. Our data will
shed light on the graded decision process of lexical competition. In what
follows, we briefly discuss (a) why the ambiguous /s/-stop sequences are produced with different
degrees of aspiration in different contexts, and further (b) what aspect of connected speech can
potentially produce overlapping distributions of degree of aspiration (in particular VOT) in these
words.
The allophony that makes the voiceless stop unaspirated after /s/ can be understood as the by-
product of the coordination of the laryngeal abduction gesture during the production of the cluster
as characterized by Browman and Goldstein (1986). The critical articulatory constraint is that only
one single glottal gesture may be found at the word onset in English. For a word-initial stop, the
peak of the glottal opening gesture is coordinated with the release of the stop gesture, which is
responsible for a long VOT in the acoustic domain. In a #/sC/ onset cluster, however, the peak is
tightly aligned with the midpoint of the first member of the cluster, namely /s/. By the time the fol-
lowing stop is released, the glottis has already returned to an adducted position, and this results in

a short VOT (hardly aspirated) for the stop. However, for the hetero-syllabic /s/-stop cluster that
spans a lexical word boundary (/s/#/C/), we would expect to have two distinct laryngeal gesture
peaks, as is indeed often observed in the literature (e.g., Yoshioka, Löfqvist, & Hirose, 1981). This
second and later laryngeal abduction allows for the longer VOT in the word-initial consonant.
While the domain of syllabification in English is generally assumed to be within a word (Kahn,
1976), this study considers resyllabification possibilities in continuous speech. Several studies
found rate-induced perceptual resyllabification in repetitive speech, in which listeners tend to per-
ceive a coda consonant in #/VC/# (e.g., “eep”) as an onset that is affiliated with the following
vowel of another #/VC/# (e.g., “pea”) when the syllables were repeated at fast speech rates (de
Jong, Lim, & Nagao, 2004; Stetson, 1951). Moreover, Munhall and Löfqvist (1992), with transil-
lumination and fiber-optic videos, observed rate-induced gestural overlap between the two sepa-
rate laryngeal gestures of the /s/#/C/ cluster such as in “Kiss Ted.” At slow rates, as expected, two
distinct glottal peaks were observed. However, at fast rates, a single smooth glottal opening gesture
was associated with the hetero-syllabic /s/-stop cluster. At intermediate rates, partially overlapping
glottal gestures were found. As such, in continuous speech resyllabification can occur even across
word boundaries. This type of variability in the speech signal resulting from overlapping gestures
might lead to challenges in segmentation for listeners.
Our study explicitly examines the case of internal open juncture sequences embedding English
/s/-stop sequences at a phrase-internal word boundary in connected speech. We first test if these
ambiguous sequences are realized distinctively from each other. According to Cho, Lee, and Kim
(2014), despite the possibility of resyllabification in phrase-medial position, the two ambiguous
sequences, /s/#/CV/ and #/sCV/ are temporally distinctive (pooled across different places of articu-
lation of the stop /p, t, k/). /s/ is longer word-initially (#/sCV/) than word-finally (/s/#/CV/). VOT
is longer in an underlyingly singleton onset (/s/#/CV/) than in a complex onset (#/sCV/). The
vowel is longer in #/sCV/ than in /s/#/CV/. Our production study will confirm the previous find-
ings by identifying what phonetic signatures are associated with different word boundary locations
and stop places. Phonetic correlates of different places of stop articulation have been well docu-
mented. The alveolar closure is shorter than those at other places; VOT gets progressively longer
from bilabials to velars (Byrd, 1993; Crystal & House, 1982; Lisker & Abramson, 1967; Luce &
Charles-Luce, 1985; among others). We will test whether our findings conform to general patterns
regardless of different word boundary locations (whether /s/#/CV/ or #/sCV/).
The perception study reported here tests if listeners are sensitive to VOT, which is part of the
sub-phonemic information associated with the stops after /s/, when identifying words in ambigu-
ous segmentation sequences. Prior work has found that gradient differences in VOT can affect
listeners’ lexical retrieval processes. For example, previous studies found listeners are sensitive to
within-category VOT information in a phonemic distinction between voiced stops /b, d, g/ and
voiceless stops /p, t, k/ (e.g., Andruski, Blumstein, & Burton, 1994; McMurray, Tanenhaus, &
Aslin, 2009). These findings suggest that low-level phonetic information affects lexical access,
and activation levels are graded. Building on this, the current study focuses on the role of VOT in
a situation where only a single stop is available, but more than one candidate word is activated.
Such a situation clearly exists, as in “place kin / play skin,” but we still do not know whether in con-
nected speech contexts without strong prosodic boundary correlates listeners can exploit local
sub-phonemic information such as VOT to segment these ambiguous sequences.2 Our investiga-
tion will further our understanding of the degree to which listeners use phonetic information in online
word recognition.
Our study uses a mouse-tracking paradigm (e.g., Spivey et al., 2005) that records continuous
arm movements together with response choices and reaction time. While the results of response
choices indicate the final decision of listeners, the mouse trajectory measures allow us to observe

online effects of the decision process. With respect to the perception study, there are two possible
outcomes to consider. Previous reports have suggested that a PP boundary is perceived as a lexical
word boundary (Christophe et al., 2004; Gout et al., 2004). One possibility is that native speakers
of American English might not be able to disambiguate “place # kin” and “play # skin” when the
boundary between the words bears no strong prosodic effects. This might suggest that the low-
level phonetic properties demarcating a word boundary with no further prosodic information are
too subtle and thus not available for listeners to exploit in resolving lexical ambiguity.
The other possibility is that listeners are able to disambiguate between “place # kin” and “play
# skin” because the fine-grained phonetic details of these sequences, which are expected to be
systematically different even at a word boundary, are available to listeners. If this is the case, this
will allow us to further test the following question. Are listeners sensitive specifically to allo-
phonic, subsegmental information, that is, VOT differences? The experimental setup will let us
know if a mismatching local event (i.e., short VOTs for /s/#/CV/ and long VOTs for #/sCV/) biases
listeners’ lexical access of an ambiguous sequence or not. One possibility is that listeners might
ignore or not be sensitive to the manipulation of aspiration information of the stop consonant, when
the other phonetic information (likely durational) of the neighboring segments (e.g., a preceding
vowel, /s/, a following vowel) and the closure duration of the stop itself all point to the alternative
segmentation option. If this is the case, the local mismatch arising from inappropriate VOTs would
not affect listeners’ performance. Alternatively, however, if the VOT manipulation (among the
other phonetic aspects) is actively employed in distinguishing stop-initial versus /s/-initial words
in speech perception, listeners will show sensitivity to the mismatching VOTs that disconfirm other
interpretations of the signal.

2  Experiment 1: Production data


2.1 Experimental stimuli and elicitation
We tested English intervocalic /s/-stop sequences. The critical sequences were composed of a
voiceless fricative /s/ followed by a bilabial, alveolar or velar voiceless stop consonant (/p, t, k/).
Each critical consonant cluster was embedded in a two-word sequence in which a lexical word
boundary was placed either between /s/ and the following stop (e.g., “place # kin”) or before /s/
(e.g., “play # skin”), so that /s/ and the following stop could form a complex onset of the second
word. All elicited words were monosyllabic. For each /s/-stop sequence with the three places of
articulation, three item pairs with contrasting word segmentations were chosen, giving a total of
nine pairs (18 items). The full set of pairs is reported in Table 1.

Table 1.  A list of two-word pairs with /s/#/C/ and #/sC/ test sequences. (The second word is the target word.)

/sp/   base # pam      bay # spam
       lace # pan      lay # span
       lease # pine    lee # spine
/st/   trace # tab     tray # stab
       piece # tag     pea # stag
       fleece # tool   flea # stool
/sk/   grace # cope    gray # scope
       niece # coop    knee # scoop
       place # kin     play # skin

To minimize possible confounding variables that might affect listeners’ perception of the target
word, the two-word sequences were carefully selected based on the following criteria: (a) spelling
parallelism between stop-initial and /s/-initial words, (b) target word frequency (e.g., frequency of
“kin”) and frequency difference between contrastive words (e.g., Is “kin” as frequent as “skin”?),
and (c) part of speech. We consider each criterion below:
From a full set of real words that start with either a stop onset or a /s/-stop onset, we narrowed
down the list of candidate target words based on spelling. The spelling between stop-initial words
(e.g., “KIN” presented in upper case to listeners) and /s/-initial words (e.g., “SKIN”) was kept con-
stant, differing only in having ‘s’ as the first letter of the word or not (e.g., “SKIN” vs. “KIN,”
“STOOL” vs. “TOOL,” “STAB” vs. “TAB”). This step excluded contrastive pairs like “KALE” and
“SCALE” or “TAIL” and “STALE.” Within the reduced set, we selected words with high raw fre-
quencies based on the Corpus of Contemporary American English (COCA, Davies, 2008). To
minimize the chance of potential frequency asymmetries between contrastive words (e.g., “kin” vs.
“skin”) affecting lexical access (Luce & Pisoni, 1998; Marslen-Wilson, 1990), we then constructed
the critical pairs so that the contrastive items had comparable frequencies. For this step, each of the
raw word frequencies was converted into a value per million words, which is a conventional nor-
malization measure using a common base (i.e., per million words) in corpus linguistics. The nor-
malized word frequencies ranged from less than 1 to 98 per million (median = 7 per million). The
average frequency difference between target items within each pair was 22 observations per mil-
lion, ranging from a difference of only 2 (“tab” vs. “stab”) to a difference of 92 (“tool” vs. “stool”).
For most items, the difference in frequency was less than 30 occurrences per million (median = 12
per million). Parts of speech within each pair were matched, such that all the words we used were
nouns or could be clearly interpreted as nouns. These criteria were applied to finalize the
target word list. Of course, it was impossible to come up with a perfectly balanced set of word
pairs, but among the possible pairs in English, the best set of nine pairs was selected (see Table 1).
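To illustrate the normalization step, the following minimal R sketch converts raw counts to per-million frequencies; the corpus size and the raw counts are hypothetical placeholders, not values reported here.

## Hypothetical raw counts; the corpus total is an assumed placeholder value.
corpus_size <- 560e6                    # assumed total tokens in the corpus
raw <- c(kin = 1500, skin = 20000)      # made-up raw frequencies
per_million <- raw / corpus_size * 1e6  # occurrences per million words
per_million                             # normalized frequencies
abs(diff(per_million))                  # within-pair frequency difference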
The word that preceded the target word (the first word in each pair, see Table 1) was selected
based on the same criteria. However, we allowed slightly more spelling variation. This is because
in our perception study (Experiment 2), participants only ever saw the second word written on the
screen; they heard both the first and the second word, but the first word was not shown on the
screen. Crucially, the vowel in the first word was always either /i/ or /eɪ/, minimizing any effects of
the syllable affiliation of the following consonant (/s/) on vowel quality.
After these winnowing procedures for the target word and the preceding word, word association
strengths in the proposed two-word sequences were finally checked using the Edinburgh Associative
Thesaurus (EAT, Kiss, Armstrong, Milroy, & Piper, 1973). It is critical to make sure there is no
semantic relatedness between the first (e.g., “play” or “place”) and the second words (e.g., “skin”
and “kin”), as this could also contribute to listeners’ behavior in the listening task (Experiment 2).
The association strength between the two words (e.g., between “play” and “skin” and between
“place” and “kin”) was always 0, indicating that all pairs that were selected were semantically
unassociated with each other.
For the stimulus elicitation, the critical two-word sequences were embedded in a specifically
designed carrier sentence, to ensure that they did not carry potentially confounding prosodic infor-
mation. In particular, it was important that speakers produce the two-word sequences without any
rich phonetic properties that arise from prosodically strong environments, such as accentuation or
the proximity of a major prosodic juncture.
With this in mind, we created a carrier frame with the following structure: “To say ‘blue [two-
word sequence]’ with SAM is going to be DIFFICULT. But to say ‘blue [two-word sequence]’ with
ME is going to be EASY.” For example, to elicit “place kin,” speakers were asked to say: “To say
‘blue place kin’ with SAM is going to be DIFFICULT. But to say ‘blue place kin’ with ME is going

to be EASY.” The production we are interested in is the second repetition of ‘blue place kin’ in the
second sentence.
In our carrier frames, the critical two-word sequence was always placed after the word “blue,”
and the speakers were instructed to treat the word “blue” and the two-word sequence as members
of a single chunk. This was orthographically signaled by putting the three words in single quotes
(e.g., ‘blue place kin’). The structure of the carrier frame means that speakers automatically put a
major prosodic boundary right before the verb “is” in the second sentence, in the location shown
here with ‘|’: “To say ‘blue place kin’ with SAM is going to be DIFFICULT. But to say ‘blue place
kin’ with ME | is going to be EASY.” As a consequence of phrasing, speakers produced a prosodic
phrase-internal word boundary between words in single quotes (e.g., a word boundary between
“blue” and “place,” and between “place” and “kin”).
In order to avoid any accentuation falling on the target pairs, contrastive focus occurred else-
where in the sentence, as indicated by the bold capitals in (a, b).

(a)  To say ‘blue place kin’ with SAM is going to be DIFFICULT.


(b)  But to say ‘blue place kin’ with ME is going to be EASY.

As mentioned earlier, we are interested in the second production of the critical sequence (in b)
because the repeated item, which is old information, rarely gets prominence.
In addition to the 36 target items, 36 filler items were created. Like target items, filler items
were also two-word sequences. Three different types of fillers were used: (a) sequences where one
of the words was a member of a homophone pair (e.g., “time” vs. “thyme”), (b) sequences where
one of the words was a member of an onset-cohort pair (e.g., “candy” vs. “candle”), and (c)
sequences where one of the words was a member of a rhyme-cohort pair (e.g., “baker” vs.
“shaker”). Each filler group had six pairs.
We created two versions of each filler item, with two levels of semantic bias (strong bias vs.
weak bias) coming from the first word. Strong-bias fillers were two-word sequences where the first
word was in some way related to the upcoming second word (e.g., “clock time” or “herb thyme”).
Weak-bias fillers were two-word sequences with little bias coming from the first word (e.g., “cook-
ing time” or “cooking thyme”). All participants in the listening task (Experiment 2) were presented
with equal numbers of weak-bias fillers and strong-bias fillers. This was done to encourage partici-
pants to attend to both the first and the second word of the two-word sequence. Recall that the
actual mouse-click task only required participants to click on the written form of the second word
and could thus encourage people to only attend to the second word in the sequence. The strong-bias
fillers were included to counteract potential biases in this direction.
Fillers were placed in the same carrier sentence as targets.

2.2 Subjects and recording


Four native speakers of American English, two males and two females, participated in the record-
ing session. All speakers were from California, and the recordings were made in the USC Phonetics
Laboratory.
In order to collect enough sentences for the listening test, two repetition blocks were created
for each speaker. Specifically, for each repetition, we first randomized the presentation order of
items separately for targets and fillers. Then, we made sure that any closely related items (e.g.,
“play skin” and “place kin” for targets; “time” and “thyme” for fillers) did not appear consecu-
tively. After this pseudo-randomization step, targets and fillers were placed in an alternating order
(i.e., . . .filler-target-filler-target. . .).

For the target words, a total of 144 tokens were collected and analyzed (3 item pairs x 2 word
boundaries x 3 stop places x 4 speakers x 2 repetitions).

2.3 Acoustic measurement variables


There were four segments of interest in the two-word sequence such as “play # skin” or “place #
kin.”

a)  the vowel that occurs right before the fricative /s/, which we will refer to as the pre-bound-
ary vowel (V1);
b)  the fricative /s/;
c)  the voiceless stop consonant; for example, /k/ in “play skin” and “place kin,” which we will
denote with C;
d)  the vowel that occurs right after the voiceless stop, which we will refer to as the post-
boundary vowel (V2).

Durations of the segments in the /V1sCV2/ sequence were measured to see how the temporal structure of
the two ambiguous sequences (e.g., “play skin” vs. “place kin”) would be modified under different
word boundary conditions (/s/#/C/ and #/sC/) and stop place conditions (/p, t, k/). Crucially, what
we wanted to test was whether the acoustic consequences of different place of articulation of the
stop consonants interact with the location of the word boundary. If we find significant interactions,
we will need to consider those when analyzing the data from the perception study.
V1 duration was measured from the onset to the offset of the voicing of the vowel. Frication
duration of /s/ was measured by looking at the fricative noise indicated by high frequency noise
around 7000-8000 Hz in the spectrogram and visually confirmed by the presence of aperiodic
noise in the waveform. Acoustic measures for the following stop (/p, t, k/) included stop closure
duration and VOT. Stop closure duration was measured from the cessation of the aperiodic
noise of the preceding /s/ to the beginning of the stop burst seen in the spectrogram, indi-
cated by a silence in the waveform. With regard to VOT, in most cases there was a spike
indicating the burst of noise that occurred when the stop closure was released. VOT was measured
from the release burst to the onset of vocalic (periodic) voicing. In some cases, however, the inten-
sity of a stop burst was so low that there was no evidence for a clear spike in the spectrogram. In
such cases, VOT was measured from the onset of semi-random noises following a silence. In addi-
tion, the presence or absence of release bursts was recorded for each condition. The entire /s/-
stop cluster duration was also measured to see if there is any change in the distributional patterns
of /s/, stop closure and VOT influenced by the critical factors in this study. V2 duration was
measured from the onset to the offset of the voicing for the vowel after the /s/-stop cluster.
Segmentation boundaries were always aligned to zero crossing points in the waveform.
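For concreteness, the following minimal R sketch derives the duration measures from one token’s hand-labeled boundary times; the data frame, its column names, and the times themselves are hypothetical stand-ins for the actual annotation files.

## Hypothetical boundary times (ms) for one /V1sCV2/ token, labeled at
## zero crossings: vowel onset/offset, /s/ offset, burst, and voicing onset.
labels <- data.frame(v1_on = 0, v1_off = 112, s_off = 188,
                     burst = 259, voice_on = 304, v2_off = 410)
with(labels, data.frame(
  v1_dur      = v1_off - v1_on,    # pre-boundary vowel (V1)
  s_dur       = s_off - v1_off,    # /s/ frication
  closure_dur = burst - s_off,     # stop closure (silence up to the burst)
  vot         = voice_on - burst,  # release burst to onset of voicing
  v2_dur      = v2_off - voice_on, # post-boundary vowel (V2)
  cluster_dur = voice_on - v1_off  # /s/ + closure + VOT
))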

2.4 Statistical analyses


For the continuous variables, a statistical evaluation of the critical factors was made based on linear
mixed-effects regression models using the lme4 package (Bates, Mächler, Bolker, & Walker, 2015)
in the statistical environment R (R Core Team, 2018). We first fit fully-crossed models with both
by-subject and by-item adjustments to the slopes and intercepts and then used model comparison
to identify the maximal model supported by the data, so that only random effects that contributed
significantly to the model were included (Baayen, Davidson, & Bates, 2008).
The fixed effects considered were (a) Word boundary (/s/#/C/ vs. #/sC/) and (b) Stop place (/p/
vs. /t/ vs. /k/), as well as their interactions. To evaluate the effects of Stop place with a three-way

distinction, it was coded with two contrasts: (1) /p/ (coded as 2/3) versus the non-/p/ stops
(/t/ = -1/3 and /k/ = -1/3), and (2) /t/ versus /k/ (coded as /t/ = 1/2, /k/ = -1/2, /p/ = 0).
With respect to release burst occurrence, where the outcome variable was categorical (binary
coding with 1 and 0), mixed-effects logistic regressions were conducted (also using the lme4
R package).
To evaluate the significance of fixed effects, several mixed-effects models were first built in a
stepwise manner from a maximal model, and these nested models were compared using likelihood-
ratio tests of significance. For post-hoc comparisons, the Tukey test from the emmeans package
(Lenth, Singmann, Love, Buerkner, & Herve, 2018) was used to correct for multiple comparisons.
In all cases, p-values less than .05 were considered significant.
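To make this pipeline concrete, a minimal R sketch for the VOT measure is given below, assuming a long-format data frame d with columns vot, boundary, place, subject, item, and burst (1/0); these names are illustrative, and only intercept-only random effects are shown for brevity, whereas the models described above also evaluated random slopes.

library(lme4)
library(emmeans)

## Contrast-code the three-level Stop place factor as two comparisons:
## /p/ vs. non-/p/, and /t/ vs. /k/.
d$place <- factor(d$place, levels = c("p", "t", "k"))
contrasts(d$place) <- cbind(p_vs_rest = c(2/3, -1/3, -1/3),
                            t_vs_k    = c(0,    1/2, -1/2))

## Full model and a nested model without the interaction; anova() refits
## with ML and performs the likelihood-ratio test.
m_full <- lmer(vot ~ boundary * place + (1 | subject) + (1 | item), data = d)
m_red  <- lmer(vot ~ boundary + place + (1 | subject) + (1 | item), data = d)
anova(m_red, m_full)

## Tukey-corrected post-hoc comparisons among stop places.
emmeans(m_full, pairwise ~ place, adjust = "tukey")

## For the binary burst-occurrence outcome, the analogous logistic model:
m_burst <- glmer(burst ~ boundary * place + (1 | subject) + (1 | item),
                 data = d, family = binomial)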

2.5 Results
The main effects of Word boundary and Stop place on acoustic measures are provided in Figures 1
and 2.
V1 duration: Figure 1(a) shows the mean duration of V1 as a function of Word boundary, that
is, whether the vowel is in a closed syllable (e.g., “place,” /V1s/#) or open syllable (“play,”
/V1/#/s/)—collapsing across the three stops (/p/, /t/, and /k/). We find a significant main effect of
Word boundary on V1 duration (χ2 = 11.59, df = 1, p < .001): V1 is shorter
when it is in the closed syllable (e.g., “place,” /V1s/#/C/) than in the open syllable (e.g., “play,”
/V1/#/sC/). There is no main effect of Stop place and no interaction between Stop place and Word
boundary.
/s/ duration: With respect to /s/ duration, there is a significant main effect of Word boundary
(χ2 = 6.09, df = 1, p < .05). As shown in Figure 1(c), /s/ is longer in word-initial position (#/sC/)
than in word-final position (/s/#/C/). Stop place has a significant main effect on /s/ duration
(χ2 = 6.46, df = 2, p < .05: /p/ < /t/ & /k/, /t/ = /k/). As shown in Figure 2(a), /s/ is shorter before
/p/ than before /t/ or /k/. Crucially, there is no interaction between factors.
Stop closure duration: Word boundary shows no significant main effect on stop closure
duration (Figure 1(d)). There is a significant main effect of Stop place on stop closure duration
(χ2 = 18.03, df = 2, p < .001: /p/ > /t/ & /k/, /k/ > /t/). Closure duration is the longest for /p/,
intermediate for /k/, and shortest for /t/ (Figure 2(b)). No interaction between predictors is found.
VOT: With respect to VOT for the stops, there is a significant main effect of Word boundary
(χ2 = 11.92, df = 1, p < .001). As shown in Figure 1(e), VOTs are significantly shorter for stops
in #/sC/ than in /s/#/C/. A significant main effect of Stop place reveals that VOTs are significantly
shorter for bilabial stops than for alveolar or velar stops (χ2 = 14.76, df = 2, p < .001: /p/ < /t/ &
/k/, /t/ = /k/, Figure 2(c)). There is a significant interaction (χ2 = 7.06, df = 2, p < .05). This inter-
action is due to the fact that the effect of Word boundary is larger in /k/ (/s/#/k/ > #/sk/: mean diff.
39 ms) than in /t/ (/s/#/t/ > #/st/: mean diff. 31 ms) (figure not given).
Release burst occurrence: With respect to burst occurrence during stop release, there is a sig-
nificant main effect of Word boundary (χ2 = 8.21, df = 1, p < .005) and no effect of Stop place.
Figure 1(f) shows release burst occurrences in different test conditions (visually presented in per-
centage). Stops in the stop-initial word (69 observations out of 72) are more likely to be produced
with VOT accompanied by a clear burst spike at the beginning compared to those in the /s/-initial
word (57 out of 72). That is, stop consonants produced in /s/#/C/ are associated with release bursts
significantly more often than those in #/sC/. There is no interaction between predictors.
/s/-stop cluster duration: Word boundary shows a significant main effect on /s/-stop cluster
duration (χ2 = 33.02, df = 1, p < .001; figure not given). The sum of /s/ duration, stop closure
duration, and VOT is longer in /s/#/C/ than in #/sC/. We find no effect of Stop place or interaction
between predictors.

Figure 1.  Main effects of Word boundary on (a) V1 duration, (b) V2 duration, (c) /s/ duration, (d) stop
closure duration, (e) VOT, and (f) release burst occurrence, pooled across speakers and stop place
conditions (The error bars show +1 SE. *sig.).

V2 duration: Figure 1(b) shows the mean duration of the post-boundary vowel, V2. There is a
significant main effect of Word boundary on the post-boundary vowel (χ2 = 5.13, df = 1, p < .05),
showing that V2 duration is shorter after a singleton onset #/C/ than after a complex onset #/sC/.
Stop place shows a significant main effect on V2 duration (χ2 = 67.7, df = 2, p < .001; /p/ > /t/
& /k/, /t/ > /k/). As shown in Figure 2(d), the vowel duration is generally shorter after /sk/ com-
pared to after /sp/ or after /st/. Again, no interaction between predictors was found.

2.6 Discussion
Our results show that segments in the approximately homophonous /s/-stop sequences are distinc-
tively realized as a function of word boundary location even in a potentially neutralizing context.
First, both the pre-boundary vowel (V1) and post-boundary vowel (V2) are differently realized
in different lexical word boundary locations. V1 is shorter in a closed syllable (/V1s/#/C/) than in

Figure 2.  Main effects of Stop place on (a) /s/ duration, (b) stop closure duration, (c) VOT, and (d) V2
duration, pooled across speakers and word boundary conditions (The error bars show +1 SE. *sig.).

an open syllable (/V1/#/sC/), which can be explained by closed syllable vowel shortening (e.g.,
Fowler, 1981; Maddieson, 1984; Munhall, Fowler, Hawkins, & Saltzman, 1992). V2 duration is
longer after a complex onset (#/sCV2/) than after a singleton onset (/s/#/CV2/).
Second, /s/ is longer word-initially (e.g., “play skin”) than word-finally (e.g., “place kin”). This
finding is in accordance with the word-initial strengthening often observed in the literature (e.g.,
Keating, Wright, & Zhang, 1999; Byrd, 1996, for articulatory evidence), suggesting that the lexical
word boundary between /s/#/C/ is still preserved in the prosodically weak context.
The main reason for looking at effects of Stop place was to see if it would interact with the loca-
tion of the word boundary. We find a significant interaction between Word boundary and Stop
place in VOT measure, which is due to a greater VOT difference between /s/#/C/ and #/sC/ for /k/
than for /t/. However, there are no interactions found with other duration measures.
We do find main effects of the place of articulation of the stop consonant with many measure-
ments: The results show significant effects of Stop place in the production of the stop, which are in
agreement with general patterns reported in the literature (e.g., Byrd, 1993; Crystal & House, 1982;
Lisker & Abramson, 1967; Luce & Charles-Luce, 1985). The stop closure duration is the longest
for /p/, intermediate for /k/, and shortest for /t/. A trade-off relation among members of the /s/-stop
cluster is also observed with VOTs, showing significantly shorter duration for /p/ than for /t/ or /k/.
As regards the length of the post-boundary vowel, it is shorter after /sk/ compared to after /sp/ or
after /st/. As can be seen in Table 1, this is perhaps due to the fact that the vowels used in /sp/- and
/st/-items are always followed by voiced consonants, whereas the vowels in /sk/ conditions are
often followed by a voiceless stop (e.g., Luce & Charles-Luce, 1985; Mack, 1982).
Finally, with respect to the stop consonant, even though there is no significant effect of Word
boundary on closure duration, VOT is longer in /s/#/C/ than in #/sC/. It is worth mentioning that

VOTs in /s/#/C/ are still too long (mean 55 ms) to be categorized as unaspirated even in running
speech. This might suggest that, at least with these speakers, there is not much blending between
the two laryngeal gestures (i.e., no evidence of syllable re-affiliation), one coordinated with the /s/
gesture and the other with the stop gesture in /s/#/C/ (cf. Munhall & Löfqvist, 1992).
Moreover, the entire cluster duration (/s/-duration + stop closure duration + VOT) is longer for
/s/#/C/ than #/sC/. Recall that /s/ duration is much longer in #/sC/ than in /s/#/C/ (mean diff. 17 ms),
while there is no difference in closure duration. This suggests that VOT significantly contributes to
the entire cluster duration (#/sC/ < /s/#/C/: mean diff. 33 ms). In addition to temporal distinction,
we identify another phonetic property that might have contributed to the significant /s/#/C/ and
#/sC/ distinction. The word-initial stops (e.g., [k] in “kin”) are more likely to be produced with a
clear burst spike at the beginning of the stop release compared to stops in the /s/-initial words (e.g.,
[k] in “skin”). (cf. /t/ is always produced with VOT accompanying burst spikes in both /s/#/t/ and
#/st/.) However, it should be noted that the speakers were not recorded with a head-mounted micro-
phone, suggesting that there could have been linguistically unrelated variability in the burst measure.
In short, the production study results confirm that underlying syllable structures are differenti-
ated in detailed temporal patterns even in the potentially ambiguous phrase-internal condition. The
distinct phonetic characteristics of edge segments might be expected to help listeners with lexical
segmentation of otherwise ambiguous sequences such as “play # skin” versus “place # kin” (see
e.g., for English, Lehiste, 1960; for Dutch, Shatzman & McQueen, 2006; for Swedish, Gårding,
1967). As expected, the most robust distinction is made with VOT modulations—that is, shorter for
a stop in #/sC/ than in /s/#/C/.

3  Experiment 2: Perception data


Our perception experiment investigates (a) whether the phonetic correlates of a lexical word
boundary with no further prosodic information are strong enough to guide listeners in segmenting
the ambiguous /s/-stop sequences; (b) whether specifically the allophonic, subsegmental informa-
tion provided by VOT can bias listeners during the task of word segmentation, even in the presence of
competing temporal information; and (c) whether the complexity in decision making of segmenta-
tion is reflected in online arm movement, as measured by mouse-tracking.
To be more specific, to test the role of the phonetic details that are strictly local to a word
boundary, listeners were exposed to (a) the ambiguous two-word sequences produced with no
accentuation or major prosodic boundary. Listeners were also exposed to (b) a conflicting piece
of subsegmental information in the target sequences. That is, we used cross-splicing to create a
test condition to test whether listeners would correctly detect the location of a word boundary
when there is a mismatching VOT while the other temporal information points to the alternative
segmentation option. To examine (c) the online lexical competition process, we used a mouse-
tracking paradigm. Mouse-tracking allows us to measure listeners’ final response choices,
response times, and online arm movements. A potential inhibitory effect coming from a phono-
logical competitor during lexical access can be assessed by all three measures (e.g., Spivey et al.,
2005). Along with inhibitory behaviors indicated by an increase in reaction time and error rates in
the responses, the online arm movement trajectories might show a continuous attraction towards
a competitor.

3.1 Splicing method


For each target word with varying locations of word boundary, two spliced conditions were cre-
ated. Splicing was always done with within-speaker and within-stop-place tokens produced in

Figure 3.  An example of /sk/ test words with different splice types and word boundaries.

different repetitions. We first extracted the ambiguous sequences (e.g., “place kin” or “play skin”)
from the carrier sentences. These carriers (e.g., “place # []in”), to which the two versions of a stop
(e.g., short [k] or long [k]) were added, were physically the same across both spliced conditions. In
one version (henceforth, cross-spliced), VOTs for the stop in /s/#/C/ syllables were replaced by
mismatching VOTs taken from #/sC/ tokens. This cross-splicing method will allow us to test
whether listeners, when asked to indicate how they perceive the second word in the sequence (e.g.,
“kin” or “skin”), use allophonic aspiration degree. For example, VOT for /k/ in “place kin” was
replaced by the relatively short VOT taken from “play skin.” Only VOTs were spliced and replaced;
everything else in the signal remained identical across spliced conditions. The same logic for the
splicing procedure was applied to #/sC/ syllables (e.g., “play skin”).
In the other version (henceforth, identity-spliced), VOTs for the stop in /s/#/C/ syllables were
replaced by VOTs taken from another /s/#/C/ token. For example, VOT for /k/ in “place kin” was
replaced by that from another “place kin.” This was done in order to avoid a potential confound of
having one condition spliced and the other un-spliced. Therefore, each item pair had four possible
splice conditions; an example of splicing with schematic illustrations of the /sC/ cluster (i.e., /s/
duration + closure duration + VOT) is given in Figure 3.
No listeners made any comments about the audio files being in any way unnatural in the debrief-
ing session.
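A minimal R sketch of the cross-splicing step is given below, assuming hand-labeled VOT onset and offset times and using the tuneR package for WAV input/output; the file names, times, and tooling are illustrative assumptions, not a record of the actual procedure, and in practice splice points were aligned to zero crossings.

library(tuneR)  # WAV input/output

## Replace the carrier token's VOT samples with a donor token's VOT samples.
## All times are hand-labeled VOT onsets/offsets in seconds (hypothetical).
splice_vot <- function(carrier_file, donor_file, carrier_vot, donor_vot) {
  carrier <- readWave(carrier_file)
  donor   <- readWave(donor_file)
  sr <- carrier@samp.rate
  c_idx <- round(carrier_vot * sr)  # carrier VOT span, in samples
  d_idx <- round(donor_vot * sr)    # donor VOT span, in samples
  new_left <- c(carrier@left[1:(c_idx[1] - 1)],
                donor@left[d_idx[1]:d_idx[2]],
                carrier@left[(c_idx[2] + 1):length(carrier@left)])
  Wave(left = new_left, samp.rate = sr, bit = carrier@bit)
}

## e.g., put the short VOT from "play skin" into the "place kin" carrier
out <- splice_vot("place_kin.wav", "play_skin.wav",
                  carrier_vot = c(0.912, 0.967),  # hypothetical times (s)
                  donor_vot   = c(0.845, 0.861))
writeWave(out, "place_kin_cross.wav")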
Four Latin square designed lists were created. Each list had 108 experimental trials consisting
of 36 targets (2 splice types x 2 word boundaries x 3 stop places x 3 item pairs) and 72 fillers (3
filler types x 4 bias conditions x 3 items x 2 repetitions) evenly distributed into four within-list
blocks. All four lists had the identical number of tokens of each experimental condition spoken by
four speakers from Experiment 1. The trial order for targets and fillers was pseudo-randomized.
Fillers were placed in the same position across lists. With respect to target items, a total of 720 data
points was collected and analyzed (20 listeners x 36 targets).

3.2 Participants, perception study procedure


Twenty native speakers of American English from the University of Southern California commu-
nity participated in the perception study. They were paid for their participation and were naïve as
to the purpose of this study. Participants were tested and monitored individually in the USC
Phonetics Laboratory. They were all right-handed.
We used the MouseTracker software (Freeman & Ambady, 2010) to record listeners’ behaviors
during the trials. Each trial had the following procedure. The participants were seated with

Figure 4.  (a) An example screen display for the audio “play skin” in the identity-splice condition. (b) The
red line exemplifies an actual mouse path; the black straight line refers to the ideal mouse path of a length
of 1.8 X; the dotted line indicates the maximum deviation (MD) measurement; the red shaded area under
the red line is the area under the curve (AUC) measure of a trajectory (not shown to participants).

headphones and a wired computer mouse. Two response boxes depicting two orthographic words
(e.g., “KIN” and “SKIN”) were presented in the upper left and right corners of a computer screen
(see Figure 4). The software automatically normalizes the task space, which converts the screen to
a standard x and y coordinate space in pixels (X)—a 2 x 1.5 rectangle. The top left corner of the
screen corresponds to [-1.0, 1.5], and the bottom right corner corresponds to [1.0, 0.0], leaving the
start location of the mouse (the bottom center) with the coordinates, [0.0, 0.0]. An example display
of an experimental trial is shown in Figure 4(a). Because the same set of words appeared multiple
times, the locations of the two response choices were counter-balanced (e.g., “KIN” and “SKIN”
vs. “SKIN” and “KIN”). The participants were instructed to click the “START” box in the bottom
center of the screen, when they were ready to begin a trial.
As soon as “START” was clicked, the spoken two-word sequence started to play. In this
mouse-tracking task, listeners always heard two-word sequences on both target and filler trials that
had been extracted from the carrier frames elicited during the production study (Experiment 1).
After clicking the “START” button, participants were instructed to start moving the mouse right
away to click on the second word of the two-word sequences they heard (e.g., click on “KIN” if
they heard “kin” of “place kin”). In other words, their task in this forced-choice word monitoring
paradigm was to click on the second word of the two-word sequence as quickly as possible.

3.3 Mouse-tracking measurements


Five types of measurements were used from the mouse-tracking experiment to test our predictions
about listeners’ sensitivity to low-level phonetic information. We measured (a) response choices
between the intended target and its competitor, (b) reaction time from click to click, (c) mouse-
movement-initiation time, (d) velocity/acceleration measures, and (e) spatial attraction of a mouse
movement trajectory. In what follows, we describe how these measurement types were computed
along with their relations to our predictions.
Participants’ response choices were recorded based on which response box they clicked on
(the intended target or the competitor). Recall that our two-word audio stimuli were original,
unprocessed recordings of /s/-stop words with either matching (identity-spliced) or mismatching

(cross-spliced) VOT portions. Listeners’ click responses made based on the original audio files
were recorded as the intended target choice. For example, after hearing the audio file “place [k]in”
(either with long or short VOTs), clicking on “KIN” was counted as a target response, whereas
clicking on “SKIN” was counted as a competitor response.
If listeners are sensitive to low-level phonetic information present at a word boundary with no
prosodic correlates when disambiguating /s/#/C/ from #/sC/, then listening to identity-spliced
(with matching/appropriate VOTs) files would lead to mostly clicking on the intended target words.
In addition, if listeners are especially sensitive to VOT, which reflects the allophonic relation of the
edge segment(s), listening to cross-spliced files would increase the proportion of clicks on com-
petitor responses. If listeners’ perception of the ambiguous /s/-stop sequences largely depends on
the local manipulation of allophonic information, that is, VOT, we might be also able to observe
graded online effects of mismatching VOT pulling the arm movement in the competitor’s direc-
tion. This would result in slower reaction time, longer mouse-movement initiation time, slower
velocity, and more deviated mouse movement trajectories in the cross-spliced condition than in the
identity-spliced condition.
Raw reaction time was measured from the time point of clicking the “START” button to the
next click on the response box. Data points that were more than three standard deviations away
from the mean reaction time within a single listener were excluded from the analysis. The trim-
ming affected less than 0.9% of the data. In order to test whether the complexity in the listener’s
decision is reflected early on—that is, immediately after the beginning of the audio stimuli—the
movement initiation time of the mouse was measured from the time point of clicking the “START”
button to the start of arm movement of the listeners. Additionally, values of the mouse-movement
peak velocity and maximum acceleration were measured to see if the decision complexity would
be reflected in the kinematic measures.
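A minimal R sketch of the per-participant trimming rule described above, assuming a data frame rt_data with columns subject and rt (names are illustrative):

library(dplyr)

## Drop trials whose reaction time lies more than three standard deviations
## from that participant's own mean.
trimmed <- rt_data %>%
  group_by(subject) %>%
  filter(abs(rt - mean(rt)) <= 3 * sd(rt)) %>%
  ungroup()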
For the spatial attraction measures, each trajectory was visually inspected. We found one
mouse trajectory that was hovering over the middle of the screen, indicating a pause in move-
ment, and removed it from the analysis (1 out of 720 data points). Each mouse movement trajec-
tory was time-normalized to have 101 time-steps, and each time step had a corresponding x and
y coordinate. As each recorded trajectory tended to have a different length, time normalization
was required for spatial attraction measures to permit averaging across multiple trials with dif-
ferent numbers of coordinate pairs. We computed two measures for spatial attraction (following
Freeman & Ambady, 2010; see Figure 4(b) for an illustration of these spatial measures.). One is
the maximum deviation (MD) of a trajectory. A straight line between each trajectory’s start
and end points was first computed. Among all the normalized time steps, the largest perpendicu-
lar deviation between this ideal straight line (length = 1.8 X) and the actual trajectory was cal-
culated. Thus, the higher the MD, the more the trajectory deviated toward the unselected
alternative. The MD values of the trajectories that went under the ideal trajectory (20% of the
data) were coded as 0, indicating no attraction toward the unselected response. The area under
the curve (AUC) of a trajectory, that is, the geometric area between the actual trajectory and
the idealized trajectory, was also calculated. The AUC values of those trajectories with 0 MD
values were also coded as 0.
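To make these computations concrete, the following minimal R sketch covers the time normalization and the two spatial attraction measures for a single trajectory; it assumes trajectories have been remapped so that deviations toward the unselected response are positive, and it approximates, rather than reproduces, the Freeman and Ambady (2010) implementation.

## Time-normalize a raw trajectory (time stamps t, coordinates x, y) to
## n = 101 equally spaced steps by linear interpolation.
time_normalize <- function(t, x, y, n = 101) {
  tt <- seq(min(t), max(t), length.out = n)
  list(x = approx(t, x, xout = tt)$y,
       y = approx(t, y, xout = tt)$y)
}

## Maximum deviation (MD) and area under the curve (AUC) of one
## time-normalized trajectory, relative to the straight start-to-end line.
md_auc <- function(x, y) {
  n  <- length(x)
  dx <- x[n] - x[1]
  dy <- y[n] - y[1]
  ## signed perpendicular distance of each step from the ideal line;
  ## positive values are assumed to point toward the unselected response
  d <- ((x - x[1]) * dy - (y - y[1]) * dx) / sqrt(dx^2 + dy^2)
  md <- max(c(d, 0))  # trajectories under the ideal line are coded as 0
  ## area between actual and ideal paths, trapezoidal rule over the
  ## normalized time steps (a simple approximation of the geometric area)
  auc <- sum((head(d, -1) + tail(d, -1)) / 2)
  list(MD = md, AUC = if (md == 0) 0 else abs(auc))
}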

3.4 Statistical analyses


The basic structure of significance testing is the same as in the elicited stimuli analysis in Experiment
1: we fit fully-crossed models with by-subject and by-item adjustments to the slopes and intercepts
and used model comparison to identify the maximal model supported by the data. Only random
effects that contributed significantly to the model were included. The fixed effect predictors

Figure 5.  Main effect of Splice type on response choices (*sig.).

included Splice type (identity-splice vs. cross-splice), Word boundary (/s/#/C/ vs. #/sC/), and Stop
place (/p/ vs. /t/ & /k/ [coded as /p/ = 2/3, /t/ = -1/3, /k/ = -1/3] and /t/ vs. /k/ [coded as /p/ = 0,
/t/ = 1/2, /k/ = -1/2]) and were evaluated in nested model comparisons using likelihood-ratio tests.
Linear mixed-effects models were run over the continuous mouse trajectory measures (i.e., reac-
tion time, initiation time, velocity/acceleration, MD, & AUC measures). For the response choice
measure (binary coding with 1 and 0), logistic mixed-effects regression models were conducted.
These mixed-effects models were performed using the lme4 package (Bates et al., 2015) and the
Tukey tests for post-hoc model comparisons were conducted using the emmeans package (Lenth
et al., 2018) in R (R Core Team, 2018). Additionally, a sign test was conducted at the participant
level to test whether a participant prefers “target” over “competitor” responses. Again, effects
were judged as significant if p < .05.

3.5 Results
Click responses: There is a significant main effect of Splice type on which word participants
clicked on, which indicates how they perceive the second (potentially ambiguous) word that they
heard (χ2 = 8.89, df = 1, p < .005). Listeners choose the competitor over the intended target more
often in the cross-spliced condition than in the identity-spliced condition (visualized in percentage
in Figure 5).
Rates of selecting the competitor response in the cross-spliced condition are near chance (56%).
Given that there were only two possible outcomes (target or competitor), a sign test was addition-
ally carried out to test whether the number of the competitor choices is significantly different from
50%. As the sign test p-value turns out to be greater than .05, this does not reject the null hypothesis
that the probability of choosing competitor responses in the cross-spliced condition is equal to .5
(number of subjects who preferred competitor choices = 13, number of subjects = 20, p = .26).
As expected, the probability of choosing target responses in the identity-spliced condition is sig-
nificantly above chance (number of subjects who preferred target responses = 16, number of
subjects = 20, p < .05).
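Because each participant contributes one binary preference, the participant-level sign test is equivalent to an exact binomial test; the minimal R sketch below reproduces the two comparisons just reported.

## Cross-spliced: 13 of 20 listeners preferred the competitor (n.s., p = .26)
binom.test(13, 20, p = 0.5)
## Identity-spliced: 16 of 20 listeners preferred the target (p < .05)
binom.test(16, 20, p = 0.5)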
Although there is no main effect of Word boundary or Stop place, there is a significant Word
boundary and Stop place interaction (χ2 = 41.04, df = 2, p < .001; figure not given). The post-hoc

Figure 6.  Splice type x Word boundary interaction on reaction time (The error bars show +1 SE).

analysis reveals that the Word boundary effect is only significant in conditions with /k/. Listeners
tend to choose the target 24% more often when they hear #/sk/ (e.g., “play skin”) than /s/#/k/ (e.g.,
“place kin”). It might be the case that different parsing strategies are used for /k/ than for /p/ or /t/.
Crucially, however, there is no further interaction with Splice type, so this interaction will not be
discussed further.
Reaction time: Overall, we find no main effects of Splice type, Word boundary, or Stop place
on reaction time (time to click). There is a significant interaction between Splice type and Word
boundary (χ2 = 4.45, df = 1, p < .05). However, the post-hoc Tukey test reveals no significant
difference in any pairwise comparisons. This interaction probably reflects the fact that the numerical Splice type effect goes in opposite directions in the two Word boundary conditions (Figure 6).

Figure 6.  Splice type x Word boundary interaction on reaction time (The error bars show +1 SE).
Initiation time: For the mouse-movement initiation time, we find no main effects of Splice
type, Word boundary, or Stop place, and no between-predictor interaction.
Velocity, acceleration: For the kinematic measures of the mouse trajectories, no predictors
show significant main effects, and no interaction between predictors is found.
Mouse trajectory measurements: Neither of the spatial attraction measures—the maximum
deviation (MD) or the area under the curve (AUC) of a trajectory—shows effects of any of the
three predictors. For maximum deviation, there is no difference in the amount of deviation toward
the unselected alternative between identity-spliced and cross-spliced conditions (Splice type),
between /s/#/C/ and #/sC/ (Word boundary), or between different stop places (Stop place).
In line with what is observed with maximum deviation, there are no significant main effects of
the predictors on the area under the curve measure. There is no between-predictor interaction.
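For readers unfamiliar with these spatial measures, the sketch below shows one standard way to compute MD and AUC from a single time-normalized trajectory. It follows the general logic of the MouseTracker software (Freeman & Ambady, 2010), but that software's exact conventions (e.g., coordinate rescaling and remapping) may differ in detail.

# MD and AUC for one trajectory: x and y are vectors of mouse coordinates,
# time-normalized from movement start to the clicked response. We assume
# trajectories have been remapped so that positive deviation from the
# straight start-to-end line points toward the unselected alternative.
attraction_measures <- function(x, y) {
  n <- length(x)
  dx <- x[n] - x[1]
  dy <- y[n] - y[1]
  len <- sqrt(dx^2 + dy^2)
  # Signed perpendicular distance of each sample from the start-end line.
  dev <- ((x - x[1]) * dy - (y - y[1]) * dx) / len
  # MD: the largest single deviation toward the unselected alternative.
  MD <- max(dev)
  # AUC: area between the trajectory and the straight line, approximated
  # by the trapezoidal rule over normalized time.
  t <- seq(0, 1, length.out = n)
  AUC <- sum(diff(t) * (head(dev, -1) + tail(dev, -1)) / 2)
  c(MD = MD, AUC = AUC)
}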
Response-contingent analyses: Given the significance of Splice type on click responses, we
conducted a separate analysis to closely evaluate the null effect of Splice type on the trajectory
measures. Although our audio stimuli are inherently ambiguous and therefore difficult to process,
one possibility is that the lack of a strong effect of Splice type on the trajectory measures might
stem from pooling data points from both the intended target and competitor click responses. Thus,
we additionally analyzed only that subset of data where listeners clicked on the intended target
responses. The results confirm that there is no effect of Splice type on the five mouse trajectory
measures: reaction time, initiation time, velocity/acceleration, AUC, and MD.
Additional exploratory analyses: A further analysis was conducted to examine whether the
stop release itself, without a burst occurring in the signal, contributes to listeners’ behaviors. We
looked at a subset of the “bay spam–base pam” and “lee spine–lease pine” pairs (120 click
responses), in which there are no bursts in the signal. The intended target is chosen significantly
more often in the identity-spliced condition (73%) than in the cross-spliced condition (48%)
(z = -2.57, p < .05). No significant effect is found with any mouse-trajectory measures.
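A z statistic of this kind is consistent with a Wald test on the Splice type coefficient in a logistic mixed-effects model fit to the burst-free items alone; a hypothetical sketch, with assumed column and pair labels:

# Subset analysis restricted to the burst-free pairs.
no_burst <- subset(dat, pair %in% c("bay spam-base pam",
                                    "lee spine-lease pine"))
m_nb <- glmer(target_click ~ splice + (1 | subject),
              data = no_burst, family = binomial)
summary(m_nb)  # the coefficient table reports the Wald z for splice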
In sum, the results are straightforward. When asked to click on what they heard, listeners
choose the intended target over the competitor significantly more often in the identity-spliced
(matching VOTs) condition than in the cross-spliced (mismatching VOTs) condition. The sign test
results show that the participants reliably choose target responses in the identity-spliced condition.
More importantly, when VOT is cross-spliced (i.e., mismatching), the intended target responses are
no longer favored over the competitor responses. For example, after hearing the audio file “place
[k]in,” participants clicked on “KIN” and “SKIN” almost equally when the [k] had short VOTs.
This points to listeners being sensitive to the local VOT information. However, some discrepancy
in click response patterns is observed with different stop items. None of the dynamic mouse trajec-
tory measures shows sensitivity to different splice type conditions.

4  General discussion
By using a response mouse-tracking method, this study examines the role of sub-phonemic infor-
mation in online lexical ambiguity resolution of continuous speech. The experiment uses the case
of ambiguous /s/-stop sequences in English (e.g., “play skin” / “place kin”) as a testbed for whether
listeners are able to detect the very subtle phonetic consequence of allophony, which corresponds
to a subpart of a single segment. Our experimental stimuli are designed to present a context where
there are two competing segmentation options that do not differ in their syntactic or semantic plau-
sibility and are prosodically controlled so that only the phonetic details strictly confined to a word
boundary are available. This type of thorough stimulus control was not present in prior work, and
the methodological improvement of our study offers an opportunity to test to what extent listeners
are sensitive to purely sub-phonemic information in the signal to modulate the lexical competition
process. Moreover, given that online measures of lexical access for this type of data (internal open
juncture sequences) in English were previously not available, our mouse-tracking experiment
examines the complexity of decision making about such ambiguous sequences. Our investigation of
online word segmentation provides novel evidence for (a) the role of phonetic correlates of a word
boundary in spoken word recognition, (b) the listener’s sensitivity to the fine-grained subsegmen-
tal information and the resulting reinterpretation of word boundary location, and (c) the overall
difficulty of making decisions about highly ambiguous sequences in connected speech.
The acoustic analysis of the elicited /s/-stop stimuli confirms that both marginal (edge) and seg-
mental allophones of /s/-stop sequences are distinctively realized even when the adjacent word
boundary is as small as a phrase-internal word boundary. The first hypothesis we tested was that the
phonetic information distinguishing “play skin” from “place kin” would be effectively used by listen-
ers in connected speech, even at word boundaries. The hypothesis was supported. Our study shows
that listeners in general are able to distinguish the ambiguous /s/-stop sequences fairly well (67%
response rate for the intended target words in the identity-spliced condition), even when these
sequences were produced in a phrase-internal unaccented position, where the acoustic support for
segmentation is only local and allophonic. In the identity-spliced condition (splicing from the same
word boundary condition), the sub-phonemic information of aspiration degree in the stimulus dove-
tails well with the other available information in the signal, helping listeners identify a lexical item
with the intended segmentation. For example, when listeners heard “place kin” in the identity-spliced
condition, the available duration information including the reasonably long VOT activates “kin”
more strongly than its competitor “skin.” The duration information altogether helps listeners place a
word boundary in the correct position—that is, after /s/ in "place"—as the audio stimulus unfolds, as
reflected in the dominant responses of the intended target in the identity-spliced condition. The same
pattern obtains for short VOT in “skin” in “play skin.” This supports the hypothesis that listeners are
sensitive to the subtle allophonic and durational correlates of a word boundary during online word
recognition and confirms that segmentation and lexical access are highly attuned to bottom-up pho-
netic information (e.g., Andruski et al., 1994; Blazej & Cohen-Goldberg, 2015; Dahan, Magnuson,
Tanenhaus, & Hogan, 2001; Davis, Marslen-Wilson, & Gaskell, 2002; Gow, 2002; McMurray et al.,
2009; Salverda, Dahan, & McQueen, 2003; Shatzman & McQueen, 2006).
The next hypothesis we tested was that listeners would specifically make use of sub-segmental
information, that is, degree of aspiration (possibly among other sources of information), when
disambiguating contrastive internal open juncture phrases like “place kin” and “play skin.” Here is
where the cross-spliced condition becomes critical. In that condition, VOT for /k/ in “place kin”
was replaced by the relatively short VOT taken from “play skin.” And the same was done with
#/sC/ tokens (e.g., “play skin”). The hypothesis was supported, as our results show that when lis-
teners hear the cross-spliced “place kin” with relatively short VOTs (or “play skin” with relatively
long VOTs), they do not distinguish between the intended target “place kin” and the competitor
“play skin.” (It is worth recalling that it was only VOTs that were spliced and replaced. Everything
else in the signal remained exactly the same.) This strongly suggests that stop release information
indeed biases listeners’ segmentation. Put differently, even though the other phonetic correlates are
consistent with the intended target, when listeners hear a mismatching VOT that runs counter to that
segmentation, the intended word boundary location is no longer preferred. Previous studies have
shown the role of VOT in perceiving phonemic distinctions between voiced and voiceless stops in
English (e.g., Andruski et al., 1994; McMurray et al., 2009). Our study extends this finding to the
case where that fine-grained stop release information directly activates/deactivates lexical com-
petitors: English listeners make use of allophonic information smaller than a segment (the aspiration degree of the stop after /s/, as reflected in VOT) and apply it to word segmentation ("place kin" vs. "play skin").
Before we discuss the theoretical implications of our findings, what still remains unclear is
whether the observed effects stem solely from the VOT differences in /s/#/C/ and in #/sC/. Recall
that stops in /s/#/C/ are associated with bursts significantly more often than those in #/sC/. When
splicing was carried out, burst spikes were also included in the VOT portions. It might be the case
that together with VOT, the durational property of stop release, a sharp increase in noise energy
that is aligned with the endpoint of a silence aid listeners’ lexical access in the ambiguous /s/-stop
sequences. A further analysis that we conducted including only those tokens with no visible bursts
in the signal suggests that VOT itself can contribute to listeners’ behaviors. In the matching VOT
condition, listeners favor the target responses over the competitors, while in the mismatching VOT
condition, the disharmonious VOT inhibits the intended parsing. This at first glance seems to
suggest that the perceptual effect largely comes from temporal information. However, there could
be other spectral information in the release burst or during the release period that signals the com-
position of the syllable onset. In any case, our findings suggest that both temporal and spectral
aspects of stop release may bias listener perception.
Our findings clearly constrain models of spoken-word recognition that incorporate prelexical
representations that convert the low-level signal into some abstract form prior to lexical access
(e.g., TRACE: McClelland & Elman, 1986; Shortlist: Norris, 1994; Norris, McQueen, Cutler, &
Butterfield, 1997; and the distributed cohort model: Gaskell & Marslen-Wilson, 1997). Rather than
assuming phonemic prelexical representations, our results are better accounted for by an amended
version of such models, in which low-level position-specific representations are available at the
prelexical level (see McQueen, Dahan, & Cutler, 2003, for discussion). On this account, listeners
successfully access a specific lexical item when hearing both segmental and prosodic information
(here, phrase-internal word boundary and syllable structures) appropriately aligned for the right
parsing (e.g., Shatzman & McQueen, 2006; Soto-Faraco, Sebastián-Gallés, & Cutler, 2001;
Spinelli, McQueen, & Cutler, 2003).
Alternatively, it could also be that listeners have stored instances of stop release information,
whether temporal or spectral, for #/C/- or #/sC/-initial words. In this type of exemplar-based model
(Goldinger, 1996, 1998), stored exemplars for #/C/ have longer VOTs often accompanied by
release bursts compared to exemplars for #/sC/. More broadly, our results imply that this kind of
information may be stored in the listeners’ lexicons. What listeners retrieve may be gestural repre-
sentations (Browman & Goldstein, 1986), and/or it might also be the case that listeners make lexi-
cal judgments based on previously stored instances of fine-grained phonetic representations,
whether gestural or otherwise (e.g., Goldinger, 1996; or see Creel & Tumlin, 2011; Hawkins,
2003). The data presented here are insufficient to favor one account over the others, and further
examination of distinguishing the alternative accounts remains for future work.
Additionally, the broader relevance of our findings regarding the role of subtle sub-featural
information of a single segment in lexical access could extend beyond English and beyond the
sequences we tested here. While our hypotheses, formulated specifically for English, are not relevant to languages (e.g., Dutch) where the aspiration difference is not the primary information distinguishing the ambiguous /s/-stop sequences, other languages that exhibit an allophonic relation similar to that of English /s/-stop sequences could provide further empirical support. Moreover,
our findings could extend to other cases of English open juncture sequences, some of which are
arguably allophonic (“night rate” vs. “nitrate”) and some not (“no notion” vs. “known ocean”).
Despite the fact that there are multiple other phonetic correlates of a word boundary, listeners show specific sensitivity to the misaligned information coming from the cross-spliced stop releases. Given that click responses show significant effects of the VOT manipulation, one might wonder if the other incompatible phonetic aspects, such as /s/ duration information suggesting a segmentation different from the one the VOT signals, would add complexity to listeners' online mouse movements.
This possibility is not supported given the null effect in the reaction time measure. The lack of fine-
grained temporal dynamics in the mouse-movement may be due to the nature of the measure,
which can be affected both by online processing and by post-perceptual decision processes. It
might be the case that the listeners had already made up their minds before they initiated the mouse
movements. This possibility is supported by the null effect of splice condition on mouse movement
initiation time. Our use of prosodically controlled target phrases that were somewhat unfamiliar compounds, which may hinder or delay online lexical access, might also have contributed to this.
Moreover, no mouse-movement trajectory measures show significant effects of different splic-
ing conditions. This may seem surprising, given that mouse movements are often regarded as more
sensitive than eye movements (Spivey et al., 2005). One possible explanation for the null effect on
dynamic measures might arise from the very short durations of our audio stimuli. However, previ-
ous studies with short stimuli that are comparable in duration to ours have found decision complex-
ity reflected in reaction time and/or mouse trajectory measures (Bartolotti & Marian, 2012; Blazej
& Cohen-Goldberg, 2015). The major difference between our stimuli and the ones used in Bartolotti
and Marian (2012) and Blazej and Cohen-Goldberg (2015) is the structure of lexical competition
between items. In these studies, the two competing items are designed to have a clear disambigua-
tion point (e.g., disyllabic words with a phonemically identical initial syllable, root vs. suffixed
words sharing the homophonous initial syllable). In contrast, for our stimuli there is no phonemic
disambiguation point in the items; the two competing word phrases have the exact same phonemic
makeup from the onset to the offset of the phrase, though they do differ allophonically in the
(VOT) interval immediately after stop release. It could be the case that the magnitude of the deac-
tivation process for the competitor (or the boosted activation for the target) is smaller for internal
juncture sequences compared to the cohorts that can be phonemically deactivated. Or, again, it
could have been due to the inhibitory effect from the semantically anomalous phrases. Thus, with
considerable ambiguity and a relatively subtle manipulation, the current stimulus design can only
test the gradient effects of acoustic properties in lexical access. In any case, segmenting the ambig-
uous sequences in connected speech, where the robust phonetic correlates of a juncture are absent,
appears not to be an easy task (e.g., de Jong et al., 2004).

5  Conclusions
In sum, this study elucidates the role of sub-phonemic information in online lexical ambiguity
resolution of continuous speech. Building on previous studies, the present study shows that
listeners are indeed sensitive to sub-phonemic information when identifying words in ambigu-
ous segmentation sequences such as “place kin” versus “play skin,” even in the context of con-
nected speech. The ambiguous /s/-stop sequences used in the study offer an opportunity to
investigate whether listeners are able to detect very subtle distinctions between allophones,
where the allophonic difference is restricted to an acoustic subinterval of a single segment.
Click responses show significant effects of the VOT manipulation: when asked to click on what they heard, listeners choose the intended target over the competitor significantly more often in an identity-spliced condition than in a condition in which only the VOT is cross-spliced. Moreover, we show
that stop release information, whether temporal or spectral, can (mis)guide listeners’ interpreta-
tion of the possible location of a word boundary between /s/ and a following stop, even when
other aspects in the acoustic signal (e.g., duration of /s/) point to the alternative segmentation.
These are novel findings given that these effects are obtained in the absence of large prosodic
boundaries or prominences and that the listeners’ reinterpretations of a word boundary location
are indeed triggered by the low-level subsegmental phonetic aspects (aspiration degree reflected
in acoustic stop release). As a whole, our study suggests that segmentation and lexical access
are highly attuned to bottom-up phonetic information. Our results have implications for a model
of spoken language recognition with position-specific representations available at the prelexi-
cal level and also allude to the possibility that detailed phonetic information may be stored in
the listeners’ lexicons.

Authors’ Note
Yoonjeong Lee is also affiliated with Department of Linguistics, University of Southern California, USA.

Acknowledgements
We are grateful to Dr. Dani Byrd and Dr. Jason Zevin for insightful comments and suggestions. We would
also like to convey our thanks to the editor and the four anonymous reviewers for their constructive feedback
and suggestions.

Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publica-
tion of this article: This work was supported by NIH DC003172. Manuscript revision was supported by NIH
DC01797 and NSF IIS-1704167.

ORCID iD
Yoonjeong Lee https://orcid.org/0000-0003-1323-049X

Notes
1. Note, however, that previous studies have shown listeners’ sensitivity to durational differences in seg-
ments that are not immediately adjacent to a word boundary (e.g., Miller & Liberman, 1979).
2. Shatzman and McQueen (2006) examined the segmentation issue in ambiguous /s/-stop sequences in
Dutch, but their primary interest was in the durational property of /s/ at the juncture (i.e., pre- and post-
boundary /s/s), as, unlike in English, the aspiration degree of the stop after /s/ is not reliable information for distinguishing the ambiguous sequences in Dutch.

References
Andruski, J. E., Blumstein, S. E., & Burton, M. (1994). The effect of subphonetic differences on lexical
access. Cognition, 52(3), 163–187.
Baayen, R. H., Davidson, D. J., & Bates, D. M. (2008). Mixed-effects modeling with crossed random effects
for subjects and items. Journal of Memory and Language, 59(4), 390–412.
Bartolotti, J., & Marian, V. (2012). Language learning and control in monolinguals and bilinguals. Cognitive
Science, 36(6), 1129–1147.
Bates, D., Mächler, M., Bolker, B., & Walker, S. (2015). Fitting linear mixed-effects models using lme4.
Journal of Statistical Software, 67(1), 1–48.
Beckman, M. E., & Pierrehumbert, J. B. (1986). Intonation structure in Japanese and English. Phonology Yearbook, 3, 255–309.
Blazej, L. J., & Cohen-Goldberg, A. M. (2015). Can we hear morphological complexity before words are
complex? Journal of Experimental Psychology: Human Perception and Performance, 41(1), 50–68.
Browman, C. P., & Goldstein, L. M. (1986). Towards an articulatory phonology. Phonology Yearbook, 3,
219–252.
Byrd, D. (1993). 54,000 American stops. UCLA Working Papers in Phonetics, 83, 97–116.
Byrd, D. (1996). Influences on articulatory timing in consonant sequences. Journal of Phonetics, 24(2), 209–244.
Cho, T., Lee, Y., & Kim, S. (2014). Prosodic strengthening on the /s/-stop cluster and the phonetic implemen-
tation of an allophonic rule in English. Journal of Phonetics, 46(1), 128–146.
Christie, W. M. (1974). Some cues for syllable juncture perception in English. Journal of the Acoustical
Society of America, 55(4), 819–821.
Christophe, A., Peperkamp, S., Pallier, C., Block, E., & Mehler, J. (2004). Phonological phrase boundaries
constrain lexical access I. Adult data. Journal of Memory and Language, 51(4), 523–547.
Creel, S. C., & Tumlin, M. A. (2011). On-line acoustic and semantic interpretation of talker information.
Journal of Memory and Language, 65(3), 264–285.
Crystal, T. H., & House, A. S. (1982). Segmental durations in connected speech signals: Preliminary results.
Journal of the Acoustical Society of America, 72(3), 705–716.
Dahan, D., Magnuson, J. S., Tanenhaus, M. K., & Hogan, E. M. (2001). Subcategorical mismatches and the
time course of lexical access: Evidence for lexical competition. Language and Cognitive Processes,
16(5–6), 507–534.
Davies, M. (2008). The Corpus of Contemporary American English (COCA): 560 million words, 1990–
present. Available online at https://corpus.byu.edu/coca/.
Davis, M. H., Marslen-Wilson, W. D., & Gaskell, M. G. (2002). Leading up the lexical garden path:
Segmentation and ambiguity in spoken word recognition. Journal of Experimental Psychology: Human
Perception and Performance, 28(1), 218–244.
de Jong, K. J., Lim, B.-J., & Nagao, K. (2004). The perception of syllable affiliation of singleton stops in
repetitive speech. Language and Speech, 47(3), 241–266.
Fowler, C. A. (1981). A relationship between coarticulation and compensatory shortening. Phonetica,
38(1–3), 35–50.
Freeman, J. B., & Ambady, N. (2010). MouseTracker: Software for studying real-time mental processing
using a computer mouse-tracking method. Behavior Research Methods, 42(1), 226–241.
Gårding, E. (1967). Internal Juncture in Swedish. Lund, Sweden: Gleerup.
Gaskell, M. G., & Marslen-Wilson, W. D. (1997). Integrating form and meaning: A distributed model of
speech perception. Language and Cognitive Processes, 12(5–6), 613–656.
Goldinger, S. D. (1996). Words and voices: Episodic traces in spoken word identification and recognition
memory. Journal of Experimental Psychology: Learning, Memory, and Cognition, 22(5), 1166–1183.
Goldinger, S. D. (1998). Echoes of echoes? An episodic theory of lexical access. Psychological Review,
105(2), 251–279.
Gout, A., Christophe, A., & Morgan, J. L. (2004). Phonological phrase boundaries constrain lexical access II.
Infant data. Journal of Memory and Language, 51(4), 548–567.
Gow, D. W., Jr. (2002). Does English coronal place assimilation create lexical ambiguity? Journal of
Experimental Psychology: Human Perception and Performance, 28(1), 163–179.
Hawkins, S. (2003). Roles and representations of systematic fine phonetic detail in speech understanding.
Journal of Phonetics, 31(3–4), 373–405.
Keating, P. A., Wright, R. A., & Zhang, J. (1999). Word-level asymmetries in consonant articulation. UCLA
Working Papers in Phonetics, 97, 157–173.
Kiss, G. R., Armstrong, C., Milroy, R., & Piper, J. (1973). An associative thesaurus of English and its com-
puter analysis. In The computer and literary studies. Edinburgh, UK: Edinburgh University Press.
Lehiste, I. (1960). An acoustic-phonetic study of internal open juncture. Phonetica, 5, 5–54.
Lenth, R., Singmann, H., Love, J., Buerkner, P., & Herve, M. (2018). emmeans: Estimated marginal means, aka least-squares means. R package. https://CRAN.R-project.org/package=emmeans
Lisker, L., & Abramson, A. S. (1967). Some effects of context on voice onset time in English stops. Language
and Speech, 10, 1–28.
Luce, P. A., & Charles-Luce, J. (1985). Contextual effects on vowel duration, closure duration, and the conso-
nant/vowel ratio in speech production. Journal of the Acoustical Society of America, 78(6), 1949–1957.
Luce, P. A., & Pisoni, D. B. (1998). Recognizing spoken words: The neighborhood activation model. Ear &
Hearing, 19(1), 1–36.
Mack, M. (1982). Voicing-dependent vowel duration in English and French: Monolingual and bilingual production. Journal of the Acoustical Society of America, 71(1), 173–178.
Maddieson, I. (1985). Phonetic cues to syllabification. UCLA Working Papers in Phonetics, 59, 85–101.
Marslen-Wilson, W. D. (1990). Activation, competition and frequency in lexical access. In G. T. M. Altmann (Ed.), Cognitive models of speech processing: Psycholinguistic and computational perspectives (pp. 148–172). Cambridge, MA: MIT Press.
McClelland, J. L., & Elman, J. L. (1986). The TRACE model of speech perception. Cognitive Psychology, 18(1), 1–86.
McMurray, B., Tanenhaus, M. K., & Aslin, R. N. (2009). Within-category VOT affects recovery from “lexi-
cal” garden-paths: Evidence against phoneme-level inhibition. Journal of Memory and Language, 60(1),
65–91.
McQueen, J. M., Dahan, D., & Cutler, A. (2003). Continuity and gradedness in speech processing. In
N. O. Schiller & A. S. Meyer (Eds.), Phonetics and Phonology in Language Comprehension and
Production. Differences and Similarities. (pp. 39–78). Berlin: Mouton de Gruyter.
Miller, J. L., & Liberman, A. M. (1979). Some effects of later-occurring information on the perception of stop
consonant and semivowel. Perception & Psychophysics, 25(6), 457–465.
Munhall, K., Fowler, C. A., Hawkins, S., & Saltzman, E. (1992). "Compensatory shortening" in monosyllables of spoken English. Journal of Phonetics, 20(2), 225–239.
Munhall, K., & Löfqvist, A. (1992). Gestural aggregation in speech: laryngeal gestures. Journal of Phonetics,
20, 111–126.
Nakatani, L. H., & Dukes, K. D. (1977). Locus of segmental cues for word juncture. Journal of the Acoustical
Society of America, 62(3), 714–719.
Norris, D. (1994). Shortlist: A connectionist model of continuous speech recognition. Cognition, 52(3), 189–234.
Norris, D., McQueen, J. M., Cutler, A., & Butterfield, S. (1997). The possible-word constraint in the segmen-
tation of continuous speech. Cognitive Psychology, 34(3), 191–243.
Quené, H. (1992). Durational cues for word segmentation in Dutch. Journal of Phonetics, 20, 331–350.
R Core Team. (2018). R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org/
Salverda, A. P., Dahan, D., & McQueen, J. M. (2003). The role of prosodic boundaries in the resolution of
lexical embedding in speech comprehension. Cognition, 90(1), 51–89.
Shatzman, K. B., & McQueen, J. M. (2006). Segment duration as a cue to word boundaries in spoken-word
recognition. Perception & Psychophysics, 68(1), 1–16.
Soto-Faraco, S., Sebastián-Gallés, N., & Cutler, A. (2001). Segmental and suprasegmental mismatch in lexi-
cal access. Journal of Memory and Language, 45(3), 412–432.
Spinelli, E., McQueen, J. M., & Cutler, A. (2003). Processing resyllabified words in French. Journal of Memory and Language, 48(2), 233–254.
Spivey, M. J., Grosjean, M., & Knoblich, G. (2005). Continuous attraction toward phonological competitors. Proceedings of the National Academy of Sciences, 102(29), 10393–10398.
Stetson, R. H. (1951). Motor phonetics: A study of speech movements in action. Oxford, UK: North-Holland
Publishing Co.
Wightman, C. W., Shattuck-Hufnagel, S., Ostendorf, M., & Price, P. J. (1992). Segmental durations in the
vicinity of prosodic phrase boundaries. Journal of the Acoustical Society of America, 91(3), 1707–1717.
Yoshioka, H., Löfqvist, A., & Hirose, H. (1981). Laryngeal adjustments in the production of consonant clus-
ters and geminates in American English. Journal of the Acoustical Society of America, 70(6), 1615–1623.
